Data Quality Rules

What are Data Quality Rules (Data Validation Rules)?

Data quality rules (also known as data validation rules) are, like automation rules, a special form of business rule. They clearly define the business requirements for specific data. Ideally, data validation rules should be "fit for use", i.e., appropriate for the intended purpose.

Because data quality rules check the quality of data and data records, they are the primary tool for determining data quality. Checking data against the validation rules shows whether the data meet the defined criteria and possess the required attributes. In this way, potential weak points (e.g., in processes) can be detected and recommendations for action can be derived.

Data quality rules make it possible to measure different data quality dimensions, such as:

  • The contextual accuracy of values (correctness, accuracy)
  • The consistency among values (consistency)
  • The allowed format of values (representational consistency, accuracy)
  • The completeness of values

About the author

Simon Schlosser
Head of Product

"Data quality rules are the hidden champions of data quality management because they enable both in-depth analyses of data quality as well as efficient data cleansing."

What does a data quality rule look like?

A ready-to-use data quality rule is an algorithm that checks the structure, format, and arrangement of data and compares them with previously defined requirements. Before an executable, machine-readable validation rule is developed, however, the rule is usually conceptualized and described first. The development process usually follows this scheme:

  1. Documentation of the business requirements:
    • Business-level description of the data quality rule
    • Determination of the importance/priority of the rule for company processes
    • Clear overview showing dependent data elements and data quality rules
  2. Translation into machine-readable format:
    • Development of a machine-readable validation rule
    • Definition of test cases
  3. Realization or implementation

The data rule should be documented in business terms, i.e., written in the language of the business department, so that everyone involved can understand how the rule works and what it requires of the data.
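The scheme above could be captured in a small structure that keeps the description, the executable check, and the test cases together. The following is a minimal sketch in Python, not a CDQ format: the class `DataQualityRule`, its field names, the identifier `EXAMPLE-001`, and the sample rule are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class DataQualityRule:
    """Hypothetical container for one documented, executable rule."""
    rule_id: str                       # illustrative identifier
    description: str                   # step 1: description of the rule
    priority: str                      # step 1: importance for company processes
    depends_on: List[str]              # step 1: data elements the rule reads
    check: Callable[[dict], bool]      # step 2: machine-readable validation
    test_cases: List[Tuple[dict, bool]] = field(default_factory=list)  # step 2

    def run_tests(self) -> bool:
        """Return True if every test case yields its expected result."""
        return all(self.check(record) == expected
                   for record, expected in self.test_cases)


# Example rule (assumed for illustration): the name must be present and not blank.
name_present = DataQualityRule(
    rule_id="EXAMPLE-001",
    description="Every business partner record must have a non-empty name.",
    priority="high",
    depends_on=["name"],
    check=lambda record: bool(str(record.get("name") or "").strip()),
    test_cases=[
        ({"name": "Example GmbH"}, True),
        ({"name": "   "}, False),
        ({}, False),
    ],
)

# Step 3: only roll the rule out once its own test cases pass.
ready_for_rollout = name_present.run_tests()
```

Keeping the test cases next to the check means the documented intent and the executable behavior can be compared whenever the rule changes.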

A concrete example is a simple data rule that checks German postal codes in the company master data. A postal code is only accepted if its value meets the following criteria:

  • The postal code may only contain digits, with no letters or special characters (representational consistency, format rule)
  • The postal code must be exactly five digits long (representational consistency, format rule)
  • The postal code must be consistent with the specified city (consistency)
  • The postal code must actually be assigned (accuracy)
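The four criteria above can be sketched as one executable check. This is an illustrative Python sketch under stated assumptions: the function name `check_german_postal_code` and the violation messages are made up here, and `ASSIGNED_POSTAL_CODES` is a three-entry stand-in for a complete, maintained postal code directory.

```python
import re

# Illustrative reference data only; a real check would query a complete,
# maintained postal code directory.
ASSIGNED_POSTAL_CODES = {
    "10115": "Berlin",
    "80331": "München",
    "20095": "Hamburg",
}


def check_german_postal_code(postal_code: str, city: str) -> list:
    """Return the violated criteria; an empty list means the value passes."""
    violations = []
    # Format rule: digits only, no letters or special characters.
    if not re.fullmatch(r"[0-9]+", postal_code):
        violations.append("format: digits only")
    # Format rule: exactly five characters long.
    if len(postal_code) != 5:
        violations.append("format: exactly five digits")
    if violations:
        return violations  # reference checks make no sense on malformed input

    assigned_city = ASSIGNED_POSTAL_CODES.get(postal_code)
    if assigned_city is None:
        # Accuracy: the code does not exist in the reference data.
        violations.append("accuracy: postal code not assigned")
    elif assigned_city.casefold() != city.strip().casefold():
        # Consistency: the code exists but belongs to another city.
        violations.append("consistency: postal code does not match city")
    return violations
```

Returning the violated criteria instead of a single true/false value keeps the link to the data quality dimension, so the result can feed both validation and measurement.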

Why do you need data validation rules?

Data quality rules can be used to analyze and evaluate the quality of specific data sets. The insights gained from this, such as problem areas and recommendations for action, then provide the basis for subsequent data cleansing or data enrichment, such as the removal of duplicates or the addition of missing data elements. For example, they can be used to supplement incomplete address data values.

The analysis and cleansing of data, such as business partner data, can be automated with the help of data quality rules, thus saving resources. Thanks to powerful cloud technologies, processing large data volumes is no longer a problem.

Ready-to-use data quality rules from CDQ Data Management Solutions

Save time and effort in development, documentation and review of quality rules.

Benefit from knowledge and good practices of global companies.

Get support for customized data quality rules implementation.

Data Quality Rules in the CDQ Metadata Wiki

In our Data Sharing Community Wiki, all of our data quality rules are documented in a form that business users can understand, and the rule definitions reference the defined data model concepts.

Community-based data quality rules save resources during development

Using ready-made data quality rules reduces the implementation effort considerably, because most of the resources otherwise needed for development, documentation and verification are saved.

AI/ML can optimize an existing set of data rules even further. For example, in so-called data quality rule mining, new data validation rules are identified by analyzing patterns in data sets, which ultimately improves the existing rule set.
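To give a rough idea of what pattern analysis can mean here, the sketch below proposes a candidate format rule from the most frequent value pattern in a column. It is a simple frequency count used as a stand-in, not the AI/ML approach CDQ applies; the function names `value_pattern` and `mine_format_rule` and the `min_support` threshold are assumptions made for illustration.

```python
from collections import Counter
from typing import List, Optional, Tuple


def value_pattern(value: str) -> str:
    """Abstract a value into a pattern: digits -> 9, letters -> A, rest kept."""
    return "".join(
        "9" if ch.isdigit() else "A" if ch.isalpha() else ch
        for ch in value
    )


def mine_format_rule(values: List[str],
                     min_support: float = 0.9) -> Optional[Tuple[str, float]]:
    """Propose the dominant pattern as a candidate format rule.

    Returns (pattern, support) if one pattern covers at least `min_support`
    of the values, otherwise None.
    """
    if not values:
        return None
    patterns = Counter(value_pattern(v) for v in values)
    pattern, count = patterns.most_common(1)[0]
    support = count / len(values)
    return (pattern, support) if support >= min_support else None


# Nine well-formed postal codes and one outlier.
sample = ["10115"] * 9 + ["1011A"]
candidate = mine_format_rule(sample)
```

For this sample, the candidate is the pattern `99999` with a support of 0.9. A candidate found this way would still need review by a business user before it becomes a documented rule.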

  On average, our member companies save over 85 man-days in designing and creating data quality rules

On average, our member companies use 30% of the approximately 1,700 data quality rules. Building these rules themselves would cost business and data management professionals a total of 2,275 man-hours for research, documentation and testing alone. This effort can be saved by using the already tested, ready-to-use CDQ data quality rules. Another benefit is that IT saves the implementation costs for each individual rule, which often amount to several hundred euros per rule.
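The figures above can be broken down per rule with simple arithmetic. The per-rule value below is derived here from the stated numbers and is not a figure given by CDQ.

```python
TOTAL_RULES = 1700        # approximate catalog size stated above
SHARE_USED_PERCENT = 30   # average share used per member company
TOTAL_HOURS = 2275        # man-hours for research, documentation, testing

# Integer arithmetic avoids floating-point rounding in the rule count.
rules_used = TOTAL_RULES * SHARE_USED_PERCENT // 100
hours_per_rule = TOTAL_HOURS / rules_used

print(rules_used)                 # 510
print(round(hours_per_rule, 2))   # 4.46
```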

  Continuous savings in the maintenance of data validation rules

In subsequent years, the companies also save the annual maintenance costs for these rules, which amount to about 85 man-days per year.

Use of data quality rules within CDQ Data Management Services

Data quality rules form the basis of our data management services. Used in data validation and continuous data quality measurement, they also provide ongoing quality assurance for the data shared within the data sharing community.

CDQ currently has more than 1,700 data quality rules, which are continually improved in cooperation with the companies in our community. In this way, the effort for maintenance and further development is shared among many, and everyone benefits from the know-how of the other member companies.

Some rules are also checked against reference sources, such as European VAT numbers in special databases or, as in the example above, the postal codes in Germany. The community also maintains its own reference data for which there are no, or no trustworthy, external sources, such as official legal forms of businesses in a particular country.
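A check against reference data can be sketched as a simple lookup. The example below is illustrative only: the function `has_valid_legal_form` and the table `LEGAL_FORMS_BY_COUNTRY` are hypothetical, and the table is a tiny excerpt rather than the reference data maintained by the community.

```python
# Illustrative excerpt only; a maintained reference list would be far larger.
LEGAL_FORMS_BY_COUNTRY = {
    "DE": {"GmbH", "AG", "KG", "OHG"},
    "CH": {"AG", "GmbH"},
}


def has_valid_legal_form(company_name: str, country_code: str) -> bool:
    """Check whether the name ends with a legal form listed for the country."""
    legal_forms = LEGAL_FORMS_BY_COUNTRY.get(country_code, set())
    tokens = company_name.split()
    # An empty name or an unknown country can never pass the check.
    return bool(tokens) and tokens[-1] in legal_forms
```

Here `has_valid_legal_form("Example GmbH", "DE")` passes, while the same name with a country missing from the table fails. This shows why the quality of such a rule depends directly on how complete and current the reference data is.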

If a company has special business requirements for a certain data format for which there is no explicit data quality rule yet, our software specialists work together with the customer to develop a fitting solution. In this way, we also enable customer-specific extensions, e.g., for individual data fields that are not otherwise used by any other member of the community.

What is Data Quality?

Data quality is a measure of how suitable data are for the specific requirements of the business processes in which they are used. A low level of data quality reduces the value of a company's data assets because their usability is minimal. Companies therefore use data quality management to achieve the level of data quality that their business strategy requires.

Data Quality Defines How Well-Suited Data Sets are for Intended Tasks

Data quality characterizes the degree to which given data objects satisfy the needs (fitness for use) of the business processes that consume them.

In a broader sense, it refers to both the quality of data content as well as the performance of the underlying data management processes.

Data quality measurement is used to assess the data quality level for the quality dimensions that are relevant to the selected business uses. Typical examples of data quality dimensions are completeness, consistency, validity and timeliness (Fig. 1).
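One common way to express such a measurement is the share of records that pass the rules assigned to a dimension. The sketch below assumes this ratio-based scoring; the sample records, the two rules, and the helper `score` are made up for illustration and cover only completeness and validity.

```python
import re
from typing import Callable, Dict, List

Record = Dict[str, object]

# Made-up sample records for illustration.
records: List[Record] = [
    {"name": "Example GmbH", "postal_code": "10115"},
    {"name": "Beispiel OHG", "postal_code": "8033"},
    {"name": "Sample AG", "postal_code": None},
    {"name": "Demo KG", "postal_code": "20095"},
]


def is_complete(record: Record) -> bool:
    """Completeness: both required fields carry a value."""
    return all(record.get(f) not in (None, "") for f in ("name", "postal_code"))


def is_valid(record: Record) -> bool:
    """Validity: the postal code consists of exactly five digits."""
    code = record.get("postal_code")
    return isinstance(code, str) and re.fullmatch(r"[0-9]{5}", code) is not None


def score(rule: Callable[[Record], bool], data: List[Record]) -> float:
    """Share of records that pass the rule (0.0 to 1.0)."""
    return sum(1 for r in data if rule(r)) / len(data)


scores = {
    "completeness": score(is_complete, records),
    "validity": score(is_valid, records),
}
print(scores)  # {'completeness': 0.75, 'validity': 0.5}
```

Tracking these ratios over time turns individual rule results into the continuous data quality measurement described above.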

Data Quality Is Essential for the Value of Data

Poor data quality has a negative impact on the value of data (as reflected by the popular idea of "garbage in, garbage out"). In the digital economy, the role of data is changing: data is evolving from a secondary asset that supports business processes and decision-making into a primary asset that enables digital business strategies and business models. Recent studies identify data management and data quality as two major pain points when it comes to launching business intelligence and advanced analytics/data science initiatives.

Do you need more information about data quality in the enterprise, or would you like to talk to one of our data quality experts? Feel free to contact us!

Fig. 1: Dimensions of Data Quality

CDQ Sample Vendor in Gartner Hype Cycle

In the Hype Cycle for Data and Analytics Governance and Master Data Management 2020, CDQ was recognized as one of three Sample Vendors in the Interenterprise MDM category.

How CDQ Business Partner Lookup Works

Watch this video to get a feeling for how easily you can maintain your customer and vendor master data: fast and without duplicates!

See how the web app works and how it functions when integrated into an SAP system. Find out which successful companies already rely on CDQ Data Management services.

Data Management Services

Whether data quality analytics, address validation, deduplication or bank account verification: our innovative data management services help you enhance the quality of your vendor and customer master data.

Data Maintenance Cost Reduction Calculator

Benefit now from the Data Shareconomy and reduce your IT costs! Quickly and easily calculate the potential cost reduction for your company from using our data management services.

Success Story: Data Management at Schaeffler Group

In an article recently published in the specialist magazine "Big Data Insider", Markus Rahm provides insights into data management at Schaeffler and how it has evolved with the help of CDQ.

Data Sharing Community

Better quality and less manual effort through data sharing: our Data Sharing Community helps companies improve their business partner data together, share the burden of data maintenance, and reliably protect all participants against invoice fraud.

Introducing DQR-1297 or The Good Data Quality Rule

Meet DQR-1297, a virtual employee on our international AI team of data quality rules. Its main task is to identify invalid legal forms of customer and vendor organizations in all data sets. The real challenge is remembering all valid legal forms around the globe.