Open data sources can improve the quality of business partner data
External data sources can be used for curating and validating business partner and address data.
- Curation: Covers standardization, enrichment, cleansing, translation, and geo-coding of addresses.
This functionality is key to cross-corporate data management because it handles different languages, abbreviations, address formats, etc.
- Data validation: Checks given business partner and address data for defects according to defined data requirements (for example, those of the CDQ Data Sharing Community).
Typical Open Data Sources for Company Data
Company data is mainly retrieved from two different kinds of sources:
- Data sources that explicitly provide information about companies, e.g., "UK Companies House" or the "Japanese National Tax Agency"
- Data sources that only provide partial information related to a business partner, such as "GeoNames", "OpenStreetMap" or the Australian business register, which provides company details and address data but lacks street information
After determining which external data source you want to use, you need to take the following steps:
Data Mapping for Using External Reference Data
Start by mapping the data fields. Simply put, for each data field you want to use, you must define its counterpart in your database. For example, UK Companies House provides address information in semantically ambiguous data fields such as "AddressLine1", which require mapping and transformation into your data model.
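To illustrate the mapping step, here is a minimal sketch of a field map applied to one external record. Only "AddressLine1" is taken from the text. All other external field names, the internal field names and the function `map_record` are hypothetical and stand in for whatever your source and data model actually use.

```python
# Hypothetical mapping from external field names to internal field names.
# Only "AddressLine1" is named in the article; the rest are assumptions.
FIELD_MAP = {
    "CompanyName": "name",
    "AddressLine1": "address_line",
    "PostTown": "locality",
    "PostCode": "postal_code",
}


def map_record(external: dict) -> dict:
    """Rename known external fields and collect unmapped ones for review."""
    mapped = {}
    unmapped = {}
    for key, value in external.items():
        target = FIELD_MAP.get(key)
        if target is None:
            # Keep unknown fields visible instead of silently dropping them.
            unmapped[key] = value
        else:
            mapped[target] = value.strip() if isinstance(value, str) else value
    return {"mapped": mapped, "unmapped": unmapped}
```

Keeping the unmapped fields in the result makes it easier to notice when a source introduces a field that your mapping does not yet cover.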
Data Curation to Use External Data Sources for your Business Partner Data
After mapping the data, you need to curate it. For example, in the data from Zefix (Central Business Name Index: https://www.zefix.ch/), the "address" field contains a mix of information: the street name (in our database, this is "thoroughfare value") and the house number (in our database, this is "thoroughfare number").
Some sources combine even more information in a single data field, as the address field in the European Commission’s VAT database (accessed via VIES: http://ec.europa.eu/taxation_customs/vies) shows.
Of course, this can be solved. In our case, our software developer wrote code that splits such fields, so that our services provide the required information in a semantically unambiguous and structured representation.
1a) Example of mapping the external data source Zefix to the CDQ Business Partner Lookup: In the reference data from Zefix, the "address" field contains a mix of information. The street name and house number are stored in one data field.
1b) Example of mapping the external data source Zefix to the CDQ Business Partner Lookup: The information "Lukasstrasse 4" is now mapped to street (data field: "thoroughfare value") and street number (data field: "thoroughfare number").
2a) Example of mapping the external data source VIES to the CDQ Business Partner Lookup: The "address" data field combines street name, street number, locality and postal code.
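The Zefix case from examples 1a and 1b can be sketched with a small regular expression. This is not the code used in our services, only an illustration under simple assumptions: the street name comes first, the house number comes last, and the number is digits with an optional letter suffix. The function name `split_thoroughfare` and the snake_case keys are hypothetical. Addresses that put the number first, or combined fields such as the VIES address, need different rules.

```python
import re

# Assumption: "<street name> <house number>", number = digits plus optional letter.
_STREET_RE = re.compile(r"^(?P<street>.+?)\s+(?P<number>\d+[A-Za-z]?)$")


def split_thoroughfare(address: str) -> dict:
    """Split a combined street field into street name and house number."""
    text = " ".join(address.split())  # collapse repeated whitespace
    match = _STREET_RE.match(text)
    if match is None:
        # No trailing house number found: keep the whole text as the street.
        return {"thoroughfare_value": text, "thoroughfare_number": None}
    return {
        "thoroughfare_value": match.group("street"),
        "thoroughfare_number": match.group("number"),
    }
```

For the value from example 1b, `split_thoroughfare("Lukasstrasse 4")` returns `{"thoroughfare_value": "Lukasstrasse", "thoroughfare_number": "4"}`.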
Integration of Reference Data
First, you must research how often each data source is updated, because update cycles differ considerably. For example, the “Global Legal Entity Identifier Foundation” updates its data daily, the Belgian company register updates its data weekly and the company register in Norway is updated monthly.
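Once the update cycles are known, they can be recorded and used to decide when a source should be checked again. The sketch below uses the three cycles mentioned above. The dictionary keys, the function `is_check_due` and the approximation of "monthly" as 30 days are assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Update cycles as described in the text; "monthly" is approximated as 30 days.
UPDATE_INTERVALS = {
    "GLEIF": timedelta(days=1),
    "Belgian company register": timedelta(weeks=1),
    "Norwegian company register": timedelta(days=30),
}


def is_check_due(source: str, last_checked: datetime, now: datetime) -> bool:
    """Return True if at least one update cycle has passed since the last check."""
    return now - last_checked >= UPDATE_INTERVALS[source]
```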
Then, you must find a technical solution for integrating this data. For example, we do not use an API connection for most of the data sources. Instead, we have programmed bots that monitor for updated dump files (usually provided in formats such as CSV, JSON or Excel).
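One simple way to detect an updated dump file is to compare a content hash with the hash stored from the previous run. The following sketch assumes the dump has already been downloaded to a local path. It is not a description of our bots, and the function names and the state file are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dump_has_changed(dump_path: Path, state_path: Path) -> bool:
    """Compare the dump's digest with the stored one and update the stored value."""
    new_digest = file_digest(dump_path)
    old_digest = state_path.read_text().strip() if state_path.exists() else None
    if new_digest == old_digest:
        return False
    state_path.write_text(new_digest)
    return True
```

A scheduled job could call `dump_has_changed` after each download and start the import only when it returns `True`.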
Our expert team programs the bots to ensure that we get the new data right after it is published, so that we can always provide the latest data to the users of our DQaaS services.
Maintenance of External Data Sources
Using open data sources brings many benefits. Making this data usable, however, requires considerable effort. In addition, these integrations need continuous maintenance. We monitor changes to the data as well as to the data models and implement these changes accordingly.
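Changes to a source's data model can be caught early by comparing the columns of each new dump against the columns your mapping expects. The sketch below does this for a CSV header. The column names and the function `detect_schema_drift` are hypothetical examples, not the checks we run in production.

```python
import csv
import io

# Hypothetical set of columns that the existing mapping relies on.
EXPECTED_COLUMNS = {"name", "address", "postal_code"}


def detect_schema_drift(csv_text: str, expected: set) -> dict:
    """Report columns that disappeared from or were added to a CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields an empty header
    actual = {column.strip() for column in header}
    return {
        "missing": sorted(expected - actual),
        "unexpected": sorted(actual - expected),
    }
```

A non-empty "missing" list signals that the mapping needs to be revised before the new dump is imported.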