Dealing with international data: how to clean global databases

A clear data hygiene strategy is vital if your business is operating on a global scale, says Anna Kayfitz – but it’s not an easy task to monitor and maintain international data and to standardise fields and formats. Here are her tried and tested tactics for optimising data management and keeping globalised data pristine.

Today’s businesses are global. They ship their products across the ocean, manage offices in different parts of the world and store international data in one place. The challenge for marketers is not only to be relevant both in Spain and in China, but to keep track of international data. There are many data quality issues that may arise from having globalised data.

Some issues include:

  • Different Address Formats – In some countries, the house number is placed in front of the street name, while in others it is written after the street name. This makes it very difficult to standardise the data or to check whether an address is correct.
  • Zip Codes – Some countries have zip codes, others use postal codes and some have no code at all, including the Bahamas, Cook Islands, Gabon, Hong Kong, Mali and Qatar, to name a few. This makes it tricky to make a zip or postal code a mandatory field, or to include the field when calculating data completeness.
  • Phone Numbers – With country codes, area codes and, sometimes, mobile codes, it can be very hard to keep phone numbers consistent. For example, a UK number can look something like 01-44-9244-444222, +44 9244-444222 or 9244-444-222. Such variety makes it all but impossible to use this field to identify duplicates, to drive an auto-dialler or to maintain consistency in the CRM.
  • Languages – On top of the address and phone number issues, there is the challenge of working with information in different languages. Information in non-Latin characters creates even more data challenges. Sometimes text is written from right to left, as in Hebrew or Arabic. Japanese, Chinese and Korean are written in their own scripts. Even languages that use Latin letters, such as Spanish or French, can cause problems with accents.
  • Currency – Amounts can be recorded in US dollars, UK pounds, yen, euros and so on. How do you convert them into the currency you are using? On what date was the data collected? Which exchange rate should you use? Each of these questions can cause problems for your data.
  • Privacy Policies – Finally, privacy policies differ around the world. Some jurisdictions require double opt-in, others require no opt-in at all. This makes it tricky to keep track of who is affected by which policy. You need to understand how this affects email marketing, especially if you have no data on the country where the person lives.
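To make the phone number point concrete, here is a minimal sketch of reducing different spellings of a number to one comparable form. The `CALLING_CODES` table and the `normalise_phone` function are hypothetical names invented for this illustration, and the rules (treat `00` as an international prefix, drop one leading `0` from a national number) are simplifying assumptions that do not hold in every country.

```python
import re
from typing import Optional

# Hypothetical lookup of country calling codes; only a few entries for illustration.
CALLING_CODES = {"GB": "44", "US": "1", "ES": "34"}


def normalise_phone(raw: str, country: str) -> Optional[str]:
    """Return '+<calling code><national number>', or None if it cannot be built."""
    code = CALLING_CODES.get(country)
    if code is None:
        return None  # unknown country: leave the record for manual review

    has_plus = raw.strip().startswith("+")
    digits = re.sub(r"\D", "", raw)  # keep digits only
    if not digits:
        return None

    if has_plus:
        # Already written in international form.
        return "+" + digits
    if digits.startswith("00"):
        # Assumption: '00' is the international access prefix.
        return "+" + digits[2:]

    # Assumption: otherwise it is a national number with an optional trunk '0'.
    national = digits[1:] if digits.startswith("0") else digits
    return "+" + code + national
```

With these rules, `+44 9244-444222` and `9244-444-222` on a UK record both reduce to the same string, so the field becomes usable as a duplicate key. Less regular spellings, such as the `01-44-` form above, would need extra rules or a dedicated phone number parsing library.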

Other issues include the inability to distinguish First Name from Last Name, and long Last Names that do not fit your fields, so people's names are cut off (this is quite typical with Indian names or Latin American names, where two last names are common). These are just some of the most widespread problems that marketers deal with on a daily basis.
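The postal code problem described above can be handled by making the field conditional on the country. The sketch below is one possible approach, not a standard: the field names, the `completeness` function and the choice of required fields are assumptions for illustration, and the country set contains only the examples named in this article (as ISO 3166-1 alpha-2 codes).

```python
# Countries named in this article as having no postal code (ISO 3166-1 alpha-2).
NO_POSTAL_CODE = {"BS", "CK", "GA", "HK", "ML", "QA"}

# Hypothetical list of fields that count towards completeness.
REQUIRED_FIELDS = ["first_name", "last_name", "email", "country", "postal_code"]


def completeness(record: dict) -> float:
    """Share of required fields that are filled, skipping postal code where none exists."""
    required = list(REQUIRED_FIELDS)
    if record.get("country") in NO_POSTAL_CODE:
        required.remove("postal_code")
    # A field counts as filled if it holds a non-blank value.
    filled = sum(1 for field in required if str(record.get(field) or "").strip())
    return filled / len(required)
```

A Hong Kong record without a postal code can then score as fully complete, while a UK record missing the same field is still penalised.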

International data: hygiene strategy

So how do you clean your database with all those issues present? Start by identifying the rules that govern your database. For example, all currency is recorded in euros and is converted on the day the record is created, or when a specific field is modified. Another rule may be that all data is written in English. In that case, if the data is not in English, you use a translator or a data cleaning service provider to help you convert it into English.
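As an illustration of the currency rule, here is a minimal sketch that converts an amount to euros using the rate for the day the record was created. The `RATES_TO_EUR` table, its made-up values and the `to_euros` function are all hypothetical; in practice the rates would come from whichever source your business has agreed on.

```python
from datetime import date
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical rate table: (currency, date) -> euros per unit. Values are invented.
RATES_TO_EUR = {
    ("USD", date(2024, 1, 15)): Decimal("0.90"),
    ("GBP", date(2024, 1, 15)): Decimal("1.15"),
}


def to_euros(amount: Decimal, currency: str, created_on: date) -> Decimal:
    """Convert an amount to euros at the rate for the record's creation date."""
    if currency == "EUR":
        rate = Decimal("1")
    else:
        # Raises KeyError when no rate is stored, so gaps surface instead of being guessed.
        rate = RATES_TO_EUR[(currency, created_on)]
    return (amount * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

Using `Decimal` rather than floats keeps the rounding of monetary values predictable, and storing the conversion date alongside the amount answers the "which exchange rate?" question for every record.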

Once your rules are established, your most common data cleaning can begin, but with a twist:

  1. De-Duping – you can de-dupe your data using your own business rules, just as you would with a simpler database. The difference is that you may want to make sure all your data is written in English and de-dupe only within the same country.
  2. Data Standardisation – data should be consistent so it can be used for segmentation, personalisation and analysis. One example of standardisation that is a must for global marketers is the Country field. If you have USA, United States of America, US and U.S. in your system, it is very difficult to deploy an email to the United States with all those variations present.
  3. Data Enhancement/Appending – in the case of missing data, companies often go to third-party data providers to fill in missing information such as zip code or phone number, so they can deploy their marketing campaigns. This becomes very tricky with international data. It is rare to find a good-quality global data provider. For B2B marketers, D&B is a major reputable source in the marketplace. For B2C, it is a bit harder. Research is required to find a good provider in each region you operate in.
  4. Cleaning Historical and Bogus Data – before historical data can be addressed, the business should decide what counts as an outdated record. It can be a bounced email, a person who has not bought anything in the past year or two, and so on. One of the biggest issues is that spam laws differ around the world. Some countries allow you to communicate with a prospect or a customer for up to a year, others indefinitely. Keeping track of dates and retiring old records should be a part of everyday life for global marketers. But for those who have yet to get started, this should be step number one. Bogus data can be tricky to identify because of the various languages and the differences in addresses and phone numbers. These variations can make genuine data look bogus. A good example of this is ‘Hell’ in Norway. To an English-speaker, finding a record with the address ‘Hell’ would indicate a joke. But it is not a laughing matter; the town actually exists in Norway!
  5. Data Verification – verifying data such as an email can be a fairly easy task. However, verifying address and phone number can be difficult in most countries, while in others it is often impossible. My best suggestion here is to make sure you have a check-box field which tells the administrator or marketer if the address or phone number has been verified. Nobody can have 100% data verification when dealing with global data.
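Steps 1 and 2 above can be sketched together: standardise the Country field first, then de-dupe only within the same country. Everything named here (`COUNTRY_ALIASES`, `standardise_country`, `fold`, `dedupe`, and the choice of name plus email as the matching key) is a hypothetical illustration; your own business rules decide what counts as a duplicate.

```python
import unicodedata

# Hypothetical alias table mapping free-text country values to one standard code.
COUNTRY_ALIASES = {
    "usa": "US",
    "us": "US",
    "u.s.": "US",
    "united states": "US",
    "united states of america": "US",
}


def standardise_country(raw: str) -> str:
    """Map known variations to one code; keep unknown values as entered."""
    cleaned = raw.strip()
    return COUNTRY_ALIASES.get(cleaned.lower(), cleaned)


def fold(text: str) -> str:
    """Strip accents and case so 'José' and 'jose' compare as equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return without_accents.casefold().strip()


def dedupe(records: list) -> list:
    """Keep the first record for each (country, name, email) key."""
    seen = set()
    unique = []
    for record in records:
        country = standardise_country(record["country"])
        key = (country, fold(record["name"]), record["email"].strip().lower())
        if key in seen:
            continue  # duplicate within the same country
        seen.add(key)
        unique.append({**record, "country": country})
    return unique
```

Because the country is part of the key, two records with the same name and email but different countries are kept apart for manual review rather than merged automatically.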

When you are ready to clean your data, find a reputable data cleaning provider to help you navigate your international data.

Author: Anna Kayfitz
CEO/founder at StrategicDB Corporation

StrategicDB Corporation is an analytics and data cleansing company. Founder Anna Kayfitz has more than 10 years of marketing and analytics experience, with companies such as Oracle Marketing Cloud (previously known as Eloqua), Harlequin Enterprises, Sunwing Travel Group and a few start-ups. She also holds an MBA from a top business school in Canada. StrategicDB Corporation helps businesses get more from their data – its services include: segmentation modelling, dashboard building, market basket analysis, lifetime value analysis and more. Email:
