Data quality: Why "dirty data" can foul your marketing efforts


What is data quality?

The importance of data for the success of an organization's advertising and marketing efforts is widely accepted. Data is seen as a valuable resource and can become an advantage in a highly competitive market. Dirty data, on the other hand, can hinder the success of data-driven marketing initiatives.

In a 2019 study by Experian, the authors found that 95% of respondents see impacts from low data quality in their organization.

But what exactly determines data quality (DQ)?

Lee et al. (2006) define data quality as a measure of the condition of data along the following dimensions:

  • Free of error: This dimension reflects the extent to which the data is accurate.
  • Completeness: The degree to which the entities, attributes, or values are complete in a data set.
  • Consistency: All instances of a data item in a data set or across databases should be the same.
  • Believability: The extent to which the data is regarded as true and believable.
  • Appropriate amount of data: The amount of data at hand should be neither too little nor too much. 
  • Timeliness: Reflects how up-to-date the data is in respect of the task for which it is intended.
  • Accessibility: This dimension reflects the ease of attainability of the data.
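Some of these dimensions can be measured directly. The sketch below scores completeness and consistency on a small record set; the field names, records, and scoring formulas are illustrative assumptions, not part of any standard.

```python
# Illustrative sketch: scoring two of the dimensions above on a small,
# made-up record set of campaign data.
records = [
    {"campaign_id": "C-001", "spend": 1200.0, "channel": "tv"},
    {"campaign_id": "C-002", "spend": None,   "channel": "online"},  # missing value
    {"campaign_id": "C-001", "spend": 1200.0, "channel": "tv"},      # duplicate key
]

fields = ["campaign_id", "spend", "channel"]

def completeness(rows, field_names):
    """Share of non-missing values across all fields (completeness dimension)."""
    total = len(rows) * len(field_names)
    filled = sum(1 for r in rows for f in field_names if r.get(f) is not None)
    return filled / total

def consistency(rows, key="campaign_id"):
    """Share of unique keys -- duplicate entries lower the score."""
    keys = [r[key] for r in rows]
    return len(set(keys)) / len(keys)

print(f"completeness: {completeness(records, fields):.2f}")  # 0.89 (8 of 9 values filled)
print(f"consistency:  {consistency(records):.2f}")           # 0.67 (one duplicate key)
```

In practice such scores would be tracked per table and over time, so that a sudden drop flags a data quality incident.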

The assessment of data quality depends on the data requirements and the purpose for which the data is to be used. Thus, the same standard of data quality can be sufficient in one case but not in another. For example, invoicing data for advertising campaigns must satisfy the criteria above to a very high degree, while there might be a higher tolerance for errors in third-party data such as Nielsen Ad Intel.

The Problem of Dirty Data

All too often, data quality management has no operational priority, and consequently data quality is unknown. When data is dirty, the true picture is distorted and the probability of costly decisions increases. You have probably heard of the "garbage in, garbage out" principle: if you feed a system inferior data, it will likely produce inferior output. The negative effects of poor decision-making on your marketing efforts are manifold. Inefficient targeting might waste budget on scattered reach and thus lower profitability. Unpersonalized content could alienate consumers and thereby damage your customer relationships. Errors in invoicing and biased reports can undermine trust and credibility in your company. Dirty data also increases the expense of data cleansing: time spent resolving these issues occupies resources that cannot be used elsewhere.

All in all, poor data quality can put your enterprise at a competitive disadvantage. If competitors' products and services are significantly better in the long run, the success of your business might be jeopardized.

Why high data quality might become harder to achieve

As more and more firms process data in the terabytes, they face data quality problems in the context of big data. In a paper published in 2015, Cai and Zhu elaborate on these effects. They note that the diversity of data sources brings about a variety of data types and complex data structures, which further complicates data integration. As the term big data already suggests, it is becoming increasingly difficult to assess the DQ of ever-growing amounts of data in a given amount of time. Furthermore, they argue that data is changing at a very fast pace, placing more sophisticated requirements on data processing technology.

Increasing requirements for data-related roles, paired with rising demand for qualified candidates, have led to a skill gap. In a recent Experian study, the authors found that 87% of respondents see difficulties in hiring for data-related roles in their companies.

How to improve and sustain data quality?

All activities related to analyzing, improving, and assuring data quality can be summarized under the term data quality management (DQM).

In general, a distinction is made between preventive and reactive measures. The former aim to avoid errors that would degrade data quality, while the latter try to detect and resolve existing data quality problems. As a rule, the goal should be to keep data deficiencies from entering the data warehouse in the first place.
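The distinction can be sketched in a few lines of code. The validation rule, field names, and repair strategy below are invented for illustration; real pipelines would draw such rules from documented data requirements.

```python
# Hypothetical incoming records; assume the requirement that "spend"
# must be a non-negative number.
incoming = [
    {"campaign_id": "C-010", "spend": 500.0},
    {"campaign_id": "C-011", "spend": -50.0},   # invalid: negative spend
]

def is_valid(record):
    spend = record.get("spend")
    return isinstance(spend, (int, float)) and spend >= 0

# Preventive: reject deficient records before they enter the warehouse.
accepted = [r for r in incoming if is_valid(r)]
rejected = [r for r in incoming if not is_valid(r)]

# Reactive: repair records that already slipped through, e.g. by
# clamping negative spend to zero (one possible, lossy repair policy).
repaired = [dict(r, spend=max(r["spend"], 0.0)) for r in rejected]

print(len(accepted), len(repaired))  # 1 1
```

Note that the preventive branch preserves the information that a record was bad, while the reactive repair silently alters values, which is one reason prevention is usually preferred.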

But dirty data is not the only thing associated with costs: DQM measures also consume valuable resources. Otto and Österle presented an economic interpretation of the optimal level of data quality. They state that the cost of dirty data falls as data quality rises, while the marginal cost of data quality measures rises with higher data quality.
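This trade-off can be made concrete with a stylized cost model. The quadratic cost functions and their coefficients below are illustrative assumptions, not taken from Otto and Österle.

```python
# Stylized cost model: data quality q runs from 0 (all dirty) to 1 (perfect).
def dirty_data_cost(q):
    """Cost of dirty data: falls as quality rises (illustrative form)."""
    return 100.0 * (1.0 - q) ** 2

def measure_cost(q):
    """Cost of DQM measures: rises convexly with quality (illustrative form)."""
    return 100.0 * q ** 2

def total_cost(q):
    return dirty_data_cost(q) + measure_cost(q)

# Scan quality levels on a grid and pick the one minimizing total cost.
levels = [i / 100 for i in range(101)]
optimum = min(levels, key=total_cost)
print(optimum)  # 0.5 -- the minimum lies strictly below perfect quality
```

With these symmetric curves the optimum sits at q = 0.5; with other cost shapes it would shift, but as long as measure costs rise steeply near perfection, the minimum stays short of q = 1.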

The optimal level of data quality is therefore not the complete absence of dirty data but the minimum of the total cost curve. Data quality management should thus use a cost-optimal combination of reactive and preventive measures. As these suggestions are rather theoretical, I'd like to provide you with a couple of hands-on tips:

  1. If this is not yet the case, bundle data quality competencies in a competence center. They can be concentrated in a business intelligence or data engineering team; in larger organizations there are entire data governance departments. This will raise the level of DQ and establish clear responsibilities for DQM measures.
  2. Gathering data requirements plays another important role. This may sound trivial at first, but it actually demands a deep understanding of the states and scenarios that can become relevant. Oftentimes the expert knowledge of a specialized department is helpful in formulating business rules that can be used to test data quality.
  3. Oftentimes dirty data comes from other sources or departments. Since the quality of that data is unknown, it should be examined thoroughly.
  4. If DQ issues do not arise outside the team, they originate from the inside. Duplicates after joining data sources or missing data due to unharmonized job scheduling are typical examples of what can go wrong internally. Careful design of data pipelines in every step of the process, from data integration, data processing, and testing to job orchestration, is crucial.
  5. The most common source of dirty data is human error, whether it is employees making typos or failing to provide required information. Take measures to avoid or restrict manual interaction and try to automate processes.
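Tip 4 is worth illustrating: a join on a key that is unexpectedly non-unique silently multiplies rows, and only an explicit post-join test catches it. The tables and field names below are made up for the example.

```python
# Illustrative sketch of tip 4: an accidental duplicate on the join key
# inflates the join result, and a uniqueness test after the join flags it.
campaigns = [
    {"campaign_id": "C-001", "channel": "tv"},
]
bookings = [
    {"campaign_id": "C-001", "booking_id": "B-1"},
    {"campaign_id": "C-001", "booking_id": "B-1"},  # accidental duplicate row
]

# Naive inner join on campaign_id.
joined = [
    {**c, **b}
    for c in campaigns
    for b in bookings
    if c["campaign_id"] == b["campaign_id"]
]

# Post-join test: booking_id should be unique in the result.
ids = [row["booking_id"] for row in joined]
ids_unique = len(ids) == len(set(ids))
print(ids_unique)  # False -- the pipeline should fail loudly at this point

# One possible reactive fix: deduplicate on booking_id.
deduped = list({row["booking_id"]: row for row in joined}.values())
```

Embedding such assertions in every pipeline step turns silent data corruption into a visible, testable failure.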

Conclusion

Dirty data is a serious threat to brand success. Recent developments such as the aforementioned skill gap and big data are making it even harder to manage data quality. Nonetheless, you need to take action and start an initiative against dirty data in your organization. High DQ requires a high-performing competence center that is able to implement carefully designed data pipelines. Furthermore, it is advisable to reduce human error to a minimum and automate processes wherever possible. By following these tips you will be able to increase and sustain DQ in your organization. While this can take some effort, it can be extremely beneficial for your brand.

If you feel overwhelmed by the duties of data quality management or you fear the consequences of dirty data, then reach out to us!

We look forward to supporting you in becoming a data-driven marketing enterprise.



© Copyright 2021 | Mercury Media Technology GmbH