Data is often compared to gold or oil, but it is arguably more similar to water: abundant, yet almost never in the form you would want it. It's usually salty or murky, often highly polluted, and all in all not ready for consumption. In that sense, owning data doesn't in itself hold any value. Extracting true value from data requires a great deal of processing. This is especially true in marketing, where data sources are plentiful and fragmented.
"To enable data-driven decision-making, i.e. to extract actionable insights from data is a key driver for growth in digital marketing."Darian Heede
Head of Data Engineering, MMT
Whether it's creating new campaign strategies, reporting, or understanding in which direction a market is moving: nearly every domain in digital marketing will touch on data at some stage. Enabling data-driven decision-making, i.e. extracting actionable insights from data, is a key driver for growth in digital marketing. Hence, the ability to manage this data efficiently is paramount.
Let's dive into some key principles on how to manage your data for advertising:
Before any data is touched, we need to specify expectations regarding the insights we want to generate, as this will help define the required data as well as the transformations that need to be applied to it. Goals can be general, like reducing internal costs for reporting tasks, or rather precise, like finding out whether a target group prefers certain social media formats.
A good approach is to look at a given decision process and analyze when and why decisions are made without data, and whether data could make those decisions more informed. All stakeholders relevant to the decision-making process need to be involved, especially those who will process and analyze the data. Data analysts and visualization experts bring extensive experience in defining which goals are valuable, which allows for a more comprehensive approach when specifying objectives.
Since advertising is a dynamic environment, revisiting the set goals on a regular basis and improving on them if the need arises is a must.
Based on the specified goals, the next step is to define the data requirements: the haves, wants, and needs. It is possible that some data requirements can already be met but the data is simply not accessible due to organizational silos. This is the best opportunity to break up those silos, share data sources with all stakeholders, and get everyone involved: for instance, IT, Sales, Marketing, and Key Accounts.
When defining the requirements for a data source the following questions need to be answered:
As the last question implies, it makes sense to revisit the defined goals and objectives based on the answers to these questions. External factors can always thwart well-intended objectives, for example when a necessary metric simply isn't available in third-party data.
The detailed implementation of a data warehouse depends on the given infrastructure and technology that is employed. Nevertheless, there are some general guidelines:
A lot can go wrong when doing complex transformations on data, especially when it comes to updating, joining, or aggregating. Oftentimes data issues remain hidden until a data consumer notices irregularities they can't explain. Finding the source of such issues can be very tedious and time-consuming. Hence, it's wise to be proactive. Ensuring data quality is multi-faceted and not a one-time project but a continual process that needs to be refined regularly.
The most important measure to implement is rigorous testing. This sounds obvious but can be hard to put into effect efficiently.
Separating data source testing into two parts can be helpful:
General tests for any data source:
Testing a field for uniqueness, testing that fields are non-null, and testing the structure of a table.
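These general checks can be sketched with a few lines of pandas. The function, table, and column names below are hypothetical placeholders, not part of any specific tool:

```python
# A minimal sketch of generic data-source tests using pandas.
# All names here (run_general_tests, the campaigns table) are illustrative.
import pandas as pd

def run_general_tests(df: pd.DataFrame, unique_field: str,
                      non_null_fields: list, expected_columns: list) -> dict:
    """Run generic quality checks that apply to any tabular data source."""
    return {
        # Uniqueness: no duplicate values in the key field
        "unique": df[unique_field].is_unique,
        # Non-null: every listed field is fully populated
        "non_null": all(df[f].notna().all() for f in non_null_fields),
        # Structure: the table has exactly the expected columns, in order
        "structure": list(df.columns) == expected_columns,
    }

# Small synthetic example table
campaigns = pd.DataFrame({
    "campaign_id": [1, 2, 3],
    "channel": ["social", "search", "display"],
    "spend": [120.0, 80.5, 45.0],
})

results = run_general_tests(
    campaigns,
    unique_field="campaign_id",
    non_null_fields=["channel", "spend"],
    expected_columns=["campaign_id", "channel", "spend"],
)
```

In practice such checks would run inside a testing framework or data pipeline rather than as ad-hoc functions, but the assertions themselves stay this simple.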
Data source specific tests:
Testing whether a time series contains dates for the current quarter, or checking that the IDs used as join keys further down the line are present.
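The two source-specific examples above can also be sketched in pandas. Again, the table and column names (daily_spend, campaign_dim) are hypothetical:

```python
# A minimal sketch of source-specific tests, assuming a time-series fact
# table joined to a dimension table on a shared ID; names are illustrative.
import pandas as pd

def covers_current_quarter(dates: pd.Series, today: pd.Timestamp) -> bool:
    """Check that the series contains at least one date in today's quarter."""
    return bool((dates.dt.to_period("Q") == today.to_period("Q")).any())

def join_keys_present(fact_ids: pd.Series, dim_ids: pd.Series) -> bool:
    """Check that every ID used in a downstream join exists in the dimension table."""
    return bool(fact_ids.isin(dim_ids).all())

# Synthetic example data
daily_spend = pd.DataFrame({
    "date": pd.to_datetime(["2024-04-01", "2024-05-15", "2024-06-30"]),
    "campaign_id": [1, 2, 2],
})
campaign_dim = pd.DataFrame({"campaign_id": [1, 2, 3]})

today = pd.Timestamp("2024-05-20")
ok_quarter = covers_current_quarter(daily_spend["date"], today)
ok_joins = join_keys_present(daily_spend["campaign_id"], campaign_dim["campaign_id"])
```

A failing join-key check here would surface a broken join before it silently drops rows downstream, which is exactly the kind of hidden issue described above.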
Testing can be made a lot easier by employing the right tools for the job. For instance, the schema and data testing capabilities in DBT are well worth a look. The timing of tests also matters: run them automatically on a regular schedule as well as whenever the data sources or transformations change. Testing is the main driver of confidence in data.
Having a concise and standardized naming convention can save the strained nerves of a database manager. It saves time searching for objects and makes the data model explicit, uncluttered, and understandable.
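A naming convention is only useful if it is actually enforced. As a sketch, a hypothetical layer-prefix convention (raw/stg/int/fct/dim followed by snake_case names, which is an assumption here, not a standard) can be checked mechanically:

```python
# A minimal sketch of enforcing a hypothetical table-naming convention.
# The prefix scheme (raw/stg/int/fct/dim) is an illustrative assumption.
import re

PATTERN = re.compile(r"^(raw|stg|int|fct|dim)_[a-z0-9_]+$")

def check_names(table_names):
    """Return the table names that violate the convention."""
    return [n for n in table_names if not PATTERN.match(n)]

violations = check_names([
    "stg_facebook_ads__campaigns",  # conforms
    "fct_daily_spend",              # conforms
    "DailySpendFinal",              # violates: no layer prefix, not snake_case
])
```

Running a check like this in a pipeline or pre-merge hook keeps the data model uncluttered without relying on reviewers to spot every stray name.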
It's also helpful to have a standardized workflow for all data-related processes and review the workflow on a regular basis. This helps when looking for the source of an issue. If the workflow is deterministic, identifying flaws is a lot simpler.
Documenting is often seen as a drag when it involves writing a long and detailed document that probably nobody will ever read. This doesn't need to be the case, though. Documentation can be a useful side effect of a well-implemented workflow. The key here is version control.
Modern version control software, if used correctly, builds documentation into the data management process. For instance, a change to a database leads to a merge request, which must be reviewed and approved by someone other than its creator. For the reviewer to understand what the merge request entails, the creator writes a short explainer of the implemented changes. This has two main benefits:
This is where the actionable insights come into play and all the previous sections come to fruition. A clean and concise data model will allow for easier modeling and more reliable analysis results. Data analysts and visualization experts won't waste their creativity by searching for data or worrying about data quality.
Data management is about effectiveness: optimizing processes to the point where you get the most bang for your buck.
Having a deterministic and concise workflow, adopting new technologies, not standing still, and integrating the newest tools and ideas will increase efficiency and allow organizations to focus on their main business goals while using their data capabilities to the fullest.
Just like making water potable, a whole lot of effort is required to get data to a state where actionable insights are extractable. It will pay off!