The growth in the amount of data we store is staggering. In the past two years alone, we have generated more data than in all of our previous history combined. In today's data-driven professional world, it is imperative to bring some data expertise to the market. Yet employees frequently feel overwhelmed by the sheer amount of data they are confronted with. This short guide will show you how to get started.
Before starting your analysis, you should formulate an objective. Too often, businesses start their research without a clear-cut goal and end up lost in a never-ending analysis. What question do you want to answer? Take your time and think about the aim of your project.
Based on your research question, you need to figure out which data sources to consider. There are plenty of ways to tap into data sources. In most cases, you can export a spreadsheet or a flat file such as a CSV. A more professional approach is to query an API (application programming interface), which lets external programs retrieve data automatically and on a regular basis.
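To make the two routes more concrete, here is a minimal Python sketch using only the standard library. The endpoint URL, the `start`/`end` parameter names and the JSON response shape are assumptions for illustration only; every platform's reporting API documents its own URL, parameters and authentication.

```python
import csv
import io
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint; a real platform documents its own URL, parameters and auth.
API_BASE = "https://api.example.com/v1/reports"


def build_report_url(base, start_date, end_date):
    """Encode the reporting period as query parameters (parameter names are assumed)."""
    query = urllib.parse.urlencode({"start": start_date, "end": end_date})
    return f"{base}?{query}"


def fetch_report(url, timeout=30):
    """Download a report, assuming the API answers with a JSON list of rows."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def read_flat_file(text):
    """Parse the contents of a CSV export into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


print(build_report_url(API_BASE, "2024-01-01", "2024-01-31"))
print(read_flat_file("ad,date,clicks\nA1,2024-01-01,10\n"))
```

The "regularly" part comes from scheduling: a job runner such as cron could call `fetch_report` once a day. Most real APIs also require an API key or token, which this sketch leaves out. Note that values read from a CSV arrive as strings and need converting before any calculation.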
Once you have compiled your data, it's time to take a closer look. You can group the columns into two classes: dimensions and metrics. Dimensions describe attributes of a certain unit (such as the ad or the date), while metrics are measurements for that unit (such as clicks or cost). Among the dimensions, one combination marks the lowest aggregation level. For example, a row could display the performance of an ad on a given day. The same ad could be part of an ad group, which itself is embedded in an advertiser campaign, and so on. The lowest aggregation level in this case is the combination of ad and date. Knowing the aggregation level is important when data sources are joined together or need to be summarized, and it also helps you understand what each row of the table represents.
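The idea of an aggregation level can be checked directly. The sketch below uses a few invented rows (all campaign, ad group, ad and date values are made up) to confirm that ad and date together identify each row exactly once, and then sums the metrics up to the coarser campaign and ad-group levels.

```python
from collections import Counter, defaultdict

# Invented example rows: campaign, ad_group, ad and date are dimensions;
# clicks and cost are metrics.
ROWS = [
    {"campaign": "C1", "ad_group": "G1", "ad": "A1", "date": "2024-01-01", "clicks": 10, "cost": 5.0},
    {"campaign": "C1", "ad_group": "G1", "ad": "A1", "date": "2024-01-02", "clicks": 12, "cost": 6.0},
    {"campaign": "C1", "ad_group": "G1", "ad": "A2", "date": "2024-01-01", "clicks": 4, "cost": 2.5},
    {"campaign": "C1", "ad_group": "G2", "ad": "A3", "date": "2024-01-01", "clicks": 7, "cost": 3.5},
]


def is_unique_key(rows, key_columns):
    """True if every combination of the key columns occurs at most once."""
    counts = Counter(tuple(row[c] for c in key_columns) for row in rows)
    return all(n == 1 for n in counts.values())


def roll_up(rows, group_columns, metric_columns):
    """Sum additive metrics up to a coarser aggregation level."""
    totals = defaultdict(lambda: dict.fromkeys(metric_columns, 0))
    for row in rows:
        key = tuple(row[c] for c in group_columns)
        for metric in metric_columns:
            totals[key][metric] += row[metric]
    return dict(totals)


print(is_unique_key(ROWS, ["ad", "date"]))  # True: ad + date is the lowest level
print(is_unique_key(ROWS, ["ad"]))          # False: the same ad appears on two days
print(roll_up(ROWS, ["campaign"], ["clicks", "cost"]))
# {('C1',): {'clicks': 33, 'cost': 17.0}}
```

Summing only works for additive metrics such as clicks and cost. A ratio like cost per click should be recomputed from the summed totals at the new level rather than summed or averaged row by row.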
Before starting any kind of analysis, it is essential to clean and transform your data. Inconsistent entries or duplicate data points could bias your results. Data cleaning is a field of research in its own right, but a few examples show why this step matters.
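Duplicates are the simplest case to handle. As a minimal sketch with invented rows, the function below keeps the first occurrence of every row whose values are all identical, for instance when the same day was exported twice. It is only one possible approach: it will not catch two rows for the same ad and date that carry different numbers, which need a closer look before you decide which one to keep.

```python
def drop_exact_duplicates(rows):
    """Keep the first occurrence of each row whose values are all identical."""
    seen = set()
    unique = []
    for row in rows:
        # Sorting the items makes the fingerprint independent of key order.
        fingerprint = tuple(sorted(row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(row)
    return unique


rows = [
    {"ad": "A1", "date": "2024-01-01", "clicks": 10},
    {"ad": "A1", "date": "2024-01-01", "clicks": 10},  # exported twice
    {"ad": "A2", "date": "2024-01-01", "clicks": 4},
]
print(len(drop_exact_duplicates(rows)))  # 2
```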
While reviewing a categorical variable named “campaign target,” you find that some of the entered values do not follow the specified naming convention. To correctly interpret the performance of your branding campaigns, for example, you will need to correct those entries.
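One way to enforce a naming convention is to normalize each value and map known variants to a canonical label. In this sketch, the allowed targets and the list of variants are hypothetical; in practice you would build the mapping from the values you actually find. Anything that still does not match comes back as `None`, so it can be reviewed by hand rather than silently guessed.

```python
import re

# Hypothetical naming convention: the allowed campaign targets.
CANONICAL_TARGETS = {"branding", "performance", "retargeting"}

# Hypothetical variants found in the data, mapped to their canonical label.
ALIASES = {
    "brand": "branding",
    "brand awareness": "branding",
    "perf": "performance",
}


def normalize_target(value):
    """Return the canonical label, or None if the value needs manual review."""
    cleaned = re.sub(r"\s+", " ", value.strip().lower())  # trim, lowercase, collapse spaces
    cleaned = ALIASES.get(cleaned, cleaned)
    return cleaned if cleaned in CANONICAL_TARGETS else None


raw = ["Branding", " brand awareness", "PERF", "Retargeting ", "brnading"]
print([normalize_target(v) for v in raw])
# ['branding', 'branding', 'performance', 'retargeting', None]
```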
In another scenario, you find that a numeric variable contains outliers. To evaluate the typical behavior of the variable, you might want to replace the extreme values (for example, by capping them at a threshold) or leave them out of the analysis.
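The text does not prescribe a rule for what counts as an outlier. The sketch below uses one common convention, Tukey's fences: values more than 1.5 interquartile ranges below the first quartile or above the third quartile are flagged. It shows both options from the paragraph, leaving the values out or capping them at the fence. The session counts are invented, and the factor `k=1.5` is a conventional default rather than a rule.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Lower and upper fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(values, k=1.5):
    """Leave extreme values out of the analysis."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if low <= v <= high]


def cap_outliers(values, k=1.5):
    """Replace extreme values with the nearest fence instead of dropping them."""
    low, high = iqr_bounds(values, k)
    return [min(max(v, low), high) for v in values]


sessions = [10, 12, 11, 13, 12, 11, 250]  # invented daily sessions with one spike
print(iqr_bounds(sessions))     # (8.75, 14.75)
print(drop_outliers(sessions))  # [10, 12, 11, 13, 12, 11]
print(cap_outliers(sessions))   # [10, 12, 11, 13, 12, 11, 14.75]
```

Whether to drop or cap depends on the question: dropping describes the typical day, while capping keeps the row count intact for joins and totals.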
You might have heard the saying “garbage in, garbage out.” If you do not take this step seriously, you put the success of the whole project at risk.
To get a quick summary, it is a good idea to look at the maximum, minimum, mean and standard deviation of your KPIs. In most data-science-focused languages, a single function does this: in Python's pandas library, for example, DataFrame.describe() reports all four. In MS Excel, you can use a pivot table to carry out these calculations. Either way, you get an idea of how the KPIs are distributed in the dataset.
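If you are not using pandas, the same four figures take only a few lines with Python's built-in `statistics` module. The KPI names and values below are invented for illustration. The sketch uses the sample standard deviation, which is also what pandas reports by default.

```python
import statistics


def summarize(values):
    """Min, max, mean and sample standard deviation of one KPI."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1)
    }


# Invented daily click counts for two ads.
kpis = {
    "clicks_ad_a": [2, 4, 4, 4, 5, 5, 7, 9],
    "clicks_ad_b": [10, 10, 10, 10],
}
for name, values in kpis.items():
    print(name, summarize(values))
```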
Another, more visual approach is plotting graphs. You might want to show the relationship between a target variable, say revenue, and advertising expenses in a scatter plot. Or you might want to group the ages of your website visitors into brackets and display them in a histogram. In essence, choose a chart type that suits your narrative and leave out anything that would distract the viewer. The key criterion here should be comprehensibility. Once the graph is boiled down to its key message, you can think about adding features. For example, an industry benchmark for CPC (cost per click) could give you an insightful perspective on your campaign's cost per click. But remember that simplicity is key.
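The grouping step behind an age histogram can be sketched in plain Python. The age brackets and the visitor ages here are assumptions for illustration; choose brackets that match your audience reporting. A text preview is printed so the sketch runs without a plotting library.

```python
# Assumed age brackets: (label, lower bound inclusive, upper bound exclusive).
AGE_BINS = [
    ("18-24", 18, 25),
    ("25-34", 25, 35),
    ("35-44", 35, 45),
    ("45-54", 45, 55),
    ("55-64", 55, 65),
    ("65+", 65, 150),
]


def age_histogram(ages):
    """Count visitors per age bracket; ages outside every bracket are ignored."""
    counts = {label: 0 for label, _, _ in AGE_BINS}
    for age in ages:
        for label, low, high in AGE_BINS:
            if low <= age < high:
                counts[label] += 1
                break
    return counts


def print_text_histogram(counts):
    """Quick console preview: one '#' per visitor."""
    for label, count in counts.items():
        print(f"{label:>5} | {'#' * count}")


visitor_ages = [19, 23, 27, 31, 33, 40, 52, 70, 16]  # invented sample
print_text_histogram(age_histogram(visitor_ages))
```

If matplotlib is available, `plt.bar(list(counts), list(counts.values()))` draws the same counts as a bar chart. Keeping empty brackets in the result keeps the x-axis complete and readable.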
The demand for sophisticated techniques is on the rise. Buzzwords like big data, data warehouse and machine learning are everywhere. If you need help implementing the right data strategy, don’t hesitate to contact the MMT team. We are a strong partner that can help you tackle the challenges of your data-driven business.