Garbage In Garbage Out at the petabyte scale is suffocating for business innovation
Editorial note: Many marketing leaders are understandably cautious about how best to spend their budget right now – if at all. But this is an opportunity for forward-looking teams to prepare today to get the most from a likely ML investment tomorrow. As Michael explains in this blog post, whatever method teams choose to extract these insights, data is where the journey begins. – Ju-kay
Machine Learning (ML)* has evolved from a science-project buzzword to a C-suite topic of interest. In a Red Hat report, technology leaders unsurprisingly cite it as a top consideration in 2020. Significantly, Accenture’s recent survey highlights the gap between the more than 80% of business leaders who say they must scale the technology to remain competitive and the 16% who have moved beyond the experimentation stage. Why is this?
A typical challenge is accessing production data, which is often locked in technical and/or organizational data silos. Without well-organized, accessible, and reliable data streams – or foundational data – ML projects can spiral into endless complexity, making them ineffective at best. What’s worse, poor data makes it hard to accurately measure business outcomes, which means it becomes all but impossible to justify the investment. No wonder the projects are still in the labs.
Let’s first take a look at some of the ways marketers can use ML, before reviewing best practices for data preparation.
A marketing case for ML
Fundamentally, ML is effective at repeating a process at scale: running a repetitive algorithm without the need to keep reprogramming it. Today, it’s likely your analyst team can still query your data sets reasonably well. You have a finite number of combinations (perhaps dozens or even hundreds) of different questions you need to pose to different audience demographics in order to effectively measure engagement. But the explosion of content and the corresponding data feeds is a harbinger of a data asset that becomes increasingly difficult to explore using traditional methods. And that’s where ML comes in.
Using ML, the marketing team can pose minor variations of the same question – for instance, “is this particular customer engaging with this particular content?” – thousands of times across different subsections of the audience, or different iterations of the advertising creative. As the team drills deeper into audience clusters, interesting and often unexpected trends start to emerge, which can help to inform the marketing strategy. For example, ML could be used to:
Determine best-performing ad units in granular detail (e.g. format or position on page)
Identify most-engaged audience segments – right down to a particular subsection of a particular demographic – by finding clusters you didn’t know you had, or even didn’t know you needed to search for
Prove (and expand on) a hunch about a particular subsection of an audience segment
Make more meaningful comparisons with historical data to make more accurate predictions for future audience clustering and targeting strategies
What ML brings the marketing team is the inexhaustible tenacity to continually reevaluate the queries – or “questions” – to help find insights in novel ways. This frees the team to make insightful decisions that can drive the business.
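To make the idea of "the same question, asked across many slices" concrete, here is a minimal Python sketch. It is plain counting rather than a trained model, and everything in it is an assumption for illustration: the event tuples, the segment and ad format labels, and the function names are all hypothetical. It simply asks "what share of impressions led to engagement?" for every combination of audience segment and ad format it finds, which is the repetitive step that ML tooling automates at far larger scale.

```python
from collections import defaultdict

# Hypothetical engagement events: (audience_segment, ad_format, engaged)
EVENTS = [
    ("18-24", "banner", True),
    ("18-24", "banner", False),
    ("18-24", "video", True),
    ("18-24", "video", False),
    ("25-34", "banner", False),
    ("25-34", "video", True),
    ("25-34", "video", True),
]


def engagement_rates(events):
    """Return {(segment, ad_format): engaged share} for every slice seen."""
    shown = defaultdict(int)
    engaged = defaultdict(int)
    for segment, ad_format, was_engaged in events:
        key = (segment, ad_format)
        shown[key] += 1
        if was_engaged:
            engaged[key] += 1
    return {key: engaged[key] / shown[key] for key in shown}


def best_slice(events):
    """Return the (segment, ad_format) slice with the highest engagement rate."""
    rates = engagement_rates(events)
    return max(rates, key=rates.get)
```

With only a handful of slices an analyst could do this by hand. The point of the sketch is that the question never changes, only the slice does, so the number of slices can grow into the thousands without anyone rewriting the query.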
Pre-ML data challenges
One of the greatest challenges our customers face is feeding ML algorithms with trustworthy data at the scale required to capitalize on the algorithm’s capability, and therefore achieve the desired ROI. This is because disparate data sets must be continually consolidated and monitored before they can be ingested, and before the machine can be trained to model patterns in the data.
At the point of implementing ML, many if not most businesses realize they don’t have what we call foundational data, because it is either fragmented or incomplete; they are unable to access enough of it; or they are unsure where it is stored. And this is before the reliability of the data is considered – do you trust it?
Even basic ML is impossible without first building foundational data. As volumes of data grow, aggregating it becomes harder and harder. Garbage In Garbage Out at the petabyte scale is suffocating for business innovation. The message: build the process, have the tools, and create trusted foundational data today.
Building an ML-ready data asset
Most businesses considering ML will have a combination of first-party and third-party data, from sources such as sales data, demographic data, advertising data, and/or content consumption data. These are all huge datasets containing up to billions of rows of data, which need to be aggregated so that the ML algorithm can access the data and carry out a meaningful analysis.
In addition to this, to build a model for ML you also need:
Historical data for each of the aforementioned data sources, to instruct the machine to analyze previous data, form an opinion, and apply it to current data. This allows for detailed month-on-month or year-on-year comparisons, which in turn enable predictive insight to more reliably inform future marketing strategy.
A continual, aggregated data stream to feed an ML algorithm without interruption as you continue to collect fresh data.
Building a foundational data asset can be done in two ways. Typically, enterprises first try a “build it ourselves” approach and launch a custom engineering project. A team of developers writes bespoke code, which allows you to apply your own business-specific rules to the data. An obvious challenge of custom code is that it needs a team to support it on an ongoing basis.
Alternatively, you can deploy a data unification platform. Using a solution that doesn’t require custom development – and yet still enables you to apply your own custom business rules – means you can be confident about your timelines and budget to create a strategic ML-ready data asset. The entire process, from mapping, normalizing, aggregating, and assembling the data to applying business logic and pulling missing and/or historical data, can be implemented automatically and in real time across billions of rows of data, in weeks instead of months or more.
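For readers who want to picture what mapping, normalizing, and aggregating involve, here is a small Python sketch. It is not how any particular platform works. The two sources, their field names, the date formats, and the unified schema are all hypothetical, chosen only to show why the same campaign can look like three different campaigns until the data is cleaned.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw rows from two sources with different field names and formats.
AD_ROWS = [
    {"Date": "2020-04-01", "Campaign": "Spring Sale ", "Spend_USD": "120.50"},
    {"Date": "2020-04-01", "Campaign": "spring sale", "Spend_USD": "79.50"},
]
SALES_ROWS = [
    {"day": "04/01/2020", "campaign_name": "SPRING SALE", "revenue": 900.0},
]

# Mapping: source field name -> unified field name (an assumed schema).
AD_MAP = {"Date": "date", "Campaign": "campaign", "Spend_USD": "spend"}
SALES_MAP = {"day": "date", "campaign_name": "campaign", "revenue": "revenue"}


def normalize(row, field_map, date_format):
    """Rename fields, then coerce dates, names, and amounts to one convention."""
    out = {field_map[name]: value for name, value in row.items()}
    out["date"] = datetime.strptime(out["date"], date_format).date().isoformat()
    out["campaign"] = out["campaign"].strip().lower()
    for metric in ("spend", "revenue"):
        if metric in out:
            out[metric] = float(out[metric])
    return out


def unify(ad_rows, sales_rows):
    """Aggregate both sources into one record per (date, campaign)."""
    totals = defaultdict(lambda: {"spend": 0.0, "revenue": 0.0})
    for row in ad_rows:
        clean = normalize(row, AD_MAP, "%Y-%m-%d")
        totals[(clean["date"], clean["campaign"])]["spend"] += clean["spend"]
    for row in sales_rows:
        clean = normalize(row, SALES_MAP, "%m/%d/%Y")
        totals[(clean["date"], clean["campaign"])]["revenue"] += clean["revenue"]
    return dict(totals)
```

Calling unify(AD_ROWS, SALES_ROWS) collapses the three raw rows into a single record for the campaign on that date, with spend and revenue side by side. The business rules here (trim and lowercase campaign names, one ISO date format) are the kind of custom logic either approach has to encode somewhere.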
Get to know your data
ML outcomes will only ever be as good as the data being ingested into the deployment model in the first place. Investing in ML technology is a major consideration for any business. But the lesson is clear – it’s crucial to take control of your data now. The benefit: you have foundational data that enables your team to derive real-time insights from your data today, and you are far more likely to realize the desired ROI on your ML strategy tomorrow.
If you want to discuss how “ML-ready” your data is, get in touch and we’d be happy to share our experiences.
*AI and ML are often used interchangeably, but here, we’re referring to initial ML projects that might one day lead to true AI implementations. -MM