...and why it is central to the success of your data project
As I said in my previous post, getting your data sources into a single format is time-consuming and expensive. But that’s not the whole story. In fact, data unification gets exponentially more expensive as you add in new data sources.
At the risk of giving you flashbacks from Econ 101, let me illustrate with my favorite graph, which I created myself, thank you very much:
Have you ever heard of the Laffer Curve? Well, my friends, this is the Manoochehri Data Curve, if I may be so bold as to name it after myself.
Much like the Laffer Curve, I quickly jotted this down on the back of a napkin as you can see above. Unlike the Laffer Curve, the Manoochehri Data Curve has practical consequences for data-driven enterprises.
Let me explain.
The X axis represents time, and the Y axis represents the number of data sources you add to your current data project. As you can see from the two lines, every time you add a new data source, your costs skyrocket exponentially.
In fact, the cost of using that data goes up even higher than the value of the data. So, you probably can’t afford to integrate new data sources into your data knowledge base.
But, you also can’t afford not to. Because if you don’t integrate new data sources, you will miss out on mission-critical information.
Why does Manoochehri’s Data Curve hold true? In my experience there are three main culprits:
Wrong talent: it’s not that your team isn’t any good. It’s that they don’t have the right expertise.
Too expensive: unrealistic budgeting can sink a data project, quickly.
ROI not justified: as you can see from the graph, the increased value of the data does not keep pace with the increased cost. But you can't afford not to add new data sources either.
That’s quite a conundrum. And the challenge is probably worse than you realize: according to Gartner, 65% of data projects fail! (It’s actually more like 80%.)
So, how do you create a data unification project that becomes one of the 20% that succeed and justifies the ROI, even as you add new and exponentially more expensive data sources?
Check data quality every day. You need to consistently verify the quality of your data, and the only way to do that successfully is to have your engineering team build systems that audit and validate the data.
Be ready for scale. Your data will grow relentlessly. Don't be afraid to scale; put systems in place that impose no limits on your growth.
Carefully calculate your costs. Do you have the right engineering and product teams to deploy, monitor and adjust on an ongoing basis? And, can you afford to distract your teams from doing other work?
Be flexible. Monitor and react to changes. How quickly can you react to schema changes? For instance, if OpenX changes their schema and the data shifts, you need to adjust quickly.
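To make the auditing and flexibility points concrete, here is a minimal sketch of the kind of daily validation an engineering team might write: it flags schema drift (fields appearing or disappearing upstream) and type mismatches in a batch of records. The field names and sample records are hypothetical illustrations, not any particular vendor's format:

```python
# Minimal sketch of a daily data-quality audit: verify that incoming
# records match the schema we expect, and flag schema drift early.
# EXPECTED_SCHEMA and the sample batch below are hypothetical.

EXPECTED_SCHEMA = {"impression_id": str, "timestamp": str, "revenue": float}

def audit_records(records):
    """Return a list of human-readable problems found in a batch."""
    problems = []
    for i, record in enumerate(records):
        # Schema drift: fields added or removed upstream.
        extra = set(record) - set(EXPECTED_SCHEMA)
        missing = set(EXPECTED_SCHEMA) - set(record)
        if extra:
            problems.append(f"record {i}: unexpected fields {sorted(extra)}")
        if missing:
            problems.append(f"record {i}: missing fields {sorted(missing)}")
        # Type validation on the fields that are present.
        for field, expected_type in EXPECTED_SCHEMA.items():
            if field in record and not isinstance(record[field], expected_type):
                problems.append(
                    f"record {i}: {field} should be {expected_type.__name__}"
                )
    return problems

batch = [
    {"impression_id": "a1", "timestamp": "2020-01-01T00:00:00Z", "revenue": 0.42},
    {"impression_id": "a2", "timestamp": "2020-01-01T00:00:05Z",
     "revenue": "0.17", "placement": "sidebar"},  # drifted: wrong type, extra field
]
for issue in audit_records(batch):
    print(issue)
```

In practice you would run a check like this per source on a schedule and alert when the problem list is non-empty; the point is that catching drift the day it happens is far cheaper than discovering it weeks later in a broken report.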
Be mindful that you might have 50 different data sources to look through, and you have to stay on top of all of them constantly. This is a daunting challenge, and not one that is easily dealt with. But I have seen organizations face it over and over again. (This is one of the reasons I co-founded Switchboard.)
And as I said in my last post, timing is everything. When should you start tackling this project? There's an expression that comes to mind:
“The best time to get started is 5 years ago; the second best time is today.”
Again, by not getting started immediately, you are missing out on mission-critical insights that could save, or cost, you millions of dollars. I know this because I have seen it over and over again with clients such as Meredith and The Financial Times, not to mention my time launching Google BigQuery over 7 years ago. You really need to start thinking about this now.
Eat your heart out Arthur Laffer.