A revolution is underway in data science, driven by the need to manage and derive value from the growing amount of data our world generates. We've been driving cars, using phones, and manufacturing products for decades. But now our mobile phones ping cell towers everywhere we go, sensors in industrial systems record information every second, and our cars report data back to manufacturers via telematics. This modern Internet of Things (IoT) landscape is creating new sources of big data.
Marketing has certainly embraced big data and put millions into promoting it, but big data is more than hype. Its value, however, is not fully recognizable in its raw form. Once properly organized and managed, this raw data (potential value) can provide very different insights into behaviors, trends, and exceptions (actual value).
The big data landscape is complicated and changing quickly. The patterns and best practices for making sense of big data, and deriving value from it as part of a larger enterprise data strategy, are in flux. One quickly evolving area is how to identify, organize, and manage the master data embedded in most big data sets. Understanding the interplay between master data and big data, and how to marry big data with other enterprise data sources, is key to converting it into a valuable information asset.
"Things" vs. "Activities"
At its simplest level, any question aimed at insight is some permutation of the generic question, "Which things are doing what activity?" For example:
- Which customers are buying?
- Which products are selling?
- Which oil wells are producing?
- Which clients are happy?
- Which pipelines are leaking?
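Each of the questions above can be read as a join between a small set of things and a larger set of activities that refer to them. The sketch below illustrates that shape. The `Thing` and `Activity` classes, the `which_things_are_doing` function, and all sample IDs and names are hypothetical and are not drawn from any particular product.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical master data: a small, slowly changing set of "things".
@dataclass(frozen=True)
class Thing:
    thing_id: str
    kind: str
    name: str

# Hypothetical activity data: a large, fast-growing stream of events,
# each referring to a thing by its ID.
@dataclass(frozen=True)
class Activity:
    thing_id: str
    action: str

def which_things_are_doing(action, things, activities):
    """Answer 'Which things are doing <action>?' with a count per thing name."""
    by_id = {t.thing_id: t for t in things}
    counts = Counter(a.thing_id for a in activities if a.action == action)
    # Activities that point at an unknown thing are silently dropped here.
    return {by_id[tid].name: n for tid, n in counts.items() if tid in by_id}

things = [
    Thing("C1", "customer", "Acme Corp"),
    Thing("C2", "customer", "Globex"),
    Thing("W1", "oil_well", "Well 7"),
]
activities = [
    Activity("C1", "buy"),
    Activity("C1", "buy"),
    Activity("C2", "browse"),
    Activity("W1", "produce"),
]

print(which_things_are_doing("buy", things, activities))  # {'Acme Corp': 2}
```

Note that the answer is only as good as the master data: an activity whose `thing_id` has no matching thing cannot be attributed to anyone.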
To answer these questions, you first have to solve two interrelated problems. The first is to comprehend and manage your understanding of things. Things include people, objects, places, businesses, and so on: in essence, your master data. The second is to capture and organize the activities of those things so that insights can be derived. Activities range from traditional sales transactions to Internet of Things readings and social media posts.
In the big data world, things and activities are often intermingled. The challenge is determining how to separate these two concepts, bring them under proper management, and then marry them back together.
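To make "separate, manage, then marry back together" concrete, here is a minimal sketch assuming a hypothetical point-of-sale feed in which customer details (a thing) are repeated inside every purchase (an activity). The field names, the `separate` and `marry` helpers, and the "first occurrence wins" rule for building the master record are all illustrative assumptions; a real MDM process would apply proper matching and survivorship rules instead.

```python
# Hypothetical raw records: customer attributes (thing) are intermingled
# with order attributes (activity) and repeated on every row.
raw_records = [
    {"customer_id": "C1", "customer_name": "Acme Corp", "city": "Denver",
     "order_id": "O1", "amount": 120.0},
    {"customer_id": "C1", "customer_name": "Acme Corp", "city": "Denver",
     "order_id": "O2", "amount": 80.0},
    {"customer_id": "C2", "customer_name": "Globex", "city": "Austin",
     "order_id": "O3", "amount": 50.0},
]

THING_FIELDS = ("customer_name", "city")   # assumed master attributes
ACTIVITY_FIELDS = ("order_id", "amount")   # assumed activity attributes

def separate(records):
    """Split intermingled records into a master table and an activity log."""
    master = {}
    activities = []
    for r in records:
        # Naive rule: the first record seen for a customer_id becomes master.
        master.setdefault(r["customer_id"], {f: r[f] for f in THING_FIELDS})
        activities.append({"customer_id": r["customer_id"],
                           **{f: r[f] for f in ACTIVITY_FIELDS}})
    return master, activities

def marry(master, activities):
    """Join each activity back to its managed master record."""
    return [{**a, **master[a["customer_id"]]} for a in activities]

def total_by_city(master, activities):
    """Example insight that needs both things (city) and activities (amount)."""
    totals = {}
    for row in marry(master, activities):
        totals[row["city"]] = totals.get(row["city"], 0.0) + row["amount"]
    return totals

master, activities = separate(raw_records)
print(len(master), len(activities))        # 2 3
print(total_by_city(master, activities))   # {'Denver': 200.0, 'Austin': 50.0}
```

Once separated, a correction to a customer's city is made once in `master` and is reflected in every activity the next time the two are married, rather than being patched across every raw row.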
“I’m OK, You’re OK”: The key difference between things and activities
There is one major difference between things and activities. While the number of things we need to understand and manage is significant and growing, it isn't growing at nearly the pace of activities. In fact, they are vastly different in terms of both growth and volume.
To illustrate this, we'll compare the growth of things (master data) with the growth of activities (big data). As a proxy for things, we'll use the growth of the world population from 2010 to 2020 (projected). As a proxy for activities, we'll use the estimated growth of global data storage over the same period.
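The comparison comes down to two simple calculations: the overall growth multiple (end value divided by start value) and the compound annual growth rate, (end / start)^(1 / years) - 1. The sketch below computes both. The input values are round placeholders chosen only to show the arithmetic; they are not sourced population or storage figures, and published estimates should be substituted.

```python
def growth_multiple(start, end):
    """How many times larger the end value is than the start value."""
    return end / start

def cagr(start, end, years):
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end / start) ** (1 / years) - 1

# Illustrative placeholder values only, NOT sourced figures.
YEARS = 10                         # 2010 -> 2020
population_billions = (7.0, 7.7)   # proxy for the growth of things
data_zettabytes = (1.0, 40.0)      # proxy for the growth of activities

for label, (start, end) in [("Things (population)", population_billions),
                            ("Activities (stored data)", data_zettabytes)]:
    print(f"{label}: {growth_multiple(start, end):.1f}x over {YEARS} years, "
          f"{cagr(start, end, YEARS):.1%} per year")
```

With inputs of this shape, the things line grows by roughly one percent a year while the activities line compounds at tens of percent a year, which is the gap the argument relies on.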
Getting a good understanding of things is not a "big data" problem. It is a data complexity problem, something that modern master data management (MDM) platforms are well suited to help solve. Analyzing activities, on the other hand, is where your potential big data problem lies.
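Why is understanding things a complexity problem rather than a volume problem? Even a handful of records can describe the same real-world business in several inconsistent ways across systems. The sketch below groups such records using a deliberately simple normalized match key. The sample records, the suffix list, and the `match_key` and `match` functions are hypothetical; MDM platforms use far richer techniques (fuzzy comparison, weighted scoring, survivorship, data stewardship), and this is not a description of any product's matching engine.

```python
import re
from collections import defaultdict

# Hypothetical customer records from three systems; the same business
# appears with variant spellings.
records = [
    {"source": "CRM",     "name": "Acme Corp.",       "zip": "80202"},
    {"source": "Billing", "name": "ACME Corporation", "zip": "80202"},
    {"source": "Support", "name": "acme corp",        "zip": "80202"},
    {"source": "CRM",     "name": "Globex Inc",       "zip": "73301"},
]

# Assumed normalization rule: ignore case, punctuation, and legal suffixes.
SUFFIXES = {"corp", "corporation", "inc", "incorporated", "llc", "ltd"}

def match_key(record):
    """Build a crude key that variant descriptions of one business share."""
    words = re.sub(r"[^a-z0-9 ]", "", record["name"].lower()).split()
    core = " ".join(w for w in words if w not in SUFFIXES)
    return (core, record["zip"])

def match(records):
    """Group records that appear to describe the same thing."""
    groups = defaultdict(list)
    for r in records:
        groups[match_key(r)].append(r["source"])
    return dict(groups)

for key, sources in match(records).items():
    print(key, sources)
```

Four records collapse into two things; the hard part is the rules, not the row count.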
“Tying the knot”: Bringing the two together
Long before the terms "big data" and "master data" were coined, technology went through many evolutions aimed at gaining insight from data. These include early reporting solutions, more formalized decision support systems, the data warehouse, and, more recently, business intelligence solutions.
Today’s challenge is dealing with the sheer volume of data, both structured and unstructured, and with the complexity of connecting the dots between them. Sources of data have expanded beyond the traditionally small number of tightly controlled enterprise applications. They now include a growing number of third-party sources: external lists, data services, social media, and more. Onboarding large amounts of data from disparate sources is where big data platforms shine.
Embedded in each data set are things (master data) representing customers, products, places, and locations, as well as activities (big data) depicting purchases, "likes", comments, and documents. Understanding this distinction is the first step toward using these two complementary technologies as part of a larger enterprise data strategy.