What is Master Data Management | Definition, Tools, Solutions [Updated 2024]

Master data management (MDM) arose out of the necessity for businesses to improve the consistency and quality of their key data assets, such as product data, asset data, customer data, location data, etc.

Benjamin Bourgeois Head of Product and Customer Marketing

September 15, 2022
4:48 pm

Table of Contents


What is Master Data Management?
What is Master Data?
Examples of Master Data
Building a Master Data Management Strategy
The Benefits of Master Data Management
Why Bother With Managing Master Data?
Getting Started With Your MDM Program
Master Data Management Best Practices
How Do You Create a Master List?
How Do You Maintain a Master List?
Who Should Be Involved in Your MDM Program?
Conclusion
Master Data Management Frequently Asked Questions


Updated: August 2, 2024

Many businesses today, especially global enterprises, have hundreds of separate applications and systems (e.g., ERP, CRM) in which data that crosses departments or divisions can easily become fragmented, duplicated and, most commonly, out of date. When this happens, accurately answering even the most basic but critical questions about a business's performance metrics or KPIs becomes a pain.

To meet these challenges, businesses turn to master data management (MDM).


Let’s get started!

What is Master Data Management?

Master data management (MDM) is the technology, tools and processes that ensure master data is coordinated across the enterprise. MDM provides a unified master data service that delivers accurate, consistent and complete master data across the enterprise and to business partners.

There are a couple of things worth noting in this definition:

MDM is not just a technological problem. In many cases, fundamental changes to business processes will be required to maintain clean master data, and some of the most difficult MDM issues are more political than technical.
MDM includes both creating and maintaining master data. Investing a lot of time, money and effort in creating a clean, consistent set of master data is wasted effort unless the solution includes tools and processes to keep the master data clean and consistent as it is updated and expands over time.

Depending on the technology used, MDM may cover a single domain (customers, products, locations or other) or multiple domains. The benefits of multi-domain MDM include a consistent data stewardship experience, a minimized technology footprint, the ability to share reference data across domains, a lower total cost of ownership and a higher return on investment.

The 6 Disciplines of a Strong MDM Program

Given that MDM is not just a technological problem (you can't simply install a piece of technology and have everything sorted out), what does a strong MDM program entail? Before you start a master data management program, build your MDM strategy around these 6 disciplines:

Governance: Directives that manage the organizational bodies, policies, principles and qualities to promote accurate and certified master data. Essentially, this is the process through which a cross-functional team defines the various aspects of the MDM program.
Measurement: How are you doing based on your stated goals? Measurement should look at data quality and continuous improvement.
Organization: Getting the right people in place throughout the MDM program, including master data owners, data stewards and those participating in governance.
Policy: The requirements, policies and standards to which the MDM program should adhere.
Process: Defined processes across the data lifecycle used to manage master data.
Technology: The master data hub and any enabling technology.

What is Master Data?

Most software systems have lists of data that are shared and used by several of the applications that make up the system.

For example: A typical ERP system will have at the very least Customer Master, Item Master and Account Master data lists. This master data is often one of the key assets of a company. In fact, it’s not unusual for a company to be acquired primarily for access to its Customer Master data.

Rudimentary Master Data Definition

One of the most important steps in understanding master data is getting to know the terminology. To start, there are some very well understood and easily identified master data items, such as “customer” and “product.” Truth be told, many define master data simply by reciting a commonly agreed upon master data item list, such as: Customer, Product, Location, Employee and Asset.

But deciding which data elements should be managed by MDM software is much more complex and defies such rudimentary definitions. That complexity has created a lot of confusion around what master data is and how it is qualified.

To give a more comprehensive answer to the question of “what is master data?”, we can look at the 6 types of data typically found in corporations:

Unstructured Data: Data found in email, white papers, magazine articles, corporate intranet portals, product specifications, marketing collateral and PDF files.
Transactional Data: Data about business events (often related to system transactions, such as sales, deliveries, invoices, trouble tickets, claims and other monetary and non-monetary interactions) that have historical significance or are needed for analysis by other systems. Transactional data are unit-level transactions that use master data entities. Unlike master data, transactions are inherently temporal and instantaneous.
Metadata: Data about other data. It may reside in a formal repository or in various other forms, such as XML documents, report definitions, column descriptions in a database, log files, connections and configuration files.
Hierarchical Data: Data that stores the relationships between other data. It may be stored as part of an accounting system or separately as descriptions of real world relationships, such as company organizational structures or product lines. Hierarchical data is sometimes considered a super MDM domain because it is critical to understanding and sometimes discovering the relationships between master data.
Reference Data: A special type of master data used to categorize other data or to relate data to information beyond the boundaries of the enterprise. Reference data can be shared across master or transactional data objects (e.g., countries, currencies, time zones and payment terms).
Master Data: The core data within the enterprise that describes the objects around which business is conducted. It typically changes infrequently and can include reference data that is necessary to operate the business. Master data is not transactional in nature, but it does describe transactions. The critical nouns of a business that master data covers generally fall into four domains; further categorizations within those domains are called subject areas, sub-domains or entity types.

The four general master data domains are:

Customers

Within the customers domain, there are customer, employee and salesperson sub-domains.

Products

Within the products domain, there are product, part, store and asset sub-domains.

Locations

Within the locations domain, there are office location and geographic division sub-domains.

Other

Within the other domain, there are things like contract, warranty and license sub-domains.

Some of these sub-domains may be further divided. For instance, customers may be further segmented based on incentives and history, since your company may have normal customers as well as premier and executive customers.

Meanwhile, products may be further segmented by sector and industry. This level of granularity is helpful because requirements such as the lifecycle and CRUD cycle for a product in the Consumer Packaged Goods (CPG) sector are likely very different from those for products in the clothing industry.

The granularity of domains is essentially determined by the magnitude of differences between the attributes of the entities within them.

Examples of Master Data

Now that we know what master data is, let’s look at some examples. What gets counted as master data can vary somewhat across industries and organizations, but remember that, in general, there are four main broad categories (or domains) of master data:

Customer master data
Product master data
Location master data
Other types of master data

So, here is what a master data record might look like in three of those domains.

Example of Customer Master Data

This might be the name of a customer, patient or employee. A master data record for a customer might look like this:

Name: Rachel Otis

Billing address: 5761 Webb Bridge Rd., Alpharetta, GA 30022

Shipping address: 5761 Webb Bridge Rd., Alpharetta, GA 30022

Email address: rachel.otis@piedpiper.com

Phone number: (404) 555-5555

Account ID: 924819478
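To make the shape of such a record concrete, here is a minimal Python sketch that models the customer above as an immutable record. The class and field names are illustrative assumptions, not a schema from any particular MDM product; the point is that `account_id` is the stable identifier other systems would reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerMaster:
    """One golden record in a hypothetical customer master list."""
    account_id: str        # stable key that other systems reference
    name: str
    billing_address: str
    shipping_address: str
    email: str
    phone: str


rachel = CustomerMaster(
    account_id="924819478",
    name="Rachel Otis",
    billing_address="5761 Webb Bridge Rd., Alpharetta, GA 30022",
    shipping_address="5761 Webb Bridge Rd., Alpharetta, GA 30022",
    email="rachel.otis@piedpiper.com",
    phone="(404) 555-5555",
)
```

Declaring the class with `frozen=True` makes accidental in-place edits raise an error, which nudges changes through a controlled update process instead.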

Example of Product Master Data

This could be the name of a product, a part or some other asset subdomain. A product master data record might look like this:

Product name: USB desk lamp

Color: Black

Base material: Metal

Product dimensions: 6″D x 5″W x 4″H

Product weight: 12 ounces

Wattage: 7 watts

Power source: USB

Light bulb type: LED

Example of Location Master Data

Location master data could be anything from a brick-and-mortar store to a hospital floor to a room within a building.

Location name: Dave’s Fried Chicken — Grant Park

Address: 193 Memorial Dr. SE, Atlanta, GA 30312

Phone number: (470) 555-5555

Store number: 16

Region: Southeast

Building a Master Data Management Strategy

A master data management (MDM) strategy takes into account the core data types, or domains, that have the greatest business impact.

While identifying master data entities is pretty straightforward, not all data that fits the definition for master data should necessarily be managed as such. In general, master data is typically a small portion of all of your data from a volume perspective, but it’s some of the most complex data and the most valuable to maintain and manage.

So, what data should you manage as master data?

We recommend using the following criteria, all of which should be considered together when deciding if a given entity should be treated as master data.

Behavior Data

Master data can be described by the way that it interacts with other data.

For example:

In transaction systems, master data is almost always involved with transactional data. A customer buys a product, a vendor sells a part and a partner delivers a crate of materials to a location. An employee is hierarchically related to their manager, who reports up through a manager (another employee). A product may be a part of multiple hierarchies describing its placement within a store.

This relationship between master data and transactional data may be fundamentally viewed as a noun/verb relationship. Transactional data captures the verbs, such as sale, delivery, purchase, email and revocation, while master data captures the nouns. This is the same relationship data warehouse facts and dimensions share.
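The noun/verb split can be sketched in a few lines of Python. In this hypothetical example, `Customer` and `Product` are master data and `Sale` is transactional data that refers to them only by key, much as a fact row in a warehouse points at its dimension rows. All names and values are made up for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Customer:          # master data: a noun
    customer_id: str
    name: str


@dataclass(frozen=True)
class Product:           # master data: a noun
    sku: str
    name: str


@dataclass(frozen=True)
class Sale:              # transactional data: a verb that points at the nouns
    sale_date: date
    customer_id: str     # key into the customer master
    sku: str             # key into the product master
    quantity: int


customers = {"C1": Customer("C1", "Rachel Otis")}
products = {"P1": Product("P1", "USB desk lamp")}
sales = [
    Sale(date(2024, 8, 2), "C1", "P1", 2),
    Sale(date(2024, 8, 3), "C1", "P1", 1),
]


def describe(sale: Sale) -> str:
    """Resolve a transaction's keys against the master lists, like a fact joined to dimensions."""
    who = customers[sale.customer_id].name
    what = products[sale.sku].name
    return f"{sale.sale_date}: {who} bought {sale.quantity} x {what}"
```

Because each sale stores only keys, correcting a customer's name in the master list fixes it everywhere the sales are reported.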

Lifecycle (CRUD Cycle)

Master data can be described by the way that it is created, read, updated, deleted and searched. This lifecycle is called the CRUD cycle and is different for various master data element types and companies.

For example:

How a customer is created depends largely upon a company’s business rules, industry segment and data systems. One company may have multiple customer creation vectors, such as through the Internet, directly through account representatives or through outlet stores. Another company may only allow customers to be created through direct contact over the phone with its call center. Further, how a customer element gets created is certainly different from how a vendor element gets created.

The following table illustrates the differing CRUD cycles for four common master data subject areas.

	Customer	Product	Asset	Employee
Create	A customer visit, such as to the company website or a facility triggers account creation	A product gets purchased or manufactured with SCM involvement	A unit gets acquired by opening a PO following the necessary approval process	HR hires a new employee, who must then fill out numerous forms, attend orientation, make benefits selections, determine asset allocations and follow office assignments
Read	Contextualized views based on credentials of viewer	Periodic inventory catalogues	Periodic reporting purposes, figuring depreciation, verification	Office access, reviews, insurance-claims, immigration
Update	Address, discounts, phone number, preferences, credit accounts	Packaging changes, raw materials changes	Transfers, maintenance, accident reports	Immigration status, marriage status, level increase, raises, transfers
Destroy	Death, bankruptcy, liquidation, do-not-call	Canceled, replaced, no longer available	Obsolete, sold, destroyed, stolen, scrapped	Termination, death
Search	CRM system, call center system, contact management system	ERP system, orders processing system	GL tracking, asset DB management	HR LOB system

Cardinality

As cardinality (the number of elements in a set) decreases, the likelihood of an element being treated as a master data element—even a commonly accepted subject area, such as customer—decreases.

For example:

If a company has only three customers, most likely the organization would not consider those customers master data, at least not in the context of supporting them with an MDM solution, simply because there is no benefit to managing those customers with a master data infrastructure. In contrast, a company with thousands of customers would consider customer an important subject area because of the concomitant issues and benefits around managing such a large set of entities.

The customer value to each of these companies is the same, as both rely on their customers for business. However, one does not need a customer master data solution and the other does. Cardinality does not change the classification of a given entity type; however, the importance of having a solution for managing an entity type increases as the cardinality of the entity type increases.

Lifetime

Master data tends to be less volatile than transactional data. As it becomes more volatile, it is typically considered more transactional.

For example:

Some might consider “contract” a master data element. Others might consider it a transaction. Depending on the lifespan of a contract, it can go either way.

An agency promoting professional athletes might consider their contracts master data. In this case, each is different from the other and typically has a lifetime of greater than a year. It may be tempting to simply have one master data item called “athlete.” However, athletes tend to have more than one contract at any given time: One with their teams and others with companies for product endorsements. The agency would need to manage all those contracts over time as elements of each contract get renegotiated or as athletes get traded.

Other contracts—for example, contracts for detailing cars or painting a house—are more like a transaction. They are one-time, short-lived agreements to provide services for payment and are typically fulfilled and destroyed within hours.

Complexity

Simple entities, even if they are valuable entities, are rarely a challenge to manage and are rarely considered master data elements. The less complex an element, the less likely the need to manage change for that element. Typically, such assets are simply collected and tallied.

For example:

Fort Knox likely would not track information on each individual gold bar it stores, but rather only keep a count of them. The value of each gold bar is substantial, the cardinality high and the lifespan long, but the complexity is low.

Value

The more valuable the data element is to the company, the more likely it will be considered a master data element. Value and complexity work together.

Volatility

While master data is typically less volatile than transactional data, entities with attributes that do not change at all typically do not require a master data solution.

For example:

Rare coins would seem to meet many of the criteria for a master data treatment. A rare coin collector would likely have many rare coins, so cardinality is high. They are also valuable and complex since they have a history and description (e.g. attributes such as condition of obverse, reverse, legend, inscription, rim and field as well as designer initials, edge design, layers and portrait).

Despite all of these conditions, rare coins do not need to be managed as a master data item because they don’t change over time—or, at least, they don’t change enough. There may need to be more information added as the history of a particular coin is revealed or if certain attributes must be corrected, but, generally speaking, rare coins would not be managed through a master data management system because they are not volatile enough to warrant it.

Reuse

One of the primary drivers of master data management is reuse.

For example:

In a simple world, the CRM system would manage everything about a customer and never need to share any information about the customer with other systems. However, in today’s complex environments, customer information needs to be shared across multiple applications. That’s where the trouble begins.

Because, for a number of reasons, access to a master datum is not always available, people start storing master data in various locations, such as spreadsheets and private application stores. There are still reasons, such as data quality degradation and decay, to manage master data that is not reused across the enterprise. However, if a master data entity is reused in multiple systems, it's a sure bet that it should be managed with MDM software.

In Summary…

While it is simple to enumerate the various master data entity types, it is sometimes more challenging to decide which data items in a company should be treated as master data.

Often, data that does not normally comply with the definition for master data may need to be managed as such and data that does comply with the definition may not.

Ultimately, when deciding on what entity types should be treated as master data, it is better to categorize them in terms of their behavior and attributes within the context of the business needs than to rely on simple lists of entity types.

The Benefits of Master Data Management

While creating a clean master list can be a daunting challenge, there are many positive benefits to the bottom line that come from having a common master list, including:

A single, consolidated bill, which saves money and improves customer satisfaction
No concerns about sending the same marketing literature to a customer from multiple customer lists, which wastes money and irritates the customer
A cohesive view of customers across the organization, so that users know, before they turn a customer account over to a collection agency, whether that customer owes money to other parts of the organization or, more importantly, whether that customer is another division's biggest source of business
A consolidated view of items to eliminate wasted money and shelf space as well as the risk of artificial shortages that come from stocking the same item under different part numbers

Finally, the movement toward service-oriented architecture (SOA) and software as a service (SaaS) makes MDM a critical issue.

For example:

If you create a single customer service that communicates through well-defined XML messages, you may think you have defined a single view of your customers. But if the same customer is stored in five databases with three different addresses and four different phone numbers, what will your customer service return?

Similarly, if you decide to subscribe to a CRM service provided through SaaS, the service provider will need a list of customers for its database. Which list will you send?

For all of these reasons, maintaining a high quality, consistent set of master data for your organization is rapidly becoming a necessity. The systems and processes required to maintain this data are known as Master Data Management.

Why Bother With Managing Master Data?

Because master data is used by multiple applications, an error in the data in one place can cause errors in all the applications that use it.

For example:

An incorrect address in the customer master might mean orders, bills and marketing literature are all sent to the wrong address. Similarly, an incorrect price on an item master can be a marketing disaster and an incorrect account number in an account master can lead to huge fines or even jail time for the CEO—a career-limiting move for the person who made the mistake.

How Does Master Data Management Drive Digital Transformation?

The key to driving an organization’s digital transformation lies in intelligent and automated data management. Whether it is embarking on cloud modernization, reimagining the customer experience through a comprehensive and unified view of data across the business, or implementing enterprise data governance and privacy, effectively managing data plays a crucial role in achieving a successful digital transformation.

Real Life Master Data Example: Why You Need Master Data


A Typical Master Data Horror Story

A credit card customer moves from 2847 North 9th St. to 1001 11th St. North. The customer changes their billing address immediately but does not receive a bill for several months. One day, the customer receives a threatening phone call from the credit card billing department asking why the bill has not been paid. The customer confirms that the company has the new address, and the billing department verifies that the address on file is 1001 11th St. North. The customer asks for a copy of the bill to settle the account.

After two more weeks without a bill, the customer calls back and finds that the account has been turned over to a collection agency. This time, the customer learns that even though the address on file is 1001 11th St. North, the billing address is listed as 101 11th St. North. After several phone calls and letters between lawyers, the bill is finally resolved, and the credit card company has lost a customer for life.
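The failure in this story is one of consistency rather than correctness: the master copy was right, but a downstream copy was not. A simple field-by-field comparison of each copy against the master record would have surfaced it. The sketch below assumes each system's copy can be read as a dictionary; the system and field names are hypothetical.

```python
def find_inconsistencies(master: dict, copies: dict) -> list:
    """Compare every downstream copy of a record with the master copy, field by field.

    Returns (system, field, master_value, copy_value) for each disagreement.
    """
    issues = []
    for system, record in copies.items():
        for field, master_value in master.items():
            copy_value = record.get(field)
            # Missing fields are not flagged here; only conflicting values are.
            if copy_value is not None and copy_value != master_value:
                issues.append((system, field, master_value, copy_value))
    return issues


master = {"billing_address": "1001 11th St. North"}
copies = {
    "customer_service": {"billing_address": "1001 11th St. North"},
    "billing": {"billing_address": "101 11th St. North"},  # the flawed copy from the story
}
```

Run periodically, a check like this turns a months-long customer complaint into a routine data stewardship task.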

In this case, the master copy of the data was accurate, but another copy of it was flawed. Master data must be both correct and consistent. Even if the master data has no errors, few organizations have just one set of master data. Many companies grow through mergers and acquisitions, and each company that the parent organization acquires comes with its own customer master, item master and so forth.

This would not be bad if you could just union the new master data with the current master data, but unless the company acquired is in a completely different business in a faraway country, there’s a very good chance that some customers and products will appear in both sets of master data—usually with different formats and different database keys.

If both companies use the Dun & Bradstreet Number or Social Security Number as the customer identifier, discovering which customer records belong to the same customer is straightforward, but that seldom happens. In most cases, customer numbers and part numbers are assigned by the software that creates the master records, so the chances of the same customer or the same product having the same identifier in both databases are pretty remote. Item masters can be even harder to reconcile if equivalent parts are purchased from different vendors with different vendor numbers.
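When both masters do carry a shared external identifier, matching really is just a lookup. The sketch below assumes each record is a dictionary with a hypothetical `duns` field (the numbers are made up); records without a shared identifier fall through to the much harder fuzzy matching problem.

```python
def match_by_shared_id(ours: list, theirs: list, key: str = "duns"):
    """Pair records from two customer masters that carry the same external identifier.

    Returns (matches, unmatched): matched (our id, their id) pairs and our ids with no match.
    """
    # Index the acquired company's records by identifier, skipping records that lack one.
    index = {rec[key]: rec for rec in theirs if rec.get(key)}
    matches, unmatched = [], []
    for rec in ours:
        other = index.get(rec.get(key)) if rec.get(key) else None
        if other is not None:
            matches.append((rec["id"], other["id"]))
        else:
            unmatched.append(rec["id"])
    return matches, unmatched


ours = [
    {"id": "A-17", "name": "Acme Corp", "duns": "123456789"},
    {"id": "A-18", "name": "Globex", "duns": None},
]
theirs = [
    {"id": "9001", "name": "ACME Corporation", "duns": "123456789"},
    {"id": "9002", "name": "Globex Inc."},
]
```

Note that "Acme Corp" and "ACME Corporation" match only because of the shared identifier; "Globex" and "Globex Inc." stay unmatched.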

In Summary…

Merging master lists together can be very difficult since the same customer may have different names, customer numbers, addresses and phone numbers in different databases. For example, William Smith might appear as Bill Smith, Wm. Smith and William Smithe. Normal database joins and searches will not be able to resolve these differences.

A very sophisticated tool that understands nicknames, alternate spellings and typing errors will be required. The tool will probably also have to recognize that different name variations can be resolved if they all live at the same address or have the same phone number.
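As a rough illustration of the idea (not of how any commercial matching engine works), the Python sketch below treats two records as the same customer only when their names are compatible after nickname expansion and a tolerance for small spelling differences, and they also share an address or phone number. The nickname table, the 0.85 similarity threshold and the use of `difflib.SequenceMatcher` are assumptions chosen for brevity.

```python
from difflib import SequenceMatcher

# Hypothetical nickname/abbreviation table; real matching tools use far larger ones.
NICKNAMES = {"bill": "william", "will": "william", "wm": "william", "bob": "robert"}


def canonical_first(first: str) -> str:
    """Lower-case a first name, drop a trailing period and expand known nicknames."""
    key = first.lower().rstrip(".")
    return NICKNAMES.get(key, key)


def names_compatible(a: str, b: str, threshold: float = 0.85) -> bool:
    """Same canonical first name and a surname that is equal or a near-miss typo."""
    a_parts, b_parts = a.split(), b.split()
    if canonical_first(a_parts[0]) != canonical_first(b_parts[0]):
        return False
    similarity = SequenceMatcher(None, a_parts[-1].lower(), b_parts[-1].lower()).ratio()
    return similarity >= threshold


def likely_same_customer(a: dict, b: dict) -> bool:
    """Name variations are resolved only when an address or phone number corroborates them."""
    shares_contact = a["address"] == b["address"] or a["phone"] == b["phone"]
    return names_compatible(a["name"], b["name"]) and shares_contact


records = [
    {"name": "William Smith", "address": "12 Elm St", "phone": "555-0100"},
    {"name": "Bill Smith", "address": "12 Elm St", "phone": "555-0199"},
    {"name": "Wm. Smith", "address": "98 Oak Ave", "phone": "555-0100"},
    {"name": "William Smithe", "address": "12 Elm St", "phone": "555-0142"},
    {"name": "William Smith", "address": "7 Pine Rd", "phone": "555-0777"},
]
```

The last record has the same name as the first but no shared contact details, so this sketch keeps them apart; deciding whether that is right is exactly the kind of call a data steward makes.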

Getting Started With Your MDM Program

Once you secure buy-in for your MDM program, it’s time to get started. While MDM is most effective when applied to all the master data in an organization, in many cases the risk and expense of an enterprise-wide effort are difficult to justify.

PRO TIP: It is often easier to start with a few key sources of master data and expand the effort once success has been demonstrated and lessons have been learned.

If you do start small, you should include an analysis of all the master data that you might eventually want to include in your program, so that you do not make design decisions or tool choices that will force you to start over when you try to incorporate a new data source. For example, if your initial customer master implementation only includes the 10,000 customers your direct sales force deals with, you don't want to make design decisions that will preclude adding your 10,000,000 web customers later.

Master Data Management Best Practices

Your MDM project plan will be influenced by requirements, priorities, resource availability, time frame and the size of the problem. Most MDM projects include at least these phases:

Identify sources of master data

This step is usually a very revealing exercise. Some companies find they have dozens of databases containing customer data that the IT department did not know existed.

Identify the producers and consumers of the master data

This step involves pinpointing which applications produce the master data identified in the first step, and—generally more difficult to determine—which applications use the master data. Depending on the approach you use for maintaining the master data, this step might not be necessary. For example, if all changes are detected and handled at the database level, it probably does not matter where the changes come from.

Collect and analyze metadata for your master data

For all the sources identified in step one, what are the entities and attributes of the data and what do they mean? This should include:

Attribute name
Data type
Allowed values
Constraints
Default values
Dependencies
Who owns the definition and maintenance of the data

‘Owner’ is the most important and often the hardest to determine. If you have a repository loaded with all your metadata, this step is an easy one. If you have to start from database tables and source code, this could be a significant effort.
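The attributes listed above map naturally onto a small metadata catalog entry. The sketch below is a hypothetical structure, not a standard metadata format; it records each attribute's type, allowed values, constraints, default, dependencies and owner, and can check a value against them. The example owners, regions and rules are invented.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class AttributeMetadata:
    """Hypothetical catalog entry describing one master data attribute."""
    name: str
    data_type: type
    allowed_values: Optional[set] = None
    constraints: list[Callable[[Any], bool]] = field(default_factory=list)  # True means valid
    default: Any = None
    depends_on: list[str] = field(default_factory=list)  # attributes this one depends on
    owner: str = ""  # who owns the definition and maintenance of the data

    def check(self, value: Any) -> list[str]:
        """Return a list of problems with `value`; an empty list means it passed."""
        if not isinstance(value, self.data_type):
            return [f"{self.name}: expected {self.data_type.__name__}"]
        problems = []
        if self.allowed_values is not None and value not in self.allowed_values:
            problems.append(f"{self.name}: {value!r} not in allowed values")
        for rule in self.constraints:
            if not rule(value):
                problems.append(f"{self.name}: failed constraint")
        return problems


store_number = AttributeMetadata(
    "store_number", int, constraints=[lambda n: n > 0], owner="Retail Operations"
)
region = AttributeMetadata(
    "region", str,
    allowed_values={"Northeast", "Southeast", "Midwest", "West"},
    depends_on=["address"],
    owner="Sales Operations",
)
```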

Appoint data stewards

These should be the people with knowledge of the current source data and the ability to determine how to transform the source data into the master data format. In general, stewards should be drawn from the owners of each master data source, the architects responsible for the MDM software and representatives of the business users of the master data.

Implement a data governance program and data governance council

This group must have the knowledge and authority to make decisions on how the master data is maintained, what it contains, how long it is kept and how changes are authorized and audited. Hundreds of decisions must be made in the course of a master data project, and if there is not a well-defined decision-making body and process, the project can fail because politics prevent effective decision-making.

Develop the master data model

Decide what the master records look like, including what attributes are included, what size and data type they are, what values are allowed and so forth. This step should also include the mapping between the master data model and the current data sources. This is normally both the most important and most difficult step in the process. If you try to make everybody happy by including all the source attributes in the master entity, you often end up with master data that is too complex and cumbersome to be useful.

For example:

If you cannot decide whether weight should be in pounds or kilograms, one approach would be to include both (WeightLb and WeightKg). While this might make people happy, you are wasting megabytes of storage for numbers that can be calculated in microseconds and running the risk of creating inconsistent data (WeightLb = 5 and WeightKg = 5). While this is a pretty trivial example, a bigger issue would be maintaining multiple part numbers for the same part.
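One way to avoid the WeightLb/WeightKg trap is to store a single canonical value and derive the other unit on demand, so the two can never disagree. In this sketch kilograms are assumed to be the canonical unit and the part number is made up; the conversion factor (1 lb = 0.45359237 kg) is the exact international definition.

```python
from dataclasses import dataclass

KG_PER_LB = 0.45359237  # exact, by the international definition of the pound


@dataclass(frozen=True)
class Part:
    """Hypothetical part record that stores weight once, in kilograms."""
    part_number: str
    weight_kg: float

    @property
    def weight_lb(self) -> float:
        # Derived on read, so pounds and kilograms can never disagree.
        return self.weight_kg / KG_PER_LB


lamp = Part("LAMP-USB-01", weight_kg=0.75 * KG_PER_LB)  # 12 ounces = 0.75 lb
```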

As in any committee effort, there will be fights and deals resulting in suboptimal decisions. It’s important to work out the decision process, priorities and final decision-maker in advance to make sure things run smoothly.

Choose a toolset

You will need to buy or build tools to create the master lists by cleaning, transforming and merging the source data. You will also need an infrastructure to use and maintain the master list. These functions are covered in detail later in this article. You can use a single toolset from a single vendor for all of these functions or you might want to take a best-of-breed approach. In general, the techniques to clean and merge data are different for different types of data, so there are not a lot of tools that span the whole range of master data. The two main categories of tools are Customer Data Integration (CDI) tools for creating the customer master and Product Information Management (PIM) tools for creating the product master. Some tools will do both, but generally tools are better at one or the other. The toolset should also have support for finding and fixing data quality issues and maintaining versions and hierarchies. Versioning is a critical feature because understanding the history of a master data record is vital to maintaining its quality and accuracy over time.

For example:

If a merge tool combines two records for John Smith in Boston and you decide there really are two different John Smiths in Boston, you need to know what the records looked like before they were merged in order to “unmerge” them.
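A minimal sketch of why history makes unmerging possible: the merge below keeps a snapshot of every source record it consumed, so the originals can be restored later. The naive survivorship rule (the first record to supply a field wins) and all record values are assumptions for illustration; real toolsets apply much richer rules and full version histories.

```python
from dataclasses import dataclass, field


@dataclass
class MasterRecord:
    master_id: str
    attributes: dict
    # Snapshots of the source records merged into this one, kept so the merge can be undone.
    sources: list = field(default_factory=list)


def merge(master_id: str, *records: dict) -> MasterRecord:
    """Combine source records into one master record, keeping a copy of each original."""
    combined = {}
    for rec in records:
        for key, value in rec.items():
            combined.setdefault(key, value)  # naive rule: first record to supply a field wins
    return MasterRecord(master_id, combined, [dict(r) for r in records])


def unmerge(master: MasterRecord) -> list:
    """Restore the source records exactly as they looked before the merge."""
    return [dict(r) for r in master.sources]


a = {"source_id": "CRM-1", "name": "John Smith", "city": "Boston", "phone": "617-555-0101"}
b = {"source_id": "ERP-7", "name": "John Smith", "city": "Boston", "phone": "617-555-0199"}
m = merge("M-1", a, b)
```

The merged record keeps only the first phone number, but the second John Smith can still be recovered intact from the stored snapshots.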

Looking at the big picture, the functional capabilities to look for include data modeling, integration, data matching, data quality, data stewardship, hierarchy management, workflow and data governance. From a non-functional perspective, you should also consider scalability, availability and performance.

Design the infrastructure

Once you have clean, consistent master data, you will need to expose it to your applications and provide processes to manage and maintain it. When this infrastructure is implemented, you will have a number of applications that will depend on it being available, so reliability and scalability are important considerations to include in your design. In most cases, you will have to implement significant parts of the infrastructure yourself because it will be designed to fit into your current infrastructure, platforms and applications.

Generate and test the master data

This step is where you use the tools you have developed or purchased to merge your source data into your master data list. This is often an iterative process that requires tinkering with rules and settings to get the matching right. This process also requires a lot of manual inspection to ensure that the results are correct and meet the requirements established for the project.

No tool will get the matching done correctly 100 percent of the time, so you will have to weigh the consequences of false matches versus missed matches to determine how to configure the matching tools. False matches can lead to customer dissatisfaction if bills are inaccurate or the wrong person is arrested. Too many missed matches make the master data less useful because you are not getting the benefits you invested in MDM to get.
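Weighing false matches against missed matches can be made explicit. Given a manually reviewed sample of candidate pairs, each with a similarity score and a verdict on whether the pair really is the same entity, you can count both kinds of error at each threshold and pick the threshold with the lowest weighted cost. The sample data and cost weights below are hypothetical.

```python
def match_errors(scored_pairs, threshold):
    """Count (false matches, missed matches) at a similarity threshold.

    scored_pairs holds (score, is_same_entity) tuples from a manually reviewed sample.
    """
    false_matches = sum(1 for score, same in scored_pairs if score >= threshold and not same)
    missed_matches = sum(1 for score, same in scored_pairs if score < threshold and same)
    return false_matches, missed_matches


def best_threshold(scored_pairs, thresholds, false_cost, missed_cost):
    """Choose the threshold whose weighted error cost on the sample is lowest."""
    def cost(t):
        fm, mm = match_errors(scored_pairs, t)
        return false_cost * fm + missed_cost * mm
    return min(thresholds, key=cost)


sample = [(0.98, True), (0.93, True), (0.91, False), (0.88, True),
          (0.84, False), (0.80, True), (0.62, False)]
```

Raising the cost of a false match (for example, a bill sent to the wrong person) pushes the choice toward a stricter threshold; raising the cost of a missed match pushes it the other way.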

Modify the producing and consuming systems

Depending on how your MDM implementation is designed, you might have to change the systems that produce, maintain or consume master data to work with the new source of master data. If the master data is used in a system separate from the source systems—a data warehouse, for example—the source systems might not have to change.

If the source systems are going to use the master data, however, there will likely be changes required. Either the source systems will have to access the new master data or the master data will have to be synchronized with the source systems so that the source systems have a copy of the cleaned-up master data to use. If it’s not possible to change one or more of the source systems, either that source system might not be able to use the master data or the master data will have to be integrated with the source system’s database through external processes, such as triggers and SQL commands.

The source systems that generate new records should be changed to look up existing master records before creating new records or updating existing ones. This improves the quality of the data generated upstream, so the MDM software can function more efficiently and the application itself helps manage data quality. MDM should be leveraged not only as a system of record, but also as an application that promotes cleaner and more efficient handling of data across all applications in the enterprise.
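Below is a minimal sketch of this lookup-before-create pattern. `MasterIndex` is a hypothetical in-memory stand-in for the MDM hub's lookup service. Matching on a normalized email address is an assumption that keeps the example short; a real hub would apply its full matching rules.

```python
import itertools
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MasterIndex:
    """Hypothetical in-memory stand-in for the MDM hub's lookup service."""
    records: dict[str, dict] = field(default_factory=dict)   # master_id -> record
    by_email: dict[str, str] = field(default_factory=dict)   # normalized email -> master_id
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1))

    @staticmethod
    def normalize(email: str) -> str:
        return email.strip().lower()

    def find(self, email: str) -> Optional[str]:
        return self.by_email.get(self.normalize(email))

    def create(self, record: dict) -> str:
        master_id = f"M{next(self._ids)}"
        self.records[master_id] = dict(record)
        self.by_email[self.normalize(record["email"])] = master_id
        return master_id


def upsert_customer(index: MasterIndex, record: dict) -> str:
    """Look up an existing master record before creating a new one."""
    existing = index.find(record["email"])
    if existing is not None:
        index.records[existing].update(record)  # update instead of creating a duplicate
        return existing
    return index.create(record)


index = MasterIndex()
first = upsert_customer(index, {"email": "jsmith@example.com", "name": "John Smith"})
second = upsert_customer(index, {"email": " JSmith@Example.com ", "name": "John Smith"})
print(first, second, len(index.records))  # M1 M1 1
```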

As part of your MDM strategy, you need to look into all three pillars of data management:

Data origination
Data management
Data consumption

It is not possible to have a robust, enterprise-level MDM strategy if any one of these aspects is ignored.

Implement maintenance processes

As stated earlier, any MDM implementation must incorporate tools, processes and people to maintain the quality of the data. All master data must have a data steward who is responsible for ensuring its quality.

The data steward is normally a business person who has knowledge of the data, can recognize incorrect data and has the knowledge and authority to correct the issues. The MDM infrastructure should include tools that help the data steward recognize issues and simplify corrections. A good data stewardship tool should point out questionable matches that were made—customers with different names and customer numbers that live at the same address, for example.
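The check described above can be sketched as a simple grouping step. The field names (`customer_no`, `name`, `address`) are hypothetical. Address normalization is deliberately minimal here: it only lowercases the address and collapses whitespace.

```python
from collections import defaultdict


def questionable_matches(records: list[dict]) -> list[list[dict]]:
    """Flag groups of records that share an address but have different names
    and different customer numbers, for data steward review."""
    by_address: dict[str, list[dict]] = defaultdict(list)
    for rec in records:
        # Hypothetical, deliberately simple normalization: lowercase, collapse spaces
        key = " ".join(rec["address"].lower().split())
        by_address[key].append(rec)

    flagged = []
    for group in by_address.values():
        names = {r["name"] for r in group}
        numbers = {r["customer_no"] for r in group}
        if len(names) > 1 and len(numbers) > 1:
            flagged.append(group)
    return flagged


records = [
    {"customer_no": "C1", "name": "John Smith", "address": "12 Main St, Boston"},
    {"customer_no": "C2", "name": "Jane Smith", "address": "12  Main St,  Boston"},
    {"customer_no": "C3", "name": "Acme Corp", "address": "1 Elm St, Boston"},
]
for group in questionable_matches(records):
    print([r["customer_no"] for r in group])  # ['C1', 'C2']
```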

The steward might also want to review items that were added as new because the match criteria were close but below the threshold. It is important for the data steward to see the history of changes made to the data by the MDM software in order to isolate the source of errors and undo incorrect changes. Maintenance also includes the processes to pull changes and additions into the MDM software and to distribute the cleansed data to the required places.

As you can see, MDM is a complex process that can go on for a long time. Like most things in software, the key to success is to implement MDM incrementally so that the business realizes a series of short-term benefits while the complete project is a long-term process.

Additionally, no MDM project can be successful without the support and participation of the business users. IT professionals do not have the domain knowledge to create and maintain high-quality master data. Any MDM project that does not include changes to the processes that create, maintain and validate master data is likely to fail.

The rest of this article will cover the details of the technology and processes for creating and maintaining master data.

How Do You Create a Master List?

Whether you buy an MDM tool or decide to build your own, there are two basic steps to creating master data:

Cleaning and standardizing the data
Matching data from all the sources to consolidate duplicates

Cleaning and Standardizing Master Data

Before you can start cleaning and normalizing your data, you must understand the data model for the master data. As part of the modeling process, you should have defined the contents of each attribute and defined a mapping from each source system to the master data model. Now, you can use this information to define the transformations necessary to clean your source data.

Cleaning the data and transforming it into the master data model is very similar to the Extract, Transform and Load (ETL) processes used to populate a data warehouse. If you already have ETL tools and transformations defined, it might be easier just to modify these as required for the master data instead of learning a new tool. Here are some typical data cleansing functions:

Normalize data formats: Make all the phone numbers look the same, transform addresses and so on to a common format.
Replace missing values: Insert defaults, look up ZIP codes from the address, look up the Dun & Bradstreet Number.
Standardize values: Convert all measurements to metric, convert prices to a common currency, change part numbers to an industry standard.
Map attributes: Parse the first name and last name out of a contact name field, move Part# and partno to the PartNumber field.
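For a single record, the cleansing functions above might look roughly like the sketch below. The field names, the `ATTRIBUTE_MAP` mapping, the US-only phone rule and the default country are hypothetical. The pound-to-kilogram factor is the exact international definition.

```python
import re

# Hypothetical mapping of source attribute names to the master data model
ATTRIBUTE_MAP = {"Part#": "PartNumber", "partno": "PartNumber"}
KG_PER_LB = 0.45359237  # exact by definition


def normalize_phone(raw: str) -> str:
    """Normalize a US phone number to NNN-NNN-NNNN (assumes 10 digits, optional leading 1)."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        raise ValueError(f"unrecognized phone number: {raw!r}")
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


def cleanse(source: dict) -> dict:
    # Map attributes: rename source fields to master data model fields
    out = {ATTRIBUTE_MAP.get(key, key): value for key, value in source.items()}
    if "phone" in out:
        out["phone"] = normalize_phone(out["phone"])  # normalize data formats
    if "weight_lb" in out:
        out["weight_kg"] = round(out.pop("weight_lb") * KG_PER_LB, 3)  # standardize to metric
    if "contact_name" in out:
        # Map attributes: parse first and last name out of a contact name field
        first, _, last = out.pop("contact_name").strip().partition(" ")
        out["first_name"], out["last_name"] = first, last
    out.setdefault("country", "US")  # replace missing values with a default
    return out


print(cleanse({"Part#": "A-100", "phone": "(617) 555-0100",
               "weight_lb": 10, "contact_name": "John Smith"}))
```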

Most tools will cleanse the data that they can and put the rest into an error table for hand processing. Depending on how the matching tool works, the cleansed data will be put into a master table or a series of staging tables. As each source gets cleansed, you should examine the output to ensure the cleansing process is working correctly.
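Here is a minimal sketch of this cleanse-or-reject flow, using a hypothetical `normalize_phone_row` rule. Rows that pass go to a staging list. Rows that fail go to an error table with the reason attached, so that someone can fix them by hand.

```python
import re
from typing import Callable


def normalize_phone_row(row: dict) -> dict:
    """Hypothetical cleansing rule: require a 10-digit US phone number."""
    digits = re.sub(r"\D", "", row["phone"])  # raises KeyError if phone is missing
    if len(digits) != 10:
        raise ValueError(f"cannot normalize phone {row['phone']!r}")
    return {**row, "phone": f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"}


def cleanse_batch(rows: list[dict], rule: Callable[[dict], dict]) -> tuple[list[dict], list[dict]]:
    """Cleanse what we can; send the rest to an error table for hand processing."""
    staging, error_table = [], []
    for row in rows:
        try:
            staging.append(rule(row))
        except (KeyError, ValueError) as exc:
            error_table.append({"row": row, "error": repr(exc)})
    return staging, error_table


rows = [{"id": 1, "phone": "617.555.0100"}, {"id": 2, "phone": "555-0100"}, {"id": 3}]
staging, error_table = cleanse_batch(rows, normalize_phone_row)
print(len(staging), "cleansed;", len(error_table), "sent to the error table")
```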

Matching Data to Eliminate Duplicates

Matching master data records to eliminate duplicates is both the hardest and most important step in creating master data. False matches can actually lose data (two Acme Corporations become one, for example) and missed matches reduce the value of maintaining a common list.

As a result, the matching accuracy of MDM tools is one of the most important purchase criteria.

Some matches are pretty trivial to do. If you have Social Security Numbers for all your customers or if all your products use a common numbering scheme, a database JOIN will find most of the matches. This hardly ever happens in the real world, however, so matching algorithms are normally very complex and sophisticated. Customers can be matched on name, maiden name, nickname, address, phone number, credit card number and so on, while products are matched on name, description, part number, specifications and price.
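When every source carries a shared, trustworthy key, matching really does reduce to a join. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table names, the columns and the identifier values are hypothetical, and the IDs are dummy values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE crm_customers (crm_id TEXT, name TEXT, ssn TEXT);
    CREATE TABLE erp_customers (erp_id TEXT, name TEXT, ssn TEXT);
    INSERT INTO crm_customers VALUES ('C1', 'John Smith', '000-00-0001'),
                                     ('C2', 'Jane Doe',   '000-00-0002');
    INSERT INTO erp_customers VALUES ('E7', 'J. Smith',   '000-00-0001'),
                                     ('E9', 'Bob Jones',  NULL);
""")

# Records that share the key are matches, even when the names differ
matches = conn.execute("""
    SELECT c.crm_id, e.erp_id
    FROM crm_customers AS c
    JOIN erp_customers AS e ON c.ssn = e.ssn
""").fetchall()
print(matches)  # [('C1', 'E7')]
```

Records with a missing key, such as `E9`, fall out of the join and still need fuzzier matching.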

PRO TIP: The more attribute matches and the closer the match, the higher degree of confidence the MDM software has in the match.

This confidence factor is computed for each match, and if it surpasses a threshold, the records match. The threshold is normally adjusted depending on the consequences of a false match.

For example:

You might specify that if the confidence level is over 95 percent, the records are merged automatically, and if the confidence level is between 80 percent and 95 percent, a data steward should approve the match before they are merged.
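A weighted confidence score that uses the two thresholds from this example can be sketched as follows. The attribute weights are hypothetical. The standard library's `difflib.SequenceMatcher` stands in for the more sophisticated comparison functions (phonetic, nickname-aware and address-aware matching) that MDM tools typically use.

```python
from difflib import SequenceMatcher

# Hypothetical attribute weights; they sum to 1.0, so scores fall between 0 and 1
WEIGHTS = {"name": 0.4, "address": 0.3, "phone": 0.3}
AUTO_MERGE = 0.95      # above this, merge automatically
STEWARD_REVIEW = 0.80  # from here up to AUTO_MERGE, a data steward approves


def similarity(a: str, b: str) -> float:
    """Return a 0.0-1.0 string similarity; a missing value never counts as a match."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_confidence(r1: dict, r2: dict) -> float:
    # More matching attributes, and closer matches, raise the confidence
    return sum(w * similarity(r1.get(attr, ""), r2.get(attr, "")) for attr, w in WEIGHTS.items())


def route(r1: dict, r2: dict) -> str:
    score = match_confidence(r1, r2)
    if score > AUTO_MERGE:
        return "merge"
    if score >= STEWARD_REVIEW:
        return "steward_review"
    return "no_match"


a = {"name": "John Smith", "address": "12 Main St", "phone": "6175550100"}
b = {"name": "John Smith", "address": "12 Main St", "phone": "2125559999"}
c = {"name": "Acme Corporation", "address": "99 Elm Ave", "phone": "2125559999"}
print(route(a, a), route(a, b), route(a, c))
```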

How Should You Merge Your Data?

Most merge tools merge one set of input into the master list, so the best procedure is to start the list with the data in which you have the most confidence and then merge the other sources in one at a time. If you have a lot of data and a lot of problems with it, this process can take a long time.
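Below is a minimal sketch of merging sources one at a time, most trusted first. It assumes records have already been matched to a shared key. It also uses a simple, hypothetical survivorship rule: a value already in the master wins, and later sources only fill in missing attributes.

```python
def merge_sources(sources: list[tuple[str, dict[str, dict]]]) -> dict[str, dict]:
    """Merge sources into a master list one at a time, most trusted first.

    Each source is (source_name, {match_key: record}). A non-empty value already
    in the master is kept; later sources only fill in missing attributes.
    """
    master: dict[str, dict] = {}
    for source_name, records in sources:
        for key, record in records.items():
            target = master.setdefault(key, {"_sources": []})
            target["_sources"].append(source_name)  # track which sources contributed
            for attr, value in record.items():
                if value not in (None, "") and target.get(attr) in (None, ""):
                    target[attr] = value
    return master


crm = {"K1": {"name": "John Smith", "phone": ""}}
erp = {"K1": {"name": "J. Smith", "phone": "617-555-0100"}, "K2": {"name": "Acme Corporation"}}
master = merge_sources([("crm", crm), ("erp", erp)])  # CRM is the most trusted source
print(master["K1"])
```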

PRO TIP: You might want to start with the data from which you expect to get the most benefit once it’s consolidated and then run a pilot project with that data to ensure your processes work and that you are seeing the business benefits you expect.

From there, you can start adding other sources as time and resources permit. This approach means your project will take longer and possibly cost more, but the risk is lower. This approach also lets you start with a few organizations and add more as the project demonstrates success instead of trying to get everybody on board from the start.

Another factor to consider when merging your source data into the master list is privacy. When customers become part of the customer master, their information might be visible to any of the applications that have access to the customer master. If the customer data was obtained under a privacy policy that limited its use to a particular application, you might not be able to merge it into the customer master.

Because of implications around privacy, you might want to add a lawyer to your MDM planning team.

At this point, if your goal was to produce a list of master data, you are done. Print it out or burn it to an external hard drive and move on. If you want your master data to stay current as data gets added and changed, you will have to develop infrastructure and processes to manage the master data over time.

The next section provides some options on how to do just that.

How Do You Maintain a Master List?

There are many different tools and techniques for managing and using master data. We will cover three of the more common scenarios here:

Single copy: In this approach, there is only one master copy of the master data. All additions and changes are made directly to the master data. All applications that use master data are rewritten to use the new data instead of their current data. This approach guarantees consistency of the master data, but in most cases it’s not practical, because modifying all your applications to use a new data source with a different schema and different data is very expensive at best. If some of your applications are purchased, it might even be impossible.
Multiple copies, single maintenance: In this approach, master data is added or changed in the single master copy of the data, but changes are sent out to the source systems in which copies are stored locally. Each application can update the parts of the data that are not part of the master data, but they cannot change or add master data.
For example:
The inventory system might be able to change quantities and locations of parts, but new parts cannot be added and the attributes that are included in the product master cannot be changed. This reduces the number of application changes that will be required, but at a minimum the applications will have to disable functions that add or update master data. Users will have to learn new applications to add or modify master data, and some of the things they normally do will no longer work.
Continuous merge: In this approach, applications are allowed to change their copy of the master data. Changes made to the source data are sent to the master, where they are merged into the master list. The changes to the master are then sent to the source systems and applied to the local copies. This approach requires few changes to the source systems. If necessary, the change propagation can be handled in the database so no application code is changed. On the surface, this seems like the ideal solution because application changes are minimized and no retraining is required. Everybody keeps doing what they are doing, but with higher quality, more complete data. However, this approach does have several issues:
- Update conflicts are possible and difficult to reconcile: What happens if two of the source systems change a customer’s address to different values? There’s no way for the MDM software to decide which one to keep, so intervention by the data steward is required. In the meantime, the customer has two different addresses. This must be addressed by creating data governance rules and standard operating procedures to ensure that update conflicts are reduced or eliminated.
- Additions must be remerged: When a customer is added, there is a chance that another system has already added the customer. To deal with this situation, all data additions must go through the matching process again to prevent new duplicates in the master.
- Maintaining consistent values is more difficult: If the weight of a product is converted from pounds to kilograms and then back to pounds, rounding can change the original weight. This can be disconcerting to a user who enters a value and then sees it change a few seconds later.
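The rounding problem in the last bullet is easy to reproduce. This sketch makes a hypothetical assumption: the master stores weights in kilograms, each source system displays pounds, and both are rounded to one decimal place.

```python
KG_PER_LB = 0.45359237  # exact by definition


def to_master_kg(pounds: float) -> float:
    return round(pounds * KG_PER_LB, 1)  # master stores kg to one decimal place


def to_source_lb(kg: float) -> float:
    return round(kg / KG_PER_LB, 1)      # source displays lb to one decimal place


entered = 10.0
round_trip = to_source_lb(to_master_kg(entered))
print(entered, round_trip)  # 10.0 9.9
```

A 10.0 lb entry comes back as 9.9 lb after one round trip. Storing the value at full precision, or in the unit in which it was entered, is one way to reduce this drift.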

In general, all these things can be planned for and dealt with, making the user’s life a little easier at the expense of a more complicated infrastructure to maintain and more work for the data stewards. This might be an acceptable trade-off, but it’s one that should be made consciously.
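One way to plan for update conflicts is to detect them and hold them for a data steward instead of silently picking a winner. The sketch below assumes that changes arrive in batches. The `Change` type and the `apply_changes` function are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    source: str
    record_id: str
    attribute: str
    value: str


def apply_changes(master: dict[str, dict], batch: list[Change]) -> list[tuple[str, str, set[str]]]:
    """Apply a batch of source changes to the master.

    If two or more sources set the same attribute of the same record to
    different values, the master is left unchanged for that attribute and
    the conflict is returned for data steward review.
    """
    proposed: dict[tuple[str, str], set[str]] = defaultdict(set)
    for change in batch:
        proposed[(change.record_id, change.attribute)].add(change.value)

    conflicts = []
    for (record_id, attribute), values in proposed.items():
        if len(values) == 1:
            master.setdefault(record_id, {})[attribute] = next(iter(values))
        else:
            conflicts.append((record_id, attribute, values))
    return conflicts


master = {"C1": {"address": "12 Main St"}}
batch = [
    Change("crm", "C1", "address", "1 Oak Rd"),
    Change("billing", "C1", "address", "9 Pine Ln"),
    Change("crm", "C1", "phone", "617-555-0100"),
]
conflicts = apply_changes(master, batch)
print(conflicts)     # one address conflict for the steward
print(master["C1"])  # address unchanged, phone applied
```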

A Few Thoughts On Versioning and Auditing

No matter how you manage your master data, it’s important to be able to understand how the data got to the current state.

For example:

If a customer record was consolidated from two different merged records, you might need to know what the original records looked like in case a data steward determines that the records were merged by mistake and should really be two different customers. The version management should include a simple interface for displaying versions and reverting all or part of a change to a previous version.
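One way to make an unmerge possible is to keep the source records as lineage on the merged record. This is a minimal sketch, not a full version store. `MasterRecord`, `merge` and `unmerge` are hypothetical names, and the rule that the first record wins on conflicting attributes is an assumption.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class MasterRecord:
    master_id: str
    attributes: dict
    # Source records exactly as they looked before the merge
    lineage: list[dict] = field(default_factory=list)


def merge(master_id: str, first: dict, second: dict) -> MasterRecord:
    """Merge two source records; `first` wins on conflicting attributes
    (a hypothetical survivorship rule)."""
    attributes = {**second, **first}
    return MasterRecord(master_id, attributes, [copy.deepcopy(first), copy.deepcopy(second)])


def unmerge(record: MasterRecord) -> list[dict]:
    """Restore the original source records when a steward rejects a merge."""
    return [copy.deepcopy(r) for r in record.lineage]


boston_1 = {"source_id": "CRM-17", "name": "John Smith", "city": "Boston", "birth_year": 1970}
boston_2 = {"source_id": "ERP-42", "name": "John Smith", "city": "Boston", "birth_year": 1985}

merged = merge("M1", boston_1, boston_2)
print(merged.attributes["birth_year"])   # 1970: the second John Smith's details are hidden
restored = unmerge(merged)
print(restored == [boston_1, boston_2])  # True
```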

The normal branching of versions and grouping of changes that source control systems use can also be very useful for maintaining different derivation changes and reverting groups of changes to a previous branch. Data stewardship and compliance requirements will often include a way to determine who made each change and when it was made.

To support these requirements, MDM software should include a facility for auditing changes to the master data. In addition to keeping an audit log, the MDM software should include a simple way to find the particular change you are looking for. MDM software can audit thousands of changes a day, so search and reporting facilities for the audit log are important.
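Here is a minimal sketch of an audit log with search. It assumes that each change is recorded at the attribute level, with who made it and when. The `AuditEntry` fields and the `search` filters are hypothetical. A production system would typically keep the log in a database with indexes on these columns.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Optional


@dataclass(frozen=True)
class AuditEntry:
    timestamp: datetime
    user: str
    record_id: str
    attribute: str
    old_value: Any
    new_value: Any


class AuditLog:
    """Hypothetical append-only audit log with simple search."""

    def __init__(self) -> None:
        self._entries: list[AuditEntry] = []

    def record(self, entry: AuditEntry) -> None:
        self._entries.append(entry)

    def search(self, record_id: Optional[str] = None, user: Optional[str] = None,
               since: Optional[datetime] = None, until: Optional[datetime] = None) -> list[AuditEntry]:
        """Return entries matching every filter given; None means 'any'."""
        return [
            e for e in self._entries
            if (record_id is None or e.record_id == record_id)
            and (user is None or e.user == user)
            and (since is None or e.timestamp >= since)
            and (until is None or e.timestamp < until)
        ]


log = AuditLog()
log.record(AuditEntry(datetime(2024, 1, 5, 9, 0), "steward_a", "C1", "address", "12 Main St", "1 Oak Rd"))
log.record(AuditEntry(datetime(2024, 1, 6, 14, 30), "etl_job", "C2", "phone", None, "617-555-0100"))
for entry in log.search(record_id="C1"):
    print(entry.timestamp, entry.user, entry.old_value, "->", entry.new_value)
```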

A Few Thoughts On Hierarchy Management

In addition to the master data itself, the MDM software must maintain data hierarchies, such as bills of materials for products, sales territory structures, organization structures for customers and so forth. It’s important for the MDM software to capture these hierarchies, but it’s also useful for MDM software to be able to modify the hierarchies independently of the underlying systems.

For example:

When an employee moves to a different cost center, there might be impacts to the Travel and Expense system, payroll, time reporting, reporting structures and performance management. If the MDM software manages hierarchies, a change to the hierarchy in a single place can propagate the change to all the underlying systems.
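This idea of one change feeding many systems can be sketched as a publish/subscribe arrangement. `CostCenterHierarchy` and the subscriber callbacks are hypothetical. In practice, the notifications would go to integration endpoints rather than to in-process functions.

```python
from typing import Callable

Handler = Callable[[str, str, str], None]  # (employee_id, old_cost_center, new_cost_center)


class CostCenterHierarchy:
    """Hypothetical MDM-managed hierarchy that pushes changes to downstream systems."""

    def __init__(self) -> None:
        self.assignments: dict[str, str] = {}  # employee_id -> cost center
        self.subscribers: list[Handler] = []

    def subscribe(self, handler: Handler) -> None:
        self.subscribers.append(handler)

    def move_employee(self, employee_id: str, new_cost_center: str) -> None:
        old = self.assignments.get(employee_id, "")
        self.assignments[employee_id] = new_cost_center
        # One change in the MDM software, propagated to every subscribed system
        for notify in self.subscribers:
            notify(employee_id, old, new_cost_center)


received = []
hierarchy = CostCenterHierarchy()
for system in ("travel_and_expense", "payroll", "time_reporting"):
    # The default argument binds the current system name to each callback
    hierarchy.subscribe(lambda emp, old, new, s=system: received.append((s, emp, old, new)))

hierarchy.assignments["E42"] = "CC-100"
hierarchy.move_employee("E42", "CC-200")
for update in received:
    print(update)
```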

There might also be reasons to maintain hierarchies in the MDM software that do not exist in the source systems.

For example:

Revenue and expenses might need to be rolled up into territory or organizational structures that do not exist in any single source system. Planning and forecasting might also require temporary hierarchies to calculate “what if” numbers for proposed organizational changes. Historical hierarchies are also required in many cases to roll up financial information into structures that existed in the past, but not in the current structure.
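A roll-up over a hierarchy that exists only in the MDM software can be sketched as a walk from each leaf up to the root. The territory names and amounts are hypothetical, and the code assumes the hierarchy has no cycles.

```python
from collections import defaultdict

# Hypothetical territory hierarchy maintained only in the MDM software: child -> parent
TERRITORY_PARENT = {
    "Boston": "Northeast",
    "New York": "Northeast",
    "Atlanta": "Southeast",
    "Northeast": "US",
    "Southeast": "US",
}


def roll_up(revenue_by_leaf: dict[str, float], parent: dict[str, str]) -> dict[str, float]:
    """Add each leaf's revenue to the leaf and to every ancestor (assumes no cycles)."""
    totals: dict[str, float] = defaultdict(float)
    for node, amount in revenue_by_leaf.items():
        while node is not None:
            totals[node] += amount
            node = parent.get(node)  # None once we pass the root
    return dict(totals)


revenue = {"Boston": 120.0, "New York": 200.0, "Atlanta": 80.0}
totals = roll_up(revenue, TERRITORY_PARENT)
print(totals["Northeast"], totals["US"])  # 320.0 400.0
```

Passing a different parent mapping, such as a proposed reorganization or a past structure, produces the “what if” and historical roll-ups without touching the source systems.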

For these reasons, a powerful, flexible hierarchy management feature is an important part of MDM software.

Who Should Be Involved in Your MDM Program?

Now that you understand the what and the why, let’s talk about the who. There are several different ways to think about who to involve in an MDM program. First, let’s take a high-level look at three core roles:

Data Governance: Individuals who drive the definition, requirements and solution. These users help administrators know what to create and data stewards know what to manage and how to manage it. Data governance users dictate to data stewards how data should be managed, including the processes for doing so, and then hold the data stewards accountable for following those requirements. Data governance users also dictate to administrators what to create during the implementation of the MDM solution, especially from a data matching and quality perspective. Data governance users also need to maintain a feedback loop from the MDM software to ensure everything is working as expected. This feedback covers the measurement perspective of the MDM program and might include information like:
- How long does it take to onboard a new customer?
- Is that process getting faster or slower?
- How is the company doing compared to its SLA?
- If there are any areas that are slipping, why is that happening?
- How well is the data matching working?
- How many business rules are failing from a data quality perspective?
Administrators: Individuals in IT who are responsible for setting up and configuring the solution.
Data Stewards: Boots on the ground individuals responsible for fixing, cleaning and managing the data directly within the solution. Ideally, data stewards come from departments across the business, such as finance and marketing. Typically, the activities that data stewards take on within the MDM program are defined by data governance users.

Other MDM roles vary by organization and project type, and can include:

Role	Skills/Responsibilities	Level Of Involvement
Program Manager	Owns the data management strategy and platform.	Part time
Project Manager	Develops and manages project plans, ensures timely quality deliverables and reports project progress. Responsible for risk and issue management and escalation.	None
System Admin and DBA	Sys Admin: Systems administrators tend to manage things like domains, storage, virtualization, group policies, DNS and some networking; their skills tend to be more general. DBA: A DBA combines some system administration skills and some development skills with specialized knowledge of the database platforms used.	Occasional support
Developer	Developers implement custom SDK and/or Workflow solutions to extend MDM platforms. This may include web-service-based integrations, bespoke user interfaces, or custom applications or processes that leverage APIs or MDM data. A developer must have a working knowledge of C#.NET, Windows Communication Foundation and ASP.NET.	Occasional support
ETL Developer	Batch data loading from source systems (ETL integration) is performed by these team members, with Profisee providing training and guidance on how to execute the implementation within the scope.	Occasional support
Business Analyst/SME	Resources who are familiar with the data and the business processes related to a MDM solution. Provides deep knowledge of application functionality and requirements and participates in workshops, planning and execution of the review and testing activities.	Occasional support
Data Architect/Data Modeler	Oversees enterprise conceptual, logical, and physical data models that conform to an organization’s standards and conventions; Provides leadership and guidance with enterprise data strategies, especially as they relate to MDM; Assists with organizational governance practices and standards, and acts as a liaison between business and IT to clarify data requirements.	Occasional support
End Users/Data Stewards	Individuals who interact with the master data and/or business processes. These are the business users of the MDM system and act as stewards/maintainers of the data.	Up to full time
Governance Council	The Master Data Governance Council (MDGC) is the decision-making and policy-making authority for matters related to data. The MDGC oversees the implementation of data standards and quality assurance to ensure that the MDM team and Data Stewards are developing, maintaining, and providing acceptable system data for the use of others.	Part time (regular meetings)

Master Data Management Stakeholders:

Aside from the roles that execute and manage an MDM strategy, one of the keys to a successful MDM project is active commitment by the key stakeholders. The stakeholders for a typical MDM engagement include those representing both the business and IT. Active stakeholders usually include, but are not limited to, the following types of roles:

Business or IT Executive Sponsor
IT Project Lead
Subject-matter experts from the impacted Line-of-business
Data Stewards
IT delivery team

As MDM stakeholders are defined throughout an organization, it is critical to secure their engagement and their commitment to the organization’s MDM journey. Through multiple implementations, Profisee has identified several “Health” indicators to help determine the MDM stakeholder impact:

Healthy Signs

Executive incentives tied to project results
Investments in change management and training
Subject matter experts dedicated full-time
The right sponsor is appropriately engaged and funded
Regular Steering Committee meetings are held, and decisions and actions are taken promptly and effectively
All appropriate stakeholder groups are effectively represented and engaged

Unhealthy Signs

No executive sponsor visible
Resistance to new ideas
No “experts” available

Master Data Management Steering Committee

It’s recommended that management-level representation from the MDM stakeholders form a Steering Committee to facilitate cross-functional decision-making. Here are a few characteristics of an effective Steering Committee:

Be sized appropriately – Big enough to represent the priority stakeholders, but small enough to quickly analyze key information and make decisions.
Be focused on fast decision-making
Be a vehicle for removing organizational barriers, not simply a regular meeting for listening to reports from Project Team members
Not be a substitute for hands-on Sponsorship

Once the stakeholders are identified, the MDM Project Charter should include formation of a Steering Committee. Based on running hundreds of MDM projects, Profisee recommends the following roles participate in the Steering Committee. Note that there may be more than one team member per role, or some roles may not be applicable to a company’s organizational structure.

Role	Description
Executive Sponsor(s)	Primary budget owner for MDM Initiative. This role typically comes from the line of business expected to benefit from the MDM solution.
Data Governance Lead	MDM is a component of a larger Data Governance strategy. If the organization has a Data Governance team in place, it should be an active participant in an MDM Steering Committee.
Data Steward or SME	The team responsible for day-to-day data management, including making decisions about how data is presented in operational or analytical systems, is typically part of the Steering Committee.
IT Sponsor(s)	MDM Sponsorship sometimes resides within the IT organization as MDM can be considered an IT-driven effort. Organizations also often have formal or informal Business and IT partnerships in which the IT Sponsor supports the business-led initiatives. In either case, the IT sponsor plays a critical role in the MDM project’s success and should be part of the Steering Committee.
Organization Standards Bodies	In cases where organizations have cross-functional teams driving adoption of common standards across the enterprise, this role might be a good candidate for the MDM Steering Committee. Examples of such standards may include IT Architecture, IT Integration, Meta Data Management and more.
Data Domain Owner	When companies are organized around the key components of their business cycle, such as Customers, Products, or Suppliers, there may be Data Domain Owners who take part in Steering Committee decision-making.
MDM Champion	In some instances, an MDM champion oversees all business and IT aspects of an MDM implementation. In such cases, this role is part of the MDM Steering Committee.
MDM Partner	To drive optimal value from their MDM investment, companies are encouraged to include their MDM implementation and/or software partner in the Steering Committee. The MDM Partner offers best practice insight to support Steering Committee decision-making.

Conclusion

While it’s easy to think of master data management as a technological issue, a purely technological solution without corresponding changes to business processes and controls will likely fail to produce satisfactory results.

This article has covered the reasons for adopting master data management, the process of developing a solution, several options for the technological implementation of the solution and who should be involved along the way to make sure the program runs smoothly.

This article is an update of the original article titled “The What, Why, and How of Master Data Management” by Kirk Haselden and Roger Wolter, originally published in 2006. Special thanks to Roger and Kirk for their contributions and for allowing Profisee to republish their article, with updates for today.


Master Data Management Frequently Asked Questions

How can master data management drive operational efficiency with simplified workflows?

Master data management drives operational efficiency by simplifying workflows. By centralizing and streamlining data processes, organizations can eliminate redundancies, reduce manual tasks, and automate data-related workflows. This leads to improved efficiency, reduced errors, and increased productivity across the organization.

How can master data management increase agility with 360-degree views of data across the enterprise?

Master data management increases agility by providing 360-degree views of data across the enterprise. This means that organizations have a comprehensive and unified view of their data from various sources, enabling them to make faster and more informed decisions, respond quickly to changes, and adapt to evolving business needs.

How can master data management boost revenue and profitability with more accurate AI models?

Master data management can boost revenue and profitability by providing more accurate AI models. By ensuring that data is accurate, reliable, and up-to-date, organizations can train AI models with high-quality data, leading to more accurate predictions and insights that can drive revenue growth and improve profitability.

How can master data management enhance workforce productivity?

Master data management enhances workforce productivity by enabling self-service data access. This means that employees can easily access the data they need without relying on IT or data specialists, allowing them to work more efficiently and make informed decisions.

What are the business-critical benefits of master data management?

Master data management provides several business-critical benefits, including enhancing workforce productivity through self-service data access, boosting revenue and profitability with more accurate AI models, increasing agility with 360-degree views of data across the enterprise, driving operational efficiency with simplified workflows, and increasing access to data on any platform, any cloud, and for any type of user in multicloud and multi-hybrid environments.

Benjamin Bourgeois

Ben Bourgeois is the Head of Product and Customer Marketing at Profisee, where he leads the strategy for market positioning, messaging and go-to-market execution. He oversees a team of senior product marketing leaders responsible for competitive intelligence, analyst relations, sales enablement and product launches. He has experience managing teams across the B2B SaaS, healthcare, global energy and manufacturing industries.
