Integrating Master Data Management into the Data Warehouse

Organizations that want to take advantage of a higher performing, more agile, and lower-cost data warehouse architecture should implement master data management (MDM) to improve data quality. Master data are those data that are the most critical to your organization’s success (“building blocks” of business transactions), and which are usually created, maintained, and shared among many applications. In commercial enterprises, these data are data about customers, products, locations, accounts, portfolios, and similar data.

Marty Moseley

Bookmark and Share

Nearly every data warehouse ecosystem has attempted to manage master data within its data warehouse architecture, but it has typically focused on mastering data after transactions occur. This approach does little to improve data quality because data are corrected after the fact. The best way to improve data quality is to move the process upstream of the data warehouse before transactions are executed.

One goal of MDM is to prevent bad data from entering the ecosystem by enforcing data quality at the edges of the IT ecosystem. For example, when creating a sales order, the transactional system should guarantee that customer data on the order is not duplicated and that address data are correct and current .The same thing could happen for the product data, prices, discounts, and payments on the order. The system should also enforce business rules and policies and guarantee the transaction is correct.

Master data provides an enterprise-wide purview of data, whereas transactional systems have insight only into the data they contain. Since each source does not contain the sum total of master or reference data required to guarantee enterprise-wide data quality, they only provide a partial solution. This is why MDM can play a critical role.

If MDM is implemented using a real-time architecture, based on the principles of service-oriented architecture (SOA), a service that enforces the business rules and policies governing each kind of master data can be made available to all transactional systems.

For example, a service that manages the uniqueness and quality of customer data can be created to ensure that a customer is created only once and made available to any system of record whenever a transaction occurs. This satisfies the need to enforce data quality at the edges of the ecosystem. The same applies for any master data, such as data about people and organizations of interest; products and related catalog and pricing data; locations of customers, products, and company assets; and calendars of events.

These services, which can be called master data services (MDS), guarantee the integrity of the most critical data within every transactional and data capture system in the organization. Each MDS can be an autonomous, decoupled service that is individually scalable and managed to ensure the quality of the domain of master data it manages.

One of the prime services provided by an enterprise service bus should be an MDS that serves as the real-time single version of the truth for a specific kind of master data. This type of service contributes significant business quality improvements because it provides data to the data warehouse and to all IT systems that consume or refer to master data.

Properly implemented, MDS dramatically improves data quality, which reduces the size, scope, and complexity of data warehouse architectures. If each transaction is complete and accurate before it enters the data warehouse, the savings in terms of the work required to integrate it is greatly reduced. The result is a decrease in the cost of managing data quality and a significant reduction in the brittleness of the data warehouse data chain.

By rethinking the data warehouse data chain and getting transactional details from the message bus, (also known as an “Enterprise Service Bus,” these transport data and transactions between systems via “messages” that flow over channels called “busses”) the storage, replication, and movement of data will be streamlined. The ultimate result is that scores of problem areas and risks will be eliminated, costs and complexities will be reduced, and there will be fewer errors. Here are some recommendations to make it happen:

  1. Organizations should not try to accomplish all the above in one massive project. Chances are high that, in attempting to implement most of these ideas at once, the MDM project may become too encumbered to produce results in a timely fashion. Practical experience has shown an MDS can be implemented and in production in less than nine months.
  2. Migrate in an evolutionary fashion, not a revolutionary one. Start by replacing one part of a data warehousing architecture with a more SOA-compliant design. This will allow the MDM system to produce results without forcing change on the data warehouse.
  3. Master only one domain at a time, such as customer data, product data, asset data, account data, and portfolio data. Don’t try to master everything or generally do too much at once. It is ultimately better to show a win in a production system in less time and then be asked to repeat the success in other master data domains than to overload a project and never produce results.
  4. Each MDM solution should be a separate environment, not one giant data management engine. Putting all master data into one physical solution is a continuation of failed enterprise resource planning thinking – all data in one giant box. When the results from multiple MDS projects need to be combined, this can be accomplished through orchestration (the coordination of data and transformations from multiple sources, following a prescribed process model – see the World Wide Web Consortium at www.w3.org for standards on orchestration and other WWW constructs).
  5. Let the data warehouse be the recipient of mastered data, not the master controller of it. The data warehouse is meant to support analytical business processes, not transactional requirements. This follows the principles of “separation of concerns” and “specialization.” – fairly standard principles used in computer science (see Wikipedia for good treatment of these subjects).
  6. Implement common models in the message and document schemas, not in the Extract, Transform & Load (ETL) or only in the data warehouse. ETL is commonly used as a modifier for technology related to the process of moving data from one place to another. For example, when one company using an IBM database acquires a company using an Oracle database, traditionally the IT department would have to use ETL tools to move data stored in Oracle tables to the IBM database tables. One of the biggest bangs for the buck in data models is to implement them in the SOA messaging layer. This provides the opportunity to design a standard, canonical model that defies any particular vendor’s data models and supports the organization‘s business semantics.

Adopting an SOA-compliant strategy in the data warehousing ecosystem will decouple systems in the data chain, hide keys and schemas of source systems, and minimize the ripple effects of changes. It will also facilitate creation of canonical data objects that represent the objects of interest to the business-in-business vernacular, not the cryptic language used by vendors to describe their physical schemas.

Additionally, adopting MDM will enable reuse of common data quality, reference, and master data services, which will help enforce data quality upstream of the data warehouse and ensure data are of high quality when they are captured, rather than trying to repair broken data after the fact.

Marty Moseley can be contacted at mmoseley@initiate.com.