Intelligent Enterprise

Better Insight for Business Decisions

Intelligent Enterprise - Better Insight for Business Decisions
search Intelligent Enterprise
Advanced Search
RSS
Webcasts
Digital Library
Subscribe
Home




May 24, 2001



Joint Effort

Administer the distributed data warehouse correctly or lose it!

By Ralph Kimball

As you may have noticed, my column's name has changed to "Data Warehouse Designer." This new name is more descriptive of the broader scope and more practical flavor of the subjects I am now covering here.

I will continue to invite a number of experienced data warehouse practitioners to alternate with me in writing these columns. I hope you will enjoy some of their fresh perspectives.

While we are on the subject of new things, this column's topic is about the most important new development in data warehousing: the distributed data warehouse (DDW). Inspired by the Web and the ability to share data through XML, a DDW arises whenever several organizations agree to share data.

A DDW encourages "drilling across" many separated databases to produce coordinated overall results. The most visible and urgent DDWs are supply chains in which confederations of manufacturers, distributors, and retailers share data in order to implement just-in-time "efficient consumer response." But there are many other examples of DDWs in other industries.

The advantages of a distributed architecture are enormous. In situations like supply chains, we simply can't build a centralized data warehouse. The separate organizations have separate technologies, IT staffs, data elements, and administrative rhythms. What ties these diverse entities together is a strong desire to share data, but not share much else!

With the benefit of a few years' experience building DDW architectures, we are beginning to understand the implications of administering and supporting these systems. We need to do our homework in understanding DDW administration, because if we do it incorrectly we will lose just about all the advantages we hope for.

Before outlining what it takes to administer a DDW, it's important to define carefully what it is. A DDW is a collection of organizations that:

  • May be physically separated
  • May use database technologies from diverse vendors, including both ROLAP and OLAP approaches
  • Agree to use centrally administered, conformed dimensions
  • Be willing to use centrally defined, conformed facts
  • Make an agreement to publish their respective fact tables to the rest of the DDW, subject to role-based security control
  • Agree to administer their individual databases according to the principles described in this column.

It should be evident that conformed dimensions are copied and distributed widely around the DDW, and that the conformed fact tables exist as single copies within their respective DDW member organizations.

Centralized Agreement

A DDW doesn't come about because a group of organizations just wants to share data. Significant planning, compromises, and technical development must take place before a DDW can go live. All the organizations participating in the DDW must embrace a centralized agreement. The members of the DDW agree to the following:

To define and use conformed dimensions. For instance, all members will use the Item dimension in a supply chain. All must be willing to live with the attribute names and attribute values in this centrally defined dimension. All the members of the DDW agree to rely on a single item czar who will administer and publish the Item dimension. This person takes the title "dimension authority."

Each conformed dimension in the overall DDW needs a dimension authority. Although not mandatory, it probably would work best to have all the dimension authorities work together in a single office.

To define and use conformed facts. For instance, all members must agree on the definition of the "revenue" measure in a DDW so that separate revenue measurements coming from separate fact tables can be combined mathematically. Unlike with conformed dimensions, the individual member organizations create these facts locally when they publish their own fact tables.

To use the surrogate keys contained in the conformed dimensions. The dimension authority creates a well-administered set of simple keys for each record in the dimension. These keys will usually be integers devoid of any structure, meaning, or order. The Calendar Date dimension is the only exception to the lack of ordering, because we usually use the Calendar Date surrogate key as the basis for physically partitioning the downstream fact tables. The surrogate keys are the primary keys of the dimensions, and all fact tables using conformed dimensions must use the surrogate key values as foreign keys.

To synchronize the addition of new fact records with the correct current release of a dimension. Each member organization makes sure that new fact records they add to their respective fact tables are processed using the correct surrogate keys defined in the most recent release of each dimension. If the member organization has implemented a surrogate key pipeline with lookup tables for finding the contemporary surrogate keys, these lookup tables must be updated whenever a new release of a dimension is received. See the article "Pipelining Your Surrogates" from the June 1998 issue of DBMS magazine (available at www.ralphkimball.com or www.dbmsmag.com).







IE Weekly Newsletter
Subscribe to the newsletter
    Email Address







InformationWeek Business Technology Network
InformationWeekInformationWeek 500InformationWeek 500 ConferenceInformationWeek AnalyticsInformationWeek CIO
InformationWeek EventsInformationWeek ReportsInformationWeek MagazinebMightyByte and SwitchDark Reading
Digital LibraryIntelligent EnterpriseInternet EvolutionNetwork ComputingNo Jitter
space
Techweb Events Network
InteropVoiceConWeb 2.0 ExpoWeb 2.0 SummitEnterprise 2.0 ConferenceMobile Business ExpoSoftware ConferenceCSI - Computer Security Institute
Black HatGTECEnergy CampMashup CampStartup Camp
space
Light Reading Communications Network
Light ReadingLight Reading EuropeUnstrungLight Reading's Cable Digital NewsConstantinopleInternet Evolution
Heavy ReadingLight Reading Live!Light Reading InsiderEthernet ExpoOptical ExpoTeleco TVTower Technology Summit
space
Financial Technology Network
Advanced TradingBank Systems & TechnologyInsurance & TechnologyWall Street & TechnologyAccelerating Wall StreetBank Systems & Technology Executive SummitBuyside Trading SummitInsurance & Technology Executive Summit
space
Microsoft Technology Network
MSDN MagazineTechNetThe Architecture Journal
space