Guide to the TechWeb Network

Intelligent Enterprise

Better Insight for Business Decisions

Intelligent Enterprise - Better Insight for Business Decisions
search Intelligent Enterprise
Advanced Search
RSS
Webcasts
Whitepapers
Subscribe
Home


June 26, 2000, Volume 3 - Number 10


Eric Sebo   



In Support of CRM

Do data profiling and mapping boost data warehouse value for customer relationship management?

Field Notes
Problem: CRM initiative required automation of the data analysis process in order to ensure target model integrity.

Environment: Axio resides on a multiprocessor Sun 6500 box running Solaris. Clients gain access via Windows 95 or NT workstations. The business/systems analyst is responsible for measuring data quality in an open systems environment. The more memory, the better; it’s more disk space to hold sample data.

Product used: Evoke Software Corp.’s Axio

Product requirements: Windows-centric and requires knowledge of SQL and RDMS and modeling concepts.

Project duration: This data quality process is ongoing.

Next steps planned: Two similar implementations are underway.

Wish list for product features in future releases or patches: Evoke informed us that it incorporated the majority of our wish list items in the new version of Axio, which we have not previewed.
Those items include:

• Ability to toggle back and forth among multiple screens*

• Ability to click on a domain value and “drill-down” to a sample row containing that value

• More flexible sort: ability to sort on multiple columns

• Redesign of the Dependency Profile screen to reduce clutter*

• Ability to print domain comparison details: nonoverlapping values produced in redundancy profiling,for example.*

*Expect in new release.

Faced with increased competition in the customer relationship management (CRM) space, companies are turning to products and techniques that use data profiling and mapping to automate their data analysis process and ensure the integrity of their production models. Such projects are underway in various industries, including financial services, telecommunications, insurance, and transportation.

Data profiling systems enable the analysis of far more rows of data than with traditional methods, with a higher accuracy rate and fewer resources required, resulting in significant project savings and a very fast return on investment. This new technology will be implemented at an enterprise rather than project level, and becomes part of a larger data quality process supporting CRM initiatives.

I am involved with such a CRM effort at AXA Financial Inc., parent company of The Equitable Life Assurance Society of the United States, AXA Advisors, Donaldson, Lufkin, & Jenrette, DLJdirect, and Alliance Capital Management. Our AXA Client Solutions unit is implementing a CRM strategy to support the company’s continued emphasis on customer service. The CRM project involves the migration of large quantities of data to a second-generation customer-centric data warehouse and business reporting environment.

AXA Financial recognized that in order to build a world-class financial services organization, it needed to accelerate the building of the CRM framework. The company developed an enterprise data warehouse architecture (EDWA) to provide underlying support for strategic business initiatives and the major CRM goals of customer acquisition, expansion, and retention. The EDWA includes information- and technical-architecture components, as well as multiple data store approaches to meet informational and operational data requirements. The EDWA uses new technologies and a multitiered data mart architecture to provide decision-makers with easy access to performance information. Reuse opportunities include requirements, designs, specifications, documentation, and system code. An initial information-architecture component was a new customer database sourced from an existing data warehouse.

The Value of Timely Data

The EDWA opens the door to new data possibilities. The consolidation of data from disparate sources onto a common platform provides a competitive advantage through timely access to a consistent source of information. With the advent of sophisticated knowledge discovery and data mining products, data used in the past to facilitate mainframe processes can now be leveraged for strategic purposes. For example, data used on an annual statement for an insurance policy can now be used as input to product, sales force, and customer profitability.

However, in order to achieve a true understanding of profitability, the integrity of the target model and the quality of the data being measured must be of the highest level. With a solid infrastructure being put into place, management’s attention turned to taking a closer look at the data itself.

Data’s Potential And Personality

AXA Financial knew that its existing data analysis process was extremely labor and time intensive and that even with knowledgeable workers putting in long hours, data issues would surface after the analysis phase. Being open to new technology, AXA Financial looked toward the emerging category of data profiling and mapping strategies to automate the data analysis process.

What is data profiling? One of the dictionary definitions of “profile” is “a concise description of a person’s abilities, personality, or career.” The purpose of data profiling is to uncover the personality and potential of relational data so you can gain a true understanding of its content. In order to describe an attribute’s personality, profiling will provide data characteristics, such as data type, length, uniqueness, values, and frequency of occurrence.

Understanding the personality of the data lets you assess its quality; you can identify missing values and defects, for example. Data’s value lies in its potential for use.

Can the data support the business rules? A profile of the functional dependencies that exist between tables’ columns can answer that question before you build the data model. Do similar-sounding fields from two tables have the same business meaning? Uncovering the overlap of data values across tables via a redundancy profile can pave the way for the integration of data from multiple sources.

In any integration or migration process, you need to know where data fields are coming from and where they are going. A data mapping document tracks the evolution of data fields from source schema, through normalization, and on to the production model. The benefit of data mapping software is that it tracks changes to attributes automatically, thereby eliminating the time-consuming task of manually adding to and maintaining a mapping document. A completed data mapping document can be input for the creation of source-to-target data maps within a transformation tool.

Data Quality Checks in the Architecture

Seeking to add data quality capabilities to the EDWA, AXA Financial turned to Evoke Software Corp. Evoke’s data profiling and mapping solution, Axio, saves significant time and resources in several data quality tasks, as shown in Table 1.

Data Quality Tasks Benefits of Data Profiling
1. Identify risk factors Data quality Issues discovered early
2. Gain access to the data Imports data from multiple platforms
3. Specify the content of the data sample Interactive sampling strategy
4. View the data GUI interface based on windows concepts
5. Understand data structure and content Infers metadata and points out anomalies
6. Compare data fields for similar values Provides overlapping domain values across files
7. Understand data relationships Determines functional dependencies
8. Document and present the results Capture and report on data discovery
Table 1 Profiling benefits to common data quality tasks.


For example, with the ability to define sample parameters through data profiling, you no longer have a reason to compromise on the size or content of the sample. Axio provides several sample load methods including random load, consecutive rows, and every nth row.

Without a profiling solution, the samples taken from large files will represent less than an entire table because the time to review thousands of rows of data is simply not available. With a profiling product, statistical inference takes the place of “best guess.”

Axio lets AXA Financial accelerate project completion time dramatically by automating what was a manual, error-prone process, and by eliminating iterative rework particularly in the development of target data models.

Putting Theory Into Action

AXA Financial recently used Axio to analyze data being moved into its new customer database. Axio reviewed more than 700 attributes on 32 tables within the project timeframe: a volume of 9GB! Data profiling contributed to a quicker deployment time and a greater confidence in data quality.

Output generated from Axio, along with analyst comments and observations, were placed into a spreadsheet format and shared with project team members. Table 2 is a partial example of documentation containing inferred metadata and observations based on a review of a column’s domain values profile. A high-level rating of the files being reviewed can also be produced, as shown in Table 3.

Attribute # Distinct Distinct Ratio Documented Data Type Inferred Data Type Data Profile Observations
ADDRBD 419 0.0001 DATE DATE Null (99%) The <1% of data populating this field should be deleted if field not being used
ADDRID 4,754,480 0.8503 NUMBER (9, 0) NUMBER (9, 0) Contains varying length values: e.g.‘100,030’, ‘10,166,922’ Determine reason for and effect of varying length fields
ENTDATE 1722 0.0003 DATE DATE 31 JUL 95 (74%) Consistent population of values
Table 2 Table quality snapshot.


Source Table Name Number of Fields Excellent/Good Rating Fair/Poor Rating Comments
PRODUCT_ADDR 8 4 4 Clean-up and consistency issues
PRODUCT 8 6 2 Data availability issues
HOUSEHOLD 23 23 0 No significant data issues
PLAN_UNIT 14 10 4 Redundancy issues
Table 3 Sample column profiling results.


A Change in Mindset

Trepidation often persuades people not to look for problems: Don’t look under the hood, you might find something wrong! Being the internal advocate of a data profiling and mapping solution can be challenging at times, because the technology really does “look under the hood” at the data. New data quality issues come to light, and these issues need to be addressed immediately, rather than later. And “immediately” may not be included in the project plan.

Other items to keep in mind when implementing a data profiling and mapping solution are as follows:

• Training. Users will require a three- to four-day training class, which the vendor will most likely provide.

• Project resources. Successful use of the software requires both product knowledge and modeling skills that will necessitate the training and availability of both business and technical resources.

• Baby steps. As with a good video game, it is best to master the easier levels of the product first — such things as discovery of metadata and data defects. Plunging too deep, too fast can be overwhelming.

• Knowledge trap. Knowledge of the data may already exist within the project team, leading its members to believe that they don’t need an automated analysis process. Profiling will always shed new light on data!

In contrast to these concerns, learning and using data profiling techniques will:

• Reduce project time by minimizing “scrap and rework”

• Add to the “learning and growth” perspective of the organization’s balanced scorecard measurement

• Extend the knowledge of a project team through statistical inference capabilities

• Become a building block of an enterprise data quality process that also includes a data correction and standardization product, metadata management, and so on.

At AXA Financial, Evoke’s data profiling and mapping product is already considered a key part of the organization’s technology framework and is being used in two additional strategic initiatives in early 2000. Whereas the initial use of Axio concentrated on understanding the data’s personality (values, anomalies, quality, and so on), current project teams are also exploring the data’s potential. For example, several uses work in conjunction with the extract, transform, and load (ETL) process. AXA Financial uses redundancy profiling to determine possible data gaps in the building of multidimensional “cubes” and dependency profiling to validate business rules discovered during transformation. Also, column profiling is being used to benchmark the output of a file when first moved onto the Oracle platform, so that future comparisons can be made at various staging areas within the data flow.



Eric Sebo (eric.sebo@equitable.com) is a technology consultant in information architecture for AXA Financial Inc. He has more than 10 years of experience in IT and has worked as a business/systems analyst on many of the company’s strategic initiatives.





IE Weekly Newsletter
Subscribe to the newsletter
    Email Address







InformationWeek Business Technology Network
InformationWeekInformationWeek 500InformationWeek 500 ConferenceInformationWeek AnalyticsInformationWeek CIO
InformationWeek EventsInformationWeek ReportsInformationWeek MagazinebMightyByte and SwitchDark Reading
Digital LibraryIntelligent EnterpriseInternet EvolutionNetwork ComputingNo Jitter
space
Techweb Events Network
InteropVoiceConWeb 2.0 ExpoWeb 2.0 SummitEnterprise 2.0 ConferenceMobile Business ExpoSoftware ConferenceCSI - Computer Security Institute
Black HatGTECEnergy CampMashup CampStartup Camp
space
Light Reading Communications Network
Light ReadingLight Reading EuropeUnstrungLight Reading's Cable Digital NewsConstantinopleInternet Evolution
Heavy ReadingLight Reading Live!Light Reading InsiderEthernet ExpoOptical ExpoTeleco TVTower Technology Summit
space
Financial Technology Network
Advanced TradingBank Systems & TechnologyInsurance & TechnologyWall Street & TechnologyAccelerating Wall StreetBank Systems & Technology Executive SummitBuyside Trading SummitInsurance & Technology Executive Summit
space
Microsoft Technology Network
MSDN MagazineTechNetThe Architecture Journal
space