Intelligent Enterprise

Better Insight for Business Decisions




From Text Analytics to Data Warehousing | Intelligent Enterprise Blog
Breakthrough Analysis, by Seth Grimes
Seth Grimes is an analytics strategist with Washington, DC-based Alta Plana Corporation. He consults on data management and analysis systems.

From Text Analytics to Data Warehousing

Posted by Seth Grimes
Sunday, May 18, 2008
11:08 AM

IBM recently posted a quite nice page on extracting business value from "unstructured" data. The page describes use of IBM's own products and formats to be sure, but it is potentially helpful for anyone who wishes to learn about information extraction from textual sources for data warehousing.

IBM's page starts with a brief text-analytics overview. It then dives into implementation with the OmniFind Analytics Edition for DB2 and its pureXML capabilities. It describes a process flow that includes XML tagging of document features and the alternatives of mapping the XML schema to relational database structures or using the XML structures directly for analyses. This text-analytics workflow, and the choices involved in dealing with text-sourced information, are not specific to IBM's tools, however. So while IBM provides diagrams, code listings, and an analysis of the alternative approaches that relate to its own products, the lessons apply much more generally.
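To make those two alternatives concrete, here is a minimal, product-neutral sketch in Python. It is not IBM's code and does not use OmniFind or pureXML. The annotated document, its tag and attribute names, and the `annotation` table are all invented for illustration, with SQLite standing in for the warehouse.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical output of a text-analytics tagger. The element and
# attribute names are assumptions, not a real product's schema.
ANNOTATED = """
<document id="d1">
  <annotation type="company" begin="0" end="3">IBM</annotation>
  <annotation type="product" begin="24" end="27">DB2</annotation>
  <annotation type="company" begin="40" end="43">IBM</annotation>
</document>
"""

def shred_to_table(xml_text, conn):
    """Alternative 1: map the XML annotations onto a relational table."""
    root = ET.fromstring(xml_text)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS annotation "
        "(doc_id TEXT, type TEXT, begin_pos INTEGER, end_pos INTEGER, text TEXT)"
    )
    rows = [
        (root.get("id"), a.get("type"), int(a.get("begin")), int(a.get("end")), a.text)
        for a in root.iter("annotation")
    ]
    conn.executemany("INSERT INTO annotation VALUES (?, ?, ?, ?, ?)", rows)
    return len(rows)

def count_in_xml(xml_text, ann_type):
    """Alternative 2: analyze the XML structure directly, with no mapping step."""
    root = ET.fromstring(xml_text)
    return len(root.findall(f"./annotation[@type='{ann_type}']"))

conn = sqlite3.connect(":memory:")
inserted = shred_to_table(ANNOTATED, conn)
sql_count = conn.execute(
    "SELECT COUNT(*) FROM annotation WHERE type = 'company'"
).fetchone()[0]
xml_count = count_in_xml(ANNOTATED, "company")
print(inserted, sql_count, xml_count)
```

Both routes answer the same question (how many company mentions?). The relational route pays a mapping cost up front but lets existing SQL tools work on the result, while the direct route keeps the document structure intact.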

The premise is that because much valuable business information originates in "unstructured" form — e-mail, Web pages, news and blog articles, corporate reports, etc. — you need to look at text analytics as a technology that can unlock value. And naturally, if you already have a BI program and a data warehouse, you'll want to explore integrating text-sourced information into your existing data-analysis infrastructure. You'll want to explore unified analytics.
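As a rough illustration of what that integration might look like, the sketch below joins a conventional fact table with a table of text-extracted mentions. The table names, columns, and sample rows are all hypothetical, and SQLite again stands in for a real data warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conventional warehouse data (invented sample rows).
CREATE TABLE sales (product TEXT, quarter TEXT, revenue REAL);
-- Facts extracted from e-mail or other text (invented sample rows).
CREATE TABLE complaint_mention (product TEXT, quarter TEXT, doc_id TEXT);
INSERT INTO sales VALUES ('widget', '2008Q1', 100.0), ('gadget', '2008Q1', 250.0);
INSERT INTO complaint_mention VALUES
  ('widget', '2008Q1', 'e1'), ('widget', '2008Q1', 'e2'), ('gadget', '2008Q1', 'e3');
""")

def complaints_vs_revenue(conn):
    """Report revenue next to the number of complaint mentions per product."""
    return conn.execute("""
        SELECT s.product, s.revenue, COUNT(m.doc_id) AS complaints
        FROM sales s
        LEFT JOIN complaint_mention m
          ON m.product = s.product AND m.quarter = s.quarter
        GROUP BY s.product, s.quarter, s.revenue
        ORDER BY s.product
    """).fetchall()

rows = complaints_vs_revenue(conn)
for product, revenue, complaints in rows:
    print(product, revenue, complaints)
```

The left join keeps products that have no mentions at all, so an absence of complaints shows up as a zero rather than a missing row.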

Information extraction to databases enables unified analytics. I cover approaches in my own text-analytics courses and presentations — I use open-source GATE (General Architecture for Text Engineering) software for illustrations and examples in order to remain independent of any product — but IBM's is the first clear, freely available, and practical technical exposition that I have seen on this topic. If you want to learn more about unified analytics, do visit IBM's From Text Analytics to Data Warehousing page.

Disclosure: IBM is a sponsor of an editorially independent text-analytics report I am writing, which is unrelated to my Intelligent Enterprise writing.

