Breakthrough Analysis, by Seth Grimes
Seth Grimes is an analytics strategist with Washington, DC-based Alta Plana Corporation. He consults on data management and analysis systems.

Column stores and Census data: ParAccel and SuperSTAR
ParAccel has won well-deserved attention in recent months, including Intelligent Enterprise recognition as a Company to Watch. The company is a start-up that boasts an all-star cast of executives and is positioned in a hot category, namely column-store DBMSes that are optimized for analytics. There's irony, however, in its market positioning. It's not that column stores, most notably SybaseIQ, have been around for decades. It's that ParAccel chose to explain its product with an application, analysis of U.S. Census data, that is essentially owned by a competing column-store system, SuperSTAR from Space-Time Research (STR).

I have personal history here: I designed the U.S. Census Bureau's Census 2000 tabulation system, working on subcontract to IBM. Back in 1998, I wrapped up the selection of SuperSTAR over competing options. We chose SuperSTAR for superior performance and ease of use. I then led the development team that created a system that supported both ad-hoc queries and the production of hundreds of billions of statistical tables for subsequent publication via the Census Bureau's American FactFinder Web site.

The Census Bureau and IBM chose SuperSTAR for the very reasons that Intelligent Enterprise cited in naming ParAccel a Company to Watch: "column-store databases are nothing new; it's well known that they offer super scalability and blazing query response in analytic applications." Like ParAccel today, STR ten years ago was "an upstart that's blowing by established price and performance benchmarks." STR has continued to improve the product, and the Census Bureau and IBM recently re-upped to use SuperSTAR for analysis of the 2010 Census. (I left the project myself in 2002 after four and one-half years.)
I haven't compared ParAccel to SuperSTAR, Vertica, SybaseIQ, MonetDB, Infobright, or other column-store DBMSes. Frankly, although my company has a business relationship with STR, I'm a consultant and would happily work with any of them, whichever best suited the project and customer. In the case of Census data, no other product I know of has SuperSTAR's capabilities for handling important requirements such as confidentiality protection; the ability to roll up multiply branching geographic hierarchies, up to 9 levels deep at Census; support for hierarchical datasets such as the Census's, which represent data by geographic area, household, and person; built-in support for "multi-response" data, which at the bureau accommodates people who identify themselves in more than one racial category; and both a GUI and a non-graphical interface for automating large-scale production tabulations.

The first lesson here is that a conceptually simple illustration, for instance ParAccel's Census-analysis example, may not be so simple in real life. Hot technology such as a column-store DBMS isn't enough. Applications that require high performance tend also to require other specialized capabilities. The second lesson is that if applying established technology seems like a good idea, there's a good chance that someone else got there first.












