Intelligent Enterprise

Better Insight for Business Decisions

Intelligent Enterprise - Better Insight for Business Decisions
search Intelligent Enterprise
Advanced Search
RSS
Webcasts
Whitepapers
Subscribe
Home




March 20, 2003

Keys and History

Let's ditch the punch cards once and for all

by Joe Celko

Let me take you back to those thrilling days of sequential file systems, magnetic tape drives, and punch cards. Magnetic tapes were essentially the same as the sequential model, but when you dropped a tape it wasn't a disaster like dropping a deck of punch cards. This physical model led to a system design style antithetical to the relational approach.

Data, in the pre-relational approach, was located by its position within a file. Records were physically contiguous within files; and fields within records were contiguous to each other. If one record was an exact copy of another, it was different from the first because of its separate location. (I call it the evil twin.)

Placement and order

Success depended on files with sorted order. In a procedural language, READ a, b, c FROM FileX; doesn't return the same results as READ b, c, a FROM FileX; and some languages let you overwrite the variable: READ a, a, a FROM FileX;.

In SQL, SELECT a, b, c FROM TableX; returns the same data as SELECT b, c, a FROM TableX; because things are located by name, not position.

When the relational model first appeared, the only mental model we had was the sequential file system. This is why SQL has a PRIMARY KEY syntax. In the relational sense, all keys are equally powerful.

Files always had keys in those days. And the first SQL products were implemented on top of existing file systems: Just map the table concept to a file, rows to records, and columns to fields! No need to learn anything new.

Since most SQL programmers never worked with cards or tapes, it amazes me that they rediscovered the underlying file implementation. Programmers often grab onto SEQUENCE, IDENTITY, ROWID, or other autonumbering vendor extensions to get a key to locate a given row, imitating a magnetic tape's sequential access. This approach exposes the physical order in which a row was added to a table — just like records placed on the end of a magnetic tape. And just like tape files, this lets the evil twin mess up the data.

A relational key is a subset of attributes that identify a row in a table. Thus, an autonumbering scheme can never be a key by definition: It's not an attribute of anything except the machinery's internal state. Pointers and other physical implementation details fail as identifiers on the same principle.

Table 1 shows a taxonomy of the methods used to make a row unique within its internal representation in a database.

Other keys

A natural key is a subset of attributes that occur in a table and act as a unique identifier — the classic relational key. Keys are visible, and you can verify them in the external reality. Examples include UPC codes, geographical coordinates, and DNA.

An artificial key is a visible attribute added to the table. It doesn't exist in the external reality but can be verified for syntax or check digits inside itself. For example, open codes in the UPC scheme can be assigned to a user's own stuff. The check digits still work, but you have to verify them inside your own enterprise.

A "uniqueifier" (please, dear reader, suggest a better word), isn't based on attributes in the data model and is exposed to the user. There's no way to predict or verify it. The system obtains a value through some physical process totally unrelated to the logical data model. Example: IDENTITY columns, other autonumbering devices.



Rate This Article

Comments:

Optional e-mail address:

A surrogate key is a system generated to replace the actual key behind the covers where the user never sees it. It may or may not be based on attributes in the table. Examples: hashing algorithms or pointers from many columns back to a common set of domain values.

Note: The terms "artificial key" and "surrogate key" are used interchangeably, but I think there's a big difference, as shown on the chart. This grid is a good place to start to talk about classifying types of keys. If anyone wants to add to it, please do.


Joe Celko [celko@northfacelearning.com] is the vice president of RDBMS at North Face Learning in Salt Lake City and author of five books on SQL.









IE Weekly Newsletter
Subscribe to the newsletter
    Email Address







InformationWeek Business Technology Network
InformationWeekInformationWeek 500InformationWeek 500 ConferenceInformationWeek AnalyticsInformationWeek CIO
InformationWeek EventsInformationWeek ReportsInformationWeek MagazinebMightyByte and SwitchDark Reading
Digital LibraryIntelligent EnterpriseInternet EvolutionNetwork ComputingNo Jitter
space
Techweb Events Network
InteropVoiceConWeb 2.0 ExpoWeb 2.0 SummitEnterprise 2.0 ConferenceMobile Business ExpoSoftware ConferenceCSI - Computer Security Institute
Black HatGTECEnergy CampMashup CampStartup Camp
space
Light Reading Communications Network
Light ReadingLight Reading EuropeUnstrungLight Reading's Cable Digital NewsConstantinopleInternet Evolution
Heavy ReadingLight Reading Live!Light Reading InsiderEthernet ExpoOptical ExpoTeleco TVTower Technology Summit
space
Financial Technology Network
Advanced TradingBank Systems & TechnologyInsurance & TechnologyWall Street & TechnologyAccelerating Wall StreetBank Systems & Technology Executive SummitBuyside Trading SummitInsurance & Technology Executive Summit
space
Microsoft Technology Network
MSDN MagazineTechNetThe Architecture Journal
space