Guide to the TechWeb Network

Intelligent Enterprise

Better Insight for Business Decisions

Intelligent Enterprise - Better Insight for Business Decisions
search Intelligent Enterprise
Advanced Search
RSS
Webcasts
Whitepapers
Subscribe
Home



And the 'Email Validation' SQL Puzzle Winner Is... | Intelligent Enterprise Blog
And the 'Email Validation' SQL Puzzle Winner Is...

Posted by Joe Celko
Thursday, September 27, 2007
8:59 AM

The winner of last week's 'Email Address Validation' SQL puzzle is "Guest" (see comments), because he/she bothered to do the research and come up with an answer that is generic enough to port to any SQL dialect with a SIMILAR TO or a regexp() function. So, "Guest" please email me with your snail mail address (and some attempt to validate your SQL mastery/identity) and I'll send you one of my books.

My answer to last week's puzzle is as follows:

You can find regular expressions at http://regexlib.com/ and copy them into your code, making changes for your dialect. For example, this regular expression checks an email format against the RFC 3696 Standard and was written by David Thompson

^[a-z0-9!$'*+\-_]+ (\.[a-z0-9!$'*+\-_]+)*
@
([a-z0-9]+(-+[a-z0-9]+)*\.)+
([a-z]{2}
|aero|arpa|biz|cat|com|coop|edu|gov|info|int|jobs|mil|mobi|museum|name|net|org|pro|travel)$

If you do not read regular expressions, this says that a valid email address is one or more groups of strings of alphanumeric characters and some limited punctuation marks, optionally separated by a period. Then there is one "little snail" or "at-sign" in the middle. This followed by more groups of strings of alphanumeric characters and a more limited set of punctuation marks separated by periods. The string finally ends with either a two-letter country code or one of several explicit domain codes.

The problem is that the "[a-z]{2}" pattern matches any two letters even when they are not a valid country code.

If you do not have a SIMILAR TO predicate in your SQL, there is another approach. Set up a CREATE TRANSLATION declaration that maps the legal postfixes into a single unique token not used in an email address. This result is then passed on to another TRANSLATE () expression which reduces the alphanumeric and punctuation characters to a second unique token. Eventually, you wind up with a reduced pattern made up of the two tokens and the at-sign, say '#@#?' since neither '#' nor '?' appear in an email address.

A third approach is to use the TRIM( FROM ) function to reduce the suspect email address to a single at-sign or empty string.

The same effect can be had with nested REPLACE expressions in some dialects but the nesting can be pretty deep. If you are a LISP programmer, you will not mind a bit.

Are these good methods to use in place of using an external call to an external procedure in a 3GL language or SQL/PSM? Baroque as these suggestions are, they often run much faster than the external call and they are portable. But they are ugly to maintain.

Joe Celko is an independent consultant in Austin, Texas, and the author of SQL Puzzles and Answers (2006), Joe Celko's SQL for Smarties: Advanced SQL Programming (2005), and Joe Celko's Trees and Hierarchies in SQL for Smarties (2004).



E-MAIL | SLASHDOT | DIGG




This is a public forum. CMP Technology and its affiliates are not responsible for and do not control what is posted herein. CMP Technology makes no warranties or guarantees concerning any advice dispensed by its staff members or readers.

Community standards in this comment area do not permit hate language, excessive profanity, or other patently offensive language. Please be aware that all information posted to this comment area becomes the property of CMP Media LLC and may be edited and republished in print or electronic format as outlined in CMP Technology's Terms of Service.

Important Note: This comment area is NOT intended for commercial messages or solicitations of business.


 




    Subscribe to RSS feed of all blogs


 



InformationWeek Business Technology Network
InformationWeekInformationWeek 500InformationWeek 500 ConferenceInformationWeek AnalyticsInformationWeek CIO
InformationWeek EventsInformationWeek ReportsInformationWeek MagazinebMightyByte and SwitchDark Reading
Digital LibraryIntelligent EnterpriseInternet EvolutionNetwork ComputingNo Jitter
space
Techweb Events Network
InteropVoiceConWeb 2.0 ExpoWeb 2.0 SummitEnterprise 2.0 ConferenceMobile Business ExpoSoftware ConferenceCSI - Computer Security Institute
Black HatGTECEnergy CampMashup CampStartup Camp
space
Light Reading Communications Network
Light ReadingLight Reading EuropeUnstrungLight Reading's Cable Digital NewsConstantinopleInternet Evolution
Heavy ReadingLight Reading Live!Light Reading InsiderEthernet ExpoOptical ExpoTeleco TVTower Technology Summit
space
Financial Technology Network
Advanced TradingBank Systems & TechnologyInsurance & TechnologyWall Street & TechnologyAccelerating Wall StreetBank Systems & Technology Executive SummitBuyside Trading SummitInsurance & Technology Executive Summit
space
Microsoft Technology Network
MSDN MagazineTechNetThe Architecture Journal
space