It was during SQLBits 10 in London that I first uttered the phrase “data cleansing for fun & profit”, as a spoof presentation title during Buck Woody’s “
Moves Presenting Like Jagger” presentation. I hadn’t really thought about it much since then, until just recently, I was thinking about the sort of support calls that come through to me. And they (almost entirely) amount to dodgy unclean data.
For example: a field named “TelephoneNumber” should contain phone numbers, right? Not email addresses, nicknames, first line of addresses etc. The big problem is, though, how do you keep this clean? Particularly when the data is stored in two separate tables in two separate databases, and is regularly fed from another unclean source over which you have no control?
edit: I seem to have got the presentations confused. I offer no excuses beyond the usual lack of caffeine… Props to Jonathan “FatherJack” Allen. Should have checked my facts at http://www.sqlbits.com/Speakers/Buck_Woody