This article is for demonstrating how to use a SOUNDEX in a select and then listing all the variations based on case-sensitivity.
Why?
We have a database with data in it. For a particular column we have setup default values, let's use the example "Data Not Yet Available". Unfortunately the end-user reported these default values sometimes list twice, especially when the case is different, eg. "Data not yet available". The final system (qlikview) was case-sensitive despite our server collation being case-insensitive.
Furthermore, we now have the task of finding all the variations of the default values which we found we could do with the built-in SOUNDEX function.
How?
Lets assume the following, we have a table containing:
StudentID StudentDetail Soundex ----------------- ---------------------- ----------- 1 Data not yet available D300 2 Data Not Yet Available D300 3 data not yet available D300 4 Daat oNt EYt Avialable D300
- StudentID StudentDetail Soundex
- ----------------- ---------------------- -----------
- 1 Data not yet available D300
- 2 Data Not Yet Available D300
- 3 data not yet available D300
- 4 Daat oNt EYt Avialable D300
SELECT DISTINCT StudentDetail FROM StudentTable WHERE SOUNDEX(StudentDetail)=SOUNDEX('Data Not Yet Available') -- yields StudentDetail ----------------- Data Not Yet Available
- SELECT DISTINCT
- StudentDetail
- FROM
- StudentTable
- WHERE
- SOUNDEX(StudentDetail)=SOUNDEX('Data Not Yet Available')
- -- yields
- StudentDetail
- -----------------
- Data Not Yet Available
SELECT DISTINCT StudentDetail COLLATE Latin1_General_CS_AS FROM StudentTable WHERE SOUNDEX(StudentDetail)=SOUNDEX('Data Not Yet Available') -- yields StudentDetail ----------------- Data not yet available Data Not Yet Available data not yet available Daat oNt EYt Avialable
- SELECT DISTINCT
- StudentDetail COLLATE Latin1_General_CS_AS
- FROM
- StudentTable
- WHERE
- SOUNDEX(StudentDetail)=SOUNDEX('Data Not Yet Available')
- -- yields
- StudentDetail
- -----------------
- Data not yet available
- Data Not Yet Available
- data not yet available
- Daat oNt EYt Avialable
Now we can run this on a column in a table to determine if there are discrepancies in our default values.
Additional
Note how this also picked up typos in the 4th value which is good as I didn't want to overcomplicate things using LEVENSHTEIN and DIFFERENCE:
SOUNDEX('Data Not Yet Available') = SOUNDEX('Daat oNt EYt Avialable')
- SOUNDEX('Data Not Yet Available') = SOUNDEX('Daat oNt EYt Avialable')