Microsoft KB Archive/171299

From BetaArchive Wiki
Knowledge Base


INF: How to Obtain Case-Sensitive Query Results with a Case-Insensitive Sort Order

Article ID: 171299

Article Last Modified on 2/14/2005



APPLIES TO

  • Microsoft SQL Server 6.0 Standard Edition
  • Microsoft SQL Server 6.5 Standard Edition
  • Microsoft SQL Server 7.0 Standard Edition
  • Microsoft SQL Server 2000 Service Pack 3a
  • Microsoft SQL Server 2000 Standard Edition



This article was previously published under Q171299

SUMMARY

A sort order is a set of rules that determines how SQL Server collates and presents data in response to database queries. The sort order determines the order in which data is presented in response to SQL Server statements involving GROUP BY, ORDER BY, and DISTINCT clauses. The sort order also determines how certain queries are resolved, such as queries involving WHERE and DISTINCT clauses.

During installation, SQL Server allows you to select a character set and a sort order that the server will use. Several sort orders and character sets are available. The default character set is ANSI (ISO 8859-1), and the default sort order is dictionary order, case-insensitive.

If a case-insensitive sort order is chosen, SQL Server considers character values that differ only in case as equal in value. That is, 'John' = 'john'. With a case-insensitive sort order, it may often be necessary to obtain query results with case-sensitivity considered. This article explains techniques to accomplish this.

SQL Server 2000 replaces code pages and sort orders with collations. SQL Server 2000 includes support for most collations supported in earlier versions of SQL Server, and introduces a new set of collations based on Windows collations. You can now specify collations at the database level or at the column level. Previously, code pages and sort orders could be specified only at the server level and applied to all databases on a server. For more information, see "Collations" in SQL Server 2000 Books Online.

MORE INFORMATION

To examine the ramifications of a case-insensitive sort order on query results, consider the following scenario:

   create table T1(col1 char(5) NOT NULL, col2 int NOT NULL)
   go
   insert T1 values('John',1)
   insert T1 values('John',1)
   insert T1 values('JOhn',1)
   insert T1 values('JOhn',1)
   insert T1 values('JoHn',1)
   insert T1 values('JoHn',1)
   insert T1 values('John',1)
   insert T1 values('JohN',1)
   insert T1 values('JohN',1)
   insert T1 values('john',1)
   insert T1 values('john',1)
   insert T1 values('john',1)
   go
                


Executing the following query on a server with a case-insensitive sort order returns the following results:

   select DISTINCT(col1) from T1
   go
                


   col1
   -----
   John

   (1 row(s) affected)
                


However, executing the same query on a server with a case-sensitive sort order returns the following results:

   col1
   -----
   John
   JOhn
   JoHn
   JohN
   john

   (5 row(s) affected)
                


Regarding the usage of GROUP BY, ORDER BY, and WHERE clauses, observe the results of these queries that are run on a server with a case-insensitive sort order, and compare them with the modified queries below. Note that in all cases, all of the values in col1 are treated as one distinct value:

   SELECT sum(col2)'col2 Group Totals' from T1 group by col1
   go
                
   col2 Group Totals
   -----------------
   12

   (1 row(s) affected)
                


   SELECT * from T1 order by col1
   go
                
   col1  col2
   ----- -----------
   John  1
   John  1
   JOhn  1
   JOhn  1
   JoHn  1
   JoHn  1
   John  1
   JohN  1
   JohN  1
   john  1
   john  1
   john  1

   (12 row(s) affected)
                


   SELECT * from T1 where col1='john'
   go
                
   col1  col2
   ----- -----------
   John  1
   John  1
   JOhn  1
   JOhn  1
   JoHn  1
   JoHn  1
   John  1
   JohN  1
   JohN  1
   john  1
   john  1
   john  1

   (12 row(s) affected)
                


The ability to obtain case-sensitive results with the DISTINCT function and the GROUP BY, ORDER BY, and WHERE clauses relies on the fact that the actual data is stored in binary form, with each character represented by a unique hexadecimal value. You can use the CONVERT function to convert the data from character format to a string of hexadecimal values. The results of the conversion can be operated on by the DISTINCT function and the GROUP BY, ORDER BY, and WHERE clauses without regard to case-sensitivity.

Consider the results of the following query, which displays the original character data and its corresponding hexadecimal representation:

   select col1,'=',CONVERT(binary(5),col1)'hex values' from T1
                
   col1    hex values
   ----- - ------------
   John  = 0x4a6f686e20
   John  = 0x4a6f686e20
   JOhn  = 0x4a4f686e20
   John  = 0x4a4f686e20
   JoHn  = 0x4a6f486e20
   JoHn  = 0x4a6f486e20
   John  = 0x4a6f686e20
   JohN  = 0x4a6f684e20
   JohN  = 0x4a6f684e20
   john  = 0x6a6f686e20
   john  = 0x6a6f686e20
   john  = 0x6a6f686e20

   (12 row(s) affected)
                


Note that in the results, each unique value of col1 has a distinct hexadecimal representation. To get the distinct values in character format, run the next set of statements. It produces the same results as a query using the DISTINCT function on a case-sensitive server.

   select DISTINCT(CONVERT(binary(5),col1))'col1'
   into #t1
   from T1
   go
   select CONVERT(char(5),col1)'distinct character results' from #t1
   go

   (5 row(s) affected)
                
   -----
   JOhn
   JoHn
   JohN
   John
   john

   (5 row(s) affected)
                


For queries using GROUP BY and/or ORDER BY clauses, consider the following examples:

   select DISTINCT(CONVERT(binary(5),col1))'col1',sum(col2)'col2'
   into #t1
   from T1
   group by CONVERT(binary(5),col1)
   go
   select CONVERT(char(5),col1)'distinct character results',col2'Group
   Totals'
   from #t1
   go
                
   (5 row(s) affected)

   distinct character results Group Totals
   -------------------------- ------------
   JOhn                       2
   JoHn                       2
   JohN                       2
   John                       3
   john                       3

   (5 row(s) affected)
                


   SELECT *
   from T1
   order by CONVERT(binary(5),col1)
   go
                
   col1  col2
   ----- -----------
   JOhn  1
   JOhn  1
   JoHn  1
   JoHn  1
   JohN  1
   JohN  1
   John  1
   John  1
   John  1
   john  1
   john  1
   john  1

   (12 row(s) affected)
                


For a query using the WHERE clause, use the CONVERT function to convert the constant and/or column values to comparable hexadecimal strings, as in the following example:

   SELECT *
   from T1
   where CONVERT(binary(5),col1)=CONVERT(binary(5),'john ')
   go
                
   col1  col2
   ----- -----------
   john  1
   john  1
   john  1

   (3 row(s) affected)
                


It is important to understand that the results returned from queries can be dramatically affected by the choice of sort order. Due consideration must be given to the importance of case-sensitivity when planning the installation and configuration of SQL Server. For further information on this topic, see "Character Sets And Sort Orders" in the SQL Server Books Online and the Administrator's Companion.


Additional query words: hex

Keywords: kbhowto kbinfo kbusage KB171299