Microsoft KB Archive/172308

= INF: How to Manipulate DBCS Data in a Text Field =

Article ID: 172308

Article Last Modified on 10/16/2003

-

APPLIES TO


 * Microsoft SQL Server 6.5 Standard Edition

-



This article was previously published under Q172308



SUMMARY
This article describes how to correctly manipulate double-byte character set (DBCS) data in text fields.



MORE INFORMATION
UPDATETEXT and WRITETEXT are powerful tools that allow users to modify existing text data through a text pointer. While using a text pointer, it is very common for users to accidentally split a double-byte character in half. This may result in "data corruption" from user's perspective. The following scripts demonstrate this common problem.

USE pubs GO  SET NOCOUNT ON   DROP TABLE textTable GO  CREATE TABLE textTable (     id SMALLINT         IDENTITY(1,1)         PRIMARY KEY CLUSTERED,      hexValues VARBINARY(20) NOT NULL,      textField TEXT NOT NULL   ) GO  DECLARE @hexValues VARBINARY(20) SELECT @hexValues = 0xa540a541a542 INSERT textTable VALUES (@hexValues,        CONVERT (VARCHAR(20), @hexValues)) INSERT textTable VALUES (@hexValues,        CONVERT (VARCHAR(20), @hexValues)) GO  SELECT * FROM textTable DECLARE @hexValues VARBINARY(20) SELECT @hexValues = 0xa543 DECLARE @string VARCHAR(20) SELECT @string = CONVERT(VARCHAR(20), @hexValues) DECLARE @textPointer VARBINARY(16) SELECT @textPointer = TEXTPTR(textField) FROM textTable WHERE id = 2 UPDATETEXT textTable.textField @textPointer 1 2 @string UPDATE textTable SET hexValues = CONVERT (VARBINARY(20), CONVERT(VARCHAR(20), textField)) WHERE id = 2

As the execution result indicates, the text data becomes invalid because a double-byte character has been split. It is important for users to check whether the target offset from the start of the existing text is a lead byte or not, so that incorrect results can be avoided. The following sample scripts demonstrate checking the target offset of a text field in a SQL Server that was installed with code page 950. Please note the lead byte ranges may vary in the other DBCS code pages. For more information about lead byte ranges in different code pages, please see "Developing International Software for Windows 95 and Windows NT" (ISBN 1-55615-840-8).

CREATE PROCEDURE IsThisLeadByteInCP950 @string VARCHAR(255), @offset SMALLINT

/* @string: is the target string that will be checked */ /* @offset: is the 1-based starting position,        */ /*         specified as the number of bytes (from    */    /*          the start of the existing text value) to  */ /*         skip. */   AS      SELECT @string = RIGHT(REVERSE(@string), @offset) DECLARE @i SMALLINT SELECT @i = ASCII(@string) IF @i >= 129 and @i <= 254 /* lead byte range: 0x81 to 0xFE */ RETURN 1 ELSE RETURN 0 GO  DECLARE @hexValues VARBINARY(20) SELECT @hexValues = 0xa540a541a542 DECLARE @string VARCHAR(20) SELECT @string = CONVERT(VARCHAR(20), @hexValues) DECLARE @i SMALLINT EXEC @i = IsThisLeadByteInCP950 @string, 1 IF (@i = 1) PRINT "It is a lead byte in code page 950, command aborted." ELSE PRINT "It is not a lead byte in code page 950, command continue..."

Additional query words: JIS Wansung Johab Hangul KB BIG-5 932 936 949 950 traditional Chinese Japanese Korean codepage cp

Keywords: kbhowto kbusage KB172308

-

[mailto:TECHNET@MICROSOFT.COM Send feedback to Microsoft]

© Microsoft Corporation. All rights reserved.