= Surrogate pair characters can be split by using automation in Visio 2003 =

Article ID: 829938

Article Last Modified on 5/21/2007

-

APPLIES TO


 * Microsoft Office Visio Professional 2003
 * Microsoft Office Visio Standard 2003

-





SYMPTOMS
When you write a custom program for use in Microsoft Office Visio 2003, you may find that the custom code can split a surrogate pair. For example, you can write automation code to insert text in the middle of a surrogate pair or to delete one half of a surrogate pair.



CAUSE
This problem occurs because the Characters automation object contains a "begin" and an "end" text position. The "begin" and "end" text positions can be set to any location in the text, including the location between the two halves of a surrogate pair.



WORKAROUND
To work around this problem, write automation code that treats each surrogate pair as an atomic character. For example, if you create a custom program to simulate a text editor, make sure that the insertion point cannot be placed in the middle of a surrogate pair.
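
The workaround can be sketched as a small boundary check that runs before a text position is used. The following Python sketch is illustrative only. It assumes that text positions count 16-bit code units, and the helper names (`to_utf16_units`, `snap_to_boundary`) are hypothetical and are not part of the Visio object model.

```python
def is_high_surrogate(unit):
    """True if the 16-bit code unit is a first (high) surrogate."""
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit):
    """True if the 16-bit code unit is a second (low) surrogate."""
    return 0xDC00 <= unit <= 0xDFFF


def to_utf16_units(text):
    """Return the text as a list of 16-bit code units (hypothetical helper)."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def snap_to_boundary(units, pos):
    """Move pos back by one unit if it falls between the halves of a pair.

    Assumption: pos is an offset in 16-bit code units, where 0 is the start
    of the text and len(units) is the end of the text.
    """
    if 0 < pos < len(units):
        if is_high_surrogate(units[pos - 1]) and is_low_surrogate(units[pos]):
            return pos - 1
    return pos


# "a", then U+1F600 (stored as a surrogate pair), then "b".
units = to_utf16_units("a\U0001F600b")
safe_begin = snap_to_boundary(units, 2)  # position 2 is inside the pair
```

In an automation program, a check of this kind would be applied to each "begin" and "end" position before text is inserted, deleted, or formatted.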



STATUS
Microsoft has confirmed that this is a problem in the Microsoft products that are listed in the "Applies to" section of this article.



MORE INFORMATION
A surrogate pair is a pair of 16-bit Unicode code values that represent a single character. The first (high) surrogate is a 16-bit code value in the range U+D800 to U+DBFF. The second (low) surrogate is a 16-bit code value in the range U+DC00 to U+DFFF. Surrogate pairs extend the character set beyond the characters that can be represented by a single 16-bit Unicode code value. By using surrogate pairs, Unicode can support over one million characters.
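
The ranges above leave 1,024 possible high surrogates and 1,024 possible low surrogates, which gives 1,024 × 1,024 = 1,048,576 additional characters. The following Python sketch shows the standard UTF-16 arithmetic that maps a character above U+FFFF to a surrogate pair and back. The function names are chosen for illustration only.

```python
def to_surrogate_pair(code_point):
    """Split a code point in the range U+10000..U+10FFFF into (high, low)."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point does not need a surrogate pair")
    offset = code_point - 0x10000      # 20-bit value
    high = 0xD800 + (offset >> 10)     # upper 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # lower 10 bits
    return high, low


def from_surrogate_pair(high, low):
    """Combine a (high, low) surrogate pair into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# U+1F600 is stored as the two code values 0xD83D and 0xDE00.
high, low = to_surrogate_pair(0x1F600)
```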

Each surrogate pair is an indivisible unit. That is, each half of the pair does not have any meaning individually. A character is represented only when both halves of the surrogate pair are combined. When you edit text that contains surrogate pairs, the text editor cannot split the halves of a surrogate pair. For example, you cannot do any of the following:
 * Insert text in the middle of a surrogate pair
 * Delete or replace one member of a surrogate pair
 * Change the formatting of one member of a surrogate pair
 * Distinguish the difference between ordinary text and text in a surrogate pair

However, if you are a Visio 2003 developer, you can write custom code to do any of the following:
 * Insert text in the middle of a surrogate pair
 * Delete or replace one member of a surrogate pair
 * Change the formatting of one member of a surrogate pair

When custom automation code modifies one half of a surrogate pair, Visio 2003 repairs the text so that no dangling halves remain. After each atomic automation call, any “dangling” half of a surrogate pair is changed to the special character U+FFFD, also known as the Unicode “Replacement Character”. This special character is used to replace characters that cannot otherwise be represented. Additionally, if custom automation code tries to change formatting properties for only one half of a surrogate pair, Visio 2003 extends the formatting so that the formatting does not change in the middle of the surrogate pair.
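
The effect of the replacement step can be modeled outside Visio. The following Python sketch is not the Visio 2003 implementation. It only reproduces the described result on a list of 16-bit code values, and the function name `replace_dangling_surrogates` is hypothetical.

```python
REPLACEMENT_CHARACTER = 0xFFFD


def replace_dangling_surrogates(units):
    """Return a copy of units in which every unpaired surrogate is U+FFFD."""
    result = []
    i = 0
    while i < len(units):
        unit = units[i]
        next_unit = units[i + 1] if i + 1 < len(units) else None
        if (0xD800 <= unit <= 0xDBFF
                and next_unit is not None
                and 0xDC00 <= next_unit <= 0xDFFF):
            # A complete pair: keep both halves unchanged.
            result.extend([unit, next_unit])
            i += 2
        elif 0xD800 <= unit <= 0xDFFF:
            # A high or low surrogate without its partner.
            result.append(REPLACEMENT_CHARACTER)
            i += 1
        else:
            # An ordinary 16-bit character.
            result.append(unit)
            i += 1
    return result


# The low half of the pair 0xD83D 0xDE00 was deleted between "a" and "b".
after_delete = replace_dangling_surrogates([0x61, 0xD83D, 0x62])

# The character "x" was inserted in the middle of the pair.
after_insert = replace_dangling_surrogates([0xD83D, 0x78, 0xDE00])
```

In this model, deleting one half of a pair leaves a single U+FFFD, and inserting text in the middle of a pair leaves a U+FFFD on each side of the inserted text.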

For more information about surrogate pairs, visit the following Microsoft Web site:

http://msdn2.microsoft.com/en-us/library/ms776414.aspx

For more information about Visio 2003, visit the following Microsoft Web site:

http://www.microsoft.com/office/visio

Keywords: kbpending kbbug KB829938

-

© Microsoft Corporation. All rights reserved.