Microsoft KB Archive/251134

= PRB: XML Parser Cannot Parse UTF-7 Documents =

Article ID: 251134

Article Last Modified on 5/11/2006

-

APPLIES TO


 * Microsoft XML Parser 2.0
 * Microsoft XML Parser 2.5

-



This article was previously published under Q251134



SYMPTOMS
When attempting to load an XML file saved as UTF-7 (a transfer encoding format for Unicode), the XML parser in Internet Explorer generates the following error message:

Invalid at the top level of the document.

The same error also occurs when using the MSXML parser from server-side or client-side script.



CAUSE
Versions of the MSXML parser prior to MSXML 2.6 do not support UTF-7.



RESOLUTION
To resolve this problem, save your XML documents as UTF-8, the preferred transfer encoding format for Unicode.

MSXML 2.6 or later supports UTF-7 encoding.
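
The re-encoding to UTF-8 suggested above can also be done outside Word. The following is a minimal sketch, assuming Python 3 and its built-in `utf-7` codec; the function name and the file names in the usage note are hypothetical and not part of the original article.

```python
from pathlib import Path


def reencode_utf7_to_utf8(src: Path, dst: Path) -> None:
    """Read a UTF-7 encoded file and write the same text as UTF-8."""
    # Decode the 7-bit UTF-7 bytes back into Unicode text.
    text = src.read_bytes().decode("utf-7")
    # Write the text as UTF-8, which every XML parser is required to accept.
    dst.write_bytes(text.encode("utf-8"))
```

For example, `reencode_utf7_to_utf8(Path("TestUTF7.xml"), Path("TestUTF8.xml"))` would produce a UTF-8 copy of the file. If the XML declaration names an encoding, it must be updated by hand to match the new encoding; this sketch does not rewrite it.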



STATUS
This behavior is by design.



MORE INFORMATION
Although Unicode is a uniform character set representing nearly all the world's languages, there are many byte representations, or transformation formats, that a Unicode file can use. The most popular format is UTF-8, which represents Unicode characters as a sequence of one to four 8-bit bytes. UTF-7 is a 7-bit transformation format defined to allow Unicode text to pass through mail gateways that assume ASCII and strip out the high bit of text messages.
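
The difference between the two formats can be made concrete with a short sketch. It assumes Python 3 and its built-in `utf-8` and `utf-7` codecs; the sample characters are arbitrary choices for illustration.

```python
# Four sample characters whose UTF-8 forms need 1, 2, 3 and 4 bytes.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

# UTF-8 uses a variable number of 8-bit bytes per character.
utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]

# UTF-7 output never sets the high bit, so it survives 7-bit mail gateways.
utf7_bytes = "".join(samples).encode("utf-7")
is_seven_bit = all(b < 0x80 for b in utf7_bytes)

print(utf8_lengths)   # [1, 2, 3, 4]
print(is_seven_bit)   # True
```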

According to the XML 1.0 standard, Section 4.3.3, a valid XML file is required to be one of the following:
 * A Unicode file in UTF-8 format.
 * A Unicode file in UTF-16 format.
 * A file in some other character encoding (for example, ASCII) that has as its very first bytes the XML declaration (<?xml ... ?>) naming that encoding.

UTF-7 does not use the Byte Order Mark. Also, UTF-7 encodes the special XML character < as +ADw-, so those bytes, rather than <?xml, are the first characters of a UTF-7 encoded XML document. Because this is not compliant with the XML standard, MSXML refuses to load such files.
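
The effect described above can be checked with a short sketch. It assumes Python 3; `starts_like_xml` is a hypothetical helper that is a deliberate simplification of the rules in Section 4.3.3, not MSXML's actual detection logic. The sample bytes are taken from the error message shown later in this article.

```python
import codecs

# First bytes of the UTF-7 encoded file, as quoted in the error message.
UTF7_SAMPLE = b"+ADw-?xml version+AD0AIg-1.0+ACI-?+AD4-"


def starts_like_xml(raw: bytes) -> bool:
    """Rough check: does the file open with a BOM or an XML declaration?"""
    boms = (codecs.BOM_UTF8, codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
    return raw.startswith(boms) or raw.startswith(b"<?xml")


# The raw UTF-7 bytes begin with "+ADw-", so the check fails.
raw_ok = starts_like_xml(UTF7_SAMPLE)

# Once decoded, the text is an ordinary XML declaration.
decoded = UTF7_SAMPLE.decode("utf-7")
decoded_ok = starts_like_xml(decoded.encode("utf-8"))

print(raw_ok)      # False
print(decoded)     # <?xml version="1.0"?>
print(decoded_ok)  # True
```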

Many text editors and word processors allow you to save Unicode text files, known as encoded text in Microsoft Word, in many different transfer encodings, including UTF-7. So if you save a document in Word as "encoded text UTF-7," MSXML will refuse to load it for the above reasons.

Steps to Reproduce Behavior
 1. Create a simple XML file in Word 2000.
 2. Save the file as encoded text. When Word asks whether you want to lose formatting, click Yes. Word then prompts you for an encoding format to use. Select UTF-7, and then save the document as TestUTF7.xml.
 3. Load TestUTF7.xml in Internet Explorer 5. You receive the following error message:

Invalid at the top level of the document. Line 1, Position 1

+ADw-?xml version+AD0AIg-1.0+ACI-?+AD4-.

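
A similar rejection can be reproduced without Word or Internet Explorer. The sketch below does not use MSXML: it assumes Python 3 and its standard `xml.etree.ElementTree` parser as a stand-in, and the `<root/>` element is a hypothetical document body appended to the declaration quoted in the error message. The exact error text differs from the MSXML message.

```python
import xml.etree.ElementTree as ET

# UTF-7 bytes for: <?xml version="1.0"?><root/>
# (the <root/> element is an assumed, minimal document body)
UTF7_DOC = b"+ADw-?xml version+AD0AIg-1.0+ACI-?+AD4-+ADw-root/+AD4-"


def try_parse(raw: bytes) -> str:
    """Return the root tag name, or a message if the parser rejects the bytes."""
    try:
        return ET.fromstring(raw).tag
    except ET.ParseError as exc:
        return f"rejected: {exc}"


# Parsing the raw UTF-7 bytes fails at the very start of the document.
utf7_result = try_parse(UTF7_DOC)

# The same text re-encoded as UTF-8 parses normally.
utf8_result = try_parse(UTF7_DOC.decode("utf-7").encode("utf-8"))

print(utf7_result)
print(utf8_result)
```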

<div class="references_section">