Microsoft KB Archive/251134
Article ID: 251134
Article Last Modified on 5/11/2006
- Microsoft XML Parser 2.0
- Microsoft XML Parser 2.5
This article was previously published under Q251134
When attempting to load an XML file saved as UTF-7 (a transfer encoding format for Unicode), the XML parser in Internet Explorer generates the following error message:
The same error also occurs when using the MSXML parser from server-side or client-side script.
Versions of the MSXML parser prior to MSXML 2.6 do not support UTF-7.
To resolve this problem, save your XML documents as UTF-8, the preferred transfer encoding format for Unicode.
MSXML 2.6 or later supports UTF-7 encoding.
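Where re-saving each document by hand is impractical, the resolution above can be approximated with a short script. The following is a minimal sketch in Python, which is not part of the original article; the function names `utf7_to_utf8` and `convert_file` are hypothetical, and the sketch assumes the input really is UTF-7 text.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def utf7_to_utf8(data: bytes) -> bytes:
    """Decode UTF-7 bytes and re-encode the same text as UTF-8."""
    text = data.decode("utf-7")   # raises UnicodeDecodeError on malformed input
    return text.encode("utf-8")


def convert_file(src: str, dst: str) -> None:
    """Hypothetical helper: read a UTF-7 file and write a UTF-8 copy."""
    converted = utf7_to_utf8(Path(src).read_bytes())
    ET.fromstring(converted)      # sanity check: the result should parse as XML
    Path(dst).write_bytes(converted)


if __name__ == "__main__":
    # '<' and '>' appear here as the UTF-7 sequences +ADw- and +AD4-.
    print(utf7_to_utf8(b"+ADw-MyTag/+AD4-"))
```

If the document carries an encoding declaration that names UTF-7, that declaration would also need to be updated by hand; the sketch does not attempt this.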
This behavior is by design.
Although Unicode is a uniform character set representing nearly all the world's languages, there are many byte representations, or transformation formats, that a Unicode file can use. The most popular format is UTF-8, which represents Unicode characters as a sequence of one to four 8-bit bytes. UTF-7 is a 7-bit transformation format defined to allow Unicode text to pass through mail gateways that assume ASCII and strip out the high bit of text messages.
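To make the idea of transformation formats concrete, the following Python sketch (used here for illustration only; the helper name `byte_forms` is hypothetical) encodes the same characters in UTF-8, UTF-16, and UTF-7 with the standard codecs. Note that UTF-7 output never uses the high bit.

```python
# Illustrative only: the same Unicode characters in three transformation formats.
samples = ["A", "\u00e9", "\u20ac"]   # 'A', e-acute, euro sign


def byte_forms(ch: str) -> dict:
    """Return the UTF-8, UTF-16 (little-endian, no BOM), and UTF-7 bytes for ch."""
    return {
        "utf-8": ch.encode("utf-8"),
        "utf-16-le": ch.encode("utf-16-le"),
        "utf-7": ch.encode("utf-7"),
    }


for ch in samples:
    forms = byte_forms(ch)
    # UTF-8 grows from one to three bytes across these samples,
    # while every UTF-7 byte stays below 0x80.
    print(repr(ch), forms)
```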
According to the XML 1.0 standard, Section 4.3.3, a valid XML file is required to be one of the following:
- A Unicode file in UTF-8 format.
- A Unicode file in UTF-16 format.
- A file in some other character encoding (for example, ASCII) that has as its very first bytes the
UTF-7 does not use the Byte Order Mark. Also, a UTF-7 encoder can convert the special XML character < to the sequence +ADw, which then becomes the first characters of the UTF-7 encoded XML document in place of <. Because this is not compliant with the XML standard, MSXML refuses to load such files.
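The cases described above can be told apart by inspecting the first few bytes of a file. The following Python sketch is a simplified illustration, not how MSXML itself detects encodings; the function name `sniff_xml_start` and its return labels are hypothetical, and it ignores cases such as UTF-16 without a Byte Order Mark or leading whitespace.

```python
import codecs


def sniff_xml_start(data: bytes) -> str:
    """Classify the first bytes of an XML byte stream (simplified)."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-bom"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16-bom"
    if data.startswith(b"<"):
        # Plain '<' first: an ASCII-compatible encoding such as UTF-8.
        return "ascii-compatible"
    if data.startswith(b"+ADw"):
        # '<' was written as the UTF-7 sequence +ADw.
        return "probably-utf-7"
    return "unknown"


if __name__ == "__main__":
    print(sniff_xml_start(b'<?xml version="1.0"?><MyTag/>'))
    print(sniff_xml_start(b"+ADw?xml version"))
```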
Many text editors and word processors allow you to save Unicode text files, known as encoded text in Microsoft Word, in many different transfer encodings, including UTF-7. So if you save a document in Word as "encoded text UTF-7," MSXML will refuse to load it for the above reasons.
Steps to Reproduce Behavior
- Create a simple XML file in Word 2000:
<?xml version="1.0"?>
<MyTag>
  <EmbeddedTag name1="value"/>
</MyTag>
- Save the file as encoded text. When Word asks you if you wish to lose formatting, click Yes. Word will then prompt you for an encoding format to use. Select UTF-7, and then save the document as TestUTF7.xml.
- Load TestUTF7.xml in Internet Explorer 5. You will receive the following error message:
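If Word is not available, a comparable test file can be generated with a script. The following Python sketch is an assumption-laden stand-in for Word's "encoded text" export, not a reproduction of it: the hypothetical `encode_utf7_strict` function base64-encodes every character outside the set that UTF-7 always allows directly, so < becomes +ADw- (with an explicit terminating hyphen). Python's built-in utf-7 codec is not used for encoding because it writes < unchanged.

```python
import base64
from pathlib import Path

# Characters written as themselves: UTF-7 "Set D" plus common whitespace.
DIRECT = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
    "'(),-./:? \t\r\n"
)

SAMPLE = (
    '<?xml version="1.0"?>\n'
    "<MyTag>\n"
    '  <EmbeddedTag name1="value"/>\n'
    "</MyTag>\n"
)


def encode_utf7_strict(text: str) -> bytes:
    """Encode text as UTF-7, shifting every non-direct character to base64."""
    out = []
    run = []  # characters waiting to be written as one base64 run

    def flush():
        if run:
            raw = "".join(run).encode("utf-16-be")
            b64 = base64.b64encode(raw).rstrip(b"=")  # UTF-7 omits padding
            out.append(b"+" + b64 + b"-")
            run.clear()

    for ch in text:
        if ch in DIRECT:
            flush()
            out.append(ch.encode("ascii"))
        else:
            run.append(ch)
    flush()
    return b"".join(out)


def write_test_file(path: str = "TestUTF7.xml") -> None:
    """Write the sample document as UTF-7 bytes."""
    Path(path).write_bytes(encode_utf7_strict(SAMPLE))


if __name__ == "__main__":
    print(encode_utf7_strict(SAMPLE)[:24])
```

Decoding the result with Python's utf-7 codec should give back the original text, which is a quick way to confirm the file is well-formed UTF-7 before loading it in the browser.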
For more information on Unicode, visit the following Web addresses:
http://www.microsoft.com/globaldev/articles/unicode.asp for the latest Microsoft Global Software Development
http://www.unicode.org/ for the latest Unicode Standard.
(c) Microsoft Corporation 2000, All Rights Reserved. Contributions by Jay Andrew Allen, Microsoft Corporation.
Additional query words: unicode utf-7 utf-8 transfer encoding transformation format
Keywords: kbbug kbintl kbintldev kbnofix kbprb kbunicode KB251134