
= PRB: Internet Explorer Removes Quotes when Saving Documents as HTML Complete =

Article ID: 243254

Article Last Modified on 7/20/2001


APPLIES TO


 * Microsoft Internet Explorer 5.0




This article was previously published under Q243254



SYMPTOMS
When you save a Web page by using the Web Page Complete option in Internet Explorer 5, the quotation marks around attribute values in that page may be removed. The same behavior may occur when the page is saved directly from MSHTML, for example through script.



RESOLUTION
To save a Web page in its original format, choose the "HTML Only" option when you save the page.



STATUS
This behavior is by design.



MORE INFORMATION
When you save a Web page by using the Web Page Complete option in Internet Explorer, the page must be completely parsed and rewritten so that it refers to local paths for all of its resources. As the page is rewritten, it is saved in the most efficient form possible. According to the World Wide Web Consortium's (W3C) HTML 4.0 specification, quotation marks around attribute values are recommended, but in certain cases they are not required:

"In certain cases, authors may specify the value of an attribute without any quotation marks. The attribute value may only contain letters (a-z and A-Z), digits (0-9), hyphens (ASCII decimal 45), and periods (ASCII decimal 46). We recommend using quotation marks even when it is possible to eliminate them."

When quotation marks are required, for example when an attribute value contains a space, Internet Explorer includes them in the saved page.
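The following Python sketch illustrates the quoting rule quoted above. It is not Internet Explorer's serializer. The function names `can_omit_quotes` and `format_attribute` are hypothetical, and the permitted character set follows only the HTML 4.0 text quoted in this article (letters, digits, hyphens, and periods).

```python
import re

# HTML 4.0 rule as quoted above: a value may be left unquoted only if it
# consists solely of ASCII letters, digits, hyphens, and periods.
_UNQUOTED_OK = re.compile(r"[A-Za-z0-9.\-]+")


def can_omit_quotes(value: str) -> bool:
    """Return True if the quoted HTML 4.0 rule permits this value without quotes."""
    return _UNQUOTED_OK.fullmatch(value) is not None


def format_attribute(name: str, value: str, minimal: bool = True) -> str:
    """Serialize one attribute (hypothetical helper, for illustration only).

    With minimal=True, quotes are dropped whenever the rule allows it, which
    mimics the kind of output described for Web Page Complete.
    With minimal=False, the value is always double-quoted.
    """
    if minimal and can_omit_quotes(value):
        return f"{name}={value}"
    # Escape characters that would otherwise break a double-quoted value.
    escaped = value.replace("&", "&amp;").replace('"', "&quot;")
    return f'{name}="{escaped}"'


if __name__ == "__main__":
    print(format_attribute("width", "100"))                 # width=100
    print(format_attribute("src", "images/a.gif"))          # src="images/a.gif"
    print(format_attribute("alt", "Home page"))             # alt="Home page"
    print(format_attribute("width", "100", minimal=False))  # width="100"
```

Because "/" and spaces fall outside the permitted set, values such as relative paths or alternate text keep their quotation marks, while plain numbers and simple keywords do not.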

If you are looking to use Internet Explorer as a parsing engine, consider using the DHTML Editing Control instead. This control does not remove the quotation marks from HTML when saving files.

NOTE: HTML documents do not follow the strict rules of the XML specification and are rarely valid as XML documents without conversion. In particular, saving documents from Internet Explorer as Web Page Complete will usually generate HTML documents that are invalid XML, because XML requires all attributes to be quoted. Furthermore, there are a number of other concerns for systems attempting to parse HTML documents using an XML parser. The W3C is currently working on a reformulation of HTML as an XML application named XHTML to help solve these problems.
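As a small illustration of the XML point above, the following Python sketch uses the standard-library `xml.etree.ElementTree` parser to show that a fragment with an unquoted attribute value is rejected, while the same fragment with quotes is accepted. The helper name `is_well_formed_xml` and the sample markup are hypothetical. The check covers well-formedness only; passing it does not mean an HTML document is free of the other concerns mentioned above.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup: str) -> bool:
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Legal HTML 4.0 (unquoted value), but not well-formed XML.
unquoted = "<p align=center>Text</p>"
# Same element with the attribute value quoted.
quoted = '<p align="center">Text</p>'

if __name__ == "__main__":
    print(is_well_formed_xml(unquoted))  # False
    print(is_well_formed_xml(quoted))    # True
```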

