
= PRB: Parsing HTML on Server Using Internet Explorer Components =

Article ID: 244085

Article Last Modified on 7/20/2001

-

APPLIES TO


 * Microsoft Internet Explorer 4.0 128-Bit Edition
 * Microsoft Internet Explorer 4.01 Service Pack 2
 * Microsoft Internet Explorer 4.01 Service Pack 1
 * Microsoft Internet Explorer 5.0

-



This article was previously published under Q244085



SYMPTOMS
It may be desirable to parse HTML files inside a Web server process in response to a browser page request. However, the WebBrowser control, the DHTML Editing Control, MSHTML, and other Internet Explorer components may not function properly in an Active Server Pages (ASP) page or in another application running inside a Web server process.



CAUSE
Internet Explorer and its associated components were not designed or tested to run within the constraints of a Web server process, which requires high performance and runs under a secure user context.



RESOLUTION
Microsoft does not support the use of the WebBrowser control, the DHTML Editing Control (DHTMLED), or MSHTML from inside a Web server (IIS) process. Applications that experience problems with these components should be redesigned to use alternative technologies.



STATUS
This behavior is by design.



MORE INFORMATION
Most Web applications that parse HTML programmatically on the server must accomplish two steps: retrieve the HTML from a remote server, and then parse it.

Retrieving HTML from Server
When retrieving HTML from another server, do not use Internet Explorer components. Every retrieval mechanism in Internet Explorer, and every component built on it (the WebBrowser control, the Internet Transfer Control, and so on), ultimately relies on a low-level client module called WININET to make requests to other Web servers. WININET is not supported in a server context and has a number of known performance problems in that environment. Therefore, any server-side application that needs to parse HTML must either store all necessary HTML locally or use a lower-level networking component or technology, such as the WinSock API or the Visual Basic WinSock Control, to retrieve the HTML file before attempting to parse it. For additional information, see the following article in the Microsoft Knowledge Base:

238425 INFO: WinInet Not Supported for Use in Services
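The following is a rough sketch of retrieving a page without WININET. It opens a plain TCP connection and sends a minimal HTTP/1.0 request. It is written in Python only for brevity; Python's `socket` module is a thin wrapper over the platform socket API (WinSock on Windows). The function names `build_get_request`, `split_response`, and `fetch_html` are hypothetical. The sketch assumes plain HTTP on port 80, no redirects, no TLS, and a server that closes the connection after sending the response.

```python
from __future__ import annotations

import socket


def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.0 GET request that asks the server to close the connection afterwards."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",
        "Connection: close",
        "",  # an empty line terminates the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def split_response(raw: bytes) -> tuple[int, dict[str, str], bytes]:
    """Split a raw HTTP response into (status code, lowercase header map, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    header_lines = head.decode("iso-8859-1").split("\r\n")
    status = int(header_lines[0].split()[1])  # e.g. "HTTP/1.0 200 OK" -> 200
    headers: dict[str, str] = {}
    for line in header_lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers, body


def fetch_html(host: str, path: str = "/", port: int = 80, timeout: float = 10.0) -> bytes:
    """Retrieve a page over a plain TCP socket (no WININET) and return the body bytes."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_get_request(host, path))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # server closed the connection: response is complete
                break
            chunks.append(chunk)
    status, _headers, body = split_response(b"".join(chunks))
    if status != 200:
        raise RuntimeError(f"unexpected HTTP status {status}")
    return body
```

A real implementation would also need to handle redirects, HTTPS, and character-set detection. Its timeouts should fit within the server's own per-request budget.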

In general, downloading data from another Web server adds latency to every request that depends on it, which is undesirable in a typical Web application. High-performance server applications should use an alternative design that avoids this delay.

Parsing HTML
Parsing HTML is also a time-consuming operation, though less so than retrieving HTML from other Web servers. Web application developers should consider their design very carefully before creating an application that must parse HTML on every request.
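One way to avoid parsing on every request is to parse each distinct document once and reuse the result. The sketch below uses a hypothetical `ParseCache` class, written in Python, and keys stored results by a SHA-256 hash of the HTML. The expensive parse function therefore runs only when the content changes. Hashing still reads the whole document, but it is generally cheaper than parsing it.

```python
import hashlib
from typing import Callable, Dict, Generic, TypeVar

T = TypeVar("T")


class ParseCache(Generic[T]):
    """Memoize an expensive parse step, keyed by a hash of the HTML content."""

    def __init__(self, parse: Callable[[str], T]) -> None:
        self._parse = parse
        self._results: Dict[str, T] = {}
        self.parse_calls = 0  # number of times the real parser actually ran

    def get(self, html: str) -> T:
        key = hashlib.sha256(html.encode("utf-8")).hexdigest()
        if key not in self._results:
            # Content not seen before: run the expensive parser once and store the result.
            self.parse_calls += 1
            self._results[key] = self._parse(html)
        return self._results[key]
```

As written, the cache grows without bound and is not synchronized, so concurrent requests may occasionally parse the same document twice. A real server would add an eviction policy and, if needed, a lock.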

Microsoft offers a number of components that developers can reuse to parse HTML in their own applications, either with a user interface (UI) for editing or as a simple parser without a user interface. The DHTML Editing Control is probably the best choice for this job.

However, none of the HTML parsing technologies that Microsoft currently offers has been designed or tested to work in a high-performance server context, and they may perform poorly, especially on heavily used Web servers. Developers who experience problems with these technologies in such environments should consider writing custom HTML parsing code that is optimized to extract only the information the application needs from the HTML. This approach yields the best performance in any scenario.
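The following sketch illustrates parsing only what is needed. It extracts just the text of the first `<title>` element and stops feeding input once the title has been found, rather than building a full document object model. It uses Python's standard `html.parser.HTMLParser` as a stand-in for custom parsing code. It is not one of the Microsoft components discussed above, and `TitleExtractor` and `extract_title` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TitleExtractor(HTMLParser):
    """Collect only the text of the first <title> element; all other markup is ignored."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self._parts: List[str] = []
        self.done = False  # set once </title> has been seen

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self.done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)  # title text may arrive in several pieces

    @property
    def title(self) -> Optional[str]:
        return "".join(self._parts).strip() if self.done else None


def extract_title(html: str, chunk_size: int = 1024) -> Optional[str]:
    """Feed the document in chunks and stop as soon as the title is complete."""
    parser = TitleExtractor()
    for start in range(0, len(html), chunk_size):
        parser.feed(html[start:start + chunk_size])
        if parser.done:
            break  # skip the rest of the document entirely
    return parser.title
```

Because input stops once `</title>` is seen, the cost depends on how far into the document the title appears rather than on the total page size.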

Keywords: kbprb kbfaq KB244085

-


© Microsoft Corporation. All rights reserved.