Microsoft KB Archive/258089

From BetaArchive Wiki

Article ID: 258089

Article Last Modified on 1/23/2007



APPLIES TO

  • Microsoft Site Server 3.0 Standard Edition



This article was previously published under Q258089

SUMMARY

This article is designed to help you address some of the most common problems that users run into when configuring Site Server 3.0 Usage Import. This article contains an overview of some of the common settings in Usage Import, and in some sections, discusses the impact that certain settings have on reports.

This article does not cover every setting in Usage Import, and should not be used as the sole source of information for configuring Usage Import. It is intended only as a supplement to the Microsoft Site Server documentation.

MORE INFORMATION

Site Server Usage Import is a utility that is included in Site Server 3.0. Usage Import is used to import Web log files into a database so that the log information can be reported. To get accurate information from your reports, you must accurately import the information from the log files into the database. The accuracy of the import depends on properly configuring the Usage Import settings before importing the log files. Two areas of Usage Import contain the configuration settings that are used during an import: the first area discussed in this article is the Server Manager, and the second is the Tools/Options menu settings. Both areas work in unison to accurately extract the Web site activity from a log file and store it in the database.

Server Manager

To begin, it is useful to understand what the different elements in Server Manager represent.

Log Data Sources

Log Data Sources is the root container for all the Web sites and log files that are contained in the database. The Log Data Sources container represents a single Analysis database.

Log Data Source

Note that Log Data Source is not the same as Log Data Sources. Log Data Source is singular, and it is preceded by an icon that looks like a scroll of paper. A Log Data Source represents a single group of log files for a Web site. For example, Internet Information Server (IIS) 4.0 keeps the log files for each Web site that you create in an individual folder. By default, the default Web site writes its log files to \%Windir%\System32\Logfiles\W3svc1.

A Log Data Source for the default Web site equates to all log files contained in the preceding folder.

To create a new log data source, right-click Log Data Sources, and then click New Log Data Source. In the Select your log file format window, Microsoft recommends that you choose Auto Detect. Auto Detect reads multiple lines from the log file, and then determines the format in which those lines were saved. Auto Detect is extremely accurate in determining the log file format.
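
Site Server does not document exactly how Auto Detect makes its decision, but the idea is to sample a few lines and look for format markers. The following Python sketch is an illustration of that idea only, assuming two common formats (a W3C extended log with a "#Fields:" directive, and an NCSA common log with a quoted request line); the function name and the two-format choice are assumptions for the example, not Site Server's actual detection code.

    # Illustration only -- guess a log format by sampling the first few lines.
    def guess_log_format(path, sample_size=10):
        with open(path, "r", errors="replace") as log:
            lines = [log.readline() for _ in range(sample_size)]

        # W3C extended logs (the IIS 4.0 default) begin with directive lines
        # such as "#Fields: date time c-ip cs-uri-stem ...".
        if any(line.startswith("#Fields:") for line in lines):
            return "W3C Extended"

        # NCSA common log lines look like:
        # 192.168.0.1 - user [10/Oct/1999:13:55:36 -0700] "GET / HTTP/1.0" 200 2326
        if any('"' in line and "[" in line for line in lines if line.strip()):
            return "NCSA Common"

        return "Unknown"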

Server

The server equates to all requests in a log file that match a single IP address and IP Port, or all IP addresses that are associated with a single IP Port. Servers are preceded by a spider icon.

The Server properties contain the following six options that need configuration:

  • Server Type: The server type describes the type of log file that Usage Import imports. The choices are World Wide Web, Proxy, FTP, and Streaming Media. Choose the server type that created the log file.
  • Directory Index Files: List the files that your Web server uses as the default file when a client does not request a specific file in the URL. The most common index file names are Default.htm, Default.html, Default.asp, Index.htm, and Index.html. If you are not certain which files are used on your site, contact the content developer of the site, who should be able to provide you with this information.
  • IP Address: This field may be optional depending on the information that you want to report, how your Web site is configured, and how your Web server logs requests to a log file. Most log files contain the IP address of the server that a request is intended for. Usage Import uses the IP Address field to filter all activity (requests) to and from a specific IP address out of the log file. If your log file contains requests only to a single Web server and you want to report on all activity on that Web server regardless of the IP address on which a request arrives, you can leave IP Address blank. If your Web site is multi-homed (multiple IP addresses bound to the same Web site) and you want to report on requests based on the IP address they came in on, create a server instance for each IP address, and then enter in each server instance one of the multi-homed IP addresses bound to the Web site. This setting is also used when you need to separate the requests for a single Web site from a log that contains requests for multiple Web sites. Some Web servers log the activity for all of their Web sites in a single log file; Internet Information Server (IIS) 3.0 works this way. In IIS 3.0, if you have multiple sites bound to different IP addresses, the logging information for all the sites is contained in one log file. To separate one Web site's log information from another's, create a server instance for each Web site, and then enter the IP address bound to each Web site in IP Address. By placing the IP address in the IP Address field, Usage Import can extract the requests specific to that IP address and import them into the database. You can then run reports against a specific server instance to get information that pertains only to one individual Web site. A minimal sketch of this kind of filtering appears after this list.
  • IP Port: This is used to specify the IP port that your Web site is bound to. For instance, if your Web site is bound to port 8080, then you would enter 8080. When you have a Web server that is configured to handle both SSL and non-SSL communications, create two server instances. In the first server instance, enter the non-SSL port that you have configured for your site. In the second server instance, enter the SSL port that you have configured for your Web site.
  • Local Timezone: This setting is a common area for mistakes, because it does not mean what most people think. The local time zone referred to here is not the time zone in which the Web server resides, nor is it the time zone from which the administrator is running Analysis. It is the time zone to which the time stamp on each request in the log file is set. For example, IIS stamps all of its entries in the log files with a time based on Greenwich Mean Time (GMT). For IIS-created log files, Microsoft recommends that you set Local Timezone to GMT 00:00 Monrovia, Casablanca. The difference between Greenwich Mean Time and Monrovia, Casablanca is that Monrovia, Casablanca does not adjust for Daylight Saving Time. This keeps your time values constant all year round, whether your system adjusts to Daylight Saving Time or not.
  • Local Domain: For this setting, enter the local domain of the Web server hosting facility. For example, if your Web site is www.microsoft.com, then you would enter microsoft.com as the local domain.
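
To illustrate how the IP Address and IP Port settings partition a shared log file (see the IP Address item above), the following Python sketch keeps only the records that match one server instance. It assumes a W3C extended log whose "#Fields:" line includes s-ip and s-port, and the function name is an illustrative assumption; this is not Site Server's import code, and real field layouts vary.

    # Illustration only -- keep the records for one server IP address and port.
    def filter_by_server(lines, server_ip, server_port):
        field_names = []
        for line in lines:
            if line.startswith("#Fields:"):
                field_names = line.split()[1:]   # for example: date time s-ip s-port cs-uri-stem ...
                continue
            if line.startswith("#") or not line.strip():
                continue
            record = dict(zip(field_names, line.split()))
            if record.get("s-ip") == server_ip and record.get("s-port") == str(server_port):
                yield record

    # Example: keep only requests to 10.0.0.5 on the non-SSL port 80.
    # matching = list(filter_by_server(open("example.log"), "10.0.0.5", 80))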

Site

The Site property represents the logical grouping of Web site content in the log file. Sites are preceded by a ball icon.

The Site property contains six sections. Of these six sections, this article covers only Basic, Excludes, Inferences, and Query Strings. For information regarding the Advertising and File Names sections, see the Site Server online documentation.

  • Basic

    • Home Page URLs: The Home Page URL is a required field. This field is a major part of the method used by Usage Import to determine visits to a site.

      For accurate visit counts to your site, it is essential that you enter all possible DNS names, NetBIOS names, and IP addresses that point to this Web site. If a DNS name, NetBIOS name, or IP address for the Web site is missing from this field, the visit count may be inaccurate. Entries in this field must be in the following format:

      HTTP://<name or IP address>

      Multiple entries must be separated by a space. Note that this field only holds up to 255 characters (including spaces). After 255 characters, the remaining characters are truncated.

      A good sign that this field is improperly configured is when the visit count is nearly as high as the hit count in reports.
    • URL Path Extensions for This Site: This field is used to partition content during an import.

      URL path extensions refer to virtual directories under the Web site. These virtual directories are sometimes referred to as virtual sites, sites, sub-Webs, and child Webs. In reality, they are simply directories under a root Web site. If this field is left blank, then all information for the entire root Web is imported under this site. For example, the URL path extension for the Web site http://www.microsoft.com/products would be /products/*. This entry informs Usage Import to take all the information pertaining to requests that reference the products directory and associate it with this particular site.

      The only time you need to enter a URL path extension is when you need to run the Bandwidth Report. Except for the Bandwidth Report, all other reports can be customized to report on individual sites by setting a filter on the specific report in which the dimension is <filename>, the Boolean operator is =, and the value is the <URL path extension> (that is, <filename> = "/products/*").

      If you do not need Bandwidth Reports, leave this field blank, and filter out your site on the reports.
  • Excludes

    • Hosts to Exclude: The Hosts to Exclude field informs Usage Import to ignore requests from the specified client IP addresses during the import.

      This field is often used to exclude activity on a Web site from internal Web support personnel, but it can be used to filter out hits from an IP address for any reason you choose. For example, if you have developers who are accessing the Web sites and testing their code, or uploading code and applications, you may not want to include them in your site statistics when you run reports. You can exclude all of the developers' activity that is captured in the Web log files by excluding the developers' IP addresses or domain names. This field allows wildcards so that you can exclude an entire range of IP addresses instead of entering each IP address individually. The same is true for domain names. Entries must be separated by a space.

      NOTE: Entries should not be enclosed in quotes as shown in the online documentation.
    • File Types to Exclude: File Types to Exclude is the primary component in the method used by Usage Import to determine requests.

      When a user makes a request for a page from a Web server, there are usually a number of supporting files that must be sent back to the client's browser. These pages may contain images, image maps, or other types of supporting files that are needed to present the complete requested page in the client's browser. For example, a request for a file named Default.htm may result in ten requests (one request for Default.htm and nine more requests for the supporting .gif or .jpg inline images) so that Default.htm appears properly in the client's browser. Although ten requests were made to the Web server, the user only requested one file, Default.htm. To accurately import the requests that users make, you need to exclude requests made for supporting files such as .gif or .jpg files. The final result is one request for the Default.htm file being logged during the import, with ten hits being logged to service that one request. The sketch after this list illustrates this counting.

      This field comes pre-configured to cover the most common inline image file types. To make sure that your request statistics are accurate in Analysis, it is important that you list all supporting file types for the Web site in this field. This field can contain specific file names, such as MyImageMap.mfa, or it can accept wildcards, such as *.mfa. Files and file extensions must be separated by a space.

      NOTE: Do not put quotation marks around the file names or extensions as the online documentation states.
  • Inferences

    • Request Algorithm: This item is used to help Analysis compensate for caching on the network. Proxy servers and client browsers can cache content, which means that a user can request a file from your Web site, but your Web site may never receive that request. The request may be served from the client browser's local cache or from a proxy server's cache. This setting tries to compensate for the caching that occurs on networks. It informs Usage Import to try to determine which requests in a visit may have been served from a cache, and to insert the missing requests into the visit sequence.
    • Visit Algorithm: This item is used to determine when a visit ends. The Internet is a stateless environment, and users do not generally log off of a Web session. To determine when a user has stopped using your site, you need to set up a timeout for Usage Import. With this setting enabled, Usage Import can determine that a user's visit is over when no activity has occurred for the configured number of minutes. If the user resumes activity on the Web site after the configured amount of idle time has passed, a new visit is started for the user. A sketch of this idle-timeout logic appears after this list.
    • User Algorithm: If you use cookies to identify users, you can leave this item unchecked. Users' visits are tracked by their logged-on user names. If your site, or a portion of your site, allows multiple users to log on with the same user name, you must check this option. When this option is checked, Usage Import does not rely on user names, but instead relies only on IP addresses to track user visits. For example, if your Web site has an area that everyone can access, but they all log on as Guest, then everyone accessing that location appears in the log files as the Guest user. To distinguish between those users, you need to look at the individual client IP addresses that are accessing that location.

      NOTE: The preceding information does not apply to Anonymous user access to your site. Anonymous access is a special case in that the log file does not register any user name in the logs for anonymous user hits. Anonymous user hits simply contain a dash in the user name field of a log file. When no user name is in the user name field of the log file, Usage Import automatically uses the client IP address for user-visit tracking.
  • Query Strings

    • Filesystem paths (URIs) containing query strings to import: This field is used to enter the URI of query strings that you want to import. For example, if there is a URI query string, such as /marketing/usersfavoritecolor.asp, for which you want to track the requests, enter the entire URI query string into the Filesystem paths (URIs) containing query strings to import field. If you have multiple query strings, enter each one, separated by a space. This field can also use wildcards. If you have multiple query strings under a subdirectory, you can enter /directory/* to import all query strings that come from that directory. URI query strings can be reported in Report Writer through the Query String Dimension.

      NOTE: Do not place quotation marks around the query strings as stated in the online documentation. Also, the URI query string cannot have a space in the string. For example /helpdesk/cgi-bin/* is a valid URI query string, but /help desk/cgi-bin/* is not a valid URI query string.
    • Names of single-value query parameters to parse: This item is used to import query parameters and their values into the database. For example, if you configure Filesystem paths (URIs) containing query strings to import with the URI query string /marketing/usersfavoritecolor.asp, and this dynamic page takes one parameter named Color, then Color would have a single value, such as Red, Blue, or Green, when the client submits its response to the Usersfavoritecolor.asp page. The URI stem in the log file would be as follows:

      /marketing/usersfavoritecolor.asp?Color=Red

      Color is a single-value query parameter. Although there are multiple possible values for the parameter Color, it can only hold a single value at a time. To add query parameters to this field, speak with the developer of the site to determine the query parameters that are used by the query strings. You can then enter the query parameters, separated by spaces, into this field. For this example, you would enter Color. Do not enter the values that the query parameter can have, only the query parameter itself. These query parameters become dimensions in Report Writer that you can report on after your import is complete. In other words, from this example, you could run a report on Top 10 Colors and get a report that shows how many times each color was chosen as the users' favorite color.

      NOTE: Do not place quotation marks around the query parameters as stated in the online documentation. In addition, you cannot use query parameters that match any of the default dimension names already used by Analysis. See the "Managing Query Strings" topic in the online documentation for other special character information. Take caution, however, because once a query string and/or query string parameter have been imported, they cannot be deleted or edited.
    • Names of multi-value query parameters to parse: Multi-value query parameters are somewhat similar to single-value query parameters. Unlike a single-value query parameter, which contains only one value (as in the preceding Color example), a multi-value query parameter can contain multiple values for a single query parameter. The following example uses a dynamic page. If Usersfavoritehobbies.asp takes a multi-value query parameter named Hobbies, you may have a URI query string similar to the following:

      /marketing/usersfavoritehobbies.asp?hobbies=Golf+Tennis+Skiing

      Hobbies is an example of a multi-value query parameter. A short sketch of parsing both single-value and multi-value parameters appears after this list.

      NOTE: Do not place quotation marks around the query parameters as stated in the online documentation. In addition, you cannot use query parameters that match any of the default dimension names already used by Analysis. See the "Managing Query Strings" topic in the online documentation for other special character information. Take caution, however, because once a query string or a query string parameter has been imported, it cannot be deleted or edited.
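
The request and visit counting described under File Types to Exclude and the Visit Algorithm can be summarized in a short sketch. The following Python code is an illustration of those two ideas only, assuming a 30-minute idle timeout, an example list of supporting file types, and a simple (client, timestamp, URI) record layout; the function name and values are assumptions, not Site Server's algorithm.

    from datetime import timedelta

    # Illustration only -- count requests and visits from a stream of hits.
    EXCLUDED_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".png", ".css"}   # supporting file types
    VISIT_TIMEOUT = timedelta(minutes=30)                             # assumed idle timeout

    def summarize(hits):
        """hits: (client_ip, timestamp, uri) tuples, sorted by timestamp."""
        requests = 0
        visits = 0
        last_seen = {}   # client_ip -> timestamp of that client's last request
        for client, when, uri in hits:
            # Every log record is a hit; only non-supporting files count as requests.
            if any(uri.lower().endswith(ext) for ext in EXCLUDED_EXTENSIONS):
                continue
            requests += 1
            # A new visit starts when the client has been idle longer than the timeout.
            previous = last_seen.get(client)
            if previous is None or when - previous > VISIT_TIMEOUT:
                visits += 1
            last_seen[client] = when
        return requests, visits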
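
The Query Strings settings can also be illustrated with a short parsing sketch. The following Python code pulls single-value and multi-value parameters out of a logged URI, using the Color and hobbies examples above; the function name and the parameter lists are assumptions for the example, not part of Site Server.

    from urllib.parse import urlsplit, parse_qs

    # Illustration only -- extract configured query parameters from a logged URI.
    SINGLE_VALUE_PARAMS = {"Color"}
    MULTI_VALUE_PARAMS = {"hobbies"}

    def parse_query(uri):
        values_by_name = parse_qs(urlsplit(uri).query)
        parsed = {}
        for name, values in values_by_name.items():
            if name in SINGLE_VALUE_PARAMS:
                parsed[name] = values[0]          # for example, "Red"
            elif name in MULTI_VALUE_PARAMS:
                # Multi-value parameters pack several values into one field,
                # separated by "+" (which parse_qs decodes to spaces).
                parsed[name] = values[0].split()  # for example, ["Golf", "Tennis", "Skiing"]
        return parsed

    # parse_query("/marketing/usersfavoritecolor.asp?Color=Red")
    #     returns {"Color": "Red"}
    # parse_query("/marketing/usersfavoritehobbies.asp?hobbies=Golf+Tennis+Skiing")
    #     returns {"hobbies": ["Golf", "Tennis", "Skiing"]}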

Import Options Configuration

The second part to properly configuring Usage Import involves configuring the values on the Tools/Options menu of Usage Import. To access the Options portion of Usage Import, click the Tools menu, and then click Options.

The Import Options window contains eight tabs. This section discusses the options that are located on each tab.

Import Tab

  • Drop database indexes before import: This option informs Usage Import to drop the database indexes before importing, and then add them back after the import is complete. If your database is small, you should select this option to increase the speed of the import. When your database becomes larger and the import is only a small percentage of the size of the database, this option should be unchecked.
  • Adjust requests timestamps to: This option provides Usage Import with the proper time zone information to adjust the requests before saving them to the database. This is where you set the local time zone in which the physical Web server resides, so that reports show the actual local time at which the requests were received and handled by the Web server.
  • Use cookies for inferences: If your Web site uses cookies for inferences regarding users, select this option. The Use cookies for inferences option informs Usage Import to track users by their cookie instead of their username or IP address. If you do not use cookies, leave this option unchecked.
  • Save query strings with URI: If you configured values for the Single-value or Multi-value query strings in the Site Properties, you need to select this check box. Usage Import does not save the query string values associated with a URI unless this check box is selected. If you are not reporting on query string values, leave this option unchecked.
  • Start day of week: Report Writer uses this value when you run a report that uses the Weekday or Week dimension. It informs Report Writer what day to use as the start of a week.
  • After Import: The options that you select under this heading automatically occur after every import that Usage Import performs. It is usually best practice not to automatically perform these functions after every import. These functions can be very time and resource intensive. Most often, Microsoft recommends that you disable these functions from running after every import, and then run them manually or by a scheduled batch job.


    • Lookup unknown HTML file titles: This option associates a title with every file in the database that has an HTML Title Tag. For example, if you have a Title Tag with the value "My Company's Home Page" in your Default.htm page, that title is stored in the database with an association to Default.htm. When you run reports, the reports show more meaningful document titles instead of just document names.
    • Resolve IP addresses: This function attempts to resolve an IP address into a domain name. There is no guarantee that an IP address can be resolved into a domain name. DNS provides the reverse lookup information on IP addresses; if an IP address is not configured with a PTR record in DNS, the IP address cannot be resolved into a corresponding domain name. Usage Import has no control over the number of IP addresses that get resolved. Usage Import sends the IP addresses to the DNS server, and the DNS server sends back the addresses that it can resolve. A minimal reverse-lookup sketch appears after this list.
    • Whois query for unknown domains: This function is used to find organization information for registered domain names. Network Solutions and other organizations provide servers that maintain information on organizations based on the domain names that they have registered. Not all domain names on the Internet have organization information registered, but nearly every major organization in the world does. Just as with IP resolution, Usage Import has no control over the information that it receives from the whois servers. Usage Import requests the organization information for a domain name from a whois server, and then imports the information that the whois server returns.
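
As noted under Resolve IP addresses above, resolution depends entirely on PTR records in DNS. The following Python sketch shows the kind of reverse lookup involved; it is an illustration only, and the function name is an assumption, not the mechanism Usage Import uses internally.

    import socket

    # Illustration only -- reverse (PTR) lookup of a client IP address.
    def resolve_ip(ip_address):
        try:
            host_name, _, _ = socket.gethostbyaddr(ip_address)
            return host_name
        except (socket.herror, socket.gaierror):
            return None   # no PTR record registered, or the lookup failed

    # resolve_ip("192.0.2.10") returns a host name, or None if it cannot be resolved.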

IP Resolution

  • Cache IP resolutions for: This setting informs Usage Import when it should attempt again to resolve IP addresses that previously failed to resolve. If an IP address fails to resolve, it is saved in a cache. Usage Import attempts to resolve that IP address again, but only after the number of days in this setting has elapsed. Do not set this value too low; doing so can drastically increase the number of IP addresses that Usage Import needs to resolve. To force Usage Import to resolve all IP addresses contained in the cache, set this value to zero.
  • Timeout a resolution attempt after: This setting specifies the amount of time that IP Resolution attempts to resolve an IP address before considering it unresolved. Increasing this setting increases the time that it takes to run IP Resolution, but it may allow Usage Import to resolve more IP addresses.
  • Use a resolution batch size of: IP addresses are sent to the DNS servers in batches. It is important that you do not send too many IP addresses to the DNS server at one time, because doing so can overload the DNS server. On the other hand, sending too few IP addresses per batch makes the resolution process very slow. A general rule of thumb is to start at 200 and increase or decrease the value as appropriate.
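
A minimal sketch of how a failed-lookup cache and a resolution batch size might interact is shown below. The cache window, the batch size, and the helper names are assumptions for illustration; this is not Site Server's implementation.

    import time

    # Illustration only -- retry failed lookups after a cache window, in fixed batches.
    CACHE_DAYS = 30          # "Cache IP resolutions for"
    BATCH_SIZE = 200         # "Use a resolution batch size of"
    failed_lookups = {}      # ip -> time.time() when the last attempt failed

    def needs_lookup(ip):
        failed_at = failed_lookups.get(ip)
        if failed_at is None:
            return True
        # Retry only after the cache window expires; CACHE_DAYS = 0 forces a retry.
        return time.time() - failed_at > CACHE_DAYS * 86400

    def resolution_batches(ip_addresses):
        pending = [ip for ip in ip_addresses if needs_lookup(ip)]
        for start in range(0, len(pending), BATCH_SIZE):
            yield pending[start:start + BATCH_SIZE]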

Log File Overlaps

  • To be considered an overlap, records must overlap by at least: This option is used to handle importing the same log file more than once. Usage Import uses this value to determine how many requests, within the specified time window, must match requests that are already in the Analysis database. For example, if all the requests in the first 30 minutes of the log file match requests that are already in the database, the log file is considered an overlap.
  • If an overlap is detected: This setting informs Usage Import what action to perform when an overlap is detected. There are four actions that Usage Import can perform when an overlap is detected:
    • Discard Records and Proceed: This option informs Usage Import to discard all the overlapping requests and proceed to import any requests that do not overlap.
    • Import All: This option informs Usage Import to ignore the overlap and import the requests anyway, even if they are duplicates.
    • Stop the Import: This option informs Usage Import to stop importing the specific log file that contains the overlap. If there are other logs in the import list, Usage Import continues importing the next log file in the list.
    • Stop All Imports: This option informs Usage Import to stop importing the specific log file that contains the overlap. If there are other logs in the import list, Usage Import does not import them either.
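
The overlap check and the four actions can be summarized in a short sketch. The following Python code is an illustration only, assuming a 30-minute overlap window, a simple (timestamp, client, URI) record layout, and a set of records standing in for the Analysis database; the function names and action strings are assumptions, not Site Server's implementation.

    from datetime import timedelta

    # Illustration only -- detect an overlapping log file and apply the configured action.
    OVERLAP_WINDOW = timedelta(minutes=30)
    ON_OVERLAP = "discard"   # "discard", "import_all", "stop_file", or "stop_all"

    def is_overlap(new_records, existing_records):
        """new_records: (timestamp, client_ip, uri) tuples sorted by timestamp;
        existing_records: a set of records already in the database."""
        if not new_records:
            return False
        window_end = new_records[0][0] + OVERLAP_WINDOW
        head = [record for record in new_records if record[0] <= window_end]
        return all(record in existing_records for record in head)

    def import_log(new_records, existing_records):
        if is_overlap(new_records, existing_records):
            if ON_OVERLAP == "discard":
                return [r for r in new_records if r not in existing_records]
            if ON_OVERLAP in ("stop_file", "stop_all"):
                return []
        return new_records   # no overlap, or "import_all"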

Default Directory

  • Log files: This setting specifies the default directory that is displayed when you click the Browse button in the Import Manager window. For example, if your log files are stored in \Winnt\System32\Logfiles, set this value to \Winnt\System32\Logfiles so that you do not have to browse to that directory every time. When you click Browse in Import Manager, you then start in the Logfiles directory.

IP Servers

  • HTTP Proxy & HTTP Port: Usage Import uses this setting when you run an HTML File Titles Lookup. If you need to go through a proxy server to access the site that you are doing a File Titles Lookup on, you need to enter the proxy server name and port number in this field. If you are uncertain of the proxy server name or port number, check the connection settings in your Web browser.
  • FTP Proxy & FTP Port: If you have to go through a proxy server to connect to the FTP server on the Internet, you need to specify a proxy server and port to connect to the FTP server.
  • Local domain of DNS server: This setting is used to specify the domain name of your DNS server. This entry is necessary if you access your DNS server through a proxy server and you want to do IP resolutions.

Crawler List

  • Exclude crawlers: This setting informs Usage Import to exclude requests from Internet crawlers (also referred to as spiders and robots) from being imported into the Analysis database. Crawlers are used by search engines to catalog what is contained in a Web site. They do this by accessing every page on a Web site. This type of activity is not something most Web administrators want to report on. When this option is enabled, Usage Import ignores requests that come from crawlers listed in the Crawler user agent strings list.
  • Crawler user agent strings: This list includes many of the major Web auditing organizations on the Internet today. Proper Internet etiquette requires that a crawler include a word in its user agent string that identifies it as a crawler; spider and bot are two examples of such words. When Usage Import matches one of the words in the user agent string to one listed in the Crawler user agent strings list, it determines that the request came from a search engine crawler. Based on the Exclude crawlers setting, Usage Import either discards that request or processes it like a usual client request.
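
The crawler matching described above amounts to a simple substring test against the user agent. The following Python sketch illustrates it with a small assumed sample of crawler strings; the actual list shipped with Usage Import is longer, and the function name is an assumption for the example.

    # Illustration only -- treat a request as crawler traffic if its user agent
    # contains any of the configured crawler strings.
    CRAWLER_STRINGS = ("spider", "bot", "crawler", "slurp")

    def is_crawler(user_agent):
        agent = (user_agent or "").lower()
        return any(marker in agent for marker in CRAWLER_STRINGS)

    # is_crawler("Googlebot/2.1") returns True
    # is_crawler("Mozilla/4.0 (compatible; MSIE 5.0)") returns False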

Intranet

  • Number of domain parts to parse beyond Internet organization: This option informs Usage Import how many parts (subdomains) to the left of the domain name it should save as part of the domain name. By default, Usage Import saves two parts of a domain name. For example, in a large intranet setting, you may have subdomains within the domain name microsoft.com, such as marketing.microsoft.com or sales.microsoft.com. If you want Usage Import to include the subdomains when it imports the domain name, change this setting from zero to the number of subdomain parts to include with the domain name.
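
A minimal sketch of this trimming, assuming the two-part default plus a configurable number of extra subdomain parts, is shown below; the function name is illustrative and not part of Site Server.

    # Illustration only -- keep the Internet organization (two parts) plus any
    # extra subdomain parts allowed by the Intranet setting.
    def trim_domain(host_name, extra_parts=0):
        parts = host_name.split(".")
        keep = 2 + extra_parts
        return ".".join(parts[-keep:]) if len(parts) > keep else host_name

    # trim_domain("sales.corp.microsoft.com")                returns "microsoft.com"
    # trim_domain("sales.corp.microsoft.com", extra_parts=1) returns "corp.microsoft.com"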

Clear Cache

The Clear Cache button only works if you have selected the Store open visits to a cache option. If you import a log file, and then decide to delete the log file import and re-import it again, you must clear out the cache before performing the import. If you do not clear out the cache, you will have twice as many open visits in the cache, which are duplicates.

Additional Information

This article is in no way intended to be an in-depth comprehensive explanation of Site Server 3.0 Usage Import. It is intended to help answer some of the common questions that are asked regarding the settings in Usage Import. For additional information on each item referenced in this article, please see the Microsoft Site Server documentation.

Keywords: kbfix kbhowto KB258089