Microsoft KB Archive/318747

= Large Text Files Are Not Fully Indexed =

Article ID: 318747

Article Last Modified on 2/28/2007

-

APPLIES TO


 * Microsoft SharePoint Portal Server 2001

-



This article was previously published under Q318747



IMPORTANT: This article contains information about modifying the registry. Before you modify the registry, make sure to back it up and make sure that you understand how to restore the registry if a problem occurs. For information about how to back up, restore, and edit the registry, click the following article number to view the article in the Microsoft Knowledge Base:

256986 Description of the Microsoft Windows Registry



SYMPTOMS
If you crawl documents on a computer that is running SharePoint Portal Server, large text files may not be fully indexed.

The Microsoft Search service may log an error message in the Microsoft Windows Event Viewer Application event log that is similar to:

Event Type: Warning

Event Source: Microsoft Search

Event Category: Gatherer

Event ID: 3035

Date: 1/1/2002

Time: 12:00:00 PM

User: N/A

Computer: COMPUTERNAME

Description:

One or more warnings or errors were logged to file . If you are interested in these messages, please, look at the file using the gatherer log query object (gthrlog.vbs, log viewer web page).

Context: SharePointPortalServer Application, WORKSPACE Catalog

The Content Source log may also contain error messages that are similar to:

Time: 1/1/2002 12:00:00 PM

Type: Document Added

Message: Error fetching URL, (8004173e - The document was too large to filter in its entirety. Portions of the document were not emitted.)

URL: file://./backofficestorage/localhost/sharepoint portal server/workspaces/HOME/Do...

Time: 1/1/2002 12:00:00 PM

Type: Document Added

Message: Error fetching URL, (8004173e - The document was too large to filter in its entirety. Portions of the document were not emitted.)

URL: \\.\backofficestorage\localhost\sharepoint portal server\workspaces\HOME\documen...

NOTE: To view the Content Source log for a workspace, browse to the following URL on your SharePoint Portal Server computer (where SERVERNAME is the name of your SharePoint Portal Server computer, and WORKSPACE is the name of your workspace):

http://SERVERNAME/WORKSPACE/portal/resources/updatelog.asp?Workspace=WORKSPACE
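If the Content Source log is long, the affected documents can be picked out by the error code 8004173e. The following is a minimal Python sketch, assuming the log entries have been copied as plain text in the Time/Type/Message/URL layout shown above; the function names are illustrative and not part of SharePoint Portal Server.

```python
import re

TOO_LARGE_CODE = "8004173e"  # error code quoted in the log messages above


def parse_entries(log_text):
    """Split copied log text into dicts keyed by Time, Type, Message, and URL."""
    entries, current = [], {}
    for line in log_text.splitlines():
        match = re.match(r"^(Time|Type|Message|URL):\s*(.*)$", line.strip())
        if not match:
            continue  # skip blank lines and anything that is not a known field
        field, value = match.groups()
        if field == "Time" and current:
            # A new "Time:" line starts the next entry.
            entries.append(current)
            current = {}
        current[field] = value
    if current:
        entries.append(current)
    return entries


def too_large_urls(log_text):
    """Return the URLs of entries whose message carries the too-large error code."""
    return [
        entry.get("URL", "")
        for entry in parse_entries(log_text)
        if TOO_LARGE_CODE in entry.get("Message", "").lower()
    ]


sample = """\
Time: 1/1/2002 12:00:00 PM
Type: Document Added
Message: Error fetching URL, (8004173e - The document was too large to filter in its entirety. Portions of the document were not emitted.)
URL: file://./backofficestorage/localhost/sharepoint portal server/workspaces/HOME/Do...
"""

print(too_large_urls(sample))
```

The URLs that the sketch returns are the documents to check against the size limits described in this article.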



CAUSE
This issue can occur if a file is larger than the size limits in the default SharePoint Portal Server settings. The default limits restrict how much of each document is filtered, for performance reasons.



RESOLUTION

Indexing Large Text Files
To resolve this issue for large text files (.txt), change the MaxTextFilterBytes registry value.

WARNING: If you use Registry Editor incorrectly, you may cause serious problems that may require you to reinstall your operating system. Microsoft cannot guarantee that you can solve problems that result from using Registry Editor incorrectly. Use Registry Editor at your own risk.

To change the MaxTextFilterBytes registry value:

1. Start Registry Editor (Regedt32.exe).

2. Locate the following key in the registry:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndex

3. Double-click the MaxTextFilterBytes value, select Decimal, and then type the new value. The value is the maximum size (in bytes) for files that the text filter indexes. (The default value is 25,000,000 bytes, or approximately 25 megabytes.)

See the "More Information" section of this article for a description of the MaxTextFilterBytes value.
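The same change can be scripted instead of made by hand in Registry Editor. The following is a minimal sketch, assuming a Python interpreter with the standard winreg module is available on the server and that the script runs with administrative rights. The function names and the example value of 50,000,000 bytes are illustrative only, and the registry warnings in this article apply equally to a scripted change.

```python
import sys

CONTENT_INDEX_KEY = r"SYSTEM\CurrentControlSet\Control\ContentIndex"
DWORD_MAX = 0xFFFFFFFF  # upper end of the range documented for these values


def validate_dword(value):
    """Return value if it lies in the documented range 1-4294967295, else raise."""
    if not isinstance(value, int) or not 1 <= value <= DWORD_MAX:
        raise ValueError(f"{value!r} is outside the range 1-{DWORD_MAX}")
    return value


def set_max_text_filter_bytes(new_bytes):
    """Write MaxTextFilterBytes and return the previous value (None if it was absent)."""
    validate_dword(new_bytes)
    if sys.platform != "win32":
        raise OSError("the registry is only available on Windows")
    import winreg  # imported here so the helpers above also load on other platforms

    access = winreg.KEY_QUERY_VALUE | winreg.KEY_SET_VALUE
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CONTENT_INDEX_KEY, 0, access) as key:
        try:
            previous, _value_type = winreg.QueryValueEx(key, "MaxTextFilterBytes")
        except FileNotFoundError:
            previous = None
        winreg.SetValueEx(key, "MaxTextFilterBytes", 0, winreg.REG_DWORD, new_bytes)
    return previous


# Example (hypothetical new limit of 50,000,000 bytes):
# old_value = set_max_text_filter_bytes(50_000_000)
```

Returning the previous value makes it easy to record the old setting before the change, in addition to the registry backup that this article recommends.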

Indexing Other Types of Large Documents
You can fully index most other document types by changing the MaxDownloadSize and MaxGrowFactor registry values:

1. Start Registry Editor (Regedt32.exe).

2. Locate the following key in the registry:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Search\1.0\Gathering Manager

3. Double-click the MaxDownloadSize value, select Decimal, and then type the new value. The value is the maximum size (in megabytes) for files that the gatherer downloads.

4. Double-click the MaxGrowFactor value, select Decimal, and then type the new value. The value is the factor, relative to the MaxDownloadSize value, that limits the size of the output of the index filter.

5. Quit Registry Editor.

See the "More Information" section of this article for a description of the MaxDownloadSize and MaxGrowFactor values.
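As an alternative to typing the two Gatherer values by hand, they can be written to a registration (.reg) file and imported on the server. The following is a minimal Python sketch that only builds the file text. It assumes the REGEDIT4 text format, and the example values (32 megabytes and a factor of 4) are placeholders, not recommendations.

```python
GATHERING_MANAGER_KEY = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Search\1.0\Gathering Manager"
)
DWORD_MAX = 0xFFFFFFFF


def gatherer_reg_text(max_download_size_mb, max_grow_factor):
    """Build REGEDIT4-style text that sets MaxDownloadSize and MaxGrowFactor."""
    values = {
        "MaxDownloadSize": max_download_size_mb,
        "MaxGrowFactor": max_grow_factor,
    }
    lines = ["REGEDIT4", "", f"[{GATHERING_MANAGER_KEY}]"]
    for name, value in values.items():
        if not 1 <= value <= DWORD_MAX:
            raise ValueError(f"{name}={value} is outside the range 1-{DWORD_MAX}")
        # .reg files store DWORD data as eight hexadecimal digits.
        lines.append(f'"{name}"=dword:{value:08x}')
    return "\r\n".join(lines) + "\r\n"


print(gatherer_reg_text(32, 4))
```

Reviewing the generated text before importing it gives a second check that the decimal values were converted to the intended hexadecimal data.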

NOTE: After you make these changes, restart the Microsoft Search service. If you want your documents to be re-indexed immediately, do a full update on the content source that contains the large files.
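The service restart can also be scripted. The following is a minimal Python sketch that calls the Windows net stop and net start commands. The service name MSSearch is an assumption; confirm the name of the Microsoft Search service on your server before you use it. The full update of the content source is not covered by this sketch.

```python
import subprocess

SERVICE_NAME = "MSSearch"  # assumed service name for the Microsoft Search service


def restart_commands(service=SERVICE_NAME):
    """Return the stop and start commands as argument lists, in the order to run them."""
    return [["net", "stop", service], ["net", "start", service]]


def restart_search_service(run=subprocess.run):
    """Run the stop and start commands; check=True raises if either command fails."""
    for command in restart_commands():
        run(command, check=True)


# Example (run from an account that is allowed to control services):
# restart_search_service()
```

The run parameter exists only so that the command sequence can be inspected or tested without touching the real service.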


MORE INFORMATION
For additional information, click the article number below to view the article in the Microsoft Knowledge Base:

287231 Search Only Indexes 16 Megabytes of a Document

The following registry keys and values are used in this article:

 * Indexing Service. The Content Indexing service registry values are located in the following registry path:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndex

The Content Indexing service value that is used in this article is MaxTextFilterBytes. The MaxTextFilterBytes value specifies the maximum amount of information that the text filter can process from a single file with a well-known extension.

Type: REG_DWORD

Units: Bytes

Default: 25000000 (approximately 25 MB)

Range: 1-4294967295 (0xFFFFFFFF)

 * Gatherer Service. The Gatherer service registry values are located in the following registry path:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Search\1.0\Gathering Manager

The Gatherer service values that are used in this article are MaxDownloadSize and MaxGrowFactor.

MaxDownloadSize. The MaxDownloadSize value specifies the maximum size of the document text that is filtered.

Type: REG_DWORD

Units: Megabytes

Default: 16 (16 MB)

Range: 1-4294967295 (0xFFFFFFFF)

MaxGrowFactor. The MaxGrowFactor value specifies how large (as a factor of the MaxDownloadSize value) the output of the Index Filter on the document can be.

Type: REG_DWORD

Units: Multiplier of the MaxDownloadSize value

Default: 4 (four times the MaxDownloadSize value)

Range: 1-4294967295 (0xFFFFFFFF)
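The arithmetic behind these defaults can be sketched in a few lines of Python. The sketch reads MaxGrowFactor as a multiplier of MaxDownloadSize, as its description states, and assumes that a text file is fully filtered when its size does not exceed MaxTextFilterBytes. The exact boundary behavior is not documented in this article, and the function names and file sizes are illustrative.

```python
def text_file_fully_filtered(size_bytes, max_text_filter_bytes=25_000_000):
    """True if a text file of this size fits within MaxTextFilterBytes."""
    return size_bytes <= max_text_filter_bytes


def filter_output_limit_mb(max_download_size_mb=16, max_grow_factor=4):
    """Limit on index filter output, in megabytes: MaxDownloadSize x MaxGrowFactor."""
    return max_download_size_mb * max_grow_factor


def smallest_sufficient_limit(file_sizes_bytes):
    """Smallest MaxTextFilterBytes value that covers every listed file size."""
    return max(file_sizes_bytes)


# A hypothetical 40,000,000-byte text file exceeds the 25,000,000-byte default.
print(text_file_fully_filtered(40_000_000))                   # False
# With the defaults, filter output is limited to 16 x 4 megabytes.
print(filter_output_limit_mb())                               # 64
# Hypothetical file sizes; the largest one determines the value to set.
print(smallest_sufficient_limit([12_000_000, 40_000_000]))    # 40000000
```

Raising a limit only as far as the largest file requires keeps the performance cost of the change as small as possible.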

Additional query words: sps

Keywords: kbprb KB318747

-


© Microsoft Corporation. All rights reserved.