Microsoft KB Archive/249256

= How to troubleshoot intra-site replication failures =

Article ID: 249256

Article Last Modified on 11/23/2006

-

APPLIES TO


 * Microsoft Windows 2000 Advanced Server
 * Microsoft Windows 2000 Datacenter Server
 * Microsoft Windows 2000 Server

-



This article was previously published under Q249256



IMPORTANT: This article contains information about modifying the registry. Before you modify the registry, make sure to back it up and make sure that you understand how to restore the registry if a problem occurs. For information about how to back up, restore, and edit the registry, click the following article number to view the article in the Microsoft Knowledge Base:

256986 Description of the Microsoft Windows Registry



SUMMARY
This step-by-step article describes the various symptoms and resolution methods that you can use to troubleshoot intra-site replication failure issues.

Important Although Directory Services (DS) replication and the File Replication Service (FRS) use the same connection setup mechanisms and replication schedules, they are two completely different components. This article describes common tools and techniques to troubleshoot replication connection objects to diagnose intra-site replication problems.

Common symptoms of replication failure
Important These steps may increase your security risk. These steps may also make your computer or your network more vulnerable to attack by malicious users or by malicious software such as viruses. We recommend the process that this article describes to enable programs to operate as they are designed to, or to implement specific program capabilities. Before you make these changes, we recommend that you evaluate the risks that are associated with implementing this process in your particular environment. If you choose to implement this process, take any appropriate additional steps to help protect your system. We recommend that you use this process only if you really require this process.

Common symptoms that indicate intra-site replication failure include the following:
 * Users and computers do not receive updated policies.
 * The correct SYSVOL share content is not replicated to all domain controllers (DCs).

Note This may also occur because of an FRS failure.

To troubleshoot these issues, use the following utilities:

Domain Controller Diagnostics (Dcdiag.exe) and Network Diagnostics (Netdiag.exe). You can obtain these tools from the Windows 2000 Support Tools on the Windows 2000 CD-ROM.

For additional information about how to obtain and use the Dcdiag.exe and Netdiag.exe diagnostic utilities, click the following article number to view the article in the Microsoft Knowledge Base:

265706 DCDiag and NetDiag in Windows 2000 Facilitate Domain Join and DC Creation

Replication diagnostics utility (Repadmin.exe). Use this tool to verify correct site links and to display inbound and outbound connections. You can also use it to display the replication queue. You can obtain this tool from the Windows 2000 Support Tools on the Windows 2000 CD-ROM.

For additional information about how to obtain and use the Repadmin.exe utility, click the following article number to view the article in the Microsoft Knowledge Base:

229896 Using Repadmin.exe to Troubleshoot Active Directory Replication

File Replication Service utility (Ntfrsutl.exe).

Active Directory Replication Monitor (Replmon.exe). You can obtain this tool from the Windows 2000 Support Tools on the Windows 2000 CD-ROM.

The following list describes the basic steps to follow when you try to troubleshoot problems of this type:
 * Make sure that the Domain Name System (DNS) is correctly configured. A correct DNS configuration is necessary for correct directory replication.
 * Make sure that you can use the Ping.exe utility to "ping" the domain controller by host name and IP address from its hub partner.
 * Make sure that computers in the branch can resolve names in the hub. For example, "ping" server1.domain1.site1.forest.com.
 * Make sure that you can ping servers by their Globally Unique Identifiers (GUIDs) as they are listed in the event logs. If you can successfully ping a server by its host name, but not by its GUID, a DNS configuration problem exists.
 * Run the Dcdiag.exe utility. This utility runs a series of tests, with the result of either "Passed" or "Failed". Make sure that all tests pass.
 * View the Directory Service log of the Event Viewer on the branch with which you experience problems. Investigate and resolve all errors.
 * Verify correct site links by using the Repadmin.exe utility with the /showreps switch.
 * Verify inbound connections by using the Repadmin.exe utility with the /showconn switch.
 * View all the log files in the Winnt\Debug folder.
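
The name resolution checks in the list above can also be scripted. The following Python sketch is an illustration only and is not part of the original article: the function name `check_names` is hypothetical, and the GUID-based name mentioned in the usage note assumes the common `GUID._msdcs.forest-root-domain` form of the record that a domain controller registers in DNS.

```python
import socket


def check_names(names, resolver=socket.gethostbyname):
    """Return a dict that maps each name to its IP address, or to None if lookup fails.

    `resolver` can be replaced so that the logic can be exercised without a network.
    """
    results = {}
    for name in names:
        try:
            results[name] = resolver(name)
        except OSError:  # socket.gaierror and socket.herror are subclasses of OSError
            results[name] = None
    return results
```

For example, you might call `check_names` with the host name of a domain controller and with the GUID-based name taken from the event log. If the host name resolves but the GUID-based name returns `None`, that matches the DNS configuration problem described above.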

Specific symptoms and troubleshooting steps
Note In the following sections, the domain controller that is reporting the problem is referred to as the "destination server". The domain controller from which the destination server tries to replicate content is referred to as the "source server".

"Access Denied" errors
When you use the Repadmin.exe tool with the /showreps switch, one or more "Access Denied" error messages are listed in the replication status information that is returned. This indicates that the domain controller was unsuccessful when it last tried to contact the other domain controller. Because a domain controller is a member of the Enterprise Domain Controllers group, it is authorized to call any function on another domain controller. Therefore, if calls between domain controllers result in "Access_Denied" errors, the cause is not a lack of correct credentials but that one of the domain controllers is not configured correctly.
 * If the error is "ERROR_ACCESS_DENIED", look for a Kerberos problem.
 * If the error is "ERROR_DRA_ACCESS_DENIED", look to see if the computer accounts for both of the two computers involved, on both directories, are correct. Make sure that the userAccountControl field is correct for a domain controller.
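
To check the userAccountControl field, it can help to decode the value into its flags. The following Python sketch is an illustration only: the function name `missing_dc_flags` is hypothetical, and the two flag values are the ones that this article lists for a domain controller account.

```python
# Flag values as listed in this article for a domain controller account.
UF_SERVER_TRUST_ACCOUNT = 0x2000
UF_TRUSTED_FOR_DELEGATION = 0x80000

DC_FLAGS = {
    "UF_SERVER_TRUST_ACCOUNT": UF_SERVER_TRUST_ACCOUNT,
    "UF_TRUSTED_FOR_DELEGATION": UF_TRUSTED_FOR_DELEGATION,
}


def missing_dc_flags(user_account_control):
    """Return the names of the expected domain controller flags that are not set."""
    return [name for name, bit in DC_FLAGS.items()
            if not user_account_control & bit]


print(hex(532480))               # 0x82000
print(missing_dc_flags(532480))  # []
print(missing_dc_flags(0x2000))  # ['UF_TRUSTED_FOR_DELEGATION']
```

An empty list means that both flags are present in the value that you read from the computer account.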

Repadmin.exe or Replmon.exe Report "Access Denied" for a particular directory partition
This issue typically indicates a Kerberos authentication problem, although there are several exceptions. Resolve the authentication failure before you try to fix the replication problem. To resolve this issue: <ol>
<li>Make sure that the "Access this computer from network" user right in the source server's security policy includes the destination server's machine account. You can grant this right through the Everyone group, through the Enterprise Domain Controllers group, or by specifying the account individually.</li>
<li>Make sure that the Key Distribution Center service is started. You can use Dcdiag.exe to test for service failure on all domain controllers by using the dcdiag /test:services command.

Note In this command, there is a colon between "test" and "services".</li>
<li>Make sure that the destination server has connection objects from other source servers. If it does not, you may have to create manual connections if the Knowledge Consistency Checker (KCC) does not automatically create them, or if it has been disabled.</li>
<li>Make sure that the KCC topology is connected. If the KCC has not formed a full topology, changes cannot be replicated. To test this, use the dcdiag /test:topology command, specifying the domain topology that you want to check.</li>
<li>Make sure that the Trust computer for delegation check box is selected on the General tab of the  Properties dialog box in the Active Directory Users and Computers MMC snap-in.</li>
<li>If the problem exists between domain controllers from different domains, check the trust relationship. To do so, use the Active Directory Domains and Trusts snap-in or the netdom trust  /domain:  /verify /kerberos command.</li>
<li>Make sure that each computer is synchronized for the Configuration Naming Context (Config NC). The KCC must know what the servers and sites are. You can use the repadmin /syncall command to force a server to become up-to-date with the whole enterprise. Specify that the naming context that you want to synchronize is the Config NC. Make sure that your site link topology is correct. Force the KCC to run on each server to rebuild the topology, or wait 15 minutes.</li>
<li>Make sure that key bridgehead servers are operational. You must determine if changes can flow throughout the enterprise. Run the dcdiag /test:intersite command one time for each site. This command returns the names of the bridgehead servers and whether or not they are reporting errors.</li>
<li>Check the value of the userAccountControl attribute. Make sure that the UF_SERVER_TRUST_ACCOUNT (0x2000) and the UF_TRUSTED_FOR_DELEGATION (0x80000) flags are set.
For example, if you convert the attribute value of 532480 decimal to hexadecimal, it becomes 0x82000, of which 0x80000 corresponds to UF_TRUSTED_FOR_DELEGATION and 0x2000 corresponds to UF_SERVER_TRUST_ACCOUNT.</li>
<li>Use the Replmon.exe utility to determine if the pwdLastSet and unicodePwd attributes have consistent time/date stamps across computers.</li>
<li>Make sure that service principal names (SPNs) are registered on each domain controller. Use the dcdiag /test:outboundsecurechannels command to test this. You can identify the SPN that is used for replication by the previous GUID: E3514235-4B06-11D1-AB04-00C04FC2DCD2/b2f6f255-4446-45e8-81a3-0649d5d71a66/ .</li>
<li>Force all computer accounts to be replicated throughout the enterprise. That means that all domain controllers must be synchronized with all other copies of their domain. For each computer that is reporting a replication error such as "Access Denied", use the repadmin /syncall command to force that computer to become up-to-date. Note that you must specify the domain that you want to synchronize.</li>
<li>You may receive the following error message when you run the previous Repadmin.exe command:

The security context could not be established due to a failure in the requested quality of service.

If you do, turn up internal processing and look for "DSID"s. Contact Microsoft Product Support Services (PSS) for information about how to obtain the Dsid.exe tool. For information about how to contact Microsoft PSS, visit the following Microsoft Web site:

http://support.microsoft.com

</li> <li>Make sure that the Enterprise Domain Controllers group has the required permissions on the directory partition ACLs: <ol style="list-style-type: lower-alpha;"> <li>Start the Active Directory Users and Computers snap-in.</li> <li>On the View menu, click Advanced Features, if it is not already selected.</li> <li>Right-click the root domain object, and then click Properties.</li> <li>Click the Security tab, click ENTERPRISE DOMAIN CONTROLLERS in the name list, and then make sure that the following permissions are selected under Allow:

Manage Replication Topology

Replicating Directory Changes

Replication Synchronization

</li></ol> </li> <li>Use the Active Directory Sites and Services snap-in to make sure that the Server object and its corresponding "NTDS Settings" child object exist in the correct site.</li> <li>Check the destination server for old or invalid tickets to the source server. Use the Kerbtray and Klist Windows 2000 Resource Kit utilities to perform these tests. Use the NETDOM RESETPWD command to reset the account password and write this change to an immediate replication partner. This effectively changes the password, sets the old and new passwords to be the same, and then writes this change to the replication partner. This requires that you use the following command or that you restart the computer:

klist purgeall

</li></ol>

"The DSA operation is unable to proceed because of a DNS lookup failure" error
To troubleshoot this error: <ol> <li>Use the Nltest /dsgetdc: /pdc /force /avoidself command to determine if the correct PDC is returned.</li> <li>If there is a connection object but no replication link reported by the Replmon.exe or Repadmin.exe utilities, the problem might be related to the KCC.</li> <li>Run the following commands on the PDC, and then submit the output to Microsoft PSS for more troubleshooting:

nltest /DBFLAG:0x2000FFFF

-and-

nltest /DSGETDC: /GC

</li> <li>Run the nltest /dsgetdc: /gc /force command to determine if you can contact a global catalog server (GC).</li> <li>Check the "password last changed" parameter on both the PDC and the server(s) with which you experience the problem.</li></ol>

Operation queued or no replication links displayed
No replication links are reported when you run the Repadmin.exe or Replmon.exe utilities. To troubleshoot this issue, trigger the KCC, and then look in the Directory Service log for any events that relate to the KCC. This typically points to a failure to communicate with a domain controller.

Replication access denied or naming context is in the process of being deleted
You receive one of the following messages when you try to trigger replication:

Replication access is denied.

-or-

The naming context is in the process of being deleted.

This may occur if the user who is using the Active Directory Sites and Services snap-in to trigger replication on a domain controller does not have the appropriate permission to initiate replication. Check the credentials of the user who performs this operation.

Duplicate connection objects between sites
WARNING: If you use Registry Editor incorrectly, you may cause serious problems that may require you to reinstall your operating system. Microsoft cannot guarantee that you can solve problems that result from using Registry Editor incorrectly. Use Registry Editor at your own risk.

To troubleshoot this issue: <ol> <li>Determine if explicit bridgeheads between sites were used in the past and not removed, or are currently used and misconfigured. One way to verify this is to use the LDP tool to connect to the Inter-Site Topology Generator (ISTG) in the site that has duplicate connections. Browse the Config NC to the Inter-Site Transports container, and then view the cn=ip object. If it contains the "bridgeheadServerListBL" attribute, explicit bridgeheads exist.

For additional information about how to determine the ISTG of a Site, click the following article number to view the article in the Microsoft Knowledge Base:

224599 Determining the Inter-Site Topology Generator (ISTG) of a Site in the Active Directory

</li> <li>Determine if the duplicate connections appear in all sites or in a particular subset. Look for a pattern such as duplicate connections between certain sets of servers. In a site that has duplicate connections, view the fromServer attribute on the duplicate connection. For that "fromServer", consider the site in which the "fromServer" resides. Try to isolate the activities in that site. How many servers are in that site? Are there any servers that are reachable by using the Ping utility from the ISTG?</li> <li>Make sure that the replication interval is appropriately set and the ISTG can complete its replication.</li> <li>To help isolate duplicate connection issues: <ol style="list-style-type: lower-alpha;"> <li>Pick a DC that is building duplicate inbound intersite connections. Duplicate here means the same source DC and destination DC, not just the same source site and destination site. The selected DC must be the ISTG for its site. You can determine the ISTG for a site by viewing the NTDS Site Settings properties for that site in the Active Directory Sites and Services snap-in.</li> <li>Increase the Directory Service event log to a very large size. For example, 64 megabytes (MB).</li> <li>Use Registry Editor to set the 1 Knowledge Consistency Checker value to a data value of 5 and the 9 Internal Processing value to a data value of 1 in the following registry subkey:

 

</li> <li>Run the ldifde -f before.ldf -d "CN=Sites,CN=Configuration,DC=Site1,DC=Forest1,DC=com" command.</li> <li>Let T0=current time.</li> <li>Run the repadmin /kcc command, and then wait for it to complete.</li> <li>Start the Event Viewer, and then make sure that the Directory Service event log recorded informational events back to time T0 (including KCC event 1009, "The consistency checker has started updating the replication topology for this server"). If not, double the event log size and go back to step e: Let T0=current time.</li> <li>Save the Directory Service event log.</li> <li>Run the ldifde -f after.ldf -d "CN=Sites,CN=Configuration,DC=Site1,DC=Forest1,DC=com" command.</li> <li>Review the Before.ldf, the After.ldf, and the Directory Service event log for more analysis.</li></ol> </li></ol>
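
To review Before.ldf and After.ldf, it can help to list the connection objects that share the same destination and the same fromServer value. The following Python sketch is an illustration only and is not part of the original article: the function names `parse_ldif` and `duplicate_connections` are hypothetical, the parser is deliberately simplified, and it assumes that connection objects are exported with an objectClass value of nTDSConnection and a fromServer attribute.

```python
from collections import defaultdict


def parse_ldif(text):
    """Parse LDIF text into a list of {attribute: [values]} dicts.

    Simplified: unfolds continuation lines (which start with one space),
    skips comments, and does not decode base64 ("attr:: value") values.
    """
    lines = []
    for raw in text.splitlines():
        if raw[:1] == " " and raw.strip() and lines and lines[-1]:
            lines[-1] += raw[1:]          # continuation of the previous line
        else:
            lines.append(raw)

    records, current = [], defaultdict(list)
    for line in lines + [""]:             # the trailing "" flushes the last record
        if not line.strip():
            if current:
                records.append(dict(current))
                current = defaultdict(list)
        elif not line.startswith("#") and ":" in line:
            attr, _, value = line.partition(":")
            current[attr.strip().lower()].append(value.strip())
    return records


def duplicate_connections(records):
    """Group connection objects by (destination NTDS Settings DN, fromServer).

    Returns {(parent_dn, from_server): [connection DNs]} for every group
    that contains more than one connection object.
    """
    groups = defaultdict(list)
    for rec in records:
        classes = [c.lower() for c in rec.get("objectclass", [])]
        if "ntdsconnection" not in classes:
            continue
        if "dn" not in rec or "fromserver" not in rec:
            continue
        dn = rec["dn"][0]
        if "," not in dn:
            continue
        parent = dn.split(",", 1)[1].lower()   # DN without the leading CN=connection
        groups[(parent, rec["fromserver"][0].lower())].append(dn)
    return {key: dns for key, dns in groups.items() if len(dns) > 1}
```

You might run `duplicate_connections(parse_ldif(text))` on the contents of each file and compare the two results to see which duplicates the KCC run created.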

Group Policy is applied inconsistently across domain controllers
You can use the following example script to make sure that Group Policy has replicated correctly throughout the domain controllers in your domain.

Microsoft provides programming examples for illustration only, without warranty either expressed or implied, including, but not limited to, the implied warranties of merchantability and/or fitness for a particular purpose. This article assumes that you are familiar with the programming language being demonstrated and the tools used to create and debug procedures. Microsoft support professionals can help explain the functionality of a particular procedure, but they will not modify these examples to provide added functionality or construct procedures to meet your specific requirements. If you have limited programming experience, you may want to contact a Microsoft Certified Partner or the Microsoft fee-based consulting line at (800) 936-5200. For more information about Microsoft Certified Partners, visit the following Microsoft Web site:

https://partner.microsoft.com/global/30000104

For additional information about the support options available from Microsoft, visit the following Microsoft Web site:

http://support.microsoft.com/default.aspx?scid=fh;EN-US;CNTACTMS

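Use the chkpolicy command to run this script:

@echo off

REM Usage: \logs\chkpolicy domain_name

REM The domain name is taken from the first command-line argument.

set dom_name=%1

set filename=sysvol\%dom_name%\Policies\{6AC1786C-016F-11D2-945F-00C04fB984F9}\Machine\Microsoft\Windows NT\SecEdit\GPTTMPL.INF

nltest /dclist:%dom_name% > dclist.tmp

if exist dclist1.tmp del dclist1.tmp

REM Keep the first token of each line of the Nltest output.

FOR /F "eol=; tokens=1 delims=, " %%i in (dclist.tmp) do (

@echo %%i >> dclist1.tmp

)

REM Strip the domain suffix, and then list the policy template on each DC.

FOR /F "eol=. tokens=1 delims=. " %%i in (dclist1.tmp) do (

@echo %%i

dir "\\%%i\%filename%"

)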
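
The script above only lists the file on each domain controller, so the comparison is left to the reader. The following Python sketch is an illustration only and is not part of the original article: the function name `find_outliers` is hypothetical, and it assumes that the file size and last-modified time that you collected for each domain controller are a reasonable proxy for whether the policy file has replicated.

```python
from collections import Counter


def find_outliers(observations):
    """Return the DCs whose file details differ from the most common value.

    `observations` maps a DC name to a (size_in_bytes, last_modified) tuple,
    or to None when the file could not be read on that DC.
    """
    seen = [value for value in observations.values() if value is not None]
    if not seen:
        return sorted(observations)
    majority, _count = Counter(seen).most_common(1)[0]
    return sorted(dc for dc, value in observations.items() if value != majority)
```

A domain controller that appears in the returned list either holds a different copy of the file or could not be read, and is a candidate for the SYSVOL checks described in this article.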

The directory service is too busy to complete the operation
You may receive error 8438, ERROR_DS_DRA_BUSY, "The directory service is too busy to complete the replication operation at this time." The Directory Service returns this error when it has made progress removing the naming context (having removed 500 objects), but there are too many objects to complete in one pass without tying up the replication queue. If Global Catalog cleanup is preventing successful replication, you can create a batch file to speed up the process. You can then re-promote the computer to act as a global catalog server. The following example script provides this functionality:

setlocal

set destgc=__setgcnamehere__.site1.forest1.com


:domain1

repadmin /delete DC=domain1,DC=site1,DC=forest1,DC=com %destgc% /nosource

if %errorlevel% == 8438 goto :domain2


:domain2

repadmin /delete DC=domain2,DC=Site1,DC=forest1,DC=com %destgc% /nosource

if %errorlevel% == 8438 goto :domain3

REM ...

endlocal
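
The same idea can be expressed as a retry loop. The following Python sketch is an illustration only and is not the script from this article: the function name `remove_naming_context` and its parameters are hypothetical, and it assumes that repeating the repadmin /delete command while it reports error 8438 lets the cleanup make further progress on each pass. The command line itself is the one used in the batch file above.

```python
import subprocess
import time

ERROR_DS_DRA_BUSY = 8438  # error code named in this article


def remove_naming_context(nc, dest_gc, run=subprocess.call,
                          max_attempts=100, delay_seconds=5):
    """Repeat "repadmin /delete nc dest_gc /nosource" while it reports 8438.

    Returns the last exit code. `run` can be replaced for testing.
    """
    code = ERROR_DS_DRA_BUSY
    for _ in range(max_attempts):
        code = run(["repadmin", "/delete", nc, dest_gc, "/nosource"])
        if code != ERROR_DS_DRA_BUSY:
            break                      # finished, or failed for another reason
        time.sleep(delay_seconds)      # assumption: pause briefly between passes
    return code
```

The attempt limit keeps the loop from running forever if the error never clears. Any exit code other than 8438 ends the loop, so check the returned value before you continue with the next naming context.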

Knowledge consistency checker and ISTG
You can create an event log for the Knowledge Consistency Checker that contains more diagnostic information. To do this, perform the following steps on the ISTG of the site where duplicate connections appear: <ol> <li>Save the contents of the event log, and then clear the event log.</li> <li>Set the 1 Knowledge Consistency Checker registry DWORD value to 5 in the following registry subkey:

 

</li> <li>Run the Knowledge Consistency Checker by running the repadmin /kcc command.</li> <li>Reset the 1 Knowledge Consistency Checker registry DWORD value to 0 (zero).</li> <li>Save the new event log.</li></ol>

To obtain a new baseline measurement:
 * 1) Make sure that the computer has a site link to the hub. If it does not, create one.
 * 2) Delete all connection objects that come into the computer.
 * 3) Run the Knowledge Consistency Checker by running the repadmin /kcc command.
 * 4) Make sure that it has created the connections you expect by running the repadmin /showconn command.
 * 5) Look in the Directory Service event log for errors. You may see errors (for example, event ID 1265) indicating that a replica cannot be added for naming context , and error  . Determine if the error is related to a DNS issue or if it is a connectivity error, and then try to correct the corresponding problem. If the error indicates that a target account name is incorrect or if it is an SPN error, it may be more difficult to resolve.
 * 6) If the event log reports that the replica was added successfully, check this by running the repadmin /showreps command.

After you adjust site link replication intervals, wait for the configuration change to replicate to other hub servers, and then restart each of the hub servers to clear the replication queue. You can use the repadmin /sync command or the Active Directory Sites and Services snap-in to force replication of the Configuration naming context so that the updated site links are visible on each of the hub servers before you restart them.

Use the Dcdiag.exe utility to assess the replication health of each site. You can run it remotely through a script and parse the output for the word "fail". You can use the following sample script as an example:

REM check replications in site site1

dcdiag /s:dc1 /test:replications /a /n:domain1

dcdiag /s:dc1 /test:replications /a /n:domain2

dcdiag /s:dc1 /test:replications /a /n:domain3

REM check replications in site site2

REM continue Dcdiag statements for domains in site2
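
Parsing the output for the word "fail" can be sketched as follows. This Python example is an illustration only and is not part of the original article: the function name `failed_lines` is hypothetical, and the sample text is made up for the example rather than captured from a real Dcdiag run.

```python
def failed_lines(dcdiag_output):
    """Return the lines of Dcdiag output that contain the word "fail"."""
    return [line.strip() for line in dcdiag_output.splitlines()
            if "fail" in line.lower()]


# Made-up sample text, for illustration only.
SAMPLE = """\
   ......................... DC1 passed test Replications
   ......................... DC2 failed test Replications
"""

for line in failed_lines(SAMPLE):
    print(line)
```

In practice you might redirect the output of each dcdiag command in the script above to a file, read the file, and pass its contents to `failed_lines`.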

File Replication Service (FRS)
<ol> <li>If you suspect that Directory Service replication is working, but that FRS is failing, make sure the FRS post-Service Pack 1 (SP1) hotfix is installed on all replication partners. This update is included in Service Pack 2 and Service Pack 3 for Windows 2000.</li> <li>Run the Ntfrsutl ds command to verify the following: <ul> <li>Make sure that there is only one subscriber object with the name "DOMAIN SYSTEM VOLUME (SYSVOL SHARE)" and that it has a "Member Ref". For example: <ul> <li>SUBSCRIBER: DOMAIN SYSTEM VOLUME (SYSVOL SHARE)</li> <li>Member Ref: CN= ,CN=Domain System Volume (SYSVOL share),CN=File Replication Se...</li></ul> </li> <li>Locate the member object output ("dump") for this domain controller, and then make sure that it has a Server Ref and a Computer Ref attribute. Also make sure that at least one connection exists right under this member object. This is the inbound connection to this domain controller. For example: MEMBER: TEST1 <ul> <li>Server Ref : CN=NTDS Settings,CN= ,CN= ,CN=Default-First-Site-Name,CN=Sit...</li> <li>Computer Ref : cn= ,ou=domain controllers,dc= ,dc= ,dc= ,dc= ...</li> <li>DN : cn=d7874204-c331-4750-82ec-30b96a8ec732,cn=ntds settings,cn= ,cn=s...</li></ul> </li> <li>Make sure that at least one other member object has this domain controller as its inbound partner. Use the Partner Dn attribute to identify which partner this connection is from. <ul> <li>Partner Dn : cn=ntds settings,cn= ,cn= ,cn=default-first-site-name,cn=sit...</li></ul> </li></ul> </li> <li>Run the Ntfrsutl command to check the following: <ul> <li>Make sure that the replica set DOMAIN SYSTEM VOLUME (SYSVOL SHARE) has a Service State value of ACTIVE. For example:

ServiceState : 3 (ACTIVE)

</li> <li>Make sure that there is at least one inbound and one outbound connection from this domain controller. For example:

Inbound : FALSE

Inbound : TRUE

</li></ul> </li> <li>Increase FRS logging levels. To do this, add the following registry values to the  registry subkey:

Value name: Debug Log Severity

Value type: REG_DWORD

Value: 0x00000004

Value name: Debug Maximum Log Messages

Value type: REG_DWORD

Value: 50000

Value name: Debug Log Files

Value type: REG_DWORD

Value: 0x00000032

</li> <li>To aid troubleshooting, you can "dump" the state of the FRS on a domain controller to a file. Use the following sample script as an example of how to do this:

@echo off

REM FRS_CHECK.CMD - Records the state of FRS

SETLOCAL ENABLEEXTENSIONS

SET FRSCK=C:\FRS_CHECK

if NOT EXIST %FRSCK% (md %FRSCK%)

REM run dcdiag

dcdiag >  %FRSCK%\dcdiag.txt

REM For FRS

ntfrsutl ds  > %FRSCK%\ntfrs_ds.txt

ntfrsutl sets  > %FRSCK%\ntfrs_sets.txt

ntfrsutl inlog  > %FRSCK%\ntfrs_inlog.txt

ntfrsutl outlog  > %FRSCK%\ntfrs_outlog.txt

ntfrsutl version  > %FRSCK%\ntfrs_version.txt

regdmp HKEY_LOCAL_MACHINE\system\currentcontrolset\services\NtFrs\Parameters > %FRSCK%\ntfrs_reg.txt

dir \\.\sysvol /s > %FRSCK%\ntfrs_sysvol.txt

REM scan the frs debug logs for errors.

findstr /i ":SO: error invalid fail abort warn" %windir%\debug\ntfrs_*.log  |  findstr /v "IO_PEND ERROR_SUCCESS FrsErrorSuccess" > %FRSCK%\ntfrs_errscan.txt

REM For DS replication

repadmin /showreps >  %FRSCK%\ds_showreps.txt

repadmin /showconn >  %FRSCK%\ds_showconn.txt </li></ol>
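
The log scan in the script above keeps lines that contain any of the search words and then drops lines that contain known harmless strings. The following Python sketch approximates that filter for illustration only. It is not part of the original article, the function name `scan_frs_log` is hypothetical, and it treats the search words as plain substrings.

```python
# Words from the first findstr command (matched without regard to case).
INCLUDE = (":so:", "error", "invalid", "fail", "abort", "warn")
# Strings from the second findstr command (matched with regard to case).
EXCLUDE = ("IO_PEND", "ERROR_SUCCESS", "FrsErrorSuccess")


def scan_frs_log(lines):
    """Keep lines that contain any INCLUDE word and none of the EXCLUDE strings."""
    hits = []
    for line in lines:
        lowered = line.lower()
        if (any(word in lowered for word in INCLUDE)
                and not any(text in line for text in EXCLUDE)):
            hits.append(line.rstrip("\n"))
    return hits
```

You might pass the function the lines of each Ntfrs log file in the debug folder and write the returned lines to a summary file, as the batch file does with ntfrs_errscan.txt.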

<div class="references_section">