
= Network failure detection and recovery in Windows Server 2003 Clusters =

Article ID: 286342

Article Last Modified on 3/1/2007


APPLIES TO


 * Microsoft Windows Server 2003, Enterprise Edition (32-bit x86)
 * Microsoft Windows Server 2003, Datacenter Edition (32-bit x86)




This article was previously published under Q286342



SUMMARY
A server cluster in Windows Clustering handles the loss of private, internal cluster (heartbeat) communication differently in Microsoft Windows Server 2003 than in Microsoft Windows 2000. In Windows 2000, if heartbeat communication between the nodes in a cluster is lost completely, the node that owns the Quorum resource takes ownership of all resources. This article compares how a Windows 2000 cluster and a Windows Server 2003 cluster handle this situation.



Windows 2000
When a cluster node loses connection to all networks that are set for intra-cluster communication, the Cluster service must use the Quorum disk resource to arbitrate and determine which node should remain up and functioning because the nodes have no other way of communicating. The node that receives the ownership of the Quorum resource then brings all resources online, and the Cluster service takes all other nodes in the cluster offline.

Example
Node A owns the Quorum resource and loses all of its network connections:

When all of node A's network interfaces are disconnected, no LAN remains for private cluster communication. Node A can no longer detect whether node B is running, and node B can no longer detect whether node A is running. The two nodes arbitrate for the Quorum resource, and node A successfully defends its ownership. Node B removes itself from the cluster, and all of its resources fail over to node A.

Note: This type of double failure is extremely rare.

If node A no longer has any viable public network interfaces, it cannot receive service requests from clients, but it owns all the resources, which eventually transition to a Failed state. At this point, no resources are available to external clients. Meanwhile, node B may have a perfectly viable public network interface, but it has been excluded from the cluster because it has no private network connectivity to the node that owns the Quorum resource.
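The outcome described above can be illustrated with a small toy model. This is only a sketch of the decision logic, not Cluster service code: the `Node` class, the `public_link_up` flag, the resource names, and the `arbitrate_windows_2000` function are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    owns_quorum: bool
    public_link_up: bool  # hypothetical flag: client-facing interface has connectivity
    resources: list = field(default_factory=list)


def arbitrate_windows_2000(nodes):
    """Toy model: the current Quorum owner defends its ownership and wins."""
    winner = next(n for n in nodes if n.owns_quorum)
    for node in nodes:
        if node is not winner:
            # The losing node leaves the cluster; its resources move to the winner.
            winner.resources.extend(node.resources)
            node.resources = []
    return winner


# Node A has lost every network connection but still owns the Quorum resource.
node_a = Node("A", owns_quorum=True, public_link_up=False, resources=["FileShare"])
node_b = Node("B", owns_quorum=False, public_link_up=True, resources=["PrintSpooler"])

winner = arbitrate_windows_2000([node_a, node_b])
# Clients can be served only if the winning node still has a public interface.
clients_served = winner.public_link_up
print(winner.name, winner.resources, clients_served)
```

In this model node A ends up owning every resource while being unreachable, so `clients_served` is `False` even though node B still has a working public interface.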

Windows Server 2003
Prior to arbitrating for the Quorum resource, a node checks whether at least one of its network interfaces that is enabled for cluster use is connected to a network. In this scenario, this would be any network enabled for client access (All Communications or Client Access Only). If it finds no viable interfaces, the node voluntarily drops out of Quorum resource arbitration, thus removing itself from the cluster.

In the "Example" section, node A determines that both of its networks are unavailable, and it declines to arbitrate. If the Quorum device responds, and the node A reservation terminates quickly, node B wins the Quorum arbitration, and all resources switch to node B. Node B then makes the cluster resources available to clients.
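The connectivity check that precedes arbitration can be sketched as a toy model. This is an illustrative simplification under stated assumptions, not the actual Cluster service logic: the `Node` class, the `cluster_links_up` counter, the resource names, and the `arbitrate_windows_2003` function are hypothetical, and the sketch ignores reservation timing on the Quorum device.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    owns_quorum: bool
    cluster_links_up: int  # hypothetical: cluster-enabled interfaces that still have connectivity
    resources: list = field(default_factory=list)


def arbitrate_windows_2003(nodes):
    """Toy model: nodes with no viable interface decline to arbitrate."""
    candidates = [n for n in nodes if n.cluster_links_up > 0]
    if not candidates:
        return None  # every node dropped out of arbitration
    # A Quorum owner that still has connectivity defends its ownership;
    # otherwise the first remaining candidate wins.
    winner = next((n for n in candidates if n.owns_quorum), candidates[0])
    for node in nodes:
        if node is not winner:
            # Nodes that lost or declined leave the cluster; resources move to the winner.
            winner.resources.extend(node.resources)
            node.resources = []
            node.owns_quorum = False
    winner.owns_quorum = True
    return winner


# Node A owns the Quorum resource but has lost both of its networks.
node_a = Node("A", owns_quorum=True, cluster_links_up=0, resources=["FileShare"])
node_b = Node("B", owns_quorum=False, cluster_links_up=1, resources=["PrintSpooler"])

winner = arbitrate_windows_2003([node_a, node_b])
print(winner.name, sorted(winner.resources))
```

Here node A is filtered out before arbitration, so node B takes ownership of the Quorum resource and of all resources, which matches the behavior described for Windows Server 2003.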

Node A cannot rejoin the cluster until it re-establishes network connectivity with node B and you restart the Cluster service. For additional information, click the following article number to view the article in the Microsoft Knowledge Base:

242600 Network failure detection and recovery in a two-node Windows Server 2000 cluster

Additional query words: MSCS Network Media Sense

Keywords: kbenv kbinfo kbnetwork KB286342



© Microsoft Corporation. All rights reserved.