

Cluster Resource Group Does Not Fail Over If the State of Both Network Interfaces Is Unreachable

PSS ID Number: 814459

Article Last Modified on 4/30/2003



The information in this article applies to:

  • Microsoft Windows 2000 Advanced Server




SYMPTOMS

When a cluster node loses connectivity with the client-access network, resource groups that contain IP address resources on that cluster node may not fail over to the other node. All resources may remain in an online state on the original node.

If you manually move these resource groups to the other node, the IP address resources do not come online, and the resource group repeatedly fails over between the two cluster nodes until it enters a failed state.

CAUSE

This behavior occurs if the client-access network interfaces on both cluster nodes are in an unreachable state.

WORKAROUND

To work around this issue:

  • If neither of the cluster nodes can communicate with external hosts, check the physical network components, such as network interface cards, cables, and network switches or hubs, to make sure that at least one cluster node is accessible to clients.
  • If one cluster node is accessible to clients through the client-access network interface, stop the Microsoft Cluster service on the inaccessible cluster node. The resource groups that contain the IP address resources of this client-access network subnet are then brought online on the other cluster node.
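Which of the two workarounds applies depends only on which nodes are still usable over the client-access network. The following Python sketch is a minimal, hypothetical illustration: `choose_workaround` is not part of any Windows tool, its inputs are your own observations (for example, whether clients can reach each node), and it assumes the Cluster service's short service name is ClusSvc, so that `net stop clussvc` stops it from a command prompt on the inaccessible node.

```python
# Hypothetical helper that picks one of the two workarounds in this article.
# a_ok / b_ok: your own observation of whether node A / node B can reach
# external hosts and be reached by clients over the client-access network.
# Nothing here queries the cluster; it only returns advice as a string.


def choose_workaround(a_ok: bool, b_ok: bool) -> str:
    if not a_ok and not b_ok:
        # First workaround: neither node is usable from outside.
        return ("Check network interface cards, cables, and switches or hubs "
                "until at least one node is accessible to clients.")
    if a_ok and b_ok:
        return "Both nodes are accessible to clients; this workaround does not apply."
    # Second workaround: stop the Cluster service on the inaccessible node
    # (assumes the service's short name is ClusSvc).
    inaccessible = "B" if a_ok else "A"
    return f"On node {inaccessible}, run: net stop clussvc"


if __name__ == "__main__":
    print(choose_workaround(True, False))  # On node B, run: net stop clussvc
```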


STATUS

This is expected behavior when the network interface media sense feature is disabled.

MORE INFORMATION

When the two cluster nodes (node A and node B in the following discussion) cannot contact each other over the client-access network connection (for example, because the network cable is unplugged from node A), each node sends Internet Control Message Protocol (ICMP) echo requests to determine whether it can reach at least one external host. An external host is an IP address that has all of the following characteristics:

  • It is not local to either cluster node. For example, it is not a cluster virtual IP address.
  • It is on the same client-access network subnet as both cluster nodes.
  • It currently exists as a destination address in the routing table of either cluster node, and the routing interface is the corresponding local client-access cluster network interface. Alternatively, it currently appears as the remote address of an active TCP connection on either cluster node.

For example, the default gateway is typically used as an external host because it meets all three of these characteristics.
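The three characteristics amount to a filter over a node's routing table and TCP connection table. The following Python sketch is a simplified model of the rules as this article words them, not the Cluster service's actual implementation; `RouteEntry`, `external_host_candidates`, and all addresses are hypothetical, and reading the routing and connection tables from a real node is left out.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class RouteEntry:
    address: str    # address listed in the routing table entry
    interface: str  # local interface address the entry routes through


def external_host_candidates(
    client_subnet: str,
    local_addresses: set[str],
    routes: list[RouteEntry],
    tcp_remote_addresses: set[str],
    client_access_interfaces: set[str],
) -> set[str]:
    """Apply the three characteristics listed above to pick external hosts."""
    subnet = ipaddress.ip_network(client_subnet)
    # Third characteristic, second form: remote ends of active TCP connections.
    candidates = set(tcp_remote_addresses)
    # Third characteristic, first form: routing-table addresses whose routing
    # interface is a local client-access cluster network interface.
    candidates.update(
        r.address for r in routes if r.interface in client_access_interfaces
    )
    return {
        a for a in candidates
        if a not in local_addresses              # not local to either node
        and ipaddress.ip_address(a) in subnet    # on the client-access subnet
    }


if __name__ == "__main__":
    hosts = external_host_candidates(
        client_subnet="192.168.1.0/24",
        # Node A, node B, and a cluster virtual IP address.
        local_addresses={"192.168.1.11", "192.168.1.12", "192.168.1.100"},
        routes=[
            RouteEntry("192.168.1.1", "192.168.1.11"),  # default gateway
            RouteEntry("10.0.0.1", "10.0.0.11"),        # heartbeat network
        ],
        tcp_remote_addresses={"192.168.1.50", "192.168.1.100"},
        client_access_interfaces={"192.168.1.11", "192.168.1.12"},
    )
    print(sorted(hosts))  # ['192.168.1.1', '192.168.1.50']
```

The cluster virtual IP address 192.168.1.100 is dropped because it is local, and the heartbeat-network entry is dropped because it does not route through a client-access interface.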

Depending on the results of the network connectivity tests, the server cluster ends up in one of the following scenarios:

  • If node A cannot communicate with any external host and node B can communicate with at least one external host, the client-access network interface of node A is determined to be in a failed state, and the client-access network interface of node B is considered up. In this case, if node B is designated as a possible owner of the cluster resources, it takes ownership of the cluster groups that contain the IP address resources of this client-access network subnet.
  • If neither node A nor node B can communicate with external hosts (for example, if each node has two network connections, one for the heartbeat signal and one for client access, and the network switch that both client-access network interfaces connect to is turned off), both client-access network interfaces are determined to be unreachable. In this case, the client-access network interfaces remain in an unreachable state until network connectivity is recalculated.
  • If both node A and node B can communicate with an external host (even though they cannot contact each other over the client-access network), both client-access network interfaces are also determined to be in an unreachable state. In this case, the client-access network interfaces remain in an unreachable state until network connectivity is recalculated. The following example describes a scenario in which this behavior can occur:

    Example: Consider the following configuration:
    • You have two server clusters running SAP software, and the SAP instances on the two clusters communicate with each other.
    • Each cluster node has two public network interfaces (one for client-access communication and one for SAP program communication) and one private network interface for heartbeat communication.
    • Static routes are configured on each cluster node so that clients access the SAP instances through the Client Access Network Connection, while the cluster nodes access the SAP instances through the Application Communication Network Connection.

    When you unplug the network cable from the client-access network interface of the cluster node that owns IP address resources, the following behaviors occur:

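    • The cluster node that loses client-access network connectivity can still communicate with one external host (the virtual IP address used by the SAP instance on the other server cluster) through the Application Communication Network Connection, because of the static routes.
    • The other cluster node can communicate with an external host through the Client Access Network Connection, because that connection is unaffected on this node.

    Because both nodes can communicate with an external host, both client-access network interfaces are determined to be unreachable, and the resource groups that contain the IP address resources remain online on the original node.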
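The three outcomes above, and the SAP example, can be summarized as a small decision model. This Python sketch is hypothetical: `classify`, `group_owner_after_test`, `outgoing_interface`, the interface names, and the addresses are illustrative, and it assumes each node's ICMP test result is already known. `outgoing_interface` is a generic longest-prefix-match route lookup (not the Windows routing code) used to show how a static host route lets node A reach the other cluster's virtual IP address without its client-access interface; placing that address on the client-access subnet is also an assumption.

```python
import ipaddress
from enum import Enum


class InterfaceState(Enum):
    UP = "up"
    FAILED = "failed"
    UNREACHABLE = "unreachable"


def classify(a_reaches: bool, b_reaches: bool) -> tuple:
    """Return (node A state, node B state) for the client-access interfaces."""
    if a_reaches == b_reaches:
        # Both nodes reach an external host, or neither does:
        # both interfaces are marked unreachable.
        return InterfaceState.UNREACHABLE, InterfaceState.UNREACHABLE
    if b_reaches:
        return InterfaceState.FAILED, InterfaceState.UP
    return InterfaceState.UP, InterfaceState.FAILED


def group_owner_after_test(owner: str, a_reaches: bool, b_reaches: bool,
                           possible_owners: set) -> str:
    """Return the node that owns the IP address resource groups afterwards."""
    states = dict(zip(("A", "B"), classify(a_reaches, b_reaches)))
    other = "B" if owner == "A" else "A"
    if (states[owner] is InterfaceState.FAILED
            and states[other] is InterfaceState.UP
            and other in possible_owners):
        return other
    # In every other case (including both unreachable) the groups stay put.
    return owner


def outgoing_interface(destination: str, routes: list):
    """Longest-prefix match over (prefix, interface name) pairs; None if no route."""
    addr = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), iface)
        for prefix, iface in routes
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


if __name__ == "__main__":
    # Hypothetical: 192.168.1.200 is the other SAP cluster's virtual IP address.
    node_a_routes = [
        ("192.168.1.0/24", "client-access"),
        ("192.168.1.200/32", "application"),  # static host route
    ]
    iface = outgoing_interface("192.168.1.200", node_a_routes)
    print(iface)  # application
    # Node A's client-access cable is unplugged, but this echo request does
    # not use that interface, so node A still reaches an external host.
    a_reaches = iface is not None and iface != "client-access"
    b_reaches = True  # node B's client-access network is unaffected
    print(group_owner_after_test("A", a_reaches, b_reaches, {"A", "B"}))  # A
```

In the SAP case both nodes reach an external host, so `classify` returns two unreachable states and the groups stay on node A, matching the symptom described in this article.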

For additional information about how Microsoft Cluster Service (MSCS) detects and recovers from network failures, see the following article in the Microsoft Knowledge Base:

242600 Network Failure Detection and Recovery in a Two-Node Server Cluster


For additional information about the recommended network configuration on a server cluster, see the following article in the Microsoft Knowledge Base:

258750 Recommended Private "Heartbeat" Configuration on a Cluster Server


For additional information about network failure detection on a Microsoft Windows NT 4.0 cluster, see the following article in the Microsoft Knowledge Base:

257925 Cluster Server Does Not Detect Network Problems in Windows NT 4.0 Enterprise Edition


For additional information about the effects of configuring static routes on a server cluster, see the following article in the Microsoft Knowledge Base:

814989 Static Routing Entry May Cause a Problem in a Cluster Environment


For additional information about how Windows Server 2003 server clusters detect and recover from network failures, see the following article in the Microsoft Knowledge Base:

286342 Network Failure Detection and Recovery in Windows Server 2003 Clusters


Keywords: kbprb KB814459
Technology: kbwin2000AdvServ kbwin2000AdvServSearch kbwin2000Search kbWinAdvServSearch