A Survival Authority (SA) decides whether a node shuts down or continues processing in the failover case in which the two nodes cannot communicate over the cluster cross connects and cannot shut down the partner node via the maintenance controller. Its main function is to determine which node should continue running and which node should be shut down during a massive power outage or building failure. In a geographically separated cluster, the Survival Authority is mandatory. In a co-located cluster, an SA is optional if the cross connect between the two nodes is a physical LAN cable (e.g. both nodes in the same rack) and mandatory if the cross connect link between the nodes is set up over switches. In this case, the SA must be on a separate server (separate subnet).
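The rule for when an SA is mandatory can be summarized in a short sketch. This is only an illustration of the rule as stated above; the names Deployment, CrossConnect and survival_authority_required are hypothetical and are not part of the product software.

```python
from enum import Enum


class Deployment(Enum):
    """How the two cluster nodes are placed (hypothetical names)."""
    CO_LOCATED = "co-located"
    GEO_SEPARATED = "geographically separated"


class CrossConnect(Enum):
    """How the cross connect between the nodes is realized (hypothetical names)."""
    DIRECT_CABLE = "physical LAN cable"
    SWITCHED = "set up over switches"


def survival_authority_required(deployment: Deployment,
                                cross_connect: CrossConnect) -> bool:
    """Return True if the SA is mandatory, False if it is optional."""
    # A geographically separated cluster always needs an SA.
    if deployment is Deployment.GEO_SEPARATED:
        return True
    # Co-located: optional with a direct cable, mandatory over switches.
    return cross_connect is CrossConnect.SWITCHED
```

For example, survival_authority_required(Deployment.CO_LOCATED, CrossConnect.DIRECT_CABLE) returns False, which corresponds to two nodes in the same rack joined by a LAN cable.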
A new Stand Alone Service (SAS) feature is offered in the legacy system software release. With SAS, neither node is shut down: the node that takes over goes into a "Stand Alone Primary" state, while the node that would otherwise shut down and reboot goes into a "Stand Alone Secondary" state. This allows phones local to each node to continue making calls to each other, or even calls to the PSTN via the local gateway available on that network.
Even though there can only be one SA, it is not a single point of failure, since it is not needed for Lotus Sametime Unified Telephony operation. The SA is only needed when the two nodes cannot communicate over the redundant x-channel and cannot reach the maintenance controller of the partner node.
To avoid a split brain situation, the SA is the final decision maker when the two Telephony Control Server nodes cannot communicate over the redundant cluster cross connects and cannot reach the maintenance controller of the partner node.
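The condition under which the SA becomes the decision maker can be expressed as a simple check. The following is a minimal sketch, assuming each cross connect and the partner maintenance controller can be reduced to a reachable/unreachable flag; the function name and parameters are hypothetical and do not reflect the actual node software.

```python
from typing import Sequence


def must_consult_survival_authority(cross_connects_up: Sequence[bool],
                                    partner_mc_reachable: bool) -> bool:
    """Return True only when the node has no other way to reach its partner.

    cross_connects_up    -- one flag per redundant cluster cross connect
    partner_mc_reachable -- whether the partner's maintenance controller responds
    """
    # As long as one cross connect works, the nodes can coordinate directly.
    all_cross_connects_down = not any(cross_connects_up)
    # The SA decides only if the maintenance controller path has failed too.
    return all_cross_connects_down and not partner_mc_reachable
```

With one cross connect still up, must_consult_survival_authority([True, False], False) returns False, so the SA stays out of the decision.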
System Specific Information
In redundant co-located and geo-separated integrated duplex configurations (redundant Telephony Control Servers with integrated Lotus Sametime Applications software), the Survival Authority should be installed on a third (off-board) SLES 10 SP2 machine. In redundant co-located and geo-separated standard duplex configurations (redundant Telephony Control Servers without integrated Lotus Sametime Applications software), the Survival Authority is implemented as a software module integrated with the Telephony Control Server Assistant software of the off-board Lotus Sametime Applications server.
More details about installation and configuration of the Survival Authority can be found in the following manual:
Lotus Sametime Unified Telephony R0, Service Manual Volume 1, Installation and Upgrades
The communication between each Lotus Sametime Unified Telephony node and the SA is tested every 5 minutes, and a failure raises an alarm. A problem with the Survival Authority therefore requires three conditions to be present at the same time:
Communication failure between Lotus Sametime Unified Telephony and SA.
Survival Authority Alarms are being ignored.
A node or network failure resulting in failures of both x-channels and communications to the partner maintenance controller.
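The three conditions listed above must all hold together, which can be shown as a sketch. The constant and function names are hypothetical and are used only for illustration; only the 5 minute test interval is taken from the text.

```python
# Interval at which each node tests its link to the SA (5 minutes, per the text).
SA_LINK_TEST_INTERVAL_SECONDS = 5 * 60


def survival_authority_problem(sa_link_failed: bool,
                               sa_alarms_ignored: bool,
                               partner_unreachable: bool) -> bool:
    """Return True only if all three conditions are present at once.

    sa_link_failed      -- communication between the node and the SA has failed
    sa_alarms_ignored   -- the resulting Survival Authority alarms were not acted on
    partner_unreachable -- both x-channels and the partner maintenance
                           controller are unavailable
    """
    return sa_link_failed and sa_alarms_ignored and partner_unreachable
```

If the alarm is acted on, the second argument is False and the function returns False regardless of the other two conditions.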