In this post I want to talk about an error I encountered on an MSSQL cluster running Windows Server 2008 R2 Enterprise edition. A cluster is a mechanism that provides redundancy between shared resources. With such a design, if one device becomes unavailable, the system as a whole is not affected. In a SQL cluster, the SQL service is shared between two or more nodes (servers). The service runs on one server, which is called the active node. If the active node fails, the service is automatically moved to one of the remaining nodes. I don’t have much experience with clusters, so if you spot any wrong information, please don’t hesitate to correct me. Such a cluster usually has two network interfaces: one for connecting to the rest of the network and another, usually called HEARTBEAT, that is used for exchanging information between the cluster’s nodes (such as which node is active).
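To make the failover idea a bit more concrete, here is a small toy model in Python. It is only a sketch of the behaviour described above, not how Windows Failover Clustering is actually implemented: the `Node` and `Cluster` classes, the `alive` flag and the node names are all made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    alive: bool = True  # hypothetical flag: does the node still answer on the heartbeat network?


@dataclass
class Cluster:
    nodes: List[Node]
    active: Optional[Node] = None

    def __post_init__(self):
        # The service starts on the first node in the list.
        self.active = self.nodes[0]

    def heartbeat(self) -> Node:
        """One heartbeat round: keep the active node if it is alive, otherwise fail over."""
        if self.active.alive:
            return self.active
        survivors = [n for n in self.nodes if n.alive]
        if not survivors:
            raise RuntimeError("no node left to host the service")
        # Move the service to the first surviving node.
        self.active = survivors[0]
        return self.active


cluster = Cluster([Node("SQLNODE1"), Node("SQLNODE2")])
cluster.nodes[0].alive = False   # simulate a failure of the active node
print(cluster.heartbeat().name)  # prints SQLNODE2
```

The point of the sketch is simply that clients keep talking to "the cluster" while the node that hosts the service changes underneath.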
In the problem I encountered, the cluster could not be seen in the “Failover Cluster Manager”:
As you can see from the image, the console acts as if no cluster has been created. It’s easier to interact with the cluster using the GUI than the command line, so this problem had to be fixed fast. I tried to select the cluster to manage by right-clicking “Failover Cluster Manager” and selecting “Manage a Cluster”, but with no success:
After searching the Internet for a resolution, I stumbled on the following article from Microsoft’s website: http://support.microsoft.com/kb/2462468
The article explains that this problem occurs when the “Server” service is stopped. I immediately checked the Services console and discovered that the service was indeed stopped. After starting the service, the cluster came back to normal:
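If you prefer to check this from a script rather than from the Services console, something like the following Python sketch could work. It assumes the built-in `sc query` command, English-language output, and that the “Server” service has the internal name `LanmanServer`. The function names are my own and are not taken from the Microsoft article.

```python
import re
import subprocess

SERVICE_NAME = "LanmanServer"  # internal name of the "Server" service


def parse_sc_state(sc_output: str) -> str:
    """Extract the state word (e.g. RUNNING, STOPPED) from `sc query` output."""
    # A typical line looks like: "        STATE              : 4  RUNNING"
    match = re.search(r"STATE\s*:\s*\d+\s+(\w+)", sc_output)
    if match is None:
        raise ValueError("no STATE line found in sc output")
    return match.group(1)


def server_service_state() -> str:
    """Run `sc query LanmanServer` on the local machine and return its state."""
    result = subprocess.run(
        ["sc", "query", SERVICE_NAME],
        capture_output=True,
        text=True,
    )
    return parse_sc_state(result.stdout)
```

Calling `server_service_state()` on the affected node should return `STOPPED` in the situation described here. In that case, running `sc start LanmanServer` from an elevated prompt is the command-line equivalent of starting the service from the Services console.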
The worst part was that I did not know why this problem had occurred. I opened the Event Viewer to check for any suspicious activity on the server:
I found the events that reported the crash, but not the actual cause of the problem. After reading Microsoft’s article, I also checked that the following items were enabled on the network interface:
1. Client for Microsoft Networks
2. File and Printer Sharing for Microsoft Networks
This problem can also be caused by a failure of the HEARTBEAT interface. I checked the interface and it had been up for a while, so it could not have been the reason for this problem:
I searched the Internet a lot but couldn’t find the exact cause. What I do know is that at the time of the event, two administrators were working on the servers, and an action taken by one of them probably caused the service to stop.
I hope this article will help those who run into this problem. The resolution is pretty simple, but the actual cause is not known, at least not by me. If you know of any possible cause, please leave a comment and share your idea. Enjoy your day and stay tuned for the following articles.