Recently I had to patch nodes in a failover cluster and ran into a very strange situation. I decided to post about it in case someone else out there runs into the same issue.
Let me start by giving you a little background…
I have a 3-node Windows Server Failover Cluster running a SQL Server failover cluster instance (FCI), and I had to apply the SQL Server SP2 patch. The environment I work in is quite secure (i.e., STIG’d down tight).
When I patch cluster nodes, my usual routine goes like this. First, I manually fail over between the nodes (NodeA → NodeB → NodeC, then finally back to NodeA). I know it’s most likely unnecessary, but I do it anyway.
Second, I remove the node I intend to patch from the “possible owners” list of the resource (in this case, SQL Server).
Third, I install the patch(es) and reboot.
Fourth, after the reboot the node rejoins the cluster (this happens automatically), and I add the node back to the “possible owners” list of the resource.
Fifth, I manually fail over from the active node to the newly patched node. I do this to make sure everything fails over correctly and that the SQL Server binaries are updated to the patch level I just installed. Sometimes this failover takes a while, so don’t freak out if it takes longer than usual.
Sixth, I repeat these steps on the other nodes.
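To make the ordering of those steps concrete, here is a rough toy simulation in Python. It is only a sketch: `ToyCluster`, `rolling_patch`, and `join_fails` are hypothetical names, nothing here talks to a real cluster, and the patch order (start with the node after the current owner so the node being patched is never the active one) is an assumption about how the steps fit together. What it does show is the dependency that matters later in this post: the resource can only fail over onto a possible owner, so a node that never rejoins the cluster stops the whole rotation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Hypothetical in-memory model of a WSFC hosting one SQL Server FCI resource."""

    nodes: list[str]
    owner: str  # node currently hosting the resource
    possible_owners: set[str] = field(default_factory=set)
    patched: set[str] = field(default_factory=set)
    join_fails: set[str] = field(default_factory=set)  # nodes that cannot rejoin after reboot
    log: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # By default every node is a possible owner of the resource.
        if not self.possible_owners:
            self.possible_owners = set(self.nodes)

    def fail_over(self, target: str) -> None:
        # The resource can only move to a node that is a possible owner.
        if target not in self.possible_owners:
            raise RuntimeError(f"{target} is not a possible owner")
        self.log.append(f"failover {self.owner} -> {target}")
        self.owner = target


def rolling_patch(cluster: ToyCluster) -> None:
    # Rotate the node list so the current owner comes last.
    start = cluster.nodes.index(cluster.owner)
    order = cluster.nodes[start + 1:] + cluster.nodes[:start + 1]

    # Step 1: fail over through every node, ending back on the original owner.
    for node in order:
        cluster.fail_over(node)

    for node in order:
        # Because the owner was rotated to the end of `order`, the node being
        # patched should never be the active one; guard against it anyway.
        if node == cluster.owner:
            raise RuntimeError(f"{node} is still hosting the resource")
        # Step 2: remove the node from the resource's possible owners.
        cluster.possible_owners.discard(node)
        # Step 3: install the patch(es) and reboot (simulated).
        cluster.patched.add(node)
        cluster.log.append(f"patch+reboot {node}")
        # Step 4: the node should rejoin on its own; only then add it back.
        if node in cluster.join_fails:
            raise RuntimeError(f"{node} failed to rejoin the cluster")
        cluster.possible_owners.add(node)
        # Step 5: fail over from the active node onto the freshly patched node.
        cluster.fail_over(node)
    # Step 6 is the loop itself: the same steps run for every node.
```

Running `rolling_patch(ToyCluster(nodes=["NodeA", "NodeB", "NodeC"], owner="NodeA"))` patches NodeB, NodeC, then NodeA and leaves the resource back on NodeA. Passing `join_fails={"NodeB"}` instead raises a `RuntimeError` right after NodeB’s reboot, with NodeB still missing from the possible owners.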
That is NOT what happened this time
When I rebooted the patched node, it did not automatically join the cluster as it should have. It kept trying to join but eventually failed (red X). I checked the cluster logs (which are a pain in the *ss to decipher) and found a couple of errors that, of course, didn’t make sense.
I ran a “cluster validation” test with all the nodes, and it failed. The error was pretty telling, yet confusing. It was something like “CLIUSR account is not part Local Account”.
Up until then, I had never heard of “CLIUSR” and had no idea what it was.
What is the CLIUSR Local Account?
As it turns out, CLIUSR is a local user account created by the Failover Clustering feature. The Windows Server Failover Cluster service uses this local account for tasks such as adding nodes and joining nodes to the cluster.
Per Microsoft, if you delete this local account, WSFC will recreate it when a node joins the cluster. *
So Why Couldn’t My Node Join the Cluster?
It turned out that a GPO had added “Local Account” to the “Deny access to this computer from the network” user rights assignment in the Local Security Policy (a DISA STIG setting, by the way). Because CLIUSR is itself a local account, that setting denied it network access to the node. I spoke with our AD admin, and they modified the setting based on this article.
Once this was updated, the node automatically joined the cluster.
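If you want to check a node for the same condition before patching, one option is to look at its user rights in the INF format that `secedit` exports (for example from `secedit /export /cfg rights.inf /areas USER_RIGHTS` run elevated; that workflow is an assumption here, and the exported file is typically UTF-16, so decode it before handing the text to this code). The Python sketch below parses the `[Privilege Rights]` section, where the “Deny access to this computer from the network” right appears as `SeDenyNetworkLogonRight`, and flags the node if that right includes the well-known “Local account” SID `S-1-5-113` or an entry named CLIUSR. The function names are hypothetical, and it only reads text you give it; it does not query or change any policy.

```python
from __future__ import annotations

# Well-known SID for the "NT AUTHORITY\Local account" group.
LOCAL_ACCOUNT_SID = "S-1-5-113"


def deny_network_entries(inf_text: str) -> set[str]:
    """Return the entries listed for SeDenyNetworkLogonRight in a secedit INF export."""
    in_rights = False
    for raw in inf_text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            # Only lines inside the [Privilege Rights] section are user rights.
            in_rights = line.lower() == "[privilege rights]"
            continue
        if not in_rights or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip().lower() == "sedenynetworklogonright":
            # Entries are comma separated; SIDs carry a leading "*", names do not.
            return {v.strip().lstrip("*") for v in value.split(",") if v.strip()}
    return set()


def blocks_cliusr(inf_text: str) -> bool:
    """True if the deny-network right would catch the local CLIUSR account."""
    entries = deny_network_entries(inf_text)
    if LOCAL_ACCOUNT_SID in entries:
        # CLIUSR is a local account, so denying "Local account" denies it too.
        return True
    # Also catch the account listed by name, e.g. "CLIUSR" or "NODEA\CLIUSR".
    return any(e.split("\\")[-1].lower() == "cliusr" for e in entries)
```

For an export containing `SeDenyNetworkLogonRight = *S-1-5-32-546,*S-1-5-113`, `blocks_cliusr` returns `True`; once the `*S-1-5-113` entry is gone, it returns `False`.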