Cluster: Race condition with WaitForCluster #190
Labels
enhancement
help wanted
Details of the scenario you tried and the problem that is occurring
This is the expected way of doing things which usually works:
I'm building 1+4 node clusters, all in one subnet. In these configurations with many nodes, a race condition occurs: xWaitForCluster reports that the cluster is ready, but xCluster then fails at random during Test-Resource with the message "The name used to access the cluster is not currently available".
So the cluster creation succeeds and one or two node additions pass, but the rest fail. This is rectified about 10-15 minutes later when DSC re-runs the configuration, but that is as long again as building the entire lab.
I can only imagine there is a short window where the cluster exists but adding a node to it doesn't work yet. I replaced xWaitForCluster with WaitForAll on the first node's cluster resource. This seemed to work more reliably for single-subnet clusters, but it can still fail for multi-subnet clusters, which might briefly take the cluster offline when the additional subnet is added.
Suggested solution to the issue
What would you think about adding RetryIntervalSec (defaulting to 10) and RetryCount (defaulting to 0) parameters to the xCluster resource? The resource could then retry whatever operation it is performing after a transient failure, such as the cluster flipping on and off while other nodes are being added. This avoids writing transient errors to the DSC logs and waiting for the entire configuration to re-run.
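To make the proposed semantics concrete, here is a minimal sketch of the retry loop, written in Python only as a neutral illustration; a real change would live in the resource's PowerShell code. The names `invoke_with_retry` and `TransientClusterError` are hypothetical, and the sketch assumes the resource can tell a transient error (like the "cluster name not currently available" message) apart from a permanent one. With the suggested default of RetryCount = 0 the operation runs exactly once, so existing configurations would behave as they do today.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientClusterError(Exception):
    """Hypothetical error type standing in for a transient failure such as
    'The name used to access the cluster is not currently available'."""


def invoke_with_retry(
    operation: Callable[[], T],
    retry_count: int = 0,           # mirrors the proposed RetryCount default
    retry_interval_sec: float = 10,  # mirrors the proposed RetryIntervalSec default
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying up to `retry_count` extra times when it raises
    TransientClusterError, waiting `retry_interval_sec` between attempts."""
    attempt = 0
    while True:
        try:
            return operation()
        except TransientClusterError:
            if attempt >= retry_count:
                raise  # out of retries: surface the error as before
            attempt += 1
            sleep(retry_interval_sec)


# Usage: a simulated node join that fails twice while the cluster name is
# unavailable, then succeeds on the third attempt.
calls = []


def flaky_join() -> str:
    calls.append(1)
    if len(calls) < 3:
        raise TransientClusterError("cluster name not currently available")
    return "joined"


print(invoke_with_retry(flaky_join, retry_count=3, retry_interval_sec=0))
```

Retrying only on a recognized transient error (rather than on every failure) is a design choice in this sketch; it keeps genuine configuration errors from being masked by RetryCount attempts.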
The operating system the target node is running
Windows Server 2012
Version and build of PowerShell the target node is running
WMF 5.1
Version of the DSC module that was used ('dev' if using current dev branch)
dev