Blade enclosures and HA
HA has always been an interest of mine; it's such a cool and effective feature of VMware ESX. Whilst it is so simple and effective, the understanding of how it actually works is often a black art, essentially because VMware have left much of it undocumented. Don't start me on slot calculation.
So this week another edge case for HA came to my attention. Over on the VMTN forum there has been a discussion about the redundancy level of blade chassis. You can read all of the gory details (debate) there, but Virtek highlighted an interesting scenario. Here is what Virtek had to say.
I have also seen customers with 2 blade chassis (C7000s), 6 blades in each. A firmware issue affected all switch modules simultaneously, instantly isolating all blades in the same chassis. Because they were the first 6 blades built it took down all 5 primary HA agents. The VMs powered down and never powered back up. Because of this I recommend using two chassis and limiting cluster size to 8 nodes to ensure that the 5 primary nodes will never all reside on the same chassis.

I have never thought of that before, but there is a better resolution; this is not a reason to limit your cluster size.
My point is that blades are a good solution but require special planning and configuration to do right.
You see, with VMware HA you have up to five primary nodes, and beyond that secondary nodes. Primary nodes do the (real) work, and without a primary node everything goes foo bar. So what happened in Virtek's case? Well, as you add nodes to the HA cluster they are added as primary nodes first. Therefore, if you purchase two blade chassis, splitting the nodes between them, but add all of the blades in one chassis first, guess what: the first five all become primary. That lovely redundancy you paid all that money for has gone out the window, as all the primary nodes reside within the first chassis. As Virtek found, if all those hosts go, HA is unable to manage the restart of the machines on the ESX hosts in the other chassis, because they are all secondary nodes.
Is this bad? Not really. The resolution is to reconfigure HA once you have added all of your blades into the HA cluster. This reconfigure will redistribute the primary and secondary nodes around the cluster, which should leave them spread across your chassis. Problem solved.
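If you have more than a handful of hosts and don't fancy right-clicking each one in the VI Client to select "Reconfigure for VMware HA", you can script it. Here is a rough sketch using pyVmomi; treat it as a sketch only, and note that the vCenter address, credentials and cluster name are placeholders, not anything from the scenario above.

#!/usr/bin/env python
# Rough sketch: trigger "Reconfigure for HA" on every host in a cluster so
# the primary/secondary roles get re-elected. VCENTER_HOST, the credentials
# and "BladeCluster" are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="VCENTER_HOST", user="administrator", pwd="PASSWORD",
                  sslContext=ssl._create_unverified_context())
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    for cluster in view.view:
        if cluster.name != "BladeCluster":   # placeholder cluster name
            continue
        for host in cluster.host:
            # Re-runs the HA agent configuration on the host, which forces
            # the primary/secondary election to happen again.
            task = host.ReconfigureHostForDAS_Task()
            print("Reconfiguring HA on %s (%s)" % (host.name, task.info.key))
    view.Destroy()
finally:
    Disconnect(si)

Run something like this once all of the blades from both chassis are in the cluster, and then verify the spread with listnodes as below.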
To determine which nodes are primary, if you really want to check, run the "listnodes" command from AAM, which will dump a report like this.
/opt/vmware/aam/bin/ftcli -domain vmware -connect YOURESXHOST -port 8042 -timeout 60 -cmd "listnodes"
Node Type State
----------------------- ------------ --------------
esx1 Primary Agent Running
esx2 Primary Agent Running
esx3 Secondary Agent Running
esx4 Primary Agent Running
esx5 Primary Agent Running
esx6 Secondary Agent Running
esx7 Primary Agent Running
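If you want to sanity check the spread without eyeballing the report, here is a little throwaway script of mine (not part of the AAM tooling) that counts the primary agents per chassis from the listnodes output. The host-to-chassis mapping is a made-up example; substitute however you track which blade lives in which enclosure.

# Throwaway sketch: count primary HA agents per chassis from the captured
# listnodes output. The CHASSIS mapping is a made-up example.
from collections import Counter

CHASSIS = {
    "esx1": "chassis-A", "esx2": "chassis-A", "esx3": "chassis-A",
    "esx4": "chassis-B", "esx5": "chassis-B", "esx6": "chassis-B",
    "esx7": "chassis-B",
}

def primaries_per_chassis(listnodes_output):
    counts = Counter()
    for line in listnodes_output.splitlines():
        parts = line.split()
        # Data rows look like: "<host>  Primary  Agent Running"
        if len(parts) >= 2 and parts[1] == "Primary":
            counts[CHASSIS.get(parts[0], "unknown")] += 1
    return counts

# Feed it the output of the ftcli command above, e.g.
# print(primaries_per_chassis(open("listnodes.txt").read()))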
If you want some more details on how HA works, Duncan Epping has a great summary over at Yellow Bricks. There, easily fixed, and much easier to analyse than HA admission control and slot calculations.
If you have any further insights or links, post them in the comments.
Rodos
P.S. Thanks to Alan Renouf via Twitter for the command line for listnodes, as I did not have access to a cluster to confirm the right syntax.
The other thing we set differently for blades vs rackmount is the default HA isolation response. For blades it's "leave VMs powered on"; for rackmount it's "power off VMs". The reason is that if one blade in a chassis becomes isolated, it's likely that every blade is isolated, because it would have to be due to either a misconfiguration somewhere that meant we didn't have the redundancy that we designed (for example, someone in the networking group didn't configure the upstream switches correctly), or a singularity like the scenario described above or both chassis switches failing at the same time. Of course, both those scenarios are highly unlikely, but you can guess which one is more likely of the two ;-)
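If you want to set that isolation response from a script rather than through the client, a rough pyVmomi sketch along these lines should do it (again only a sketch; "BladeCluster" is a placeholder and the connection is the same as in the earlier example).

# Rough sketch: set a cluster's default HA isolation response to
# "Leave powered on" ('none' in the API), as suggested for blades above.
# Reuse the connection from the earlier sketch to look up the cluster.
from pyVmomi import vim

def leave_vms_powered_on(cluster):
    spec = vim.cluster.ConfigSpecEx(
        dasConfig=vim.cluster.DasConfigInfo(
            defaultVmSettings=vim.cluster.DasVmSettings(
                isolationResponse="none")))  # 'none' = leave VMs powered on
    # modify=True merges the change into the existing cluster configuration
    return cluster.ReconfigureComputeResource_Task(spec, modify=True)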