Who stole my UCS boot order? Well may you ask.
Check out the differences in the following screen grab from UCS Manager for one of my blades (I hacked the two screens together so you could see both at the same time).
Notice the difference between the two? The Configured order has boot from CD and then from SAN storage, while the Actual order has the local hard drive in the middle and the SAN storage at the bottom. The weird thing is, I never made that change.
The issue here is a known bug: if a device is not available at boot time, the BIOS moves it back in the boot order.
To fix it you can apply the boot order again or re-associate the blade. Both require a reboot, which does not really matter as you probably never had a successful boot in the first place.
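If you want to double check what UCS Manager thinks the configured order should be before you reboot, here is a minimal read-only sketch using the Cisco ucsmsdk Python library. The address, credentials and service profile name are made up, and it assumes the boot definition sits at the usual <profile dn>/boot-policy path, so treat it as a starting point rather than a recipe.

```python
from ucsmsdk.ucshandle import UcsHandle

# Hypothetical UCS Manager address and credentials - change for your setup.
handle = UcsHandle("10.0.0.10", "admin", "password")
handle.login()

try:
    # Fetch the service profile (class LsServer) by its DN.
    # "esx-blade-1" is a made-up profile name.
    sp = handle.query_dn("org-root/ls-esx-blade-1")
    if sp is None:
        raise SystemExit("Service profile not found")

    # The configured boot entries (vMedia, storage, LAN, etc.) are children
    # of the profile's boot definition, each carrying an 'order' value.
    boot_items = handle.query_children(in_dn=sp.dn + "/boot-policy")
    for item in sorted(boot_items, key=lambda mo: getattr(mo, "order", "99")):
        print(type(item).__name__, "order", getattr(item, "order", "n/a"))
finally:
    handle.logout()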
When can this happen? It can occur when you have an outage or failure and things don't come up in the correct order or in a timely manner. In my case I had booted my blades whilst the MDS FC switch was down. When the blades did not boot it was obvious that the MDS was not up; however, after fixing that it was not obvious why the blades still would not boot. The reason: the boot order had changed without me knowing it.
So if you have a stack of blades all booting from SAN and let them boot with the boot device unavailable, you will have many blades whose boot order needs to be quickly fixed.
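Rather than clicking through every blade in UCS Manager, a quick script can at least list the suspects. Again, this is a minimal ucsmsdk sketch with made-up credentials; it just prints each service profile with its boot policy and association state so you know which ones to go and fix.

```python
from ucsmsdk.ucshandle import UcsHandle

# Hypothetical UCS Manager address and credentials - change for your setup.
handle = UcsHandle("10.0.0.10", "admin", "password")
handle.login()

try:
    # LsServer is the service profile class; each entry carries the
    # configured boot policy name plus association and operational state.
    for sp in handle.query_classid("LsServer"):
        print("{:<30} boot_policy={:<15} assoc={:<12} oper={}".format(
            sp.name, sp.boot_policy_name, sp.assoc_state, sp.oper_state))
finally:
    handle.logout()
```

It won't show you the actual BIOS order on its own, but it is a fast way to see which profiles should be booting from SAN before you start re-applying boot policies or re-associating blades.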
It's really not a big deal when you know it's coming, and now you know! Also, if this was not my testing environment I would not have had local drives in there; after all, this is stateless computing.
This is why we do comprehensive regression testing, peoples. Repeat after me: this is why we do regression testing.
Rodos
[Update: The DDTS (bug number) for this is CSCtb48651.]
As of the UCS 1.1x release notes, this bug is outstanding.
This happened to me this evening with UCSM 2.0 when I changed from FC switch mode to End Host mode. My ESXi boot-from-SAN blades were still up when the FIs rebooted, so I was left with a mess. Thanks for leading me in the right direction.