
Tuesday, August 04, 2009

HA in Cisco UCS Menlo Card

Like Scott Lowe, I am doing the Cisco UCS bootcamp. Scott did a nice writeup of his notes. I will also write up mine, but here is a summary of something that has been a bit confusing, so I wanted to provide some clarity.

There are three types of adapter cards for the UCS blade: Oplin, Menlo or Palo. The Oplin has no FC ports; it only provides two 10GE ports.

The Menlo card provides two FC and two 10GE ports and is currently shipping. The details of the card are:
The Menlo adapter is a converged network adapter (CNA) with 2 host-side 10GE ports and 2 FC ports to the backplane. The 2 network ports can run either native Ethernet or FCoE protocols and can be configured for failover. This failover is performed by the Menlo ASIC and does not require multipathing software on the host. The Menlo ASIC is a Cisco-designed multiplexor and FCoE protocol offload engine with a 350MHz 24K MIPS processor. There are 2 versions of this card: Menlo-E (with an Emulex chipset) and Menlo-Q (with a Qlogic chipset), thereby giving customers existing, proven driver stacks.

The interesting thing here is the statement about failover. On first reading you would think there is some great functionality here. People have been spouting this failover feature without diving into what it really means.

To cut to the chase, this is for the vNICs only and NOT for the vHBAs. A lot of the documentation is not that clear and it's easy to just assume it works for all virtual interface types.

You can see it on the dialog for creating a vNIC for a server. Notice the "Enable Failover" check box. It's actually disabled in this shot because the lab environment I am connected to does not have redundant Fabric Interconnects and IO Modules. If you enable this feature only one vNIC will appear to the OS and the Menlo card will handle the failover internally, routing the failed vNIC via a path on the other IOM.
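
For completeness, here is a rough sketch of what enabling the same fabric failover could look like from the UCSM CLI instead of the GUI. The org, service profile and vNIC names are made up, and the syntax (in particular "fabric a-b" meaning primary on Fabric A with failover to Fabric B) comes from my reading of the CLI guide rather than something I ran in this lab, so treat it as an assumption and check it against your own system.
carrot-A# scope org /
carrot-A /org # scope service-profile esx-blade-1
carrot-A /org/service-profile # create vnic vnic0 fabric a-b
carrot-A /org/service-profile/vnic* # commit
carrot-A /org/service-profile/vnic #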



When you have a look at the dialog for creating a vHBA you can see there is no redundancy option.



For some further details, the Frequently Asked Questions for UCS - Ciscowiki has the following.
Q: If the fabric extender is connected to the fabric interconnect using 4 links and one of the links fails, what will happen?
Ans: The server interfaces that are affected will either lose connectivity or fail over to another fabric extender, depending on whether the interface is created as an HA interface. Menlo has the capability to fail over Ethernet interfaces if so configured. Oplin does not have this capability. Fibre Channel interfaces that are pinned to the failing fabric extender link will just fail and their HA capability depends purely on the host side multipathing driver. If HA/multipathing is not configured for Ethernet/Fibre Channel then servers connected to the failed link will lose connectivity but the other three links will be working as usual. Remember that no automatic re-pinning will happen. You can manually re-pin the servers using two link topology, since three link topology is not supported.
Also note that to restore connectivity you need to re-pin the IO Module that had a failed link. To do this you need to reset it. From the CLI it would look like this.
carrot-A# scope chassis 1
carrot-A /chassis # scope iom 1
carrot-A /chassis/iom # where
Mode: /chassis/iom
Mode Data:
scope chassis 1
scope iom 1
carrot-A /chassis/iom # reset
carrot-A /chassis/iom* # commit
carrot-A /chassis/iom #
However, this is UCS and the UCSM GUI is just as powerful as the CLI, so here is where you do it in the GUI.



Of course, smart people like Brad Hedlund get this already; he wrote:
The vHBA’s do not support the UCS fault tolerant feature and therefore a standard HBA multi-pathing configuration is still required in the operating system (ESX kernel) for Fibre Channel high availability.
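
In practice that just means the usual ESX storage setup. The two vHBAs show up on the host as two vmhba adapters, one pinned to each fabric, and the VMkernel's native multipathing handles path failover when one side goes away. As a rough sanity check on a classic ESX host you can list the paths and make sure each LUN has a path through both adapters, with something like the command below. The host name is made up and I am going from memory on the flags, so verify against your own build.
[root@esx01 ~]# esxcfg-mpath -l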


Rodos

P.S. Thanks to everyone in the class today who put up with me asking endless questions about this, to Glenn from Cisco in Australia who answered some stuff over chat, and to our instructor David. Any errors are all mine; post in the comments if you have any fixes or further insights.

[UPDATE: Brad Hedlund and I had a conversation over Twitter about how the redundant Ethernet is presented to the blade and how the failover occurs. It turns out it is slightly different and better than everyone thought, but even Brad had to do some digging to be 100% sure. Thanks Brad!]

1 comment:

  1. Anonymous, 3:24 am

    "If you enable this feature only one two vNICs will appear to the OS and the Menlo card will handle the failover internally, routing the failed vNIC via a path on the other IOM."


    This is false. Two PCI NICs will be presented to the OS regardless of whether you select "enable failover" or not.
