
Wednesday, December 17, 2008

FC storage maximums explained

Have you ever been confused by all of those maximums and descriptions in the Fibre Channel section of the Configuration Maximums reference document? Which ones are per host and which are per cluster? Well, read on to find out.

Here is the table that lists the limits for Fibre Channel.

    LUNs per server                                      256
    LUN size                                             2TB
    Number of paths to a LUN                             32
    Number of total paths on a server                    1024
    LUNs concurrently opened by all virtual machines     256
    LUN ID                                               255

Whilst the table is simple it can be confusing. Why do some entries say "per server" and others don't? When it says "Number of paths to a LUN", is that across the cluster or for the host? Are you sure?

Well here is some clarification.

LUNs per server 256

On your ESX host you can only present 256 LUNs to that host. That's a big number.
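
If you want a quick check of how many LUNs your host can currently see, something like this from the service console will do. This is only a sketch; as far as I recall esxcfg-vmhbadevs prints one line per device on ESX 3.x, and it includes local disks in the count.

    # Count the devices (SAN LUNs plus any local disks) this host can see
    esxcfg-vmhbadevs | wc -l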

LUN size 2TB

Your LUN can’t be bigger than 2TB, but you can use extents to combine multiple LUNs to make a larger datastore. Most SANs will not create a LUN larger than 2TB either.
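
The usual way to add an extent is the Add Extent wizard in the VI Client, but it can also be done with vmkfstools. This is a rough sketch only; the vmhba device names and partition numbers below are made up, any existing data on the added partition will be lost, and you should double check the syntax in the SAN Configuration Guide before running it.

    # Hypothetical: span the VMFS volume whose head extent is on vmhba1:0:6:1
    # across an additional 2TB LUN (vmhba1:0:7, partition 1)
    vmkfstools -Z /vmfs/devices/disks/vmhba1:0:7:1 /vmfs/devices/disks/vmhba1:0:6:1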

Number of paths to a LUN 32

This is the confusing one, because it does not say "per server", and so the seed of doubt is sown. To confirm, this is a host metric, not a cluster one. Also note that only active paths are counted, so if you have an active/passive SAN that's one active path, even if you have redundant HBAs. If it's an active/active SAN with two HBAs and two SPs you can have four paths (one for each combination of HBA and SP). Some high end storage arrays, like an HDS USP/VM, can be configured with way more than 32 paths to a single LUN; now that's some scalability and redundancy.
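
You can see exactly what ESX thinks the paths look like from the service console. Just a sketch; the output layout differs between ESX versions.

    # List every LUN with its paths, the path policy in force, and which
    # path is currently active and/or preferred
    esxcfg-mpath -l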

Number of total paths on a server 1024

So if you have two HBAs and two SPs on an active/active SAN, that's four paths per LUN, and 1024 / 4 gives a maximum of 256 LUNs on the host. Hey, go figure, that matches the LUNs per server limit!
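
As a back-of-the-envelope check, assuming an active/active array with two HBAs and two SPs:

    # total paths = LUNs x HBAs x SPs
    expr 256 \* 2 \* 2    # prints 1024, exactly the total path maximum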

LUNs concurrently opened by all virtual machines 256

Again, you can only be talking to 256 LUNs from the one server.

LUN ID 255

No, it's not a misprint; LUN numbering starts at 0, so IDs 0 through 255 give you the 256 LUNs. This is effectively cluster wide, as for multipathing to work properly each LUN must present the same LUN ID number to all ESX Server hosts.

So what do you really need to worry about for the maximums? 256 LUNs is your limit per host and per cluster, and with dual HBAs and dual SPs you get four active paths to each (but ESX will only use one path at a time for a particular LUN). Of course, as Edward recently pointed out, the real limit may be your SAN, as some have limitations on how many hosts can connect to a LUN.

What other things should you be looking out for then? Here is a quick dump of some of the considerations regarding FC and pathing.

  • Do not use a Fixed path policy if you have an active/passive SAN; use Most Recently Used instead, as this avoids potential issues with path thrashing.
  • Round Robin load balancing is experimental and not supported for production use.
  • HA does not monitor or detect storage path failover. If all your paths to your storage fail, bad things happen. To quote the SAN configuration guide (pg 42) “A virtual machine will fail in an unpredictable way if all paths to the storage device where you stored your virtual machine disks become unavailable.”
  • With certain active/active arrays you can do static load balancing to place traffic across multiple HBAs to multiple LUNs. To do this, assign preferred paths to your LUNs so that your HBAs are being used evenly. For example, if you have two LUNs (A and B) and two HBAs (X and Y), you can set HBA X to be the preferred path for LUN A, and HBA Y as the preferred path for LUN B. This maximizes use of your HBAs' bandwidth. The path policy must be set to Fixed for this to work (see the sketch after this list). Duncan has details of a script written by Ernst that can automate this process for you. Duncan writes in English, which is helpful for us single-language people.
  • When you use VMotion with an active/passive SAN storage device, make sure that all ESX Server systems have consistent paths to all storage processors. Not doing so can cause path thrashing when a VMotion migration occurs.
  • For best results, use the same model of HBA within a server, and ensure that the firmware level on each HBA is the same. Having Emulex and QLogic HBAs in the same server to the same target is not supported.
  • Set the timeout value for detecting when a path fails in the HBA driver. VMware recommends that you set the timeout to 30 seconds to ensure optimal performance.
  • For boot from SAN, if you have an active/passive SAN the configured SP must be available; if it's not, the host can't use the passive one and ESX will not boot.
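
For those who like the service console better than the VI Client, here is a rough sketch of checking and setting path policy and preferred paths with esxcfg-mpath. The LUN and path names (vmhba1:0:20 and friends) are made up, and the exact flag spellings can vary between ESX versions, so check esxcfg-mpath --help before trusting my memory.

    # List every LUN, its paths, the current policy and the preferred/active path
    esxcfg-mpath -l

    # Active/passive array: use Most Recently Used on a LUN to avoid path thrashing
    esxcfg-mpath --policy=mru --lun=vmhba1:0:10

    # Active/active array: set Fixed and spread the preferred paths across the HBAs,
    # LUN 20 preferring HBA 1 and LUN 21 preferring HBA 2
    esxcfg-mpath --policy=fixed --lun=vmhba1:0:20
    esxcfg-mpath --preferred --path=vmhba1:0:20 --lun=vmhba1:0:20
    esxcfg-mpath --policy=fixed --lun=vmhba1:0:21
    esxcfg-mpath --preferred --path=vmhba2:0:21 --lun=vmhba1:0:21

Ernst's script that Duncan wrote up does this spreading automatically across all your LUNs, which beats typing it out per LUN.
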
Further details can be found in the Fibre Channel SAN Configuration Guide. Thanks to Simon, my local VMware SE, for letting me push the question of whether "Number of paths to a LUN" is really a host number to the pedantic edge; I wanted it in writing.

Rodos

2 comments:

  1. As a follow up. If you need to present more than 4TB of VMDK files to a host you will need to increase the VMFS3.MaxHeapSizeMB. See http://kb.vmware.com/kb/1007256
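
     For reference, on the service console the setting can be read and changed with esxcfg-advcfg. The 64 below is only an example value (check the KB article for the supported range), and I believe the host needs a reboot for the change to take effect.

        esxcfg-advcfg -g /VMFS3/MaxHeapSizeMB     # show the current heap size
        esxcfg-advcfg -s 64 /VMFS3/MaxHeapSizeMB  # raise it, e.g. to 64MB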

  2. Another follow up. Looks like some of the drivers place lower limits on some of these figures. For example the HP hpsa driver supports fewer total paths on a server. See the post at http://malaysiavm.com/blog/calculation-of-max-lun-supported-in-esx-server/ for some more details.
