Saturday, November 06, 2010

IBM XIV

I posted the interview with Craig McKenna from IBM on XIV. Here are more of the details from the SNIA Blogfest event this past week.


First off Craig went through the XIV, which there has been a bit of talk about in the industry. Here is his slide on the specs.


The details from my notes were :
  • It comes in its own rack.
  • It is built up from modules, each of which holds 12 SATA drives.
  • 6 modules also contain the FC (4G) and iSCSI (1G) interfaces.
  • Internally the backplane uses 1G Ethernet. Each module has four connections.
  • You can start with 6 modules, which gives you 72 disks. Maxed out at 15 modules you will have 180 drives.
  • You need to pick a standard drive size across the entire rack; it only has one tier, full stop. Either 1TB or 2TB drives. You can't mix and match drives; if you do, the additional space in the larger drives is not used. Once all of the drives are at the larger size, you do get the space as it rebuilds/re-levels. I don't think you can mix drive sizes within a module. Over the lifetime of the machine I wonder if customers are going to want to get the benefits of larger drive sizes. Hopefully they will have a great relationship with their IBM sales rep and can get them to trade in their old drives. It's a result of the architecture, but if an XIV was the only storage on your floor it might not be flexible enough for you.
  • With the smallest drive and smallest number of modules your starting point is 27TB. The largest capacity you can go to is 161TB. These figures are for usable space after the overhead of data protection (mirroring) and sparing (not disk based) is factored in.
  • The read architecture makes the SATA drives perform close to FC speeds.
  • The controllers use a grid architecture, all can access and service data at the same time.
  • The cache is 240G (depending on number of modules).
  • It is always doing thin provisioning but you don't have to over provision.
  • You can put your XIV in front of your existing storage (disruptive to get it into the data path) and then get it to ingest existing data to conduct data migration.
  • Redirect on write is used for snapshots, similar to NetApp, but unlike NetApp the snap data is independent and resides outside of the volume. You can do up to 16000 snaps.
  • Async Replication is based on snapshots (not my favourite method).
  • In the future you will be able to connect multiple frames (racks) together and these could have different drive sizes. Infiniband will be used for the interconnection.
  • Data is broken into 1MB chunks and these are pseudo-randomly distributed across all resources in the frame as well as being mirrored. This is called RAID-X or mirrored protection.
  • The mirror of a chunk never resides on the same module. Chunks that are on one disk are not mirrored to a matching disk in another module (like a RAID mirror) but rather spread across all the other drives in the system. There is potential for data loss if two disks fail but clever maths and some other techniques are used to make this risk very low. Across the 300 [correction: 3000] installations worldwide there have been no double drive failures. Of course your traditional RAID systems are at risk from a double drive failure in a set too.
  • Of course with XIV the failure domain is wider if two drives were to fail. This is where rebuild speed comes in. If a disk fails only the 1MB chunks it contained need to be re-mirrored. So if the drive was only half full that's half the data to process compared with a more traditional RAID rebuild. As the data that has to be re-mirrored is spread across all the drives in the system, as is the destination of the re-mirrored chunks, all the disks are involved in the read. This means that a re-mirror is really fast. A 1TB drive can be rebuilt in 30 minutes this way, as opposed to sometimes up to 24 hours in traditional systems. The bigger your XIV system (more drives) the faster the re-mirror will be.
  • This great rebuild performance is a key advantage to RAID-X as disk drives continue to get larger.
  • No need in XIV to worry about hardware RAID or hot spare drive management. Operation is very simple, the system takes care of it for you.
  • Licensing for all functions is included up front.
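To make the RAID-X points in the list above a bit more concrete, here is a minimal Python sketch of the idea as I understood it from the talk. It is not IBM's algorithm. The function names (`place_chunks`, `rebuild_sources`), the use of Python's `random` module for the pseudo-random placement and the toy chunk count are all my own assumptions for illustration. The only rules taken from the notes are 15 modules of 12 disks and that a mirror never sits in the same module as its primary.

```python
import random
from collections import Counter

MODULES = 15           # full rack, from the notes above
DISKS_PER_MODULE = 12  # each module holds 12 drives


def all_disks(modules=MODULES, per_module=DISKS_PER_MODULE):
    """Return every disk as a (module, slot) tuple."""
    return [(m, s) for m in range(modules) for s in range(per_module)]


def place_chunks(num_chunks, seed=0):
    """Pseudo-randomly place each chunk and its mirror (hypothetical scheme).

    The one rule enforced here: the mirror copy never lands in the
    same module as the primary copy.
    """
    rng = random.Random(seed)
    disks = all_disks()
    placement = []
    for _ in range(num_chunks):
        primary = rng.choice(disks)
        mirror = rng.choice([d for d in disks if d[0] != primary[0]])
        placement.append((primary, mirror))
    return placement


def rebuild_sources(placement, failed):
    """Count, per surviving disk, how many chunks it must supply to re-mirror."""
    sources = Counter()
    for primary, mirror in placement:
        if primary == failed:
            sources[mirror] += 1
        elif mirror == failed:
            sources[primary] += 1
    return sources


if __name__ == "__main__":
    placement = place_chunks(20_000)
    sources = rebuild_sources(placement, failed=(0, 0))
    print("chunks to re-mirror:", sum(sources.values()))
    print("surviving disks sharing the read work:", len(sources))
    print("largest share on one disk:", max(sources.values()))
```

The point of the toy model is that the surviving copies of a failed disk's chunks are scattered over disks in the other modules, so each one only has to supply a small slice of the rebuild. That is the intuition behind a bigger system re-mirroring faster.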
What's my take on XIV :
  • You can't discount it. IBM acquired the technology from a startup headed by Moshe Yanai, who is known as the father of EMC's Symmetrix disk system.
  • Most of the vendors are moving to the commodity hardware and operational simplicity that XIV offers. The smarts are in the software and not the tin or brown spinning stuff. We are seeing more of these grid architectures and chunking of data. Traditional vendors are back filling this into their existing systems; XIV had the luxury of doing it fresh from the get go.
  • XIV looks like storage that does what it does well, but it only does one thing. The nerd knobs don't exist. I suspect that companies that use XIV are going to be large and that it won't be the only storage sitting on their floor. At 27TB usable it's no small entry point, so there are going to be some big storage needs. Companies with this amount of data are probably going to have a wider variety of storage requirements that XIV may not yet handle.
  • RAID-X sounds lovely but it has two drawbacks. First, it has the most expensive protection level, mirroring. The price is going to have to be right to compensate for the high overhead. Second, that large failure domain means you are only going to be using this for either scratch data or something you have backed up somewhere else. Yes, a single drive can rebuild real fast. But IF (and it's a long if) a second drive was to fail before the rebuild finished, the chunks are so widely spread that you lose more than you would on a traditional system, where you might lose just the data on a single RAID group, or none at all if the second disk was in a different RAID set. With RAID-X you may lose a bit of data from everything across the system. That's going to be a hard one to recover from, and restoring between 27 and 161 TB of data is not going to be fast.
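As a rough back-of-the-envelope check on that failure domain, the sketch below estimates how much data two failed disks would have in common. It assumes every disk is full and that mirror partners are spread perfectly evenly over every disk outside the failed disk's own module. The function name and the even-spread assumption are mine, not IBM's. The 1MB chunk size, 12 disks per module and 180 disks come from the notes above.

```python
def shared_chunks(disk_tb=1.0, chunk_mb=1.0, total_disks=180, disks_per_module=12):
    """Estimate chunks whose two copies sit on one specific pair of disks.

    Assumes a full disk and a perfectly even spread of mirror partners
    across every disk outside the failed disk's own module.
    """
    chunks_on_disk = disk_tb * 1_000_000 / chunk_mb  # decimal TB -> MB
    partner_disks = total_disks - disks_per_module   # mirrors avoid own module
    return chunks_on_disk / partner_disks


if __name__ == "__main__":
    lost = shared_chunks()
    print(f"~{lost:.0f} chunks (~{lost / 1000:.1f} GB) exposed by one unlucky pair")
```

Under these assumptions the exposed data is a small fraction of one drive, and two disks in the same module share nothing at all. What the estimate cannot tell you is which volumes those chunks belong to, which is the part that worries me.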
Would like to hear your thoughts on the XIV, post in the comments. Below is the video of Craig taking us through all of this.



Craig then went on to go through the new Storwize V7000. They have taken the best of SVC, added RAID functionality from the DS8000 box, basically merging the two product lines to deliver a new mid-range controller.

I won't go into all the details of this. Here is the slide, and it's covered at the end of the video above, after XIV, if you want to watch it.


[Edit : Please see the comments for a response from Craig and some good links with further detail.]

Rodos

4 comments:

  1. Craig McKenna, 1:53 pm

    Hey Rodos - nice blog on the SNIA Fest. . .

    I was reading your summary of XiV and I noticed you said we have 300 boxes installed. Actually it's 3111 worldwide (and even that data is 2 months old). We'll be pretty close to 4000 by year's end. . and still no double drive failures =).

    On DD failure (this is by far the most common 'concern'/FUD we hear). Beyond incredibly fast rebuild (which is the only thing I really talked about on the day) which, mathematically, puts a DD failure in XiV less likely than a DD failure in a 7+1 RAID-5, we also do a bunch of proactive things to avoid the possibility altogether. For instance, as we're perfectly load balanced at all times, we never develop hot spots (even during data rebuilds) so our disks don't fail as often in the first place. We also monitor drive thresholds and, if deemed appropriate, we will take a 'suspect' drive offline (after creating a tertiary mirror) before it fails. An actual 'hard' failure is very rare. . .and 2 (within a 180 disk pool) within a 30min period is something that would occur (mathematically) every ~11,000 years. We agree that a remote possibility is still a possibility though. . .so, if data is important (and we expect it is), you can mirror to a second XiV like you would with important data on any array. The benefit for an XiV client is that the software's already there.

    Also, I think there's a misconception that the failure domain of a RAID-5 environment is isolated to ~8 disks (vs XiV at 180). With O/S striping (which is very common in older storage devices that didn't support wide stripes), or things like metavolumes, you end up 'merging' multiple failure domains in a traditional RAID environment. So the failure of 2 drives in any RAID array inside a storage system would likely drag down most/all applications (especially more important ones which tend to have wider stripes) and result in the same data recovery operations you mention for XiV (unless you mirrored to another system).

    It's a big/interesting topic but enough on DD failure. . .

    You also made a comment about only using XiV for scratch space or for something you've got backed up elsewhere? Not sure if I left you with that impression (if I did, sorry). On the contrary we have some of the largest storage consumers in the world with mission critical applications on XiV. . .core banking, telco billing apps, SAP. . .you name it. You can't sell 3000+ units as scratch space. . .there are cheaper ways to do that.

    I think your assertion that mega large accounts would still have needs that XiV doesn't meet is true. . .not many but some is certainly fair to say. What we see is that 80%+ of a corporation's data volume would live perfectly happily on autonomic technology like XiV. Doing so would save those corporations so much money (capex and opex) that, if they need to spend a little more on some very unique/special applications, they'll have ample time, money and resource to go do it. . .no need to manage everything that way though (too expensive).

    Nice job on the videos (it's so strange watching yourself) and content overall. . .stay in touch!

    Craig

  2. Anonymous, 3:19 am

    This article goes into more detail about what 'should happen' if there is a DDF.

    https://www.ibm.com/developerworks/mydeveloperworks/blogs/InsideSystemStorage/entry/ddf-debunked-xiv-two-years-later?lang=en_us

    Obviously this is written from the perspective of IBM ... but I think it does a good job of debunking the idea that you will 'lose everything' if the unthinkable happens.

  3. _Craig_

    Thanks for posting a comment.

    I got the 300 number from the recording. I have updated the post with the corrected figure of 3000.

    Thanks for some extra info on the double drive failure (DDF). Yes, there are lots of proactive things done to reduce the chance of a DDF, and other vendors do some of those things as well. Yes, you can mirror to another system (hence my comment around having backup).

    Whilst it's true that on other systems you can do wide striping which will increase the failure domain, it's your choice, you don't have to; you can understand and keep your failure domain low.

    I am sure there are many reputable and large companies using XIV. You mention banking and other mission critical applications as an example. That's my point, I am sure that they would be the type of company/data to replicate or backup.

    My simple point is that without another copy your failure domain is large. Yes, you may only lose a tiny amount of data and it may only be across a few LUNs, but which ones they are is non-deterministic.

    I still don't think that XiV is a one stop box; just as a simple example, it lacks NFS support. In my opinion, if someone has this size storage need they are going to have a number of other requirements on their floor and may have other arrays for other (albeit specialist) purposes. Sometimes it's good to be great at one particular thing (which is why I suspect many customers buy XiV).

    _Anonymous_

    Thanks for the great link. It has some good details about what does happen when a DDF occurs (even though one has never happened). It's great that it reports which LUNs have been affected; it may be a small number or quite a few. As indicated you will then have to go back to your secondary copies (backups) to restore that data. This is essentially my point. All I said was that if you are going to use XIV you are going to have to have ALL your data backed up somewhere else, and that's quite typical. As the post referenced says, you are going to need this.

    Thanks for the great discussion.

    Rodos

  4. Late commenter but I ran XIV for 2 years prior to my latest job move and loved it. We ran a fair sized virtual environment and it handled everything we threw at it and more. Interface was super easy as were all configuration tasks.

    I initially was thrown off by the RAID-X and still find it disk costly but we never had any data loss issues. We were also surprised that even as we filled the unit we recovered a drive quickly and saw no performance degradation. In the end I was converted overall.

    My only real complaint with the XIV was the slow release and adoption of VMware API functionality like VAAI. This wasn't a showstopper for me but did hurt my feelings a little. :)

    Great post and thanks for your contribution to the community!
