Musings of Rodos: Gestalt IT TechFieldDay Seattle

Take a pile of smart just with backgrounds from Sun, Netapp and Data Domain, throw in a few PHDs (I assume) and see what falls out; thats Nimble Storage who launched at Gestalt IT TechFieldDay Seattle.

The company was formed in 2008, based in San Jose. The two founders are

Varun Mehta (Sun, NetApp, Data Domain)
Umesh Maheshwari (Data Domain)

They have some interesting people on their board of directors as well

Suresh Vasudevan (Omneon, NetApp, McKinsey)
Kirk Bowman (Equallogic, VMware, Inktomi)

Nimble call their technology game-changing, taking what was available in separate products and putting it all into one. Nimble coverage of iSCSI primary storage, backup storage and disaster recovery in a new architecture that combines FLASH and high capacity low cost SATA in a new way.

This brings FLASH into the range of many enterprises who would like to use it for more common workloads like Exchange, SQL and VMware. Their target is for organisations with 200 to 2000 employees.

Nimbles competition in the iSCSI market with market sizes (from IDC) are Equallogic who have 35%, EMC 15%, HP and Netapp are around 10% each.

Nimble have done the brave thing and started with a clean sheet of paper to try and create something that no one else can deliver.

The problems they are trying to solve are delivering fast performance without all those expensive disks and how to efficiently back it all up plus replicate that data to a second site for continuity purposes.

Techniques include

capacity optimised snapshots rather than backups
FLASH is used to give great performance
replication that is efficient and based on the primary information so that the time to recover and use that data is very quick, you don't need to wait for a resto

A key think that Nimble bring is their CASL architecture, it provides the following :

Inline Compression. A real time compression engine as data comes in. On primary datasets they are seeing about a 2:1 saving and on things like databases a 4:1 saving. Blocks are variable in sizes and Nimble take advantage of the current state of multi-core processors having a highly threaded software architecture.
Large Adaptive Flash Cache. Flash as a caching layer, starting at 3/4 of a TB for the entry box. They store a copy of all frequently access data, but all data is also storage on the cheaper SATA storage as well.
High-Capacity Disk storage. Using large SATA drives.
Integrated Backup. 60 to 90 days worth of "delta compressed incremental snapshots" can be stored on the system. They have put a lot of work into integration with Microsoft applications, integrating the VSS for ensuring consistency. The snapshot efficiency should remove the requirement for a secondary backup system outside of the primary storage. Combine this with replication to a remote site and you have a protected system.

Nimble showed the results of some testing they performed on a Exchange 2010 19GB database running snaps over 10 days, the other vendor (Equallogic at a guess) consumed over 100GB of data whereas Nimble only consumed 3GB. A 35x improvement was claimed. This then results in less to replicate. Its suspected that the reason for this difference is the smaller and variable blocksize that Nimble can use, the competitor has a large blocksize.
Replication. The replication is point in time snapshot replication. Once nice thing that you can do is maintain different retention periods at each site. For example you might want to maintain a much higher frequency of snaps locally and a less frequent but longer tail of snaps over at DR, very nice. They have a VMware Site Recovery Manager (SRM) plugin in development but it has not been certified yet. Today you can't cascade replication but it will be coming in a future release. Cascade my be important for people who want to use the Nimble for backup, replicate locally and then offsite.

The befits that result from CASL are :

Enhanced enterprise application performance
Instant local backups and restores with fast offsite DR
Eliminates high RPM drives, EFDs, separate disk-based backup solution
60%+ lower costs than existing solutions

When you create volumes they can be tuned for various application types, tweaking such things as page size or if it should be cached. The Nimble ships with a set of predefined templates for popular appellations. The same for snapshot policies which can be templates and a predefined set are provided.

The pricing estimates they have done is at under $3 per Gb for primary storage at an entry price of around $50K.

Here is the specs of the units.

There is no 10GB interface option yet but it will be considered on customer demand. The same goes for having a Fiber Channel interface. The controllers are active, passive on a system (not LUN) basis.

They currently have 10 to 12 beta accounts.

Umesh Maheshwari then have some further details on the technology behind Nimble. A great discussion from someone who knows the industry and the technologies, as you would expect.

Nimble is all about having the

capacity to store backups (through hi-capacity disks, compression and block sharing) along with
random IO performance for primary storage (through Flash cache for random reads and sequentialized random writes)

This technique of sequentialized was developed by Mendel Rosenblum in his PHD thesis in 1991 (see paper). If you don't remember Mendel was one of the founding brains behind VMware so his ideas have a good track record. Its called a Log Structured File System.

So why has this been done before, well it took technology a while to catch up to the idea. The original concept relies on the assumption that files are cached in main memory and that increasing memory sizes will make the caches more and more effective at satisfying read requests, hence the disk traffic will become dominated by writes. With only small amounts of RAM available it was a problem. Secondly the process requires a background job to do garbage collection.

Nimble have created CASL, an implementation of the log based file system. It utilises a large amount of FLASH for the cache and its integrated closely into the disk based file system. The index or metadata of the system is cached in the Flash and therefore the garbage collection can now work efficiently. Of course cache is bit of a simple word for what it does, its not a LRU, there is some complex meta data being tracked for performance.

The second element is the sequential layout of the data on the disks. How you store data on disk could be categorised into 3 different techniques.

1. Write in place. eg. EMC, EqualLogic

its a very simple layout, you don't need lots of indexes.
reads can go quite well
poor at random writes
parity RAID makes it worse

2. Write anywhere. eg. Netapp WAFL (write anywhere file layout)

more write optimised
between full stripes and random writes
its write a sequence of writes wherever there is free space. So when you starts is sequential but after a while the spaces that are free will be fragmented so you end up doing random writes

3. Write sequently. eg DataDomain, Nimble CASL

most write optimised
always do you writes in full stripes
good when writing to RAID
the blocks can now be variable size which is very efficient but it has a secondary effect that you now have room to store some metadata about the block such as a checksum
this requires the garbage collection process which runs in idle times to always ensure there is space available for writing full stripes, what makes this work is that the index is in Flash and the power of the current set of processors
the difference between what DataDomain do and CASL is that DD do their sharing based on hashes and CASL does it based on snapshots

Of course this makes you wonder whats the difference between the CASL cache and what many other providers are doing with a Tier of Flash?

Because the cache is backed by disk (the data is in the cache and on the disk) you don't need to protect the data on the disk. This means you can use cheaper flash drives and you don't need to do any parity or mirroring giving you saving of 1.3 to 2 times.
Its much easier to evict or throw away data in the cache than it is to demote data out of a Flash tier into a lower one, you don't have to copy any data.
You don't have to be so careful about putting things in cache as its not an expensive operation so all writes or reads can be put in cache for fast access if you need it again and of course cache is a lot more effort to integrate into your file system than tiering so if you are dealing with legacy its much harder then when you are starting from scratch like Nimble have.

Thoughts?

I really got the feeling that Nimble are not trying to be everything to everyone. They are focused on a particular market segment, hitting their pain points and attempting to do it better than the incumbents are.

They have a few things to deliver in my opinion to reach the goal, such as

cascaded replication to offer true local and remote data protection
get the SRM module for VMware certified
its looks hard to scale out if you just need some further storage as you can't add disk shelves, you get what you get. Yet their is nothing in their architecture to preclude some changes here which is good.

The big question will be is it different enough to the competitors for them to get into the market. If you only difference is doing something better (no matter how clever it is under the hood) how easy is it for your competitors to be "good enough" or a much better price point. Some good marketing, sales force and channel are going to be key.

With CASL, Nimble certainly have some very nice technology, but nice technology does not always win in the market. Its certainly going to be great to see how their early adopters go and how they adjust the hardware range and feature set over the next 12 months!

Note that its not available in Australia or EMEA yet.

Rodos

Note : Tech Field Day is a sponsored event. Although I receive no direct compensation and take personal leave to attend, all event expenses are paid by the sponsors through Gestalt IT Media LLC. No editorial control is exerted over me and I write what I want, if I want, when I want and how I want.

Musings of Rodos

Pages

Tuesday, July 20, 2010

Gestalt IT TechFieldDay Seattle - Nimble Storage

No comments:

Post a Comment