Gestalt IT TechFieldDay Seattle - Nimble Storage
Take a pile of smart people with backgrounds from Sun, NetApp and Data Domain, throw in a few PhDs (I assume) and see what falls out; that's Nimble Storage, who launched at Gestalt IT TechFieldDay Seattle.
The company was formed in 2008 and is based in San Jose. The two founders are Varun Mehta and Umesh Maheshwari, and the key people are:
- Varun Mehta (Sun, NetApp, Data Domain)
- Umesh Maheshwari (Data Domain)
- Suresh Vasudevan (Omneon, NetApp, McKinsey)
- Kirk Bowman (EqualLogic, VMware, Inktomi)
Nimble's approach brings Flash within reach of many enterprises that would like to use it for more common workloads such as Exchange, SQL and VMware. Their target is organisations with 200 to 2000 employees.
Nimble's competitors in the iSCSI market, with market shares from IDC, are EqualLogic with 35%, EMC with 15%, and HP and NetApp with around 10% each.
Nimble have done the brave thing and started with a clean sheet of paper to try and create something that no one else can deliver.
The problems they are trying to solve are how to deliver fast performance without lots of expensive disks, and how to efficiently back it all up and replicate the data to a second site for continuity purposes.
- capacity optimised snapshots rather than backups
- FLASH is used to give great performance
- replication that is efficient and based on the primary information, so the time to recover and use that data is very short; you don't need to wait for a restore
- Inline Compression. A real-time compression engine compresses data as it comes in. On primary datasets they are seeing about a 2:1 saving, and on things like databases 4:1. Blocks are variable in size, and Nimble take advantage of current multi-core processors with a highly threaded software architecture.
- Large Adaptive Flash Cache. Flash is used as a caching layer, starting at 3/4 of a TB for the entry box. It stores a copy of all frequently accessed data, but all data is also stored on the cheaper SATA storage.
- High-Capacity Disk storage. Using large SATA drives.
- Integrated Backup. 60 to 90 days' worth of "delta compressed incremental snapshots" can be stored on the system. They have put a lot of work into integration with Microsoft applications, integrating with VSS to ensure consistency. The snapshot efficiency should remove the need for a secondary backup system outside the primary storage. Combine this with replication to a remote site and you have a protected system.
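To make the inline compression point concrete, here is a minimal Python sketch of compressing blocks as they arrive and packing the variable-size results back to back. It assumes zlib at its fastest level as a stand-in compressor; Nimble did not say which algorithm they use, and the function names `pack_inline` and `read_block` are hypothetical.

```python
import os
import zlib


def pack_inline(blocks):
    """Compress each logical block as it arrives and append the result to one
    buffer back to back, so each block occupies only the bytes it compressed to."""
    buf = bytearray()
    index = []  # per block: (offset, stored_length, is_compressed)
    for block in blocks:
        packed = zlib.compress(block, 1)  # level 1 favours speed on the write path
        if len(packed) < len(block):
            index.append((len(buf), len(packed), True))
            buf += packed
        else:  # incompressible data is stored as-is
            index.append((len(buf), len(block), False))
            buf += block
    return bytes(buf), index


def read_block(buf, index, i):
    """Return logical block i, decompressing it if it was stored compressed."""
    offset, length, compressed = index[i]
    data = buf[offset:offset + length]
    return zlib.decompress(data) if compressed else data


if __name__ == "__main__":
    db_page = (b"customer_id,order_total\n" * 200)[:4096]  # repetitive: compresses well
    noise = os.urandom(4096)                                # random: will not compress
    blocks = [db_page, noise, db_page]
    buf, index = pack_inline(blocks)
    raw = sum(len(b) for b in blocks)
    print(f"raw={raw} bytes, stored={len(buf)} bytes, ratio={raw / len(buf):.2f}:1")
```

Because each block only takes the space it compressed to, the repetitive page shrinks a lot while the random block is stored unchanged, which is why the achievable ratio depends so heavily on the workload.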
Nimble showed the results of some testing they performed on an Exchange 2010 19GB database, running snaps over 10 days: the other vendor (EqualLogic, at a guess) consumed over 100GB of data whereas Nimble consumed only 3GB, and a 35x improvement was claimed. This means there is less to replicate. It's suspected that the reason for the difference is the smaller, variable block size that Nimble can use, whereas the competitor uses a large block size.
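The suspected explanation is block size (and 100GB versus 3GB is roughly 33x, so "over 100GB" is in line with the 35x claim). A simple model shows why, assuming a snapshot must retain in full every block that any write touches: the same set of small random writes pins far more space when changes are tracked in large blocks. The 4 KiB write size, the block sizes, the write count and the function name are hypothetical, not either vendor's real values.

```python
import random


def snapshot_delta_bytes(write_offsets, write_size, block_size):
    """Bytes a snapshot must retain if changes are tracked at block_size
    granularity: every block touched by any write is kept in full."""
    touched = set()
    for offset in write_offsets:
        first = offset // block_size
        last = (offset + write_size - 1) // block_size
        touched.update(range(first, last + 1))
    return len(touched) * block_size


if __name__ == "__main__":
    rng = random.Random(42)
    volume = 19 * 2**30   # a 19 GiB volume, echoing the Exchange test
    io_size = 4096        # hypothetical 4 KiB random writes
    offsets = [rng.randrange(volume // io_size) * io_size for _ in range(10_000)]
    for block_size in (4 * 2**10, 64 * 2**10, 2**20):
        delta = snapshot_delta_bytes(offsets, io_size, block_size)
        print(f"{block_size // 2**10:>5} KiB blocks -> {delta / 2**20:8.1f} MiB retained")
```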
- Replication. Replication is point-in-time snapshot replication. One nice thing you can do is maintain different retention periods at each site. For example, you might want a much higher frequency of snaps locally and a less frequent but longer tail of snaps at the DR site; very nice. They have a VMware Site Recovery Manager (SRM) plugin in development, but it has not been certified yet. Today you can't cascade replication, but it will be coming in a future release. Cascading may be important for people who want to use the Nimble for backup, replicating locally and then offsite.
- Enhanced enterprise application performance
- Instant local backups and restores with fast offsite DR
- Eliminates high RPM drives, EFDs, separate disk-based backup solution
- 60%+ lower costs than existing solutions
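The per-site retention described in the replication item above can be sketched as each site applying its own policy to the same stream of snapshots. This is a minimal Python sketch; the `Retention` type, the hourly and daily numbers and the slot-based thinning rule are assumptions for illustration, not Nimble's configuration model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Retention:
    every_hours: int  # keep at most one snapshot per slot of this many hours
    keep: int         # number of most recent slots to retain


def retained(snapshot_hours, policy):
    """Snapshots (identified by the hour they were taken) that a site keeps:
    the oldest snapshot in each slot, for the newest `keep` slots only."""
    if policy.keep <= 0:
        return []
    picked = {}
    for t in sorted(snapshot_hours):
        picked.setdefault(t // policy.every_hours, t)
    newest_slots = sorted(picked)[-policy.keep:]
    return [picked[slot] for slot in newest_slots]


if __name__ == "__main__":
    snaps = range(24 * 120)  # hourly snapshots over 120 days, replicated to DR
    local = retained(snaps, Retention(every_hours=1, keep=48))   # dense, short tail
    dr = retained(snaps, Retention(every_hours=24, keep=90))     # sparse, long tail
    print(f"local keeps {len(local)} snaps from hour {local[0]}; "
          f"DR keeps {len(dr)} snaps from hour {dr[0]}")
```

Keeping the policy per site, rather than per replication stream, is what lets the local copy stay dense while the DR copy reaches further back in time.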
Their pricing estimates come in at under $3 per GB for primary storage, with an entry price of around $50K.
Here are the specs of the units.
There is no 10GbE interface option yet, but it will be considered on customer demand; the same goes for a Fibre Channel interface. The controllers are active/passive on a system (not LUN) basis.
They currently have 10 to 12 beta accounts.
Umesh Maheshwari then gave some further details on the technology behind Nimble: a great discussion from someone who knows the industry and the technologies, as you would expect. The aim is to combine:
- the capacity to store backups (through high-capacity disks, compression and block sharing), along with
- random IO performance for primary storage (through a Flash cache for random reads and sequentialized random writes)
So why hasn't this been done before? Well, it took technology a while to catch up with the idea. The original log-structured file system concept relies on the assumption that files are cached in main memory, and that increasing memory sizes will make those caches more and more effective at satisfying read requests, so disk traffic becomes dominated by writes. When only small amounts of RAM were available, that assumption did not hold. Secondly, the approach requires a background garbage collection process to reclaim free space.
Nimble have created CASL (Cache Accelerated Sequential Layout), an implementation of a log-structured file system. It uses a large amount of Flash for the cache, and it's integrated closely with the disk-based file system. The index, or metadata, of the system is cached in the Flash, so garbage collection can now work efficiently. Of course, "cache" is a bit of a simple word for what it does: it's not an LRU, and there is some complex metadata being tracked for performance.
The second element is the sequential layout of the data on the disks. How you store data on disk can be categorised into three different techniques:
1. Write in place, e.g. EMC, EqualLogic
- it's a very simple layout; you don't need lots of indexes
- reads perform quite well
- poor at random writes
- parity RAID makes it worse
2. Write anywhere
- more write optimised
- performance sits between full-stripe writes and random writes
- it writes a sequence of blocks wherever there is free space. When you start, the writes are sequential, but after a while the free space becomes fragmented, so you end up doing random writes
3. Write sequentially in full stripes, e.g. CASL
- most write optimised
- always does its writes in full stripes
- good when writing to RAID
- the blocks can now be variable size, which is very efficient, and there is a secondary benefit: there is now room to store some metadata about each block, such as a checksum
- this requires a garbage collection process, which runs during idle times, to ensure there is always space available for writing full stripes; what makes this work is that the index is in Flash, plus the power of the current generation of processors
- the difference between what Data Domain do and CASL is that Data Domain share blocks based on hashes (deduplication), whereas CASL shares them based on snapshots
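A toy model of the full-stripe, write-sequentially approach helps show why the index and garbage collection matter. This is a minimal Python sketch, not CASL: blocks are appended to an open segment that is written out only when full (standing in for a full-stripe write), each block carries a CRC32 checksum, an overwrite simply marks the old copy dead, and `collect()` reclaims the segment with the least live data. The segment size, the class name `LogStore` and the least-live-data victim policy are assumptions; the in-memory `index` dict plays the role of the Flash-resident index.

```python
import zlib

SEGMENT_BYTES = 64 * 1024  # hypothetical full-stripe size


class LogStore:
    """Toy log-structured layout: blocks are appended to an open segment that
    is written out only when full, and nothing is updated in place."""

    def __init__(self):
        self.sealed = {}      # seg_no -> {block_id: (data, crc32)} for live blocks only
        self.open = {}        # blocks buffered in the open segment
        self.open_bytes = 0
        self.index = {}       # block_id -> seg_no, or None while still in the open segment
        self.next_seg = 0
        self.stripe_writes = 0

    def write(self, block_id, data):
        if self.open and self.open_bytes + len(data) > SEGMENT_BYTES:
            self._seal()
        if block_id in self.index:            # the old copy simply becomes garbage
            seg = self.index[block_id]
            if seg is None:
                self.open_bytes -= len(self.open.pop(block_id)[0])
            else:
                del self.sealed[seg][block_id]
        self.open[block_id] = (data, zlib.crc32(data))  # checksum stored with the block
        self.open_bytes += len(data)
        self.index[block_id] = None

    def _seal(self):
        """Write the open segment out as one sequential full-stripe write."""
        seg = self.next_seg
        self.next_seg += 1
        self.sealed[seg] = self.open
        for block_id in self.open:
            self.index[block_id] = seg
        self.open, self.open_bytes = {}, 0
        self.stripe_writes += 1

    def read(self, block_id):
        seg = self.index[block_id]
        data, crc = (self.open if seg is None else self.sealed[seg])[block_id]
        if zlib.crc32(data) != crc:
            raise IOError(f"checksum mismatch on block {block_id}")
        return data

    def collect(self):
        """Clean the sealed segment with the least live data: re-append its
        live blocks to the log, then free the whole segment."""
        if not self.sealed:
            return None
        victim = min(self.sealed,
                     key=lambda s: sum(len(d) for d, _ in self.sealed[s].values()))
        live = self.sealed.pop(victim)
        for block_id, (data, _) in live.items():
            del self.index[block_id]   # forget the old location before relocating
            self.write(block_id, data)
        return victim


if __name__ == "__main__":
    store = LogStore()
    for i in range(400):                      # 100 hot blocks, each rewritten 4 times
        store.write(i % 100, bytes([i % 256]) * 1000)
    print("full-stripe writes:", store.stripe_writes)
    print("cleaned segment:", store.collect(), "remaining:", sorted(store.sealed))
```

Every step of `collect()` goes through the index to find and relocate live blocks, so the cost of cleaning is tied to how fast that index can be consulted; keeping it in Flash is what the presentation credited for making this practical.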
Nimble also explained why they use Flash as a cache rather than as a separate tier:
- Because the cache is backed by disk (the data is both in the cache and on disk), you don't need to protect the data in the Flash. This means you can use cheaper flash drives and don't need any parity or mirroring, giving a saving of 1.3 to 2 times.
- It's much easier to evict, or throw away, data in a cache than it is to demote data out of a Flash tier into a lower one, because you don't have to copy any data.
- You don't have to be so careful about what you put in the cache, as it's not an expensive operation, so all writes and reads can be cached for fast access in case they are needed again. Of course, a cache takes a lot more effort to integrate into your file system than tiering does, so it's much harder to retrofit to a legacy design than when you are starting from scratch, as Nimble have.
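The cache-versus-tier argument can be sketched in a few lines of Python. In the hypothetical `DiskBackedCache`, every write also lands on disk, so eviction just forgets the Flash copy (which is also why that Flash needs no parity or mirroring); in the hypothetical `FlashTier`, the Flash holds the only copy, so demotion has to copy data down. LRU ordering is used purely for brevity; as noted above, Nimble say their cache is not a simple LRU.

```python
from collections import OrderedDict


class DiskBackedCache:
    """Flash used as a cache: every block is also on disk, so the Flash copy
    is disposable and eviction never copies data."""

    def __init__(self, capacity):
        self.capacity = capacity      # blocks the Flash can hold
        self.flash = OrderedDict()    # LRU order, purely for brevity
        self.disk = {}

    def write(self, key, data):
        self.disk[key] = data         # the data always lands on disk
        self._admit(key, data)        # admitting to cache is cheap, so always do it

    def read(self, key):
        if key in self.flash:
            self.flash.move_to_end(key)
            return self.flash[key]
        data = self.disk[key]
        self._admit(key, data)
        return data

    def _admit(self, key, data):
        self.flash[key] = data
        self.flash.move_to_end(key)
        while len(self.flash) > self.capacity:
            self.flash.popitem(last=False)  # just drop it: disk already has a copy


class FlashTier:
    """Flash used as a tier: it holds the only copy of hot data, so demoting
    a block means copying it down to disk first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.flash = OrderedDict()
        self.disk = {}
        self.bytes_copied_on_demote = 0

    def write(self, key, data):
        self.disk.pop(key, None)      # the new version lives only in Flash
        self.flash[key] = data
        self.flash.move_to_end(key)
        while len(self.flash) > self.capacity:
            old_key, old_data = self.flash.popitem(last=False)
            self.disk[old_key] = old_data          # demotion is a data copy
            self.bytes_copied_on_demote += len(old_data)


if __name__ == "__main__":
    cache, tier = DiskBackedCache(capacity=2), FlashTier(capacity=2)
    for k in range(5):
        cache.write(k, b"x" * 100)
        tier.write(k, b"x" * 100)
    print("tier bytes copied on demotion:", tier.bytes_copied_on_demote)
```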
I really got the feeling that Nimble are not trying to be everything to everyone. They are focused on a particular market segment, hitting their pain points and attempting to do it better than the incumbents are.
In my opinion, they have a few things to deliver to reach that goal, such as:
- cascaded replication to offer true local and remote data protection
- get the SRM module for VMware certified
- it looks hard to scale capacity if you just need more storage, as you can't add disk shelves; you get what you get. Yet there is nothing in their architecture to preclude some changes here, which is good.
With CASL, Nimble certainly have some very nice technology, but nice technology does not always win in the market. It's certainly going to be great to see how their early adopters go and how they adjust the hardware range and feature set over the next 12 months!
Note that it's not available in Australia or EMEA yet.