Well I have escaped from the wet and dreary shores of Sydney to spend some time geeking it up with the crew for the Virtualization Field Day #2. Having been to one of these events before I know just how much hard work and fun it can be. It's so great to hang out with people so smart in their field, plus to hear directly from the best people within the presenting vendors.
The activities start Wednesday night with a get-together dinner. Thursday and Friday are all of the vendor presentations. I arrived this last Sunday to do a few days' work of meetings before the event. Of course I had to do a bit of the usual Silicon Valley shop hop around some of the favourite haunts for all things geek.
One place I went today, that I had never thought of before, was Apple HQ. Here is some guy who was wearing a suit. Who wears a suit in the Valley? Only me!
Who's the stiff in the suit!
The cool thing is that there is a company store there. It's not like an Apple store. It has a lot more Apple merchandise. It also has a t-shirt that can only be purchased from the Apple campus store. Of course I had to get one.
Apple Company Store
Of course I also had to do a trip to Fry's and pick up something. I ended up getting a 4 port 1G switch for the home office. I am sick of 100Mb transfer speed between me and the Drobo storage device (which hangs off a Mac mini). I also got some of those nice little pop up speakers for use in hotel rooms etc. This is on top of the other stuff I pre-shipped to my hotel, none of which has arrived yet. I pre-shipped a bunch of t-shirts from ThinkGeek for the kids and an SSD drive for me.
One place I have never been to here in the US is In-N-Out burger. My American friends rave about it. So I had to check it out.
The back wall of In-N-Out burger, the view from the car park.
After eating mostly healthy food for about a year it was great to chow down on great fast food. This stuff is fresh, you have to wait for it to be cooked. The fries are cut from whole potatoes just before they are cooked. However my stomach rebelled about half an hour later, the temple had been defiled! But it was worth it. Repeat after me, "In-N-Out is occasional food!".
But what is going to be really fun this week is hanging out with the old friends plus some new people at the field day. The attendees this year are Edward Haletky, Bill Hill, Mike Laverick, Dwayne Lessner, Scott Lowe, Roger Lund, Robert Novak, David Owen, Brandon Riley, Todd Scalzott, Rick Schlander and Chris Wahl. Some real who's who of virtualisation thinkers.
The vendors at this event are interesting, we have Symantec, Zerto, Xangati, PureStorage, Truebit.tv and Pivot3. Some big names there, some interesting new ones, and it is great to see that I will get to hear the thoughtful words of Mr Backup himself, aka Mr W. Curtis Preston, again.
The only vendor I will call out specifically as sparking some very high interest from me pre event is Zerto. They have DR capabilities with full integration to VMware vCloud Director. As I deal daily with one of the leading deployments of vCloud Director in the service provider space this really gets my brain juices flowing. There is big interest in this topic and I am really keen to see exactly what these guys have. I want to separate the hype from the reality and really hope that the reality is an exciting story.
You can see the details of the whole event over at the Field Day site http://techfieldday.com/, the links page really gives you all the resources you need. The sessions will be broadcast online and you can follow the tweet stream via the hashtag #VFD2.
Want to get yourself to VMworld San Francisco but can't convince your boss or afford to drag yourself there under your own steam?
Well here is your chance to win a free conference ticket, plus the accommodation and airfare. The airfare can be international so all you Australians out there, this one is for you too!
The prize is being organised by Gestalt IT and sponsorship is being provided by Xsigo and Symantec!
The winner will be picked by a group of judges (I am one) based on how well you plan to "pay it forward" from your win. Describe how you will share your enthusiasm for VMware, virtualisation or whatever it is you are into as a result of your visit to VMworld.
This is my 5th year in a row at VMworld so I can attest it's the biggest geek fest and party of the year. So get your entry in and I might just see you there!
Take a pile of smart people with backgrounds from Sun, NetApp and Data Domain, throw in a few PhDs (I assume) and see what falls out; that's Nimble Storage, who launched at Gestalt IT TechFieldDay Seattle.
The company was formed in 2008, based in San Jose. The two founders are
Varun Mehta (Sun, NetApp, Data Domain)
Umesh Maheshwari (Data Domain)
They have some interesting people on their board of directors as well
Suresh Vasudevan (Omneon, NetApp, McKinsey)
Kirk Bowman (Equallogic, VMware, Inktomi)
Nimble call their technology game-changing, taking what was available in separate products and putting it all into one. Nimble cover iSCSI primary storage, backup storage and disaster recovery in a new architecture that combines FLASH and high capacity, low cost SATA in a new way.
This brings FLASH into the range of many enterprises who would like to use it for more common workloads like Exchange, SQL and VMware. Their target is for organisations with 200 to 2000 employees.
Nimble's competition in the iSCSI market, with market shares (from IDC), are Equallogic who have 35%, EMC with 15%, and HP and NetApp at around 10% each.
Nimble have done the brave thing and started with a clean sheet of paper to try and create something that no one else can deliver.
The problems they are trying to solve are delivering fast performance without all those expensive disks and how to efficiently back it all up plus replicate that data to a second site for continuity purposes.
Techniques include
capacity optimised snapshots rather than backups
FLASH is used to give great performance
replication that is efficient and based on the primary information so that the time to recover and use that data is very quick, you don't need to wait for a restore
A key thing that Nimble bring is their CASL architecture, which provides the following :
Inline Compression. A real time compression engine as data comes in. On primary datasets they are seeing about a 2:1 saving and on things like databases a 4:1 saving. Blocks are variable in size and Nimble take advantage of the current state of multi-core processors by having a highly threaded software architecture.
Large Adaptive Flash Cache. Flash as a caching layer, starting at 3/4 of a TB for the entry box. They store a copy of all frequently accessed data in flash, but all data is also stored on the cheaper SATA storage as well.
High-Capacity Disk storage. Using large SATA drives.
Integrated Backup. 60 to 90 days worth of "delta compressed incremental snapshots" can be stored on the system. They have put a lot of work into integration with Microsoft applications, integrating the VSS for ensuring consistency. The snapshot efficiency should remove the requirement for a secondary backup system outside of the primary storage. Combine this with replication to a remote site and you have a protected system.
Nimble showed the results of some testing they performed on an Exchange 2010 19GB database running snaps over 10 days. The other vendor (Equallogic at a guess) consumed over 100GB of data whereas Nimble only consumed 3GB. A 35x improvement was claimed. This then results in less to replicate. It's suspected that the reason for this difference is the smaller and variable blocksize that Nimble can use, whereas the competitor has a large blocksize.
Replication. The replication is point in time snapshot replication. One nice thing that you can do is maintain different retention periods at each site. For example you might want to maintain a much higher frequency of snaps locally and a less frequent but longer tail of snaps over at DR, very nice. They have a VMware Site Recovery Manager (SRM) plugin in development but it has not been certified yet. Today you can't cascade replication but it will be coming in a future release. Cascade may be important for people who want to use the Nimble for backup, replicate locally and then offsite.
The benefits that result from CASL are :
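To make the "different retention at each site" idea concrete, here is a minimal sketch. It is purely illustrative: the function name `retained`, the schedule values and the snapshot timestamps are my own assumptions, not anything from Nimble's product.

```python
from datetime import datetime, timedelta

def retained(snapshots, now, keep_every, keep_for):
    """Return the snapshots a site keeps: at most one per `keep_every`
    interval, and nothing older than `keep_for`. (Hypothetical helper.)"""
    kept = []
    last = None
    for taken in sorted(snapshots):
        if now - taken > keep_for:
            continue  # aged out at this site
        if last is None or taken - last >= keep_every:
            kept.append(taken)
            last = taken
    return kept

now = datetime(2010, 7, 20, 0, 0)
# One snapshot per hour for the last 30 days (made-up data).
hourly = [now - timedelta(hours=h) for h in range(30 * 24)]

# Primary site: every hourly snap, kept for 2 days.
local = retained(hourly, now, timedelta(hours=1), timedelta(days=2))
# DR site: one snap a day, kept for 30 days.
dr = retained(hourly, now, timedelta(days=1), timedelta(days=30))
print(len(local), "snaps kept locally,", len(dr), "kept at DR")
```

The same stream of snapshots ends up as a dense, short history locally and a sparse, long history at DR.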
Enhanced enterprise application performance
Instant local backups and restores with fast offsite DR
Eliminates high RPM drives, EFDs, separate disk-based backup solution
60%+ lower costs than existing solutions
When you create volumes they can be tuned for various application types, tweaking such things as page size or whether the volume should be cached. The Nimble ships with a set of predefined templates for popular applications. The same goes for snapshot policies, which can be templated, and a predefined set is provided.
Their pricing estimate is under $3 per GB for primary storage, at an entry price of around $50K.
Here are the specs of the units.
There is no 10Gb interface option yet but it will be considered on customer demand. The same goes for having a Fibre Channel interface. The controllers are active/passive on a system (not LUN) basis.
They currently have 10 to 12 beta accounts.
Umesh Maheshwari then gave some further details on the technology behind Nimble. A great discussion from someone who knows the industry and the technologies, as you would expect.
Nimble is all about having the
capacity to store backups (through hi-capacity disks, compression and block sharing) along with
random IO performance for primary storage (through Flash cache for random reads and sequentialized random writes)
This technique of sequentializing writes was developed by Mendel Rosenblum in his PhD thesis in 1991 (see paper). If you don't remember, Mendel was one of the founding brains behind VMware so his ideas have a good track record. It's called a Log Structured File System.
So why has this not been done before? Well, it took technology a while to catch up to the idea. The original concept relies on the assumption that files are cached in main memory and that increasing memory sizes will make the caches more and more effective at satisfying read requests, hence the disk traffic will become dominated by writes. With only small amounts of RAM available it was a problem. Secondly the process requires a background job to do garbage collection.
Nimble have created CASL, an implementation of the log based file system. It utilises a large amount of FLASH for the cache and it's integrated closely into the disk based file system. The index or metadata of the system is cached in the Flash and therefore the garbage collection can now work efficiently. Of course cache is a bit of a simple word for what it does, it's not an LRU, there is some complex metadata being tracked for performance.
The second element is the sequential layout of the data on the disks. How you store data on disk could be categorised into 3 different techniques.
1. Write in place. eg. EMC, EqualLogic
it's a very simple layout, you don't need lots of indexes.
2. Write wherever there is free space
it writes a sequence of writes wherever there is free space. So when you start it is sequential, but after a while the spaces that are free will be fragmented so you end up doing random writes
3. Write sequentially. eg. DataDomain, Nimble CASL
most write optimised
always do your writes in full stripes
good when writing to RAID
the blocks can now be variable size which is very efficient but it has a secondary effect that you now have room to store some metadata about the block such as a checksum
this requires the garbage collection process which runs in idle times to always ensure there is space available for writing full stripes, what makes this work is that the index is in Flash and the power of the current set of processors
the difference between what DataDomain do and CASL is that DD do their sharing based on hashes and CASL does it based on snapshots
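The "write sequentially" layout is easier to picture with a toy model. This is only a sketch of the general log structured idea under my own assumptions (the class name, a four block stripe, an in-memory index standing in for the flash-resident one); it is not CASL itself.

```python
STRIPE_BLOCKS = 4  # blocks per full stripe (illustrative value)

class LogStructuredStore:
    """Toy log: random writes are buffered and laid down as full stripes."""

    def __init__(self):
        self.stripes = []   # the on-disk log, a list of full stripes
        self.pending = []   # blocks buffered until a full stripe is ready
        self.index = {}     # logical block -> (stripe number, slot)

    def write(self, lba, data):
        self.pending.append((lba, data))
        if len(self.pending) == STRIPE_BLOCKS:
            self._flush()

    def _flush(self):
        stripe_no = len(self.stripes)
        self.stripes.append(list(self.pending))  # one sequential write
        for slot, (lba, _) in enumerate(self.pending):
            self.index[lba] = (stripe_no, slot)  # newest copy wins
        self.pending = []

    def read(self, lba):
        for pending_lba, data in reversed(self.pending):
            if pending_lba == lba:
                return data
        stripe_no, slot = self.index[lba]
        return self.stripes[stripe_no][slot][1]

    def garbage(self):
        """Count stale block copies a background cleaner could reclaim."""
        live = set(self.index.values())
        return sum(len(s) for s in self.stripes) - len(live)

store = LogStructuredStore()
for i, lba in enumerate([7, 3, 7, 9, 3, 1, 7, 2]):  # random, overlapping writes
    store.write(lba, f"v{i}".encode())
print(len(store.stripes), "full stripes written,", store.garbage(), "stale blocks")
```

Overwrites never touch old stripes, which is why the garbage collector (and a fast index to find live blocks) is needed to keep whole stripes free.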
Of course this makes you wonder what's the difference between the CASL cache and what many other providers are doing with a Tier of Flash?
Because the cache is backed by disk (the data is in the cache and on the disk) you don't need to protect the data in the cache. This means you can use cheaper flash drives and you don't need to do any parity or mirroring, giving you a saving of 1.3 to 2 times.
It's much easier to evict or throw away data in the cache than it is to demote data out of a Flash tier into a lower one, you don't have to copy any data.
You don't have to be so careful about putting things in cache as it's not an expensive operation, so all writes or reads can be put in cache for fast access if you need them again. Of course cache is a lot more effort to integrate into your file system than tiering, so if you are dealing with legacy it's much harder than when you are starting from scratch like Nimble have.
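The evict versus demote point can be shown with two tiny models. Both classes are hypothetical stand-ins I made up for illustration, with plain dictionaries playing the part of flash and disk.

```python
class CachedVolume:
    """Flash as a cache: every block is also on disk."""

    def __init__(self):
        self.disk, self.flash = {}, {}
        self.copies = 0  # blocks copied between media after the initial write

    def write(self, lba, data):
        self.disk[lba] = data    # the authoritative copy
        self.flash[lba] = data   # cached copy, cheap to add

    def evict(self, lba):
        self.flash.pop(lba, None)  # just forget it, disk already has it


class TieredVolume:
    """Flash as a tier: a block lives in exactly one place."""

    def __init__(self):
        self.disk, self.flash = {}, {}
        self.copies = 0

    def write(self, lba, data):
        self.flash[lba] = data   # the only copy, so the flash must be protected

    def demote(self, lba):
        self.disk[lba] = self.flash.pop(lba)  # data has to be moved first
        self.copies += 1


cached, tiered = CachedVolume(), TieredVolume()
cached.write(1, b"block")
tiered.write(1, b"block")
cached.evict(1)
tiered.demote(1)
print(cached.copies, "copies to evict,", tiered.copies, "copy to demote")
```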
Thoughts?
I really got the feeling that Nimble are not trying to be everything to everyone. They are focused on a particular market segment, hitting their pain points and attempting to do it better than the incumbents are.
They have a few things to deliver in my opinion to reach the goal, such as
cascaded replication to offer true local and remote data protection
get the SRM module for VMware certified
it looks hard to scale out if you just need some further storage as you can't add disk shelves, you get what you get. Yet there is nothing in their architecture to preclude some changes here, which is good.
The big question will be whether it is different enough from the competitors for them to get into the market. If your only difference is doing something better (no matter how clever it is under the hood), how easy is it for your competitors to be "good enough" or at a much better price point? Some good marketing, sales force and channel are going to be key.
With CASL, Nimble certainly have some very nice technology, but nice technology does not always win in the market. It's certainly going to be great to see how their early adopters go and how they adjust the hardware range and feature set over the next 12 months!
Note that it's not available in Australia or EMEA yet.
Rodos
Note : Tech Field Day is a sponsored event. Although I receive no direct compensation and take personal leave to attend, all event expenses are paid by the sponsors through Gestalt IT Media LLC. No editorial control is exerted over me and I write what I want, if I want, when I want and how I want.
A big vendor in the networking and Internet market is F5. We visited them on the Gestalt IT TechFieldDay Seattle.
As you can see the room was full of people.
Introduction
Kirby Wadsworth (VP of Global Marketing) gave an overview of who F5 are and what they do. F5 see themselves as the strategic point of control in your data center architecture, optimising the relationships between users and the applications and the data that they need.
F5 have 44% of the general application delivery controller market, which includes things such as load balancing and some minor layer 2 to 7 functions. In the advanced market, where you go beyond layer 4 load balancing and take advantage of caching, rate shaping and other elements, the share is higher.
F5 have a broad set of products, most of which run on their BigIP, which is the hardware platform. The BigIP runs the TMOS OS. These products plug in or layer onto TMOS. The core business is certainly around Local Traffic Manager, where connections are balanced across servers. Global Traffic Manager does this across data centers. There are many products in the range :
Local Traffic Manager (LTM)
Global Traffic Manager (GTM)
Link Controller (LC)
Application Security Manager (ASM)
WebAccelerator (WA)
Edge Gateway
WAN Optimization Module (WOM)
Access Policy Manager (APM)
To me one of the most exciting things is that earlier this year F5 released a virtual edition of their Big-IP Local Traffic Manager. The LTM is a great device to run as a virtual machine and thankfully it's not limited in terms of features. Great to see vendors starting to deliver choice to customers in how they would like to run vendors' software! F5 did not make much of a deal about this, especially considering there were some virtualisation people attending. However there is probably not much you can say about it.
Long Distance VMotion
Next we had a demonstration of long distance VMotion. A really interesting part of this was that they use vOrchestrator to control the Big-IPs and the VMware tasks. It was great to see automation being done through Orchestrator workflows. It also shows the power of what you can do with F5 products when you start to pull multiple together and automate them.
I have seen this before at VMworld and it's a little difficult to describe it in great detail. If you are interested in it seek out F5 at VMworld or look for the videos of the event which will come online at GestaltIT later. There are multiple elements at work including adjusting the load balancing pools, performing layer 2 over layer 3 tunnels and acceleration of traffic, which is what makes the storage VMotion work in a much faster and more reliable way. The workflow did some nice things such as, when starting, first waiting for the number of connections to the server being moved to clear after it had been removed from the balancing pool.
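The ordering in that workflow (remove from pool, wait for connections to clear, then move) is the interesting bit, so here is a rough sketch of it. Everything here is a stand-in of my own: `move_server` and its callbacks are hypothetical, not iControl or Orchestrator calls, and putting the server back in the pool at the end is an assumed step.

```python
def move_server(pool, server, connection_count, migrate, wait):
    """Drain a server from a load balancing pool, migrate it, then re-add it."""
    steps = []
    pool.remove(server)                  # stop new connections arriving
    steps.append("removed from pool")
    while connection_count(server) > 0:  # let existing connections finish
        wait()
    steps.append("connections drained")
    migrate(server)                      # e.g. the long distance VMotion
    steps.append("migrated")
    pool.append(server)                  # assumed step: back into service
    steps.append("returned to pool")
    return steps

# Stand-ins so the sketch runs on its own.
remaining = {"web01": 3}

def connection_count(server):
    return remaining[server]

def wait():
    remaining["web01"] -= 1  # pretend one connection closes per poll

migrated = []
pool = ["web01", "web02"]
steps = move_server(pool, "web01", connection_count, migrated.append, wait)
print(steps)
```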
Automation
Next we had Joe Pruitt (Sr. Strategic Architect, @joepruitt) do a great talk on automation and control through the APIs of F5 technologies. They were very early to support SOAP and cover a lot of languages as you can see below.
We looked at what the APIs covered, which is just about everything you could ever imagine doing. A number of examples were walked through, which showed both the simplicity and the power of what you can achieve. They are split between iControl, which covers all of the admin style processes, and iRule, which is the rules for the traffic.
My only issue was that the code examples were not quite real as they contained comments, who comments their code in the real world!
Joe was one of the most enthusiastic presenters across the two days and his passion and joy for the technology really showed, it was great!
Remote Access
We then had a demo of joining some of the F5 products together to provide a bigger and more complex solution, being a global deployment of accelerated remote access. Using the global traffic director they could detect where the user was accessing from, align them with the appropriate entry point into the network (such as the local country) and then accelerate the resulting traffic. It was a good example of how, if you tie all these things together, you can do much more.
ARX
Next was looking at some storage technologies, being ARX. Data is growing and file servers need to become building blocks where you can have policies to place data. ARX does this through open standards, being NFS and CIFS. The ARX is a device that acts as an enterprise class proxy file system. The diagram shows the structure.
You can take any storage you want with the characteristics you want and then use policies to move the data around those as required. This is achieved by placing the ARX device in front as a proxy. The ARX appliance looks like a standard client to the lower tiers so will work with many storage systems. The example included Cloud storage but in my opinion this was a little bit of Cloudwashing. Sure the use case was there but it relied on you using a Cloud provider who presented CIFS/NFS locally to your site. It's not that the ARX could transpose its requests to talk to a Cloud based service (such as S3) directly. It was not an invalid example, but it does rely on a specific bit of technology that is not part of ARX.
The way ARX works is to lay out a namespace across all of your tiers, track which bit of data (file) is where, route/proxy the requests accordingly and move the data around the tiers as required. The database for routing the requests in real time is a non-trivial problem to solve according to F5, as their namespace can contain a billion objects.
Curtis Preston discussed the issues around backup and restore with the way the data was laid out. The tiers supporting ARX are where you will probably need to back up, and they do not have all the knowledge. Backup is probably going to be okay but restore is going to be hard and it's not fully baked. If you need to restore something you are going to have to go and ask the ARX where to put the restored file, or where it was previously so you can go and find it in your backup set.
F5 think the difference with ARX is that you can use multiple vendors on the backend and you do not have to use a stub based solution like some of the alternative technologies.
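A toy version of that namespace idea might help. This is my own sketch of a proxy that tracks where each file lives and moves it by policy; the `Namespace` class, tier names and policy function are all hypothetical and say nothing about how ARX is really built.

```python
class Namespace:
    """One client-visible namespace spread over several storage tiers."""

    def __init__(self, tiers):
        self.tiers = {name: {} for name in tiers}  # tier name -> {path: data}
        self.location = {}                          # path -> tier name

    def write(self, path, data, tier):
        self.tiers[tier][path] = data
        self.location[path] = tier

    def read(self, path):
        # The client asks for a path; the proxy works out which tier has it.
        return self.tiers[self.location[path]][path]

    def apply_policy(self, should_move, target):
        """Move every file matching the policy to the target tier."""
        for path, tier in list(self.location.items()):
            if tier != target and should_move(path):
                self.tiers[target][path] = self.tiers[tier].pop(path)
                self.location[path] = target


ns = Namespace(["fast", "cheap"])
ns.write("/projects/2008/plan.doc", b"old plan", "fast")
ns.write("/projects/2010/plan.doc", b"new plan", "fast")

# Example policy (made up): anything under /projects/2008 goes to cheap storage.
ns.apply_policy(lambda path: path.startswith("/projects/2008/"), "cheap")
print(ns.location)
```

The client path never changes when a file moves, which is also why the backend tiers alone cannot tell you where a restored file belongs.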
An interesting last thought on this was the prediction that in a year data traffic management will be better understood, data will be considered another piece of traffic and managed accordingly.
Tour
F5 have a well kitted lab with lots of their equipment along with specialist devices such as networking emulation and testing devices. People enjoyed getting back into a server room after a long day.
Thoughts
F5 did a good job, they had some demos and the right technical people presenting who knew their stuff. There might have been a few too many F5 staff filling the room but when TechFieldDay is in the building no one wants to miss out, right!
The core F5 technology is good and mature, this came through in the earlier presentations. You also got to see how the different products could be combined together. The interesting part was the ARX. I am sure it is a difficult problem to solve at the scales they discussed. However my feeling was it could do with its own interface into some Cloud APIs, maybe they are waiting for further standardisation. The backup and restore is a realistic problem and people will want to have resolved how they might handle it in their environment. Because they are integrating with the tiers as a client, the ability to leverage any great features of those tiers is abstracted or lost (but could be handled directly at that tier). I wonder if there would be any advantage for the ARX to be aware of certain elements to optimise its use of a particular tier's vendor implementation, for example if it's doing proxy for a DataDomain device it may use a more efficient method or interface (not having a good example for what one might be). The ARX from what I could see only added the large namespace and tiering to the market. I am sure it's not an inexpensive solution but I wonder if it needs some more tricks up its sleeve than those two to get some key adoption. Certainly something to keep an eye on.
Thanks F5 for an interesting and fruitful few hours.
Rodos
Have you ever wondered how long it would take for a 12 year old child to configure a Drobo storage device? Well you are about to find out. In this short time lapse film you can see it for yourself!
I was lucky enough to win this Drobo from Data Robotics at the Gestalt IT Field Day last week. It arrived today via Fedex. If Drobo is meant to be simple for home users then there is not really any point in me testing it out, I know a thing or two about computers. However my youngest son Tim does not. That's not quite true, he is a bit of a geek and a whiz at using applications but he is only just starting to learn about computer technology itself, like storage. What a great test of the simplicity of the device. Seriously, the hardest thing was removing all of the packaging.
Tim also managed to figure out how to create the partition and format it. The Mac was kind enough to pop up the right utility when the Drobo was plugged in. After the partition was formatted it auto mounted. His test was to then copy a video file into the Drobo and play it from there. This is all included in the time.
If you are wondering what a Drobo is check out their web site. Essentially it is a desktop storage device that contains data protection.
Drobo utilizes the revolutionary BeyondRAID storage technology that protects data against a hard disk crash, yet is simple enough for anyone to use. As long as you have more than a single disk in Drobo, all data on Drobo is safe no matter which hard disk fails. There’s no need to worry about anything else.
Its technology lets you add disk drives, it will take up to four. If you run out of space you simply pop out the smallest drive (it's okay, your data is protected) and insert a larger one. You will then get more space and the data protection re-configures itself underneath. For large drives the re-configuration process can take quite some time.
If you enjoyed this, please post a comment, I am sure Tim would appreciate it.
Rodos
[Note : I attended the Gestalt IT Field Days as a guest of Gestalt IT. Travel and accommodation was provided as part of the event. See the Field Day FAQ and my comments for details.]
The analyst industry is telling us that unstructured data growth is going to outpace that of transactional based data. "While transactional data is still projected to grow at a compound annual growth rate of 21.8%, it’s far outpaced by a 61.7% CAGR predicted for unstructured data in traditional data centers." You don't have to look far past your own explosion of data consumption to realise this is becoming a large problem for IT departments. Combined with this growth is our desire to keep more aged data online, in order to provide much faster retrieval.
What is one to do? Well a company called Ocarina Networks says they "make free space on storage you already have" through some very clever content aware compression and de-duplication. The key element here is that it works on your online storage so the savings are multiplied, as there are flow on effects to the transmission of your data over networks and to the amount of data that you need to backup. So even though companies like Netapp (which Ocarina say they are 57x better than) and DataDomain do de-dupe, it's only at the underlying storage without these possible secondary benefits.
A quick look at just three of the people involved in Ocarina gives you a good impression that they have the pedigree to achieve great things here. Their CEO, Murli Thirumale and CTO, Goutham Rao, hail from the same roles in the Citrix Advanced Solutions Group, where they led the SSL-VPN division (acquired via Net6). In those roles they took their technology to number one in unit market share in eighteen months. The Chief Scientist, Dr Matt Mahoney, is a thought leader in next generation data compression. Also as a company they have been very busy in creating some interesting patents.
Last week at the Gestalt IT Field Day I got some deep dive into the Ocarina technology. Here is a video I took of Goutham and Murli.
However the insights from these guys on the science of de-dupe and compression were very informative, so let's look at what they had to say in more detail.
There are two approaches to compressing data, either a dictionary or a statistical approach. A dictionary encoder approach, such as the LZ algorithm, "operate[s] by searching for matches between the text to be compressed and a set of strings contained in a data structure (called the 'dictionary') maintained by the encoder. When the encoder finds such a match, it substitutes a reference to the string's position in the data structure."
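As a rough illustration of the dictionary approach, here is a small LZ78 style encoder and decoder. LZ78 is just one member of the LZ family that I picked because it is short; this is a sketch, not what any of the products mentioned here actually use.

```python
def lz78_encode(text):
    """Emit (dictionary index, next character) pairs, growing the dictionary."""
    dictionary = {"": 0}
    pairs = []
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch                      # keep extending the match
        else:
            pairs.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)
            phrase = ""
    if phrase:                                # input ended inside a known phrase
        pairs.append((dictionary[phrase], ""))
    return pairs

def lz78_decode(pairs):
    """Rebuild the text by replaying the same dictionary growth."""
    phrases = [""]
    out = []
    for index, ch in pairs:
        phrase = phrases[index] + ch
        phrases.append(phrase)
        out.append(phrase)
    return "".join(out)

encoded = lz78_encode("abababab")
print(encoded)
print(lz78_decode(encoded))
```

Each pair says "the phrase I already stored at this position, plus one new character", which is the reference substitution described in the quote.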
The statistical approach is much more interesting. If you can predict what is coming next in a data series, you don't need to record it, you only need to record the things you did not expect (this is what takes up the space). As long as you use the same algorithm to extract the exception data you get exactly the same data (or file) whilst only saving a very small part of it. You can also have a feedback loop from the errors back into the input to improve the prediction. For example if you look at a photo of the room you are sitting in now, there are probably lots of borders or edge framed objects or walls etc. If you turned all of these edges into axes and you were to follow an axis of colour moving down the edge of the wall you can expect that the next element moving down will be more of that same edge, you only need to record something when it's not. Complex but you can do some clever things with the right algorithms [more on that shortly].
Compression is something you can only do on a single file. As mentioned the key to compression is predicting what the next value is going to be in an incoming stream of data. The more data you have available in the incoming data stream the better you may be able to predict the next value. Also note that a lot of file types being generated today are already compressed internally, such as JPEG images either by themselves or embedded inside other documents.
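Here is about the simplest predictor I can think of to show the idea: guess that the next value is the same as the last one, and only record the places where that guess is wrong. The function names and the sample pixel values are made up for illustration, and real statistical coders are far more sophisticated.

```python
def encode(values):
    """Record only the positions where 'same as the previous value' is wrong."""
    exceptions, previous = [], 0
    for position, value in enumerate(values):
        if value != previous:                 # prediction failed, store the surprise
            exceptions.append((position, value))
        previous = value
    return len(values), exceptions

def decode(length, exceptions):
    """Run the same predictor and apply the recorded exceptions."""
    corrections = dict(exceptions)
    values, previous = [], 0
    for position in range(length):
        previous = corrections.get(position, previous)
        values.append(previous)
    return values

# A column of pixel values running down the edge of a wall (made-up numbers).
edge = [200] * 6 + [35] * 4
length, exceptions = encode(edge)
print(exceptions)
print(decode(length, exceptions) == edge)
```

Ten values shrink to two exceptions, and because the decoder uses the same predictor it gets back exactly the original data.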
De-dupe is all about finding the similar chunks of data by comparing hash values or a fingerprint. The smaller the chunks you are comparing the better because it increases the likelihood of a match between the two. Dividing the data into fixed chunks will get you so far but unless you have really small chunk you can miss a match that might occur across the boundary of two chunks. Netapp de-dupe does it this way. To get maximum effect you need what is called a sliding chunk window, looking for a matching bit of data anywhere, yet this is expensive computationally as you have to calculate a lot more hash values. There is a risk that two different chunks may produce the same hash or fingerprint, a false positive. Typical hashing algorithms are MD5, which is very weak or SHA256 which is strong, but Rabin [http://en.wikipedia.org/wiki/Rabin_fingerprint] is most liked [its fast to implement in software and works well on sliding windows]. How does all this comparing of chunks of data save you data? When you find a duplicate chunk you don't need to save a second copy, you can just save a small reference to the original piece of data you already have. Some technologies, such as Microsoft Storage Server 2008 do single instance storage (de-dupe) by only comparing whole files, which is bit of a joke really, it not going to get you much saving, because these days we create so many copies of the same files which are only slightly different (we add a few words to a document but save it as a new file name) or there is a lot of repetitive elements across files (images and templates). Yet this technique is really easy to do. Lastly, not all data can be de-duped, some just has very little if any repetition.
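To see why fixed chunks struggle and a sliding window helps, here is a small experiment. Note the assumptions: I use a simple polynomial rolling hash as a stand-in for a real Rabin fingerprint, and the window size, mask and test data are arbitrary values chosen for illustration.

```python
import hashlib
import random

WINDOW = 16            # bytes covered by the rolling hash
MASK = 0x3F            # cut a chunk when the low 6 bits of the hash are zero
BASE = 257
MOD = (1 << 31) - 1

def fixed_chunks(data, size=64):
    return [data[i:i + size] for i in range(0, len(data), size)]

def content_chunks(data):
    """Cut chunks where a rolling hash of the last WINDOW bytes hits a pattern."""
    chunks, start, h = [], 0, 0
    top = pow(BASE, WINDOW - 1, MOD)
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD  # drop the byte leaving the window
        h = (h * BASE + byte) % MOD                 # add the byte entering it
        if i + 1 >= WINDOW and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def fingerprints(chunks):
    return {hashlib.sha256(chunk).hexdigest() for chunk in chunks}

rng = random.Random(42)
original = bytes(rng.randrange(256) for _ in range(4096))
edited = b"!" + original   # the same data with one byte inserted at the front

shared_fixed = fingerprints(fixed_chunks(original)) & fingerprints(fixed_chunks(edited))
shared_sliding = fingerprints(content_chunks(original)) & fingerprints(content_chunks(edited))
print(len(shared_fixed), "fixed size chunks still match")
print(len(shared_sliding), "content defined chunks still match")
```

One inserted byte shifts every fixed boundary so nothing lines up any more, whereas boundaries chosen from the content itself move with the data and almost every chunk is found again.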
Now it also matters what you are de-duping, is it a data moving over a network, a backup or your storage. Each of these has a different "window" of time that they are looking at. On a network transfer you don't have much of a window and the data in that short window may not be very repetitive, whereas a backup has a very long window with repeated cycles of data coming in that is probably very repetitive. These different characteristics of the data stream require different algorithms to achieve greatest efficiencies.
Compression does not preclude de-dupe but they do pull against one another. For example as mentioned earlier a lot of data is already compressed and compressed data removes just about any chance of finding duplicate chunks of data. If you are a photo storing site you probably want to turn de-dupe of and not waste all the effort. Likewise in a corporate environment you may have millions of occurrences of your company logo image but they are all compressed and embedded inside Word and Powerpoint files that are then also compressed. All that repetitive data has been obfuscated! Remember, all that growth in storage is in this unstructured data area.
Yet you want both de-dupe and compression, because there is always data you need save so compress it.
So given this primer, what do Ocarina do? Well, Ocarina find the optimal chunk size for everything, compression and de-dupe, by performing object chunking. They take all of the data and break it into objects, so a zip file is broken down into its multiple files and a Word document may be broken down into images and text. The actions then occur at the object level. Hence a jpeg would not be broken down into smaller chunks, as the best window size to compress or de-dupe a jpeg is the whole image.
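As a rough illustration of object chunking (my own sketch, not Ocarina's code), the Python below opens two zip archives and fingerprints each member file after it has been uncompressed. The archive contents, the file names and the helper functions are all hypothetical. The two archives are different as a whole, yet the logo they both contain shows up as the same object.

```python
import hashlib
import io
import zipfile


def make_zip(members: dict) -> bytes:
    """Build a small zip archive in memory from {name: payload}."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, payload in members.items():
            archive.writestr(name, payload)
    return buffer.getvalue()


def object_fingerprints(archive_bytes: bytes) -> dict:
    """Fingerprint every member of the archive on its uncompressed bytes."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return {
            name: hashlib.sha256(archive.read(name)).hexdigest()
            for name in archive.namelist()
        }


# Made-up stand-in for a company logo embedded in two different documents.
logo = bytes(range(256)) * 16
report_a = make_zip({"logo.bin": logo, "text.txt": b"Quarterly results"})
report_b = make_zip({"text.txt": b"Meeting minutes, with more words", "logo.bin": logo})

fingerprints_a = object_fingerprints(report_a)
fingerprints_b = object_fingerprints(report_b)
shared_objects = set(fingerprints_a.values()) & set(fingerprints_b.values())

print("archives identical:", report_a == report_b)
print("objects in common:", len(shared_objects))
```

Only the one copy of the logo object would need to be stored, with each archive keeping a reference to it.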
Going beyond the object based chunking, Ocarina then use a neural network to determine the best compression algorithm for this particular type of chunk; in fact they have over 120 different algorithms. There are even different algorithms for variations of the same object, such as for a small versus a large jpeg. Their algorithms range from plain text to gene sequences. For images they have some very smart algorithms that perform spatial optimization based on what your eye can see, using chrominance and luminance. A typical scenario helps to show the power of this. If you have the same photo at different sizes, or if you slightly adjust a photo (such as removing the red eye), the data on the disk is all very different and there is probably no repetition across the copies. However, because Ocarina can "look" at the image it is able to determine that they are all in fact the same photo.
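I don't know the details of the Ocarina neural network, so the sketch below swaps in something much cruder to show the idea of picking an algorithm per object: it simply tries each candidate and keeps the smallest result. The three Python standard library codecs stand in for the 120 plus algorithms, and the function names are my own invention.

```python
import bz2
import lzma
import zlib

# Stand-ins for a much larger catalogue of algorithms: name -> (compress, decompress).
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def shrink(obj: bytes):
    """Try every codec and return (codec_name, packed) for the smallest result.

    Falls back to ("raw", obj) when no codec makes the object smaller.
    This brute-force search is an assumption standing in for a trained model.
    """
    best_name, best = "raw", obj
    for name, (compress, _decompress) in CODECS.items():
        packed = compress(obj)
        if len(packed) < len(best):
            best_name, best = name, packed
    return best_name, best


def expand(name: str, packed: bytes) -> bytes:
    """Reverse shrink() using the codec name recorded alongside the object."""
    if name == "raw":
        return packed
    return CODECS[name][1](packed)


text_object = b"the quick brown fox jumps over the lazy dog " * 200
chosen, packed_object = shrink(text_object)
print(chosen, len(text_object), "->", len(packed_object))
```

A trained model avoids the cost of running every algorithm on every object, which matters a great deal once there are over a hundred candidates.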
How does all of this work? Well, an appliance accesses your storage and processes the data. It breaks files down into their objects, weaves its magic and puts the smaller shrunk version back. This all occurs in RAM. To be safe, before it replaces the file it compares the original file with an expansion of the shrunk file to ensure they match exactly, so there are no errors. Of course the files on the storage are now different, so you need to use the ECOreader (a file system filter driver) which expands the files in real time as they are read, so you get them back in their original format. Of course sometimes you may want to read the shrunk file and not expand it, for example if you want to transmit it over a network (replication) or for backup. The software can be integrated into storage to make it all transparent to the user. Performance when reading and expanding is on par for de-dupe; for compression it's dependent on the method, but it usually takes about the same time to uncompress as it did to compress. Essentially you are making an economic tradeoff, consuming compute cycles for disk capacity gains.
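The "check before you replace" step is easy to sketch. The Python below is an assumption of how such a safety check could look, not the Ocarina implementation: zlib stands in for their algorithms, and shrink_in_place and read_expanded are hypothetical names (the second playing the part of the read-time filter).

```python
import os
import tempfile
import zlib
from pathlib import Path


def shrink_in_place(path: Path) -> bool:
    """Replace the file with a shrunk copy only if it expands back exactly."""
    original = path.read_bytes()
    shrunk = zlib.compress(original, 9)
    # Safety check: expand the shrunk version and compare it with the original.
    if zlib.decompress(shrunk) != original:
        return False  # leave the original file untouched
    temporary = Path(str(path) + ".tmp")
    temporary.write_bytes(shrunk)
    os.replace(temporary, path)  # swap the shrunk copy in for the original
    return True


def read_expanded(path: Path) -> bytes:
    """Stand-in for the read-time filter: expand the file as it is read."""
    return zlib.decompress(path.read_bytes())


with tempfile.TemporaryDirectory() as folder:
    target = Path(folder) / "report.txt"
    content = b"unstructured data " * 500
    target.write_bytes(content)

    replaced = shrink_in_place(target)
    size_on_disk = target.stat().st_size
    round_trip = read_expanded(target)

print(replaced, len(content), "->", size_on_disk)
```

Reading the file without read_expanded gives you the shrunk bytes, which is what you would send for replication or backup.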
Having reviewed all of this, organisations which have to store, transmit and back up large amounts of unstructured data could benefit a lot from the Ocarina technologies, especially those whose data the Ocarina algorithms work well on. From speaking to them, they are working hard on new and improved algorithms but, just as importantly, on how to make the technology solution work well.
[Note : I attended the Field Days as a guest of Gestalt IT. Travel and accommodation was provided as part of the event. See the Field Day FAQ and my comments for details.]
You will have seen from my recent posts (1, 2, 3) that I have been attending the Gestalt IT Tech Field Days in Silicon Valley, home of all computer nerds.
Unlike your normal conferences and vendor events, the Gestalt IT Tech Field Days take a different approach to engaging with the vendor community. As I quoted previously:
This unique event brings together innovative IT product vendors and independent thought leaders who have immense influence on the ways that products and companies are perceived and understood by the general public. The world of media has changed, with social media and blogging gaining special importance.
Our Field Day is an opportunity for tech companies and independent writers to get to know each other. Ultimately, we hope to provide a forum for engagement, education, hands-on experience, and feedback.
However now that I am actually here in person I have been able to witness just how well it has worked. One of the vendors doubled their daily website traffic. The amazing thing was that this was the day before we came, and it was no small company. That just shows you how much engaging right can create value all round. On the flip side, at the same company, we got to challenge their CEO, head of marketing and Chief Architect around product and engagement complaints from the community, some of which were coming in over Twitter whilst we were talking!
I took the chance to get Stephen Foskett, the organizer of the event, the man behind it all (with the great help of many others) to briefly share his thoughts. I don't think he has slept much in the last week so he did well to be coherent late on the last night when I put him on the spot without notice.
Look to see my posts over the next while with some more technical details, the good, the bad and the ugly about what we experienced over the two days.
Rodos
[Note : I am attending the Field Days as a guest of Gestalt IT. Travel and accommodation is provided as part of the event. See the Field Day FAQ and my comments for details. Also note that Stephen Foskett is affiliated with one of the participant vendors, Nirvanix]
Starting at 7:30am it was a short walk across the street from the hotel to the offices of Ocarina Networks. Ocarina do content-aware storage optimization and they did a great job of taking us through their technologies, along with some deep dives by their CTO on how compression and de-dupe work for different data types.
Next was Nirvanix who are a Cloud storage platform. You can use them as tier "n" for a backup destination or as a Storage Delivery Network (they have 5 locations across the globe).
W. Curtis Preston then launched Truth In IT, a new online community for users of technologies to freely exchange information whilst receiving formal product research materials and testing results. Curtis also provided the great lunch. Thanks!
From here it was back onto the bus and over to Data Robotics, the company behind the amazing Drobo storage devices. Drobo have a great technology called BeyondRAID which lets you protect data across multiple drives where the drives can be different sizes, and there is also zero admin. I was in a group of four people who won a Drobo device to take home, so expect to hear more from me about this amazing little unit. Of course I had to head to Fry's afterwards to pick up a few large drives to whack into it; unlike a lot of the geeks in attendance I don't have a stack of SATA disks lying around the house.
Here is the video summary from the day with each of the vendors explaining their technologies.
Of course over the next few days I will review my notes and write up some technical items on some of the technologies with my thoughts. There were some great things discussed today so there is much to write.
Rodos
[Note : I am attending the Field Days as a guest of Gestalt IT. Travel and accommodation is provided as part of the event. See the Field Day FAQ and my comments for details.]
Yesterday was the first day of the Gestalt IT Field Days.
It was a great event which ran smoothly. We started the day at the VMware ECB for breakfast then a tour of the VMware demo lab racks. From the main building we walked over to one of the R&D buildings for some vendor presentations.
First off was MDS Micro, followed by Xsigo and then the VMware team responsible for building their demo labs, including for VMworld. Some good time was spent running through a lab exercise on the Xsigo equipment, creating vHBAs and vNICs to present dynamically to an ESX host. Bandwidth control was also applied to some storage traffic to show QoS.
After more nice VMware food it was back onto the bus to the 3Par offices. Here we had a number of speakers from 3Par followed by a number from Symantec. Certainly the primary speaker from 3Par received my vote as best presentation for the day (if I try and spell his name I will get it terribly wrong, will try and update the post tomorrow). The vote is due to the fact that he was passionate, knowledgeable and was the first person all day to pick up a whiteboard pen and start drawing!
Here is a video of the day's events where each of the vendors gives a little summary of their message.
Over the next few days I will review my notes and write up some technical items on some of the technologies with my thoughts.
Rodos
[Note : I am attending the Field Days as a guest of Gestalt IT. Travel and accommodation is provided as part of the event. See the Field Day FAQ and my comments for details.]
With over 20 years working in the IT industry I have had varied sub-careers. My first decade was as a programmer, developing applications whilst working and living in Asia. There was the obligatory dotcom involvement in a fun start-up. Working in the SI space I loved being able to integrate many different technologies and solve a wide variety of IT problems.
Falling in love with server virtualization caused me to become involved in Cloud Computing which became a great passion due to how much it could help IT do greater things.
Today I spend my time assisting a large team of Solutions Architects across A/NZ at Amazon Web Services. Just like everyone at Amazon I enjoy working hard, try to have some fun and hope to be a small part of making history.