Musings of Rodos

The second vendor for day 1 of Virtualisation Tech Field Day #2 was Zerto.

The session was held back at the hotel. The camera crew did not come back and this event was not broadcast over the Internet. To be honest I was a little confused by the discussion about what you can and can't say. There was going to be mention of some things which are coming out in the next version, as well as some customer names. Both of these might be things that should not be in the public domain until appropriate. This is difficult as a blogger, and I thought there was some "understanding" with these events that everything was public, no NDAs. Whilst I respect that a vendor wants to keep some things private while still giving us the greatest insights and information, it makes it really difficult for me to navigate what I can and can't write about. So if I make a bad mistake treading that line I am sure someone will let me know in the morning and I will be editing this post very fast.

Our presenters were Gil Levonai, who is the VP Marketing and Products, and Oded Kedem, CTO & Co-founder. So we were really getting things from the experts.

Zerto do disaster recovery solutions for the VMware environment. Their target customers are the Enterprise and now the Service or Cloud providers. Having spent quite a few years working in the DR and recently the Cloud space I was very keen to hear what Zerto had to say.

Here are my summary notes for the session.

The founders were also the founders of Kashya Inc which was sold to EMC. After being sold to EMC Kashya turned into RecoverPoint which is one of the mainstream replication technologies for Continuous Data Protection (CDP) based DR today.
They are more after the Enterprise market than the SMB players. I have no idea what their pricing is like; I wonder if it matches that market segmentation?
Replication of a workload can fit a number of scenarios. One is between internal sites within the same Enterprise. Alternatively you can go from within an Enterprise to an external Cloud provider. There is a third use case (which is very similar to the first) where a Cloud provider could use Zerto to replicate between their internal sites.
The fundamental principle for Zerto is moving the replication from the storage layer up to the hypervisor without losing functionality. Essentially it is a CDP product in the nature of RecoverPoint or FalconStor Continuous Data Protector, but rather than being done at the storage or fabric layer it utilises the VMware SCSI Filter Driver (as detailed by Texiwill) to get up close to the Virtual Machine. This means that Zerto can be totally agnostic to the physical storage that might be in use, which is a great feature. This is important in the Cloud realm where the consumer and the provider might be running very different storage systems.
The goal of Zerto is to still keep all of the Enterprise class functions such as consistency groups, point in time recovery, low RPOs and RTOs. The only obvious high end feature that I saw was lacking was synchronous replication. This question was asked and Gil responded that they felt that synchronous was not really that much of a requirement these days. I think there is still a case for needing synchronous but Zerto just does not seem to want to go after it, which is fair enough.

There are two components (shown above). The first is the Zerto Virtual Manager, which sits with vCenter. This is the only thing that you interact with. It is provided as a Windows package that you need to deploy on a server and it integrates with vCenter as a plugin.
Then there is the Zerto Virtual Replication Appliance (Linux based) which is required on each host. This is deployed by the manager.
Some of the features of Zerto are:

Replication from anything to anything; it's not reliant on any hardware layers, just VMware
It's highly scalable, being software based
It has an RPO in seconds, near sync (but not sync)
It has bandwidth optimisation and WAN resiliency. Built in WAN compression and throttling.
Built-in CDP which is journal based.
It is policy based and understands consistency groups. You can set CDP timelines for retention in an intelligent way.
If it gets behind and can't keep up, it will drop from a send-every-write mode to a block change algorithm and drop writes in order to catch up. This catch-up mode is only used if the replication can't keep up for some reason (lack of bandwidth, higher priority servers to be replicated). What I would like to see is for this to be a feature you can turn on. So rather than CDP you can pick a number of points in time that you want, and writes between these are not replicated. This would emulate what occurs with SAN snapshots. Yes, it's not as much protection, but for lower tier workloads you might want to save the bandwidth; you can match what you might be doing with SAN snapshots but do it across vendors. Gil did not think this was a great idea but I think there is real merit to it, but I would, it being my idea.
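
The catch-up behaviour can be sketched in a few lines of Python. This is only a toy model of the idea as described in the session, not Zerto's implementation or API: the `Replicator` class, the `max_backlog` threshold and the dict-based "disks" are all invented for illustration.

```python
class Replicator:
    """Toy model: stream every write, or fall back to tracking dirty blocks.

    Hypothetical sketch only; none of these names come from Zerto.
    """

    def __init__(self, max_backlog):
        self.max_backlog = max_backlog  # assumed threshold before falling back
        self.queue = []                 # (block, data) writes waiting to be sent, in order
        self.dirty = set()              # blocks changed while in catch-up mode
        self.catching_up = False

    def on_write(self, block, data, source):
        """Apply a write to the source disk and record it for replication."""
        source[block] = data
        if self.catching_up:
            # Catch-up mode: remember only *which* block changed.
            self.dirty.add(block)
            return
        self.queue.append((block, data))
        if len(self.queue) > self.max_backlog:
            # Falling behind: collapse the queued writes into a dirty-block set.
            # The intermediate versions of each block are dropped here.
            self.dirty.update(b for b, _ in self.queue)
            self.queue.clear()
            self.catching_up = True

    def drain(self, source, replica):
        """Send everything pending and return the number of block transfers."""
        sent = 0
        if self.catching_up:
            # Send only the latest content of each dirty block.
            for block in sorted(self.dirty):
                replica[block] = source[block]
                sent += 1
            self.dirty.clear()
            self.catching_up = False
        else:
            # Normal mode: replay every write in order.
            for block, data in self.queue:
                replica[block] = data
                sent += 1
            self.queue.clear()
        return sent
```

In this model a block that is rewritten many times while the link is congested costs a single transfer once the replicator drains, which is where the bandwidth saving comes from. The price is that the intermediate versions never reach the other side, so there are no points in time to recover to for that window.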

Often people want to replicate the same workload to multiple sites. Sometimes it is the same machine to two different sites from the primary one (call this A to B and A to C), or from the primary to a secondary site and then a replication from the secondary site to a third site (A to B to C). You can't do either of these modes at the moment but watch this space.
There is a concept of Virtual Protection Groups: VM and VMDK level consistency groups. This is very important for some applications which need to have data synchronised across multiple disks or across systems; it's great to see this supported.
Support for VMotion, Storage VMotion, HA, vApp.
There are checkpoints in the CDP stream every few seconds and you can set a time for doing a special VSS checkpoint. This is excellent.
Its vApp awareness is very good. If you add a VM to a vApp it will start replicating it. It also knows things like startup order within the vApp and retains that information for recovery at the other site. This is better than VMware Site Recovery Manager (SRM).
You can denote a volume as swap or scratch so it's not replicated. It does replicate it once, just so it has the disk to be able to mount up to the OS. Once replicated, it does not send any further writes made to the disk. This way you get a valid disk that will mount fine at the destination with the initial swap or scratch state. This is a great feature.
They will be able to pre-seed the destination disk at the other site to speed up the initial synchronisation, a big need in the DR space when you are dealing with very large amounts of data down restricted bandwidth pipes.
There is no need for a shadow VM at the destination site. They are created on recovery or failover. At the failover the VMs are created and disks connected to it.
Failback is supported.
Test failover is provided and it can have read write capability. Replication continues to run while the test recovery is taking place (you always need to be protected). The test can't run longer than your CDP journal size. The test recovery is very efficient in storage size as it sources the reads from the replica journal and it does not have to create a full copy of the disk, so only your writes to the test copy take up additional space.
For the recovery migration you can do a move instead of a failover, which does a shutdown of the VM first to give consistency.
For the failover you can choose the network to connect each NIC to at the other site. You can specify different NICs for an actual failover versus a test failover. It can also re-IP the machine if required.
Supports the Nexus 1KV, but as port groups. I don't think it can orchestrate network creation on the N1K.
Pre and post recovery scripts can be configured to run, so you can script whatever actions you want, such as updating DNS entries etc.
Now the really really really nice thing is that you can target a VMware vCloud implementation. When you target a vCloud you select which of your available organisation VDCs you want to recover to. Then, when you are selecting your networking, it presents the organisation networks as your choices. Very nice. A demo was done of a failover to a VCD environment and it worked very nicely. I was quite impressed. I discussed with Oded how all of the provider side was handled: the multi-tenancy, security etc. Just about everything had been covered and was quickly explained. This showed me that this stuff is very real and they have thought about it a lot. I see a lot of potential solutions for this that might work in an Enterprise space but that have no chance in the service provider space, but from what I could see I think Zerto gets it.
When you need to do a failover, what happens if the source site no longer exists? Well, you go to the vCenter on the destination site and do it there. This is a problem in the Cloud space as the customers are not going to have access to the vCenter, only the provider. Today the provider is going to have to do the recovery for you if your site is gone. There is an API for the provider to use with their own portal. Ultimately Zerto are saying they will provide a standalone interface to do this function.
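
The space-efficient test failover described above is worth a small illustration. The following Python sketch shows one way such a read/write test copy could work without cloning the disk. It is an assumption-laden toy, not how Zerto actually stores things: the `FailoverTestDisk` class, the `(seq, block, data)` journal layout and the idea that the journal holds writes not yet applied to the replica are all hypothetical.

```python
class FailoverTestDisk:
    """Read/write view of a replica at a checkpoint, without copying the disk.

    Hypothetical sketch: reads fall through scratch -> journal -> replica,
    and test writes land only in a private scratch area.
    """

    def __init__(self, replica, journal, checkpoint):
        self.replica = replica        # dict block -> data; never modified by the test
        self.journal = journal        # list of (seq, block, data) in write order
        self.checkpoint = checkpoint  # highest journal sequence number to include
        self.scratch = {}             # blocks written during the test

    def read(self, block):
        # 1. Anything the test itself wrote wins.
        if block in self.scratch:
            return self.scratch[block]
        # 2. Otherwise use the newest journalled write at or before the checkpoint.
        for seq, blk, data in reversed(self.journal):
            if blk == block and seq <= self.checkpoint:
                return data
        # 3. Otherwise fall back to the base replica disk.
        return self.replica.get(block)

    def write(self, block, data):
        # Test writes never touch the replica or the journal.
        self.scratch[block] = data


# Example: test against checkpoint 2 while replication keeps appending to the journal.
replica = {0: "base0", 1: "base1"}
journal = [(1, 0, "j1"), (2, 1, "j2"), (3, 0, "j3")]
disk = FailoverTestDisk(replica, journal, checkpoint=2)
disk.write(0, "test-data")
```

Only `scratch` grows during the test, which matches the point that just your writes to the test copy take up additional space. It also hints at why a test can't outlive the journal: once the entries the test is reading from age out of the journal, the view at that checkpoint can no longer be reconstructed.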

I really enjoyed the presentation from Gil and Oded. Not too many slides, a great demo, lots of explanation and really showing off what was unique about their offering. I am looking forward to learning more about what they are doing and seeing their functionality grow. I think they have many things right in this new hybrid Cloud world.

Rodos

P.S. Note that I am at this event at the invite of GestaltIT and that flights and expenses are provided. There is also the occasional swag gift from the vendors. However I write what I want, and only if I feel like it. I write nice things and critical things when I feel it is warranted.

I went to two sessions this morning on DR to the Cloud.

I think the first thing you could say about these sessions is that they were named a little wrong. They should have been about DR to "managed service provider" or "hosting company". There might have been a bit of Cloud washing going on here. There were certainly elements of clouds and this is a developing space in which we are at the start of the journey, but I think the topics may be a little "over sold" in their wording.

So what were my notes?

SRM will evolve to be application or vApp aware rather than its VM centric nature of today.
Today SRM is all about protecting a machine in site A in another site B. In the future there will be more sites involved, protecting workloads in one site to multiple sites. For example you might protect a machine to your internal second site plus an external Cloud provider.
VMware are working on creating layer two connectivity between multiple sites. This combined with VMotion across sites will allow some interesting DR scenarios. In my opinion this will be helpful for disaster avoidance.
The goal is to be able to intermix vSphere and vCloud Director as either sources or destinations of DR.
There is the use case of DR to the cloud as well as DR off the cloud.
The plan is there would be a plugin for your vSphere Client that would do all the work of setting up DR to a Cloud provider. I imagine this would be like the vCloud Connector plugin.
The attributes that are proposed for this future state of software are: VM level protection granularity, multi-tenancy, self serviceability, storage agnostic, VM portability, role based management, scalability, extensibility, simplified deployment and management, security and RAS (reliability, availability and serviceability).
Hosting.com described their use of SRM 5 to provide Cloud DR. From what I could see this looked like an implementation of SRM on top of a vSphere implementation that had a Cloud front end. They have their portal for consuming virtual machines in a Cloud manner. By adding SRM underneath and then using the SRM APIs to control it from their portal they are able to give DR functions to users. This shows what can be done when you build your own world and don't use vCloud Director. The service is in Beta.
A question was asked from the audience about when SRM and vCloud Director would be integrated or compatible. The answer was that this was on the roadmap, but no detail was given. I suspect this person, like a lot of the audience, was wondering about this given the title of the session.
In the service provider session a number of organizations got up and spoke about their DR solutions and how they were integrating in SRM. There was a lot of managed services wrapped around these. Lots of array based replication, customer specific ESX clusters and other such non-Cloud scenarios. There are certainly some great solutions out there and the providers are working hard with what they have.

What was my takeaway from two hours of presentations on SRM and Cloud? Essentially we are not there yet. Yes, there are some DR solutions and some providers will even let you use SRM. The true cloud experience of DR from your own infrastructure into a VMware based Cloud is there in parts, but there are still bits of string and sticky tape holding it all together. Actually that is probably not fair, it makes them sound unstable. What we don't have is the simplicity that we have with SRM today.

The question is, how long will it take for VMware and the providers to get there? I suspect 12 months. The problem is we are greedy and want it all TODAY!

Rodos

P.S. Slowly getting used to Blogsy on the iPad to write this stuff up. Doing straight content is a lot easier than pulling things in from multiple places.

Rodney Haywood
