
Tuesday, October 30, 2012

Architecture affects cost

"Architecture effects cost", thats a simple and somewhat obvious statement. Those of us who architect systems know that there are outcomes for the design decisions that are made. Often there are tradeoffs and no environment lives in a world where money is unlimited. Well okay the unlimited funds projects probably do exist somewhere, but I have not had the pleasure of working in one of those environments in my career.

The concept of architecture affecting costs is a key element when you are architecting for Cloud solutions. One of the reasons it is key is that your ongoing costs have a greater potential to be variable. This is a good thing: one of the benefits of moving to Cloud models is their elastic nature, and you want your applications to be right sized and running efficiently, both technically and financially.

There are two considerations when thinking about costs for Cloud.

First there is the selection cost. As an example, if you were using Amazon S3 there are two durability options available; one is lower durability and also lower cost. If you have scratch data or data that can be regenerated you might simply choose to use the less expensive storage.

Second is understanding your elastic cost. Based on your forecast demand, what will your peak and average expenditure be over time? What size of cheque are you signing up for if you implement your selected solution?

An example of this crossed my mind this last week as I read a blog post from VMware on their vFabric blog. It was entitled "ROBLOX: RabbitMQ, Hybrid Clouds, and 1 Billion Page Views/Month".

Don't be distracted by the 1 billion number. It's an interesting article on how the application architecture for this company needed to change as they started to scale out. The example was an AWS customer, and in order to speed up their service they introduced a message queue to decouple the synchronous relationship between the web services and the back-end services (as you do). Here is a pertinent bit of text.
To dive deeper, ROBLOX implemented RabbitMQ to help deal with the volume of database requests and slow response times.  The queue is managing 15 million requests a day. The example scenario is when there is an update to the ROBLOX catalog content, which needs to update the search index.  Before RabbitMQ, the web server would have to wait for an update to complete.  Now, the loosely coupled architecture allows the web server to update the message queue, not the database, and move on (technically, the update is enqueued and picked up by the consumers).  At some point, the message is picked up by a service that updates their search platform index.
Wonderful stuff. But then it hit me: why bother implementing RabbitMQ? It does not sound like they were using a lot of its sophisticated functions; surely they could have used the AWS Simple Queue Service (SQS). That's when the "architect for cost" mantra kicked in. 15 million requests a day does not seem like much, but let's see what that would cost for SQS. SQS is $0.01 per 10,000 requests and there are no data charges if the transfer is within a single region. That's $15 a day, which is reasonable, or about $465 a month. That's potentially about the same price as a year of a full RabbitMQ license (or you could use the free version), but that is then a fixed cost and you have to factor in the cost of running the server to execute it along with the effort to maintain it (no cheating on the hidden costs). Looks like others have pondered this SQS vs RabbitMQ question as well (1, 2). However this is just an example of the point which jumped out this week.
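For those who like to check the arithmetic, here is a minimal sketch of that estimate. The request volume comes from the article and the per-request price was the published SQS rate at the time, so both are assumptions to adjust for your own situation.

```python
# Back-of-the-envelope SQS cost estimate for the workload described above.
# Pricing assumption: $0.01 per 10,000 requests, no data charges within a region.
REQUESTS_PER_DAY = 15_000_000
PRICE_PER_10K_REQUESTS = 0.01  # USD

daily_cost = REQUESTS_PER_DAY / 10_000 * PRICE_PER_10K_REQUESTS
monthly_cost = daily_cost * 31

print(f"Daily cost:   ${daily_cost:,.2f}")    # $15.00
print(f"Monthly cost: ${monthly_cost:,.2f}")  # ~$465.00
```

Compare that variable spend against the fixed cost of licensing, hosting and maintaining your own broker and the trade-off becomes clear.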

So when architecting for Cloud, don't forget "architecture affects cost". With Cloud there is not only the selection cost; also consider your elastic cost.

Rodos


Wednesday, October 17, 2012

Secure Multi-Tenancy

Edward Haletky (blog) once asked me what I thought secure multi-tenancy was in relation to Cloud Computing. I am willing to admit that at first I gave an answer reminiscent of a Ronald Reagan quote.

Due to the painful memory of this event, the actual definition of what secure multi-tenancy is has always had my attention. I work with a lot of vendors and, despite what their marketing and sales people declare, their understanding of this topic is wide ranging in maturity and reality. Just as there is Cloud washing there is a lot of secure multi-tenancy washing going on.

This week I actually had to sit down and document in a paper what secure multi-tenancy is. How would we know when it had been achieved?

Of course one usually starts such an activity with Wikipedia and Google, just like my children do for their school work.

Given how much you hear this phrase these days it's a surprise that there is no clear and concise definition out there. One of the earlier ones is from the Cisco Validated Design that they produced with NetApp. It states:
... the capability to provide secure isolation while still delivering the management and flexibility benefits of shared resources. Both private and public cloud providers must enable all customer data, communication, and application environments to be securely separated, protected, and isolated from other tenants. The separation must be so complete and secure that the tenants have no visibility of each other.
That is reasonable, but personally I did not feel it worked as a measure of success. So I sat down and tried to summarise what I have learnt and have been implementing over the last few years.

Secure multi-tenancy ensures that for a shared service:
  • No tenant can determine the existence or identity of any other tenant.
  • No tenant can access the data in motion (network) of any other tenant.
  • No tenant can access the data at rest (storage) of any other tenant.
  • No tenant can perform an operation that might deny service to another tenant.
  • Each tenant may have a configuration which should not be limited by any other tenant's existence or configuration, for example in naming or addressing.
  • Where a resource (compute, storage or network) is decommissioned from a tenant, the resource shall be cleared of all data and configuration information. 
I have tried to keep it as succinct as possible. The last item regarding the clearing of all data and configuration is most familiar to people as the "dirty disk" problem. You could consider this item a duplication of the third point, that no tenant can access the data at rest of any other. Yet people tend to forget about the residual configuration and data that may remain, thus introducing vulnerabilities. Thanks to my colleague Jarek for contributing this sixth item to my original list.
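As a thought experiment on the "how would we know" question, here is a rough sketch of how some of these criteria could be expressed as automated checks. The tenant clients and their methods are entirely hypothetical placeholders for whatever API your platform exposes; the point is the shape of the tests, not a real implementation.

```python
# Hypothetical isolation checks; tenant_a and tenant_b stand in for API
# clients scoped to two different tenants of the same shared service.

def check_tenant_isolation(tenant_a, tenant_b):
    # 1. No tenant can determine the existence or identity of any other tenant.
    assert tenant_b.id not in {t.id for t in tenant_a.list_visible_tenants()}

    # 3. No tenant can access the data at rest of any other tenant.
    for volume in tenant_b.list_volumes():
        assert not tenant_a.can_read(volume)

    # 5. Configuration independence: a name already used by tenant B
    #    must still be usable by tenant A.
    name_used_by_b = tenant_b.list_networks()[0].name
    assert tenant_a.create_network(name=name_used_by_b) is not None
```

Items 2, 4 and 6 are harder to express as single assertions; they need packet captures, load tests and decommissioning checks respectively.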

Do you consider that the environments you use meet all of these criteria? Do these criteria cover the required elements or do they cover too much? I'd appreciate your comments.

Rodos


Wednesday, October 10, 2012

Drobo Update - My experience of a failed disk drive

Back in November 2009, that's 3 years ago, I did a post about Drobo. What made this interesting was that it included a video of my then 12-year-old son doing the unboxing and deployment of the device.

Well 3 years later a lot has changed. My son Tim is more of a geek than I am and a heck of a lot taller, and the range of Drobo hardware has changed. However the Drobo box has been faithfully running those last three years without incident. That is until last week when it suffered a drive failure.

When I originally installed the Drobo I put in 3 x 1TB drives. Then after about two years, as usage started to increase, I picked up another 2TB drive to insert into the remaining slot. No config required, just inserted the drive and let it do its thing. That's the benefit of BeyondRAID.

Last week at work I get a text from Tim.
The drobo lights are flashing green then red, its got about 480g free. I dont have the drobo dashboard but if you tel me which drobo it is I can get it and see what it says.
Of course I remember, unlike Tim, that there is the indicator light cheat sheet on the back of the front cover. I ring Tim and he reads out what the lights mean.


The chart reveals the problem: the new 2TB drive has failed and a rebuild is occurring on the remaining 3 drives. That's good. We have not lost any data, the unit is still operating, we can read and write data. The only change is that the available free space on the unit has reduced, as indicated by the blue lights on the front.

After quite a few hours, Tim and I start to get impatient about the rebuild time. I of course am expecting the rebuild to take a while; I know that even in big enterprise storage arrays this can take a while, and the older Gen2 Drobo that we have does not have much grunt in its processing power.

To try and determine the time I browse a few sites on the Internet to read what others have experienced. It was a little disturbing to read so many horror stories about people's rebuild times, sometimes taking weeks, and other stories of their units failing. Sounded a little ominous, so we installed the Drobo Dashboard onto the Mac Mini it was now connected to in order to determine the rebuild time.

The estimate for the rebuild was another 30 or so hours and it had probably been running around 12 hours already. We went to bed thinking we had a long wait ahead. In the end the rebuild finished ahead of schedule and probably took around the 24 hour mark. For the age of the device, and the fact that we lost the largest drive in a box that was reasonably utilised, I think that is a reasonable time.

Turned out that the drive that failed was still under warranty (thank you serial numbers) but we figured we might as well go and get another 2TB drive and get back some free space; when the RMA arrives we can swap out one of the smaller drives for the larger one and gain some new space.

After a fun trip to a computer store we slotted in the new drive. The crazy thing was that within a minute the indicator lights started flashing again and one of the original 1TB drives was showing red, another failed drive. I have no idea if it was good or bad luck! But the drive had certainly failed. The rebuild was much faster this time.

A day or so later we found a spare 1TB hard drive in the study and threw it in the slot of the failed drive. All great. We are now back to where we were, plenty of space and redundancy. Here is the current state.


Now once that 2TB drive returns from the RMA we will still swap out one of the original drives.

So what are my thoughts on the Drobo after 3 years and experiencing a real-life drive failure?

  • Everything works as advertised.
  • The setup was easy, we experienced that 3 years ago.
  • When a drive failed it was great that everything continued to operate as normal, we could still read and write data as we wanted.
  • It was great that it did not matter if we had the Drobo Dashboard software up to date or even installed, the unit took care of everything itself.
  • The rebuild after a failure did take time, but in our experience that time was reasonable, and as it did not rely on the computer and we could still utilise the device, the only thing inconvenienced was our patience.
  • Even though the unit is starting to age, the software and firmware updates are still available.
  • Never trust a single device. All of our critical data such as family photos, plus general data that would be inconvenient to lose, is also backed up to the Cloud using CrashPlan. I don't care how highly available a storage unit is, it is not backup if it's your primary copy! That is one concept I think a few of those people complaining about their Drobo experiences need to take heed of. 
There is something attractive about technology that is so smart it can make things so simple to use.

Rodos

Friday, October 05, 2012

IDWS - Come hang out with the storage geeks

If you are in Sydney next Tuesday (9th of October) come hang out with the storage and information systems geeks.

IDWS, or Information & Data World Symposium, is on and is a great place to catch up with vendors and providers, share experiences with colleagues and industry, and of course learn new things.

Here is the blurb

A comprehensive technical symposium designed for data management professionals and IT practitioners to broaden their knowledge into all facets of building and maintaining their information infrastructures. 
The symposium will be educational and technical, targeted to all IT levels from CIOs to the skilled staff responsible for managing and protecting their companies greatest asset, it's data. 
Be engrossed as key industry players battle it out against each other at the 'Great Debate'. Throw in live tweets and feeds from the floor including voting and see who will be crowned champion on a range of key topics. 
The symposium will feature a Technical Lab for vendors to demonstrate real information infrastructure solutions as well as technical workshops to suit all delegates. 
This one-day event will cover Big Data Analytics, Cloud Storage & Services, Infrastructure Convergence, Data Management & Protection, Storage Security, Virtualisation and a lot more.
Registration is free, head to http://www.idws.com.au/ to learn more and register.
See you there.
Rodos
P.S. As you know I am a Board Member of SNIA who are part of this event.

Saturday, September 01, 2012

CrashPlan gets Australia-based service

CrashPlan from Code42 is one of those services that are so great that I just tell everyone about it. You never have a problem recommending something that has worked so well for yourself and that you know others will really benefit from.

I have always recommended CrashPlan to colleagues, friends and family for many reasons.

  • It is free if you want to do local backup or backup to a friend.
  • Their optional Cloud backup destination is all you can eat and competitively priced.
  • You can backup to multiple locations with different data sets, which is a key feature.
  • It's very secure when backing up locally or remotely.
  • It's automated and just works, set and forget.
  • You can access the files you have backed up via an iPad app.
  • I have tried other services and they were not as good.
So I was really pleased when at VMworld this week I ran into the CrashPlan stand.

On the stand was their chief marketing officer and he happened to mention that they were launching an Australian-based presence for their service. My accent must have given away that I was Australian, go figure.

Here is what their press release says about it

"We've always had a strong customer base in Australia. Now, with the opening of our Australian office, we’re positioned to serve this fast-growing market much more effectively," commented Matthew Dornquast, co-founder and CEO, Code 42 Software. "In addition, having a local data centre means we can deliver even better performance for our cloud backup customers located across the entire Asia-Pacific region."

The new data centre will provide state-of-the-art, cloud-based backup services to Australian users of Code 42‘s CrashPlan products:

• CrashPlan+ - the award-winning computer backup solution for home users.

• CrashPlan PRO - the innovative backup system for small and mid-sized businesses.

• CrashPlan PROe - the world's most advanced backup and disaster-recovery solution for enterprises.

A local data centre also enables Code 42 to extend its popular U.S. “seeding” option to Australian customers. With this option, Australian customers have the option of shipping their initial backup to Code 42 where it’s then loaded directly onto CrashPlan servers. “This approach is extremely beneficial because it can save our customers a lot of time, especially those with large initial backups or where bandwidth is limited,” Dornquast explained.

Latency always adds some overhead to these network applications so having an Australian presence is going to help all of us Australian users. Plus the ability to send them your data for seeding is great for those with slow Internet links.

However, if you are already a happy customer, like myself, you will continue to use the US servers, for the moment at least. Only new Australian customers will use the new site. According to the person I spoke to they are working on the process for moving existing customers on shore, and there was no timeframe for when this might be done. Definitely something we all need to encourage them to do. I don't want to start a new service and have to push my 0.5TB back again, plus lose all of my versions and deleted files.

So if you were looking for a backup service for your personal machine, you have even more reason to give CrashPlan a look!

Rodos

P.S. I will have to see if I can go visit their Australian office and interview their local staff to find out more about their environment. I should also write up my best tips for using CrashPlan.

 

Tuesday, June 05, 2012

BigData. So what?

Sometimes it takes a bus trip to connect the dots. In my case today these were BigData and a Wired Magazine article.

We have all been hearing a lot about big data lately. If a vendor has little to say about Cloud, or has possibly said everything they can, then they just search and replace the marketing materials with the phrase "Big Data". We are not at the stage where McDonald's has decided to replace the Big Mac with the BigData burger, so the consumer world is safe for the moment, but most CIOs are probably getting their in-trays full of promotions and case studies.

Whilst I get big data and see its value, I have personally struggled with the realities of execution. We have been reading about the increasing demand for developers skilled in Hadoop, and I have a colleague who is a CCIE, got into Cloud and is now chasing the Hadoop angle. But to me BigData itself brought no real shift in the ability to execute here. It might be cheaper and easier to store and process big data these days, but the insights have always been a human effort. The human effort to develop the analytics takes intellect and scale. There was the rub: not all humans have the same intellect, and humans don't scale in the specialist areas. I have a friend who works for Oracle in demand planning. He is really smart at building data mining for global companies that need to forecast all sorts of whacky things. Yet he is very specialised and uses some very high-end software. The gap between those people with big data, and those who can do something with it, has always irked me.

So today I am on the bus reading Wired on my iPad, as you do, and read an article "Can an Algorithm Write a Better News Story than a Human Reporter?". Have a scan through the article, but the premise is that given large amounts of statistical data, companies such as Narrative Science can turn it into a news story that is very insightful. They started out doing this with children's baseball games. Feed in the play-by-play data and it generates a story such as
Friona fell 10-8 to Boys Ranch in five innings on Monday at Friona despite racking up seven hits and eight runs. Friona was led by a flawless day at the dish by Hunter Sundre, who went 2-2 against Boys Ranch pitching. Sundre singled in the third inning and tripled in the fourth inning … Friona piled up the steals, swiping eight bags in all …
Baseball, financial markets, they can do some amazing stuff. Many companies are actually using machines to find insights and produce prose. As mentioned
Once Narrative Science had mastered the art of telling sports and finance stories, the company realized that it could produce much more than journalism. Indeed, anyone who needed to translate and explain large sets of data could benefit from its services. Requests poured in from people who were buried in spreadsheets and charts. It turned out that those people would pay to convert all that confusing information into a couple of readable paragraphs that hit the key points.
and
And the subject matter keeps getting more diverse. Narrative Science was hired by a fast-food company to write a monthly report for its franchise operators that analyzes sales figures, compares them to regional peers, and suggests particular menu items to push. What’s more, the low cost of transforming data into stories makes it practical to write even for an audience of one. Narrative Science is looking into producing personalized 401(k) financial reports and synopses of World of Warcraft sessions—players could get a recap after a big raid that would read as if an embedded journalist had accompanied their guild. “The Internet generates more numbers than anything that we’ve ever seen. And this is a company that turns numbers into words,” says former DoubleClick CEO David Rosenblatt, who sits on Narrative Science’s board. “Narrative Science needs to exist. The journalism might be only the sizzle—the steak might be management reports.” 
This is where the dots connected and I became a lot more relaxed about big data. Here we have the birth of what can start to give reality to big data capture and processing. Whether you view it as AI, clever algorithms or plain ole automation does not matter. Looking forward you can see how companies can cheaply and easily generate business insights from the data they collect.

Until these analytic services mature you might want to brush up on your Hadoop skills, but in the future you might just start getting more emails from an automated account like the following.

"Rodos, yesterday there was a flood of traffic on the fibre channel network that was generated from workloads in the Melbourne IaaS availability zone. Looks like this was mostly from a specific customer and I also picked up that their Unified Communications workloads in the UCaaS nodes in Singapore peaked. The company in question just listed on the stock exchange in Hong Kong and forecast interest in their services, if it continues at the rate, will cause increased workload that will take the Melbourne availability zone B to 90% capacity. Last time zone B hit 88% capacity (Sept 2014) SLAs for 2 customers were broken. Just a heads up, regards Siri".

Friday, February 24, 2012

VFD2 - Xangati


Third and last on the first day of Virtualisation Field Day #2 was Xangati

There was great food presented before the start of the session, ice-cream and a variety of bacon in different flavours. The idea was not to mix them (hey they do weird things with food in America) but to provide choice. The bacon was very well received by the delegates. For some reason bacon is very popular.



Jagan Jagannathan, the founder and CTO, grabbed the whiteboard pen and started to explain things. It is worth repeating (and it gets repeated a lot): this is exactly the type of engagement and insight that delegates at TFDs respect and value. You can tell the difference. The room is quiet, the questions and interruptions are minimal, people listen intently. I have seen this again and again at these days. It staggers me how some vendors do it and succeed and others ignore the advice and struggle. In a similar way, Xangati provided a list to each person of who was present, their names, titles and twitter handles. When you are writing up notes and posting on twitter, this easy access to info is so very helpful. You would think that the PR people at Xangati had not only read the advice given to them (http://techfieldday.com/sponsors/presenting-engineers/) but actually attempted to put it into practice!

I was engaged with what was being presented and ended up not taking many notes. 



Some of the interesting things discussed were how in the performance monitoring world you have triage vs postmortem. For triage you need real-time, up-to-the-minute information; anything older than that is not as useful. Older data, say five minutes old, is what's used for postmortem analysis.

One of the key things that Xangati does is take all of the incoming data and process/analyse it in memory, rather than writing it to a database for analysis. This allows them to give very timely and detailed information in their UI and alarming. The interface has a slider and you can wind back the clock a little and see what was just happening prior to now. You can also record the detailed real-time information you are looking at for later analysis. This recording links in with their alerting. That is, when an alert is created it records the associated real-time info for that short time period so you can see what was happening. Of course the data is also written to a database in a summarised form for later analysis. This uses a reporting interface that is not as nice or as interactive as the real-time interface. I would like to see the two be much more similar; it feels a little strange to have them so different. However, given that they work off different data models and serve different purposes, you can see the reasons why. 
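To make that design concrete, here is a minimal sketch of the general pattern as I understand it (my own illustration, not Xangati's code): keep recent raw samples in memory for the real-time view and the "wind back" slider, and periodically flush coarse summaries to a database for later reporting.

```python
import time
from collections import deque

class MetricWindow:
    """Keep recent raw samples in memory; flush summaries for postmortem use."""

    def __init__(self, window_seconds=300):
        self.window_seconds = window_seconds
        self.samples = deque()   # (timestamp, value) pairs, newest on the right
        self.summaries = []      # stand-in for the summarised reporting database

    def ingest(self, value, now=None):
        now = time.time() if now is None else now
        self.samples.append((now, value))
        # Drop anything that has aged out of the real-time window.
        while self.samples and self.samples[0][0] < now - self.window_seconds:
            self.samples.popleft()

    def realtime_view(self, seconds_back, now=None):
        """Slider-style query: raw samples from the last N seconds."""
        now = time.time() if now is None else now
        return [v for t, v in self.samples if t >= now - seconds_back]

    def flush_summary(self):
        """Write an averaged roll-up, standing in for the reporting database."""
        if self.samples:
            values = [v for _, v in self.samples]
            self.summaries.append({"avg": sum(values) / len(values),
                                   "max": max(values), "count": len(values)})
```

The real product obviously does far more (alert-triggered recordings, a proper store, a UI), but the split between a raw in-memory window and a summarised history is the part that explains why the two interfaces feel so different.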

They have self calculating thresholds but you can create your own. 

Xangati have been doing a lot in the VDI monitoring space but they were keen to point out that they are not a VDI monitoring company, they do straight virtualisation too. I think they don't like being tarred too much with the pure VDI brush.

They do have some great VDI features though. If you are looking into desktop performance you can launch into a WMI viewer as well as a PCoIP or Citrix HDX viewer to see a lot more detail about what's going on inside the desktop and the display protocols. They even have a neat feature where an end user can self-service a recording of their performance for a help desk to analyse. The user can go to a form and request a recording for their environment; it records the minute prior to the submission. That's nice.

Here is a look at the demo environment I interacted with.


Where there are reports that have sub-elements (such as a protocol list) you can drill down to those. At first I thought the reports were not interactive, but I was wrong about that and was shown the error of my ways.
It was a good session. I certainly got the impression that for real-time performance troubleshooting Xangati is a real player worth investigating. I did not get enough of a chance to look at the product or discuss with them its suitability as a comprehensive monitoring solution. I think there are a few things that an overall monitoring solution requires that I did not see in the product, for example inventory data. Maybe a more in-depth look at the features and functions would help nut this out more. Hard to do in our limited time. They do have a free version which is popular, and evals are available, so it's easy to check these out for yourself.

Well, it's been a long day, lots to see and think about. Looking forward to some brief sleep before another day of it all tomorrow.

Rodos

P.S. Note that I am at this event at the invite of GestaltIT and that flights and expenses are provided. There is also the occasional swag gift from the vendors. However I write what I want, and only if I feel like it. I write nice things and critical things when I feel it is warranted. 

VFD2 - Zerto


The second vendor for day 1 of Virtualisation Tech Field Day #2 was Zerto

The session was held back at the hotel. The camera crew did not come back and this event was not broadcast over the Internet. To be honest I was a little confused by the what-you-can-say and what-you-can't-say discussion. There was going to be mention of some things which are coming out in the next version as well as some customer names, both of which might be things that should not be in the public domain until appropriate. This is difficult as a blogger; I thought there was some "understanding" with these events that everything was public, no NDAs. Whilst I respect that a vendor wants to keep things private whilst balancing giving us the greatest insights and information, it makes it really difficult for me to navigate what I can and can't write about. So if I make a bad mistake treading that line I am sure someone will let me know in the morning and I will be editing this post very fast.

Our presenters were Gil Levonai, VP of Marketing and Products, and Oded Kedem, CTO and Co-founder. So we were really getting things from the experts.


Zerto do disaster recovery solutions for the VMware environment. Their target customers are the Enterprise and now the Service or Cloud providers. Having spent quite a few years working in the DR and recently the Cloud space I was very keen to hear what Zerto had to say.

Here are my summary notes from the session.
  • The founders were also the founders of Kashya Inc, which was sold to EMC. After the sale, Kashya turned into RecoverPoint, which is one of the mainstream replication technologies for Continuous Data Protection (CDP) based DR today. 
  • They are more after the Enterprise market and not the SMB players. I have no idea what their pricing is like, I wonder if their pricing matches that market segmentation?
  • Replication of a workload can cover a number of scenarios. One is between internal sites within the same Enterprise. Alternatively you can go from within an Enterprise to an external Cloud provider. There is a third use case (which is very similar to the first) where a Cloud provider could use Zerto to replicate between their internal sites.
  • The fundamental principle for Zerto is moving the replication from the storage layer up to the hypervisor without losing functionality. Essentially it is a CDP product in the nature of RecoverPoint or FalconStor Continuous Data Protector, but rather than being done at the storage or fabric layer it utilises the VMware SCSI Filter Driver (as detailed by Texiwill) to get up close to the Virtual Machine. This means that Zerto can be totally agnostic to the physical storage that might be being used, which is a great feature. This is important in the Cloud realm where the consumer and the provider might be running very different storage systems.
  • The goal of Zerto is to still keep all of the Enterprise-class functions such as consistency groups, point-in-time recovery, low RPOs and RTOs. The only obvious high-end feature that I saw lacking was synchronous replication. This question was asked and Gil responded that they felt this was not really that much of a requirement these days and synchronous might not be required. I think there is still a case for needing synchronous but Zerto just does not seem to want to go after it, which is fair enough.

  • There are two components (shown above). The Zerto Virtual Manager sits with vCenter. This is the only thing that you interact with. It is provided as a Windows package that you need to deploy on a server and it integrates with vCenter as a plugin.
  • Then there is the Zerto Virtual Replication Appliance (Linux based) which is required on each host. This is deployed by the manager. 
  • Some of the features of Zerto are :
    • Replication from anything to anything; it's not reliant on any hardware layers, just VMware
    • It's highly scalable, being software based
    • It has an RPO in seconds, near sync (but not sync)
    • It has bandwidth optimisation and WAN resiliency. Built in WAN compression and throttling.
    • Built-in CDP which is journal based. 
    • It is policy based and understands consistency groups. You can set CDP timelines for retention in an intelligent way. 
    • If it gets behind and can't keep up it will drop from a send-every-write mode to a block-change algorithm and drop writes in order to catch up. This catch-up mode is only used if the replication can't keep up for some reason (lack of bandwidth, higher priority servers to be replicated); there is a rough sketch of this behaviour after the list. What I would like to see is for this to be a feature you can turn on. So rather than CDP you can pick a number of points in time that you want and writes between these are not replicated. This would emulate what occurs with SAN snapshots. Yes, it's not as much protection, but for lower-tier workloads you might want to save the bandwidth; you can match what you might be doing with SAN snapshots but do it across vendors. Gil did not think this was a great idea but I think there is real merit to it, though I would, it being my idea. 
  • Often people want to replicate the same workload to multiple sites. Sometimes the same machine goes to two different sites from the primary one (call this A to B and A to C), or from the primary to a secondary site and then a replication from the secondary site to a third site (A to B to C). You can't do either of these modes at the moment, but watch this space. 
  • There is a concept of Virtual Protection Groups: VM- and VMDK-level consistency groups. This is very important for some applications which need to have data synchronised across multiple disks or across systems; it's great to see this supported. 
  • Support for VMotion, Storage VMotion, HA, vApp. 
  • There are checkpoints in the CDP stream every few seconds and you can set a time for doing a special VSS checkpoint. This is excellent. 
  • Its vApp awareness is very good. If you add a VM to a vApp it will start replicating it. It also knows things like startup order within the vApp and retains that information for recovery at the other site. This is better than VMware Site Recovery Manager (SRM).
  • You can denote a volume as swap or scratch so it's not replicated. It does replicate it once, just so it has the disk to be able to mount up to the OS. Once replicated it does not send any writes made to the disk. This way you get a valid disk that will mount fine at the destination with the initial swap or scratch state. This is a great feature.
  • They will be able to pre-seed the destination disk at the other site to speed up the synchronisation, a big need in the DR space when you are dealing with very large amounts of data over restricted bandwidth pipes. 
  • There is no need for a shadow VM at the destination site. They are created on recovery or failover. At the failover the VMs are created and disks connected to it.
  • Failback is supported.
  • Test failover is provided and it can have read-write capability. Replication continues to run while the test recovery is taking place (you always need to be protected). The test can't run longer than your CDP journal size. The test recovery is very efficient in storage size as it sources the reads from the replica journal and does not have to create a full copy of the disk, so only your writes to the test copy take up additional space.
  • For the recovery migration you can do a move instead of a failover, which does a shutdown of the VM first to give consistency.
  • For the failover you can choose the network to connect each nic to at the other site. You can specify different NICs for actual failover versus a test failover. It can also re-ip address the machine if required. 
  • Supports the Nexus 1000V, but as port groups. I don't think it can orchestrate network creation on the N1K.
  • Pre and post recovery scripts can be configured to run, so you can script actions to do whatever you want, such as updating DNS entries etc.
  • Now the really, really, really nice thing is that you can destine a workload to a VMware vCloud implementation. When you target a vCloud you select which of your available organisation VDCs you want to recover to. Then, when you are selecting your networking, it presents the organisation networks as your choices. Very nice. A demo was done of a failover to a VCD environment and it worked very nicely. I was quite impressed. I discussed with Oded how all of the provider side was handled, the multi-tenancy, security etc; just about everything had been covered and was quickly explained. This showed me that this stuff is very real and they have thought about it a lot. I see a lot of potential solutions for this that might work in an Enterprise space but that have no chance in the service provider space, but from what I could see I think Zerto gets it.
  • When you need to do a failover, what happens if the source site no longer exists? Well, you go to the vCenter on the destination site and do it there. This is a problem in the Cloud space as the customers are not going to have access to the vCenter, only the provider. Today the provider is going to have to do the recovery for you if your site is gone. There is an API for the provider to use with their own portal. Ultimately Zerto say they will provide a standalone interface for this function. 
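On the catch-up behaviour mentioned above, here is a rough sketch of the general idea as it was described to us; the structures and the threshold are mine, purely for illustration, not Zerto's implementation.

```python
# Illustrative only: ship every write while the link keeps up, and fall back
# to sending just the latest version of each changed block when the backlog
# grows too large. The threshold is an arbitrary placeholder.
BACKLOG_LIMIT = 10_000  # outstanding writes before switching to catch-up mode

def replicate(write_queue, changed_blocks, send):
    if len(write_queue) <= BACKLOG_LIMIT:
        # Normal CDP mode: ship every write, in order.
        while write_queue:
            send(write_queue.pop(0))
    else:
        # Catch-up mode: abandon the per-write journal and ship only the
        # most recent contents of each changed block.
        write_queue.clear()
        for block_id, data in changed_blocks.items():
            send((block_id, data))
        changed_blocks.clear()
```

The toggle I was arguing for would simply let you run in the second branch on a schedule for lower-tier workloads, trading protection granularity for bandwidth.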
I really enjoyed the presentation from Gil and Oded. Not too many slides, a great demo, lots of explanation and a real showing of what was unique about their offering. I am looking forward to learning more about what they are doing and to seeing their functionality grow; I think they have many things right in this new hybrid Cloud world.

Rodos

P.S. Note that I am at this event at the invite of GestaltIT and that flights and expenses are provided. There is also the occasional swag gift from the vendors. However I write what I want, and only if I feel like it. I write nice things and critical things when I feel it is warranted. 

VFD2 - Symantec


First vendor off the rank at Virtualisation Field Day #2 was Symantec. It was an early start as we were having breakfast there. 

It was an interesting start as things took a while to get organised and the opening question was: who uses backup? Given you have a room full of top virtualisation bloggers, I figure they can all be dangerous on the topic of backup. We also heard that Symantec is #1 in VMware backup and that they have been working with VMware for 12 years now. GSX and ESX were released to market in 2001 so they must have been there right from the very first day. 

First up was NetBackup.

George Winter, Technical Product Manager, presented on NetBackup.

Some general notes; assume these relate to NetBackup but some refer to Symantec products in general.
  • They don't support VCB anymore as of the current version. 
  • On the topic of passing off VMware snapshotting to the array, they don't do anything today but in the next release (by end of 2012) this will be provided through something called Replication Director.
  • They have their own VSS provider for application quiescence which you can use to replace the VMware one. This is free of charge and included in the distribution.
  • We spent a while looking at dedupe and the different ways that you can do it with Symantec products. You have all sorts of ways of doing this, from source-based in the agent to hardware appliances that can replicate to each other across sites.
  • In regards to the lifecycle of retention policies you can have local copies, replicate to another site using dedupe and might even also destine a copy to "the Cloud". There was little detail about what "the Cloud" means apart from a list of providers that are supported such as Nirvanix, AT&T, Rackspace or Amazon. No details were provided on the protocols that are supported; I am sure they can be sourced from the product information. Data destined to the Cloud is encrypted and the keys are stored on the local media server. In destining to Clouds it supports cataloging, expiring and full control of data that might be destined there.
  • They have an accelerator client that, rather than doing source-based dedupe, uses a changed-block technique so it only sends a small amount of data without the load of source dedupe. Symantec claim they are the only people that do this and it's new in the latest 7.5 release.
  • For VMDK backups the files are cataloged at ingestion, so when you need to do a file-level restore you can search for where that file might be; you don't need to know which VM or VMDK it might have been in in the first place. When data is being stored, the files and their mapped blocks are recorded. So at restore time for a file they only need to pull the blocks for that file back in; you don't have to restore the entire VMDK, which saves a lot of time, space etc (see the small sketch after this list).
  • Integration with vCenter. Backup events can be sent to the vCenter events for a VM and custom attributes can be updated with the date of last backup etc. There is no plugin available yet; one is coming, but no details were provided on this. 
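To illustrate the catalog idea mentioned in the notes above, here is a toy sketch of a file-level restore driven by a per-file block map; the catalog structure and function names are hypothetical, not NetBackup internals.

```python
# Hypothetical: at ingest time the backup records, for each file, the list of
# blocks it occupies inside the backed-up VMDK. At restore time only those
# blocks are read back, rather than the whole VMDK.

def restore_file(file_path, catalog, read_block, write_out):
    """catalog maps file path -> ordered list of (vmdk_id, offset, length)."""
    for vmdk_id, offset, length in catalog[file_path]:
        write_out(read_block(vmdk_id, offset, length))

# Example catalog entry (entirely made up):
# catalog = {"/var/log/app.log": [("vm01-flat.vmdk", 8_192, 4_096),
#                                 ("vm01-flat.vmdk", 65_536, 4_096)]}
```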
There were some specific topics that sparked my interest.

vCloud Director

I am keeping my eye out for things around vCloud Director over the two days. Mike Laverick got the vCloud question in before I got the chance, asking what their NetBackup support was. They don't have anything today but have been working on it since it was first released. The good news is that this work is about to be released this year. It is always hard to get details about products that are not released, but I tried to dig some sort of feature list out. It was revealed that there would be support for per-tenant restore and it sounded like the tenant would be able to do this themselves. Going to be very interesting to see what features and functions this is really going to have. This should get some real attention, as over the next 12 months I believe we are going to see many vendors start releasing support for vCloud Director.

VMware Intelligent policy (VIP)

One of the challenges about backup in a dynamic virtual environment is the effort to apply your policies to your workloads. To ease this pain VIP gives you VMware protection on auto-pilot. It is an alternative method of selecting machines where new and moved VMs are automatically detected and protected. You specify criteria which might match a particular VM, based on 30 vCenter-based definitions. These definitions can include things such as vApp details or even custom attributes. It's designed to help in dynamic environments which have vMotion, Storage vMotion, DRS and Storage DRS. When you have this "rule based" matching, one thing I am always concerned about is the hierarchy of rules, as it can be very easy to have multiple rules that match a machine. If multiple rules match it will apply them all and do multiple backups of the machine. You can't set a hierarchy, so you can't have things like a default and then an override for a more specific rule. I think this would be a great feature and suspect there might even be a way to do it; it might just have been my interpretation of the answer. 
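Here is a small sketch (my own, not Symantec's) of what rule-based selection looks like and why the lack of a hierarchy matters: a VM that matches two policies simply gets backed up by both of them.

```python
# Each policy carries a predicate over vCenter-style attributes. The policy
# names and attributes are made up for illustration.
policies = {
    "gold-daily": lambda vm: vm.get("custom_attrs", {}).get("tier") == "gold",
    "sql-hourly": lambda vm: "sql" in vm.get("name", "").lower(),
}

def matching_policies(vm):
    return [name for name, rule in policies.items() if rule(vm)]

vm = {"name": "sql-prod-01", "custom_attrs": {"tier": "gold"}}
print(matching_policies(vm))  # ['gold-daily', 'sql-hourly'] -> two backups, no override
```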

Another element of VIP is applying thresholds. One issue in vSphere backup environments is that your backup load can affect the performance of production by causing an impactful load on elements of your infrastructure. NetBackup can "automatically balance backups" across the entire vSphere environment (fibre or network), physical location (host, datastore or cluster) or logical attributes (vCenter folder, resource pool, attribute). 


Resource limits to throttle the number of backups which execute concurrently can be set based on elements such as vCenter, snapshots, cluster, ESX server and lots of different datastore elements. So for example you can set a resource limit such as no more than 1 active backup per datastore with no more than 2 active backups per ESX host. A problem is that this is a global setting and that it's fixed. It does not interact with the metrics from vSphere, so it does not adjust, and it applies to everything. I can see that you might want different values for different parts of your environment and for it to adjust based on load. This is the first release of this functionality so we should see it build out in future versions. 
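As an illustration of that style of throttling (my own sketch, not how NetBackup implements it), fixed per-resource limits map naturally onto counting semaphores, which also makes the limitation obvious: the limits are static and identical everywhere.

```python
import threading

# Illustrative fixed limits: at most 1 active backup per datastore and
# 2 per ESX host, applied globally with no awareness of actual load.
datastore_limits = {"ds01": threading.Semaphore(1), "ds02": threading.Semaphore(1)}
host_limits = {"esx01": threading.Semaphore(2), "esx02": threading.Semaphore(2)}

def run_backup(vm, datastore, host, do_backup):
    # Blocks until both the datastore and the host have a free slot.
    with datastore_limits[datastore], host_limits[host]:
        do_backup(vm)
```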

Next we had the Backup Exec guys.

Kelly Smith & Gareth Fraser-King

Some general notes
  • Specific packaged solutions for virtualised environments, targeted at the SMB.
  • Showed the new GUI (pictures below) which will be released next month. Looks very slick with lots of wizards. 
  • You can visually/graphically see the stages of protection for a workload, for example the backup, followed by the replication etc. When you go back and look at the machine you see the types of jobs associated with the machine, what they do and when they are scheduled. It gives you a workflow-centric view.
  • Symantec are adding a backup option which can be destined to a Cloud provider (partnered with Doyenz.com) from which you can do a restore in the event of a disaster. I really would have liked to see this demo'd.



Here are some other thoughts from the session.

So why two backup products? We hear for example about the fact that there is no vSphere plugin for NetBackup but there is for Backup Exec. Yes, we know that there are historical factors, but if Symantec were to start again, why for technical reasons would you create two products? It's hard to summarise the answer as the conversation went around a little (maybe watch the video), but essentially their answer was that there are two markets, the big enterprise and the SMB to medium enterprise. Creating products, licensing and feature sets that go across that entire spectrum of use cases is too hard; Symantec felt they really needed to have products targeted at the two different markets. I understand this argument, but as the audience are IT technical people, it would have been nice to hear about the technical aspects behind this. Maybe something about scaling catalog databases and how it's hard to create a scaled-down version or something. I did not really get why they needed two products (apart from history). However it was discussed that there are many techniques that are used by both products, such as a lot of the dedupe functions. 

In regards to the execution, to be honest I would have expected something a little more polished from a vendor such as Symantec. We spent a bit of time learning 101 about VMware backups, but given that the audience are bloggers and specialists in virtualisation, this could probably be considered assumed knowledge. Maybe this was included for the remote audience, as the sessions were being recorded and broadcast. The format also looked at some quite simple customer use cases, which I did not feel added much to explaining the value of Symantec products over other vendors. Also, some of the explanations were inaccurate, such as talking about redo logs. Once we got into some of the cool things that Symantec do, and what they are doing differently to others, it got a lot more interesting. Also, we can be a prickly bunch so you need to know how to handle objections really well. I noticed this improved during the morning.

Lastly, a presenter needs to be flexible in their delivery. The NetBackup team insisted on finishing their slides and talked through the last 5 so fast that no one could really listen to what was being said. We had very little time from the Backup Exec team, who I think had some really interesting stuff, and way too long on NetBackup. I think the imbalance did not help Symantec overall.

Thanks to Symantec. It was a really interesting morning and we learnt a few things.

Rodos

P.S. Note that I am at this event at the invite of GestaltIT and that flights and expenses are provided. There is also the occasional swag gift from the vendors. However I write what I want, and only if I feel like it. I write nice things and critical things when I feel it is warranted.  

Wednesday, February 22, 2012

Virtualization Field Day #2 / Silicon Valley - pre event

Well I have escaped from the wet and dreary shores of Sydney to spend some time geeking it up with the crew for Virtualization Field Day #2. Having been to one of these events before I know just how much hard work and fun it can be. It's so great to hang out with people so smart in their field, plus to hear direct from the best people within the presenting vendors.

The activities start Wednesday night with a get-together dinner. Thursday and Friday are all of the vendor presentations. I arrived this last Sunday to do a few days of work meetings before the event. Of course I had to do a bit of the usual Silicon Valley shop hop around some of the favourite haunts for all things geek.

One place I went today, that I had never thought of before, was Apple HQ. Here is some guy wearing a suit; who wears a suit in the valley? Only me!

Who's the stiff in the suit!
The cool thing is that there is a company store there. It's not like an Apple store. It has a lot more Apple merchandise. It also has a t-shirt that can only be purchased from the Apple campus store. Of course I had to get one.

Apple Company Store
Of course I also had to do a trip to Fry's and pick up something. Ended up getting a 4-port 1G switch for the home office. I am sick of 100Mb transfer speed between me and the Drobo storage device (which hangs off a Mac mini). Also some of those nice little pop-up speakers for use in hotel rooms etc. This is on top of the other stuff I pre-shipped to my hotel, none of which has arrived yet. I pre-shipped a bunch of t-shirts from ThinkGeek for the kids and an SSD drive for me.

One place I have never been to here in the US is In-N-Out burger. My American friends rave about it. So I had to check it out.
The back wall of In-N-Out burger, the view from the car park.
I had to go the whole hog and get a burger, fries and a shake. I am told the way to order your burger is "animal" style, which means it comes with (I think) sautéed onions and chilli. The person I was with sort of made a mistake and somehow also ordered their fries done "animal" style. Can you believe it, they actually do that. Here is what it looks like.


After eating mostly healthy food for about a year it was great to chow down on great fast food. This stuff is fresh, you have to wait for it to be cooked. The fries are cut from whole potatoes just before they are cooked. However my stomach rebelled about half an hour later, the temple had been defiled! But it was worth it. Repeat after me, "In-N-Out is occasional food!".

But what is going to be really fun this week is hanging out with old friends plus some new people at the field day. The attendees this year are Edward Haletky, Bill Hill, Mike Laverick, Dwayne Lessner, Scott Lowe, Roger Lund, Robert Novak, David Owen, Brandon Riley, Todd Scalzott, Rick Schlander and Chris Wahl. A real who's who of virtualisation thinkers. 

The vendors this event are interesting: we have Symantec, Zerto, Xangati, PureStorage, Truebit.tv and Pivot3. Some big names there, some interesting new ones, and it is great that I will get to hear the thoughtful words of Mr Backup himself, aka Mr W. Curtis Preston, again. 

The only vendor I will call out specifically as sparking some very high interest from me pre event is Zerto. They have DR capabilities with full integration to VMware vCloud Director. As I deal daily with one of the leading deployments of vCloud Director in the service provider space this really gets my brain juices flowing. There is big interest in this topic and I am really keen to see exactly what these guys have. I want to separate the hype from the reality and really hope that the reality is an exciting story.

You can see the details of the whole event over at the Field Day site http://techfieldday.com/; the links page really gives you all the resources you need. The sessions will be broadcast online and you can follow the tweet stream via the hashtag #VFD2. 

More updates as the events unfold.

Rodos

Monday, January 09, 2012

Don't get fired over the cloud

An actual paper magazine arrived via snail mail today, Datacenter Dynamics Focus. Of course it was the August/September issue so those snails are really slow these days. I was flipping through the pages and the article "Don't get fired over the Cloud" caught my attention.

The premise of the article is probably summed up by this section entitled "The Danger in Change".
But here’s the rub: “You will be fired as a CIO if you don’t know where things are running. You will be fired if something goes down and you don’t know about it. If you are a CIO and some of your apps are running on Amazon and you don’t know about it then your costs are running out of control. If they are running a mission-critical app and it gets breached, you will get fired.” 
Much of this comes down to service management and security. “Those horizontals must traverse this fragmented world of choice. Service management in the New World Order changes dramatically,” Karolczak says. “In the Cloud you had better know which applications are in the Cloud and which cloud they sit on. One of the delusions of the Cloud is that you have unlimited bursts of separation. But you can’t, in most situations, separate the apps from the Cloud.”
Really, is this a little bit of Cloud washing? Could we not say "Don't get fired over stupidity" rather than over Cloud? Why single out Cloud? If your app goes down it does not matter whether it's running internally or externally; that's an implementation thing. If your SLA is that it can't go down and you don't implement it in a way that achieves that, then you are in trouble, Cloud or not. Costs running out of control are nothing new either; we have been living with virtual sprawl within the data centre for a few years and the hidden cost impacts it can create.

The statement is valid though that "much of this comes down to service management". That's really the point here. If you are not managing your environment, whatever it is, then that's where these issues can creep in and give you pain.

What the CIO needs to be focusing on, in order to not get fired, are two things as they adopt Cloud.

First is the integration into the service management infrastructure, as the article mentioned. In order to adopt Cloud services a review of your SM implementation will be required. For example you might need to work through your ITIL practices, ensuring that you have a method of delivering each function. Some functions might be undertaken by the provider and this should be documented. You will end up getting into details such as what additional CI records you will need in your CMDB to track services which reside with various Cloud providers.
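Purely as an illustration of the kind of detail this drives you into, a CI record for a cloud-hosted service might carry fields like the following; the field names are placeholders rather than any particular ITSM product's schema.

```python
# Hypothetical CI record for a service running at an external Cloud provider.
cloud_service_ci = {
    "ci_name": "crm-frontend",
    "ci_type": "cloud_service",
    "provider": "ExampleCloud",              # placeholder provider name
    "provider_region": "ap-southeast-2",
    "provider_account": "prod-account-01",
    "sla_reference": "SLA-2012-014",          # link to the provider's SLA
    "functions_with_provider": ["incident", "capacity"],  # ITIL functions the provider delivers
    "functions_retained": ["change", "configuration"],    # functions kept in-house
}
```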

Second is educating your technical teams on Cloud architectures and implementations. Cloud might be existing technologies integrated and sold in a different way, but there is a lot to learn about how to best utilise various providers' Cloud services. IT teams have had years of practice at building and operating technologies in a manner that will not get them fired. However I often see a lot of assumptions and misunderstanding of how various Cloud services work, which result in implementation gaps that do have the potential to get you fired. IT teams need to skill up through training or hiring.

Of course there is nothing new under the sun here; as IT professionals we have been dealing with these issues as information technology has evolved over the decades. Just think about our journey through virtualization over the last 5 years. Virtualization affected many of these areas as well, but these days you are more likely to get fired for not virtualising.

Rodos