
Wednesday, December 24, 2008

What do you know about virtualization?

What do you know about virtualization? There is a quick 12-question quiz at Computerworld. Go on, you know you want to give it a go. Here is my score.

Yes, 100%. Much better than the bare pass I got on my Prometric exam this morning, but that was not about virtualisation.

A hint: don't answer what you think, answer what you think they want you to think. Can't say I agree with the answers, but it's not too hard to pick the ones they want.

You know you want to try it!

Tuesday, December 23, 2008

The maturity model for cloud computing - 2009, the year of automation

James Urquhart over at cnet in the Wisdom of Clouds blog has an interesting post about "A maturity model for cloud computing".

The post is worth a quick read; it lists five high level steps. To quote:
At a very high level, each step of the model breaks down like this:
Consolidation is achieved as data centers discover ways to reduce redundancy and wasted space and equipment by measured planning of both architecture (including facilities allocation and design) and process.
Abstraction occurs when data centers decouple the workloads and payloads of their data center infrastructure from the physical infrastructure itself, and manage to the abstraction instead of the infrastructure.
Automation comes into play when data centers systematically remove manual labor requirements for run time operation of the data center.
Utility is the stage at which data centers introduce the concepts of self-service and metering.
Market is achieved when utilities can be brought together over the Internet to create an open competitive marketplace for IT capabilities (an "Inter-cloud", so to speak).

James then goes on to detail what organisations are doing in these areas and where the opportunities lie, which is all good.

So what does this have to do with all those VMware customers out there? Why is this more important than a new name like vSphere? Well what struck me about this is how it sits with what I have been thinking about as important for 2009.

2009 is going, in my humble opinion, to be the year of automation for VMware/vSphere environments. If there is one thing that's worth investing in, it's the automation areas of your virtual infrastructure. Automation is often not tackled up front in the VMware lifecycle and it requires a certain level of maturation of virtualisation within the organisation first.

Why automation? Many sites have started down the path of consolidation and abstraction; sure, there is more to do in these areas, but that will be organic growth on the existing base. The industry and vendors will be running around releasing new versions, pushing clouds and all sorts of great things. Yet if you don't have your automation right, these new paradigms may be difficult for you to adopt in late 2009 or early 2010. Automation takes time and investment; it will not occur overnight in your organisation. How can you package your application system into a vApp and describe its service characteristics if the only way to create that system is days of manual installation and configuration?

Automation will also give good returns. In tight economic times automation can reduce TCO. Many of today's security risks and downtimes are caused by human error, not failing hardware; the more that can be automated, the more uptime can be improved.

Take a look at what VMware have done this year. They have released Site Recovery Manager, Stage Manager and Life Cycle Manager. They have new initiatives within the systems and life cycle management market with companies such as BMC, CA and HP. In 2009 we should see the release of the technology from the B-hive acquisition. It certainly looks like VMware themselves have been getting ready and we may see maturity and greater adoption of these offerings in 2009.

Do you see automation as a key initiative for 2009? Leave your views in the comments section.

Rodos

Sunday, December 21, 2008

From K/L to VI4 to vSphere

Unless you have spent the last 48 hours trapped in a shopping center doing last-minute Christmas shopping, you will have seen that the new name for the next version of VMware Infrastructure has been officially leaked. vSphere it is.

I say officially leaked because no one was game to spill the beans until someone with enough authority gave the okay. The name was mentioned at a user group meeting last week and, wanting to report on it, Jason Boche got authority from VMware marketing to say it in a wider forum. You do wonder if this was a plan by VMware? It seems strange to have the big name change for your key product launched via a user group and the blogosphere. Are they just following the hype that Veeam are getting with the release of their new free product? I doubt it. Did they feel that it was going to get out anyway, so they might as well be part of it? Maybe. Let's see how long it takes for the official press release to appear.

At the end of the day it's just a name (sorry Marketing). This new version has been called many things, starting out with K/L. At VMworld you would hear lots of VMware employees use the phrase "K/L" and the Beta forum is labeled "K/L". Of course most people have been calling it VI4. It will be interesting to see how this name integrates into all the other recent name changes, VDC-OS et al.

Things could be worse; can you imagine what it was like for all of those die-hard Citrix fans? One day your company buys this thing called Xen and then after a few months they rename just about every product in your suite by putting Xen in front of it. Now that has to put the whole vSphere name into perspective. It could have been EMCompute or something. Although I think Sean Clark gets the prize for Atmos-vSphere, because EMC have Atmos, their cloud optimized storage. Atmos-vSphere, it has a certain ring to it, don't you think?

Rodos

Friday, December 19, 2008

Blade enclosures and HA

HA has always been an interest of mine; it's such a cool and effective feature of VMware ESX. Whilst it is simple and effective, the understanding of how it actually works is often a black art, essentially because VMware have left much of it undocumented. Don't get me started on slot calculation.

So this week another edge case for HA came to my attention. Over on the VMTN forum there has been a discussion about the redundancy level of blade chassis. You can read all of the gory details (debate) there but Virtek highlighted an interesting scenario. Here is what Virtek had to say.
I have also seen customers with 2 C7000 blade chassis, 6 blades in each. A firmware issue affected all switch modules simultaneously, instantly isolating all blades in the same chassis. Because they were the first 6 blades built it took down all 5 Primary HA agents. The VMs powered down and never powered back up. Because of this I recommend using two chassis and limiting cluster size to 8 nodes to ensure that the 5 primary nodes will never all reside on the same chassis.

My point is that blades are a good solution but require special planning and configuration to do right.
I had never thought of that before, but the resolution can be better than that; this is not a reason to limit your cluster size.

You see, with VMware HA you have up to five primary nodes and beyond that secondary nodes. Primary nodes do the (real) work and without a primary node everything goes foo bar. So what happened in Virtek's case? Well, as you add nodes to the HA cluster they are added as primary nodes first. Therefore, if you purchase two blade chassis, splitting the nodes between them, but add all of the blades in one chassis first, guess what, the first five all become primary. That lovely redundancy you paid all that money for has gone out the window, as all the primary nodes will reside within the first chassis. As Virtek found, if all those hosts go, HA is unable to manage the restart of the machines on the ESX hosts in the other chassis, because they are all secondary nodes.

Is this bad? Not really. The resolution is to reconfigure HA once you have added all of your blades into the HA cluster. This reconfigure will redistribute the primary and secondary nodes around the cluster, which should leave them spread across your chassis. Problem solved.

To determine which nodes are primary, if you really want to check, run the "listnodes" command from AAM, which will dump a report like this.
/opt/vmware/aam/bin/ftcli -domain vmware -connect YOURESXHOST -port 8042 -timeout 60 -cmd "listnodes"

Node Type State
----------------------- ------------ --------------
esx1 Primary Agent Running
esx2 Primary Agent Running
esx3 Secondary Agent Running
esx4 Primary Agent Running
esx5 Primary Agent Running
esx6 Secondary Agent Running
esx7 Primary Agent Running
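If you want to check the spread for yourself, here is a minimal sketch that parses saved listnodes output against your own host to chassis mapping. It is only an illustration; the host names, the chassis mapping and the output file name are all made up, so substitute your own.

# Minimal sketch: parse saved "listnodes" output and check whether all
# primary HA agents have ended up in the same blade chassis.
CHASSIS = {  # hypothetical mapping of ESX hosts to blade chassis
    "esx1": "chassis-A", "esx2": "chassis-A", "esx3": "chassis-A",
    "esx4": "chassis-B", "esx5": "chassis-B", "esx6": "chassis-B",
    "esx7": "chassis-B",
}

def primary_chassis(listnodes_file):
    """Return the set of chassis that contain a primary HA agent."""
    found = set()
    with open(listnodes_file) as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 2 and parts[1] == "Primary":
                found.add(CHASSIS.get(parts[0], "unknown"))
    return found

chassis = primary_chassis("listnodes.txt")  # redirect the ftcli output to this file
if len(chassis) < 2:
    print("WARNING: all primary nodes are in %s - reconfigure HA" % ", ".join(chassis))
else:
    print("Primary nodes are spread across: %s" % ", ".join(sorted(chassis)))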
If you want some more details on how HA works, Duncan Epping has a great summary over at Yellow Bricks.

There, easily fixed and much easier to analyse compared to HA admission control and slot calculations.

If you have any further insights or links, post into the comments.

Rodos

P.S. Thanks to Alan Renouf via Twitter for the command line on listnodes as I did not have access to a cluster to confirm the right syntax.

Thursday, December 18, 2008

VDI and WAAS - Part II

On Tuesday I posted about VDI and WAAS, as part of an ongoing discussion on WAN optimisation for VDI.

Today I finally got some deeper detail from Cisco about VDI and WAAS, shout out to Brad for putting me onto it.

The document is the "Cisco Application Networking Services for VMware Virtual Desktop Infrastructure Deployment Guide" and it is well worth your time to have a flick through.

If you are looking at putting in WAN acceleration for VDI/RDP then you should read through this document, no matter which vendor you are looking at, as it gives you some good detail.

For example it details traffic flows, configuration details and performance results.



One of the interesting results is the response time. Remember, as discussed, it's the response time that is critical in VDI/RDP over a WAN, rather than the bandwidth. In a test over a 1.5Mbps line with a 100ms RTT, the results were:

The response time measured at the remote branch office during a test of 15 simultaneous VMware VDI sessions shows a 4-times improvement. Cisco WAAS acceleration results in an average response time of 154 ms, and native VMware VDI achieves an average response time of 601 ms (Figure 18).



If our metric for user experience is under 200ms this shows that without WAAS we have a problem, with WAAS we have success.

Go have a read for yourself, it's worth the time.

Rodos

Updated VMware Technical Resource documents listing #17

New documents added to the "VMware Technical Resource documents listing" on VMTN.

At version 17 there are now 198 documents listed with abstracts for searching.

VMware View Reference Architecture Kit
by VMware on 12/09/2008 
http://www.vmware.com/files/pdf/resources/vmware-view-ns20-deployment-guide.pdf

This reference architecture kit is comprised of four distinct papers written by VMware and our supporting partners to serve as a guide to assist in the early phases of planning, design and deployment of VMware View based solutions. The building block approach uses common components to minimize support costs and deployment risks during the planning of VMware View based deployments.

Included in this kit are the following materials:
  • VMware View Reference Architecture
  • Guide to Profile Virtualization
  • Windows XP Deployment Guide
  • Storage Deployment Guide for VMware View
http://communities.vmware.com/docs/DOC-2590

Rodos

Wednesday, December 17, 2008

The year that was - 2008 review of VMware world

Have you wondered why it's the end of the year and you are just exhausted? Maybe it's because for the last 12 months you have been trying to keep up with the frantic pace that is the virtualisation world from a VMware perspective. Here is a brief recap of what you had to digest and manage this past year.

Jan
  • Acquisition of Thinstall
  • Stage Manager released
  • Virtual Desktop Manager 2.0 released
  • Microsoft acquires Calista
  • Quest acquires Vizioncore
Feb
  • VMworld Europe
  • VMsafe announced
March
  • Lifecycle Manager released
April
  • Update 1 released
May
  • Site Recovery Manager released
  • Stage Manager released
  • Management and Automation bundles (M&A) released
  • Acquisition of B-hive
  • First Communities Round Table Podcast
June
  • Virtual Desktop Manager 2.1 released
July
  • ThinApp 4.0 released
  • Diane Greene is out and Paul Maritz is in
  • VMware Infrastructure Toolkit released
  • Free version of ESXi released
  • Citrix XenDesktop released
  • Update 2 released
August
  • Lab Manager 3 released
  • Update 2 time bomb failure and re-release of Update 2
  • Microsoft announces support for 31 applications running under a Server Virtualization Validation Program (SVVP) listed hypervisor.
September
  • Mendel Rosenblum leaves
  • ESX qualifies as first hypervisor for Microsoft Server Virtualization Validation Program (SVVP)
  • VMworld
  • VMware announces VDC-OS, vCloud and View
  • Fusion 2.0 released
  • Teradici partnership
  • Cisco and VMware announce the Cisco Nexus 1000V & VN-Link
  • Workstation 6.5 released
  • Server 2.0 released
October
  • Microsoft Hyper-V Server released
November
  • VMware acquires Tungsten Graphics
  • Update 3 released
December
  • VMware renames most components of the product line
  • View Manager 3 released
  • Site Recovery Manager Update 1 released
I am sure there are items I have missed, so post into the comments and I will update the list.

Have a wonderful Christmas and a happy new year. Rest up, 2009 is going to be even bigger!

Rodos

FC storage maximums explained

Have you ever been confused by all of those maximums and descriptions in the Fibre Channel section of the Configuration Maximums reference document? Which ones are for a host and which are for a cluster? Well, read on to find out.

Here is the table that lists the limits for Fibre Channel.



Whilst it's simple it can be confusing: why do some entries have "per server" and others don't? When it says "Number of paths to a LUN" is that across the cluster or for the host? Are you sure?

Well here is some clarification.

LUNs per server 256

On your ESX host you can only zone 256 LUNs to that host. That’s a big number.

LUN size 2TB

Your LUN can’t be bigger than 2TB, but you can use extents to combine multiple LUNs to make a larger datastore. Most SANs will not create a LUN larger than 2TB either.

Number of paths to a LUN 32

This is the confusing one, because it does not have "per server" and so the seed of doubt is sown. To confirm, this is a server metric, not a cluster one. Also note that only active paths are counted. So if you have an active/passive SAN that's one path, even if you have redundant HBAs. If it's an active/active SAN you can have four paths (one for each combination of HBA and SP). In some high end storage arrays, like an HDS USP/VM, you can configure way more than 32 paths to a single LUN; now that's some scalability and redundancy.

Number of total paths on a server 1024

So if you have two HBAs on an active/active SAN (four active paths per LUN), that's a maximum of 256 LUNs on the host. Hey, go figure, that matches the LUNs per server limit!
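Here is that arithmetic as a quick sketch, just to show how the numbers hang together (the two HBAs and two SPs are assumptions for an active/active array, swap in your own):

# Back-of-the-envelope check of the FC maximums discussed above.
hbas = 2                            # HBAs in the host
sps = 2                             # storage processors on an active/active array
paths_per_lun = hbas * sps          # 4 active paths to each LUN
max_total_paths = 1024              # "Number of total paths on a server"
print(paths_per_lun)                     # 4
print(max_total_paths // paths_per_lun)  # 256, which matches "LUNs per server"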

LUNs concurrently opened by all virtual machines 256

Again, you can only be talking to 256 LUNs from the one server.

LUN ID 255

No, it's not a misprint; LUN numbering starts at 0. This is effectively cluster wide, as for multipathing to work properly, each LUN must present the same LUN ID number to all ESX Server hosts.

So what do you really need to be worried about for the maximums? 256 LUNs is your limit per cluster and host, and you can have up to four active paths for each (but ESX will only use one path at a time for a particular LUN). Of course, as Edward recently pointed out, the real limit may be your SAN, as some have limits on how many hosts can connect to a LUN.

What other things should you be looking out for then? Here is a quick dump of some of the considerations in regards to FC and pathing.

  • Do not use a Fixed path policy if you have an Active/Passive SAN, use Most Recently Used, this avoids potential issues with path thrashing.
  • Round Robin load balancing is experimental and not supported for production use.
  • HA does not monitor or detect storage path failover. If all your paths to your storage fail, bad things happen. To quote the SAN configuration guide (pg 42) “A virtual machine will fail in an unpredictable way if all paths to the storage device where you stored your virtual machine disks become unavailable.”
  • With certain active/active arrays you can do static load balancing to place traffic across multiple HBAs to multiple LUNs. To do this, assign preferred paths to your LUNs so that your HBAs are being used evenly. For example, if you have two LUNs (A and B) and two HBAs (X and Y), you can set HBA X to be the preferred path for LUN A, and HBA Y as the preferred path for LUN B. This maximizes use of your HBAs' bandwidth. The path policy must be set to Fixed for this case; there is a sketch of the assignment logic after this list. Duncan has details of a script written by Ernst that can automate this process for you. Duncan writes in English, which is helpful for us single language people.
  • When you use VMotion with an active/passive SAN storage device, make sure that all ESX Server systems have consistent paths to all storage processors. Not doing so can cause path thrashing when a VMotion migration occurs.
  • For best results, use the same model of HBA in one server. Ensure that the firmware level on each HBA is the same in one server. Having Emulex and QLogic HBAs in the same server to the same target is not supported.
  • Set the timeout value for detecting when a path fails in the HBA driver. VMware recommends that you set the timeout to 30 seconds to ensure optimal performance.
  • For boot from SAN, if you have an active/passive SAN the configured SP must be available; if it's not, ESX can't use the passive one and the host will not boot.
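To make the static load balancing idea concrete, here is an illustrative sketch of the assignment logic only. It does not call any VMware API or CLI, the LUN and HBA names are made up, and in practice the script Duncan links to does the real work against ESX.

# Illustrative only: round-robin preferred paths across HBAs so both are used.
# Remember the path policy must be Fixed for preferred paths to apply.
luns = ["LUN-A", "LUN-B", "LUN-C", "LUN-D"]   # hypothetical LUN names
hbas = ["vmhba1", "vmhba2"]                   # hypothetical HBA names

for i, lun in enumerate(luns):
    print("%s -> preferred path via %s (policy: Fixed)" % (lun, hbas[i % len(hbas)]))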
Further details can be found in the Fibre Channel SAN Configuration Guide. Thanks to Simon, my local VMware SE, for letting me push to the pedantic edge the question of whether "Number of paths to a LUN" really is a host number; I wanted it in writing.

Rodos

Details of VMware View Reference Architecture - Part I

News came out today about the View Manager Reference Architecture papers. We have been waiting for some further technical details about scaling and best practices, so what was released and what does it mean?

First things first, where do you get them? Well, you have to register for them. VMware have enough issues with where to find technical documents and they have gone and introduced yet another method. Thankfully the pack is listed on the Technical Resources page where people would naturally go looking for reference architectures; however this simply presents a link back to the registration page!

Okay, let's be clear, you are being evil, VMware. Do marketing really think that someone is going to want to read these documents and not already have some form of contact with VMware that they can track and follow to pursue sales leads? Either you have already purchased it, downloaded the evaluation (which collects more information) or you obtained it from a VMware sales person. Come on, you can't be so desperate for sales leads that you think this is going to add any real value to your pipeline. Okay, I feel better having that off my chest.

Here are the documents and their descriptions.
  • VMware View Reference Architecture - A guide to large-scale Enterprise VMware View Deployments
  • Guide to Profile Virtualization - Examine traditional approaches to user profiles and discover a new way to approach user profiles in a heterogeneous environment while delivering the best overall user experience.
  • Windows XP Deployment Guide - This guide suggests best practices for creating Windows XP-based templates for VMware View based solutions and for preparing the templates for use with VMware View Manager 3.
  • Storage Deployment Guide for VMware View - Review a detailed summary and characterization of designing and configuring the EMC Celerra NS20FC for use with VMware View.
In this “Part I” we are going to dive into the main document, the Reference Architecture.

The document consists of 36 pages detailing the infrastructure required to create a building block which can support 1,000 users. Also included are the components to integrate 5 building blocks, to support 5,000 users. It's vendor agnostic (except for the reference to EMC), so you can put your vendor of choice into each component, and they might bring some special additional feature to the table (but these are not considered). The first 14 pages rehash lots of features and functions of VMware View without actually giving much detail about the reference architecture, but it's worth having there.

The architecture is based on large-scale, LAN-based deployment, so it does not delve into the WAN acceleration space. Recommended bandwidth is 100-150Kbps without multimedia and latency below 150ms, so nothing new or changed here.

When describing virtual disk management the following is stated.
“Because VMware View Manager supports a variety of back ends, such as VMware View virtual desktops, Microsoft Terminal Services, Blade PCs, and ordinary PCs, it is recommended that a robust profile management solution be implemented. Profile management solutions such as RTO Virtual Profiles or Appsense can work in place of, or in conjunction with, a VMware View Composer user data disk. Profile management helps to ensure that personal settings will always be available to users who are entitled to multiple back-end resources, regardless of the system they are accessing. Profile management solutions also help to reduce logon and logoff times and to ensure profile integrity.”
This is an important point and one I will come back to in Part II. I met today with Appsense to nut out some particularly nasty VDI use cases and they have some great technology for VDI (which I started looking at back in Oct 2008 when VDM 2.0 was in Beta).

Here is what the building block looks like.



Each block consisted of a blade chassis with 16 hosts split into two clusters, A & B. Cluster A was for testing a dynamic workload of office type workers, which consisted of 25 Persistent Full Clones, 225 Persistent Linked Clones and 250 Non-Persistent Linked Clones. Cluster B was for a more static load of task-oriented workers and consisted solely of 500 Non-Persistent Linked Clones.

There was one vCenter for each block of two clusters. The blades were dual quad core 2.66 GHz with 32GB of RAM. It's a little hard to understand the networking configuration of the blades, because it's laid out weirdly in the parts tables. 10G interconnects were not available when they did the test and it utilises iSCSI for the storage fabric. There is no clear detailing of the networking apart from a logical diagram and some scattered info on how it hangs together; this could be improved through further diagrams or some more details.

For the 1,000 desktop tests two VDM Connection Servers were used; for the 5,000 there were five. Desktop connections were direct, hence no SSL.

All of the supporting servers (VDM, AD, SQL, vCenter etc) were virtual but resided outside of the block.

The storage was an NS20FC with a CLARiiON CX3-10F backend with 30 x 300GB 15K FC disks. It was very hard to tell, but it looks like each blade enclosure was directly connected via Ethernet to the SAN. Seven Datastores were created for each cluster, which gives around 64 VMs per Datastore.

The good thing here is that these are all sensible and common components which can be achieved in the real world, outside of the lab where money for things like RAM sticks is irrelevant.

Provisioning results are detailed. The 500 linked clones in Cluster B took 161 minutes, or just under 20 seconds each; that's nice and fast. One interesting point that was highlighted is that each datastore gets one replica for its clones to use and that different pools will share the same replica if they are based off the same Parent version (snapshot), sweet. I knew they were copied per datastore but not that they were shared across pools.

Load testing was done with normal MS Office applications plus Acrobat, IE and real-time anti-virus, performing actions for 14 hours. Each desktop had an 8GB disk and 512MB of RAM.

What was the result? 
“1,000 users can easily be maintained by this architecture [with fast application response time] using the provided server, network, storage resources, and configuration”. 
Application response times are detailed and they look fine, but there is no benchmark to compare them against.

In the performance testing neither cluster went over 50% utilisation. The VM to core ratio is just under 4:1, which is on the money. Given the CPU utilisation it looks like you could go to 5:1 and run around 640 machines; however, more RAM may be required. The vCenter and SQL server managed fine with the load. Likewise the storage, in terms of CPU and IOPS, was fine with spare capacity.

So what's missing?

Well, a glaring omission is any metric on memory utilisation. As most installations are memory rather than CPU bound, it would have been interesting to see how close to the wire the memory footprint was. What about overcommit, how much transparent page sharing was there? Sure, it's a reference platform, but they would be very helpful figures to see, certainly no less helpful than the CPU ones. Are VMware trying to hide something here? I doubt it, but it does beg the question: why not include it?

Lastly it would have been good to see a lot more configuration data: how the storage was laid out, some more of the network connectivity. Yes, it's trying to be vendor agnostic, but if you can include VLAN IDs you can include a storage design. There is a lot of this information in the other reference paper on the NS20, but people are not going to look there for high level stuff if they are not considering that SAN.

There you have it. In Part II we will look at some of the other documents.

Rodos

Tuesday, December 16, 2008

VDI and WAAS

Further to my previous post, Is network acceleration useful for VDI?, Cisco have a post up on their Data Center Networks blog about VDI. In the white paper linked off this post it states:
The joint Cisco and VMware solution optimizes VMware VDI delivery and allows customers to achieve the benefits of VMware VDI by providing the following features:
  • Near-LAN performance for virtual desktops over the WAN, improving performance by 70 percent
  • Increased scalability of the number of VMware VDI clients, increasing the number of clients supported by 2 to 4 times, and massive scalability of VMware VDI and VMware VDM data center infrastructure
  • 60 to 70 percent reduction in WAN bandwidth requirements
  • Optimization of printing over the WAN by 70 percent, with the option of a local print server hosted on the Cisco WAAS appliance
  • Improved business continuity by accelerating virtual image backup by up to 50 times and reducing bandwidth by more than 90 percent
Just like other vendors, some bold claims. Some of these are simply distractions; improving transport of virtual image backup across a WAN link for DR purposes is not really a VDI issue, that's stretching the friendship and moving into marketing spin. However I have seen some of the internal Cisco analysis on the acceleration of RDP/VDI and it does look to stack up. Hopefully in the new year I will have completed some testing.

As mentioned in my post on Cloud, the need for and adoption of WAN acceleration is going to heat up next year. We all need to pay attention to this space, with the usual "let's see the reality" eye. If the vendors are going to keep spruiking it, it's fair game that we hold them to account for deeper details.

Rodos


Monday, December 15, 2008

Cloud computing conference report

What would an enterprise, in particular one using virtualisation, take away from a Cloud computing event? What if the speakers were from Cisco, Yahoo, Google, Microsoft, Baker & McKenzie and Deloitte Digital? Well, two weeks ago I went to such an event, took lots of notes and engaged in some interesting discussions. Here is some of what occurred and my updated thoughts on the Cloud space.

The event was held by Key Forums in Sydney, Australia on the 3rd of December. Billed as a one day, comprehensive conference, the expectation was:
This conference will kick start your Cloud strategy and will get you up to speed on what the main players, critics and users think about the potential of Cloud Computing. It will give you the opportunity to discuss your issues surrounding Cloud Computing with your peers and our expert panel of speakers.
The speakers were local heavy hitters and an impressive list. Some of their presentations are available online and worth a look. The Google one was originally public but has been pulled at Google’s request. Of course the presentation from the legal firm was never public, go figure.

Missing were the industry analyst types, such as IDC and Gartner, and as a result there was not a lot of "this is what the industry and market are doing". Instead what was presented was what the speakers from the organizations are thinking about in terms of direction in the cloud space. If you were to pick the vendors in the Cloud segment, the only two missing were Amazon (not much of a presence in Australia) and VMware. Amazon is fair enough, they don't have much of an interest in Australia; however I think it would have been good to see VMware there, as I believe they have a lot to contribute to the space.

What were the highlights then, as there is way too much to report on in detail in this forum and format?
  • All of the vendors see a strong growth in the various forms of Cloud computing. To paraphrase Anna Liu from Microsoft, “You either embrace and anticipate these changes or get left behind”. For example Cisco are one of the biggest users of Salesforce.com, the NY times used Amazon and Hadoop to convert 4TB of images to text, in just 24 hours, Gmail has over 10 million paid accounts and Cisco now own Webex (a SaaS based app) and its in their list of top return on investment acquisitions.
  • There was a reasonably common view of what cloud computing is and is not, yet each company had a particular slant to their area of the cloud (which you would expect). The breakdown was typically your SaaS, PaaS and IaaS. From the IaaS site Cisco were certainly the most strong as you would expect, although it was termed “hosting” which was a little strange. In terms of alignment with VMware Cisco was very close, even mentioning the recent visit my Paul Maritz and discussing hybrid cloud of on and off premise.
  • Both Cisco and Yahoo spoke about virtualisation and the changing space of the server world. Keith Marlow from Yahoo! made the obvious but insightful comment that the overhead of virtualisation was a constant and due to the increasing power of processors this constant was becoming irrelevant. Keith showed a picture of the Yahoo! datacenter with 20,000 nodes. With all of that processing power sitting there, it makes economic sense to sell some of the capacity. Kevin Bloch from Cisco spent some time talking about virtualisation and increasing core counts and how this was changing the game in data centers.
  • Cisco made a good observation that WAN acceleration technologies are going to be important as clouds are built and federated with more and more data flowing around. Of course WAAS was mentioned.
  • Security was mentioned multiple times as an issue being raised as a concern by the market place. Whilst everyone acknowledged and respected this, the responses were generally aligned. The view was that currently people trust their banking and credit cards to the Internet and forms of cloud, so we are already starting to see acceptance. Comparison was made to the level of skills, quantity of people and attention to the problem that the providers give compared to the usual very small set of security staff within an organization. (However, as one colleague reminded me today, they have a much larger attack surface to cover too.)
  • The concept of access from any where at any time from any device came up a number of times. This links in very well with the VMware vision for View and access to your desktops, applications and data from anywhere at any time.
  • Openness was certainly a theme from Yahoo! and Google. Open APIs, or Open Standards, being able to embrace and extend, were seen as important.
  • Certainly the presentation from Microsoft was the best, from my view anyway. Anna was not only a good speaker but had some good insights into the space. I recommend flicking through the presentation and I would love to hear her speak again on the topic. Some great comments such as just because you have control of an SLA, that is, its in house, does not mean that the SLA is any better than it might be if you don’t control it, such as in the cloud. A good contrast was made between control and economies of scale. A car gives you lots of flexibility for transporting things anywhere you want but does not have as good economy of scale. A freight train has great economy of scale but comes with a constraint of controlling flexibility. This balance of control vs scale was compared to build vs buy and on premise vs in the cloud. The Microsoft way was to move to services from the cloud (SaaS) or rewrite applications (PaaS). Around PaaS the view is strongly aligned with .net and hence Azure. Anna indicated that moving is “non trivial” and that there is going to be on premise and cloud models for quite a while. 
Here is a picture of the Yahoo! room with 20,000 nodes.



So if this conference was to help one understand and think more about the Cloud computing space, what influence did it have? Did it change or enhance my thinking on Cloud? In some ways yes, and I recognize that I too come at cloud from a very specific angle.

Here is my current positioning.
SaaS is going to be a massive market. However, in the main this adoption is going to be more in growing new services that are tactical rather than the main game of organizations' core business processes. Yes, Cisco may move to Salesforce.com, but for most enterprises core elements are going to stay in house. It's the new systems and non-core that are going to see the most growth in SaaS.

When that project team needs a new intranet site, rather than waiting two weeks for IT to not deliver, they will put their credit card number into a website and be up and running in 10 minutes.

The challenge for VMware here as well as the opportunity is to capture some of this space with Virtual Appliances. If a mature set of Virtual Appliances can be available through the market place; if these can be downloaded from within the management environment painlessly; and if they can not only describe their service levels, backup and disaster recovery requirements (vApps) but also implement them automatically (AppSpeed, Data Recovery Appliance, SRM) VMware could be onto a good slice of this pie. Put your credit card in, download the service and run it locally in your own security model, owning the data. You can even still have the provider maintain the application as part of a maintenance agreement. Most of the benefits of SaaS without many of the current concerns and limits. VMware need to lead by example here.

PaaS is the challenging space. Microsoft is going to push .Net real hard, and that means Azure. The challenge for all of the players is to support open frameworks and languages such as Hadoop and Ruby on Rails. This strategy is good for customers in the enterprise. If they develop their own applications in these open technologies they can execute them internally on their own clouds. After all, a key element of VDC-OS is running the workloads of today and tomorrow. With VDC-OS you can run mixes of workloads and change them on the fly; one day 10% of your cluster might be running Hadoop nodes and tomorrow, to scale up for a specific project workload, it may be 30%. Even better, because you have written to an open standard you can go to a provider in the open market to buy capacity for the short or medium term, if you really need to scale up. Even if you run that application out in the external cloud, if there is a problem, or for testing, or for DR, you can always run it in house on your own cloud if needed. VMware need to work hard here to not let the ISV market get away from them. Maybe that's why Paul Maritz can't stop saying Ruby on Rails.

IaaS is VMware's sweet spot. The enterprises know they need to move to the benefits that cloud and utility based computing can bring. They want to run like Google and Yahoo! The challenge is how to do this in today's environment, and that's where VDC-OS, vCloud and vApp come into play. Running internal clouds, federating them, running the workloads of today and tomorrow; it's not just a dream, it's like Christmas, you know it's coming and it's not far away. The problem for VMware is to not be seen like Azure and be another closed shop. That's why it's good to see vApp being based on OVF. We know VMware is the best system for running workloads; if we can keep the portability of workloads it means the best technology wins. If it all goes closed, the best marketing company wins, and that is not VMware. Also VMware need to keep tight (like they already are) with complementary technologies. Networking is going to play a huge role in enablement of the cloud, from things like split VLANs to WAN acceleration.
If you are in Australia and interested in cloud computing like I am, I would recommend you put March 25th, 2009 in your calendar. IDC are holding a Cloud Computing Conference on this day in Sydney. See, aren't you glad that you read all the way to the end of this post! You can register for free as an early bird attendee.

Also, Richard Garsthagen, in revealing details of VMworld Europe 2009, has stated that it will "probably" have a special pavilion for vCloud providers (2:20).

If you have some thoughts on the Cloud space, leave a comment.

Rodos

Thursday, December 11, 2008

Linked Clones Not A Panacea?

Over at vinternals Stu asks if linked clones are the panacea that a lot of people are claiming for the VDI storage problem. I say yes; however, we are moving from designing for capacity to designing for performance, and VMware have given us some good tools to manage it. Let me explain a bit further.

Stu essentially raises two issues.

First, delta disks grow more than you think. Stu considers that growth is going to be a lot more than people expect, citing that NTFS typically writes to zero'd blocks before deleted ones and there is lots of activity on the system disk, even if you have done a reasonable job at locking it down.

Second, SCSI reservations. People are paranoid about SCSI reservations and avoid snapshot longevity as much as possible. With a datastore just full of delta disks that continually grow, are we setting ourselves up for an "epic fail"?

These are good questions. I think what this highlights is that with Composer the focus for VDI storage has shifted from an issue of capacity management to performance management. Where before we were concerned with how to deliver a couple of TB of data, now we are concerned with how to deliver a few hundred GB of data at a suitable rate.

In regards to the delta disk growth issue: yes, these disks are going to grow, however this is why we have the automated desktop refresh to take the machine back to the clean delta disk. The refresh can be performed on demand, as a timed event or when the delta disk reaches a certain size. What this means is that the problem can be easily managed and designed for. We can plan for storage overcommit and set the pools up to manage themselves.

To me the big storage problem we had was preparing for the worst case scenario. Every desktop would consume either 10G or 20G even though most only consumed much less than 10GB. Why? Just in case! Just in case one or two machines do lots of activity, and because we had NO easy means of resizing them we also had to be conservative about the starting point. With Composer we can start with a 10GB image but only allocate used space. If we install new applications and decide we really do need the capacity to grow to 12GB, we can create a new master and perform a recomposition of the machines. Now we are no longer building for the worst case but managing for used space only. This is a significant shift.

As it happens, today there was a blog posting about Project Minty Fresh. This installation has a problem with maintaining the integrity of their desktops. As a result they are putting a policy in place to refresh the OS every 5 days. This will not only maintain their SOE integrity but also keep their storage overcommit in check.
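To put some rough numbers around planning for overcommit, here is a sketch of the sizing maths. Every figure in it is an assumption for illustration only; measure the delta disk growth in your own pilot and plug the real numbers in.

# Rough sketch of sizing a datastore for linked clone overcommit.
clones_per_datastore = 64
replica_gb = 2.1               # thin provisioned replica shared by the clones
delta_growth_gb_per_day = 0.3  # assumed daily growth of each delta disk
refresh_interval_days = 5      # a Project Minty Fresh style refresh policy

worst_case_delta_gb = delta_growth_gb_per_day * refresh_interval_days
datastore_gb = replica_gb + clones_per_datastore * worst_case_delta_gb
print("Plan for roughly %.0f GB per datastore" % datastore_gb)  # ~98 GB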

In regards to SCSI reservations: I do believe that the delta disks still grow in 16MB increments and not some larger size. So when the delta disks are growing there will be reservations, and you will have many on the one datastore. Is this a problem? I think not.

In the VMware world we have always been concerned about SCSI reservations because of server workloads. For server workloads we want to ensure fast and, more importantly, predictable performance. If we have lots of snapshots, that SQL database system which usually runs fine now starts to behave a little differently. Predictability or consistency in performance is sometimes more important than the actual speed. My estimation is that desktop workloads are going to be quite different. In our favor we have concurrency and users. All those users are going to have a lower concurrency of activity; given the right balance we should have a manageable amount of SCSI reservations, and if not we rebalance our datastores: same space, just more LUNs. Also, unlike servers, will users even be able to perceive any SCSI reservation hits as they go about their activity? Given the nature of users' work profiles, and that any large IOs should be redirected not into the OS disk but into their network shares or user drives, the problem may not be as relevant as we may expect.

What Stu did not mention, and what we do need to be careful of because it can be the elephant in the room, is IO storms. This is where we really do have some potential risk. If a particular activity causes a high concurrency of IO activity things could get very interesting.

Lastly, as Stu points out, statelessness is the goal for VDI deployments. Using application virtualisation, locking down the OS to a suitable level and redirecting file activity to appropriate user or networked storage is going to make a big impact on the IO profile. These are activities we want to undertake in any event, so the effort has multiple benefits.

I too believe you need to try this out in your environment, not just for the storage requirements, but also for the CPU, user experience, device capabilities and operational changes. VDI has come a long way with this release and I do strongly believe it will enable impactful storage savings.

What I really want is for the offline feature to become supported rather than just being experimental. Plus I want it to support the Composer based pools. There is no reason why it can't, and until then there is still some way to go before we can address the breadth of use cases. However there are plenty of use cases now, which form the bulk, to sink our teeth into.

Rodos

Updated VMware Technical Resource documents listing

Two new documents added to the "VMware Technical Resource documents listing" on VMTN.

At version 16 there are now 197 documents listed with abstracts for searching.

Added Storage Design Options for VMware Virtual Desktop Infrastructure
Added Using IP Multicast with VMware ESX 3.5

http://communities.vmware.com/docs/DOC-2590

Rodos

VMware release searchable HCL system

VMware have released a new searchable HCL system that makes it much easier to check for compatibility. What does it look like and how do you use it? Read on.

Here is the URL
http://www.vmware.com/resources/compatibility/search.php
and the following image is an example search.



What do we have here? Let's say you have a BOM for a system and it includes a new card you are not familiar with, an NC360T. Enter that into the keyword search on the IOs tab. Great news, it's supported in just about all versions, and they are all listed in front of you; could not be easier!

The page comes back with a number of components in the results. The first is a categorization based on partners. Let's say you type in something generic; this lets you quickly filter down to a particular vendor or subset. Excellent feature.

For the results many of the details are hyperlinks. I have shown an exploded view of what it looks like when you click on an element. The opening page shows the specific details of that item.

This is a great new feature and is going to make our job so much easier. Do yourself a favor and go and have a play with it, then create a bookmark!

Rodos

Wednesday, December 10, 2008

Is network acceleration useful for VDI?

Many of the WAN acceleration vendors (Cisco, Riverbed, Expand, Packeteer, Citrix WANScaler, Exinda) are spruiking some amazing statistics for improvement of VDI performance over a WAN. However, are these claims realistic? How does one navigate the VDI and WAN acceleration space?

Here is a claim from one vendor:
Expand's VDI solution can provide acceleration by an average of 300% with peaks of up to 1000% for virtual desktop user traffic.
Is this realistic? Sounds like a lot of sales and marketing to me. I remember reading a white paper from one vendor spruiking their lead in the acceleration of VDI, which consisted of the argument that by reducing the bandwidth of the other protocols over the link, VDI would magically get faster. Whilst true, if that's all there is it's not really an improvement to RDP, is it? I am sure this was Riverbed but I can't find the paper and it's not on their website anymore. Instead, on the page about virtualisation they talk about ACE in the desktop space; good grief, get with the program, no mention of VDI at all.

So how do we make some sense of this space?

Firstly there are two areas that need addressing, RDP and printing.

Let's start with RDP, a protocol that could do with some assistance over the WAN.

RDP as a protocol is chatty, by which I mean there are lots of smaller packets which go back and forth, rather than a large data stream that goes generally in one direction like a file transfer. This makes RDP more affected by latency and packet loss. It needs some bandwidth, but not huge amounts for normal workloads. As a guide I always use 30kbps to 100kbps per user with 150ms or lower latency. The bandwidth is going to depend on screen size, colour depth and the type of work you are doing. Also remember that as you add users it's not a simple multiplier of bandwidth required, as you are more concerned about concurrency than number of users; not everyone is banging on their keyboards and refreshing their screens constantly or at the same time.
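As a rough sizing sketch of that rule of thumb (the user count, concurrency and per-user figure below are examples only; you need to estimate them for your own user base):

# Rough WAN sizing sketch for RDP based VDI.
users = 50
kbps_per_active_user = 60   # pick something in the 30-100 kbps band above
concurrency = 0.6           # fraction of users actively driving their screens
required_kbps = users * concurrency * kbps_per_active_user
print("Plan for roughly %d kbps of WAN capacity" % required_kbps)  # 1800 kbps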

The WAN acceleration providers have three basic techniques for improving RDP itself.
  1. Improving the delivery and flow of TCP/IP. Riverbed call this Transport Streamlining and Cisco call it TCP Flow Optimization (TFO). Through lots of clever tricks at the TCP/IP layer to minimize effects of latency, fill packets, adjust window sizes and improve startup times these techniques can have a good impact on perceived and measured performance. The good thing here is that this occurs irrespective of the application protocol being transported, but different protocols will give different results.
  2. Compression. All the vendors use compression to reduce the amount of traffic. However the important point here is that if the traffic is already compressed or if it’s encrypted the effectiveness of this compression may be negligible. 
  3. Prioritization. If there are different protocols or even different sources or destinations, certain traffic can be prioritized over others. By jumping the queue time sensitive protocols such as RDP can show improved performance from the user perspective. However it’s important to remember that if the majority of traffic is all the same priority, improvements are not going to be achieved. If all the traffic over the WAN link is now just RDP traffic, it’s hard to make any improvements via prioritization.
There is another technique that the vendors spruik, which is application protocol specific acceleration, such as for CIFS or HTTP. By understanding the protocol and putting in optimizations to handle it specifically, dramatic gains can sometimes be achieved. However I have yet to see a vendor who, when you dig deep, actually has techniques for improving RDP specific to its protocol. For RDP, it always falls back to the above categories of generic improvements around flow, compression and prioritization.

So how can this be put to advantage in VDI deployments? Here are my tips to give your WAN accelerator every chance at success.

  • If you do have a mix of other traffic over the same link you may get some good results. With other traffic bandwidth reduced and prioritization applied you should see some improvement in your RDP traffic.
  • Remove encryption from the RDP sessions. For your XP desktops you want to be installing the Microsoft patch http://support.microsoft.com/KB/956072 to allow the registry changes for turning the encryption level of RDP down or off. Of course you will need to take into account the security considerations and you may use some techniques in your WAN accelerator to add some encryption back in (after the fact). However removing the encryption allows for much better compression and caching of the data.
  • Turn off compression of the RDP sessions. This is described on page 152 of the View Manager 3 manual, the setting is “Enable compression” and it’s enabled by default.
Now let’s turn our minds to printing.

Moving to server based computing, and therefore moving print jobs from the LAN to the WAN, often causes headaches. Print jobs have a tendency to become large, which is not usually a problem when there is a 10Mb or 100Mb link between the user's machine and the printer. However, put a number of users behind a much smaller link and print jobs start queuing up. The printer can sit there slowly receiving a large job over the small link, and nothing else prints until it's all finished. That one small job of a single page right at the end of the queue has to wait and wait, as does the user who printed it. This is nothing new, it has been an issue in the Citrix world for a long time, and vendors like ThinPrint have come out with some fantastic technologies to compress and prioritize print jobs, with some amazing results.

Where does WAN acceleration come into the printing issue? Today many WAN acceleration vendors include specific printing acceleration services within their solutions. Yes, the VMware View Client for Windows comes with a version of ThinPrint, but that is only for printing to a local printer through the client. This can provide part of the solution, but most enterprise customers in my experience will have networked printers in the remote offices run off central or remote servers, and with VDI the print jobs will not go via the client. Therefore you may find that your WAN acceleration solution also solves much of your new printing problem too. It may not provide a Universal Print Driver or have all of the nice features like prioritization (or it may), but it may be good enough.

So don't be fooled by all of the marketing from the WAN acceleration vendors; they are all attempting to jump on the VDI bandwagon and it is causing some confusion. Try to understand where the improvements are actually happening: are they a primary improvement to the actual protocols or are they simply secondary effects of fixing other data going over the link? How will you be able to integrate printing? Armed with some knowledge you should be able to evaluate vendors' products based on their presented information and also be able to conduct some good testing.

I plan to get some tests done with Cisco WAAS and VDI over the Christmas period if I can, I will let you know how I get on.

On a side note, the VMTN Community Round Table this week has Jerry Chen, Senior Director of Enterprise Desktop Virtualization, as a guest. If Jerry runs out of things to talk about, let's see if we can get him to give a view on acceleration of RDP and what he knows or has seen in the market. Maybe even some futures stuff or who VMware is working with.

Rodos

Wednesday, December 03, 2008

Does VMware have too many locations for technical materials

Do you think that VMware have too many locations for important or relevant technical materials? It's starting to feel like there are a lot of places where some good content has the potential to be isolated or fragmented.

Here are some of the places I know of, just off the top of my head, which contain content.
  1. VMTN. Here you will find community documents but some VMware teams put their content here too, such as the performance group.
  2. VMware.com Resources - Technical Resources page. This page lists all of the technical papers, probably the most valuable resource.
  3. VMworld Community. This is a new space and vendor and VMware information is starting to appear scattered throughout.
  4. VMware Whitepapers page. More general white papers, and this does link to the Technical Resources list.
  5. VI:OPS, the Virtual Infrastructure Operations site.
  6. Documentation sets.
  7. Knowledge base. Certainly a specific type of content that's going to be separate.
There are most likely others and I know there are some more on the way. I am sure there is an argument that each of these is targeted at a particular audience, and it's great to see VMware using the community to expand and enhance IP in multiple ways. However it does feel like things are starting to get fragmented and there is certainly a lot to keep your eye on if you want to be on top of the space.

I wonder if VMware have a strategy here or are we simply seeing Web 2.0 at work?

Rodos

Tuesday, December 02, 2008

Storage Analysis of VMware View Composer

Can I turn 16TB of storage for 1000 VDI users into 619GB? Let me show you how it's actually done. The release today of VMware View Manager 3 brings to market the long anticipated thin provisioning of storage for virtual desktops. Previewed in 2007 as SVI (Scalable Virtual Images), what does the now released View Composer linked clone technology look like under the hood? How much storage will it actually use?

Here is the diagram presented on page 94 of the View Manager Administration Guide (http://www.vmware.com/pdf/viewmanager3_admin_guide.pdf).



This diagram as presented is a conceptual view of the storage. The important logical elements to note here are

  • Parent VM. This is the standard virtual machine you use to create and maintain your various versions of your image. It can have various versions as different snapshots.

  • Replica. The replica is a copy of a specific version of the Parent VM; that is, it's one of your snapshot states of the Parent. The key thing here is that the disk in the replica is a thin provisioned copy of the parent disk and snapshot.

  • Clones. The diagram shows two clones. Each clone is an instance of a replica for a particular VM. For its disk the clone uses the thin provisioned disk in the replica plus its own snapshot. Changes the clone makes to its disk are isolated in the clone's snapshot, and the replica disk remains untouched and shared by all the clones.


The diagram in the manual is not a great representation; here is my own, which adds some needed detail. Of course there is more complexity, but we can handle that.



What are we looking at here?


  • Each box is a directory in your Datastore. You have one directory for your Parent VM (what I have labeled base image), one for each replica, a special one called a source and one for each clone (which I have labeled user).

  • The Parent VM (base image) in blue is your standard VM. Notice that the C: disk is thick provisioned as is normal in ESX. A 15G disk will consume 15G in your base image. Then you have your snapshots. Notice that the C: and snapshot 0002 have been combined logically, as this is our view of the disk from the VM.

  • Using the Add Desktop wizard in the View Administrator you can create a pool of desktops based on a snapshot from a Parent VM. As part of the process you have to choose a VM and one of its snapshots. When this is done a unique replica is created. This process is marked as (1) on the diagram. Here a copy of the machine is made into a new directory; however, the disk is thin provisioned. If our original disk was 15G yet only 2G was consumed, the disk in the replica will only be 2G. This process can take a short period of time as the data copies, but it is a one-off process. This thin provisioned disk is the master disk that all of the clone VMs will use as their base. You can make changes to the parent VM, and the replica cannot be harmed.

  • What is not shown in the documentation is that a source directory is also created. This source directory is unique to the replica and contains all of the files required to make a clone. These files are essentially your standard VM files with an empty snapshot of the disk in the replica. It is my thinking that clones are created by making copies of the files in this directory. This is why the cloning process is very, very fast; most of the work in the background is already done. My testing shows under 60 seconds to deploy a new clone. Again, this creation of the source directory is a one-off process.

  • The clone (labeled user) directories are created once for each VM in the pool as required/directed by your pool configuration. The directory name is based on the naming convention given at pool creation. Here we have the files required to run the VM instance. The two important files are, firstly, the snapshot file, which is based off the thin provisioned disk in the replica directory. This is where all of the writes for the VM will be stored, so this file will grow over time (until you Recompose, Refresh or Rebalance the VM). The diagram tries to show how the C: drive is made up of the combination of the thin disk from the replica directory and the local snapshot file in the clone machine. A separate thin provisioned disk is also created for the user D: drive. This is where the QuickPrep and user data is stored. This user D drive will grow over time as data is put there, and it can't be shrunk.


There you have it, the storage layout of View Composer. What does it look like in reality? Here are some screen snippets.


Datastore Directories.

Here are the directories in the Datastore. You can see there is one Parent VM (XP_VMMAST), one replica directory with its matching source directory, and 3 clone directories, XPROD 1 through 3.


Parent VM directory files

These are the files in the parent VM, all the usual suspects and the disk is 15G thick provisioned.


Replica directory files

These are the files in the replica directory. Notice the disk has shrunk to only 2G, as it's thin provisioned, and there is our snapshot, which does not really get used.


Source directory files

These are the files in the source directory. They are pristine and clean, ready for use as a clone. Notice the vmdk file; it's based on the replica name, a special kind of snapshot.



Clone VM directory files

These are the files in the actual VM clone directory, one directory for each provisioned desktop. Notice that the vmdk file has grown. This is the growth after booting Windows the first time and letting it settle, 50M. Notice there are two more files here; one is the user D disk, which is persistent but thin provisioned, and it's grown to 23M in size. There is also a vswp file as the machine is booted; otherwise, if suspended, it would be the vmss.

There you have it. For this test of a 15G machine with just over 2.1G used, what would the storage look like for 1000 users? We will leave the user space aside; we need to cater for that either way. We just want to compare the old method with the new View Composer.

Parent VM is 16G.

Replica and source is 2.1G

1000 machines including swap is 600G

Grand storage space for 1000 users is 619GB.

Compare that to one week ago when it was 16TB; that's some saving. Of course these figures are a little extreme, we now have 1000 users running off a single 2.1G thin provisioned disk, and it's going to need a lot of spindles to deliver the IOPS required.
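If you want to play with the numbers yourself, here is the arithmetic as a sketch. The per-clone figure is simply the 600G for 1000 machines observed above spread evenly; it will vary in your environment.

# Sketch of the View Composer storage arithmetic from the test above.
users = 1000

# Old model: every desktop gets its own full 16G disk.
old_model_gb = users * 16                        # 16,000 GB, roughly 16TB

# Composer model: one thick parent, one thin replica plus source,
# then a small delta disk, user disk and swap per clone.
parent_gb = 16
replica_and_source_gb = 2.1
per_clone_gb = 0.6                               # delta + user disk + vswp, observed
composer_gb = parent_gb + replica_and_source_gb + users * per_clone_gb

print("Old model : %.0f GB" % old_model_gb)      # 16000 GB
print("Composer  : %.0f GB" % composer_gb)       # ~618 GB, the ~619GB quoted above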

Exciting times. We are all going to see how View plays out over the next six months. There is some great architectural work to be done in designing for implementation.

Rodos (Rod Haywood)
Senior Consulting Architect - Virtualisation
Alphawest Services, Sydney Australia

Searchable VMware Technical Resource Documents Listing

VMware admins are always looking for quick reference to whitepapers and best practices. How do you do that?

Well there are some great sites which list links such as VMware-land. However the most popular document in VMTN (for non desktop products) is the "VMware Technical Resource Documents Listing" at http://communities.vmware.com/docs/DOC-2590

In a single page you will find the title, abstract and link to the PDF for every VMware published technical note. There are currently 195 documents listed.

The source for the listing is the VMware technical resources document list; however, this has no usable search function, which is why this alternate listing was originally created. The excellent thing about the document list page is that within seconds you can quick search with your browser for any keyword. Go on, give it a try: jump over and search for a keyword that interests you. You may be surprised to find a paper that you never knew existed.

In the 11 months since I created this page it has had over 10,000 hits. This is dwarfed by some of the Fusion documents, one of which is over 33,000. VMware, it's probably time to improve the search facility of the technical resource list. It would be great to be able to search within the documents; until then, at least we have a usable alternative.

Rodos