Over at vinternals, Stu asks whether linked clones are really the panacea for the VDI storage problem that a lot of people claim. I say yes; however, we are moving from designing for capacity to designing for performance, and VMware have given us some good tools to manage it. Let me explain a bit further.
Stu essentially raises two issues.
First, delta disks grow more than you think. Stu expects the growth to be a lot more than people anticipate, citing that NTFS typically writes to zeroed blocks before reusing deleted ones, and that there is lots of activity on the system disk even if you have done a reasonable job of locking it down.
Second, SCSI reservations. People are paranoid about SCSI reservations and avoid long-lived snapshots as much as possible. With a datastore full of delta disks that continually grow, are we setting ourselves up for an "epic fail"?
These are good questions. I think what this highlights is that with Composer the focus of storage for VDI has shifted from capacity management to performance management. Where before we were concerned with how to deliver a couple of TB of data, now we are concerned with how to deliver a few hundred GB of data at a suitable rate.
On the delta disk growth issue: yes, these disks are going to grow, but this is why we have the automated desktop refresh, which takes the machine back to a clean delta disk. The refresh can be performed on demand, as a timed event, or when the delta disk reaches a certain size. What this means is that the problem can be easily managed and designed for. We can plan for storage overcommit and set the pools up to manage themselves.
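The three refresh triggers can be sketched as a simple policy check. This is only an illustration of the decision logic, not Composer's actual interface: the names `RefreshPolicy` and `needs_refresh` and the threshold values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RefreshPolicy:
    # Hypothetical thresholds; pick values that suit your own pools.
    max_delta_gb: float  # refresh once the delta disk reaches this size
    max_age_days: int    # refresh once this many days have passed


def needs_refresh(delta_gb: float, age_days: int,
                  policy: RefreshPolicy, on_demand: bool = False) -> bool:
    """Return True if any of the three triggers applies:
    an on-demand request, the timed event, or the size limit."""
    return (
        on_demand
        or age_days >= policy.max_age_days
        or delta_gb >= policy.max_delta_gb
    )


if __name__ == "__main__":
    policy = RefreshPolicy(max_delta_gb=2.0, max_age_days=5)
    print(needs_refresh(delta_gb=0.5, age_days=1, policy=policy))
    print(needs_refresh(delta_gb=2.5, age_days=1, policy=policy))
```

Any one trigger is enough, which is what lets a pool cap its own growth without anyone watching it.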
To me the big storage problem we had was preparing for the worst case scenario. Every desktop would be allocated either 10GB or 20GB even though most used much less than 10GB. Why? Just in case! Just in case one or two machines did lots of activity, and because we had NO easy means of resizing them we also had to be conservative about the starting point. With Composer we can start with a 10GB image but only allocate used space. If we install new applications and decide we really do need the capacity to grow to 12GB, we can create a new master and perform a recomposition of the machines. Now we are no longer building for the worst case but managing for used space only. This is a significant shift.
As it happens, there was a blog posting today about Project Minty Fresh. This installation has a problem with maintaining the integrity of its desktops. As a result, they are putting a policy in place to refresh the OS every 5 days. This will not only maintain their SOE integrity but also keep their storage overcommit in check.
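To put rough numbers on that shift, here is a back-of-the-envelope comparison. Every figure is an assumption chosen for illustration (100 desktops, a 20GB full clone, a 10GB master image, a 2GB delta cap enforced by refresh), and the model ignores swap files, user data disks and other overheads.

```python
def full_clone_capacity_gb(desktops: int, disk_gb: float) -> float:
    """Worst-case sizing: every desktop gets its full disk up front."""
    return desktops * disk_gb


def linked_clone_capacity_gb(desktops: int, master_gb: float,
                             max_delta_gb: float, masters: int = 1) -> float:
    """Used-space sizing: shared master image(s) plus one delta disk per
    desktop, with the delta capped by a refresh policy."""
    return masters * master_gb + desktops * max_delta_gb


if __name__ == "__main__":
    full = full_clone_capacity_gb(desktops=100, disk_gb=20)
    linked = linked_clone_capacity_gb(desktops=100, master_gb=10,
                                      max_delta_gb=2)
    print(f"full clones:   {full:.0f} GB")
    print(f"linked clones: {linked:.0f} GB")
```

With these made-up inputs the full clones need 2000 GB and the linked clones 210 GB, which is the "couple of TB" versus "few hundred GB" difference in miniature.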
On SCSI reservations: I do believe that the delta disks still grow in 16MB increments and not some larger size. So when the delta disks are growing there will be reservations, and you will have many on the one datastore. Is this a problem? I think not.
In the VMware world we have always been concerned about SCSI reservations because of server workloads. For server workloads we want to ensure fast and, more importantly, predictable performance. If we have lots of snapshots, that SQL database system which usually runs fine now starts to behave a little differently. Predictability or consistency in performance is sometimes more important than the actual speed. My estimation is that desktop workloads are going to be quite different. In our favor we have concurrency and users. All those users are going to have a lower concurrency of activity. Given the right balance we should have a manageable number of SCSI reservations; if not, we rebalance our datastores: same space, just more LUNs. Also, unlike with servers, will users even be able to perceive any SCSI reservation hits as they go about their activity? Given the nature of a user's work profile, and that any large IOs should be redirected away from the OS disk and into network shares or user drives, the problem may not be as relevant as we expect.
What Stu did not mention, and what we do need to be careful of because it can be the elephant in the room, is IO storms. This is where we really do have some potential risk. If a particular activity causes a high concurrency of IO activity, things could get very interesting.
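A rough way to reason about the "same space, just more LUNs" rebalancing is to estimate how many delta disk growth events land on each LUN. The sketch below makes simplifying assumptions: each 16MB growth step costs one reservation, desktops are spread evenly across LUNs, and the concurrency and growth figures are invented for illustration. Measure your own environment before trusting any of these numbers.

```python
GROW_INCREMENT_MB = 16  # growth step I believe the delta disks use


def reservations_per_hour(desktops: float, active_fraction: float,
                          growth_mb_per_hour: float) -> float:
    """Estimate growth-driven reservations per hour on one datastore,
    assuming one reservation per 16MB growth step."""
    active_desktops = desktops * active_fraction
    return active_desktops * growth_mb_per_hour / GROW_INCREMENT_MB


def per_lun_rate(total_desktops: int, luns: int, active_fraction: float,
                 growth_mb_per_hour: float) -> float:
    """Same desktops and same total space, spread evenly over more LUNs."""
    return reservations_per_hour(total_desktops / luns, active_fraction,
                                 growth_mb_per_hour)


if __name__ == "__main__":
    for luns in (1, 2, 4):
        rate = per_lun_rate(total_desktops=64, luns=luns,
                            active_fraction=0.5, growth_mb_per_hour=32)
        print(f"{luns} LUN(s): about {rate:.0f} reservations/hour per LUN")
```

The total number of reservations does not change, but the number contending on any single LUN falls in proportion to the LUN count, and lower user concurrency reduces it further.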
Lastly, as Stu points out, statelessness is the goal for VDI deployments. Using application virtualisation, locking down the OS to a suitable level and redirecting file activity to appropriate user or networked storage is going to make a big impact on the IO profile. These are activities we want to undertake in any event, so the effort has multiple benefits.
I too believe you need to try this out in your environment, not just for the storage requirements, but also for the CPU, user experience, device capabilities and operational changes. VDI has come a long way with this release and I do strongly believe it will enable impactful storage savings.
What I really want is for the offline feature to become supported rather than just experimental. Plus I want it to support Composer-based pools. There is no reason why it can't, and until it does there is still some way to go before we can address the full breadth of use cases. However, there are plenty of use cases now, which form the bulk, to sink our teeth into.
Rodos
Thursday, December 11, 2008