
Monday, February 08, 2010

Overlay Transport Virtualization

I have been seeing quite a few comments about Cisco Overlay Transport Virtualization recently, so I figured it was worth commenting on.

One element of Cloud is being able to easily move workloads around, in and out. The harder this is, the more difficult the adoption. If you have ever architected, or even harder, implemented a DR solution, you know that re-addressing the networking for a pile of machines can be difficult, "non-trivial" or just downright impossible. At the same time we see companies such as EMC and F5 working on doing VMotion across sites.

An outcome of this workload migration into the Cloud, and of long-term VMotion across sites, is the spreading of layer 2 networks across physical sites. Of course this can be done today, but there are all sorts of limitations and difficult bits.

Cisco have been working hard on this problem and have come up with Overlay Transport Virtualization (OTV).

Some big players such as Terremark have been trialling OTV, interconnecting three of their data centres. "[...] the company likes what it has seen as an early beta tester of OTV and the Nexus 7000 switches it runs on. Interconnecting data centers takes minutes instead of several hours, says Mike Duckett, Terremark’s general manager of network services." You can read more about Terremark and OTV in this Network World article.

Also, Omar Sultan of Cisco has posted some initial information and a video [shown below] detailing more about OTV.


It's important to note that at the moment OTV is a "Nexus 7000 specific feature" and that Cisco are looking at supporting other platforms and submitting it to the standards bodies. I think this is critical for Cisco; there are a lot of Catalyst 6500s out there, and not all enterprises may be willing to migrate to the higher end Nexus range yet without having all of the service modules available.

If you are thinking Cloud for the Enterprise, then my advice is to keep an eye on OTV.

Rodos

Saturday, February 06, 2010

Cisco sells over 400 UCS systems and executives bullish on VCE

Cisco had their quarterly earnings call a few days ago [transcript care of Seeking Alpha]. These things are really interesting and I am starting to follow them. The senior people give a good summary of the latest goings-on in their company and the analysts usually ask some good questions.

A few comments were made about Cisco UCS.
  • Chambers : "Our UCS numbers are still in the early stage of customer acceptance and palettes, but again showed sequential (inaudible) rates of over 100% and now over 400 customers have ordered from Cisco."
  • Chambers : "So in terms of the datacenter area we're starting to win the architectural battles, you're seeing the value on BTE with EMC and BM where you're seeing that it isn't about server standalone technology. We have no interest in that, but architectural plays. We're off to a good start. "
Interesting: 400 customers, of which some would be large and some small purchases (remember how much UCS kit was at VMworld last year!). Also interesting that the top brass view UCS as an architectural play, and to see the reference to VCE (I think the BTE and BM are transcript errors for VCE and VM). It's not about isolated server technology.

There was a longer question and answer on UCS.
Richard Gardner – Citigroup

Okay great. Well, most of my questions have been answered, but I did want to ask you where you're seeing the most success in UCS? You're obviously on a pretty good trajectory there quarter to quarter. Where are you seeing the most success in terms of applications in workloads? Can you talk about who you go up against most often in competitive bidding situations? And why customers choice UCS over competing products when they (inaudible) as your product?

John T. Chambers

Let me take a little bit of cut at it, but I want to also not mislead you. We’re just up to 400 customers. Most of those are doing pilots and implementation. How the first couple dozen go in the big account service providers and enterprise to determine how your next wave goes. Why we’re winning, it’s an architectural play. I (inaudible) it’s a very well class product which in my opinion one that is well ahead of our competitors at this point in time. But it’s the architecture and how the network and possessing capability and storage capability comes together with the applications and the cloud. And the ability to build the architectures where many of the costumers are doing net.

Others are doing it and we’ve been surprised a little bit we’re off in the commercial market play with some real leading edge commercial customers just saying, hey, you save so much in terms of my splurge costs. So much in flexibility and you’re headed to where you’re going to go without locking me in and best in class products in each category. We’re going to line with you.

I probably would say, I would poll a discussion on tele-presences for the next quarter call. We’re probably two quarters out from being able to do the same meaningful discussion on the UCS side of the house. So what I think you can say is that we’re not only holding our own in the data center and virtualization. Padma, what you started with Cloud and what we’re really driving through, we’re having very good success with.
So early implementations of UCS have gone into big service providers and enterprises. How those early implementations succeed is key for ongoing success. Again there is mention of UCS being an architectural play, with reference to how networking, processing and storage combine to deliver applications and Cloud.

Certainly looks like the stack is important to the executives at Cisco.

I wondered if it was the same over at EMC, so I dug up their latest earnings transcript. Asking about pipelines, William Fearnley asked Joe Tucci "where are the bright spots here when you look across the world for 2010?" The second element of the answer was:
I think there are a lot of opportunities on the back of our partnerships with Cisco and VCE, and revitalizing our DELL partnership; I think those two are massive opportunities for us if we do them right, and I believe we can do them both right. Of course, as we really get into this next generation of how we take the cloud computing and really bring it, internal or private cloud market is going to be a big, big, big opportunity for us and how we really execute on that is phenomenal.
So Tucci is thinking the same as Chambers: VCE, big opportunities and Cloud. I suspect these guys are probably smarter than you or me. If they are betting big, there might just be something in this. Then again, maybe they have just spent "big, big, big" money and they want some "big, big, bigger" return on their investment.

Rodos

Friday, February 05, 2010

UCS local disk policy + some vBlock

I have been reading through all of the VCE vBlock reference documents that were recently published, as announced by Chad. The last thing we want is for our implementation to be forked away from the blessed best practices. [jump to the end for brief comments on the guides, this post is about something else]

The deployment guide details various UCS Manager policies that should be created, and I noticed that it specifies creating a "Local Disk Configuration Policy" set to "No Local Storage". The default is any configuration.

Sidebar - Local disk configuration policy explained.
What the Local Disk Configuration Policy does is configure the installed disks in your blades as the service profile is deployed to them. Forget going into the BIOS and setting things up; this is virtual hardware and stateless computing, people. You just pick a policy, say RAID Mirrored, and when your service profile is applied to the blade it configures the RAID controller automatically. As an aside, you can also have local storage qualifications that specify, for example, what size disk you want, so you can deploy your service profile asking it to find a spare blade that matches your requirements.
The reason I noticed it is that this caught me out during deployment/testing. When it says No Local Storage it really means no disk. We started with this exact "No Local Storage" policy. During some deployment work we noticed that no spare blades could be found. After a short time of head scratching we realised that the only blades left were some that had local disks. It's a true testament to stateless computing when you start to forget what hardware you have and where it is, just letting the system consume it for you. A quick change to a policy of any configuration and it was off and deploying again.
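
To make that blade-matching behaviour concrete, here is a small, purely illustrative Python sketch. This is not UCS Manager code and the names and attributes are my own invention; it just models the rule that tripped us up, namely that a "No Local Storage" policy will only match blades with no disks installed at all.

    # Illustrative sketch only: a toy model of how a "No Local Storage" local
    # disk policy interacts with blade selection. Not UCS Manager code; the
    # names and attributes here are invented for the example.
    from dataclasses import dataclass

    @dataclass
    class Blade:
        name: str
        local_disks: int  # number of physical disks installed in the blade

    def blade_qualifies(policy_mode: str, blade: Blade) -> bool:
        """Can a service profile with this local disk policy land on this blade?"""
        if policy_mode == "no-local-storage":
            # "No Local Storage" really does mean no disks may be present.
            return blade.local_disks == 0
        # "any-configuration" (and the RAID modes, ignored here) tolerate disks.
        return True

    spare_blades = [Blade("chassis-1/blade-5", 0), Blade("chassis-1/blade-6", 2)]
    for blade in spare_blades:
        print(blade.name, "qualifies:", blade_qualifies("no-local-storage", blade))
    # Only the diskless blade qualifies, which is why no "spare" blades could be
    # found once the diskless ones had been consumed.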

Of course I am going to put the policy back the way it was eventually (when we pull the drives out of that set of machines). Here is why:
  • Security - To perform stateless computing you are booting from SAN, so local disks are usually not required. The only case would be a local scratch disk holding transient data. You don't want to be writing data to local storage and then, for some reason, redeploy your service profile onto another blade, leaving that data behind; that is a bad security move.
  • Scrub Policy - Those who know a bit about UCS may say, "Rodos, just create a Scrub Policy". A Scrub Policy scrubs the disk so that a subsequent service profile gets clean disks. The problem is that it's not effective. Not being one to trust anything, I dug into how it scrubs: all it does is overwrite the start of the disk with some zeros, it does not scrub the whole disk with multiple passes. Making it a more secure scrub is a future function, but as it stands now I bet you could somehow get at that data.
So my recommendation, which concurs with the vBlock guidelines, is: boot from SAN, set a No Local Storage policy and let the automation of UCS stateless computing take care of things for you.

Rodos

P.S.

My thoughts on the VCE vBlock guides themselves. I have skimmed through them all; these are initial impressions. Of course I will send some notes to those inside the VCE organisation through channels, but I figured people would be interested. Reading the VCN (NetApp) document is on my list too; it will be interesting to compare.
  • Don't think these will do your work for you. They leave more as an "exercise for the reader" than you might think. It is not a design of your system and you are going to have to do some significant work to create a solution. I know, I have just done it.
  • There is a lot of detailed information in the deployment guide about UCS and UCSM, very detailed. There is a bit about the EMC storage and a token amount on VMware. Sure, it is not a very fair comparison, because it is easy to describe and detail how to build up the UCS system, whereas in contrast it is not like you can describe laying out a VMax in 20 pages. Also the VMax design and implementation service comes with the hardware anyway. The VMware component consists of how to install ESX, with not a mention of vCenter Server. There is nothing about setting up the N1K and its VSMs, or PowerPath/VE etc., even though they are a requirement of the architecture. I am not saying that should be there in detail, but you are not deployed without it and it is not even mentioned. Contrast this to the UCS blade details, which include a screenshot for every step of checking that the boot from SAN has been assigned correctly in the BIOS.
  • My gut feeling is that no one from VMware really contributed to this; it was a Cisco person who did the VMware bits and EMC did theirs.

Wednesday, February 03, 2010

First APAC Virtualisation Roundtable Podcast

We just finished the first APAC Virtualisation Roundtable podcast, which got off the ground thanks to Andre Leibovici.

You can listen via the widget below or go to the Talkshoe site.



There were quite a few people on the call, maybe around 20, with about 6 or 7 on voice and the rest on chat. Participants included vendors, partners and end users.

Some topics were VDI, NFS vs FC, Xsigo, Cloud, FT, Lab Manager and 10G. None were a deep dive; we were getting to know each other and just chatting things over.

I am sure over time as more people are able to get voice working on Talkshoe there will be more participation. Andre is looking at getting some guests for future events.

It was lovely chatting to people; this should be fun. I am sure everyone is looking forward to next week.

Rodos

My Cisco UCS system in the lab can't talk to anything



If you are lucky enough to get a Cisco UCS system for your lab you might get a bit confused if you run it up and don't connect it to any upstream switches. That is, you try to just use the blades to talk to each other.

The reason is that the default mode for the Fabric Interconnect is End Host Mode, and by default the uplink fail action is link-down, so the NICs on your blades look down.

I have seen people hit this problem so I thought I would quickly write something up.

Let's start with a refresher on the way the switching in the Fabric Interconnect works.
Your UCS Fabric Interconnects (F-I) can work in End Host Mode (EHM), the recommended setting, or in Switch Mode. In EHM the F-I "forwarding is based on server-to-uplink pinning. A given server interface uses a given uplink regardless of the destination it’s trying to reach. Therefore, fabric interconnects don’t learn MAC addresses from external LAN switches, they learn MACs from servers inside the chassis only. The address table is managed so that it only contains MAC addresses of stations connected to Server Ports. Addresses are not learned on frames from network ports; and frames from Server Ports are allowed to be forwarded only when their source addresses have been learned into the switch forwarding table. Frames sourced from stations inside UCS take optimal paths to all destinations (unicast or multicast) inside. If these frames need to leave UCS, they only exit on their pinned network port. Frames received on network ports are filtered, based on various checks, with an overriding requirement that any frame received from outside UCS must not be forwarded back out of UCS. However fabric interconnects do perform local switching for server to server traffic. This is required because a LAN switch will by default never forward traffic back out the interface it came in on." (source)
So local traffic between the blades stays inside and everything else is thrown northbound to your main switches, which in this case don't exist. (Can you tell I am not a networking guy?) Based on this it sounds like your blades will have no trouble talking to each other, right? Wrong.
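
To illustrate the rules quoted above, here is a toy Python model of End Host Mode forwarding. It is a conceptual sketch only, nothing like the real forwarding plane, and the pinning scheme is a simplistic stand-in; but it shows why, on paper, blade-to-blade traffic looks like it should keep working even with no uplinks at all.

    # Toy model of End Host Mode forwarding, purely to illustrate the rules
    # quoted above. A real Fabric Interconnect obviously does none of this in
    # Python, and the pinning here is a simplistic stand-in.
    class EndHostModeFabric:
        def __init__(self, uplinks):
            self.uplinks = list(uplinks)  # northbound "network" ports
            self.mac_table = {}           # learned from server ports only
            self.pinning = {}             # server port -> pinned uplink

        def frame_from_server(self, server_port, src_mac, dst_mac):
            self.mac_table[src_mac] = server_port   # learn from servers only
            if dst_mac in self.mac_table:           # local server-to-server switching
                return "switch locally to " + self.mac_table[dst_mac]
            if not self.uplinks:                    # the isolated lab case
                return "no uplink to pin to"
            uplink = self.pinning.setdefault(
                server_port, self.uplinks[len(self.pinning) % len(self.uplinks)])
            return "send out pinned uplink " + uplink

        def frame_from_network(self, dst_mac):
            # Never learn from network ports, never forward back out of UCS.
            return ("deliver to " + self.mac_table[dst_mac]
                    if dst_mac in self.mac_table else "drop")

    lab = EndHostModeFabric(uplinks=[])                      # no northbound switches
    lab.frame_from_server("blade-1-vnic", "mac-a", "mac-b")  # learns mac-a
    print(lab.frame_from_server("blade-2-vnic", "mac-b", "mac-a"))
    # -> "switch locally to blade-1-vnic", so the forwarding rules alone are
    #    not what stops the blades talking to each other.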

Normally your F-I is going to be connected northbound; there is little sense in being isolated. But remember, there are two fabrics in your UCS environment for redundancy (or there should be). You are going to have two F-Is, an A and a B side, and each of these will be connected northbound.

For a visual picture of this see my previous schematic.

Now here is the kicker that causes the lab scenario problem. In the normal world, what would you want to occur if your F-I lost northbound connectivity, causing it to become isolated from the rest of the world? Your blades are going to be sending out traffic and it is going to drop. Yet you have another F-I, and because you have everything nice and redundant the traffic can probably go that way. You probably want the vNICs to go down so the system knows to send its traffic out the other fabric.

So, in UCS there is an uplink-fail-action Network Control Policy. What this policy does is define what should happen when your uplinks fail (or, in your lab case, are not even connected). By default the policy is link-down, which causes the operational state of the vNICs on your blade to go down in order to facilitate fabric failover for the vNICs. The alternative is warning, which leaves them active. The setting is done through the CLI and can be found in the documentation here. So in your lab, change the policy to warning and things should start working.
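
As a conceptual illustration only (this is not UCSM code, just the decision described above), the effect of the policy boils down to something like this:

    # Conceptual sketch of the uplink-fail-action behaviour described above;
    # not UCSM code, just the decision it implements.
    def vnic_oper_state(uplinks_up: bool, uplink_fail_action: str) -> str:
        """How the vNIC on this fabric appears to the blade's operating system."""
        if uplinks_up:
            return "up"
        if uplink_fail_action == "link-down":
            return "down"  # default: push traffic over to the other fabric
        return "up, with a warning raised"  # "warning": handy for an isolated lab

    print(vnic_oper_state(uplinks_up=False, uplink_fail_action="link-down"))
    print(vnic_oper_state(uplinks_up=False, uplink_fail_action="warning"))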

Of course in your lab you could change to Switch Mode, but that would be no fun at all. Hopefully this saves someone from banging their head against the wall for longer than they need to.

Rodos

Monday, February 01, 2010

Hand me your pCard


Do your applications have a pCard yet? Given the hopes of some industry people, one day they may.


What's a pCard? pCard is a concept that James Urquhart (Cloud marketing at Cisco and Wisdom of Clouds blogger at CNet) is developing as a means of describing your workload to Cloud providers.

Here is how James describes it in his CNet post "Payload descriptor for cloud computing: An update", which introduces the pCard.
[A] pCard is a calling card for a software payload--whether simple single container payloads, or complex multi-container distributed payloads--that contains the information needed by a service provider to determine
a) if they can meet the needs of the payload, and
b) what kind of services are required to do so (and their costs).
Rather than having to deliver the whole payload of the workload, you can just deliver the requirements via a pCard to determine whether the provider can meet them.

Here is an idea of the structure which James has proposed.
[image from James Urquhart @ CNet]

The process would be that you send the pCard to a service provider, they process it and they respond with details of their ability to meet your request. The response could include confirmation of the requirements which can be met and details of those that cannot.
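
To make the exchange concrete, here is a purely hypothetical sketch in Python. The field names are my own invention, not the structure James has proposed, and a real pCard would presumably be a portable document (XML or similar) rather than in-memory objects; the point is simply the shape of the request and the response.

    # Purely hypothetical sketch of a pCard exchange. The fields are invented
    # for illustration and are not James Urquhart's proposed schema.
    pcard = {
        "payload_id": "payroll-app-v2",
        "mandatory": {
            "geography": "AU",                  # data must stay in Australia
            "hypervisor": "vsphere-4",
            "network_isolation": "dedicated-vlan",
        },
        "desirable": {
            "storage_tier": "fc",
            "backup": "daily",
        },
    }

    def evaluate_pcard(pcard, capabilities, price_per_hour):
        """The provider's side: which of the requirements can we meet?
        (Only the mandatory items are checked in this sketch.)"""
        met, unmet = {}, {}
        for key, value in pcard["mandatory"].items():
            (met if capabilities.get(key) == value else unmet)[key] = value
        return {
            "accepted": not unmet,
            "met": met,
            "unmet": unmet,
            # Commercial detail of the kind discussed under "Returned information" below.
            "price_per_hour": price_per_hour if not unmet else None,
        }

    provider_capabilities = {"geography": "AU", "hypervisor": "vsphere-4",
                             "network_isolation": "dedicated-vlan"}
    print(evaluate_pcard(pcard, provider_capabilities, price_per_hour=0.90))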

I really jumped on this idea as a great one; here is some of my thinking on it.
  • Cloud Brokers

    This is something that James has not mentioned yet, but I am sure he has thought of it as well. Many people are predicting that a natural evolution in the Cloud marketplace is the rise of Cloud Brokers, or what some call Cloud Arbitrage. You can read a bit more about these here.

    For a Broker to work they need to be able to process your workload request in an automated fashion, pass it to multiple providers and process the results to feed back your options. For this to work you don't want to be passing the actual workload around. The type and amount of information required may be different from the actual workload, and you want it to be much smaller.

  • Security

    How much information do you want to give away about your application before setting up a trust relationship with a provider? There are certain items you may not wish to reveal about your application to a list of untrusted parties at this stage of the process. You want to throw out the question, "Hey, I have this workload that looks like this, has these mandatory and these desirable characteristics, can you handle it?" You may get many positive responses, but it is only after you make a selection that you increase the trust level up a notch and actually start sharing data.

  • Returned information

    In addition to a simple response to the requested attributes, I propose that there will be additional information which needs to be returned by the provider, such as financial information or time frames. So rather than just acknowledging the ability to execute the payload, you can find out the cost of running that workload along with a time commitment. "Yes, we can run that application; it is 90c per hour with a minimum of 60 hours, and we can commit the resources for up to 30 days." Having a descriptor separate from the workload allows these different types of information exchange to take place.

  • cCards

    The conversation does not have to be one way. You can imagine a Broker having to send a lot of requests to providers to have pCards processed. My idea is that there should be an equivalent of the pCard for the provider; call it a cCard, for want of something better. A provider can describe the capabilities of their service in a cCard and this can be loaded into your system or the Broker's. Of course cCards would need expiry information and you would still need to send the request to the provider for the final processing. However, if there were thousands of providers the Broker could quickly filter that list down to a sufficient subset by matching the data from your pCard against its set of cCards, so that your pCard only needs to be sent on to the relevant providers. For example, if you have a mandated requirement for the geographical location of the service there is no point in sending the request to providers who can't provide a service in that geography. There is a rough sketch of this filtering after this list.

  • Efficiency

    There is a lot of effort going into creating a standard for workloads, through OVF, vApp and others. However, vendors are always going to try to keep their edge, and there is only so much standardization that can be done within the elements that have to actually do the grunt work of the workload. Having a separate and lighter construct could make things a lot easier. Certainly a pCard could be generated from the packaged workload, or even before it is packaged up. You could see an analysis system working through your existing workloads, running scenarios of how they could be efficiently restructured and running some ROI scenarios, before any actual work was done.
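
Here is the rough sketch of cCard filtering mentioned in the cCards point above. Again the structures are entirely hypothetical; it just shows a Broker discarding expired cCards and providers that cannot meet a mandatory geography before any pCards are sent out.

    # Hypothetical sketch of a Broker pre-filtering providers by their cached
    # cCards, as mentioned in the cCards point above. All names are invented.
    from datetime import date

    ccards = [
        {"provider": "provider-a", "geographies": {"AU", "SG"}, "expires": date(2010, 6, 30)},
        {"provider": "provider-b", "geographies": {"US"},       "expires": date(2010, 3, 31)},
        {"provider": "provider-c", "geographies": {"AU"},       "expires": date(2010, 1, 31)},
    ]

    def shortlist(ccards, required_geography, today):
        """Keep only providers whose cCard is current and covers the mandated
        geography; only these would be sent the full pCard for processing."""
        return [card["provider"] for card in ccards
                if card["expires"] >= today
                and required_geography in card["geographies"]]

    print(shortlist(ccards, required_geography="AU", today=date(2010, 2, 1)))
    # ['provider-a'] -- provider-c matches on geography but its cCard has
    # expired, so the Broker would refresh it before considering that provider.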
These are certainly some long term concepts that we are contemplating here, but that is where these things start, somewhere. After all, someone had to wake up one day and consider that there should be a standard structure for portable virtual machines.

If you are interested in such things, if you think you may be a large consumer of pCards, if you are a service provider who may want to process and respond to pCards, or if you are a vendor who may need to work with them, then I recommend you join in on the discussion on the development of the pCard idea.

James has set up a Google Group with a mailing list and there are a few of us throwing these ideas around. Why don't you visit and give some comments?


You never know, one day our clouds may just be shaking hands whilst they pass pCards around.

Rodos

APAC Virtualization Roundtable

Andre Leibovici is trying to start up an APAC Virtualization Roundtable run via Talkshoe, like the VMTN Communities Roundtable.

The first one is this Wednesday night at 9:00pm Sydney time.

If you are in APAC, or anywhere else you may want to join in.

http://www.talkshoe.com/tc/75046

Rodos