Tuesday, October 30, 2012

Architecture affects cost

"Architecture effects cost", thats a simple and somewhat obvious statement. Those of us who architect systems know that there are outcomes for the design decisions that are made. Often there are tradeoffs and no environment lives in a world where money is unlimited. Well okay the unlimited funds projects probably do exist somewhere, but I have not had the pleasure of working in one of those environments in my career.

The concept of architecture affecting cost is a key element when you are architecting Cloud solutions. One of the reasons it is key is that your ongoing costs have a greater potential to be variable. This is a good thing; one of the benefits of moving to Cloud models is their elastic nature, and you want your applications to be right-sized and running efficiently, both technically and financially.

There are two considerations when thinking about costs for Cloud.

First there is the selection cost. As an example, if you were using Amazon S3 there are two durability options available; one has lower durability and also a lower cost. If you have scratch data, or data that can be regenerated, you might simply choose to use the less expensive storage.
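With the AWS SDK that choice comes down to a single storage-class parameter on the upload. Here is a minimal sketch, assuming the Python boto3 SDK; the bucket and key names are made up for illustration.

    import boto3

    s3 = boto3.client("s3")

    # Scratch data that can be regenerated: pick the cheaper,
    # lower-durability storage class instead of the standard one.
    s3.put_object(
        Bucket="example-scratch-bucket",    # hypothetical bucket name
        Key="renders/frame-0001.tmp",       # hypothetical object key
        Body=b"regenerable scratch data",
        StorageClass="REDUCED_REDUNDANCY",  # lower durability, lower cost
    )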

Second is understanding your elastic cost, based on your predicted demand. Given your forecast demand, what will your peak and average expenditure be over time? What size of cheque are you signing up for if you implement your selected solution?

An example of this crossed my mind this last week as I read a blog post from VMware on their vFabric blog, entitled "ROBLOX: RabbitMQ, Hybrid Clouds, and 1 Billion Page Views/Month".

Don't be distracted by the 1 billion number. It's an interesting article on how the application architecture for this company needed to change as they started to scale out. The company was an AWS customer and, in order to speed up their service, they introduced a message queue to decouple the synchronous relationship between the web services and the back-end services (as you do). Here is a pertinent bit of text.
To dive deeper, ROBLOX implemented RabbitMQ to help deal with the volume of database requests and slow response times.  The queue is managing 15 million requests a day. The example scenario is when there is an update to the ROBLOX catalog content, which needs to update the search index.  Before RabbitMQ, the web server would have to wait for an update to complete.  Now, the loosely coupled architecture allows the web server to update the message queue, not the database, and move on (technically, the update is enqueued and picked up by the consumers).  At some point, the message is picked up by a service that updates their search platform index.
Wonderful stuff. But then it hit me: why bother implementing RabbitMQ? It does not sound like they were using many of its sophisticated functions, so surely they could have used the AWS Simple Queue Service (SQS). That's when the "architect for cost" mantra kicked in. 15 million requests a day does not seem like much, but let's see what that would cost on SQS. SQS is $0.01 per 10,000 requests and there are no data charges if the transfer is within a single region. That's $15 a day, which is reasonable, or about $465 a month. That's potentially about the same price as a full RabbitMQ licence over a year (or you could use the free version), but that licence is a fixed cost and you have to factor in the cost of running the server to execute it, along with the effort to maintain it (no cheating on the hidden costs). It looks like others have pondered this SQS vs RabbitMQ question as well (1, 2). However, this is just one example of the point which jumped out this week.
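For what it is worth, here is that back-of-envelope calculation as a few lines of Python, using the pricing figures quoted above rather than current AWS rates.

    # SQS cost sketch for the workload described above: 15 million requests
    # a day at $0.01 per 10,000 requests, with no data charges within a
    # single region.
    requests_per_day = 15_000_000
    price_per_10k_requests = 0.01

    daily_cost = (requests_per_day / 10_000) * price_per_10k_requests
    monthly_cost = daily_cost * 31

    print(f"Daily cost:   ${daily_cost:,.2f}")    # $15.00
    print(f"Monthly cost: ${monthly_cost:,.2f}")  # $465.00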

So when architecting for Cloud, don't forget that "architecture affects cost". With Cloud there is not only the selection cost; you also need to consider your elastic cost.

Rodos


Wednesday, October 17, 2012

Secure Multi-Tenancy

Edward Haletky (blog) once asked me what I thought secure multi-tenancy was in relation to Cloud Computing. I am willing to admit that at first I gave an answer reminiscent of a Ronald Reagan quote.

Due to the painful memory of that event, the actual definition of secure multi-tenancy has always had my attention. I work with a lot of vendors and, despite what their marketing and sales people declare, their understanding of this topic ranges widely in maturity and reality. Just as there is Cloud washing, there is a lot of secure multi-tenancy washing going on.

This week I actually had to sit down and document in a paper what secure multi-tenancy is. How would we know when it had been achieved?

Of course one usually starts such an activity with Wikipedia and Google, just like my children do for their school work.

Given how much you hear this phrase these days, it's a surprise that there is no clear and concise definition out there. One of the earlier ones is from the Cisco Validated Design that Cisco produced with NetApp. It states:
... the capability to provide secure isolation while still delivering the management and flexibility benefits of shared resources. Both private and public cloud providers must enable all customer data, communication, and application environments to be securely separated, protected, and isolated from other tenants. The separation must be so complete and secure that the tenants have no visibility of each other.
That is reasonable, but personally I did not feel it worked as a measure of success. So I sat down and tried to summarise what I have learnt and have been implementing over the last few years.

Secure multi-tenancy ensures that for a shared service:
  • No tenant can determine the existence or identity of any other tenant.
  • No tenant can access the data in motion (network) of any other tenant.
  • No tenant can access the data at rest (storage) of any other tenant.
  • No tenant can perform an operation that might deny service to another tenant.
  • Each tenant may have a configuration which should not be limited by any other tenant's existence or configuration. For example in naming or addressing.
  • Where a resource (compute, storage or network) is decommissioned from a tenant, the resource shall be cleared of all data and configuration information. 
I have tried to keep it as succinct as possible. The last item, regarding the clearing of all data and configuration, is most familiar to people as the "dirty disk" problem. You could consider this item a duplication of the 3rd point, that no tenant can access the data at rest of any other tenant. Yet people tend to forget about the residual configuration or information that may remain, thus introducing vulnerabilities. Thanks to my colleague Jarek for contributing this 6th item to my original list.
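To make that last point a little more concrete, here is a purely illustrative sketch of what a decommission step might look like; the function names and device handling are hypothetical and not taken from any particular product.

    def scrub_volume(device_path: str, chunk_size: int = 4 * 1024 * 1024) -> None:
        """Overwrite a released block device with zeros so that no residual
        tenant data or configuration survives into the next allocation."""
        zeros = b"\x00" * chunk_size
        with open(device_path, "wb", buffering=0) as device:
            try:
                while True:
                    device.write(zeros)
            except OSError:
                pass  # reached the end of the device

    def decommission_volume(tenant_id: str, device_path: str) -> None:
        # Detach the volume from the tenant first (environment specific),
        # then scrub it and clear any tenant-specific configuration
        # (zoning, ACLs, naming) before it returns to the shared pool.
        scrub_volume(device_path)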

Do you consider that the environments you use meet all of these criteria? Do these criteria cover the required elements, or do they cover too much? I'd appreciate your comments.

Rodos


Wednesday, October 10, 2012

Drobo Update - My experience of a failed disk drive

Back in November 2009, that's 3 years ago, I did a post about Drobo. What made it interesting was that it included a video of my then 12-year-old son doing the unboxing and deployment of the device.

Well, 3 years later a lot has changed. My son Tim is more of a geek than I am and a heck of a lot taller, and the range of Drobo hardware has changed. However, the Drobo box has been faithfully running these last three years without incident. That is, until last week when it suffered a drive failure.

When I originally installed the Drobo I put in 3 x 1TB drives. Then, after about two years, as usage started to increase, I picked up a 2TB drive to insert into the remaining slot. No config was required, I just inserted the drive and let it do its thing. That's the benefit of BeyondRAID.

Last week at work I get a text from Tim.
The drobo lights are flashing green then red, its got about 480g free. I dont have the drobo dashboard but if you tel me which drobo it is I can get it and see what it says.
Of course I remember, unlike Tim, that there is the indicator light cheat sheet on the back of the front cover. I ring Tim and he reads out what the lights mean.


The chart reveals the problem: the new 2TB drive has failed and a rebuild is occurring on the remaining 3 drives. That's good. We have not lost any data, the unit is still operating, and we can read and write data. The only change is that the available free space on the unit has reduced, as indicated by the blue lights on the front.

After quite a few hours, Tim and I start to get impatient about the rebuild time. I am of course expecting the rebuild to take a while; I know that even in big enterprise storage arrays this can take time, and the older Gen2 Drobo that we have does not have much grunt in its processing power.

To try and determine the time, I browsed a few sites on the Internet to read what others have experienced. It was a little disturbing to read so many horror stories about people's rebuild times, sometimes weeks, and other stories of units failing outright. It sounded a little ominous, so we installed the Drobo Dashboard onto the Mac Mini the unit was now connected to in order to determine the rebuild time.

The estimate for the rebuild was another 30 or so hours, and it had probably been running for around 12 hours already. We went to bed thinking we had a long wait ahead. In the end the rebuild finished ahead of schedule and probably took around the 24-hour mark. For the age of the device, and the fact that we lost the largest drive in a box that was reasonably well utilised, I think that is a reasonable time.

It turned out that the drive that failed was still under warranty (thank you, serial numbers), but we figured we might as well go and get another 2TB drive and get back some free space; when the RMA replacement arrives we can swap out one of the smaller drives for the larger one and gain some new space.

After a fun trip to a computer store we slotted in the new drive. The crazy thing was that within a minute the indicator lights started flashing again and one of the original 1TB drives was showing red: another failed drive. I have no idea if that was good or bad luck, but the drive had certainly failed. The rebuild was much faster this time.

A day or so later we found a spare 1TB hard drive in the study and threw it in the slot of the failed drive. All great. We are now back to where we were, plenty of space and redundancy. Here is the current state.


Now once that 2TB drive returns from the RMA we will still swap out one of the original drives.

So what are my thoughts on the Drobo after 3 years and a real-life drive failure?

  • Everything works as advertised.
  • The setup was easy, we experienced that 3 years ago.
  • When a drive failed it was great that everything continued to operate as normal, we could still read and write data as we wanted.
  • It was great that it did not matter if we had the Drobo Dashboard software up to date or even installed, the unit took care of everything itself.
  • The rebuild after a failure did take time, but in our experience that time was reasonable, and since it did not rely on the computer and we could still utilise the device, the only thing inconvenienced was our patience.
  • Even though the unit is starting to age, software and firmware updates are still available.
  • Never trust a single device. All of our critical data, such as family photos, as well as general data that would be inconvenient to lose, is also backed up to the Cloud using CrashPlan. I don't care how highly available a storage unit is, it is not a backup if it holds your primary copy! That is one concept I think a few of those people complaining about their Drobo experiences need to take heed of. 
There is something attractive about technology that is so smart it makes things so simple to use.

Rodos

Friday, October 05, 2012

IDWS - Come hang out with the storage geeks

If you are in Sydney next Tuesday (9th of October) come hang out with the storage and information systems geeks.

IDWS, or the Information & Data World Symposium, is on and is a great place to catch up with vendors and providers, share experiences with colleagues and the industry, and of course learn new things.

Here is the blurb:

A comprehensive technical symposium designed for data management professionals and IT practitioners to broaden their knowledge into all facets of building and maintaining their information infrastructures. 
The symposium will be educational and technical, targeted to all IT levels from CIOs to the skilled staff responsible for managing and protecting their company's greatest asset, its data. 
Be engrossed as key industry players battle it out against each other at the 'Great Debate'. Throw in live tweets and feeds from the floor including voting and see who will be crowned champion on a range of key topics. 
The symposium will feature a Technical Lab for vendors to demonstrate real information infrastructure solutions as well as technical workshops to suit all delegates. 
This one-day event will cover Big Data Analytics, Cloud Storage & Services, Infrastructure Convergence, Data Management & Protection, Storage Security, Virtualisation and a lot more.
Registration is free, head to http://www.idws.com.au/ to learn more and register.
See you there.
Rodos
P.S. As you know, I am a Board Member of SNIA, which is part of this event.