VMware Infrastructure 3: Advanced Technical Design Guide and Advanced Operations Guide
by Scott Herold, Ron Oglesby, Mike Laverick
“Every VMware Architect should buy and read this book to assist their understanding of designing for VI3. By helping you understand the important design decisions in building a virtual infrastructure, and importantly showing the pros and cons, you will be better informed as you make your architectural decisions. This book will help you think, rather than prescribe an answer, and that’s a good thing.”
Rodney Haywood
National Senior Consulting Architect for Virtualisation
Alphawest - VMware Premier and VAC Partner
Sydney, Australia
As a person who has spent a few years earning a living architecting many big and small VMware infrastructures, I was keen to get my hands on a copy of the “Advanced Technical Design Guide” by Scott Herold, Ron Oglesby and Mike Laverick. It had just the words that inspire me: “Advanced”, which hopefully meant it had some meat to it, and “Technical Design”, so something that might not be marketing or just another manual. I was also keen to validate my own practices and hopefully pick up on some new ideas or improvements.
One of my first activities at VMworld in September was therefore to head to the bookshop and get my hands on a copy. They sold out fast. It has taken me a while to get through the 352 pages, so here is the high-level view of what I think. In keeping with the book’s methodology, we will have to put the answer into advantages and disadvantages.
Advantages
- Each of the elements of a virtual infrastructure is explained and the design options detailed. The key element here is that rather than providing rote answers or saying “it depends”, the authors compare the available options, sometimes give an opinion, and then list the advantages and disadvantages. This is exactly what the reader needs and it was refreshing to see, especially as it was used consistently throughout the chapters where appropriate. Each organization’s requirements and situation are different, and this approach helps designers think through the options and come up with the answer that is right for them.
- It’s written by people who know what they are talking about and have a passion for it. The scattered stories and examples give you a good feel that you are in the hands of people who have trodden this path before you.
- It’s great to have a book that is entertaining, as there are comments and geek humor buried in many pages. For example, in the chapter on Business Continuity there is a discussion on protecting Active Directory controllers, which states: “Unless you only have one Active Directory server in your environment, in which case you should close this book and proceed to beat yourself with it until you pass out, it will always be easier to deploy a new server and follow the proper steps to create a new domain controller and allow the …”.
- It covers the topics you need to address; it has breadth combined with sufficient depth.
Disadvantages
- The book is dated already, being based on a prior version of ESX. Often whilst reading you think, “but that’s not true anymore”, or “that has changed”. A book of this nature can’t help but suffer from this problem, but it does feel like quite a period of time passed between writing and publishing. However, I don’t think this is a significant problem, which I will speak to in my recommendations further on.
- The editing is of a poor standard. There is a high frequency of grammar and spelling errors that really should not be there, and it gets a bit distracting. The authors are not professional writers, but isn’t that why you have a copy editor?
- The diagrams could do with some improvement and consistency. Some are barely readable because of the black and white printing: dark backgrounds that most likely looked wonderful on a computer screen have no contrast and look like a dark blob (p121). Other diagrams could be made conceptually a lot clearer, such as the networking ones in Chapter 6, where I certainly had to look twice to understand how port groups were being represented.
- I was expecting to see some worked-out checklists or outlines for guidance, but there are few. There are some details in the chapter on managing your environment, but you feel you might want some more, maybe as an appendix.
Recommendations
- Focus on the principles presented and verify any specific features. As the software versions have changed you will need to validate current features and functions with their most recent best practices. So focus on the concepts, the why and the reasoning.
- Do some subsequent reading on a topic if you are going to make a decision. Quickly skim the relevant part of the VMware documentation, along with some online resources, the best of which will be vendor-specific best practice guides, VMTN and the key bloggers.
- Read with a pencil and mark bits to come back and review. There is no point reading without taking any action. If you mark areas to come back and review, you can focus on reading and understanding. Once you have finished, flip through and pick some areas to research further, then put some plans in place to update your infrastructure or next design, or start a new project to make some improvements.
- In Chapter 5 - Storage there is a nice diagram showing the parts of a storage fabric. In referring to the diagram the chapter states “If following the best practices of not only ESX storage configurations, but also the best practices defined for storage infrastructures, your environment should be laid out similarly to the example illustrate in Figure 5 – 1.” Now when you look at this diagram, what jumps out at you in regards to being “best practice”? You got it in one: there are no redundant paths from the fibre channel switches to the storage processors. As a reference on “best practice” in an “Advanced Technical Design” book this should not be so. For details on this topic see the relevant tip write-up at the VI Team blog.
- On pages 256 and 257 is the best analogy of how shares work that I have ever heard. It involves screaming children and trying to convince your teenage daughter that a life of drunken debauchery might have appeal but it won’t lead to a prosperous career in IT. I won’t spoil it by repeating it here; you can enjoy it yourself when your copy arrives.
- Disaster Recovery, which is covered in Chapter 10, is a topic I spend quite a bit of time on. I was a little surprised to read the following on page 323, “First, it takes a lot of planning and preparation to do right, and second, it’s never cheap when done correctly. DR with ESX server is no different. We have all heard the sales pitch as to how easy DR is with VMware, but is it really easy or cheaper? Maybe it’s easier; however, I am not sure about cheaper.” I disagree. No matter how you play it, DR with VMware is cheaper. You need less hardware, full stop. Sure, SAN replication is not cheap, and you can do expensive DR configurations with VMware just as you can in a physical world, but that’s not a comparison of with versus without VMware.
- In regards to Disaster Recovery I would have expected to see some reference to VMware licensing for DR sites, and some techniques or guidelines around this, as it is a frequently asked question.
- Ignore the statement on page 318 that you cannot run VCB on the same server as VC; this information is out of date. There is a good tip about having a VCB host for each VMware cluster, as clusters also serve as shared storage boundaries. However, some of the arguments are no longer valid, such as guaranteeing different LUN IDs if you don’t, because VCB now compares VMFS signatures (or NAA IDs for RDMs).
- There are only five pages on the topic of Physical to Virtual migrations. I think this is a big enough topic to deserve more space. Yes, the book is about the VI infrastructure, but anyone reading it is going to have to deal with the large project of moving the 20 or 500 machines that currently exist into this new environment. There is much more that could be said, and some methodologies, guidelines and checklists could be very useful to new entrants to the space.
- The sizing techniques presented through the book are brilliant. This is probably the element that I liked the most as you really get to see and learn from the experience of the authors. Since reading this I have certainly enhanced my sizing analysis methodology to include some of the ideas presented.
- It was great (and a small relief) to see so many of the recommendations in my own practice validated. We can’t all live in an “it depends” world; you sometimes need to give a recommendation and just state that there are edge cases which may be exceptions. Mixing workloads, server sizing, boot from SAN, Vizioncore backups, network configuration, service console software, avoiding silos, when and when not to use an RDM, LUN sizing: all had good alignment, which is not really that surprising as I suspect most experienced architects will be quite similar on these issues.
- No mention is made of UPS software or how to shut down for power outages. This is a question that people may well turn to the book for, and it could be covered.
- I thought the material on host failure calculations for HA left out some details which I consider would merit the “advanced” label. I would have liked to have at least seen a reference to slot calculations. I have had many whiteboard sessions trying to “educate” people about such HA calculations as presented in the book. If VMware improved their documentation this may not be such a problem.
- Likewise I think the handling of split brain and isolation response is weak. Some good details on advanced configuration options would be welcome here. I also disagree with the conclusion about the “Leave Powered On” isolation response. Unlike the author, I have seen and heard of many corruptions due to forced shutdown. It’s not specifically a VMware issue but a guest OS issue; however, it is caused by VMware. Many times I have seen someone take out the network switches due to a failed firmware upgrade or the like and cause every ESX host to go into split brain mode, resulting in the killing of all of the machines in the cluster. Given that most sites will do as the book recommends and build reliable and redundant networks, the occurrence of a split brain is most likely a major network issue affecting all servers rather than just one, so leaving powered on is in my opinion a better option. If it’s a single host that has become isolated, your monitoring system should pick it up quickly enough for action to be taken. Of course if the host actually fails you still have the desired result, as the ESX host is off the network and the locks on the shared storage are released, hence the VMs will be restarted as you want.
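For readers who have not met the slot idea mentioned in the HA point above, here is a minimal Python sketch of the general reasoning. It is a deliberate simplification under my own assumptions, not VMware's actual algorithm, and it is not taken from the book. The slot size is taken as the largest CPU and memory reservation among the VMs, the floor values of 256 are placeholders rather than confirmed product defaults, memory overhead is ignored, and the worst case is assumed to be losing the hosts with the most slots first. All names (Host, VM, slot_size and so on) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    cpu_mhz: int  # CPU capacity available to VMs
    mem_mb: int   # memory capacity available to VMs


@dataclass(frozen=True)
class VM:
    name: str
    cpu_reservation_mhz: int
    mem_reservation_mb: int


def slot_size(vms, min_cpu_mhz=256, min_mem_mb=256):
    """Slot = largest reservation of each resource, with an assumed floor."""
    cpu = max([min_cpu_mhz] + [vm.cpu_reservation_mhz for vm in vms])
    mem = max([min_mem_mb] + [vm.mem_reservation_mb for vm in vms])
    return cpu, mem


def slots_per_host(host, slot):
    """A host holds as many slots as its scarcer resource allows."""
    slot_cpu, slot_mem = slot
    return min(host.cpu_mhz // slot_cpu, host.mem_mb // slot_mem)


def tolerable_host_failures(hosts, vms):
    """Count how many hosts can be lost while every VM still has a slot.

    Assumes the worst case: the hosts with the most slots fail first.
    """
    slot = slot_size(vms)
    counts = sorted((slots_per_host(h, slot) for h in hosts), reverse=True)
    needed = len(vms)
    failures = 0
    while failures < len(counts) and sum(counts[failures + 1:]) >= needed:
        failures += 1
    return failures


hosts = [Host(f"esx{i}", cpu_mhz=8000, mem_mb=16384) for i in range(3)]

# Ten VMs with no reservations: slots are small, so capacity looks generous.
small_vms = [VM(f"vm{i}", 0, 0) for i in range(10)]

# The same ten VMs, but one carries a large reservation that inflates the slot.
mixed_vms = small_vms[:9] + [VM("db1", 1000, 2048)]

print(tolerable_host_failures(hosts, small_vms))
print(tolerable_host_failures(hosts, mixed_vms))
```

The point of the two runs is the one that tends to surprise people at the whiteboard: in this model a single VM with a big reservation sets the slot size for the whole cluster, so the number of host failures the cluster can absorb drops even though the total workload has barely changed.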
There you have it; the final verdict is a buy. I think the fact that I have actually picked it up a few times and quickly read a small section whilst thinking about a design related issue shows the overall usefulness of the book. Get yourself a Christmas present or tell your mum, otherwise you may be stuck with André Rieu!