Monitoring
Your monitoring system provides the following functions:
- Ensures that you are alerted to any pending problems
- Allows you to investigate the current and historical state of your environment to assist in troubleshooting
- Provides uptime and usage information for management reporting
- Provides capacity management projections
- Free space of Datastores
- Free space of Service Consoles
- List of orphaned snapshots
- List of long running snapshots
- Failed (automatic) VMotions
- VMware Tools running in guests
- Size of VC database
- Monitor CPU READY (ms) or CPU %READY per VM per host
- Monitor %CPU BUSY percentages per VM per host
- Monitor network and disk I/O usage per VM per Host
- Monitor service console memory swap usage
- Monitor VM balloon memory and swap usage
- Host downtime reporting
- Server hardware faults (power supplies, fans, IO cards, disks, CPUs, RAM)
- SAN hardware faults (disks and vendor specific)
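For the CPU READY item in the list above, the two units are related by the sample interval: percent ready is the ready time in milliseconds divided by the interval length. The following is a minimal sketch of that conversion, assuming the 20-second sample interval of the vCenter real-time charts (longer roll-ups use longer intervals). The function names, the per-vCPU averaging and the 5% alert threshold are illustrative assumptions, not VMware-defined values.

```python
from typing import Dict, List, Tuple


def cpu_ready_percent(ready_ms: float, interval_s: float = 20.0, vcpus: int = 1) -> float:
    """Convert a CPU ready summation in ms into a percentage of the sample interval.

    interval_s defaults to 20 s (real-time chart interval). Dividing by vcpus
    gives an average per-vCPU figure, which is an assumption about how you
    want to normalise multi-vCPU virtual machines.
    """
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval_s and vcpus must be positive")
    return ready_ms * 100.0 / (interval_s * 1000.0 * vcpus)


def flag_high_ready(samples: Dict[str, Tuple[float, int]],
                    threshold_pct: float = 5.0) -> List[str]:
    """Return the names of VMs whose ready percentage exceeds threshold_pct.

    samples maps a VM name to (ready_ms, vcpus). The threshold is a
    hypothetical starting point; tune it for your own environment.
    """
    return sorted(
        name
        for name, (ready_ms, vcpus) in samples.items()
        if cpu_ready_percent(ready_ms, vcpus=vcpus) > threshold_pct
    )
```

Calling `flag_high_ready({"web01": (1000, 1), "db01": (4000, 2)})` would report only `db01`, because 4000 ms across two vCPUs is 10% of a 20-second interval, while 1000 ms on one vCPU is exactly 5% and does not exceed the threshold.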
Your monitoring will almost certainly include VMware vCenter Server and your hardware monitoring platform. These are often supplemented by a VMware-specific product such as Vizioncore vFoglight, Veeam Monitor or Nimsoft.
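Whichever tool collects the data, checks such as datastore free space and long running snapshots come down to simple thresholds. The sketch below assumes the inventory has already been exported into plain records; it does not call any VMware API, and the record types, the 20% free-space floor and the three-day snapshot age limit are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass
class Datastore:
    name: str
    capacity_gb: float
    free_gb: float


@dataclass
class Snapshot:
    vm: str
    name: str
    created: datetime


def low_space(datastores: List[Datastore], min_free_pct: float = 20.0) -> List[str]:
    """Return names of datastores whose free space is below min_free_pct."""
    return [
        d.name
        for d in datastores
        if d.capacity_gb > 0 and d.free_gb / d.capacity_gb * 100.0 < min_free_pct
    ]


def long_running(snapshots: List[Snapshot], now: datetime,
                 max_age: timedelta = timedelta(days=3)) -> List[Tuple[str, str]]:
    """Return (vm, snapshot name) pairs for snapshots older than max_age."""
    return [(s.vm, s.name) for s in snapshots if now - s.created > max_age]
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the check easy to test against a known set of snapshots.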
Your management processes and procedures provide the following functions:
- A list of maintenance activities to perform on a periodic basis
- Formal health check
- Update templates with patches and updates
- A list of operational procedures on how to perform standard maintenance and troubleshooting tasks.
- A change management impact matrix to detail the potential impact and risk of a particular type of change.
- The procedure to create a new virtual machine
- The procedure to place a new virtual machine within the virtual infrastructure into a Production state. This may be identical to the physical server commissioning procedure.
- The procedure to place an ESX server into and then out of maintenance mode, migrating the guests onto other ESX Server hosts.
- The procedure used to contact VMware for support. It should include contact information and specify contact methods as well as means of collecting information.
- The procedure to add a LUN to an existing ESX server cluster.
- The procedure to patch a template used for creating virtual machines.
- The procedure to create a snapshot of a virtual machine.
- The procedure to restore the virtual machine state to its previous state at the start of the snapshot.
- The procedure for investigating user reported virtual machine performance issues. What to check and how to respond.
- The procedure to add a disk to an existing virtual machine.
- The procedure to expand the size of an existing disk for a virtual machine.
- The procedure to shrink a disk used by a virtual machine.
- The procedure to remove a disk from a virtual machine.
- The procedure to decommission a virtual machine.
- The procedure to migrate (VMotion) a virtual machine between ESX Server hosts in the same ESX cluster.
- The procedure to build an ESX server.
- The procedure to add an ESX server into an existing ESX cluster.
- The procedure to migrate a virtual machine between ESX Server hosts in different ESX clusters (i.e. between datacenters).
- The procedure to confirm that a SAN link is active, to be used after a SAN link has failed and been restored.
- The procedure to confirm that a network link is active, to be used after a network link has failed and been restored.
- The procedure to enable the network group to troubleshoot user reported network / performance issues.
- The procedure for backing up/restoring VMs (VM-level and file-level).
- The procedure for backing up/restoring the VirtualCenter database.
- The procedure for backing up/restoring license server files (or keys).
- The procedure for restoring VirtualCenter Server.
- The procedure for restoring ESX hosts.
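The change management impact matrix mentioned in the list above can be kept as a simple lookup table. The following is only a sketch of one possible shape: the change types are taken from the procedures listed here, but the impact and risk ratings, the scoring rule and the approval threshold are hypothetical and would need to come from your own change management process.

```python
from enum import IntEnum
from typing import Dict, Tuple


class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical ratings: change type -> (impact, risk).
IMPACT_MATRIX: Dict[str, Tuple[Level, Level]] = {
    "add disk to VM": (Level.LOW, Level.LOW),
    "patch template": (Level.LOW, Level.MEDIUM),
    "add LUN to cluster": (Level.MEDIUM, Level.MEDIUM),
    "add ESX server to cluster": (Level.HIGH, Level.MEDIUM),
}


def needs_formal_approval(change: str, threshold: int = 4) -> bool:
    """Return True when impact x risk reaches the (assumed) approval threshold.

    Change types missing from the matrix are treated as HIGH/HIGH so that
    an unclassified change is never waved through by default.
    """
    impact, risk = IMPACT_MATRIX.get(change, (Level.HIGH, Level.HIGH))
    return int(impact) * int(risk) >= threshold
```

Treating unknown change types as the highest rating is a deliberately cautious choice; it encourages the matrix to be extended whenever a new kind of change appears.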
Do you have any elements you also find important for Operations? Post in the comments.
Rodos
