Server virtualization can lower costs and make fuller use of physical servers by sharing them among several virtual machines. However, there are a number of best practices to consider when planning or running a virtualized environment.
When using virtualization systems from VMware Inc. and other vendors, it's important to remember that there's a physical server with physical storage underneath the virtualization layer. This means that you still can't exceed the capacity of your basic hardware, and virtualization will actually give you somewhat less total capacity because the virtualization layer itself consumes resources. So while you can balance your load by putting applications that peak at different times on different virtual machines (VMs), their combined load at any moment must still stay within the limits of your hardware.
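The capacity limit described above can be made concrete with a small calculation. The following is a minimal sketch, not a sizing tool: the VM names, the load figures, and the 10% overhead fraction are all hypothetical values chosen for illustration, and real virtualization overhead varies by platform and workload.

```python
def combined_peak(profiles):
    """Return the highest summed load across all time slots."""
    return max(sum(slot) for slot in zip(*profiles.values()))


def fits_on_host(profiles, physical_capacity, overhead_fraction=0.10):
    """Check the combined peak against capacity left after overhead.

    overhead_fraction is an assumed figure, not a vendor-published one.
    """
    usable = physical_capacity * (1.0 - overhead_fraction)
    return combined_peak(profiles) <= usable


# Hypothetical I/O load (MB/s) for each VM across four time slots.
profiles = {
    "mail":   [80, 20, 10, 10],
    "backup": [5, 5, 10, 90],
    "web":    [20, 60, 60, 20],
}

print(combined_peak(profiles))        # busiest slot, summed over all VMs
print(fits_on_host(profiles, 130))    # 130 MB/s host, 10% assumed overhead
print(fits_on_host(profiles, 140))    # 140 MB/s host, 10% assumed overhead
```

Even though no single VM exceeds 90 MB/s, the busiest time slot adds up to 120 MB/s, so a host that looks adequate on paper can fall short once overhead is subtracted.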
Because of this, server virtualization software requires a slightly different set of best practices when there are problems with your storage subsystem.
Poor I/O can impact the system
The most critical factor in a VMware installation is generally I/O bandwidth. The I/O performance of your physical system will have more to do with the overall performance of the system than its storage capacity will.
Anything that degrades I/O, such as a failed disk in a RAID array, will have a major impact on the performance of your virtualized system. If you're having performance issues, one of the first things you should check is I/O. For example, RAID arrays should continue to work even with a bad disk, but they will slow down considerably and likely affect your virtual machines as well.
Monitor virtual machine performance
Virtualization introduces a whole new level of performance statistics. In addition to being concerned with the performance of the underlying physical hardware, you now need to pay attention to the performance of individual virtual machines.
Fortunately, VMware can monitor a number of statistics to measure the performance of your virtual machines. For tuning and troubleshooting purposes, you should be familiar with these numbers, particularly disk command aborts and memory swap-in and memory swap-out.
Disk command aborts are requests that have timed out because the disk is taking too long to respond. They indicate a problem such as an I/O bottleneck or a poorly configured disk.
Memory swap-in and memory swap-out each measure activity in the virtual machine's virtual memory. A large number of memory swaps indicates that the VM doesn't have enough memory and its parameters need to be adjusted. It's important to note that this refers to memory assigned to the VM, not necessarily physical memory.
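As a rough illustration of how these statistics might be used in troubleshooting, the sketch below checks one sample of per-VM numbers and reports what each symptom suggests. The dictionary keys (`disk_command_aborts`, `mem_swap_in`, `mem_swap_out`) and the threshold are hypothetical names invented for this example; they are not VMware counter identifiers, and the data would have to come from whatever monitoring export you use.

```python
def diagnose(sample, swap_threshold=0):
    """Return a list of findings for one VM's performance sample.

    sample is a plain dict with hypothetical keys; missing keys count as 0.
    """
    findings = []
    if sample.get("disk_command_aborts", 0) > 0:
        findings.append(
            "disk command aborts: check for an I/O bottleneck "
            "or a poorly configured disk"
        )
    swaps = sample.get("mem_swap_in", 0) + sample.get("mem_swap_out", 0)
    if swaps > swap_threshold:
        findings.append(
            "memory swapping: the VM may need more assigned memory"
        )
    return findings


# Hypothetical samples for two VMs.
healthy = {"disk_command_aborts": 0, "mem_swap_in": 0, "mem_swap_out": 0}
struggling = {"disk_command_aborts": 3, "mem_swap_in": 120, "mem_swap_out": 45}

for name, sample in (("healthy", healthy), ("struggling", struggling)):
    print(name, diagnose(sample))
```

The swap threshold defaults to zero here only to keep the example simple; in practice you would pick a value that matches what "a large number" means for your workloads.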
VMware's vCenter Server can help you make the most of server virtualization statistics. With this tool, you can maintain logs going back five years, as opposed to the one-hour maximum that the VMware ESX and VMware ESXi platforms provide on their own.
Eliminate single points of failure
The failure of a single piece of hardware can take down a dozen or more virtual servers. To guard against this, storage managers should carefully plan their storage infrastructure so that no single physical component can cut off the virtual machines from their data. This includes such features as redundant data paths between the physical server and storage system, multiple host bus adapters (HBAs) on both ends of the storage-area network (SAN), and RAID or even mirrored RAID on your storage.
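One way to reason about redundant data paths is to model the hardware as a graph and ask which components, if removed, would disconnect the server from its storage. The sketch below does this with a breadth-first search; the component names and topologies are hypothetical, and the two endpoints themselves (the server and the array) are deliberately left out of the check.

```python
from collections import deque


def reachable(links, source, target, removed=None):
    """Breadth-first search from source to target, skipping one component."""
    adjacency = {}
    for a, b in links:
        if removed in (a, b):
            continue  # a link through the failed component is unusable
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


def single_points_of_failure(links, source, target):
    """List intermediate components whose loss disconnects source from target."""
    components = {node for link in links for node in link} - {source, target}
    return sorted(
        c for c in components
        if not reachable(links, source, target, removed=c)
    )


# Hypothetical topology with one HBA and one switch.
single_path = [("server", "hba1"), ("hba1", "switch1"), ("switch1", "array")]

# The same topology with a second, independent path added.
redundant = single_path + [
    ("server", "hba2"), ("hba2", "switch2"), ("switch2", "array"),
]

print(single_points_of_failure(single_path, "server", "array"))
print(single_points_of_failure(redundant, "server", "array"))
```

With only one path, both the HBA and the switch show up as single points of failure; adding the second path leaves none among the intermediate components.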
Ensure the write cache works properly
Because I/O is so important to virtualized systems, you need to ensure that the write cache on your RAID controller is working properly. For example, a dead or missing battery on the controller card can disable the card's write cache, which typically slows writes to the array and, in turn, the virtual machines that depend on it.