My VMware server has two 300GB SATA drives installed, using software RAID. I recently discovered one reason this system has been rather slow: the jumper on the drives was set to restrict throughput to 1.5Gb/sec. After changing this jumper and installing updates, the system seemed to behave better, but not always.
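If you want to check the negotiated link speed without opening the case, a minimal sketch is below. It assumes a Linux host that exposes `/sys/class/ata_link` with a `sata_spd` attribute (present in reasonably modern kernels; older ones may lack it, in which case this just returns nothing):

```python
import glob

def link_speeds(pattern="/sys/class/ata_link/link*/sata_spd"):
    """Return {sysfs path: negotiated SATA speed} for each ATA link.

    On hosts without /sys/class/ata_link the glob matches nothing
    and an empty dict is returned.
    """
    speeds = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            speeds[path] = f.read().strip()  # e.g. "1.5 Gbps" or "3.0 Gbps"
    return speeds

if __name__ == "__main__":
    for path, speed in link_speeds().items():
        print(f"{path}: {speed}")
```

A drive still jumpered down would show "1.5 Gbps" here even though the controller supports more.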
After talking to a friend, I've learned that IDE (yes, SATA drives still speak an IDE-style protocol here) is a really dumb protocol, meaning it leaves a lot of the processing to be done by the host computer in software rather than handling the data entirely on its own. Since this is a virtual server host, more than one virtual machine (VM) is running on it. Each VM is doing its own disk I/O, creating a queue of work to be pushed to the disks, and the host then has to process those requests and send the commands to the drives.
During one particularly heavy load, I was watching ‘top’, and found the following:
top - 02:01:53 up 13 days, 39 min, 2 users, load average: 24.65, 20.69, 12.11
Tasks: 110 total, 4 running, 106 sleeping, 0 stopped, 0 zombie
Cpu(s): 3.2%us, 9.0%sy, 0.0%ni, 2.0%id, 84.9%wa, 0.0%hi, 0.9%si, 0.0%st
WOW! I've never seen a load average that high! Looking further, even though the CPU is not doing much actual work, we see that 84.9% of the CPU's time (the %wa column) is spent simply waiting for I/O to complete.
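Rather than eyeballing `top` during an incident, the %wa figure can be pulled out of the Cpu(s) line programmatically. A small sketch (the helper name is my own, not part of any standard tool):

```python
import re

def iowait_pct(cpu_line):
    """Extract the %wa (I/O wait) percentage from a top 'Cpu(s):' line."""
    m = re.search(r"([\d.]+)%wa", cpu_line)
    if m is None:
        raise ValueError("no %wa field found in line")
    return float(m.group(1))

line = "Cpu(s): 3.2%us, 9.0%sy, 0.0%ni, 2.0%id, 84.9%wa, 0.0%hi, 0.9%si, 0.0%st"
print(iowait_pct(line))  # -> 84.9
```

Feeding this from `top -b -n 1` in a cron job would make it easy to alert whenever the host becomes I/O-bound like this.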
I recently ordered a 3ware SATA RAID card to install in this machine, which will hopefully alleviate this issue. Meanwhile, I wait for this process to complete before I'm able to access the VMs reliably.