Solaris and High Wait I/O CPU
Given the fact that the application was running on a Solaris platform I looked at the vmstat history logs kept for just such an investigation. Per vmstat, for the time in question, there was plenty of idle cpu. Immediately, I thought this person must have been looking at the sar data on the machine in question. Sure enough, the sar data indicated a very low percentage of idle cpu. As you might have guessed, the percentage of time the system was waiting for I/O was rather large according to sar and, consequently, low idle time was being reported. I explained that it was typical for this system to run a high wait I/O percentage as reported by sar; after all, it is a database server with many processors. I also explained that low idle time as reported by sar does not necessarily mean a cpu bottleneck exists.
I remembered reading in Adrian Cockroft’s book, Sun Performance Tuning, that vmstat lumps wait I/O into idle time. So, naturally I was confident in my counter-assertion that our cpu utilization was just fine. I assuredly reached for my copy of the Sun Performance Tuning book to show where I had read this information years ago. I searched the index of the book and gave the book a cursory once-over to no avail. I started doubting whether I had reached for the wrong text! A bit frustrated I decide to perform a full book scan. Low and behold, I only got past two pages before my memory was vindicated. On page 3 it reads. “Whenever there are any blocked processes, all cpu idle time is treated as wait for I/O time! The vmstat command correctly includes wait for I/O in its idle value…” Viola!
The clock interrupt handler in the Solaris operating system runs every 10ms (or at least used to) to get cpu utilization information. It will search the state structure for each cpu and find that each cpu is in one of five states: user, system, idle, waiting for I/O or quiesced. Based on my understanding, the quiesced state is not really indicated by a value stored in a structure or variable associated with a cpu. It is simply the state when a cpu is not running user, system or idle threads and not waiting for I/O.
The point is, a high value for wait I/O generated from sar on a Solaris platform does not indicate a cpu bottleneck. Moreover, high wait I/O values do not necessarily indicate an I/O bottleneck. However, an I/O bottleneck could very easy manifest in high wait I/O percentages. You really need to look at your I/O service times to determine if the I/O subsystem is performing poorly.
For those wanting to know more on the algorithm used by Solaris to calculate idle and wait I/O cpu percentages read here. It is a bit dated, but describes how wait I/O is tallied in the Solaris operating system (at least in earlier versions). Interestingly enough this article cites Sun Performance Tuning, my trusty reference.