Wednesday, December 06, 2006

Increasing the Longevity of Your CPU

One of my current assignments is to evaluate the potential to increase CPU headroom on a server running a large OLTP Oracle database. Of course, any project such as this is typically motivated by the desire to save money by forgoing a seemingly imminent hardware upgrade. More CPU headroom for your Oracle database server can be realized through approaches including, but not limited to, the following:

1. Add more same-speed CPUs to your existing server.
2. Replace your existing CPUs with faster CPUs.
3. Replace your existing CPUs with a greater number of faster CPUs.
4. Commission a new server platform with more same-speed and/or faster CPUs.
5. Commission a new server platform with a greater number of slower CPUs.
6. Chronologically distribute the load on the system to avoid spikes in CPU.
7. Reduce the work required of the system to satisfy the business.

More often than not I suspect approaches 1-5 are chosen. I am of the opinion that 1) and 3) are more predictable when trying to evaluate the expected CPU headroom yield. Propositions 2), 4) and 5) can be a little less predictable. For example, if I double the speed of my current CPUs, will I yield the upgraded CPU cycles as headroom? That is, if I am running 10x500MHz and upgrade to 10x1GHz, will I now have the additional 5GHz as headroom? It has been my experience that upgrades such as these do not produce such predictable results, especially if your current box approaches 100% utilization. Certainly, moving to a new server with a greater number of same-speed and/or faster CPUs is a tricky proposition. New servers need to be tested with great rigor using Production volume. While at face value 500MHz would appear to be universally “portable” to any server, many other factors can influence your CPU horsepower: memory architecture, amount of processor cache, processor crosstalk, etc. Options 1-5 can all be very costly and in some cases yield undesirable and unpredictable results.

If you have the luxury of distributing the load on your system to avoid spikes in CPU then that is a great first option. It could buy you more time to evaluate a longer-term solution. For example, shifting any batch jobs to off-peak OLTP hours might give you immediate relief.

What if we can simply “do less” to satisfy the needs of the business? This concept is not new to most Database Administrators and sounds rather clichéd. After all, aren’t we brow-beaten by the dozens of books and countless articles that speak to database and SQL optimization? The “do less” principle is very sound, but it can be intractable in practice. Reducing the work required of an application often requires management support and can run into political obstacles at every turn. Getting Application Developers and Database Administrators to work in lockstep can require a significant effort. If Management, Developers and Database Administrators buy into a synergistic endeavor, the benefits can be amazing – and can save the company a large sum of money.

If you are lucky enough to be working on a project where the common goal of all parties is to reduce the CPU load on your system then I have learned a few things that I hope can help you.

Identify the Targets for Optimization

Identify those SQL statements that contribute the most to the CPU load on your database server. These are usually the statements that produce the most logical I/O on your database. Take care when identifying these statements: you shouldn’t focus solely on those with the highest logical I/O (LIO) to execution ratio. Often you will find statements that are well optimized but are executed with extremely high frequency. Look instead at the aggregate LIO footprint of a SQL statement. Without Statspack or AWR this analysis might be very difficult. However, if you collect this diagnostic data you can use the LEAD analytic function to craft a nice SQL statement to identify the top CPU-consuming statements on your system (join stats$sql_summary and stats$snapshot).
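As a sketch of that analysis, LEAD can turn the cumulative per-snapshot counters into per-interval deltas. The column names here (hash_value, buffer_gets, executions) follow the Statspack schema I have worked with, so verify them against your release:

```sql
-- Per-interval LIO and execution deltas per statement, computed with LEAD.
-- Statspack counters are cumulative per cached cursor; a statement aging
-- out of the Shared Pool resets its counters, so negative deltas are
-- filtered out below.
SELECT hash_value, snap_time, gets_delta, execs_delta
FROM (
    SELECT s.hash_value,
           sn.snap_time,
           LEAD(s.buffer_gets) OVER
               (PARTITION BY s.hash_value ORDER BY s.snap_id)
             - s.buffer_gets  AS gets_delta,
           LEAD(s.executions) OVER
               (PARTITION BY s.hash_value ORDER BY s.snap_id)
             - s.executions   AS execs_delta
    FROM  stats$sql_summary s,
          stats$snapshot    sn
    WHERE sn.snap_id         = s.snap_id
      AND sn.dbid            = s.dbid
      AND sn.instance_number = s.instance_number
)
WHERE gets_delta >= 0
ORDER BY gets_delta DESC;
```

Restrict the snapshot range in the WHERE clause to bound the analysis to your peak window.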

Don’t limit your SQL statement identification to just those statements flagged by your analysis as top CPU consumers. Go another step and identify the most frequently executed statements. Some of the most frequently executed statements are among the most optimized on your system, yet when executed by many programs concurrently they can influence concurrency and thus CPU load. One approach I took recently identified the top 20 CPU-consuming statements during a 12-hour window of each weekday. I then ran the same analysis against the most frequently executed statements on the system. The two lists yielded only 31 distinct statements, as 9 appeared on both. The amazing thing is that, on average, these 31 statements contributed 58% of all logical reads on the system and 59% of all executions. Keep in mind that there were over 20,000 distinct statements cached in the Shared Pool. It is rather amazing that such a small subset of the application footprint contributed so greatly to the aggregate load.
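The frequency list falls out of the same delta logic, aggregated and ranked by executions instead of buffer gets (again a sketch against the Statspack columns I assume above):

```sql
-- Top 20 statements by execution frequency over the analysis window,
-- using the classic inner ORDER BY / outer ROWNUM top-N pattern.
SELECT *
FROM (
    SELECT hash_value, SUM(execs_delta) AS total_execs
    FROM (
        SELECT s.hash_value,
               LEAD(s.executions) OVER
                   (PARTITION BY s.hash_value ORDER BY s.snap_id)
                 - s.executions AS execs_delta
        FROM  stats$sql_summary s
    )
    WHERE execs_delta >= 0
    GROUP BY hash_value
    ORDER BY total_execs DESC
)
WHERE ROWNUM <= 20;
```

Comparing this list with the top-LIO list (by hash_value) identifies the statements that appear on both.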

Ask The Right Questions

The identification phase is crucial as you want to optimize that which will yield the greatest benefit. Subsequent to the identification phase the Database Administrators and Developers can sit and discuss approaches to reduce the load incurred by these SQL statements. Here are some of the key points I have taken away during such collaborative efforts.

1. Is there a better execution plan for the statement? Optimization is often achieved by rewriting the query to get at a better execution plan. While I don’t like hinting code, hints can relieve pressure in a pinch.

2. Does the statement need to be executed? If you see SQL statements that seldom or never return rows (rows/exec approaches 0), there is a possibility they can be eliminated from your application.

3. Does the statement need to be executed so frequently? You might be surprised that Developers often have other application-side caching techniques that can dramatically reduce the frequency of a statement’s execution against the database. Or, the application might simply call the statement needlessly. It doesn’t hurt to ask!

4. Are the requirements of the business immutable? Sometimes you can work an optimization by simply redefining what is required. This is not the tail wagging the dog here. It is possible that the business would be completely happy with a proposed optimization. For example, can the query return just the first 100 rows found instead of all rows?

5. Do the rows returned to the application need to be sorted? Highly efficient SQL statements can easily have their CPU profile doubled by sorting the output.

6. Are all columns being projected by a query needed? If your application retrieves the entire row but needs only a very small subset of the attributes, it is possible you could satisfy the query using index access alone.

7. Is the most suitable SQL statement being executed to meet the retrieval requirements of the application? Suitability is rather vague but could apply to the number of rows fetched, any misplaced aggregation, insufficient WHERE clause conditions, etc.

8. Are tables being joined needlessly? I have encountered statements that, Developers determined, were joining a table and projecting some of its attributes without ever using the retrieved data. The inclusion of another table in such a manner can dramatically increase the logical I/O required. This is extremely difficult for a DBA to discern without intimate application code knowledge.

9. How well are your indexes clustered with your table(s)? Sometimes data reorganization techniques can greatly reduce the logical I/O required of a SQL statement, and sometimes index-organized tables (IOTs) prove to be very feasible solutions to poorly performing queries.

10. Can I add a better index or compress/rebuild an existing index to reduce logical I/O? Better indexing and/or index compression could take a query that required 10 logical I/O operations down to 5 or 6. This might feel like a trivial optimization, but if that statement is executed 300 times each second, it could save your system 1,500 logical I/Os per second. Never discount the benefit of a 50% reduction in an already seemingly optimized statement.

11. Can I reorganize a table to reduce logical I/O?
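Two of the points above lend themselves to quick sketches; the table, column, bind variable and index names here are purely hypothetical. Point 4: if the business is happy with the first 100 rows, cap the result set. Point 6: if only a couple of columns are projected, a composite index can satisfy the query without touching the table at all.

```sql
-- Point 4: classic top-N — ORDER BY in the inline view, ROWNUM outside.
SELECT *
FROM (
    SELECT order_id, order_date, status
    FROM   orders
    WHERE  status = 'OPEN'
    ORDER  BY order_date
)
WHERE ROWNUM <= 100;

-- Point 6: the query below projects only order_date for a given customer,
-- so this composite index lets Oracle answer it with index access alone.
CREATE INDEX orders_cust_date_ix ON orders (customer_id, order_date);

SELECT order_date
FROM   orders
WHERE  customer_id = :cust_id;
```

Verify with the execution plan that the second query really does avoid the table access before counting the LIO savings.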

I suspect most of us have read that 80% of optimization is application-centric (I tend to feel the percentage is higher). Usually the implication is that the SQL being generated and sent to the database is 80% of the target for optimization – more specifically, that optimization requires the tuning of SQL 80% of the time. However, don’t limit your efforts to optimize your application to “tuning the SQL.” Sometimes a portion of your optimization will include “tuning the algorithms” used by the application. Needless execution and improper execution of SQL statements can be equally destructive. Hardware resources, in particular CPUs, can be very expensive to purchase and license for production Oracle databases. It is well worth the effort to at least investigate the possibilities of increasing CPU headroom by decreasing CPU utilization.

Update: An astute reader suggested I mention Cary Millsap's Optimizing Oracle Performance with regard to this topic. I highly recommend reading this book as it weighs in heavily on Oracle optimization and Method R. Trust me, if you have optimization on the brain, don't miss this read.