Saturday, February 10, 2007

Physical Standby Turbo Boost

Is your physical standby database lagging way behind your production database? Maybe an outage to your standby environment has produced a lag that will not meet certain business requirements: reporting needs, disaster recovery time objective, testing, etc. When you don't have the luxury of performing a full production restore into your standby environment and your archive log files are not being consumed at an acceptable pace, you still have options that don't involve immediate architectural changes.

In some cases you can dramatically speed up your recovery time by copying a small subset of your production database to your standby environment and resume recovery. For example, if a large percentage of your database's write activity is absorbed by a small subset of your database you are primed for a standby recovery turbo boost. Notice I did not say small percentage of your data files. After all, you could have 90% of your writes going to 10% of your data files, but those data files might comprise 90% of your database footprint. In most cases a small percentage of your database files equates to a small subset of your database, but not always.

If a vast majority of writes go against a small subset of your database, how would copying these files to your standby give your recovery a boost? During recovery if Oracle does not need to recover a file it won't. All of those redo entries dedicated to recovering those files will just get passed over. Knowing this simple fact can help you get your physical standby database back on track to meet the needs of your business quickly.

The first order of business is to determine if the write skew condition exists in your database: those files, if copied to your standby, benefiting your recovery time the most. Fortunately, this information can be easily gathered using the v$filestat and v$datafile dynamic performance views in your production database. The following query will get you the top N most written to files in your database.

select * from
(select a.name, b.phyblkwrt from v$datafile a, v$filestat b
where a.file# = b.file# order by 2 desc)
where rownum < N;

If you know the data files that are getting written to the most in production then you also know the most frequently written to files on your standby during recovery. If Oracle can skip over redo entries during recovery then you avoid all of that physical and logical I/O against your standby data files. To recover a database block you have to perform a read and a write of that block. If your writes are somewhat evenly distributed amongst the files in your database then it will be more difficult to get that turbo boost. But, if 60+% of your database writes are absorbed by <= 10% of the database footprint you could gain a significant boost in the recovery time by shipping those files to your standby.

I know this is a rather short post, but this little tidbit just might help you get out of a physical standby database recovery dilemma.