Friday, March 03, 2006

RMAN is Absolutely Fuzzy

This is a bit of a follow-on to my post The Pleasure of Finding Oracle Things Out, in which I spoke of some Oracle internal mechanics relating to user-managed hot backups. In this post I want to focus on RMAN and how, subsequent to a restore of a data file from an RMAN backup set or an image copy on disk, Oracle knows what needs to be recovered. Remember, with user-managed backups the checkpoint SCN is frozen after the checkpoint issued by BEGIN BACKUP completes. This tells Oracle where to start recovery when the file is restored. We also know that the END BACKUP command creates a redo record that corresponds to this backup, tied to it via the checkpoint SCN. During recovery Oracle is privy to the operations that define the backup-necessary redo. That is, the BEGIN/END BACKUP commands act as recovery bookends to preserve consistency.

However, RMAN is a totally different beast. We know that RMAN permits us to take hot backups and restore either image copies or backup set formatted files for recovery. This is well documented and I am sure you have reaped the benefits of this RMAN feature in your DBA adventures. But how does it work? (I love that question.)

Yes, we read of RMAN handling the fractured block dilemma inherent in the user-managed backups by re-reading changed blocks to ensure block consistency. And, you have probably read that it reads the file header block first and only backs up blocks that have been previously modified (let’s stick to level 0 backups here). Oracle also issues a checkpoint on the applicable file(s) before backing up said file(s) for either a backup set or image copy. Okay, but what can we say of the bookends? When we restore a file from a backup set or an image copy what defines the bookends for the backup-necessary redo? Backup-necessary redo can be defined as the redo that is required to make a fuzzy/inconsistent backup file a non-fuzzy/consistent file. Of course, we often continue past the backup-necessary redo to get to a further point in Oracle time.
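The re-read of fractured blocks mentioned above can be pictured with a small sketch. This is only an illustration, not Oracle's implementation: the dictionary layout with `head` and `tail` values, the `read_consistent` helper and the `max_attempts` limit are all assumptions made for the example.

```python
def is_fractured(block):
    """Treat a block as fractured when its head and tail versions disagree.

    The 'head'/'tail' fields are a hypothetical stand-in for the real
    block header and footer information.
    """
    return block["head"] != block["tail"]


def read_consistent(read_block, max_attempts=5):
    """Re-read a block until a consistent image is seen.

    max_attempts is an assumption that keeps the sketch from looping forever.
    """
    for attempt in range(1, max_attempts + 1):
        block = read_block()
        if not is_fractured(block):
            return block, attempt
    raise RuntimeError("block still fractured after retries")


# First read catches the block mid-write, second read sees it whole.
reads = iter([{"head": 127, "tail": 126}, {"head": 127, "tail": 127}])
block, attempts = read_consistent(lambda: next(reads))
print(block, attempts)
```

Note that this only makes each individual block consistent; it says nothing about the file as a whole, which is where the bookends come in.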

If you stop for a moment and really think about it, defining the bookends for RMAN is not a very easy proposition. We know it works. We use it. We like it. How often do we question it?

Let’s create a very rudimentary example to demonstrate the nature of the RMAN bookends. In this example we will back up a file (level 0) comprised of six blocks (all blocks previously used and therefore subject to our level 0 backup) to a backup set; whether it goes to disk or tape does not matter. The figure below depicts the states of the blocks in our sample data file at different points in time. Block 0 is the header block and blocks 1-5 are for data. Listed under the block number is the checkpoint SCN for the header block and the block SCN for the data blocks. For example, at time=0 the state of the data file is a checkpoint completed at SCN 120, and blocks 2, 4 and 5 have been modified since the last checkpoint. Blocks 1 and 3 were modified prior to the checkpoint at SCN 120. What would happen if we started an RMAN backup at time=0? Oracle would issue a checkpoint on the file and start reading the blocks in order from 0-5. For simplicity assume we have only a single backup process reading the data file during the backup. Also assume the backup finishes reading the data file at time=4. What would the backup piece “contain” such that a subsequent restore would be primed for recovery?

First we need to know which blocks are backed up, and when, before we can determine what will be written to the backup set. Let’s use the following diagram to depict the state of the backup set if the backup is started at time=0.


At time=0 RMAN issues a checkpoint against the data file and reads the header block into memory. Assume the header block is buffered in the Large Pool (Oracle could just as easily buffer the block in the memory of the shadow process backing up the data file). I found a presentation on the Web authored by a member of the Oak Table that asserts the header block is read first and written last. Indeed, we will see this needs to be the case to satisfy both bookends of the backup-necessary redo.

As we can see, at time=1 we have only backed up block 1. At time=2 RMAN backs up block=2. At time=3 it backs up blocks 3-4. Lastly, at time=5 it backs up block=5. Remember, before RMAN started it issued a checkpoint against the file and preserved the SCN from the completion of the checkpoint in the buffered image of the header block; the header block checkpoint SCN in this case was 125, as the checkpoint completed at time=1. Notice the state of the blocks written to the backup set. They vary in Oracle time as they contain different SCN values. The fact that block=4 has an SCN that is greater than the checkpoint SCN recorded in the buffered image of the header makes the backup of this file inconsistent or fuzzy (file header status 0x40). How does Oracle resolve this? Well, Oracle reads the SCN of each block during the course of the backup. If a single block backed up has an SCN greater than the checkpoint SCN, all intermediate SCNs need to be accounted for during recovery. For instance, in our example block=1 was changed to SCN=126 while the backup was being performed. That SCN is less than the highest block SCN encountered (block=4 has SCN=127), yet the backup set holds this block as of SCN=123. If Oracle knows the highest SCN encountered during the course of the backup, it can account for any recovery needed between the checkpoint SCN and this highest SCN.
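To make the bookkeeping concrete, here is a minimal simulation of the header-first, header-last ordering. It is a sketch under stated assumptions, not a description of RMAN's actual code: `BackupHeader` and `back_up_file` are hypothetical names, and only the SCNs for block 1 (123), block 4 (127) and the checkpoint (125) come from the example; the SCNs for blocks 2, 3 and 5 are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class BackupHeader:
    checkpoint_scn: int
    absolute_fuzzy_scn: int = 0  # 0 means no block was newer than the checkpoint
    fuzzy: bool = False          # stands in for the fuzzy header status flag


def back_up_file(checkpoint_scn, blocks_as_read):
    """Simulate one backup process reading a file in block order.

    blocks_as_read maps block number -> SCN the block carried at the
    moment it was copied into the backup set.
    """
    header = BackupHeader(checkpoint_scn)   # header is read and buffered first
    backup_set = []
    highest_scn = 0
    for block_no in sorted(blocks_as_read):
        scn = blocks_as_read[block_no]
        backup_set.append((block_no, scn))
        highest_scn = max(highest_scn, scn)  # track the highest SCN seen
    if highest_scn > checkpoint_scn:
        header.fuzzy = True
        header.absolute_fuzzy_scn = highest_scn
    backup_set.append(("header", header))    # header is written last
    return header, backup_set


# Blocks 1 and 4 use the SCNs from the example; 2, 3 and 5 are assumed values.
blocks = {1: 123, 2: 122, 3: 118, 4: 127, 5: 124}
header, backup_set = back_up_file(125, blocks)
print(header)
```

With these inputs the buffered header ends up marked fuzzy with a highest SCN of 127, which is exactly the piece of information that can only be known once every data block has been read.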

As it turns out, Oracle reserves a section of the file header block of each file for just such an occurrence. This is called the Absolute Fuzzy SCN and represents the SCN required for recovery to make this a consistent file. Our bookends are then defined as the checkpoint SCN and the Absolute Fuzzy SCN. At a minimum, Oracle must recover from the checkpoint SCN through the Absolute Fuzzy SCN for consistency. If Oracle did not detect any SCNs higher than the checkpoint SCN during the backup then the backup would be considered consistent (file header status 0x0) and the Absolute Fuzzy SCN would remain at 0x0 - obviating the need for any backup-necessary redo to be applied. As you can see, this is the reason Oracle waits until all data blocks in the file have been read and written before it writes the header to the backup set. This permits the proper settings for the bookends. You can find the Absolute Fuzzy SCN at the bottom of a file header dump.

.
.
Absolute fuzzy scn: 0x0000.004da811
Recovery fuzzy scn: 0x0000.00000000 01/01/1988 00:00:00
Terminal Recovery Stamp scn: 0x0000.00000000 01/01/1988 00:00:00
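As a rough illustration of how these values define the bookends, the sketch below parses the SCN lines from a dump excerpt like the one above and derives the minimum recovery range. It assumes the usual wrap.base notation (a 16-bit wrap and a 32-bit base); the helper names and the checkpoint SCN used in the demo are hypothetical, since the excerpt does not show the checkpoint SCN.

```python
import re

# Matches lines such as "Absolute fuzzy scn: 0x0000.004da811"
SCN_LINE = re.compile(
    r"^(?P<name>[A-Za-z ]+?) scn:\s*0x(?P<wrap>[0-9a-fA-F]{4})\.(?P<base>[0-9a-fA-F]{8})"
)


def parse_scn_lines(dump_text):
    """Return {lower-cased field name: SCN as an integer}."""
    scns = {}
    for line in dump_text.splitlines():
        m = SCN_LINE.match(line.strip())
        if m:
            wrap = int(m["wrap"], 16)
            base = int(m["base"], 16)
            scns[m["name"].lower()] = (wrap << 32) | base
    return scns


def backup_necessary_redo(checkpoint_scn, absolute_fuzzy_scn):
    """Return the (start, end) bookends, or None for a consistent backup."""
    if absolute_fuzzy_scn == 0:
        return None
    return (checkpoint_scn, absolute_fuzzy_scn)


dump = """\
Absolute fuzzy scn: 0x0000.004da811
Recovery fuzzy scn: 0x0000.00000000 01/01/1988 00:00:00
Terminal Recovery Stamp scn: 0x0000.00000000 01/01/1988 00:00:00
"""
scns = parse_scn_lines(dump)
# The checkpoint SCN below is an assumed value for the demo only.
assumed_checkpoint_scn = 0x4DA800
print(backup_necessary_redo(assumed_checkpoint_scn, scns["absolute fuzzy"]))
```

A return value of None corresponds to the consistent case described earlier, where the Absolute Fuzzy SCN stays at zero and no backup-necessary redo has to be applied.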

What about image copies?

RMAN does not really “behave” any differently for an image copy. It still performs a checkpoint and still notes the Absolute Fuzzy SCN if applicable. What is different, however, is that an image copy is, well, just that: an image. Therefore, the header block sits in its natural position at the logical front of the backed-up data, not at the end as in the backup set case. This is yet another reason, if not the primary reason, why Oracle does not permit image copies to be written directly to tape: it could not set the Absolute Fuzzy SCN in the header block because the header block would have already been written to a sequential access medium.
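The sequential-medium argument can be illustrated with a toy model. Everything here is an assumption made for the sketch: the two-field header layout, the one-field blocks, and the `SequentialDevice` and `write_image_copy` names. It only shows why a header at the front must be patched after the fact, and why that requires a seekable destination.

```python
import io
import struct

HEADER = struct.Struct(">QQ")  # hypothetical header: checkpoint SCN, absolute fuzzy SCN
BLOCK = struct.Struct(">Q")    # hypothetical data block: just its SCN


class SequentialDevice(io.RawIOBase):
    """Hypothetical append-only medium standing in for tape."""

    def __init__(self):
        super().__init__()
        self.data = bytearray()

    def writable(self):
        return True

    def seekable(self):
        return False

    def write(self, b):
        self.data += b
        return len(b)


def write_image_copy(dest, checkpoint_scn, block_scns):
    """Write the header first, then the blocks, then patch the header."""
    dest.write(HEADER.pack(checkpoint_scn, 0))
    highest = 0
    for scn in block_scns:
        dest.write(BLOCK.pack(scn))
        highest = max(highest, scn)
    fuzzy_scn = highest if highest > checkpoint_scn else 0
    if fuzzy_scn:
        if not dest.seekable():
            raise io.UnsupportedOperation("header already written; cannot seek back")
        dest.seek(0)
        dest.write(HEADER.pack(checkpoint_scn, fuzzy_scn))
    return fuzzy_scn


disk = io.BytesIO()  # random-access destination
write_image_copy(disk, 125, [123, 122, 118, 127, 124])
print(HEADER.unpack_from(disk.getvalue()))

tape = SequentialDevice()
try:
    write_image_copy(tape, 125, [123, 127])
except io.UnsupportedOperation as exc:
    print("tape:", exc)
```

On the in-memory "disk" the header can be rewritten in place once the highest SCN is known; on the append-only device the same step fails, which mirrors the reasoning above.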

13 Comments:

Anonymous said...

great article - i've been wanting to know more about the underlying mechanics of this.
well explained and good diagrams.

3/19/2007 9:03 AM  
Eric S. Emrick said...

I am glad you liked the article. It was a really interesting bit of research.

3/19/2007 7:58 PM  
Anonymous said...

Great article Eric. I have always wanted to sit down and do something similar but I am still trying to work out how folks like you find the time to devote to the detail and I have yet to do so :-)

Martin

11/02/2007 3:27 AM  
Anonymous said...

Eric:

This was a great article.

Brad

5/01/2008 7:32 PM  
Anonymous said...

Great Eric good work

I want to ask one question: when there are no fuzzy blocks, i.e. during an online backup using RMAN with no activity going on, I think no block would become fuzzy. But after restoring the backup, why does it still require recovery?

7/16/2008 6:28 AM  
Eric S. Emrick said...

@ "when there are no fuzzy blocks, i.e. during an online backup using RMAN with no activity going on, I think no block would become fuzzy. But after restoring the backup, why does it still require recovery?"

When RMAN is taking a backup and it encounters a block that is inconsistent (header/footer info do not match) it simply re-reads the block. It will continue to do so until it gets a consistent version of the block. The term fuzzy in the context of this article is with respect to the state of the file. Oracle requires the fuzzy SCN to be written to the file header to tell Oracle at what point in recovery the file would be consistent with respect to all of the blocks in just that file (it still might not be consistent with the database as a whole).

7/16/2008 11:19 PM  
Anonymous said...

Eric, thanks for the reply.

I might be missing something in your article, but as I understand it, during an RMAN online backup the blocks are backed up at different times. When the backup starts it triggers its own checkpoint. The checkpoint completes, and some blocks are backed up just after that with the SCNs they have at that time, but as time goes on blocks keep changing and end up with SCNs greater than the checkpoint SCN.

The backed-up file would now have some blocks whose SCNs are greater than the checkpoint SCN, the so-called fuzzy blocks, or in other words an inconsistent backup.

My question is: if there is no activity during the backup and no blocks are touched (i.e. no DML against the blocks), then no block SCN will ever go above the checkpoint SCN.

But even in this scenario the restored backup requires recovery. Why?

7/17/2008 9:28 AM  
Anonymous said...

Let me attempt to answer as to why you still need recovery. Even when the restored file is consistent with respect to all blocks in that file, the file itself may not be consistent with the rest of the database, i.e. it may not be consistent with the other data files. Hence the need to recover it to bring it on par with the rest of the database.
Am I right, Eric?

I have one question. The fuzzy SCN is recorded in the backup set, right? How did you get the dump of the header recorded in the backup set?

8/22/2008 1:55 PM  
Anonymous said...

Hi Eric,

Great article.
What would happen if at time=5, while block 5 is being backed up, block 4 gets written with SCN 128?

Thanks

8/27/2008 11:04 PM  
Helen said...

Great post! Thank you so much for explaining!

12/24/2012 6:48 AM  
