Thursday, February 16, 2006

Reliable Hot Backups Without Oracle Intervention?

In most cases, when we need to take a hot backup of an Oracle database, we employ either the user-managed hot backup approach using the ALTER TABLESPACE BEGIN BACKUP facility or we simply take an RMAN backup. Of course, you can also take a hot backup using hardware mirrors by combining the ALTER SYSTEM SUSPEND and ALTER TABLESPACE BEGIN BACKUP commands with vendor-specific mirror control interfaces. However, each of these approaches has a downside.

ALTER TABLESPACE BEGIN BACKUP

1. Redo generation overhead and potential redo infrastructure contention.
2. Backup processing consumes production server resources.
3. Iterating through your tablespaces one at a time, to minimize the performance impact on production, can take quite some time.
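
To make the third point concrete, here is a minimal Python sketch that generates the SQL*Plus statements for a user-managed hot backup taken one tablespace at a time. The function name and the tablespace names are hypothetical, and the copy step is only a placeholder comment for whatever OS or storage tooling you actually use.

```python
def hot_backup_script(tablespaces):
    """Return SQL*Plus lines that back up one tablespace at a time.

    Only one tablespace is in backup mode at any moment, which limits the
    extra redo but stretches out the overall backup window.
    """
    lines = []
    for ts in tablespaces:
        lines.append(f"ALTER TABLESPACE {ts} BEGIN BACKUP;")
        # Placeholder: copy this tablespace's data files with your own tooling.
        lines.append(f"-- copy the data files of {ts} here")
        lines.append(f"ALTER TABLESPACE {ts} END BACKUP;")
    # Archive the current log so the redo spanning the backup can be restored with it.
    lines.append("ALTER SYSTEM ARCHIVE LOG CURRENT;")
    return lines


if __name__ == "__main__":
    # Hypothetical tablespace names, for illustration only.
    print("\n".join(hot_backup_script(["USERS", "APP_DATA"])))
```

With thousands of data files the loop body runs thousands of times, and each BEGIN BACKUP triggers its own checkpoint work, which is where the elapsed time goes.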

RMAN

1. Backup processing consumes production server resources.
2. Careful planning is needed to ensure the RMAN metadata is maintained, i.e., frequent control file and/or RMAN catalog backups.
3. Additional layer of backup and recovery abstraction.

ALTER SYSTEM SUSPEND and Hardware Mirrors

1. Redo overhead associated with the recommended ALTER TABLESPACE BEGIN BACKUP.
2. All tablespaces must be in backup mode prior to splitting; once again redo overhead.
3. Suspension of all I/O causing immediate outages to application processing until your mirror is split. While this suspend/split might only take minutes, some 24x7x365.25 applications cannot afford zero application activity on a nightly or semi-nightly basis.
4. The cost of a third mirror. Ideally we don’t want to split our primary mirror, as we immediately become susceptible to media failure while the split mirror is out of sync with the production database.

It would be nice to avoid as many of these side effects as possible in your backup strategy. That is, could we take a "hot backup" that 1) does not require our tablespaces to be placed in hot backup mode, 2) does not require ALL application I/O to be suspended during a portion of the backup phase, and 3) does not require another layer of backup and recovery abstraction via additional Oracle metadata maintenance?

Yes. EMC has a really good suite of products in their TimeFinder [tm] solution set that can accommodate this. Within the TimeFinder solution set is the notion of a Composite Group [tm]. The Composite Group can be defined for the set of primary devices comprising your production database, paired with Business Continuance Volumes (BCVs), which are simply software-controlled mirrors. When you synchronize your primary devices with their respective paired BCV devices and initiate a consistent split operation, the EMC subsystem suspends WRITE operations to your primary devices only for the duration of the split operation. The split captures a single consistent point in time and typically takes only a few seconds. In recent versions of the TimeFinder product, read activity is permitted to flow during the split. This differs from the ALTER SYSTEM SUSPEND approach, where Oracle cannot guarantee the immediate termination of I/O and read activity is necessarily suspended as well. The end result of this consistent split is an “aborted” database on the mirror.

How do we convert this “aborted” image of the database on the mirror to a database that is meaningful for backup and recovery? Start up the database in mount mode and issue the RECOVER DATABASE command. Remember, the online redo logs are preserved in the synchronization process to the same point in time as the data files and control files. The RECOVER DATABASE command makes the database consistent with respect to a single point in time. As a matter of fact, the file headers have consistent stop SCNs and a file status flag of (0x0). Consequently, your mirror taken while the database was up and functional (1-3 seconds of write suspension) has been converted to a consistent backup. Please note, if you intend to back up your database(s) from the mirror devices, you cannot open the database on said devices, as that would roll back transactions that, in production, might have been committed. Naturally, there are license fees associated with TimeFinder and the third mirror needs to be purchased, among other considerations. But these additional expenditures can be evaluated against the benefits of the solution to determine if the product is worth the investment.
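
As a rough illustration of the mount-and-recover step, the Python sketch below holds the two SQL*Plus commands run against the mirror copy and adds a small guard that flags any statement that would open the database. The names `MIRROR_PREP_STEPS` and `would_open_database` are made up for this example, and the guard only recognizes the two obvious ways of opening a database; treat it as a sketch, not a complete safety check.

```python
# Statements run in SQL*Plus (as SYSDBA) against the mirror copy on the backup host.
MIRROR_PREP_STEPS = [
    "STARTUP MOUNT",     # mount only, using the control files captured by the split
    "RECOVER DATABASE",  # apply the online redo captured at the same point in time
]


def would_open_database(statements):
    """Return True if any statement would open the database on the mirror."""
    for stmt in statements:
        words = stmt.upper().rstrip(";").split()
        if words[:3] == ["ALTER", "DATABASE", "OPEN"]:
            return True
        # A STARTUP without MOUNT or NOMOUNT mounts and opens in one step.
        if words[:1] == ["STARTUP"] and not {"MOUNT", "NOMOUNT"} & set(words[1:]):
            return True
    return False


if __name__ == "__main__":
    assert not would_open_database(MIRROR_PREP_STEPS)
    print("\n".join(MIRROR_PREP_STEPS))
```

The guard reflects the warning above: once the mirror copy is opened, uncommitted transactions are rolled back and the copy can no longer be rolled forward with production redo.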

This stuff really works and is very reliable!

14 Comments:

Blogger Steve Eck said...

Mounting the database and recovering it to a consistent PIT is an interesting idea.

It does mean:

1) You can't mount the BCVs back to the production server (which is a disturbingly common practice) without adding some serious risk in bringing the BCV copy up to mount and recovering it.

2) You need oracle software installed on the server you DO mount the BCVs on.

Another possibility is putting the database in backup mode for duration of the split.

This has the downside of having the same disadvantages you listed, although the time in backup mode is limited to a few minutes. But the plus side is that the copy on BCV is directly usable just like any other hot backup.

2/23/2006 10:27 AM  
Anonymous Anonymous said...

Both oracle and EMC recommend doing the alter tablespace backup before doing the split. ( And that's how we do it here ). While your mileage may vary, here we are only in "backup" mode for a second or two here ( the splits are almost instantaneous ).

We then bring up the split database mirror on a backup host and get a backup over there.

You have to fudge around with the control files somewhat but it's all well documented from EMC and oracle.

2/23/2006 11:05 AM  
Blogger Eric S. Emrick said...

John Hurley said...

Both oracle and EMC recommend doing the alter tablespace backup before doing the split. ( And that's how we do it here ). While your mileage may vary, here we are only in "backup" mode for a second or two here ( the splits are almost instantaneous ).

We then bring up the split database mirror on a backup host and get a backup over there.

You have to fudge around with the control files somewhat but it's all well documented from EMC and oracle.


If you use device groups then yes, putting your tablespaces in backup mode is required. However, if you use EMC's consistency technology, either BCVs or Clones, then this is not required. It really works very well. I have performed countless restores from backups taken using this approach.

If you have an extremely DML-intensive system with many tablespaces, it can take quite a while to get all of your tablespaces into hot backup mode. The requisite checkpoints slow down the process. Moreover, having all tablespaces in backup mode simultaneously can cause enormous amounts of redo to be generated. The Consistent Split technology offered by EMC does not require this approach, as the resultant BCV(s) comprise what is viewed by Oracle as an aborted instance; there is no need to mess around with the control files, etc. Simply mount the database and recover, but do not open, and back up to tape.

But if you can get in and out of backup mode in a reasonable amount of time, then the resultant checkpoints aren't an issue, and likewise with the redo generation. It really all depends on your change volume.

Regards,
Eric

2/23/2006 1:19 PM  
Blogger Eric S. Emrick said...

Steve eck said...

1) You can't mount the BCVs back to the production server (which is a disturbingly common practice) without adding some serious risk in bringing the BCV copy up to mount and recovering it.

2) You need oracle software installed on the server you DO mount the BCVs on.


Gulp... mounting the BCV file systems back to the production server, while technically possible, is a shaky and dangerous proposition at best. Yes, a second server that is scaled down for mere backup purposes is highly recommended.

2/23/2006 1:28 PM  
Anonymous Anonymous said...

Steve,
I'm currently backing up 5TB using the approach described in that EMC+Oracle whitepaper and it works like a treat, however.....
Oracle 9i + 24GB db_cache + 4000 datafiles (don't ask) = 3 hours to put all the tablespaces into backup mode!
This is a known issue with Oracle up to 9i and I'm told that the algorithm has been changed in 10g to avoid this problem but that won't help me for the next 12 months.
I'd love to use your approach but it has to be fully supported by Oracle.
Can you, or anyone else, point me to any Oracle reference describing running "recover database" on an aborted instance to prepare a good backup?

6/20/2006 8:23 AM  
Blogger Eric S. Emrick said...

Hi Greg,

There are many things that Oracle does not openly certify with regard to backup and recovery - just too many scenarios. However, rest assured that issuing a "recover database" against a database that has had all of its instances "aborted" is a perfectly legitimate means to prepare a backup. You can dump the file headers after the "recover database" command and you will notice that each online file has a consistent status (0x0). There is no difference between the states of these files and those of files after, say, a cancel-based recovery. Both scenarios yield consistent files that can be used for "more" recovery. I suppose the first question is "why doesn't Oracle think the file is fuzzy when the RECOVER DATABASE command is given?" It does if you simply back up an online file without putting it into backup mode and then try to restore and recover it, right? The difference here is that the control file used for recovery is the current control file. Consequently, there aren't any checkpoint discrepancies between the control file records and the corresponding data file headers. Perform your own restore/recovery scenarios using small test databases. My biggest apprehension with EMC consistent splits was whether the split itself could be trusted to be consistent. However, to date I have not had any issues with recovering databases using backups taken from consistently split BCVs or Clones.
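
For anyone who wants to script that check in a test database, here is a small Python sketch. It assumes you query the V$DATAFILE_HEADER view rather than dumping the headers to a trace file, so the FUZZY column stands in for the status flag mentioned above; the function name and the sample rows are hypothetical. Read-only or offline files legitimately carry older checkpoint SCNs, so a real check would need to account for them.

```python
# Query to run in the mounted, recovered database (an assumption of this sketch:
# the view is used in place of a file header dump).
HEADER_QUERY = (
    "SELECT file#, fuzzy, checkpoint_change# "
    "FROM v$datafile_header WHERE status = 'ONLINE'"
)


def headers_consistent(rows):
    """rows: iterable of (file#, fuzzy, checkpoint_change#) tuples from HEADER_QUERY.

    Returns True only if no file is marked fuzzy and every file shares
    the same checkpoint SCN.
    """
    rows = list(rows)
    if not rows:
        return False
    no_fuzzy = all(fuzzy == "NO" for _, fuzzy, _ in rows)
    one_scn = len({scn for _, _, scn in rows}) == 1
    return no_fuzzy and one_scn


if __name__ == "__main__":
    # Hypothetical rows as they might look after RECOVER DATABASE completes.
    sample = [(1, "NO", 8812345), (2, "NO", 8812345), (3, "NO", 8812345)]
    print(headers_consistent(sample))
```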

6/20/2006 7:47 PM  
Anonymous Anonymous said...

What are your thoughts/experiences for point in time recovery and the consistency of the instance using this approach

12/01/2006 9:15 AM  
Blogger Eric S. Emrick said...

Anonymous said...

What are your thoughts/experiences for point in time recovery and the consistency of the instance using this approach

Eric said...

Once the consistent split has completed and the recover database command has been issued against the database(s) on the BCV/Clone, the mechanics to perform PIT recovery to a future date are simple. After you back up your database(s) on the target device, the backup can subsequently be restored and used for PIT recovery. However, the restored control file will be a "current" control file in relation to the data files restored. To roll forward this database to some point in time in the future you need to "classify" the control file as a backup control file. This is easily accomplished by issuing the "RECOVER DATABASE...USING BACKUP CONTROLFILE" command.
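
A minimal Python sketch of that roll-forward follows, assuming the backup has already been restored and the needed archived logs are available. The function name and the target time are hypothetical; the clause order follows the SQL*Plus RECOVER syntax, and the RESETLOGS open is the standard requirement after recovering with a backup control file.

```python
from datetime import datetime


def pit_recovery_script(until):
    """Return SQL*Plus statements for point-in-time recovery of the restored backup.

    `until` is a datetime; Oracle expects the literal as 'YYYY-MM-DD:HH24:MI:SS'.
    """
    stamp = until.strftime("%Y-%m-%d:%H:%M:%S")
    return [
        "STARTUP MOUNT",
        # Treat the restored "current" control file as a backup control file.
        f"RECOVER DATABASE UNTIL TIME '{stamp}' USING BACKUP CONTROLFILE",
        # Incomplete recovery with a backup control file requires RESETLOGS.
        "ALTER DATABASE OPEN RESETLOGS",
    ]


if __name__ == "__main__":
    # Hypothetical target time, for illustration only.
    print("\n".join(pit_recovery_script(datetime(2006, 12, 3, 15, 8, 0))))
```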

I hope this helps.

12/03/2006 3:08 PM  
Anonymous Anonymous said...

hi eric,
first, thanks for the nice blog. it really helps a lot.

1. we are in a test phase with bcv backup (xp1024). we have now split it and presented it to another machine. on that machine i have an asm instance and a db instance. now i want to bring up the database. by the way, i am taking an archivelog backup every 5 min to the rman catalog.

it's a test phase i am working with.
second thing: which one is better, suspending the database before splitting or putting it in backup mode? please explain these things in detail.

3/06/2007 5:31 PM  
Blogger super said...

HI
I would like to know which backup approach is better: the TimeFinder solution with hot backup mode, or just the TimeFinder solution without hot backup mode?

9/11/2007 4:06 PM  
