Recovery of Offline Data Files
You have probably noticed at one time or another that taking a data file offline in a READ/WRITE tablespace necessitates a subsequent recovery operation on that data file, to bring it online again. The same cannot be said for tablespaces under most circumstances. That is, we can offline normal a tablespace (to make a backup, for instance) and recovery is not required to bring the data files comprising the tablespace online. The reason Oracle requires that you recover your data file is Oracle does not checkpoint a data file when we take it offline. Whereas, for a tablespace taken offline "normally" each data file in that tablespace is checkpointed. This checkpoint operation synchronizes the STOP SCN in the control file and START SCN in the file header. For data files taken offline, Oracle only updates the STOP SCN for the control file entry related to that data file. The result is a control file that has a STOP SCN that is in the future of the Checkpoint SCN in the file’s header, and thusly recovery is required.
I don’t think this is terribly enlightening information for most DBAs. However, I have read on more than one occasion that we need to beware of taking data files offline because of the "potential" of having to roll through a vast amount of archive logs to bring it back online. Most items I have read on this topic, in essence, state that you need all of the redo between the offline operation and the current log to fully recover the data file. Assuming the control file mounted by your database is the current control file, this notion is not accurate. The only redo that is required to recover your data file subsequent to an offline operation is that which is generated between the Checkpoint SCN in the file header and the STOP SCN recorded in the control file.
Let’s create a simple mental exercise to understand why this is the case. Then we can go through a quick example using an Oracle database.
If a database is mounted using the current control file then the information that Oracle maintains for the file headers is assumed 100% reliable. When we take a data file offline we are telling Oracle to immediately render the data file inaccessible to the application. Oracle cannot do this without telling the control file WHEN this offline activity was performed (we are not dealing with OFFLINE DROP). By updating the control file STOP SCN for the data file this requirement is satisfied. Remember, the STOP SCN defines the WHEN in Oracle time. If Oracle decided to checkpoint the data file before taking it offline then recovery would not be an issue. Taking a tablespace offline "normally" proves this.
Okay, let’s assume in our single instance database we have 3 redo log groups with the current log at sequence number 100. Furthermore, let’s take an online data file offline. Assume that this offline activity occurs while log sequence 100 is still the current log sequence number. If after we take the data file offline we initiate 5 logs switches, then our current thread will have the current log sequence number at 105. Since we have 3 log groups we know that log sequence numbers (LSN) 100-102 should be archived (and probably even 103 and 104). So, to the question, why would Oracle require us to roll up through log sequence 105 when the data file was taken offline in log sequence number 100? The control file is current and has recorded the STOP SCN. Given the current status of the control file it is not possible that any change vectors with SCNs greater than the STOP SCN are relevant to the data file we took offline in log sequence number 100. Therefore, it seems abundantly reasonable that the only log sequences required for recovery are those between the Checkpoint SCN (LSN<=100) and the STOP SCN (LSN=100). After log sequence number 100 is applied, recovery should be complete. Now, let’s take this mental exercise and apply it to a real database scenario.
Here we see that Oracle requested archive logs 98 through 100 for recovery. If we had dumped the file header for data file 4 we would have seen that the last time the file was checkpointed the log sequence number was 98. This is evident by looking at the redo byte address associated with the Checkpoint SCN in the header dump. Nonetheless, we have seen using the example above that Oracle does not require all redo generated between the offline operation and the current log for complete data file recovery. Oracle only required the range of log sequence numbers that corresponded to the Checkpoint SCN (LSN=98) and the STOP SCN (LSN=100). Notice that we cannot assume Oracle has applied all redo between the offline and the current log simply because it stopped asking for my archive logs. If Oracle had really applied all redo between log sequence number 98 and the current redo log, then it would have necessitated the application of log sequence numbers 101 and 102 as they are archived and not present in the online thread. But Oracle did not require these logs. We can rest assured that Oracle has only done what it needed to do based on the information in the current control file and the data file header.
I don’t think this is terribly enlightening information for most DBAs. However, I have read on more than one occasion that we need to beware of taking data files offline because of the "potential" of having to roll through a vast amount of archive logs to bring it back online. Most items I have read on this topic, in essence, state that you need all of the redo between the offline operation and the current log to fully recover the data file. Assuming the control file mounted by your database is the current control file, this notion is not accurate. The only redo that is required to recover your data file subsequent to an offline operation is that which is generated between the Checkpoint SCN in the file header and the STOP SCN recorded in the control file.
Let’s create a simple mental exercise to understand why this is the case. Then we can go through a quick example using an Oracle database.
If a database is mounted using the current control file then the information that Oracle maintains for the file headers is assumed 100% reliable. When we take a data file offline we are telling Oracle to immediately render the data file inaccessible to the application. Oracle cannot do this without telling the control file WHEN this offline activity was performed (we are not dealing with OFFLINE DROP). By updating the control file STOP SCN for the data file this requirement is satisfied. Remember, the STOP SCN defines the WHEN in Oracle time. If Oracle decided to checkpoint the data file before taking it offline then recovery would not be an issue. Taking a tablespace offline "normally" proves this.
Okay, let’s assume in our single instance database we have 3 redo log groups with the current log at sequence number 100. Furthermore, let’s take an online data file offline. Assume that this offline activity occurs while log sequence 100 is still the current log sequence number. If after we take the data file offline we initiate 5 logs switches, then our current thread will have the current log sequence number at 105. Since we have 3 log groups we know that log sequence numbers (LSN) 100-102 should be archived (and probably even 103 and 104). So, to the question, why would Oracle require us to roll up through log sequence 105 when the data file was taken offline in log sequence number 100? The control file is current and has recorded the STOP SCN. Given the current status of the control file it is not possible that any change vectors with SCNs greater than the STOP SCN are relevant to the data file we took offline in log sequence number 100. Therefore, it seems abundantly reasonable that the only log sequences required for recovery are those between the Checkpoint SCN (LSN<=100) and the STOP SCN (LSN=100). After log sequence number 100 is applied, recovery should be complete. Now, let’s take this mental exercise and apply it to a real database scenario.
sys@10g:SQL> select status from dba_tablespaces
where tablespace_name = 'USERS';
STATUS
---------
ONLINE
sys@10g:SQL> select file_id from dba_data_files
where tablespace_name = 'USERS';
FILE_ID
----------
4
sys@10g:SQL> select sequence# from v$log where status = 'CURRENT';
SEQUENCE#
----------
100
sys@10g:SQL> select distinct group# from v$log;
GROUP#
----------
1
2
3
sys@10g:SQL> alter database datafile 4 offline;
Database altered.
sys@10g:SQL> select sequence#, archived from v$log;
SEQUENCE# ARC
---------- ---
100 NO
98 YES
99 YES
sys@10g:SQL> alter system switch logfile;
System altered.
sys@10g:SQL> /
System altered.
sys@10g:SQL> /
System altered.
sys@10g:SQL> /
System altered.
sys@10g:SQL> /
System altered.
sys@10g:SQL> select sequence#, status, archived from v$log;
SEQUENCE# STATUS ARC
---------- ---------------- ---
103 INACTIVE YES
104 INACTIVE YES
105 CURRENT NO
sys@10g:SQL> alter database datafile 4 online;
alter database datafile 4 online
*
ERROR at line 1:
ORA-01113: file 4 needs media recovery
ORA-01110: data file 4:
'/ora/data001/10g/10G/datafile/o1_mf_users_1pdkdtz5_.dbf'
sys@10g:SQL> recover datafile 4;
ORA-00279: change 12438697 generated at 03/28/2006 13:48:42
needed for thread 1
ORA-00289: suggestion : /ora/arch/10g/1_98_584918376.dbf
ORA-00280: change 12438697 for thread 1 is in sequence #98
Specify log: {=suggested filename AUTO CANCEL}
ORA-00279: change 12438704 generated at 03/28/2006 13:48:44
needed for thread 1
ORA-00289: suggestion : /ora/arch/10g/1_99_584918376.dbf
ORA-00280: change 12438704 for thread 1 is in sequence #99
ORA-00278: log file '/ora/arch/10g/1_98_584918376.dbf'
no longer needed for
this recovery
Specify log: {=suggested filename AUTO CANCEL}
ORA-00279: change 12438707 generated at 03/28/2006 13:48:50
needed for thread 1
ORA-00289: suggestion : /ora/arch/10g/1_100_584918376.dbf
ORA-00280: change 12438707 for thread 1 is in sequence #100
ORA-00278: log file '/ora/arch/10g/1_99_584918376.dbf'
no longer needed for this recovery
Specify log: {=suggested filename AUTO CANCEL}
Log applied.
Media recovery complete.
Here we see that Oracle requested archive logs 98 through 100 for recovery. If we had dumped the file header for data file 4 we would have seen that the last time the file was checkpointed the log sequence number was 98. This is evident by looking at the redo byte address associated with the Checkpoint SCN in the header dump. Nonetheless, we have seen using the example above that Oracle does not require all redo generated between the offline operation and the current log for complete data file recovery. Oracle only required the range of log sequence numbers that corresponded to the Checkpoint SCN (LSN=98) and the STOP SCN (LSN=100). Notice that we cannot assume Oracle has applied all redo between the offline and the current log simply because it stopped asking for my archive logs. If Oracle had really applied all redo between log sequence number 98 and the current redo log, then it would have necessitated the application of log sequence numbers 101 and 102 as they are archived and not present in the online thread. But Oracle did not require these logs. We can rest assured that Oracle has only done what it needed to do based on the information in the current control file and the data file header.
16 Comments:
Thanks, your weblog hit on Google helped us out this morning after putting a datafile offline...
Regards, Erik (DBA from The Netherlands).
Well said. With appropriate demo
It was very much enligtning to know how it works. Explained really well with example.
Hi,
Nice presentation. So can I understand that before making datafiles offline we should make sure that the database is in archive log mode.
If the DB is not in archive log mode then we should take a cold bakup and then only proceed with the making offline operation of the datafile.Otherways the DB may become unusable.
Pratheej,
You cannot really take a single data file offline (NORMAL or IMMEDIATE) unless you are in ARCHIVELOG mode. You can offline the file, but you must add the "DROP" suffix to the command. The DROP suffix is a bit of a misnomer. The backup of such a file would still be recoverable by using a redo log from the CURRENT online redo log group - insofar as it has not been overwriten. If you are in NOARCHIVELOG mode it is best to take the tablespace offline because a checkpoint on all files in the tablespace is performed. Subsequent to backing up your tablespace you can then online your tablespace without recovery, even in NOARCHIVELOG mode.
Hi Eric,
would you know how to drop a datafile that is in recover status when you dont have any of the redo logs available to recover it?
issue occured some months ago now - during a sap import we ran out of space - db went bang and post start up we got the needs space and continued - missing the datafiles in recover status. now the rqd redo logs are long gone as you can imagine!
these dont seem to be in use - last scn for these was in jan (when the issue occured).
tried to drop them but no luck - can only see an export,drop tablespace and import as a solution
Is there another (simpler) route?
interesting huh?
You are correct. The only way to remove the data file from the control file (and its applicable "free" space) is to drop the tablespace. I assume the missing files did not have any permanent segment extents as the export of which you speak would fail.
Hi Eric,
I have a problem exactly like above Anonymous but my datafile which is in recover status is in a rollback tablespace and have an active rollback segment.
When I wanna drop tablespace, Oracle returns :
ORA-01548: active rollback segment 'RBS0' found, terminate dropping tablespace
Do you have any suggestion for dropping this rollback tablespace ?
This comment has been removed by a blog administrator.
We needed a fast, concise track to recover and found yours, that was made to order! Thanks !
regards
Kevin (DBA ,Papua New Guinea)
Hi,
Excellent presentation !! I Understand now.
Can we have following step to avoid recovery.
SQL> ALTER SYSTEM CHECKPOINT;
SQL> ALTER DATABASE DATAFILE 4 OFFLINE;
I think, since we have taken a checkpoint before taking the datafile offline, we dont need recovery here.
Please clarify.
Thanks
Advait Deo
I really don't think that will work for you, as the OFFLINE command still does not issue a checkpoint that is meaningful to the recovery of the file in question. Give it a try and let me know.
Thanks!
Eric
Retrieve your data as quick as possible. Unfortunately even for us not all data is always retrievable. You should understand what is wrong with your device, and the necessary steps to retrieve your data from it. Your data is not only important but private, and therefore you need to strict confidentiality programmed in place to ensure that all your data is secure.
data recovery
Thanks,great Blog..
What all Tablespaces can we take into account to recover data files this way?
Hi Eric,
Today we have faced same kind of problem. Few non-system datafiles were in offline status. DB is up and running and is in archive log mode.we don't have archive files to recover the datafile. Please suggest how to recover offline datafiles.
Thanks,
Pavan Kumar
Odzyskujemy dane z wszystkich nośników dobrych i uszkodzonych fizycznie : telefonów i smartfonów Alcatel, Apple iPhone, BlackBerry, HTC, Huawei, Lenovo, LG, Motorola, myPhone, Nokia, Samsung, Sony, Sony Ericsson, Xiaomi, ZTE,
(wszystkich modele również chińskie telefony),
dysków twardych ssd, karty pamięci, pendrive,
monitoringów i wideorejestratora BCS po formacie dysku i nadpisaniu danych. Warszawa u nas możesz odzyskać dane !
Odzyskiwanie danych Warszawa data recovery Warsaw
Post a Comment
<< Home