Blockrecover in Standard Edition? Nope, but close enough.

Halloween started really scary this year – some company called us and said, that they have some problems with their Oracle Database Standard Edition.

Apparently they had a storage crash six months before, which caused some blocks to get corrupted and from that time no one has noticed that RMAN is not producing any backups.

But the system was working fine so no one was complaining at the business level… until the Halloween 2022.

It has occurred that twice a year they do some complex reporting and while doing it, they noticed corrupted blocks in some tablespaces.

Blockrecover would be a blessing but it is possible only in Enterprise Edition, so they tried to restore and recover affected database files from 6 months before, and it was not a surprise that the process of applying that amount of archivelogs lasted a very, very long time!

So they called us to verify if there is any possibility to resume their business process within a few hours and not… days.

Well, let’s give it a try!

SQL> select * from hr.employees;
select * from hr.employees
       *
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 19, block # 205)
ORA-01110: data file 19:
'/u01/app/oracle/oradata/ORCL/F07C665E552C42C5E0530400000A57EF/datafile/o1_mf_us
ers_ktc4b0t1_.dbf'

The errors looked similar to the one above. But there were much more database blocks 🙂 In rico2 blocks were filled with some random data:

rico2 > d
 File: /u01/app/oracle/oradata/ORCL/F07C665E552C42C5E0530400000A57EF/datafile/o1_mf_users_ktc4b0t1_.dbf(15)
 Block: 205 Offsets: 0 to 512		Dba: 0x3c000cd
---------------------------------------------------------------
64757061 64757061 64757061 64757061 | dupadupadupadupa
64757061 64757061 64757061 64757061 | dupadupadupadupa
64757061 64757061 64757061 64757061 | dupadupadupadupa
64757061 64757061 64757061 64757061 | dupadupadupadupa
64757061 64757061 64757061 64757061 | dupadupadupadupa
64757061 64757061 64757061 64757061 | dupadupadupadupa
64757061 64757061 64757061 64757061 | dupadupadupadupa
64757061 64757061 64757061 64757061 | dupadupadupadupa
64757061 64757061 64757061 64757061 | dupadupadupadupa
64757061 64757061 64757061 64757061 | dupadupadupadupa
64757061 64757061 64757061 64757061 | dupadupadupadupa
64757061 64757061 64757061 64757061 | dupadupadupadupa

Of course waiting a few days for applying hundreds of gigabytes of archivelogs was not an option… but a thought came to me, that if no one noticed this error before, those blocks were not being touched since the storage crash!

So in theory it could be possible to just restore the database files from before the crash, find the blocks and copy them from the source datafiles to the current ones.

The only problem is, how to be sure, that a block was not changed during this whole time?

It is actually pretty easy! You have to scan all archivelogs and verify if the affected blocks were ever modified. To do that, you have to have a block id.

To calculate a valid block id, you have to construct it from relative file number and block number in that file.

So if the relative file number is 15 and corrupted block number is 205, the block id can be calculated like this: 4194304*15+205 = 62914765.

4194304 is a total number of blocks in one regular datafile.

62914765 in hex is 0x3c000cd so in little endian format will have the following structure: cd 00 c0 03

If you know that, you write a simple tool to scan binary files and look for that pattern in parallel. I modified my tool focs_vm and created a new tool: focs_file.

If given archivelog contains change vectors for specified database block address (DBA), the output will look similar to this one:

[root@beta23c focs_file]# ./target/release/focs_file -P 4 -p 'cd 00 c0 03' -f /u01/app/oracle/fra/ORCL/archivelog/2022_94_ktbxmbn7_.arc
Scanning memory from 0 to 22856448 in a separate thread
Scanning memory from 22856448 to 45712896 in a separate thread
Scanning memory from 68569344 to 91425792 in a separate thread
Scanning memory from 45712896 to 68569344 in a separate thread
Scanned: 87 %
Found 296 positions in a chunk
Length: 256 (0x100) bytes
0000:   cd 00 c0 03  66 0e bc 00  00 00 00 00  01 01 1d 78   ....f..........x
0010:   04 00 00 00  00 00 00 00  06 00 14 00  10 00 1c 78   ...............x
0020:   1d 78 01 00  01 80 02 00  00 00 00 00  c8 00 c0 03   .x..............
0030:   00 00 00 00  66 0e bc 00  00 80 00 00  00 00 00 00   ....f...........
0040:   00 00 00 00  2e 00 14 00  18 00 10 00  33 00 04 00   ............3...
0050:   01 22 00 00  55 b6 02 00  5e 00 00 00  10 80 20 0f   ."..U...^..... .
0060:   64 00 00 00  01 00 00 00  66 0e bc 00  01 00 00 00   d.......f.......
0070:   0b df 89 2e  00 00 00 00  0d 15 01 00  13 00 01 00   ................
0080:   ce 00 c0 03  66 0e bc 00  00 00 00 00  01 01 1d 78   ....f..........x
0090:   04 00 00 00  00 00 00 00  06 00 14 00  10 00 1c 78   ...............x
00a0:   1d 78 01 00  01 80 02 00  00 00 00 00  c8 00 c0 03   .x..............
00b0:   00 00 00 00  66 0e bc 00  00 80 00 00  00 00 00 00   ....f...........
00c0:   00 00 00 00  64 00 00 00  01 12 00 00  66 0e bc 00   ....d.......f...
00d0:   01 00 00 00  0b df 89 2e  00 00 00 00  0d 15 01 00   ................
00e0:   13 00 01 00  cf 00 c0 03  66 0e bc 00  00 00 00 00   ........f.......
00f0:   01 01 1d 78  04 00 00 00  00 00 00 00  06 00 14 00   ...x............

Length: 256 (0x100) bytes
0000:   cd 00 c0 03  ca 00 c0 03  fa 12 03 01  01 00 00 00   ................
0010:   00 00 00 36  0b 02 01 00  13 00 01 00  cd 00 c0 03   ...6............
0020:   66 0e bc 00  00 00 00 00  02 00 1d 78  04 00 00 00   f..........x....
0030:   00 00 00 00  1c 00 18 00  32 00 02 00  06 00 04 00   ........2.......
0040:   05 00 0c 00  07 00 07 00  03 00 00 00  00 00 02 00   ................
0050:   01 0d 00 00  00 00 00 00  08 00 0b 00  73 0c 00 00   ............s...
0060:   a4 19 40 02  7c 02 64 00  cd 00 c0 03  ca 00 c0 03   ..@.|.d.........
0070:   fa 12 02 01  01 00 00 00  2c 01 0b 00  00 00 00 00   ........,.......
0080:   60 00 00 00  00 00 00 00  00 00 00 00  02 00 00 00   `...............
0090:   3e 00 00 00  00 00 03 00  e8 52 40 02  c2 02 bc 00   >........R@.....
00a0:   53 74 65 76  65 6e ff ff  4b 69 6e 67  53 4b 49 4e   Steven..KingSKIN
00b0:   47 00 14 00  35 31 35 2e  31 32 33 2e  34 35 36 37   G...515.123.4567
00c0:   78 67 06 11  01 01 01 00  41 44 5f 50  52 45 53 00   xg......AD_PRES.
00d0:   c3 03 29 00  c1 5b 02 00  08 01 00 00  01 00 00 00   ..)..[..........
00e0:   66 0e bc 00  03 00 00 00  0b df 89 2e  00 00 00 00   f...............
00f0:   05 01 20 00  12 00 ff ff  a4 19 40 02  66 0e bc 00   .. .......@.f...

If not – there will be nothing to show:

[root@beta23c focs_file]# ./target/release/focs_file -P 4 -p 'cd 00 c0 03' -f /u01/app/oracle/fra/ORCL/archivelog/2022_12_23/o1_mf_1_95_ktbxvbx9_.arc
Scanning memory from 0 to 305536 in a separate thread
Scanning memory from 305536 to 611072 in a separate thread
Scanning memory from 916608 to 1222144 in a separate thread
Scanning memory from 611072 to 916608 in a separate thread

So if there were no changes found to the affected blocks, we can just restore a datafile to the new location:

RMAN> run {
2> set newname for datafile 19 to new;
3> restore datafile 19;
4> }

executing command: SET NEWNAME

Starting restore at 23-DEC-22
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=17 device type=DISK

channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00019 to /u01/app/oracle/oradata/ORCL/F07C665E552C42C5E0530400000A57EF/datafile/o1_mf_users_%u_.dbf
channel ORA_DISK_1: reading from backup piece /u01/app/oracle/fra/ORCL/F07C665E552C42C5E0530400000A57EF/backupset/2022_12_23/o1_mf_nnndf_TAG20221223T120731_ktc317h6_.bkp
channel ORA_DISK_1: piece handle=/u01/app/oracle/fra/ORCL/F07C665E552C42C5E0530400000A57EF/backupset/2022_12_23/o1_mf_nnndf_TAG20221223T120731_ktc317h6_.bkp tag=TAG20221223T120731
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
Finished restore at 23-DEC-22

Verify the block is there and it is not corrupted:

rico2 > map
 File: /u01/app/oracle/oradata/ORCL/F07C665E552C42C5E0530400000A57EF/datafile/o1_mf_users_ktc6t6no_.dbf(16)
 Block: 205			Dba: 0x40000cd
------------------------------------------------------------
 DATA Table/Cluster

 struct kcbh, 20 bytes				@0

 struct ktbbh,  72 bytes 			@20

 struct kdbh, 14 bytes 				@100

 struct kdbt[1], 4 bytes			@114

 sb2 kdbr[98]					@118

 ub1 freespace[874]				@314

 ub1 rowdata[7000]				@1188

 ub4 tailchk					@8188

And copy it manually to the appropriate place:

>>> source = open("/u01/app/oracle/oradata/ORCL/F07C665E552C42C5E0530400000A57EF/datafile/o1_mf_users_ktc6t6no_.dbf", "rb")
>>> target = open("/u01/app/oracle/oradata/ORCL/F07C665E552C42C5E0530400000A57EF/datafile/o1_mf_users_ktc4b0t1_.dbf", "rb+")
>>> source.seek(205*8192)
1679360
>>> source_block=source.read(8192)
>>> target.seek(205*8192)
1679360
>>> target.write(source_block)
8192
>>> target.close()

Let’s see if Oracle can handle this brutal behavior:

RMAN> backup validate datafile 19;

Starting backup at 23-DEC-22
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=747 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00019 name=/u01/app/oracle/oradata/ORCL/F07C665E552C42C5E0530400000A57EF/datafile/o1_mf_users_ktc4b0t1_.dbf
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
19   OK     0              133          641             12324606
  File Name: /u01/app/oracle/oradata/ORCL/F07C665E552C42C5E0530400000A57EF/datafile/o1_mf_users_ktc4b0t1_.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              30
  Index      0              19
  Other      0              458

Finished backup at 23-DEC-22

SQL> ;
  1* select /*+ FULL(e) */ count(*) from hr.employees e
SQL> /


Execution Plan
----------------------------------------------------------
Plan hash value: 1756381138

------------------------------------------------------------------------
| Id  | Operation	   | Name      | Rows  | Cost (%CPU)| Time     |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |	       |     1 |     3	 (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |	       |     1 |	    |	       |
|   2 |   TABLE ACCESS FULL| EMPLOYEES |   107 |     3	 (0)| 00:00:01 |
------------------------------------------------------------------------

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)


Statistics
----------------------------------------------------------
	  0  recursive calls
	  0  db block gets
	  7  consistent gets
	  0  physical reads
	  0  redo size
	577  bytes sent via SQL*Net to client
	108  bytes received via SQL*Net from client
	  2  SQL*Net roundtrips to/from client
	  0  sorts (memory)
	  0  sorts (disk)
	  1  rows processed

Now the customer has to do some tests and verify that everything works good, but the processing can go on 🙂

This is extremely risky and I wouldn’t recommend it unless you are completely fucked and have no other possibility of rescuing your data! The business has to be aware that this is not supported and that the only supported method of fixing this problem is just to wait for archivelogs to be applied! But sometimes each hour costs a lot of money and it’s worth the risk.

It was another case of Schrödinger’s Backup 🙂

Blockrecover in Standard Edition? Nope, but close enough.

Search:

How checkpoint queue actually works (the basic scenario)

[SOLVED!] ORA-39405: Oracle Data Pump does not support importing from a source database with TSTZ

Binary patching/hacking VM process

Last resort data recovery

Hacking database writer

Flashback query and cursor: pin S wait on X