Halloween started really scary this year – a company called us and said that they were having some problems with their Oracle Database Standard Edition.
Apparently they had had a storage crash six months earlier, which caused some blocks to get corrupted, and since that time no one had noticed that RMAN was not producing any backups.
But the system was working fine, so no one was complaining at the business level… until Halloween 2022.
It turned out that twice a year they run some complex reporting, and while doing it they noticed corrupted blocks in some tablespaces.
BLOCKRECOVER would be a blessing, but it is available only in Enterprise Edition, so they tried to restore and recover the affected database files from 6 months before – and it was no surprise that applying that amount of archivelogs took a very, very long time!
So they called us to check whether there was any possibility of resuming their business process within a few hours and not… days.
Well, let’s give it a try!
SQL> select * from hr.employees;
select * from hr.employees
ORA-01578: ORACLE data block corrupted (file # 19, block # 205)
ORA-01110: data file 19:
'/u01/app/oracle/oradata/ORCL/F07C665E552C42C5E0530400000A57EF/datafile/o1_mf_us
The errors looked similar to the one above, but there were many more corrupted database blocks 🙂 In rico2 the blocks turned out to be filled with random data:
File: /u01/app/oracle/oradata/ORCL/F07C665E552C42C5E0530400000A57EF/datafile/o1_mf_users_ktc4b0t1_.dbf(15)
Block: 205 Offsets: 0 to 512 Dba: 0x3c000cd
---------------------------------------------------------------
64757061 64757061 64757061 64757061 | dupadupadupadupa
64757061 64757061 64757061 64757061 | dupadupadupadupa
64757061 64757061 64757061 64757061 | dupadupadupadupa
64757061 64757061 64757061 64757061 | dupadupadupadupa
64757061 64757061 64757061 64757061 | dupadupadupadupa
64757061 64757061 64757061 64757061 | dupadupadupadupa
64757061 64757061 64757061 64757061 | dupadupadupadupa
64757061 64757061 64757061 64757061 | dupadupadupadupa
64757061 64757061 64757061 64757061 | dupadupadupadupa
64757061 64757061 64757061 64757061 | dupadupadupadupa
64757061 64757061 64757061 64757061 | dupadupadupadupa
64757061 64757061 64757061 64757061 | dupadupadupadupa
Of course waiting a few days for hundreds of gigabytes of archivelogs to be applied was not an option… but then a thought came to me: if no one had noticed this error before, those blocks had not been touched since the storage crash!
So in theory it should be possible to just restore the database files from before the crash, find the affected blocks, and copy them from the restored datafiles into the current ones.
The only problem is: how can you be sure that a block was not changed during this whole time?
It is actually pretty easy! You have to scan all the archivelogs and check whether the affected blocks were ever modified. To do that, you need the block's data block address (DBA).
A valid DBA is constructed from the relative file number and the block number within that file.
So if the relative file number is 15 and the corrupted block number is 205, the DBA can be calculated like this: 4194304*15+205 = 62914765.
4194304 (2^22) is the maximum number of blocks in a regular smallfile datafile – the block number occupies the low 22 bits of the DBA and the relative file number sits in the bits above them.
62914765 in hex is 0x3c000cd, so in little-endian format it will have the following byte sequence: cd 00 c0 03
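The arithmetic above can be sketched in a few lines of Python – shifting left by 22 bits is the same as multiplying by 4194304:

```python
def dba(rel_file: int, block: int) -> int:
    """Pack a relative file number and a block number into a data block address.
    The block number occupies the low 22 bits, the file number the bits above."""
    return (rel_file << 22) | block

d = dba(15, 205)
print(d)                                 # 62914765
print(hex(d))                            # 0x3c000cd
print(d.to_bytes(4, "little").hex(" "))  # cd 00 c0 03
```

The last line produces exactly the byte pattern we have to look for in the archivelogs.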
Knowing that, you can write a simple tool that scans binary files for that byte pattern in parallel. I modified my tool focs_vm and created a new tool: focs_file.
If a given archivelog contains change vectors for the specified data block address (DBA), the output will look similar to this one:
[root@beta23c focs_file]
Scanning memory from 0 to 22856448 in a separate thread
Scanning memory from 22856448 to 45712896 in a separate thread
Scanning memory from 68569344 to 91425792 in a separate thread
Scanning memory from 45712896 to 68569344 in a separate thread
Found 296 positions in a chunk
Length: 256 (0x100) bytes
0000: cd 00 c0 03 66 0e bc 00 00 00 00 00 01 01 1d 78  ....f..........x
0010: 04 00 00 00 00 00 00 00 06 00 14 00 10 00 1c 78  ...............x
0020: 1d 78 01 00 01 80 02 00 00 00 00 00 c8 00 c0 03  .x..............
0030: 00 00 00 00 66 0e bc 00 00 80 00 00 00 00 00 00  ....f...........
0040: 00 00 00 00 2e 00 14 00 18 00 10 00 33 00 04 00  ............3...
0050: 01 22 00 00 55 b6 02 00 5e 00 00 00 10 80 20 0f  ."..U...^..... .
0060: 64 00 00 00 01 00 00 00 66 0e bc 00 01 00 00 00  d.......f.......
0070: 0b df 89 2e 00 00 00 00 0d 15 01 00 13 00 01 00  ................
0080: ce 00 c0 03 66 0e bc 00 00 00 00 00 01 01 1d 78  ....f..........x
0090: 04 00 00 00 00 00 00 00 06 00 14 00 10 00 1c 78  ...............x
00a0: 1d 78 01 00 01 80 02 00 00 00 00 00 c8 00 c0 03  .x..............
00b0: 00 00 00 00 66 0e bc 00 00 80 00 00 00 00 00 00  ....f...........
00c0: 00 00 00 00 64 00 00 00 01 12 00 00 66 0e bc 00  ....d.......f...
00d0: 01 00 00 00 0b df 89 2e 00 00 00 00 0d 15 01 00  ................
00e0: 13 00 01 00 cf 00 c0 03 66 0e bc 00 00 00 00 00  ........f.......
00f0: 01 01 1d 78 04 00 00 00 00 00 00 00 06 00 14 00  ...x............
Length: 256 (0x100) bytes
0000: cd 00 c0 03 ca 00 c0 03 fa 12 03 01 01 00 00 00  ................
0010: 00 00 00 36 0b 02 01 00 13 00 01 00 cd 00 c0 03  ...6............
0020: 66 0e bc 00 00 00 00 00 02 00 1d 78 04 00 00 00  f..........x....
0030: 00 00 00 00 1c 00 18 00 32 00 02 00 06 00 04 00  ........2.......
0040: 05 00 0c 00 07 00 07 00 03 00 00 00 00 00 02 00  ................
0050: 01 0d 00 00 00 00 00 00 08 00 0b 00 73 0c 00 00  ............s...
0060: a4 19 40 02 7c 02 64 00 cd 00 c0 03 ca 00 c0 03  ..@.|.d.........
0070: fa 12 02 01 01 00 00 00 2c 01 0b 00 00 00 00 00  ........,.......
0080: 60 00 00 00 00 00 00 00 00 00 00 00 02 00 00 00  `...............
0090: 3e 00 00 00 00 00 03 00 e8 52 40 02 c2 02 bc 00  >........R@.....
00a0: 53 74 65 76 65 6e ff ff 4b 69 6e 67 53 4b 49 4e  Steven..KingSKIN
00b0: 47 00 14 00 35 31 35 2e 31 32 33 2e 34 35 36 37  G...515.123.4567
00c0: 78 67 06 11 01 01 01 00 41 44 5f 50 52 45 53 00  xg......AD_PRES.
00d0: c3 03 29 00 c1 5b 02 00 08 01 00 00 01 00 00 00  ..)..[..........
00e0: 66 0e bc 00 03 00 00 00 0b df 89 2e 00 00 00 00  f...............
00f0: 05 01 20 00 12 00 ff ff a4 19 40 02 66 0e bc 00  .. .......@.f...
If not, there will be nothing to show:
[root@beta23c focs_file]
Scanning memory from 0 to 305536 in a separate thread
Scanning memory from 305536 to 611072 in a separate thread
Scanning memory from 916608 to 1222144 in a separate thread
Scanning memory from 611072 to 916608 in a separate thread
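focs_file itself is not published here, but the idea behind such a scanner can be sketched in Python: split the file into chunks, scan each chunk in a separate thread for the little-endian DBA pattern, and overlap the chunk boundaries by three bytes so a match straddling two chunks is not missed. This is just an illustration of the approach, not the actual tool:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def scan_chunk(path, pattern, start, end):
    # Scan one slice of the file for the 4-byte pattern; return absolute offsets.
    with open(path, "rb") as f:
        f.seek(start)
        data = f.read(end - start)
    hits, pos = [], data.find(pattern)
    while pos != -1:
        hits.append(start + pos)
        pos = data.find(pattern, pos + 1)
    return hits

def scan_file(path, dba, workers=4):
    # 62914765 -> b"\xcd\x00\xc0\x03" (little-endian, as stored on disk)
    pattern = dba.to_bytes(4, "little")
    size = os.path.getsize(path)
    chunk = size // workers + 1
    # Overlap boundaries by 3 bytes so a pattern split across chunks is found.
    ranges = [(i * chunk, min((i + 1) * chunk + 3, size)) for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as ex:
        results = ex.map(lambda r: scan_chunk(path, pattern, *r), ranges)
    return sorted(set(p for hits in results for p in hits))
```

A hit only proves the byte sequence occurs somewhere in the archivelog, so the real tool still has to decode the surrounding change vector – but an empty result is exactly the green light we were looking for.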
So if no changes to the affected blocks were found, we can simply restore the datafile to a new location:
2> set newname for datafile 19 to new;
executing command: SET NEWNAME
Starting restore at 23-DEC-22
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=17 device type=DISK
channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00019 to /u01/app/oracle/oradata/ORCL/F07C665E552C42C5E0530400000A57EF/datafile/o1_mf_users_%u_.dbf
channel ORA_DISK_1: reading from backup piece /u01/app/oracle/fra/ORCL/F07C665E552C42C5E0530400000A57EF/backupset/2022_12_23/o1_mf_nnndf_TAG20221223T120731_ktc317h6_.bkp
channel ORA_DISK_1: piece handle=/u01/app/oracle/fra/ORCL/F07C665E552C42C5E0530400000A57EF/backupset/2022_12_23/o1_mf_nnndf_TAG20221223T120731_ktc317h6_.bkp tag=TAG20221223T120731
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
Finished restore at 23-DEC-22
Verify that the block is there and that it is not corrupted:
File: /u01/app/oracle/oradata/ORCL/F07C665E552C42C5E0530400000A57EF/datafile/o1_mf_users_ktc6t6no_.dbf(16)
Block: 205 Dba: 0x40000cd
------------------------------------------------------------
struct kcbh, 20 bytes @0
struct ktbbh, 72 bytes @20
struct kdbh, 14 bytes @100
struct kdbt[1], 4 bytes @114
ub1 freespace[874] @314
ub1 rowdata[7000] @1188
And copy it manually to the appropriate place:
>>> source = open("/u01/app/oracle/oradata/ORCL/F07C665E552C42C5E0530400000A57EF/datafile/o1_mf_users_ktc6t6no_.dbf", "rb")
>>> target = open("/u01/app/oracle/oradata/ORCL/F07C665E552C42C5E0530400000A57EF/datafile/o1_mf_users_ktc4b0t1_.dbf", "rb+")
>>> source.seek(205 * 8192)
>>> source_block = source.read(8192)
>>> target.seek(205 * 8192)
>>> target.write(source_block)
>>> target.close()
Let’s see if Oracle can handle this brutal behavior:
RMAN> backup validate datafile 19;
Starting backup at 23-DEC-22
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=747 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00019 name=/u01/app/oracle/oradata/ORCL/F07C665E552C42C5E0530400000A57EF/datafile/o1_mf_users_ktc4b0t1_.dbf
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
19   OK     0              133          641             12324606
File Name: /u01/app/oracle/oradata/ORCL/F07C665E552C42C5E0530400000A57EF/datafile/o1_mf_users_ktc4b0t1_.dbf
Block Type Blocks Failing Blocks Processed
---------- -------------- ----------------
Finished backup at 23-DEC-22
1* select /*+ FULL(e) */ count(*) from hr.employees e
Plan hash value: 1756381138
| Id | Operation         | Name      | Rows | Cost (%CPU)| Time     |
|  0 | SELECT STATEMENT  |           |    1 |     3  (0)| 00:00:01 |
|  1 | SORT AGGREGATE    |           |    1 |           |          |
|  2 | TABLE ACCESS FULL | EMPLOYEES |  107 |     3  (0)| 00:00:01 |
- dynamic statistics used: dynamic sampling (level=2)
577 bytes sent via SQL*Net to client
108 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
Now the customer has to run some tests and verify that everything works well, but the processing can go on 🙂
This is extremely risky and I wouldn't recommend it unless you are completely fucked and have no other way of rescuing your data! The business has to be aware that this approach is not supported and that the only supported method of fixing this problem is simply to wait for the archivelogs to be applied! But sometimes each hour costs a lot of money, and then it's worth the risk.
It was another case of Schrödinger’s Backup 🙂