Oracle database block checksum XOR algorithm explained


28.01.2018
by Kamil Stawiarski

Recently I’ve started to write my own clone of BBED to have something handy and useful in extreme cases when you have to go deep and fix stuff on low level (I have only like 2 such cases a year but each time it is really fun and a nice money 😉 )

When I’ll finish the tool, it will be available on my GitHub. Maybe as an option for ODBV or a separate tool? I haven’t decided yet but I’ll cover it with a blog post for sure!

Meanwhile, I want to share with you my investigation on calculating a block checksum.
Quoting from a documentation:

A number calculated by the database from all the bytes stored in a data or redo block. If the DB_BLOCK_CHECKSUM initialization parameter is enabled, then the database calculates the checksum for every data file or online redo log block and stores it in the block header when writing to disk. The database can use the checksum value to check consistency.

This checksum in a database block is stored on 16th byte. Let’s check it with BBED:

1BBED> p kcbh
2struct kcbh, 20 bytes                       @0
3   ub1 type_kcbh                            @0        0x06
4   ub1 frmt_kcbh                            @1        0xa2
5   ub1 spare1_kcbh                          @2        0x00
6   ub1 spare2_kcbh                          @3        0x00
7   ub4 rdba_kcbh                            @4        0x01c0008b
8   ub4 bas_kcbh                             @8        0x009217c8
9   ub2 wrp_kcbh                             @12       0x0000
10   ub1 seq_kcbh                             @14       0x01
11   ub1 flg_kcbh                             @15       0x06 (KCBHFDLC, KCBHFCKV)
12   ub2 chkval_kcbh                          @16       0x40e0
13   ub2 spare3_kcbh                          @18       0x0000

As you can see the value is 0x40e0 so 13710. Please note that BBED shows the hex values format in a human-readable manner like in a Big Endian system. Since I’m doing this on Intel, the actual hex value in a datafile is written as "E040″.

BBED can calculate a checksum with command "sum" and apply it to a block with a command "sum apply". Let’s check what will happen if I’ll change the value in a database block at a low level without putting the appropriate checksum.

1BBED> p *kdbr[0]
2rowdata[3696]
3-------------
4ub1 rowdata[3696]                           @4877     0x2c
5 
6BBED> x /rncccctcn
7rowdata[3696]                               @4877
8-------------
9flag@4877: 0x2c (KDRHFL, KDRHFF, KDRHFH)
10lock@4878: 0x01
11cols@4879:   11
12 
13col    0[3] @4880: 108
14col    1[5] @4884: Nancy
15col    2[9] @4890: Greenberg
16col    3[8] @4900: NGREENBE
17col   4[12] @4909: 515.124.4569
18col    5[7] @4922: 17-AUG-02
19col    6[6] @4930: FI_MGR
20col    7[4] @4937: 12008
21col    8[0] @4942: *NULL*
22col    9[3] @4943: 101
23col   10[2] @4947: 100
24 
25BBED> set offset 4885
26    OFFSET          4885
27BBED> set mode edit
28    MODE            Edit
29 
30BBED>  modify /c "Dupcy"
31Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) Y
32BBED> p *kdbr[0]
33rowdata[3696]
34-------------
35ub1 rowdata[3696]                           @4877     0x2c
36 
37BBED> x /rncccctcn
38rowdata[3696]                               @4877
39-------------
40flag@4877: 0x2c (KDRHFL, KDRHFF, KDRHFH)
41lock@4878: 0x01
42cols@4879:   11
43 
44col    0[3] @4880: 108
45col    1[5] @4884: Dupcy
46col    2[9] @4890: Greenberg
47col    3[8] @4900: NGREENBE
48col   4[12] @4909: 515.124.4569
49col    5[7] @4922: 17-AUG-02
50col    6[6] @4930: FI_MGR
51col    7[4] @4937: 12008
52col    8[0] @4942: *NULL*
53col    9[3] @4943: 101
54col   10[2] @4947: 100

Now we can see that BBED is showing me the changes correctly, but when I’ll try to read a block from Oracle:

1ERROR at line 1:
2ORA-01578: ORACLE data block corrupted (file # 7, block # 139)
3ORA-01110: data file 7:
4'/u01/app/oracle/oradata/SKIPER/datafile/o1_mf_tbs_dupa_f5rzk9yl_.dbf'

BBED will also show us that a checksum in a block is different than it shout be:

1BBED> sum
2Check value for File 7, Block 139:
3current = 0x40e0, required = 0x54f4

Now we know what the checksum should be and how to trigger BBED to count it – it’s debugging time!!!

Firstly I’ll attach to my BBED process with a GDB and I’ll set a watchpoint. In assembler, a result from a function has to be set on one of the registers. The CPU documentation will tell which register to observe. In most of the cases on Intel and AMD it will be RAX:

1(gdb) watch $rax == 0x54f4
2Watchpoint 1: $rax == 0x54f4
3(gdb) command 1
4Type commands for breakpoint(s) 1, one per line.
5End with a line saying just "end".
6>c
7>end
8(gdb) set pagination off
9(gdb) c
10Continuing.

After executing a sum command in a BBED I’ll see a set of functions that returned with a specified value – of course, I’m interested only in the first one:

1Watchpoint 1: $rax == 0x54f4
2 
3Old value = 0
4New value = 1
50x00007f3c5857eb6b in kcbhxoro () from /u01/app/oracle/product/12.1.0/dbhome_1/lib/libclntsh.so.12.1
6Watchpoint 1: $rax == 0x54f4

I’ve consulted Frits and he told me that this kcbhxoro launches sxorchk, which is doing the work. But sxorchk never fires when we count a checksum from BBED. Let’s check kcbhxoro:

1(... removed for clarity ... )
2   0x00007fa2e1154b16 <+6>:   test   %edx,%edx
3   0x00007fa2e1154b18 <+8>:   je     0x7fa2e1154b24 <kcbhxoro+20>
4   0x00007fa2e1154b1a <+10>:  callq  0x7fa2df2be5f8 <sxorchk@plt>
5 (... removed for clarity ... )

As you can see, calling sxorchk is conditional and it can be skipped by highlighted instructions – and it happens in our case.

The first interesting thing is a little loop in here:

10x00007fa2e1154b3b <+43>:  xor    (%rdi),%rdx
20x00007fa2e1154b3e <+46>:  inc    %eax
30x00007fa2e1154b40 <+48>:  xor    0x8(%rdi),%rdx
40x00007fa2e1154b44 <+52>:  xor    0x10(%rdi),%rdx
50x00007fa2e1154b48 <+56>:  xor    0x18(%rdi),%rdx
60x00007fa2e1154b4c <+60>:  add    $0x20,%rdi
70x00007fa2e1154b50 <+64>:  cmp    %esi,%eax
80x00007fa2e1154b52 <+66>:  jb     0x7fa2e1154b3b <kcbhxoro+43>

We are xoring the value from the memory address stored in the RDI registry with a value of RDX registry – the result is put in RDX.
Then we increment value on EAX registry – EAX is used here as a loop counter. Do you see the last line of the above output? This is a conditional jump. First, we are comparing values on ESI and EAX and if EAX is below the ESI – we are jumping back to offset 43 to the xor.

So ESI contains the value of the loop spins. The counter is calculated here:

10x00007fa2e1154b31 <+33>:  shr    $0x5,%rsi

From the documentation we know, that RSI is a register used to pass the second argument to functions. So we can see here, that the second argument is shifted by 5 bits to the right – shifting right by 5 bits is the equivalent of diving the argument by 2**5 – so 32.

Ok, so let’s make a breakpoint at the shr command and check the values of the registers:

1(gdb) b *0x00007fa2e1154b31
2Breakpoint 1 at 0x7fa2e1154b31
3(gdb) b *0x00007fa2e1154b31
4Breakpoint 1 at 0x7fa2e1154b31
5(gdb) c
6Continuing.
7 
8Breakpoint 1, 0x00007fa2e1154b31 in kcbhxoro () from /u01/app/oracle/product/12.1.0/dbhome_1/lib/libclntsh.so.12.1
9(gdb) p $rsi
10$1 = 8192
11(gdb) p $rdx
12$2 = 0
13(gdb) p $rdi
14$3 = 140337484677632
15(gdb) p/x $rdi
16$4 = 0x7fa2ddec4200

Great! So we know that RDI contains the address in a memory where data begins (I know it because in assembler dump I see that RDI is used in brackets). The RSI, on the other hand, contains value 8192. If I will shift it right by 5 bits it would be equivalent to dividing it by 32 – so after this operation, I should have value 256 on the RSI.

Let’s check it.

1(gdb) b *0x00007fa2e1154b3b
2Breakpoint 2 at 0x7fa2e1154b3b
3(gdb) c
4Continuing.
5 
6Breakpoint 2, 0x00007fa2e1154b3b in kcbhxoro () from /u01/app/oracle/product/12.1.0/dbhome_1/lib/libclntsh.so.12.1
7(gdb) p $rsi
8$5 = 256

Awesome! So it works. Now I’ll check what data is stored on the addres from the RDI. To do it, I’ll use "examine" command:

1(gdb) x/gx $rdi
20x7fa2ddec4200: 0x01c0008b0000a206

Since the result of an XOR is being put in an RDX registry, the portion of 8 bytes will be used as a chunk – RDX is a 64bit registry. Format "g" in examine command means: "g(giant, 8 bytes)". You can see this with "help x".

Let’s check right now the contents of this block:

1[oracle@rico ~]$ xxd -s+1138688 -g 8 -l 8 /u01/app/oracle/oradata/SKIPER/datafile/o1_mf_tbs_dupa_f5rzk9yl_.dbf
20116000: 06a200008b00c001                   ........

Remember that GDB shows hex values in a human format (the same as big-endian) but on Intel, we have a small-endian format. So if you’ll reverse bytes, you will see that the values are actually the same.

Conclusion? RDI contains the address to memory with my database block! Awesome!
So the main loop looks like this:

  • XOR 8 bytes from memory address, stored in RDI with RDX
  • Increment counter on EAX
  • XOR 8 bytes from memory address (+8B offset), stored in RDI with RDX
  • XOR 8 bytes from memory address (+16B offset), stored in RDI with RDX
  • XOR 8 bytes from memory address (+24B offset), stored in RDI with RDX
  • add 32B to the value of RDI – so the pointer will shift 32B
  • already did this 256 times? No? Jump to the begining

After that, we have some more fun:

10x00007fa2e1154b54 <+68>:  mov    %rdx,%rcx

Put value of RDX into RCX

10x00007fa2e1154b57 <+71>:  shr    $0x20,%rcx

Shift value at RCX 32b to the right.

10x00007fa2e1154b5b <+75>:  xor    %rcx,%rdx

XOR RCX and RDX – put the result in RDX

10x00007fa2e1154b5e <+78>:  mov    %rdx,%rsi

Put value of RDX into RSI

10x00007fa2e1154b61 <+81>:  shr    $0x10,%rsi

Shift value at RSI 16b to the right.

10x00007fa2e1154b65 <+85>:  xor    %rsi,%rdx

XOR RSI and RDX – put the result in RDX

10x00007fa2e1154b68 <+88>:  movzwl %dx,%eax

Put 16bits (2 bytes) of value stored in RDX into EAX and fill empty bits with 0. The EAX is a register used as 1st return register. So retq (return instruction) will return value stored on this register.

There is one more thing to check – the value of a checksum is stored at 16B of a block and has 2B. It makes no sense including it into checksum calculation. So let’s compare the value of the in-memory block with actual one:

This is block stored in memory:

1(gdb) x/gx $rdi+16
20x7fa2ddec4210: 0x0016000100000000

And here is the value from the disk:

1[oracle@rico ~]$ xxd -s+1138704 -g 8 -l 8 /u01/app/oracle/oradata/SKIPER/datafile/o1_mf_tbs_dupa_f5rzk9yl_.dbf
20116010: e040000001001600                   .@......

Awesome! So those 2B are filled with zeros in memory. Now we can write our checksum code in Python and check it:

1from struct import Struct
2import sys, binascii
3 
4ushort = Struct("H")
5ulong = Struct("Q")
6 
7file_name = sys.argv[1]
8block_no = int(sys.argv[2])
9 
10block_size = 8192
11 
12dbf = open(file_name, "rb")
13dbf.seek(block_size*block_no)
14 
15block = dbf.read(16)
16block += b"\x00\x00"
17dbf.seek(block_size*block_no + 18)
18block += dbf.read(block_size - 18)
19dbf.close()
20 
21checksum_value = 0
22 
23for i in range(int(block_size/8)):
24  checksum_value = checksum_value ^ ulong.unpack(block[i*8:i*8+8])[0]
25 
26tmp = checksum_value >> 32
27checksum_value = checksum_value ^ tmp
28tmp = checksum_value >> 16
29checksum_value = checksum_value ^ tmp
30 
31final_checksum = ushort.unpack(ulong.pack(checksum_value)[0:2])[0]
32 
33print("checksum int = " + str(final_checksum))
34print("checksum hex = " + str(hex(final_checksum)))

Result from BBED:

1BBED> sum
2Check value for File 7, Block 139:
3current = 0x40e0, required = 0x54f4

Result from my Python script:

1[oracle@rico ~]$ python kcbhxoro.py /u01/app/oracle/oradata/SKIPER/datafile/o1_mf_tbs_dupa_f5rzk9yl_.dbf 139
2checksum int = 21748
3checksum hex = 0x54f4

That’s it! And what about sxorchk function? A brief look into it shows that it does the same calculations in a different manner. So my guess is, that sxorchk is just a highly optimised version of kcbhxoro and while Oracle DB is calling this function, the argument is being passed to inform kcbhxoro that I want to use in fact sxorchk – and BBED is using just the old API. But this is just a guess.

I’d like to thank Frits Hoogland for his insights on this subject 🙂

Hope to show you the new clone of BBED soon!
Cheers!

P.S.
This is a very nice article to read if you want to understand GDB usage and formatting: https://www.recurse.com/blog/7-understanding-c-by-learning-assembly


Contact us

Database Whisperers sp. z o. o. sp. k.
al. Jerozolimskie 200, 3rd floor, room 342
02-486 Warszawa
NIP: 5272744987
REGON:362524978
+48 508 943 051
+48 661 966 009
info@ora-600.pl

Newsletter Sign up to be updated