Oracle database block checksum XOR algorithm explained


28.01.2018
by Kamil Stawiarski

Recently I’ve started to write my own clone of BBED to have something handy and useful in extreme cases when you have to go deep and fix stuff on low level (I have only like 2 such cases a year but each time it is really fun and a nice money 😉 )

When I’ll finish the tool, it will be available on my GitHub. Maybe as an option for ODBV or a separate tool? I haven’t decided yet but I’ll cover it with a blog post for sure!

Meanwhile, I want to share with you my investigation on calculating a block checksum.
Quoting from a documentation:

A number calculated by the database from all the bytes stored in a data or redo block. If the DB_BLOCK_CHECKSUM initialization parameter is enabled, then the database calculates the checksum for every data file or online redo log block and stores it in the block header when writing to disk. The database can use the checksum value to check consistency.

This checksum in a database block is stored on 16th byte. Let’s check it with BBED:

BBED> p kcbh
struct kcbh, 20 bytes                       @0
   ub1 type_kcbh                            @0        0x06
   ub1 frmt_kcbh                            @1        0xa2
   ub1 spare1_kcbh                          @2        0x00
   ub1 spare2_kcbh                          @3        0x00
   ub4 rdba_kcbh                            @4        0x01c0008b
   ub4 bas_kcbh                             @8        0x009217c8
   ub2 wrp_kcbh                             @12       0x0000
   ub1 seq_kcbh                             @14       0x01
   ub1 flg_kcbh                             @15       0x06 (KCBHFDLC, KCBHFCKV)
   ub2 chkval_kcbh                          @16       0x40e0
   ub2 spare3_kcbh                          @18       0x0000

As you can see the value is 0x40e0 so 13710. Please note that BBED shows the hex values format in a human-readable manner like in a Big Endian system. Since I’m doing this on Intel, the actual hex value in a datafile is written as "E040″.

BBED can calculate a checksum with command "sum" and apply it to a block with a command "sum apply". Let’s check what will happen if I’ll change the value in a database block at a low level without putting the appropriate checksum.

BBED> p *kdbr[0]
rowdata[3696]
-------------
ub1 rowdata[3696]                           @4877     0x2c

BBED> x /rncccctcn
rowdata[3696]                               @4877
-------------
flag@4877: 0x2c (KDRHFL, KDRHFF, KDRHFH)
lock@4878: 0x01
cols@4879:   11

col    0[3] @4880: 108
col    1[5] @4884: Nancy
col    2[9] @4890: Greenberg
col    3[8] @4900: NGREENBE
col   4[12] @4909: 515.124.4569
col    5[7] @4922: 17-AUG-02
col    6[6] @4930: FI_MGR
col    7[4] @4937: 12008
col    8[0] @4942: *NULL*
col    9[3] @4943: 101
col   10[2] @4947: 100

BBED> set offset 4885
	OFFSET         	4885
BBED> set mode edit
	MODE           	Edit

BBED>  modify /c "Dupcy"
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) Y
BBED> p *kdbr[0]
rowdata[3696]
-------------
ub1 rowdata[3696]                           @4877     0x2c

BBED> x /rncccctcn
rowdata[3696]                               @4877
-------------
flag@4877: 0x2c (KDRHFL, KDRHFF, KDRHFH)
lock@4878: 0x01
cols@4879:   11

col    0[3] @4880: 108
col    1[5] @4884: Dupcy
col    2[9] @4890: Greenberg
col    3[8] @4900: NGREENBE
col   4[12] @4909: 515.124.4569
col    5[7] @4922: 17-AUG-02
col    6[6] @4930: FI_MGR
col    7[4] @4937: 12008
col    8[0] @4942: *NULL*
col    9[3] @4943: 101
col   10[2] @4947: 100

Now we can see that BBED is showing me the changes correctly, but when I’ll try to read a block from Oracle:

ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 7, block # 139)
ORA-01110: data file 7:
'/u01/app/oracle/oradata/SKIPER/datafile/o1_mf_tbs_dupa_f5rzk9yl_.dbf'

BBED will also show us that a checksum in a block is different than it shout be:

BBED> sum
Check value for File 7, Block 139:
current = 0x40e0, required = 0x54f4

Now we know what the checksum should be and how to trigger BBED to count it – it’s debugging time!!!

Firstly I’ll attach to my BBED process with a GDB and I’ll set a watchpoint. In assembler, a result from a function has to be set on one of the registers. The CPU documentation will tell which register to observe. In most of the cases on Intel and AMD it will be RAX:

(gdb) watch $rax == 0x54f4
Watchpoint 1: $rax == 0x54f4
(gdb) command 1
Type commands for breakpoint(s) 1, one per line.
End with a line saying just "end".
>c
>end
(gdb) set pagination off
(gdb) c
Continuing.

After executing a sum command in a BBED I’ll see a set of functions that returned with a specified value – of course, I’m interested only in the first one:

Watchpoint 1: $rax == 0x54f4

Old value = 0
New value = 1
0x00007f3c5857eb6b in kcbhxoro () from /u01/app/oracle/product/12.1.0/dbhome_1/lib/libclntsh.so.12.1
Watchpoint 1: $rax == 0x54f4

I’ve consulted Frits and he told me that this kcbhxoro launches sxorchk, which is doing the work. But sxorchk never fires when we count a checksum from BBED. Let’s check kcbhxoro:

(... removed for clarity ... )
   0x00007fa2e1154b16 <+6>:	test   %edx,%edx
   0x00007fa2e1154b18 <+8>:	je     0x7fa2e1154b24 <kcbhxoro+20>
   0x00007fa2e1154b1a <+10>:	callq  0x7fa2df2be5f8 <sxorchk@plt>
 (... removed for clarity ... )

As you can see, calling sxorchk is conditional and it can be skipped by highlighted instructions – and it happens in our case.

The first interesting thing is a little loop in here:

   0x00007fa2e1154b3b <+43>:	xor    (%rdi),%rdx
   0x00007fa2e1154b3e <+46>:	inc    %eax
   0x00007fa2e1154b40 <+48>:	xor    0x8(%rdi),%rdx
   0x00007fa2e1154b44 <+52>:	xor    0x10(%rdi),%rdx
   0x00007fa2e1154b48 <+56>:	xor    0x18(%rdi),%rdx
   0x00007fa2e1154b4c <+60>:	add    $0x20,%rdi
   0x00007fa2e1154b50 <+64>:	cmp    %esi,%eax
   0x00007fa2e1154b52 <+66>:	jb     0x7fa2e1154b3b <kcbhxoro+43>

We are xoring the value from the memory address stored in the RDI registry with a value of RDX registry – the result is put in RDX.
Then we increment value on EAX registry – EAX is used here as a loop counter. Do you see the last line of the above output? This is a conditional jump. First, we are comparing values on ESI and EAX and if EAX is below the ESI – we are jumping back to offset 43 to the xor.

So ESI contains the value of the loop spins. The counter is calculated here:

   0x00007fa2e1154b31 <+33>:	shr    $0x5,%rsi

From the documentation we know, that RSI is a register used to pass the second argument to functions. So we can see here, that the second argument is shifted by 5 bits to the right – shifting right by 5 bits is the equivalent of diving the argument by 2**5 – so 32.

Ok, so let’s make a breakpoint at the shr command and check the values of the registers:

(gdb) b *0x00007fa2e1154b31
Breakpoint 1 at 0x7fa2e1154b31
(gdb) b *0x00007fa2e1154b31
Breakpoint 1 at 0x7fa2e1154b31
(gdb) c
Continuing.

Breakpoint 1, 0x00007fa2e1154b31 in kcbhxoro () from /u01/app/oracle/product/12.1.0/dbhome_1/lib/libclntsh.so.12.1
(gdb) p $rsi
$1 = 8192
(gdb) p $rdx
$2 = 0
(gdb) p $rdi
$3 = 140337484677632
(gdb) p/x $rdi
$4 = 0x7fa2ddec4200

Great! So we know that RDI contains the address in a memory where data begins (I know it because in assembler dump I see that RDI is used in brackets). The RSI, on the other hand, contains value 8192. If I will shift it right by 5 bits it would be equivalent to dividing it by 32 – so after this operation, I should have value 256 on the RSI.

Let’s check it.

(gdb) b *0x00007fa2e1154b3b
Breakpoint 2 at 0x7fa2e1154b3b
(gdb) c
Continuing.

Breakpoint 2, 0x00007fa2e1154b3b in kcbhxoro () from /u01/app/oracle/product/12.1.0/dbhome_1/lib/libclntsh.so.12.1
(gdb) p $rsi
$5 = 256

Awesome! So it works. Now I’ll check what data is stored on the addres from the RDI. To do it, I’ll use "examine" command:

(gdb) x/gx $rdi
0x7fa2ddec4200:	0x01c0008b0000a206

Since the result of an XOR is being put in an RDX registry, the portion of 8 bytes will be used as a chunk – RDX is a 64bit registry. Format "g" in examine command means: "g(giant, 8 bytes)". You can see this with "help x".

Let’s check right now the contents of this block:

[oracle@rico ~]$ xxd -s+1138688 -g 8 -l 8 /u01/app/oracle/oradata/SKIPER/datafile/o1_mf_tbs_dupa_f5rzk9yl_.dbf
0116000: 06a200008b00c001                   ........

Remember that GDB shows hex values in a human format (the same as big-endian) but on Intel, we have a small-endian format. So if you’ll reverse bytes, you will see that the values are actually the same.

Conclusion? RDI contains the address to memory with my database block! Awesome!
So the main loop looks like this:

  • XOR 8 bytes from memory address, stored in RDI with RDX
  • Increment counter on EAX
  • XOR 8 bytes from memory address (+8B offset), stored in RDI with RDX
  • XOR 8 bytes from memory address (+16B offset), stored in RDI with RDX
  • XOR 8 bytes from memory address (+24B offset), stored in RDI with RDX
  • add 32B to the value of RDI – so the pointer will shift 32B
  • already did this 256 times? No? Jump to the begining

After that, we have some more fun:

   0x00007fa2e1154b54 <+68>:	mov    %rdx,%rcx

Put value of RDX into RCX

   0x00007fa2e1154b57 <+71>:	shr    $0x20,%rcx

Shift value at RCX 32b to the right.

   0x00007fa2e1154b5b <+75>:	xor    %rcx,%rdx

XOR RCX and RDX – put the result in RDX

   0x00007fa2e1154b5e <+78>:	mov    %rdx,%rsi

Put value of RDX into RSI

   0x00007fa2e1154b61 <+81>:	shr    $0x10,%rsi

Shift value at RSI 16b to the right.

   0x00007fa2e1154b65 <+85>:	xor    %rsi,%rdx

XOR RSI and RDX – put the result in RDX

   0x00007fa2e1154b68 <+88>:	movzwl %dx,%eax

Put 16bits (2 bytes) of value stored in RDX into EAX and fill empty bits with 0. The EAX is a register used as 1st return register. So retq (return instruction) will return value stored on this register.

There is one more thing to check – the value of a checksum is stored at 16B of a block and has 2B. It makes no sense including it into checksum calculation. So let’s compare the value of the in-memory block with actual one:

This is block stored in memory:

(gdb) x/gx $rdi+16
0x7fa2ddec4210:	0x0016000100000000

And here is the value from the disk:

[oracle@rico ~]$ xxd -s+1138704 -g 8 -l 8 /u01/app/oracle/oradata/SKIPER/datafile/o1_mf_tbs_dupa_f5rzk9yl_.dbf
0116010: e040000001001600                   .@......

Awesome! So those 2B are filled with zeros in memory. Now we can write our checksum code in Python and check it:

from struct import Struct
import sys, binascii

ushort = Struct("H")
ulong = Struct("Q")

file_name = sys.argv[1]
block_no = int(sys.argv[2])

block_size = 8192

dbf = open(file_name, "rb")
dbf.seek(block_size*block_no)

block = dbf.read(16)
block += b"\x00\x00"
dbf.seek(block_size*block_no + 18)
block += dbf.read(block_size - 18)
dbf.close()

checksum_value = 0

for i in range(int(block_size/8)):
  checksum_value = checksum_value ^ ulong.unpack(block[i*8:i*8+8])[0]

tmp = checksum_value >> 32
checksum_value = checksum_value ^ tmp
tmp = checksum_value >> 16
checksum_value = checksum_value ^ tmp

final_checksum = ushort.unpack(ulong.pack(checksum_value)[0:2])[0]

print("checksum int = " + str(final_checksum))
print("checksum hex = " + str(hex(final_checksum)))

Result from BBED:

BBED> sum
Check value for File 7, Block 139:
current = 0x40e0, required = 0x54f4

Result from my Python script:

[oracle@rico ~]$ python kcbhxoro.py /u01/app/oracle/oradata/SKIPER/datafile/o1_mf_tbs_dupa_f5rzk9yl_.dbf 139
checksum int = 21748
checksum hex = 0x54f4

That’s it! And what about sxorchk function? A brief look into it shows that it does the same calculations in a different manner. So my guess is, that sxorchk is just a highly optimised version of kcbhxoro and while Oracle DB is calling this function, the argument is being passed to inform kcbhxoro that I want to use in fact sxorchk – and BBED is using just the old API. But this is just a guess.

I’d like to thank Frits Hoogland for his insights on this subject 🙂

Hope to show you the new clone of BBED soon!
Cheers!

P.S.
This is a very nice article to read if you want to understand GDB usage and formatting: https://www.recurse.com/blog/7-understanding-c-by-learning-assembly


Contact us

Database Whisperers sp. z o. o. sp. k.
al. Jerozolimskie 200, 3rd floor, room 342
02-486 Warszawa
NIP: 5272744987
REGON:362524978
+48 508 943 051
+48 661 966 009
info@ora-600.pl

Newsletter Sign up to be updated