Recently I’ve started to write my own clone of BBED to have something handy and useful in extreme cases when you have to go deep and fix stuff on low level (I have only like 2 such cases a year but each time it is really fun and a nice money 😉 )
When I’ll finish the tool, it will be available on my GitHub. Maybe as an option for ODBV or a separate tool? I haven’t decided yet but I’ll cover it with a blog post for sure!
Meanwhile, I want to share with you my investigation on calculating a block checksum.
Quoting from a documentation:
A number calculated by the database from all the bytes stored in a data or redo block. If the DB_BLOCK_CHECKSUM initialization parameter is enabled, then the database calculates the checksum for every data file or online redo log block and stores it in the block header when writing to disk. The database can use the checksum value to check consistency.
This checksum in a database block is stored on 16th byte. Let’s check it with BBED:
BBED> p kcbh struct kcbh, 20 bytes @0 ub1 type_kcbh @0 0x06 ub1 frmt_kcbh @1 0xa2 ub1 spare1_kcbh @2 0x00 ub1 spare2_kcbh @3 0x00 ub4 rdba_kcbh @4 0x01c0008b ub4 bas_kcbh @8 0x009217c8 ub2 wrp_kcbh @12 0x0000 ub1 seq_kcbh @14 0x01 ub1 flg_kcbh @15 0x06 (KCBHFDLC, KCBHFCKV) ub2 chkval_kcbh @16 0x40e0 ub2 spare3_kcbh @18 0x0000
As you can see the value is 0x40e0 so 13710. Please note that BBED shows the hex values format in a human-readable manner like in a Big Endian system. Since I’m doing this on Intel, the actual hex value in a datafile is written as "E040″.
BBED can calculate a checksum with command "sum" and apply it to a block with a command "sum apply". Let’s check what will happen if I’ll change the value in a database block at a low level without putting the appropriate checksum.
BBED> p *kdbr[0] rowdata[3696] ------------- ub1 rowdata[3696] @4877 0x2c BBED> x /rncccctcn rowdata[3696] @4877 ------------- flag@4877: 0x2c (KDRHFL, KDRHFF, KDRHFH) lock@4878: 0x01 cols@4879: 11 col 0[3] @4880: 108 col 1[5] @4884: Nancy col 2[9] @4890: Greenberg col 3[8] @4900: NGREENBE col 4[12] @4909: 515.124.4569 col 5[7] @4922: 17-AUG-02 col 6[6] @4930: FI_MGR col 7[4] @4937: 12008 col 8[0] @4942: *NULL* col 9[3] @4943: 101 col 10[2] @4947: 100 BBED> set offset 4885 OFFSET 4885 BBED> set mode edit MODE Edit BBED> modify /c "Dupcy" Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) Y BBED> p *kdbr[0] rowdata[3696] ------------- ub1 rowdata[3696] @4877 0x2c BBED> x /rncccctcn rowdata[3696] @4877 ------------- flag@4877: 0x2c (KDRHFL, KDRHFF, KDRHFH) lock@4878: 0x01 cols@4879: 11 col 0[3] @4880: 108 col 1[5] @4884: Dupcy col 2[9] @4890: Greenberg col 3[8] @4900: NGREENBE col 4[12] @4909: 515.124.4569 col 5[7] @4922: 17-AUG-02 col 6[6] @4930: FI_MGR col 7[4] @4937: 12008 col 8[0] @4942: *NULL* col 9[3] @4943: 101 col 10[2] @4947: 100
Now we can see that BBED is showing me the changes correctly, but when I’ll try to read a block from Oracle:
ERROR at line 1: ORA-01578: ORACLE data block corrupted (file # 7, block # 139) ORA-01110: data file 7: '/u01/app/oracle/oradata/SKIPER/datafile/o1_mf_tbs_dupa_f5rzk9yl_.dbf'
BBED will also show us that a checksum in a block is different than it shout be:
BBED> sum Check value for File 7, Block 139: current = 0x40e0, required = 0x54f4
Now we know what the checksum should be and how to trigger BBED to count it – it’s debugging time!!!
Firstly I’ll attach to my BBED process with a GDB and I’ll set a watchpoint. In assembler, a result from a function has to be set on one of the registers. The CPU documentation will tell which register to observe. In most of the cases on Intel and AMD it will be RAX:
(gdb) watch $rax == 0x54f4 Watchpoint 1: $rax == 0x54f4 (gdb) command 1 Type commands for breakpoint(s) 1, one per line. End with a line saying just "end". >c >end (gdb) set pagination off (gdb) c Continuing.
After executing a sum command in a BBED I’ll see a set of functions that returned with a specified value – of course, I’m interested only in the first one:
Watchpoint 1: $rax == 0x54f4 Old value = 0 New value = 1 0x00007f3c5857eb6b in kcbhxoro () from /u01/app/oracle/product/12.1.0/dbhome_1/lib/libclntsh.so.12.1 Watchpoint 1: $rax == 0x54f4
I’ve consulted Frits and he told me that this kcbhxoro launches sxorchk, which is doing the work. But sxorchk never fires when we count a checksum from BBED. Let’s check kcbhxoro:
(... removed for clarity ... ) 0x00007fa2e1154b16 <+6>: test %edx,%edx 0x00007fa2e1154b18 <+8>: je 0x7fa2e1154b24 <kcbhxoro+20> 0x00007fa2e1154b1a <+10>: callq 0x7fa2df2be5f8 <sxorchk@plt> (... removed for clarity ... )
As you can see, calling sxorchk is conditional and it can be skipped by highlighted instructions – and it happens in our case.
The first interesting thing is a little loop in here:
0x00007fa2e1154b3b <+43>: xor (%rdi),%rdx 0x00007fa2e1154b3e <+46>: inc %eax 0x00007fa2e1154b40 <+48>: xor 0x8(%rdi),%rdx 0x00007fa2e1154b44 <+52>: xor 0x10(%rdi),%rdx 0x00007fa2e1154b48 <+56>: xor 0x18(%rdi),%rdx 0x00007fa2e1154b4c <+60>: add $0x20,%rdi 0x00007fa2e1154b50 <+64>: cmp %esi,%eax 0x00007fa2e1154b52 <+66>: jb 0x7fa2e1154b3b <kcbhxoro+43>
We are xoring the value from the memory address stored in the RDI registry with a value of RDX registry – the result is put in RDX.
Then we increment value on EAX registry – EAX is used here as a loop counter. Do you see the last line of the above output? This is a conditional jump. First, we are comparing values on ESI and EAX and if EAX is below the ESI – we are jumping back to offset 43 to the xor.
So ESI contains the value of the loop spins. The counter is calculated here:
0x00007fa2e1154b31 <+33>: shr $0x5,%rsi
From the documentation we know, that RSI is a register used to pass the second argument to functions. So we can see here, that the second argument is shifted by 5 bits to the right – shifting right by 5 bits is the equivalent of diving the argument by 2**5 – so 32.
Ok, so let’s make a breakpoint at the shr command and check the values of the registers:
(gdb) b *0x00007fa2e1154b31 Breakpoint 1 at 0x7fa2e1154b31 (gdb) b *0x00007fa2e1154b31 Breakpoint 1 at 0x7fa2e1154b31 (gdb) c Continuing. Breakpoint 1, 0x00007fa2e1154b31 in kcbhxoro () from /u01/app/oracle/product/12.1.0/dbhome_1/lib/libclntsh.so.12.1 (gdb) p $rsi $1 = 8192 (gdb) p $rdx $2 = 0 (gdb) p $rdi $3 = 140337484677632 (gdb) p/x $rdi $4 = 0x7fa2ddec4200
Great! So we know that RDI contains the address in a memory where data begins (I know it because in assembler dump I see that RDI is used in brackets). The RSI, on the other hand, contains value 8192. If I will shift it right by 5 bits it would be equivalent to dividing it by 32 – so after this operation, I should have value 256 on the RSI.
Let’s check it.
(gdb) b *0x00007fa2e1154b3b Breakpoint 2 at 0x7fa2e1154b3b (gdb) c Continuing. Breakpoint 2, 0x00007fa2e1154b3b in kcbhxoro () from /u01/app/oracle/product/12.1.0/dbhome_1/lib/libclntsh.so.12.1 (gdb) p $rsi $5 = 256
Awesome! So it works. Now I’ll check what data is stored on the addres from the RDI. To do it, I’ll use "examine" command:
(gdb) x/gx $rdi 0x7fa2ddec4200: 0x01c0008b0000a206
Since the result of an XOR is being put in an RDX registry, the portion of 8 bytes will be used as a chunk – RDX is a 64bit registry. Format "g" in examine command means: "g(giant, 8 bytes)". You can see this with "help x".
Let’s check right now the contents of this block:
[oracle@rico ~]$ xxd -s+1138688 -g 8 -l 8 /u01/app/oracle/oradata/SKIPER/datafile/o1_mf_tbs_dupa_f5rzk9yl_.dbf 0116000: 06a200008b00c001 ........
Remember that GDB shows hex values in a human format (the same as big-endian) but on Intel, we have a small-endian format. So if you’ll reverse bytes, you will see that the values are actually the same.
Conclusion? RDI contains the address to memory with my database block! Awesome!
So the main loop looks like this:
- XOR 8 bytes from memory address, stored in RDI with RDX
- Increment counter on EAX
- XOR 8 bytes from memory address (+8B offset), stored in RDI with RDX
- XOR 8 bytes from memory address (+16B offset), stored in RDI with RDX
- XOR 8 bytes from memory address (+24B offset), stored in RDI with RDX
- add 32B to the value of RDI – so the pointer will shift 32B
- already did this 256 times? No? Jump to the begining
After that, we have some more fun:
0x00007fa2e1154b54 <+68>: mov %rdx,%rcx
Put value of RDX into RCX
0x00007fa2e1154b57 <+71>: shr $0x20,%rcx
Shift value at RCX 32b to the right.
0x00007fa2e1154b5b <+75>: xor %rcx,%rdx
XOR RCX and RDX – put the result in RDX
0x00007fa2e1154b5e <+78>: mov %rdx,%rsi
Put value of RDX into RSI
0x00007fa2e1154b61 <+81>: shr $0x10,%rsi
Shift value at RSI 16b to the right.
0x00007fa2e1154b65 <+85>: xor %rsi,%rdx
XOR RSI and RDX – put the result in RDX
0x00007fa2e1154b68 <+88>: movzwl %dx,%eax
Put 16bits (2 bytes) of value stored in RDX into EAX and fill empty bits with 0. The EAX is a register used as 1st return register. So retq (return instruction) will return value stored on this register.
There is one more thing to check – the value of a checksum is stored at 16B of a block and has 2B. It makes no sense including it into checksum calculation. So let’s compare the value of the in-memory block with actual one:
This is block stored in memory:
(gdb) x/gx $rdi+16 0x7fa2ddec4210: 0x0016000100000000
And here is the value from the disk:
[oracle@rico ~]$ xxd -s+1138704 -g 8 -l 8 /u01/app/oracle/oradata/SKIPER/datafile/o1_mf_tbs_dupa_f5rzk9yl_.dbf 0116010: e040000001001600 .@......
Awesome! So those 2B are filled with zeros in memory. Now we can write our checksum code in Python and check it:
from struct import Struct import sys, binascii ushort = Struct("H") ulong = Struct("Q") file_name = sys.argv[1] block_no = int(sys.argv[2]) block_size = 8192 dbf = open(file_name, "rb") dbf.seek(block_size*block_no) block = dbf.read(16) block += b"\x00\x00" dbf.seek(block_size*block_no + 18) block += dbf.read(block_size - 18) dbf.close() checksum_value = 0 for i in range(int(block_size/8)): checksum_value = checksum_value ^ ulong.unpack(block[i*8:i*8+8])[0] tmp = checksum_value >> 32 checksum_value = checksum_value ^ tmp tmp = checksum_value >> 16 checksum_value = checksum_value ^ tmp final_checksum = ushort.unpack(ulong.pack(checksum_value)[0:2])[0] print("checksum int = " + str(final_checksum)) print("checksum hex = " + str(hex(final_checksum)))
Result from BBED:
BBED> sum Check value for File 7, Block 139: current = 0x40e0, required = 0x54f4
Result from my Python script:
[oracle@rico ~]$ python kcbhxoro.py /u01/app/oracle/oradata/SKIPER/datafile/o1_mf_tbs_dupa_f5rzk9yl_.dbf 139 checksum int = 21748 checksum hex = 0x54f4
That’s it! And what about sxorchk function? A brief look into it shows that it does the same calculations in a different manner. So my guess is, that sxorchk is just a highly optimised version of kcbhxoro and while Oracle DB is calling this function, the argument is being passed to inform kcbhxoro that I want to use in fact sxorchk – and BBED is using just the old API. But this is just a guess.
I’d like to thank Frits Hoogland for his insights on this subject 🙂
Hope to show you the new clone of BBED soon!
Cheers!
P.S.
This is a very nice article to read if you want to understand GDB usage and formatting: https://www.recurse.com/blog/7-understanding-c-by-learning-assembly