How Oracle stores numbers internally


21.08.2017
by Kamil Stawiarski

Before you proceed, please check out this short article written by Tanel Poder:
http://blog.tanelpoder.com/2010/09/02/which-number-takes-more-space-in-an-oracle-row/

In the documentation, you can find the following explanation about the internal numeric format:

Oracle stores numeric data in variable-length format. Each value is stored in scientific notation, with 1 byte used to store the exponent and up to 20 bytes to store the mantissa. The resulting value is limited to 38 digits of precision. Oracle does not store leading and trailing zeros. For example, the number 412 is stored in a format similar to 4.12 x 102, with 1 byte used to store the exponent(2) and 2 bytes used to store the three significant digits of the mantissa(4,1,2). Negative numbers include the sign in their length.

So how to decode a number from hex value, that can be found in a datafile or archivelog?
Let’s take a value c3020102 which is 10001.

The first byte (c3) is exponent… but wait? c3 = 195. And 10^195 would be a little higher than 10001 🙂
Well, it has occurred that you have to subtract 193 from the first byte (c1) and multiple it 2 and add 2 (yes, really) to get the actual exponent.

So it could look like this:

>>> (int("c3",16)-193)*2+2
6

So the actual value of the exponent is 6. The next step would be to concatenate all other bytes together in a format "02d" while subtracting 1 from each number.
It would give us the following result: 010001. Let’s add "0." in the front and multiply it by 10^6 – the result is 10001 (this is our number!)

The similar case is with negative numbers – the negative number has byte 0x66 (102) on the last position and you have to subtract each byte from 101 while making a concatenation.

The code for 0 is 0x80

The algorithm in Python looks like this:

from struct import Struct
from decimal import Decimal
import binascii

def decode_number(data_hex):
    data_hex_b = binascii.unhexlify(data_hex)
    if data_hex == "80":
        return 0

    if Struct("B").unpack(data_hex_b[-1])[0] != 102:
        exPot = Struct("B").unpack(data_hex_b[0])[0] - 193
        numberValue = "0."
        exPot=exPot*2 + 2

        for i in range(1,len(data_hex_b)):
            numberValue += "{:02d}".format(Struct("B").unpack(data_hex_b[i])[0] - 1)
    else:
        exPot = 62 - Struct("B").unpack(data_hex_b[0])[0]
        numberValue = "-0."
        exPot=exPot*2 + 2

        for i in range(1,len(data_hex_b)-1):
            numberValue += "{:02d}".format(101 - Struct("B").unpack(data_hex_b[i])[0])

    fVal = Decimal(numberValue)
    powVal = Decimal(10)**Decimal(exPot)
    return str(fVal * powVal).rstrip("0").rstrip(".")

if __name__=='__main__':
    print decode_number("c3020102"), " c3020102 should be 10001"
    print decode_number("c2050112105b"), " c2050112105b should be 400,17159"
    print decode_number("c205025b0a29"), " c205025b0a29 should be 401,90094"
    print decode_number("c20502155d"), " c20502155d should be 401,2092"
    print decode_number("c102"), " c102 should by 1"
    print decode_number("c22662645547"), " c22662645547 should be 3797,99847"
    print decode_number("c2074d022906072449"), "c2074d022906072449 should be 676,014005063572"
    print decode_number("3d5f19643d605f421d66"), " should be -676.014005063572"
    print decode_number("c40a110e09"), " c40a110e09  should be 9161308"
    print decode_number("80"), " 80 should be 0"

You may wonder how it is useful? First of all, it’s fun 😉 But also I’ve heard some heresies that it doesn’t matter anymore what data type you choose while designing your database. That storing numbers or dates can be done in varchar2 with no impact on performance. Bullshit.

Just check how many bytes will take to store above numbers in varchar2. Have fun with testing! 🙂

Search:


Kontakt

Database Whisperers sp. z o. o. sp. k.
Al. Jerozolimskie 200
Wejście B, III piętro/ pokój 342
02-486 Warszawa
NIP: 5272744987
REGON:362524978
+48 508 943 051
+48 661 966 009
info@ora-600.pl

Newsletter zapisz się żeby być na bieżąco