What to do with 5GB of trace files?


08.01.2021
by Kamil Stawiarski

Parse it 🙂
Simple, right? But for sure we are not going to use tkprof for 14 000 trace files, are we?

We don’t have Diagnostic Pack and we just started a complete database tracing for an hour or two, because the customer said that they encounter performance problems during this period. Now you have to deal with 7k trace files that weight around 5GBs. We already know that tkprofing everything is not an option but we want to extract some valuable data, like the amount of wait events, time consumed by them and which SQL IDs were affected by those wait events.
There was no such tool that could help me – so I wrote myself one 🙂

https://github.com/ora600pl/tracefile_parsers/tree/master/parse_trace_owi

It is written on Go Lang so you can easily download it and compile.

Let’s see how you can use it!

This is my directory with trace files:

1inter@piclerick 20201217_przypadki_testowe1 % ls *.trc | wc -l
2    7580
3inter@piclerick 20201217_przypadki_testowe1 % du -sh .
44.9G    .

After installing Go Lang I can clone my git repo and build the tool:

1inter@piclerick 20201217_przypadki_testowe1 % git clone https://github.com/ora600pl/tracefile_parsers
2Cloning into 'tracefile_parsers'...
3remote: Enumerating objects: 53, done.
4remote: Counting objects: 100% (53/53), done.
5remote: Compressing objects: 100% (39/39), done.
6remote: Total 68 (delta 25), reused 40 (delta 13), pack-reused 15
7Unpacking objects: 100% (68/68), done.
8inter@piclerick 20201217_przypadki_testowe1 % export GOPATH=$PWD/tracefile_parsers/parse_trace_owi
9inter@piclerick 20201217_przypadki_testowe1 % export PATH=$GOPATH:$PATH
10inter@piclerick 20201217_przypadki_testowe1 % go build $GOPATH/parse_trace_owi.go
11inter@piclerick 20201217_przypadki_testowe1 % ./parse_trace_owi
12   -cpuprofile string
13        write cpu profile to file
14  -d string
15        debug (default "false")
16  -event string
17        Display SQLids for specified event
18  -p int
19        parallel degree (default 1)
20  -s string
21        where to search for trace files
22  -sqlid string
23        Display wait events for sqlid
24  -tf string
25        time from (default "2020-01-01 00:00:00.100")
26  -top int
27        Only TOP n elements
28  -tt string
29        time to (default "2020-01-02 00:00:00.100")

Let’s say, that my customer run a few test cases and we want analyze the one between 11:00 and 12:00 at 2020-12-17. Let’s display TOP 10 of wait events:

1inter@piclerick 20201217_przypadki_testowe1 % ./parse_trace_owi -s . -p 4 -tf="2020-12-17 11:00:00.000" -tt="2020-12-17 12:00:00.000" -top=10
2                                                  WAIT EVENT        ELA(ms)         COUNT       AVG(ms)         STDDEV(ms)      CLASS
3 
4                                     'direct path read temp'        17950.971000        4454        4.030303        6.339411        User I/O
5                                             'log file sync'        39742.699000        1807        21.993746       55.424215       Commit
6                                         'gc cr grant 2-way'        62113.381000        28519       2.177965        6.365191        Cluster
7                                         'library cache pin'        70865.219000        20153       3.516361        18.274058       Concurrency
8                                     'read by other session'        84863.151000        23327       3.637980        20.428769       User I/O
9                                     'enq: IV -  contention'        114279.343000       9344        12.230238       52.188474       UNKNOWN
10                                        'library cache lock'        130987.800000       19661       6.662316        39.169932       Concurrency
11                             'cell multiblock physical read'        267987.927000       119366      2.245094        8.076494        User I/O
12                                         'TCP Socket (KGAS)'        279916.933000       252     1110.781480     5213.904037     Network
13                           'cell single block physical read'        1462145.153000      749308      1.951327        10.049973       User I/O
14Everythong took: 92.733040 ms

You may also want TOP 10 SQL IDs waiting for some event:

1% ./parse_trace_owi -s . -p 4 -tf="2020-12-17 11:00:00.000" -tt="2020-12-17 12:00:00.000" -top=10 -event="'cell single block physical read'"
2SQLs for event 'cell single block physical read'
3'agxfj8nfgb6yc'     7260.469000
4'9k51zwtnfv4p5'     7738.085000
5'18fy6gm5dp4f4'     14051.708000
6'5tqpusbd6ndzq'     14832.855000
7'9au7nr1ct9nx4'     30770.077000
8'0r0f8c4qn71g3'     72057.268000
9'8ujnkzntxyh8z'     73529.641000
10'8mxfu63djand2'     86717.187000
11'4rtk5q2jdbbm0'     164311.897000
12'5pgxcutnsdpqc'     863357.352000
13It was 50.967742 percent out of all SQLs - 869 out of 1705
14Everythong took: 43.363431 ms

Or all events on which some specified SQL ID waited:

1% ./parse_trace_owi -s . -p 4 -tf="2020-12-17 11:00:00.000" -tt="2020-12-17 12:00:00.000" -sqlid="'0r0f8c4qn71g3'"
2Wait events for this SQLid
3                                                  WAIT EVENT        ELA(ms)         COUNT       AVG(ms)         STDDEV(ms)      CLASS
4 
5                                  'Disk file operations I/O'        0.005000        1       0.005000        0.000000        User I/O
6                                 'SQL*Net message to client'        0.006000        3       0.002000        0.000000        Network
7                                 'gc cr multi block request'        0.311000        1       0.311000        0.000000        Cluster
8                                         'gc cr grant 2-way'        9.610000        3       3.203333        3.454249        Cluster
9                                           'gc cr disk read'        18.062000       5       3.612400        3.275354        Cluster
10                                     'gc current grant busy'        35.486000       3       11.828667       10.254008       Cluster
11                          'enq: KO - fast object checkpoint'        40.521000       9       4.502333        9.891949        Application
12                                     'read by other session'        53.002000       2       26.501000       8.863000        User I/O
13                                         'gc cr block 2-way'        55.438000       21      2.639905        4.168717        Cluster
14                                    'gc current block 2-way'        74.444000       22      3.383818        3.040416        Cluster
15                               'latch: cache buffers chains'        87.275000       17      5.133824        11.571923       Concurrency
16                         'cell list of blocks physical read'        539.831000      30      17.994367       13.906518       User I/O
17                                     'cell smart table scan'        793.048000      143     5.545790        7.733290        User I/O
18                             'cell multiblock physical read'        46368.724000        8541        5.428957        13.757391       User I/O
19                           'cell single block physical read'        72057.268000        18950       3.802494        9.149033        User I/O
20Everythong took: 46.288263 ms

Pretty cool, right? 🙂

You may notice that a first parse will take much longer then next executions. That’s because I used a simple Go Lang trick – after the first actual parse, I’m saving binary representation of structures in a file:

1//remember the structure map
2var eventBin bytes.Buffer
3enc := gob.NewEncoder(&eventBin)
4err := enc.Encode(me)
5if err != nil {
6    log.Panic(err)
7}
8 
9eventBinF, errf := os.Create(saveFileName)
10if errf != nil {
11    log.Panic(errf)
12}
13 
14eventBin.WriteTo(eventBinF)
15eventBinF.Close()
16/////////////////////////////////

This is going to create a binary file with pre-parsed trace files:

1% file 20201217110000000_20201217120000000.owisave
220201217110000000_20201217120000000.owisave: data

The name of the file is constructed based on the date range you specified in parameters. If the file for specified date range exists, I’m reading it instead of doing praising once more:

1var eventBin bytes.Buffer
2dec := gob.NewDecoder(&eventBin)
3eventBinF, errf := os.Open(saveFileName)
4if errf != nil {
5    log.Panic(errf)
6}
7 
8eventBin.ReadFrom(eventBinF)
9dec.Decode(&me)

Thanks to that trick, you have to wait only once. But still – parsing 5GB in 13s is not bad. If you want to parse the whole set from scratch – just remove *.owisave files.

That’s it. Have fun! 🙂


Contact us

Database Whisperers sp. z o. o. sp. k.
al. Jerozolimskie 200, 3rd floor, room 342
02-486 Warszawa
NIP: 5272744987
REGON:362524978
+48 508 943 051
+48 661 966 009
info@ora-600.pl

Newsletter Sign up to be updated