What to do with 5GB of trace files?


08.01.2021
by Kamil Stawiarski

Parse it 🙂
Simple, right? But for sure we are not going to use tkprof for 14 000 trace files, are we?

We don’t have Diagnostic Pack and we just started a complete database tracing for an hour or two, because the customer said that they encounter performance problems during this period. Now you have to deal with 7k trace files that weight around 5GBs. We already know that tkprofing everything is not an option but we want to extract some valuable data, like the amount of wait events, time consumed by them and which SQL IDs were affected by those wait events.
There was no such tool that could help me – so I wrote myself one 🙂

https://github.com/ora600pl/tracefile_parsers/tree/master/parse_trace_owi

It is written on Go Lang so you can easily download it and compile.

Let’s see how you can use it!

This is my directory with trace files:

inter@piclerick 20201217_przypadki_testowe1 % ls *.trc | wc -l
    7580
inter@piclerick 20201217_przypadki_testowe1 % du -sh .
4.9G	.

After installing Go Lang I can clone my git repo and build the tool:

inter@piclerick 20201217_przypadki_testowe1 % git clone https://github.com/ora600pl/tracefile_parsers
Cloning into 'tracefile_parsers'...
remote: Enumerating objects: 53, done.
remote: Counting objects: 100% (53/53), done.
remote: Compressing objects: 100% (39/39), done.
remote: Total 68 (delta 25), reused 40 (delta 13), pack-reused 15
Unpacking objects: 100% (68/68), done.
inter@piclerick 20201217_przypadki_testowe1 % export GOPATH=$PWD/tracefile_parsers/parse_trace_owi
inter@piclerick 20201217_przypadki_testowe1 % export PATH=$GOPATH:$PATH
inter@piclerick 20201217_przypadki_testowe1 % go build $GOPATH/parse_trace_owi.go
inter@piclerick 20201217_przypadki_testowe1 % ./parse_trace_owi
   -cpuprofile string
    	write cpu profile to file
  -d string
    	debug (default "false")
  -event string
    	Display SQLids for specified event
  -p int
    	parallel degree (default 1)
  -s string
    	where to search for trace files
  -sqlid string
    	Display wait events for sqlid
  -tf string
    	time from (default "2020-01-01 00:00:00.100")
  -top int
    	Only TOP n elements
  -tt string
    	time to (default "2020-01-02 00:00:00.100")

Let’s say, that my customer run a few test cases and we want analyze the one between 11:00 and 12:00 at 2020-12-17. Let’s display TOP 10 of wait events:

inter@piclerick 20201217_przypadki_testowe1 % ./parse_trace_owi -s . -p 4 -tf="2020-12-17 11:00:00.000" -tt="2020-12-17 12:00:00.000" -top=10
                                                  WAIT EVENT		ELA(ms)			COUNT		AVG(ms)			STDDEV(ms)		CLASS

                                     'direct path read temp'		17950.971000		4454		4.030303		6.339411		User I/O
                                             'log file sync'		39742.699000		1807		21.993746		55.424215		Commit
                                         'gc cr grant 2-way'		62113.381000		28519		2.177965		6.365191		Cluster
                                         'library cache pin'		70865.219000		20153		3.516361		18.274058		Concurrency
                                     'read by other session'		84863.151000		23327		3.637980		20.428769		User I/O
                                     'enq: IV -  contention'		114279.343000		9344		12.230238		52.188474		UNKNOWN
                                        'library cache lock'		130987.800000		19661		6.662316		39.169932		Concurrency
                             'cell multiblock physical read'		267987.927000		119366		2.245094		8.076494		User I/O
                                         'TCP Socket (KGAS)'		279916.933000		252		1110.781480		5213.904037		Network
                           'cell single block physical read'		1462145.153000		749308		1.951327		10.049973		User I/O
Everythong took: 92.733040 ms

You may also want TOP 10 SQL IDs waiting for some event:

% ./parse_trace_owi -s . -p 4 -tf="2020-12-17 11:00:00.000" -tt="2020-12-17 12:00:00.000" -top=10 -event="'cell single block physical read'"
SQLs for event 'cell single block physical read'
'agxfj8nfgb6yc'		7260.469000
'9k51zwtnfv4p5'		7738.085000
'18fy6gm5dp4f4'		14051.708000
'5tqpusbd6ndzq'		14832.855000
'9au7nr1ct9nx4'		30770.077000
'0r0f8c4qn71g3'		72057.268000
'8ujnkzntxyh8z'		73529.641000
'8mxfu63djand2'		86717.187000
'4rtk5q2jdbbm0'		164311.897000
'5pgxcutnsdpqc'		863357.352000
It was 50.967742 percent out of all SQLs - 869 out of 1705
Everythong took: 43.363431 ms

Or all events on which some specified SQL ID waited:

% ./parse_trace_owi -s . -p 4 -tf="2020-12-17 11:00:00.000" -tt="2020-12-17 12:00:00.000" -sqlid="'0r0f8c4qn71g3'"
Wait events for this SQLid
                                                  WAIT EVENT		ELA(ms)			COUNT		AVG(ms)			STDDEV(ms)		CLASS

                                  'Disk file operations I/O'		0.005000		1		0.005000		0.000000		User I/O
                                 'SQL*Net message to client'		0.006000		3		0.002000		0.000000		Network
                                 'gc cr multi block request'		0.311000		1		0.311000		0.000000		Cluster
                                         'gc cr grant 2-way'		9.610000		3		3.203333		3.454249		Cluster
                                           'gc cr disk read'		18.062000		5		3.612400		3.275354		Cluster
                                     'gc current grant busy'		35.486000		3		11.828667		10.254008		Cluster
                          'enq: KO - fast object checkpoint'		40.521000		9		4.502333		9.891949		Application
                                     'read by other session'		53.002000		2		26.501000		8.863000		User I/O
                                         'gc cr block 2-way'		55.438000		21		2.639905		4.168717		Cluster
                                    'gc current block 2-way'		74.444000		22		3.383818		3.040416		Cluster
                               'latch: cache buffers chains'		87.275000		17		5.133824		11.571923		Concurrency
                         'cell list of blocks physical read'		539.831000		30		17.994367		13.906518		User I/O
                                     'cell smart table scan'		793.048000		143		5.545790		7.733290		User I/O
                             'cell multiblock physical read'		46368.724000		8541		5.428957		13.757391		User I/O
                           'cell single block physical read'		72057.268000		18950		3.802494		9.149033		User I/O
Everythong took: 46.288263 ms

Pretty cool, right? 🙂

You may notice that a first parse will take much longer then next executions. That’s because I used a simple Go Lang trick – after the first actual parse, I’m saving binary representation of structures in a file:

//remember the structure map
var eventBin bytes.Buffer
enc := gob.NewEncoder(&eventBin)
err := enc.Encode(me)
if err != nil {
	log.Panic(err)
}

eventBinF, errf := os.Create(saveFileName)
if errf != nil {
	log.Panic(errf)
}

eventBin.WriteTo(eventBinF)
eventBinF.Close()
/////////////////////////////////

This is going to create a binary file with pre-parsed trace files:

% file 20201217110000000_20201217120000000.owisave
20201217110000000_20201217120000000.owisave: data

The name of the file is constructed based on the date range you specified in parameters. If the file for specified date range exists, I’m reading it instead of doing praising once more:

var eventBin bytes.Buffer
dec := gob.NewDecoder(&eventBin)
eventBinF, errf := os.Open(saveFileName)
if errf != nil {
	log.Panic(errf)
}

eventBin.ReadFrom(eventBinF)
dec.Decode(&me)

Thanks to that trick, you have to wait only once. But still – parsing 5GB in 13s is not bad. If you want to parse the whole set from scratch – just remove *.owisave files.

That’s it. Have fun! 🙂


Contact us

Database Whisperers sp. z o. o. sp. k.
al. Jerozolimskie 200, 3rd floor, room 342
02-486 Warszawa
NIP: 5272744987
REGON:362524978
+48 508 943 051
+48 661 966 009
info@ora-600.pl

Newsletter Sign up to be updated