At UKOUG Tech’15 (Super Sunday) there was a session called "12c: A Closer Look at the Multi-Process Multi-Threaded Model for Oracle on Linux/Unix" by Markus Flechtner. Since then I can’t stop thinking about the performance of this feature. Why Oracle decided to provide it?
I did some tests regarding sort operations and threaded execution was always a bit slower. I decided to dig a little dipper into this.
Oracle instance with threaded_execution=false:
1 | SQL> sho parameters threaded_execution |
4 | ------------------------------------ ----------- ------------------------------ |
5 | threaded_execution boolean FALSE |
6 | SQL> select count(1) from sales2; |
12 | SQL> select count(distinct file# || block#) |
13 | 2 from v$bh b, dba_objects o |
14 | 3 where b.objd=o.data_object_id |
15 | 4 and o.object_name='SALES2' |
16 | 5 and b.status!='free'; |
18 | COUNT(DISTINCTFILE#||BLOCK#) |
19 | ---------------------------- |
22 | SQL> select blocks from user_segments where segment_name='SALES2'; |
Oracle instance with threaded_execution=true:
1 | SQL> sho parameters threaded_execution |
4 | ------------------------------------ ----------- ------------------------------ |
5 | threaded_execution boolean TRUE |
6 | SQL> select count(1) from sales2; |
12 | SQL> select count(distinct file# || block#) |
13 | 2 from v$bh b, dba_objects o |
14 | 3 where b.objd=o.data_object_id |
15 | 4 and o.object_name='SALES2' |
16 | 5 and b.status!='free'; |
18 | COUNT(DISTINCTFILE#||BLOCK#) |
19 | ---------------------------- |
22 | SQL> select blocks from user_segments where segment_name='SALES2'; |
As you can see, I have the same table (SALES2) on both instances, which is cached in buffer cache.
Now let’s test some kind of query, that access only this one table:
Oracle instance with threaded_execution=false:
5 | 3 select s.*, row_number() over (order by amount_sold desc) as rn |
13 | PROD_ID CUST_ID TIME_ID CHANNEL_ID PROMO_ID QUANTITY_SOLD AMOUNT_SOLD |
14 | ---------- ---------- --------- ---------- ---------- ------------- ----------- |
17 | 18 510 03-JAN-99 3 999 1 1782.72 |
20 | 18 1330 03-JAN-99 3 999 1 1782.72 |
Oracle instance with threaded_execution=true:
5 | 3 select s.*, row_number() over (order by amount_sold desc) as rn |
13 | PROD_ID CUST_ID TIME_ID CHANNEL_ID PROMO_ID QUANTITY_SOLD AMOUNT_SOLD |
14 | ---------- ---------- --------- ---------- ---------- ------------- ----------- |
17 | 18 510 03-JAN-99 3 999 1 1782.72 |
20 | 18 1330 03-JAN-99 3 999 1 1782.72 |
I used perf to analyze those two executions and I found a lot more syscalls, when threaded execution was used. I noticed also that in threaded execution, oracle is using nsmore2recv function. When I googled it, I found a great article by Tanel Poder: https://blog.tanelpoder.com/2008/02/05/oracle-hidden-costs-revealed-part-1/
Using strace revealed similar situation to this described by Tanel – a lot of POLL syscalls.
Oracle instance with threaded_execution=false:
1 | [root@rico ~]# strace -cp 12168 |
3 | ^CProcess 12168 detached |
4 | % time seconds usecs/call calls errors syscall |
5 | ------ ----------- ----------- --------- --------- ---------------- |
7 | 0.00 0.000000 0 2 write |
8 | 0.00 0.000000 0 1 semctl |
9 | 0.00 0.000000 0 159 getrusage |
10 | 0.00 0.000000 0 12 times |
11 | 0.00 0.000000 0 3 semtimedop |
12 | ------ ----------- ----------- --------- --------- ---------------- |
13 | 100.00 0.000000 179 total |
Oracle instance with threaded_execution=true:
1 | [root@rico fd]# strace -cp 12165 |
3 | ^CProcess 12165 detached |
4 | % time seconds usecs/call calls errors syscall |
5 | ------ ----------- ----------- --------- --------- ---------------- |
6 | 84.22 0.113706 0 980840 poll |
7 | 10.37 0.014000 7000 2 read |
8 | 5.41 0.007310 1218 6 semtimedop |
9 | 0.00 0.000000 0 2 write |
10 | 0.00 0.000000 0 1 semctl |
11 | 0.00 0.000000 0 419 getrusage |
12 | 0.00 0.000000 0 12 times |
13 | ------ ----------- ----------- --------- --------- ---------------- |
14 | 100.00 0.135016 981282 total |
16 | [root@rico fd]# strace -p 12165 -o /tmp/threaded_exec.out |
18 | ^CProcess 12165 detached |
19 | [root@rico fd]# grep poll /tmp/threaded_exec.out | tail |
20 | poll([{fd=63, events=POLLIN|POLLRDNORM}], 1, 0) = 0 (Timeout) |
21 | poll([{fd=63, events=POLLIN|POLLRDNORM}], 1, 0) = 0 (Timeout) |
22 | poll([{fd=63, events=POLLIN|POLLRDNORM}], 1, 0) = 0 (Timeout) |
23 | poll([{fd=63, events=POLLIN|POLLRDNORM}], 1, 0) = 0 (Timeout) |
24 | poll([{fd=63, events=POLLIN|POLLRDNORM}], 1, 0) = 0 (Timeout) |
25 | poll([{fd=63, events=POLLIN|POLLRDNORM}], 1, 0) = 0 (Timeout) |
26 | poll([{fd=63, events=POLLIN|POLLRDNORM}], 1, 0) = 0 (Timeout) |
27 | poll([{fd=63, events=POLLIN|POLLRDNORM}], 1, 0) = 0 (Timeout) |
28 | poll([{fd=63, events=POLLIN|POLLRDNORM}], 1, 0) = 0 (Timeout) |
29 | poll([{fd=63, events=POLLIN|POLLRDNORM}], 1, 0) = 0 (Timeout) |
The STID 12165 was signed to SPID 8107:
3 | 2 from v$process p, v$session s |
5 | 4* and s.sid=sys_context('userenv','sid') |
9 | ------------------------ ------------------------ |
Let’s check the file descriptors for this thread:
1 | [root@rico ~]# cd /proc/8107/task/12165/fd |
2 | [root@rico fd]# ls -al | grep 63 |
3 | lrwx------. 1 oracle oinstall 64 12-17 21:38 63 -> socket:[73968] |
4 | [root@rico fd]# lsof | grep 73968 |
5 | ora_scmn_ 8107 oracle 63u IPv6 73968 0t0 TCP localhost:ncube-lm->localhost:32400 (ESTABLISHED) |
6 | [root@rico fd]# ps aux | grep 8107 | grep -v grep |
7 | oracle 8107 4.7 29.0 6155520 2901516 ? Ssl 20:01 6:54 ora_u005_orclth |
So it looks like, that the process ora_u00N is using sockets to communicate with its threads. That’s why we have seen the additional syscalls, which were an overhead causing slower execution.
So I raise the question: why and when use threaded execution? If ever?