At UKOUG Tech’15 (Super Sunday) there was a session called "12c: A Closer Look at the Multi-Process Multi-Threaded Model for Oracle on Linux/Unix" by Markus Flechtner. Since then I can’t stop thinking about the performance of this feature. Why Oracle decided to provide it?
I did some tests regarding sort operations and threaded execution was always a bit slower. I decided to dig a little dipper into this.
Oracle instance with threaded_execution=false:
SQL> sho parameters threaded_execution NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ threaded_execution boolean FALSE SQL> select count(1) from sales2; COUNT(1) ---------- 58805952 SQL> select count(distinct file# || block#) 2 from v$bh b, dba_objects o 3 where b.objd=o.data_object_id 4 and o.object_name='SALES2' 5 and b.status!='free'; COUNT(DISTINCTFILE#||BLOCK#) ---------------------------- 283624 SQL> select blocks from user_segments where segment_name='SALES2'; BLOCKS ---------- 284672
Oracle instance with threaded_execution=true:
SQL> sho parameters threaded_execution NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ threaded_execution boolean TRUE SQL> select count(1) from sales2; COUNT(1) ---------- 58805952 SQL> select count(distinct file# || block#) 2 from v$bh b, dba_objects o 3 where b.objd=o.data_object_id 4 and o.object_name='SALES2' 5 and b.status!='free'; COUNT(DISTINCTFILE#||BLOCK#) ---------------------------- 283624 SQL> select blocks from user_segments where segment_name='SALES2'; BLOCKS ---------- 284672
As you can see, I have the same table (SALES2) on both instances, which is cached in buffer cache.
Now let’s test some kind of query, that access only this one table:
Oracle instance with threaded_execution=false:
SQL> set timing on SQL> get bigsort 1 with v_sales as 2 ( 3 select s.*, row_number() over (order by amount_sold desc) as rn 4 from sales2 s 5 ) 6 select * 7 from v_sales 8* where rn<=2 SQL> / PROD_ID CUST_ID TIME_ID CHANNEL_ID PROMO_ID QUANTITY_SOLD AMOUNT_SOLD ---------- ---------- --------- ---------- ---------- ------------- ----------- RN ---------- 18 510 03-JAN-99 3 999 1 1782.72 1 18 1330 03-JAN-99 3 999 1 1782.72 2 Elapsed: 00:00:11.98
Oracle instance with threaded_execution=true:
SQL> set timing on SQL> get bigsort 1 with v_sales as 2 ( 3 select s.*, row_number() over (order by amount_sold desc) as rn 4 from sales2 s 5 ) 6 select * 7 from v_sales 8* where rn<=2 SQL> / PROD_ID CUST_ID TIME_ID CHANNEL_ID PROMO_ID QUANTITY_SOLD AMOUNT_SOLD ---------- ---------- --------- ---------- ---------- ------------- ----------- RN ---------- 18 510 03-JAN-99 3 999 1 1782.72 1 18 1330 03-JAN-99 3 999 1 1782.72 2 Elapsed: 00:00:12.58
I used perf to analyze those two executions and I found a lot more syscalls, when threaded execution was used. I noticed also that in threaded execution, oracle is using nsmore2recv function. When I googled it, I found a great article by Tanel Poder: https://blog.tanelpoder.com/2008/02/05/oracle-hidden-costs-revealed-part-1/
Using strace revealed similar situation to this described by Tanel – a lot of POLL syscalls.
Oracle instance with threaded_execution=false:
[root@rico ~]# strace -cp 12168 Process 12168 attached ^CProcess 12168 detached % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 0.00 0.000000 0 2 read 0.00 0.000000 0 2 write 0.00 0.000000 0 1 semctl 0.00 0.000000 0 159 getrusage 0.00 0.000000 0 12 times 0.00 0.000000 0 3 semtimedop ------ ----------- ----------- --------- --------- ---------------- 100.00 0.000000 179 total
Oracle instance with threaded_execution=true:
[root@rico fd]# strace -cp 12165 Process 12165 attached ^CProcess 12165 detached % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 84.22 0.113706 0 980840 poll 10.37 0.014000 7000 2 read 5.41 0.007310 1218 6 semtimedop 0.00 0.000000 0 2 write 0.00 0.000000 0 1 semctl 0.00 0.000000 0 419 getrusage 0.00 0.000000 0 12 times ------ ----------- ----------- --------- --------- ---------------- 100.00 0.135016 981282 total [root@rico fd]# strace -p 12165 -o /tmp/threaded_exec.out Process 12165 attached ^CProcess 12165 detached [root@rico fd]# grep poll /tmp/threaded_exec.out | tail poll([{fd=63, events=POLLIN|POLLRDNORM}], 1, 0) = 0 (Timeout) poll([{fd=63, events=POLLIN|POLLRDNORM}], 1, 0) = 0 (Timeout) poll([{fd=63, events=POLLIN|POLLRDNORM}], 1, 0) = 0 (Timeout) poll([{fd=63, events=POLLIN|POLLRDNORM}], 1, 0) = 0 (Timeout) poll([{fd=63, events=POLLIN|POLLRDNORM}], 1, 0) = 0 (Timeout) poll([{fd=63, events=POLLIN|POLLRDNORM}], 1, 0) = 0 (Timeout) poll([{fd=63, events=POLLIN|POLLRDNORM}], 1, 0) = 0 (Timeout) poll([{fd=63, events=POLLIN|POLLRDNORM}], 1, 0) = 0 (Timeout) poll([{fd=63, events=POLLIN|POLLRDNORM}], 1, 0) = 0 (Timeout) poll([{fd=63, events=POLLIN|POLLRDNORM}], 1, 0) = 0 (Timeout)
The STID 12165 was signed to SPID 8107:
SQL> get spid 1 select spid, stid 2 from v$process p, v$session s 3 where p.addr=s.paddr 4* and s.sid=sys_context('userenv','sid') SQL> / SPID STID ------------------------ ------------------------ 8107 12165
Let’s check the file descriptors for this thread:
[root@rico ~]# cd /proc/8107/task/12165/fd [root@rico fd]# ls -al | grep 63 lrwx------. 1 oracle oinstall 64 12-17 21:38 63 -> socket:[73968] [root@rico fd]# lsof | grep 73968 ora_scmn_ 8107 oracle 63u IPv6 73968 0t0 TCP localhost:ncube-lm->localhost:32400 (ESTABLISHED) [root@rico fd]# ps aux | grep 8107 | grep -v grep oracle 8107 4.7 29.0 6155520 2901516 ? Ssl 20:01 6:54 ora_u005_orclth [root@rico fd]#
So it looks like, that the process ora_u00N is using sockets to communicate with its threads. That’s why we have seen the additional syscalls, which were an overhead causing slower execution.
So I raise the question: why and when use threaded execution? If ever?