Recently one of my students asked me if there is any correlation between the AU size in ASM diskgroup and parameter db_file_multiblock_read_count. I made some tests but results are very strange.
I have 4 same tables, located in 4 different diskgroups:
1 select t.table_name, g.name, t.blocks, g.ALLOCATION_UNIT_SIZE 2 from dba_tables t, dba_data_files d, v$asm_diskgroup g 3 where t.tablespace_name=d.tablespace_name 4 and g.name=substr(d.file_name,2,instr(d.file_name,'/')-2) 5 and t.owner='SH' 6* and t.table_name like 'SALES%' SQL> / TABLE_NAME NAME BLOCKS ALLOCATION_UNIT_SIZE ---------- ------------------------------ ---------- -------------------- SALES16M DATA16M 93551 16777216 SALES32M DATA32M 93551 33554432 SALES1M DATA1M 93551 1048576 SALES4M DATA4M 93551 4194304
All tables are smaller then a hidden parameter "_small_table_threshold" so table access full on all tables will be using db file scattered read.
SQL> ; 1 select p.ksppinm, v.ksppstvl 2 from x$ksppi p, x$ksppcv v 3 where p.indx=v.indx 4* and p.ksppinm='_small_table_threshold' SQL> / KSPPINM KSPPSTVL ---------------------------------------- ------------------------------ _small_table_threshold 101527
Let’s start the main test. Parameter db_file_multiblock_read_count is set to the default value
[oracle@rico ~]$ sqlplus sh/sh SQL*Plus: Release 12.1.0.2.0 Production on Fri Jul 17 10:30:36 2015 Copyright (c) 1982, 2014, Oracle. All rights reserved. Last Successful login time: Wed Jul 01 2015 11:16:33 +02:00 Connected to: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production With the Partitioning, Automatic Storage Management, OLAP, Advanced Analytics and Real Application Testing options SQL> sho parameters db_file_multiblock NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ db_file_multiblock_read_count integer 128 SQL>
I will identify the server process, correlated with my session
SQL> ; 1 select spid 2 from v$process p, v$session s 3 where p.addr=s.paddr 4* and s.username='SH' SQL> / SPID ------------------------ 10430
Now I will use strace on this server process:
[root@rico ~]# strace -c -p 10430 Process 10430 attached - interrupt to quit
I will gather results for the following queries:
SQL> select count(1) from SALES16M; COUNT(1) ---------- 19295703 SQL> select count(1) from SALES32M; COUNT(1) ---------- 19295703 SQL> select count(1) from SALES1M; COUNT(1) ---------- 19295703 SQL> select count(1) from SALES4M; COUNT(1) ---------- 19295703
Results for SALES16M
[root@rico ~]# strace -c -p 10430 Process 10430 attached - interrupt to quit ^CProcess 10430 detached % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 100.00 0.053000 71 750 pread 0.00 0.000000 0 31 read 0.00 0.000000 0 2 write 0.00 0.000000 0 16 7 open 0.00 0.000000 0 9 close 0.00 0.000000 0 1 stat 0.00 0.000000 0 1 fstat 0.00 0.000000 0 28 lseek 0.00 0.000000 0 3 mmap 0.00 0.000000 0 2 munmap 0.00 0.000000 0 1 brk 0.00 0.000000 0 1 uname 0.00 0.000000 0 10 fcntl 0.00 0.000000 0 1 getrlimit 0.00 0.000000 0 164 getrusage 0.00 0.000000 0 28 times 0.00 0.000000 0 1 getuid 0.00 0.000000 0 1 io_getevents 0.00 0.000000 0 1 io_submit ------ ----------- ----------- --------- --------- ---------------- 100.00 0.053000 1051 7 total
Results for SALES32M
[root@rico ~]# strace -c -p 10430 Process 10430 attached - interrupt to quit ^CProcess 10430 detached % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 98.16 0.053462 72 747 pread 1.84 0.001000 500 2 read 0.00 0.000000 0 2 write 0.00 0.000000 0 1 open 0.00 0.000000 0 1 close 0.00 0.000000 0 1 mmap 0.00 0.000000 0 3 fcntl 0.00 0.000000 0 1 getrlimit 0.00 0.000000 0 164 getrusage 0.00 0.000000 0 28 times ------ ----------- ----------- --------- --------- ---------------- 100.00 0.054462 950 total
Results for SALES1M
[root@rico ~]# strace -c -p 10430 Process 10430 attached - interrupt to quit ^CProcess 10430 detached % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 48.58 0.015000 26 583 io_getevents 39.99 0.012349 75 164 pread 11.43 0.003528 6 583 io_submit 0.00 0.000000 0 3 read 0.00 0.000000 0 2 write 0.00 0.000000 0 2 open 0.00 0.000000 0 2 close 0.00 0.000000 0 1 semctl 0.00 0.000000 0 3 fcntl 0.00 0.000000 0 1 getrlimit 0.00 0.000000 0 164 getrusage 0.00 0.000000 0 29 times 0.00 0.000000 0 3 semtimedop ------ ----------- ----------- --------- --------- ---------------- 100.00 0.030877 1540 total
Results for SALES4M
[root@rico ~]# strace -c -p 10430 Process 10430 attached - interrupt to quit ^CProcess 10430 detached % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 95.16 0.043000 65 664 pread 2.62 0.001186 14 83 io_submit 2.21 0.001000 12 83 io_getevents 0.00 0.000000 0 2 read 0.00 0.000000 0 2 write 0.00 0.000000 0 21 getrusage 0.00 0.000000 0 12 times ------ ----------- ----------- --------- --------- ---------------- 100.00 0.045186 867 total
As we can see, the ASYNC IO represented by syscalls: io_submit and io_getevents are mostly used while scanning table located in diskgroup with 1MB AU.
At first I thought that this is correlated with db_file_multiblock_read_count. The default value (128) multiplied by 8k (block size for my database) is exactly 1M. So I made a few more tests with altered values for this parameter. First results where promising – I have managed to get ASYNC IO for all of the tables.
But then I made the final (a little stupid :)) test. After creating a session I have altered value of the db_file_multiblock_read_count to 128. Yes: 128. So the same as default:
[oracle@rico ~]$ sqlplus sh/sh SQL*Plus: Release 12.1.0.2.0 Production on Fri Jul 17 10:49:25 2015 Copyright (c) 1982, 2014, Oracle. All rights reserved. Last Successful login time: Fri Jul 17 2015 10:48:43 +02:00 Connected to: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production With the Partitioning, Automatic Storage Management, OLAP, Advanced Analytics and Real Application Testing options SQL> sho parameters db_file_multiblock NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ db_file_multiblock_read_count integer 128 SQL> alter session set db_file_multiblock_read_count=128; Session altered.
Let’s check the results now:
SALES16M
[root@rico ~]# strace -c -p 11208 Process 11208 attached - interrupt to quit ^CProcess 11208 detached % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 53.96 0.007032 9 747 io_submit 46.04 0.006000 8 747 io_getevents 0.00 0.000000 0 31 read 0.00 0.000000 0 2 write 0.00 0.000000 0 16 7 open 0.00 0.000000 0 9 close 0.00 0.000000 0 1 stat 0.00 0.000000 0 1 fstat 0.00 0.000000 0 28 lseek 0.00 0.000000 0 8 mmap 0.00 0.000000 0 1 munmap 0.00 0.000000 0 1 brk 0.00 0.000000 0 4 pread 0.00 0.000000 0 1 uname 0.00 0.000000 0 10 fcntl 0.00 0.000000 0 1 getrlimit 0.00 0.000000 0 164 getrusage 0.00 0.000000 0 34 times 0.00 0.000000 0 1 getuid ------ ----------- ----------- --------- --------- ---------------- 100.00 0.013032 1807 7 total
SALES32M
[root@rico ~]# strace -c -p 11208 Process 11208 attached - interrupt to quit ^CProcess 11208 detached % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 54.18 0.008278 11 746 io_submit 45.82 0.007000 9 746 io_getevents 0.00 0.000000 0 2 read 0.00 0.000000 0 2 write 0.00 0.000000 0 1 open 0.00 0.000000 0 1 close 0.00 0.000000 0 1 pread 0.00 0.000000 0 3 fcntl 0.00 0.000000 0 1 getrlimit 0.00 0.000000 0 164 getrusage 0.00 0.000000 0 28 times ------ ----------- ----------- --------- --------- ---------------- 100.00 0.015278 1695 total
SALES1M
[root@rico ~]# strace -c -p 11208 Process 11208 attached - interrupt to quit ^CProcess 11208 detached % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 65.36 0.010000 13 749 io_getevents 34.64 0.005300 7 746 io_submit 0.00 0.000000 0 3 read 0.00 0.000000 0 2 write 0.00 0.000000 0 2 open 0.00 0.000000 0 2 close 0.00 0.000000 0 1 pread 0.00 0.000000 0 1 semctl 0.00 0.000000 0 3 fcntl 0.00 0.000000 0 1 getrlimit 0.00 0.000000 0 164 getrusage 0.00 0.000000 0 28 times 0.00 0.000000 0 2 semtimedop ------ ----------- ----------- --------- --------- ---------------- 100.00 0.015300 1704 total
SALES4M
[root@rico ~]# strace -c -p 11208 Process 11208 attached - interrupt to quit ^CProcess 11208 detached % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 50.10 0.006000 8 746 io_getevents 49.90 0.005975 8 746 io_submit 0.00 0.000000 0 2 read 0.00 0.000000 0 2 write 0.00 0.000000 0 1 pread 0.00 0.000000 0 21 getrusage 0.00 0.000000 0 12 times ------ ----------- ----------- --------- --------- ---------------- 100.00 0.011975 1530 total
So EVERYTHING is being read with ASYNC IO!!! And this not the craziest thing – the other sessions are also using ASYNC IO for those tables now. Without altering the parameter. I have tried to flush buffer_cache and shared_pool, but with no success. The situation goes back to square one after restarting the instance.
I occurs that it takes only one session to set the parameter to any value – even the same value as it already was by default. And when the first query will use db file scattered read in async mode – all the other sessions, querying tables on ASM will also use io_getevents and io_submit instead of pread. (!!!)
[oracle@rico ~]$ sqlplus sh/sh SQL*Plus: Release 12.1.0.2.0 Production on Fri Jul 17 10:59:22 2015 Copyright (c) 1982, 2014, Oracle. All rights reserved. Last Successful login time: Fri Jul 17 2015 10:56:53 +02:00 Connected to: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production With the Partitioning, Automatic Storage Management, OLAP, Advanced Analytics and Real Application Testing options SQL> alter session set db_file_multiblock_read_count=128; Session altered. SQL> select count(1) from sales16m; COUNT(1) ---------- 19295703 SQL> Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production With the Partitioning, Automatic Storage Management, OLAP, Advanced Analytics and Real Application Testing options [oracle@rico ~]$ sqlplus sh/sh SQL*Plus: Release 12.1.0.2.0 Production on Fri Jul 17 10:59:51 2015 Copyright (c) 1982, 2014, Oracle. All rights reserved. Last Successful login time: Fri Jul 17 2015 10:59:22 +02:00 Connected to: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production With the Partitioning, Automatic Storage Management, OLAP, Advanced Analytics and Real Application Testing options SQL> select count(1) from SALES4M; COUNT(1) ---------- 19295703 SQL> Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production With the Partitioning, Automatic Storage Management, OLAP, Advanced Analytics and Real Application Testing options [oracle@rico ~]$ sqlplus sh/sh SQL*Plus: Release 12.1.0.2.0 Production on Fri Jul 17 11:00:43 2015 Copyright (c) 1982, 2014, Oracle. All rights reserved. Last Successful login time: Fri Jul 17 2015 10:59:51 +02:00 Connected to: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production With the Partitioning, Automatic Storage Management, OLAP, Advanced Analytics and Real Application Testing options SQL> select count(1) from sales32m; COUNT(1) ---------- 19295703
The appropriate strace results are:
[root@rico ~]# strace -c -p 11782 Process 11782 attached - interrupt to quit ^CProcess 11782 detached % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 53.97 0.006000 8 747 io_getevents 45.60 0.005069 7 747 io_submit 0.43 0.000048 1 34 times 0.00 0.000000 0 31 read 0.00 0.000000 0 2 write 0.00 0.000000 0 16 7 open 0.00 0.000000 0 9 close 0.00 0.000000 0 1 stat 0.00 0.000000 0 1 fstat 0.00 0.000000 0 28 lseek 0.00 0.000000 0 7 mmap 0.00 0.000000 0 1 munmap 0.00 0.000000 0 1 brk 0.00 0.000000 0 4 pread 0.00 0.000000 0 1 uname 0.00 0.000000 0 10 fcntl 0.00 0.000000 0 1 getrlimit 0.00 0.000000 0 164 getrusage 0.00 0.000000 0 1 getuid ------ ----------- ----------- --------- --------- ---------------- 100.00 0.011117 1806 7 total [root@rico ~]# strace -c -p 11793 Process 11793 attached - interrupt to quit ^CProcess 11793 detached % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 51.71 0.007000 10 731 io_getevents 48.29 0.006536 9 731 io_submit 0.00 0.000000 0 32 read 0.00 0.000000 0 2 write 0.00 0.000000 0 17 7 open 0.00 0.000000 0 10 close 0.00 0.000000 0 1 stat 0.00 0.000000 0 1 fstat 0.00 0.000000 0 28 lseek 0.00 0.000000 0 9 mmap 0.00 0.000000 0 1 munmap 0.00 0.000000 0 1 brk 0.00 0.000000 0 17 pread 0.00 0.000000 0 1 uname 0.00 0.000000 0 1 semctl 0.00 0.000000 0 10 fcntl 0.00 0.000000 0 1 getrlimit 0.00 0.000000 0 209 getrusage 0.00 0.000000 0 50 times 0.00 0.000000 0 1 getuid 0.00 0.000000 0 2 semtimedop ------ ----------- ----------- --------- --------- ---------------- 100.00 0.013536 1856 7 total [root@rico ~]# strace -c -p 11823 Process 11823 attached - interrupt to quit ^CProcess 11823 detached % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 51.65 0.005342 7 731 io_submit 48.35 0.005000 7 731 io_getevents 0.00 0.000000 0 31 read 0.00 0.000000 0 2 write 0.00 0.000000 0 16 7 open 0.00 0.000000 0 9 close 0.00 0.000000 0 1 stat 0.00 0.000000 0 1 fstat 0.00 0.000000 0 28 lseek 0.00 0.000000 0 9 mmap 0.00 0.000000 0 1 munmap 0.00 0.000000 0 1 brk 0.00 0.000000 0 17 pread 0.00 0.000000 0 1 uname 0.00 0.000000 0 10 fcntl 0.00 0.000000 0 1 getrlimit 0.00 0.000000 0 209 getrusage 0.00 0.000000 0 50 times 0.00 0.000000 0 1 getuid ------ ----------- ----------- --------- --------- ---------------- 100.00 0.010342 1850 7 total
Results from perf seams to confirm this strange behaviour.
Before setting the parameter:
[root@rico ~]# perf record -p 13521 ^C[ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.061 MB perf.data (~2644 samples) ] [root@rico ~]# perf report | grep aio 0.06% oracle_13521_or [kernel.kallsyms] [k] do_aio_read [root@rico ~]# perf report | grep pread 0.06% oracle_13521_or libpthread-2.12.so [.] __pread_nocancel
After setting the parameter:
[root@rico ~]# perf record -p 13573 ^C[ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.061 MB perf.data (~2680 samples) ] [root@rico ~]# perf report | grep pread [root@rico ~]# perf report | grep aio 0.18% oracle_13573_or libaio.so.1.0.1 [.] 0x0000000000000615 0.06% oracle_13573_or oracle [.] ksfdaio 0.06% oracle_13573_or [kernel.kallsyms] [k] aio_put_req 0.06% oracle_13573_or [kernel.kallsyms] [k] aio_run_iocb
I made tests on: Oracle 12.1.0.2 EE (OEL 6.6) and Oracle 11.2.0.3 EE (OEL 6.3)
So the conclusion is: WTF?!