Recently one of my students asked me if there is any correlation between the AU size in ASM diskgroup and parameter db_file_multiblock_read_count. I made some tests but results are very strange.
I have 4 same tables, located in 4 different diskgroups:
1 | 1 select t.table_name, g.name, t.blocks, g.ALLOCATION_UNIT_SIZE |
2 | 2 from dba_tables t, dba_data_files d, v$asm_diskgroup g |
3 | 3 where t.tablespace_name=d.tablespace_name |
4 | 4 and g.name=substr(d.file_name,2,instr(d.file_name,'/')-2) |
6 | 6* and t.table_name like 'SALES%' |
9 | TABLE_NAME NAME BLOCKS ALLOCATION_UNIT_SIZE |
10 | ---------- ------------------------------ ---------- -------------------- |
11 | SALES16M DATA16M 93551 16777216 |
12 | SALES32M DATA32M 93551 33554432 |
13 | SALES1M DATA1M 93551 1048576 |
14 | SALES4M DATA4M 93551 4194304 |
All tables are smaller then a hidden parameter "_small_table_threshold" so table access full on all tables will be using db file scattered read.
2 | 1 select p.ksppinm, v.ksppstvl |
3 | 2 from x$ksppi p, x$ksppcv v |
5 | 4* and p.ksppinm='_small_table_threshold' |
9 | ---------------------------------------- ------------------------------ |
10 | _small_table_threshold 101527 |
Let’s start the main test. Parameter db_file_multiblock_read_count is set to the default value
1 | [oracle@rico ~]$ sqlplus sh/sh |
3 | SQL*Plus: Release 12.1.0.2.0 Production on Fri Jul 17 10:30:36 2015 |
5 | Copyright (c) 1982, 2014, Oracle. All rights reserved. |
7 | Last Successful login time: Wed Jul 01 2015 11:16:33 +02:00 |
10 | Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production |
11 | With the Partitioning, Automatic Storage Management, OLAP, Advanced Analytics |
12 | and Real Application Testing options |
14 | SQL> sho parameters db_file_multiblock |
17 | ------------------------------------ ----------- ------------------------------ |
18 | db_file_multiblock_read_count integer 128 |
I will identify the server process, correlated with my session
3 | 2 from v$process p, v$session s |
9 | ------------------------ |
Now I will use strace on this server process:
1 | [root@rico ~]# strace -c -p 10430 |
2 | Process 10430 attached - interrupt to quit |
I will gather results for the following queries:
1 | SQL> select count(1) from SALES16M; |
7 | SQL> select count(1) from SALES32M; |
13 | SQL> select count(1) from SALES1M; |
19 | SQL> select count(1) from SALES4M; |
Results for SALES16M
1 | [root@rico ~]# strace -c -p 10430 |
2 | Process 10430 attached - interrupt to quit |
3 | ^CProcess 10430 detached |
4 | % time seconds usecs/call calls errors syscall |
5 | ------ ----------- ----------- --------- --------- ---------------- |
6 | 100.00 0.053000 71 750 pread |
7 | 0.00 0.000000 0 31 read |
8 | 0.00 0.000000 0 2 write |
9 | 0.00 0.000000 0 16 7 open |
10 | 0.00 0.000000 0 9 close |
11 | 0.00 0.000000 0 1 stat |
12 | 0.00 0.000000 0 1 fstat |
13 | 0.00 0.000000 0 28 lseek |
14 | 0.00 0.000000 0 3 mmap |
15 | 0.00 0.000000 0 2 munmap |
17 | 0.00 0.000000 0 1 uname |
18 | 0.00 0.000000 0 10 fcntl |
19 | 0.00 0.000000 0 1 getrlimit |
20 | 0.00 0.000000 0 164 getrusage |
21 | 0.00 0.000000 0 28 times |
22 | 0.00 0.000000 0 1 getuid |
23 | 0.00 0.000000 0 1 io_getevents |
24 | 0.00 0.000000 0 1 io_submit |
25 | ------ ----------- ----------- --------- --------- ---------------- |
26 | 100.00 0.053000 1051 7 total |
Results for SALES32M
1 | [root@rico ~]# strace -c -p 10430 |
2 | Process 10430 attached - interrupt to quit |
3 | ^CProcess 10430 detached |
4 | % time seconds usecs/call calls errors syscall |
5 | ------ ----------- ----------- --------- --------- ---------------- |
6 | 98.16 0.053462 72 747 pread |
7 | 1.84 0.001000 500 2 read |
8 | 0.00 0.000000 0 2 write |
10 | 0.00 0.000000 0 1 close |
11 | 0.00 0.000000 0 1 mmap |
12 | 0.00 0.000000 0 3 fcntl |
13 | 0.00 0.000000 0 1 getrlimit |
14 | 0.00 0.000000 0 164 getrusage |
15 | 0.00 0.000000 0 28 times |
16 | ------ ----------- ----------- --------- --------- ---------------- |
17 | 100.00 0.054462 950 total |
Results for SALES1M
1 | [root@rico ~]# strace -c -p 10430 |
2 | Process 10430 attached - interrupt to quit |
3 | ^CProcess 10430 detached |
4 | % time seconds usecs/call calls errors syscall |
5 | ------ ----------- ----------- --------- --------- ---------------- |
6 | 48.58 0.015000 26 583 io_getevents |
7 | 39.99 0.012349 75 164 pread |
8 | 11.43 0.003528 6 583 io_submit |
10 | 0.00 0.000000 0 2 write |
11 | 0.00 0.000000 0 2 open |
12 | 0.00 0.000000 0 2 close |
13 | 0.00 0.000000 0 1 semctl |
14 | 0.00 0.000000 0 3 fcntl |
15 | 0.00 0.000000 0 1 getrlimit |
16 | 0.00 0.000000 0 164 getrusage |
17 | 0.00 0.000000 0 29 times |
18 | 0.00 0.000000 0 3 semtimedop |
19 | ------ ----------- ----------- --------- --------- ---------------- |
20 | 100.00 0.030877 1540 total |
Results for SALES4M
1 | [root@rico ~]# strace -c -p 10430 |
2 | Process 10430 attached - interrupt to quit |
3 | ^CProcess 10430 detached |
4 | % time seconds usecs/call calls errors syscall |
5 | ------ ----------- ----------- --------- --------- ---------------- |
6 | 95.16 0.043000 65 664 pread |
7 | 2.62 0.001186 14 83 io_submit |
8 | 2.21 0.001000 12 83 io_getevents |
10 | 0.00 0.000000 0 2 write |
11 | 0.00 0.000000 0 21 getrusage |
12 | 0.00 0.000000 0 12 times |
13 | ------ ----------- ----------- --------- --------- ---------------- |
14 | 100.00 0.045186 867 total |
As we can see, the ASYNC IO represented by syscalls: io_submit and io_getevents are mostly used while scanning table located in diskgroup with 1MB AU.
At first I thought that this is correlated with db_file_multiblock_read_count. The default value (128) multiplied by 8k (block size for my database) is exactly 1M. So I made a few more tests with altered values for this parameter. First results where promising – I have managed to get ASYNC IO for all of the tables.
But then I made the final (a little stupid :)) test. After creating a session I have altered value of the db_file_multiblock_read_count to 128. Yes: 128. So the same as default:
1 | [oracle@rico ~]$ sqlplus sh/sh |
3 | SQL*Plus: Release 12.1.0.2.0 Production on Fri Jul 17 10:49:25 2015 |
5 | Copyright (c) 1982, 2014, Oracle. All rights reserved. |
7 | Last Successful login time: Fri Jul 17 2015 10:48:43 +02:00 |
10 | Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production |
11 | With the Partitioning, Automatic Storage Management, OLAP, Advanced Analytics |
12 | and Real Application Testing options |
14 | SQL> sho parameters db_file_multiblock |
17 | ------------------------------------ ----------- ------------------------------ |
18 | db_file_multiblock_read_count integer 128 |
19 | SQL> alter session set db_file_multiblock_read_count=128; |
Let’s check the results now:
SALES16M
1 | [root@rico ~]# strace -c -p 11208 |
2 | Process 11208 attached - interrupt to quit |
3 | ^CProcess 11208 detached |
4 | % time seconds usecs/call calls errors syscall |
5 | ------ ----------- ----------- --------- --------- ---------------- |
6 | 53.96 0.007032 9 747 io_submit |
7 | 46.04 0.006000 8 747 io_getevents |
8 | 0.00 0.000000 0 31 read |
9 | 0.00 0.000000 0 2 write |
10 | 0.00 0.000000 0 16 7 open |
11 | 0.00 0.000000 0 9 close |
12 | 0.00 0.000000 0 1 stat |
13 | 0.00 0.000000 0 1 fstat |
14 | 0.00 0.000000 0 28 lseek |
15 | 0.00 0.000000 0 8 mmap |
16 | 0.00 0.000000 0 1 munmap |
18 | 0.00 0.000000 0 4 pread |
19 | 0.00 0.000000 0 1 uname |
20 | 0.00 0.000000 0 10 fcntl |
21 | 0.00 0.000000 0 1 getrlimit |
22 | 0.00 0.000000 0 164 getrusage |
23 | 0.00 0.000000 0 34 times |
24 | 0.00 0.000000 0 1 getuid |
25 | ------ ----------- ----------- --------- --------- ---------------- |
26 | 100.00 0.013032 1807 7 total |
SALES32M
1 | [root@rico ~]# strace -c -p 11208 |
2 | Process 11208 attached - interrupt to quit |
3 | ^CProcess 11208 detached |
4 | % time seconds usecs/call calls errors syscall |
5 | ------ ----------- ----------- --------- --------- ---------------- |
6 | 54.18 0.008278 11 746 io_submit |
7 | 45.82 0.007000 9 746 io_getevents |
9 | 0.00 0.000000 0 2 write |
10 | 0.00 0.000000 0 1 open |
11 | 0.00 0.000000 0 1 close |
12 | 0.00 0.000000 0 1 pread |
13 | 0.00 0.000000 0 3 fcntl |
14 | 0.00 0.000000 0 1 getrlimit |
15 | 0.00 0.000000 0 164 getrusage |
16 | 0.00 0.000000 0 28 times |
17 | ------ ----------- ----------- --------- --------- ---------------- |
18 | 100.00 0.015278 1695 total |
SALES1M
1 | [root@rico ~]# strace -c -p 11208 |
2 | Process 11208 attached - interrupt to quit |
3 | ^CProcess 11208 detached |
4 | % time seconds usecs/call calls errors syscall |
5 | ------ ----------- ----------- --------- --------- ---------------- |
6 | 65.36 0.010000 13 749 io_getevents |
7 | 34.64 0.005300 7 746 io_submit |
9 | 0.00 0.000000 0 2 write |
10 | 0.00 0.000000 0 2 open |
11 | 0.00 0.000000 0 2 close |
12 | 0.00 0.000000 0 1 pread |
13 | 0.00 0.000000 0 1 semctl |
14 | 0.00 0.000000 0 3 fcntl |
15 | 0.00 0.000000 0 1 getrlimit |
16 | 0.00 0.000000 0 164 getrusage |
17 | 0.00 0.000000 0 28 times |
18 | 0.00 0.000000 0 2 semtimedop |
19 | ------ ----------- ----------- --------- --------- ---------------- |
20 | 100.00 0.015300 1704 total |
SALES4M
1 | [root@rico ~]# strace -c -p 11208 |
2 | Process 11208 attached - interrupt to quit |
3 | ^CProcess 11208 detached |
4 | % time seconds usecs/call calls errors syscall |
5 | ------ ----------- ----------- --------- --------- ---------------- |
6 | 50.10 0.006000 8 746 io_getevents |
7 | 49.90 0.005975 8 746 io_submit |
9 | 0.00 0.000000 0 2 write |
10 | 0.00 0.000000 0 1 pread |
11 | 0.00 0.000000 0 21 getrusage |
12 | 0.00 0.000000 0 12 times |
13 | ------ ----------- ----------- --------- --------- ---------------- |
14 | 100.00 0.011975 1530 total |
So EVERYTHING is being read with ASYNC IO!!! And this not the craziest thing – the other sessions are also using ASYNC IO for those tables now. Without altering the parameter. I have tried to flush buffer_cache and shared_pool, but with no success. The situation goes back to square one after restarting the instance.
I occurs that it takes only one session to set the parameter to any value – even the same value as it already was by default. And when the first query will use db file scattered read in async mode – all the other sessions, querying tables on ASM will also use io_getevents and io_submit instead of pread. (!!!)
1 | [oracle@rico ~]$ sqlplus sh/sh |
3 | SQL*Plus: Release 12.1.0.2.0 Production on Fri Jul 17 10:59:22 2015 |
5 | Copyright (c) 1982, 2014, Oracle. All rights reserved. |
7 | Last Successful login time: Fri Jul 17 2015 10:56:53 +02:00 |
10 | Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production |
11 | With the Partitioning, Automatic Storage Management, OLAP, Advanced Analytics |
12 | and Real Application Testing options |
14 | SQL> alter session set db_file_multiblock_read_count=128; |
18 | SQL> select count(1) from sales16m; |
24 | SQL> Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production |
25 | With the Partitioning, Automatic Storage Management, OLAP, Advanced Analytics |
26 | and Real Application Testing options |
27 | [oracle@rico ~]$ sqlplus sh/sh |
29 | SQL*Plus: Release 12.1.0.2.0 Production on Fri Jul 17 10:59:51 2015 |
31 | Copyright (c) 1982, 2014, Oracle. All rights reserved. |
33 | Last Successful login time: Fri Jul 17 2015 10:59:22 +02:00 |
36 | Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production |
37 | With the Partitioning, Automatic Storage Management, OLAP, Advanced Analytics |
38 | and Real Application Testing options |
40 | SQL> select count(1) from SALES4M; |
46 | SQL> Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production |
47 | With the Partitioning, Automatic Storage Management, OLAP, Advanced Analytics |
48 | and Real Application Testing options |
49 | [oracle@rico ~]$ sqlplus sh/sh |
51 | SQL*Plus: Release 12.1.0.2.0 Production on Fri Jul 17 11:00:43 2015 |
53 | Copyright (c) 1982, 2014, Oracle. All rights reserved. |
55 | Last Successful login time: Fri Jul 17 2015 10:59:51 +02:00 |
58 | Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production |
59 | With the Partitioning, Automatic Storage Management, OLAP, Advanced Analytics |
60 | and Real Application Testing options |
62 | SQL> select count(1) from sales32m; |
The appropriate strace results are:
1 | [root@rico ~]# strace -c -p 11782 |
2 | Process 11782 attached - interrupt to quit |
3 | ^CProcess 11782 detached |
4 | % time seconds usecs/call calls errors syscall |
5 | ------ ----------- ----------- --------- --------- ---------------- |
6 | 53.97 0.006000 8 747 io_getevents |
7 | 45.60 0.005069 7 747 io_submit |
8 | 0.43 0.000048 1 34 times |
9 | 0.00 0.000000 0 31 read |
10 | 0.00 0.000000 0 2 write |
11 | 0.00 0.000000 0 16 7 open |
12 | 0.00 0.000000 0 9 close |
13 | 0.00 0.000000 0 1 stat |
14 | 0.00 0.000000 0 1 fstat |
15 | 0.00 0.000000 0 28 lseek |
16 | 0.00 0.000000 0 7 mmap |
17 | 0.00 0.000000 0 1 munmap |
19 | 0.00 0.000000 0 4 pread |
20 | 0.00 0.000000 0 1 uname |
21 | 0.00 0.000000 0 10 fcntl |
22 | 0.00 0.000000 0 1 getrlimit |
23 | 0.00 0.000000 0 164 getrusage |
24 | 0.00 0.000000 0 1 getuid |
25 | ------ ----------- ----------- --------- --------- ---------------- |
26 | 100.00 0.011117 1806 7 total |
27 | [root@rico ~]# strace -c -p 11793 |
28 | Process 11793 attached - interrupt to quit |
29 | ^CProcess 11793 detached |
30 | % time seconds usecs/call calls errors syscall |
31 | ------ ----------- ----------- --------- --------- ---------------- |
32 | 51.71 0.007000 10 731 io_getevents |
33 | 48.29 0.006536 9 731 io_submit |
34 | 0.00 0.000000 0 32 read |
35 | 0.00 0.000000 0 2 write |
36 | 0.00 0.000000 0 17 7 open |
37 | 0.00 0.000000 0 10 close |
38 | 0.00 0.000000 0 1 stat |
39 | 0.00 0.000000 0 1 fstat |
40 | 0.00 0.000000 0 28 lseek |
41 | 0.00 0.000000 0 9 mmap |
42 | 0.00 0.000000 0 1 munmap |
44 | 0.00 0.000000 0 17 pread |
45 | 0.00 0.000000 0 1 uname |
46 | 0.00 0.000000 0 1 semctl |
47 | 0.00 0.000000 0 10 fcntl |
48 | 0.00 0.000000 0 1 getrlimit |
49 | 0.00 0.000000 0 209 getrusage |
50 | 0.00 0.000000 0 50 times |
51 | 0.00 0.000000 0 1 getuid |
52 | 0.00 0.000000 0 2 semtimedop |
53 | ------ ----------- ----------- --------- --------- ---------------- |
54 | 100.00 0.013536 1856 7 total |
55 | [root@rico ~]# strace -c -p 11823 |
56 | Process 11823 attached - interrupt to quit |
57 | ^CProcess 11823 detached |
58 | % time seconds usecs/call calls errors syscall |
59 | ------ ----------- ----------- --------- --------- ---------------- |
60 | 51.65 0.005342 7 731 io_submit |
61 | 48.35 0.005000 7 731 io_getevents |
62 | 0.00 0.000000 0 31 read |
63 | 0.00 0.000000 0 2 write |
64 | 0.00 0.000000 0 16 7 open |
65 | 0.00 0.000000 0 9 close |
66 | 0.00 0.000000 0 1 stat |
67 | 0.00 0.000000 0 1 fstat |
68 | 0.00 0.000000 0 28 lseek |
69 | 0.00 0.000000 0 9 mmap |
70 | 0.00 0.000000 0 1 munmap |
72 | 0.00 0.000000 0 17 pread |
73 | 0.00 0.000000 0 1 uname |
74 | 0.00 0.000000 0 10 fcntl |
75 | 0.00 0.000000 0 1 getrlimit |
76 | 0.00 0.000000 0 209 getrusage |
77 | 0.00 0.000000 0 50 times |
78 | 0.00 0.000000 0 1 getuid |
79 | ------ ----------- ----------- --------- --------- ---------------- |
80 | 100.00 0.010342 1850 7 total |
Results from perf seams to confirm this strange behaviour.
Before setting the parameter:
1 | [root@rico ~]# perf record -p 13521 |
2 | ^C[ perf record: Woken up 1 times to write data ] |
3 | [ perf record: Captured and wrote 0.061 MB perf.data (~2644 samples) ] |
5 | [root@rico ~]# perf report | grep aio |
6 | 0.06% oracle_13521_or [kernel.kallsyms] [k] do_aio_read |
7 | [root@rico ~]# perf report | grep pread |
8 | 0.06% oracle_13521_or libpthread-2.12.so [.] __pread_nocancel |
After setting the parameter:
1 | [root@rico ~]# perf record -p 13573 |
2 | ^C[ perf record: Woken up 1 times to write data ] |
3 | [ perf record: Captured and wrote 0.061 MB perf.data (~2680 samples) ] |
5 | [root@rico ~]# perf report | grep pread |
6 | [root@rico ~]# perf report | grep aio |
7 | 0.18% oracle_13573_or libaio.so.1.0.1 [.] 0x0000000000000615 |
8 | 0.06% oracle_13573_or oracle [.] ksfdaio |
9 | 0.06% oracle_13573_or [kernel.kallsyms] [k] aio_put_req |
10 | 0.06% oracle_13573_or [kernel.kallsyms] [k] aio_run_iocb |
I made tests on: Oracle 12.1.0.2 EE (OEL 6.6) and Oracle 11.2.0.3 EE (OEL 6.3)
So the conclusion is: WTF?!