db file scattered read – MBRC and ASYNC IO on linux


17.07.2015
by Kamil Stawiarski

Recently one of my students asked me if there is any correlation between the AU size in ASM diskgroup and parameter db_file_multiblock_read_count. I made some tests but results are very strange.

I have 4 same tables, located in 4 different diskgroups:

  1  select t.table_name,  g.name, t.blocks, g.ALLOCATION_UNIT_SIZE
  2  from dba_tables t, dba_data_files d, v$asm_diskgroup g
  3  where t.tablespace_name=d.tablespace_name
  4  and   g.name=substr(d.file_name,2,instr(d.file_name,'/')-2)
  5  and  t.owner='SH'
  6* and  t.table_name like 'SALES%'
SQL> /

TABLE_NAME NAME 			      BLOCKS ALLOCATION_UNIT_SIZE
---------- ------------------------------ ---------- --------------------
SALES16M   DATA16M			       93551		 16777216
SALES32M   DATA32M			       93551		 33554432
SALES1M    DATA1M			       93551		  1048576
SALES4M    DATA4M			       93551		  4194304

All tables are smaller then a hidden parameter "_small_table_threshold" so table access full on all tables will be using db file scattered read.

SQL> ;
  1  select p.ksppinm, v.ksppstvl
  2  from x$ksppi p, x$ksppcv v
  3  where p.indx=v.indx
  4* and p.ksppinm='_small_table_threshold'
SQL> /

KSPPINM 				 KSPPSTVL
---------------------------------------- ------------------------------
_small_table_threshold			 101527

Let’s start the main test. Parameter db_file_multiblock_read_count is set to the default value

[oracle@rico ~]$ sqlplus sh/sh

SQL*Plus: Release 12.1.0.2.0 Production on Fri Jul 17 10:30:36 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Wed Jul 01 2015 11:16:33 +02:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Advanced Analytics
and Real Application Testing options

SQL> sho parameters db_file_multiblock

NAME				     TYPE	 VALUE
------------------------------------ ----------- ------------------------------
db_file_multiblock_read_count	     integer	 128
SQL> 

I will identify the server process, correlated with my session


SQL> ;
  1  select spid
  2  from v$process p, v$session s
  3  where p.addr=s.paddr
  4* and s.username='SH'
SQL> /

SPID
------------------------
10430

Now I will use strace on this server process:

[root@rico ~]# strace -c -p 10430
Process 10430 attached - interrupt to quit

I will gather results for the following queries:

SQL> select count(1) from SALES16M;

  COUNT(1)
----------
  19295703

SQL> select count(1) from SALES32M;

  COUNT(1)
----------
  19295703

SQL> select count(1) from SALES1M;

  COUNT(1)
----------
  19295703

SQL> select count(1) from SALES4M;

  COUNT(1)
----------
  19295703

Results for SALES16M

[root@rico ~]# strace -c -p 10430
Process 10430 attached - interrupt to quit
^CProcess 10430 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.053000          71       750           pread
  0.00    0.000000           0        31           read
  0.00    0.000000           0         2           write
  0.00    0.000000           0        16         7 open
  0.00    0.000000           0         9           close
  0.00    0.000000           0         1           stat
  0.00    0.000000           0         1           fstat
  0.00    0.000000           0        28           lseek
  0.00    0.000000           0         3           mmap
  0.00    0.000000           0         2           munmap
  0.00    0.000000           0         1           brk
  0.00    0.000000           0         1           uname
  0.00    0.000000           0        10           fcntl
  0.00    0.000000           0         1           getrlimit
  0.00    0.000000           0       164           getrusage
  0.00    0.000000           0        28           times
  0.00    0.000000           0         1           getuid
  0.00    0.000000           0         1           io_getevents
  0.00    0.000000           0         1           io_submit
------ ----------- ----------- --------- --------- ----------------
100.00    0.053000                  1051         7 total

Results for SALES32M

[root@rico ~]# strace -c -p 10430
Process 10430 attached - interrupt to quit
^CProcess 10430 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 98.16    0.053462          72       747           pread
  1.84    0.001000         500         2           read
  0.00    0.000000           0         2           write
  0.00    0.000000           0         1           open
  0.00    0.000000           0         1           close
  0.00    0.000000           0         1           mmap
  0.00    0.000000           0         3           fcntl
  0.00    0.000000           0         1           getrlimit
  0.00    0.000000           0       164           getrusage
  0.00    0.000000           0        28           times
------ ----------- ----------- --------- --------- ----------------
100.00    0.054462                   950           total

Results for SALES1M

[root@rico ~]# strace -c -p 10430
Process 10430 attached - interrupt to quit
^CProcess 10430 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 48.58    0.015000          26       583           io_getevents
 39.99    0.012349          75       164           pread
 11.43    0.003528           6       583           io_submit
  0.00    0.000000           0         3           read
  0.00    0.000000           0         2           write
  0.00    0.000000           0         2           open
  0.00    0.000000           0         2           close
  0.00    0.000000           0         1           semctl
  0.00    0.000000           0         3           fcntl
  0.00    0.000000           0         1           getrlimit
  0.00    0.000000           0       164           getrusage
  0.00    0.000000           0        29           times
  0.00    0.000000           0         3           semtimedop
------ ----------- ----------- --------- --------- ----------------
100.00    0.030877                  1540           total

Results for SALES4M

[root@rico ~]# strace -c -p 10430
Process 10430 attached - interrupt to quit
^CProcess 10430 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 95.16    0.043000          65       664           pread
  2.62    0.001186          14        83           io_submit
  2.21    0.001000          12        83           io_getevents
  0.00    0.000000           0         2           read
  0.00    0.000000           0         2           write
  0.00    0.000000           0        21           getrusage
  0.00    0.000000           0        12           times
------ ----------- ----------- --------- --------- ----------------
100.00    0.045186                   867           total

As we can see, the ASYNC IO represented by syscalls: io_submit and io_getevents are mostly used while scanning table located in diskgroup with 1MB AU.

At first I thought that this is correlated with db_file_multiblock_read_count. The default value (128) multiplied by 8k (block size for my database) is exactly 1M. So I made a few more tests with altered values for this parameter. First results where promising – I have managed to get ASYNC IO for all of the tables.

But then I made the final (a little stupid :)) test. After creating a session I have altered value of the db_file_multiblock_read_count to 128. Yes: 128. So the same as default:

[oracle@rico ~]$ sqlplus sh/sh

SQL*Plus: Release 12.1.0.2.0 Production on Fri Jul 17 10:49:25 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Fri Jul 17 2015 10:48:43 +02:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Advanced Analytics
and Real Application Testing options

SQL> sho parameters db_file_multiblock

NAME				     TYPE	 VALUE
------------------------------------ ----------- ------------------------------
db_file_multiblock_read_count	     integer	 128
SQL> alter session set db_file_multiblock_read_count=128;

Session altered.

Let’s check the results now:

SALES16M

[root@rico ~]# strace -c -p 11208
Process 11208 attached - interrupt to quit
^CProcess 11208 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 53.96    0.007032           9       747           io_submit
 46.04    0.006000           8       747           io_getevents
  0.00    0.000000           0        31           read
  0.00    0.000000           0         2           write
  0.00    0.000000           0        16         7 open
  0.00    0.000000           0         9           close
  0.00    0.000000           0         1           stat
  0.00    0.000000           0         1           fstat
  0.00    0.000000           0        28           lseek
  0.00    0.000000           0         8           mmap
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0         1           brk
  0.00    0.000000           0         4           pread
  0.00    0.000000           0         1           uname
  0.00    0.000000           0        10           fcntl
  0.00    0.000000           0         1           getrlimit
  0.00    0.000000           0       164           getrusage
  0.00    0.000000           0        34           times
  0.00    0.000000           0         1           getuid
------ ----------- ----------- --------- --------- ----------------
100.00    0.013032                  1807         7 total

SALES32M

[root@rico ~]# strace -c -p 11208
Process 11208 attached - interrupt to quit
^CProcess 11208 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 54.18    0.008278          11       746           io_submit
 45.82    0.007000           9       746           io_getevents
  0.00    0.000000           0         2           read
  0.00    0.000000           0         2           write
  0.00    0.000000           0         1           open
  0.00    0.000000           0         1           close
  0.00    0.000000           0         1           pread
  0.00    0.000000           0         3           fcntl
  0.00    0.000000           0         1           getrlimit
  0.00    0.000000           0       164           getrusage
  0.00    0.000000           0        28           times
------ ----------- ----------- --------- --------- ----------------
100.00    0.015278                  1695           total

SALES1M

[root@rico ~]# strace -c -p 11208
Process 11208 attached - interrupt to quit
^CProcess 11208 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 65.36    0.010000          13       749           io_getevents
 34.64    0.005300           7       746           io_submit
  0.00    0.000000           0         3           read
  0.00    0.000000           0         2           write
  0.00    0.000000           0         2           open
  0.00    0.000000           0         2           close
  0.00    0.000000           0         1           pread
  0.00    0.000000           0         1           semctl
  0.00    0.000000           0         3           fcntl
  0.00    0.000000           0         1           getrlimit
  0.00    0.000000           0       164           getrusage
  0.00    0.000000           0        28           times
  0.00    0.000000           0         2           semtimedop
------ ----------- ----------- --------- --------- ----------------
100.00    0.015300                  1704           total

SALES4M

[root@rico ~]# strace -c -p 11208
Process 11208 attached - interrupt to quit
^CProcess 11208 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 50.10    0.006000           8       746           io_getevents
 49.90    0.005975           8       746           io_submit
  0.00    0.000000           0         2           read
  0.00    0.000000           0         2           write
  0.00    0.000000           0         1           pread
  0.00    0.000000           0        21           getrusage
  0.00    0.000000           0        12           times
------ ----------- ----------- --------- --------- ----------------
100.00    0.011975                  1530           total

So EVERYTHING is being read with ASYNC IO!!! And this not the craziest thing – the other sessions are also using ASYNC IO for those tables now. Without altering the parameter. I have tried to flush buffer_cache and shared_pool, but with no success. The situation goes back to square one after restarting the instance.

I occurs that it takes only one session to set the parameter to any value – even the same value as it already was by default. And when the first query will use db file scattered read in async mode – all the other sessions, querying tables on ASM will also use io_getevents and io_submit instead of pread. (!!!)

[oracle@rico ~]$ sqlplus sh/sh

SQL*Plus: Release 12.1.0.2.0 Production on Fri Jul 17 10:59:22 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Fri Jul 17 2015 10:56:53 +02:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Advanced Analytics
and Real Application Testing options

SQL> alter session set db_file_multiblock_read_count=128;

Session altered.

SQL> select count(1) from sales16m;

  COUNT(1)
----------
  19295703

SQL> Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Advanced Analytics
and Real Application Testing options
[oracle@rico ~]$ sqlplus sh/sh

SQL*Plus: Release 12.1.0.2.0 Production on Fri Jul 17 10:59:51 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Fri Jul 17 2015 10:59:22 +02:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Advanced Analytics
and Real Application Testing options

SQL> select count(1) from SALES4M;

  COUNT(1)
----------
  19295703

SQL> Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Advanced Analytics
and Real Application Testing options
[oracle@rico ~]$ sqlplus sh/sh

SQL*Plus: Release 12.1.0.2.0 Production on Fri Jul 17 11:00:43 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Fri Jul 17 2015 10:59:51 +02:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Advanced Analytics
and Real Application Testing options

SQL> select count(1) from sales32m;

  COUNT(1)
----------
  19295703

The appropriate strace results are:

[root@rico ~]# strace -c -p 11782
Process 11782 attached - interrupt to quit
^CProcess 11782 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 53.97    0.006000           8       747           io_getevents
 45.60    0.005069           7       747           io_submit
  0.43    0.000048           1        34           times
  0.00    0.000000           0        31           read
  0.00    0.000000           0         2           write
  0.00    0.000000           0        16         7 open
  0.00    0.000000           0         9           close
  0.00    0.000000           0         1           stat
  0.00    0.000000           0         1           fstat
  0.00    0.000000           0        28           lseek
  0.00    0.000000           0         7           mmap
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0         1           brk
  0.00    0.000000           0         4           pread
  0.00    0.000000           0         1           uname
  0.00    0.000000           0        10           fcntl
  0.00    0.000000           0         1           getrlimit
  0.00    0.000000           0       164           getrusage
  0.00    0.000000           0         1           getuid
------ ----------- ----------- --------- --------- ----------------
100.00    0.011117                  1806         7 total
[root@rico ~]# strace -c -p 11793
Process 11793 attached - interrupt to quit
^CProcess 11793 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 51.71    0.007000          10       731           io_getevents
 48.29    0.006536           9       731           io_submit
  0.00    0.000000           0        32           read
  0.00    0.000000           0         2           write
  0.00    0.000000           0        17         7 open
  0.00    0.000000           0        10           close
  0.00    0.000000           0         1           stat
  0.00    0.000000           0         1           fstat
  0.00    0.000000           0        28           lseek
  0.00    0.000000           0         9           mmap
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0         1           brk
  0.00    0.000000           0        17           pread
  0.00    0.000000           0         1           uname
  0.00    0.000000           0         1           semctl
  0.00    0.000000           0        10           fcntl
  0.00    0.000000           0         1           getrlimit
  0.00    0.000000           0       209           getrusage
  0.00    0.000000           0        50           times
  0.00    0.000000           0         1           getuid
  0.00    0.000000           0         2           semtimedop
------ ----------- ----------- --------- --------- ----------------
100.00    0.013536                  1856         7 total
[root@rico ~]# strace -c -p 11823
Process 11823 attached - interrupt to quit
^CProcess 11823 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 51.65    0.005342           7       731           io_submit
 48.35    0.005000           7       731           io_getevents
  0.00    0.000000           0        31           read
  0.00    0.000000           0         2           write
  0.00    0.000000           0        16         7 open
  0.00    0.000000           0         9           close
  0.00    0.000000           0         1           stat
  0.00    0.000000           0         1           fstat
  0.00    0.000000           0        28           lseek
  0.00    0.000000           0         9           mmap
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0         1           brk
  0.00    0.000000           0        17           pread
  0.00    0.000000           0         1           uname
  0.00    0.000000           0        10           fcntl
  0.00    0.000000           0         1           getrlimit
  0.00    0.000000           0       209           getrusage
  0.00    0.000000           0        50           times
  0.00    0.000000           0         1           getuid
------ ----------- ----------- --------- --------- ----------------
100.00    0.010342                  1850         7 total

Results from perf seams to confirm this strange behaviour.

Before setting the parameter:

[root@rico ~]# perf record -p 13521
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.061 MB perf.data (~2644 samples) ]

[root@rico ~]# perf report | grep aio
     0.06%  oracle_13521_or  [kernel.kallsyms]   [k] do_aio_read                     
[root@rico ~]# perf report | grep pread
     0.06%  oracle_13521_or  libpthread-2.12.so  [.] __pread_nocancel  

After setting the parameter:

[root@rico ~]# perf record -p 13573
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.061 MB perf.data (~2680 samples) ]

[root@rico ~]# perf report | grep pread
[root@rico ~]# perf report | grep aio
     0.18%  oracle_13573_or  libaio.so.1.0.1     [.] 0x0000000000000615              
     0.06%  oracle_13573_or  oracle              [.] ksfdaio                         
     0.06%  oracle_13573_or  [kernel.kallsyms]   [k] aio_put_req                     
     0.06%  oracle_13573_or  [kernel.kallsyms]   [k] aio_run_iocb        

I made tests on: Oracle 12.1.0.2 EE (OEL 6.6) and Oracle 11.2.0.3 EE (OEL 6.3)

So the conclusion is: WTF?!


Contact us

Database Whisperers sp. z o. o. sp. k.
al. Jerozolimskie 200, 3rd floor, room 342
02-486 Warszawa
NIP: 5272744987
REGON:362524978
+48 508 943 051
+48 661 966 009
info@ora-600.pl

Newsletter Sign up to be updated