Oracle SPARC M7 – tracing DAX with DTrace and busstat


27.01.2016
by Kamil Stawiarski

Thanks to Oracle I had a possibility to test the new Sparc M7 with DAX coprocessors to boost In-Memory performance.
You can read about it here and here

My first thought was – how to check if and when the DAX coprocessors are being used?

When you have a POC for Exadata, you want to check wait events for CELL SMART SCAN, to be sure that query offloading occurred.
Unfortunately there are no DAX-specific wait events, that you can check at the database level – so how will we know that DAX coprocessors are working? Should we just believe that queries are being offloaded? Well… there’s always DTrace 😉

It’s time for testing! I have three tables, loaded into In-Memory buffer:

SQL> ;
  1  select INMEMORY_SIZE/1024/1024 as im_size, bytes/1024/1024 as p_size, segment_name
  2  from v$im_segments
  3  where segment_name in ('LINEORDER','CUSTOMER','SUPPLIER')
  4* and   owner='SSB'
SQL> /

   IM_SIZE     P_SIZE SEGMENT_NAME
---------- ---------- ------------------------------
  412.1875 1148.03906 CUSTOMER
  63580.75 202806.672 LINEORDER
   31.1875  70.484375 SUPPLIER

Let’s start with something simple – what modules/libraries are used, when querying a table stored in memory?
I will use this DTrace script

#!/usr/sbin/dtrace -s
pid$1:::entry
{
        @[probemod]=count();
}

to trace the server process, which will execute this query:

SQL> ;
  1  select /*+ NOPARALLEL */ sum(LO_QUANTITY)
  2  from lineorder
  3  where lo_tax between 1 and 10
  4* and LO_SHIPMODE like '%MAIL%'

and the results look like this:

root@m7:~/ks# ./modules.d 18419
dtrace: script './modules.d' matched 229096 probes
^C

  libm.so.2                                                        16
  LM1`ld.so.1                                                     846
  libdax_query.so.1                                             53925
  libshpkm712.so                                                66563
  libdax.so.1                                                  146204
  libc.so.1                                                    511366
  oracle                                                      4763677

WOW – it’s great! We can see that Oracle is using two DAX-specific libraries – libdax.so.1 and libdax_query.so.1.
Those libraries can be found in this path: /usr/lib/sparcv9/
After closer examination we also find out that libdax_query.so.1 contains functions for offloading OZIP operations – that’s why DAX coprocessors are most effective with MEMCOMPRESS FOR QUERY HIGH algorithm.

root@m7:~/ks# nm /usr/lib/sparcv9/libdax_query.so.1 | grep ozip
[202]   |               25220|                 472|FUNC |LOCL |0    |15     |check_ozip_hdr
[224]   |               40688|                 660|FUNC |LOCL |0    |15     |extract_ozip
[221]   |               40088|                 600|FUNC |LOCL |0    |15     |unpack_ozip

Another great finding was that the fbt provider, provides functions for tracing dax at system level.

root@m7:~/ks# dtrace -m fbt:dax -l
   ID   PROVIDER            MODULE                          FUNCTION NAME
78803        fbt               dax                    dax_hma_unload entry
78804        fbt               dax                    dax_hma_unload return
78805        fbt               dax             dax_hma_unload_finish entry
78806        fbt               dax             dax_hma_unload_finish return
78807        fbt               dax                      dax_hma_exit entry
78808        fbt               dax                      dax_hma_exit return
78809        fbt               dax                          dax_open entry
78810        fbt               dax                          dax_open return
78811        fbt               dax                         dax_close entry
78812        fbt               dax                         dax_close return
78813        fbt               dax                         dax_ioctl entry
78814        fbt               dax                         dax_ioctl return
78815        fbt               dax                        dax_devmap entry
78816        fbt               dax                        dax_devmap return
78817        fbt               dax                       dax_getinfo entry
78818        fbt               dax                       dax_getinfo return
78819        fbt               dax                        dax_attach entry
78820        fbt               dax                        dax_attach return
78821        fbt               dax                        dax_detach entry
78822        fbt               dax                        dax_detach return
78823        fbt               dax                    dax_devmap_map entry
78824        fbt               dax                    dax_devmap_map return
78825        fbt               dax                 dax_devmap_access entry
78826        fbt               dax                 dax_devmap_access return
78827        fbt               dax                    dax_devmap_dup entry
78828        fbt               dax                    dax_devmap_dup return
78829        fbt               dax                  dax_devmap_unmap entry
78830        fbt               dax                  dax_devmap_unmap return
78831        fbt               dax                   dax_minor_alloc entry
78832        fbt               dax                   dax_minor_alloc return
78833        fbt               dax                    dax_minor_free entry
78834        fbt               dax                    dax_minor_free return
78835        fbt               dax                     dax_minor_get entry
78836        fbt               dax                     dax_minor_get return
78837        fbt               dax                    dax_minor_rele entry
78838        fbt               dax                    dax_minor_rele return
78839        fbt               dax                    dax_minor_init entry
78840        fbt               dax                    dax_minor_init return
78841        fbt               dax                 dax_hsvc_register entry
78842        fbt               dax                 dax_hsvc_register return
78843        fbt               dax               dax_hsvc_unregister entry
78844        fbt               dax               dax_hsvc_unregister return
78845        fbt               dax                dax_tsd_destructor entry
78846        fbt               dax                dax_tsd_destructor return
78847        fbt               dax         dax_set_suspend_callbacks entry
78848        fbt               dax         dax_set_suspend_callbacks return
78849        fbt               dax       dax_unset_suspend_callbacks entry
78850        fbt               dax       dax_unset_suspend_callbacks return
78851        fbt               dax                       dax_hv_init entry
78852        fbt               dax                       dax_hv_init return
78853        fbt               dax                      dax_hma_init entry
78854        fbt               dax                      dax_hma_init return
78855        fbt               dax                      dax_hma_fini entry
78856        fbt               dax                      dax_hma_fini return
78857        fbt               dax                  dax_state_create entry
78858        fbt               dax                  dax_state_create return
78859        fbt               dax       dax_sfmmu_hash_remove_state entry
78860        fbt               dax       dax_sfmmu_hash_remove_state return
78861        fbt               dax                 dax_state_destroy entry
78862        fbt               dax                 dax_state_destroy return
78863        fbt               dax            dax_ioctl_ccb_thr_init entry
78864        fbt               dax            dax_ioctl_ccb_thr_init return
78865        fbt               dax            dax_ioctl_ccb_thr_fini entry
78866        fbt               dax            dax_ioctl_ccb_thr_fini return
78867        fbt               dax              dax_ioctl_ca_dequeue entry
78868        fbt               dax              dax_ioctl_ca_dequeue return
78869        fbt               dax                      dax_ctx_fini entry
78870        fbt               dax                      dax_ctx_fini return
78871        fbt               dax                     dax_ctx_alloc entry
78872        fbt               dax                     dax_ctx_alloc return
78873        fbt               dax              dax_state_add_thread entry
78874        fbt               dax              dax_state_add_thread return
78875        fbt               dax               dax_state_add_sfmmu entry
78876        fbt               dax               dax_state_add_sfmmu return
78877        fbt               dax          dax_sfmmu_hash_add_state entry
78878        fbt               dax          dax_sfmmu_hash_add_state return
78879        fbt               dax                    dax_hma_enable entry
78880        fbt               dax                    dax_hma_enable return
78881        fbt               dax           dax_state_remove_thread entry
78882        fbt               dax           dax_state_remove_thread return
78883        fbt               dax       dax_ctx_flush_decommit_ccbs entry
78884        fbt               dax       dax_ctx_flush_decommit_ccbs return
78885        fbt               dax      dax_validate_ca_dequeue_args entry
78886        fbt               dax      dax_validate_ca_dequeue_args return
78887        fbt               dax   dax_ccb_buffer_get_contig_block entry
78888        fbt               dax   dax_ccb_buffer_get_contig_block return
78889        fbt               dax           dax_ccb_buffer_decommit entry
78890        fbt               dax           dax_ccb_buffer_decommit return
78891        fbt               dax                      dax_ccb_info entry
78892        fbt               dax                      dax_ccb_info return
78893        fbt               dax                      dax_ccb_kill entry
78894        fbt               dax                      dax_ccb_kill return
78895        fbt               dax                dax_ctx_flush_ccbs entry
78896        fbt               dax                dax_ctx_flush_ccbs return
78897        fbt               dax              dax_ccb_flush_contig entry
78898        fbt               dax              dax_ccb_flush_contig return
78899        fbt               dax                      dax_ccb_wait entry
78900        fbt               dax                      dax_ccb_wait return
78901        fbt               dax         dax_sfmmu_hash_ent_create entry
78902        fbt               dax         dax_sfmmu_hash_ent_create return
78903        fbt               dax      dax_sfmmu_hash_ent_add_state entry
78904        fbt               dax      dax_sfmmu_hash_ent_add_state return
78905        fbt               dax   dax_sfmmu_hash_ent_remove_state entry
78906        fbt               dax   dax_sfmmu_hash_ent_remove_state return
78907        fbt               dax       dax_sfmmu_hash_remove_sfmmu entry
78908        fbt               dax       dax_sfmmu_hash_remove_sfmmu return
78909        fbt               dax            dax_state_remove_sfmmu entry
78910        fbt               dax            dax_state_remove_sfmmu return
78911        fbt               dax        dax_sfmmu_hash_ent_destroy entry
78912        fbt               dax        dax_sfmmu_hash_ent_destroy return
78913        fbt               dax      dax_sfmmu_wait_matching_ccbs entry
78914        fbt               dax      dax_sfmmu_wait_matching_ccbs return
78915        fbt               dax               dax_thr_search_ccbs entry
78916        fbt               dax               dax_thr_search_ccbs return
78917        fbt               dax                  dax_hma_ccb_wait entry
78918        fbt               dax                  dax_hma_ccb_wait return
78919        fbt               dax                    dax_ccb_search entry
78920        fbt               dax                    dax_ccb_search return
78921        fbt               dax             dax_ccb_search_contig entry
78922        fbt               dax             dax_ccb_search_contig return
78923        fbt               dax               dax_ccb_contains_va entry
78924        fbt               dax               dax_ccb_contains_va return
78925        fbt               dax                dax_hma_save_state entry
78926        fbt               dax                dax_hma_save_state return
78927        fbt               dax                 dax_ccb_flush_all entry
78928        fbt               dax                 dax_ccb_flush_all return
78929        fbt               dax             dax_sfmmu_hash_walker entry
78930        fbt               dax             dax_sfmmu_hash_walker return
78931        fbt               dax              dax_sfmmu_flush_ccbs entry
78932        fbt               dax              dax_sfmmu_flush_ccbs return
78933        fbt               dax               dax_error_decode_cb entry
78934        fbt               dax               dax_error_decode_cb return
78935        fbt               dax                dax_pre_suspend_cb entry
78936        fbt               dax                dax_pre_suspend_cb return
78937        fbt               dax               dax_post_suspend_cb entry
78938        fbt               dax               dax_post_suspend_cb return
78939        fbt               dax                             _info entry

After some investigation, I’ve noticed that most functions responsible for In-Memory execution are called kdmo*, kdzdcol*, kdml* and *ozip*. So I’ve written a short DTrace script to reveal all the functions names and execution times (nanosec) used in my previous query:

#!/usr/sbin/dtrace -s
#pragma D option quiet

fbt:dax::entry,
pid$1::kdmo*:entry,
pid$1::kdzdcol*:entry,
pid$1::kdml*:entry,
pid$1::*ozip*:entry,
pid$1::*dax*:entry
/pid==$1/
{
        self->t=timestamp;
}
fbt:dax::return,
pid$1::kdmo*:return,
pid$1::kdzdcol*:return,
pid$1::kdml*:return,
pid$1::*ozip*:return,
pid$1::*dax*:return
/pid==$1 && self->t!=0/
{
        @ftimes[probemod,probefunc]=sum(timestamp-self->t);
        self->t=0;
}
END
{
        printa("%s:%-40s %@d\n",@ftimes);
}

Here are the results:

root@m7:~/ks# ./dax_inmem_f.d 18746
^C
dax:dax_hma_enable                           1200
dax:dax_devmap_map                           1380
dax:dax_state_add_sfmmu                      1395
dax:dax_devmap_access                        1500
dax:dax_state_create                         1635
dax:dax_sfmmu_hash_ent_add_state             1650
dax:dax_state_add_thread                     1680
dax:dax_sfmmu_hash_ent_create                2115
dax:dax_devmap                               2340
dax:dax_getinfo                              2415
oracle:kdmoCostOverheadIMCScan                  3915
oracle:kdmoCostTraversalPerIMCU                 4065
oracle:kdmoCostIOPerIMCU                        4110
oracle:kdmoCostMinMaxPerCU                      4140
oracle:kdmoCostRowTxnJournal                    4140
oracle:kdmoCostRowStitchPerCol                  4215
dax:dax_minor_get                            4605
oracle:kdmoInvalidInMemRowsFrn                  5055
oracle:kdzdcol_sizeof                           5190
libdax.so.1:dax_add_queue                            6045
oracle:kdmoCacheDecompKeyGenCB                  6315
oracle:kdmoCacheTabKeyCompCB                    7230
oracle:kdmoCachePredEvalKeyGenCB                8010
oracle:kdmoMinMaxPruningPossible                8025
oracle:kdmoCacheMinMaxPruningKeyGenCB           8565
dax:dax_ctx_alloc                            8655
dax:dax_hma_unload                           9915
oracle:kdmoZeroSegStats                         12690
oracle:kdmoCacheTabKeyGenCB                     15270
dax:dax_ccb_contains_va                      57000
dax:dax_ccb_buffer_decommit                  339030
dax:dax_ccb_buffer_get_contig_block          362925
dax:dax_validate_ca_dequeue_args             381630
dax:dax_ioctl                                486945
oracle:kdzdcol_skip_slots_rowid                 6434055
oracle:kdzdcol_has_agg_rowset                   7371105
oracle:kdzdcol_init_rowid                       7478415
oracle:kdzdcol_eva_likefast_imc_dict            8026725
oracle:kdzdcol_has_eva                          9398745
libdax.so.1:dax_query_copy_scan_operand              10169670
oracle:kdzdcol_agg_cols_imc_dict_fini_agg       10349370
oracle:kdmoInitSegStatsInt                      11979975
oracle:kdmoMinMaxProcessOnePred                 13855200
oracle:kdzdcol_init_unsep                       13935975
oracle:kdzdcol_reset_unsep                      18861735
oracle:kdzdcol_get_vals_imc_dict                19224690
oracle:kdzdcol_isnull_imc_dict                  21221775
oracle:kdzdcol_dict_binary_search               22327710
oracle:kdzdcol_skip_slots                       23420385
libdax.so.1:dax_submit                               25563135
oracle:kdzdcol_end_imc                          27531735
oracle:kdzdcol_end                              27557565
libdax_query.so.1:dax_advantage                            27981420
libdax_query.so.1:dax_valid                                28049730
libdax_query.so.1:dax_available                            28829775
libdax.so.1:dax_ccbca_mark_read                      28991550
oracle:kdzdcol_dbas_create_valid_vec            29668080
oracle:kdzdcol_init_imc_dict                    31321560
oracle:kdzdcol_reset                            35010705
oracle:kdzdcol_init                             35573295
oracle:kdzdcol_get_minval                       40057785
oracle:kdzdcol_prep                             45411450
oracle:kdmoDecompGetRate                        46526115
oracle:kdzdcol_has_nulls                        55654635
oracle:kdzdcol_get_maxval                       59688045
oracle:kdmoGetHeaderColCU                       64548000
oracle:kdmlGetColCUPointer                      64854240
oracle:kdzdcol_get_vals_unsep_one               76925550
libdax.so.1:dax_log_msg                              80138355
oracle:kdzdcol_skip_slots_unsep                 554166045

We can also trace DAX usage at the system level, using just DAX module from FBT provider – user stack examination can give us some information about functions which were offloaded. To do that, we can use this DTrace scirpt:

#!/usr/sbin/dtrace -s
#pragma D option quiet

fbt:dax::entry
{
        self->t=timestamp;
}
fbt:dax::return
/self->t!=0/
{
        @times[ustack(100,500),probemod,probefunc]=quantize(timestamp-self->t);
        self->t=0;
}
tick-1s
{
        trunc(@times,10);
        printa(@times);
        trunc(@times);
}

END
{
        printa(@times);
        trunc(@times);
}

Sample results will look like this:

dax                                                 dax_ioctl
           value  ------------- Distribution ------------- count
            4096 |                                         0
            8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
           16384 |                                         0



              libc.so.1`ioctl+0x8
              libdax.so.1`dax_read_results+0x264
              libdax_query.so.1`dax_query_execute+0x160
              libdax_query.so.1`extract_ozip+0x210
              libdax_query.so.1`dax_extract+0x1a0
              libshpkm712.so`kdzk_ozip_decode_dydi+0x8c
              oracle`kdzk_ozip_decode+0x100
  dax                                                 dax_ccb_buffer_get_contig_block
           value  ------------- Distribution ------------- count
            1024 |                                         0
            2048 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
            4096 |                                         0


              libc.so.1`ioctl+0x8
              libdax.so.1`dax_read_results+0x264
              libdax_query.so.1`dax_query_execute+0x160
              libdax_query.so.1`extract_ozip+0x210
              libdax_query.so.1`dax_extract+0x1a0
              libshpkm712.so`kdzk_ozip_decode_dydi+0x8c
              oracle`kdzk_ozip_decode+0x100
  dax                                                 dax_validate_ca_dequeue_args

In the above example we can see that OZIP algorithm was offloaded by DAX coprocessors.

Another way to trace DAX usage is busstat:

root@m7:~/ks# busstat -e dax
pic0
DAX_QRY0_output_valid
DAX_QRY1_output_valid
DAX_QRY2_output_valid
DAX_QRY3_output_valid
DAX_QRY0_input_valid
DAX_QRY1_input_valid
DAX_QRY2_input_valid
DAX_QRY3_input_valid
DAX_MSG_MDI_req
DAX_MSG_MDI_nack
DAX_MSG_data_xfer
DAX_MSG_membus_PIO_pull
DAX_MSG_membus_PIO_req
DAX_MSG_sequencer_busy_cyc
DAX_MSG_brkpt_hit
DAX_STRM_addr_brkpt_hit
DAX_STRM_qpipe_wr_req
DAX_STRM_qpipe_rd_req
DAX_STRM_mpipe_wr_req
DAX_STRM_mpipe_rd_req
DAX_STRM_mRTD_RMW_COU
DAX_STRM_mRTD_RMW_MCU
DAX_STRM_L3_probe_ack
DAX_STRM_L3_probe
DAX_STRM_SCC0_L3_no_credit_cyc
DAX_STRM_SCC1_L3_no_credit_cyc
DAX_STRM_SCC2_L3_no_credit_cyc
DAX_STRM_SCC3_L3_no_credit_cyc
DAX_STRM_SCC4_L3_no_credit_cyc
DAX_STRM_SCC5_L3_no_credit_cyc
DAX_STRM_SCC6_L3_no_credit_cyc
DAX_STRM_SCC7_L3_no_credit_cyc
DAX_STRM_COU0_no_wr_credit_cyc
DAX_STRM_COU1_no_wr_credit_cyc
DAX_STRM_COU2_no_wr_credit_cyc
DAX_STRM_COU3_no_wr_credit_cyc
DAX_STRM_COU0_no_rd_credit_cyc
DAX_STRM_COU1_no_rd_credit_cyc
DAX_STRM_COU2_no_rd_credit_cyc
DAX_STRM_COU3_no_rd_credit_cyc
DAX_STRM_RMW_req
DAX_STRM_L3_WRA_req
DAX_STRM_wr_req
DAX_STRM_rd_req
DAX_STRM_L3x_no_credit_cyc
DAX_STRM_COUx_no_wr_credit_cyc
DAX_STRM_COUx_no_rd_credit_cyc
DAX_SCH_pipe_cmd_stall_ops
DAX_SCH_pipe_cmd_stall_cyc
DAX_SCH_pipe_cmd_sched
DAX_SCH_remote_PIO
DAX_SCH_msgq_full_cyc
DAX_SCH_queryq_full_cyc
DAX_SCH_ccb_fetch
DAX_SCH_msg_cmd_stall_cyc
DAX_SCH_msg_cmd_sched
DAX_SCH_query_cmd_stall_cyc
DAX_SCH_query_cmd_sched
DAX_no_count
DAX_all_cyc

(...SAME OUTPUT FOR PIC1 AND PIC2...)

Those are physical CPU counters that can be used to determine the performance of DAX pipelines. There is no documentation (yet?) for those counters, but thanks to Martin Mueller, I know the meaning of two of them:

  • DAX_SCH_query_cmd_sched (number of query cmds scheduled)
  • DAX_SCH_query_cmd_stall_cyc (# of cycles a query cmd had to stall because of lack resources, a high number would indicate DAX overload)

You can monitor them using this command:

root@m7:~# busstat -w dax,pic0=DAX_SCH_query_cmd_sched,pic1=DAX_SCH_query_cmd_stall_cyc 5
time dev                      event0               pic0        event1               pic1        event2               pic2
5    dax0                     DAX_SCH_query_cmd_sched 26          DAX_SCH_query_cmd_stall_cyc 0           DAX_all_cyc          8963253855
5    dax1                     DAX_SCH_query_cmd_sched 26          DAX_SCH_query_cmd_stall_cyc 0           DAX_all_cyc          8963338617
5    dax2                     DAX_SCH_query_cmd_sched 25          DAX_SCH_query_cmd_stall_cyc 0           DAX_all_cyc          8963366741
5    dax3                     DAX_SCH_query_cmd_sched 24          DAX_SCH_query_cmd_stall_cyc 0           DAX_all_cyc          8963394841
5    dax4                     DAX_SCH_query_cmd_sched 21          DAX_SCH_query_cmd_stall_cyc 0           DAX_all_cyc          8963417412
5    dax5                     DAX_SCH_query_cmd_sched 27          DAX_SCH_query_cmd_stall_cyc 0           DAX_all_cyc          8963442430
5    dax6                     DAX_SCH_query_cmd_sched 25          DAX_SCH_query_cmd_stall_cyc 0           DAX_all_cyc          8963464834
5    dax7                     DAX_SCH_query_cmd_sched 28          DAX_SCH_query_cmd_stall_cyc 0           DAX_all_cyc          8963495126
10   dax0                     DAX_SCH_query_cmd_sched 38          DAX_SCH_query_cmd_stall_cyc 0           DAX_all_cyc          8961987804

That’s it for now 🙂

P.S.
If you want to learn about dynamic tracing technics for analyzing Oracle behavior – you should read this awesome article by Luca Canali http://db-blog.web.cern.ch/blog/luca-canali/2016-01-linux-perf-probes-oracle-tracing

Search:


Kontakt

Database Whisperers sp. z o. o. sp. k.
Al. Jerozolimskie 200
Wejście B, III piętro/ pokój 342
02-486 Warszawa
NIP: 5272744987
REGON:362524978
+48 508 943 051
+48 661 966 009
info@ora-600.pl

Newsletter zapisz się żeby być na bieżąco