Thanks to Oracle I had a possibility to test the new Sparc M7 with DAX coprocessors to boost In-Memory performance.
You can read about it here and here
My first thought was – how to check if and when the DAX coprocessors are being used?
When you have a POC for Exadata, you want to check wait events for CELL SMART SCAN, to be sure that query offloading occurred.
Unfortunately there are no DAX-specific wait events, that you can check at the database level – so how will we know that DAX coprocessors are working? Should we just believe that queries are being offloaded? Well… there’s always DTrace 😉
It’s time for testing! I have three tables, loaded into In-Memory buffer:
SQL> ; 1 select INMEMORY_SIZE/1024/1024 as im_size, bytes/1024/1024 as p_size, segment_name 2 from v$im_segments 3 where segment_name in ('LINEORDER','CUSTOMER','SUPPLIER') 4* and owner='SSB' SQL> / IM_SIZE P_SIZE SEGMENT_NAME ---------- ---------- ------------------------------ 412.1875 1148.03906 CUSTOMER 63580.75 202806.672 LINEORDER 31.1875 70.484375 SUPPLIER
Let’s start with something simple – what modules/libraries are used, when querying a table stored in memory?
I will use this DTrace script
#!/usr/sbin/dtrace -s pid$1:::entry { @[probemod]=count(); }
to trace the server process, which will execute this query:
SQL> ; 1 select /*+ NOPARALLEL */ sum(LO_QUANTITY) 2 from lineorder 3 where lo_tax between 1 and 10 4* and LO_SHIPMODE like '%MAIL%'
and the results look like this:
root@m7:~/ks# ./modules.d 18419 dtrace: script './modules.d' matched 229096 probes ^C libm.so.2 16 LM1`ld.so.1 846 libdax_query.so.1 53925 libshpkm712.so 66563 libdax.so.1 146204 libc.so.1 511366 oracle 4763677
WOW – it’s great! We can see that Oracle is using two DAX-specific libraries – libdax.so.1 and libdax_query.so.1.
Those libraries can be found in this path: /usr/lib/sparcv9/
After closer examination we also find out that libdax_query.so.1 contains functions for offloading OZIP operations – that’s why DAX coprocessors are most effective with MEMCOMPRESS FOR QUERY HIGH algorithm.
root@m7:~/ks# nm /usr/lib/sparcv9/libdax_query.so.1 | grep ozip [202] | 25220| 472|FUNC |LOCL |0 |15 |check_ozip_hdr [224] | 40688| 660|FUNC |LOCL |0 |15 |extract_ozip [221] | 40088| 600|FUNC |LOCL |0 |15 |unpack_ozip
Another great finding was that the fbt provider, provides functions for tracing dax at system level.
root@m7:~/ks# dtrace -m fbt:dax -l ID PROVIDER MODULE FUNCTION NAME 78803 fbt dax dax_hma_unload entry 78804 fbt dax dax_hma_unload return 78805 fbt dax dax_hma_unload_finish entry 78806 fbt dax dax_hma_unload_finish return 78807 fbt dax dax_hma_exit entry 78808 fbt dax dax_hma_exit return 78809 fbt dax dax_open entry 78810 fbt dax dax_open return 78811 fbt dax dax_close entry 78812 fbt dax dax_close return 78813 fbt dax dax_ioctl entry 78814 fbt dax dax_ioctl return 78815 fbt dax dax_devmap entry 78816 fbt dax dax_devmap return 78817 fbt dax dax_getinfo entry 78818 fbt dax dax_getinfo return 78819 fbt dax dax_attach entry 78820 fbt dax dax_attach return 78821 fbt dax dax_detach entry 78822 fbt dax dax_detach return 78823 fbt dax dax_devmap_map entry 78824 fbt dax dax_devmap_map return 78825 fbt dax dax_devmap_access entry 78826 fbt dax dax_devmap_access return 78827 fbt dax dax_devmap_dup entry 78828 fbt dax dax_devmap_dup return 78829 fbt dax dax_devmap_unmap entry 78830 fbt dax dax_devmap_unmap return 78831 fbt dax dax_minor_alloc entry 78832 fbt dax dax_minor_alloc return 78833 fbt dax dax_minor_free entry 78834 fbt dax dax_minor_free return 78835 fbt dax dax_minor_get entry 78836 fbt dax dax_minor_get return 78837 fbt dax dax_minor_rele entry 78838 fbt dax dax_minor_rele return 78839 fbt dax dax_minor_init entry 78840 fbt dax dax_minor_init return 78841 fbt dax dax_hsvc_register entry 78842 fbt dax dax_hsvc_register return 78843 fbt dax dax_hsvc_unregister entry 78844 fbt dax dax_hsvc_unregister return 78845 fbt dax dax_tsd_destructor entry 78846 fbt dax dax_tsd_destructor return 78847 fbt dax dax_set_suspend_callbacks entry 78848 fbt dax dax_set_suspend_callbacks return 78849 fbt dax dax_unset_suspend_callbacks entry 78850 fbt dax dax_unset_suspend_callbacks return 78851 fbt dax dax_hv_init entry 78852 fbt dax dax_hv_init return 78853 fbt dax dax_hma_init entry 78854 fbt dax dax_hma_init return 78855 fbt dax dax_hma_fini entry 78856 fbt dax dax_hma_fini return 78857 fbt dax dax_state_create entry 78858 fbt dax dax_state_create return 78859 fbt dax dax_sfmmu_hash_remove_state entry 78860 fbt dax dax_sfmmu_hash_remove_state return 78861 fbt dax dax_state_destroy entry 78862 fbt dax dax_state_destroy return 78863 fbt dax dax_ioctl_ccb_thr_init entry 78864 fbt dax dax_ioctl_ccb_thr_init return 78865 fbt dax dax_ioctl_ccb_thr_fini entry 78866 fbt dax dax_ioctl_ccb_thr_fini return 78867 fbt dax dax_ioctl_ca_dequeue entry 78868 fbt dax dax_ioctl_ca_dequeue return 78869 fbt dax dax_ctx_fini entry 78870 fbt dax dax_ctx_fini return 78871 fbt dax dax_ctx_alloc entry 78872 fbt dax dax_ctx_alloc return 78873 fbt dax dax_state_add_thread entry 78874 fbt dax dax_state_add_thread return 78875 fbt dax dax_state_add_sfmmu entry 78876 fbt dax dax_state_add_sfmmu return 78877 fbt dax dax_sfmmu_hash_add_state entry 78878 fbt dax dax_sfmmu_hash_add_state return 78879 fbt dax dax_hma_enable entry 78880 fbt dax dax_hma_enable return 78881 fbt dax dax_state_remove_thread entry 78882 fbt dax dax_state_remove_thread return 78883 fbt dax dax_ctx_flush_decommit_ccbs entry 78884 fbt dax dax_ctx_flush_decommit_ccbs return 78885 fbt dax dax_validate_ca_dequeue_args entry 78886 fbt dax dax_validate_ca_dequeue_args return 78887 fbt dax dax_ccb_buffer_get_contig_block entry 78888 fbt dax dax_ccb_buffer_get_contig_block return 78889 fbt dax dax_ccb_buffer_decommit entry 78890 fbt dax dax_ccb_buffer_decommit return 78891 fbt dax dax_ccb_info entry 78892 fbt dax dax_ccb_info return 78893 fbt dax dax_ccb_kill entry 78894 fbt dax dax_ccb_kill return 78895 fbt dax dax_ctx_flush_ccbs entry 78896 fbt dax dax_ctx_flush_ccbs return 78897 fbt dax dax_ccb_flush_contig entry 78898 fbt dax dax_ccb_flush_contig return 78899 fbt dax dax_ccb_wait entry 78900 fbt dax dax_ccb_wait return 78901 fbt dax dax_sfmmu_hash_ent_create entry 78902 fbt dax dax_sfmmu_hash_ent_create return 78903 fbt dax dax_sfmmu_hash_ent_add_state entry 78904 fbt dax dax_sfmmu_hash_ent_add_state return 78905 fbt dax dax_sfmmu_hash_ent_remove_state entry 78906 fbt dax dax_sfmmu_hash_ent_remove_state return 78907 fbt dax dax_sfmmu_hash_remove_sfmmu entry 78908 fbt dax dax_sfmmu_hash_remove_sfmmu return 78909 fbt dax dax_state_remove_sfmmu entry 78910 fbt dax dax_state_remove_sfmmu return 78911 fbt dax dax_sfmmu_hash_ent_destroy entry 78912 fbt dax dax_sfmmu_hash_ent_destroy return 78913 fbt dax dax_sfmmu_wait_matching_ccbs entry 78914 fbt dax dax_sfmmu_wait_matching_ccbs return 78915 fbt dax dax_thr_search_ccbs entry 78916 fbt dax dax_thr_search_ccbs return 78917 fbt dax dax_hma_ccb_wait entry 78918 fbt dax dax_hma_ccb_wait return 78919 fbt dax dax_ccb_search entry 78920 fbt dax dax_ccb_search return 78921 fbt dax dax_ccb_search_contig entry 78922 fbt dax dax_ccb_search_contig return 78923 fbt dax dax_ccb_contains_va entry 78924 fbt dax dax_ccb_contains_va return 78925 fbt dax dax_hma_save_state entry 78926 fbt dax dax_hma_save_state return 78927 fbt dax dax_ccb_flush_all entry 78928 fbt dax dax_ccb_flush_all return 78929 fbt dax dax_sfmmu_hash_walker entry 78930 fbt dax dax_sfmmu_hash_walker return 78931 fbt dax dax_sfmmu_flush_ccbs entry 78932 fbt dax dax_sfmmu_flush_ccbs return 78933 fbt dax dax_error_decode_cb entry 78934 fbt dax dax_error_decode_cb return 78935 fbt dax dax_pre_suspend_cb entry 78936 fbt dax dax_pre_suspend_cb return 78937 fbt dax dax_post_suspend_cb entry 78938 fbt dax dax_post_suspend_cb return 78939 fbt dax _info entry
After some investigation, I’ve noticed that most functions responsible for In-Memory execution are called kdmo*, kdzdcol*, kdml* and *ozip*. So I’ve written a short DTrace script to reveal all the functions names and execution times (nanosec) used in my previous query:
#!/usr/sbin/dtrace -s #pragma D option quiet fbt:dax::entry, pid$1::kdmo*:entry, pid$1::kdzdcol*:entry, pid$1::kdml*:entry, pid$1::*ozip*:entry, pid$1::*dax*:entry /pid==$1/ { self->t=timestamp; } fbt:dax::return, pid$1::kdmo*:return, pid$1::kdzdcol*:return, pid$1::kdml*:return, pid$1::*ozip*:return, pid$1::*dax*:return /pid==$1 && self->t!=0/ { @ftimes[probemod,probefunc]=sum(timestamp-self->t); self->t=0; } END { printa("%s:%-40s %@d\n",@ftimes); }
Here are the results:
root@m7:~/ks# ./dax_inmem_f.d 18746 ^C dax:dax_hma_enable 1200 dax:dax_devmap_map 1380 dax:dax_state_add_sfmmu 1395 dax:dax_devmap_access 1500 dax:dax_state_create 1635 dax:dax_sfmmu_hash_ent_add_state 1650 dax:dax_state_add_thread 1680 dax:dax_sfmmu_hash_ent_create 2115 dax:dax_devmap 2340 dax:dax_getinfo 2415 oracle:kdmoCostOverheadIMCScan 3915 oracle:kdmoCostTraversalPerIMCU 4065 oracle:kdmoCostIOPerIMCU 4110 oracle:kdmoCostMinMaxPerCU 4140 oracle:kdmoCostRowTxnJournal 4140 oracle:kdmoCostRowStitchPerCol 4215 dax:dax_minor_get 4605 oracle:kdmoInvalidInMemRowsFrn 5055 oracle:kdzdcol_sizeof 5190 libdax.so.1:dax_add_queue 6045 oracle:kdmoCacheDecompKeyGenCB 6315 oracle:kdmoCacheTabKeyCompCB 7230 oracle:kdmoCachePredEvalKeyGenCB 8010 oracle:kdmoMinMaxPruningPossible 8025 oracle:kdmoCacheMinMaxPruningKeyGenCB 8565 dax:dax_ctx_alloc 8655 dax:dax_hma_unload 9915 oracle:kdmoZeroSegStats 12690 oracle:kdmoCacheTabKeyGenCB 15270 dax:dax_ccb_contains_va 57000 dax:dax_ccb_buffer_decommit 339030 dax:dax_ccb_buffer_get_contig_block 362925 dax:dax_validate_ca_dequeue_args 381630 dax:dax_ioctl 486945 oracle:kdzdcol_skip_slots_rowid 6434055 oracle:kdzdcol_has_agg_rowset 7371105 oracle:kdzdcol_init_rowid 7478415 oracle:kdzdcol_eva_likefast_imc_dict 8026725 oracle:kdzdcol_has_eva 9398745 libdax.so.1:dax_query_copy_scan_operand 10169670 oracle:kdzdcol_agg_cols_imc_dict_fini_agg 10349370 oracle:kdmoInitSegStatsInt 11979975 oracle:kdmoMinMaxProcessOnePred 13855200 oracle:kdzdcol_init_unsep 13935975 oracle:kdzdcol_reset_unsep 18861735 oracle:kdzdcol_get_vals_imc_dict 19224690 oracle:kdzdcol_isnull_imc_dict 21221775 oracle:kdzdcol_dict_binary_search 22327710 oracle:kdzdcol_skip_slots 23420385 libdax.so.1:dax_submit 25563135 oracle:kdzdcol_end_imc 27531735 oracle:kdzdcol_end 27557565 libdax_query.so.1:dax_advantage 27981420 libdax_query.so.1:dax_valid 28049730 libdax_query.so.1:dax_available 28829775 libdax.so.1:dax_ccbca_mark_read 28991550 oracle:kdzdcol_dbas_create_valid_vec 29668080 oracle:kdzdcol_init_imc_dict 31321560 oracle:kdzdcol_reset 35010705 oracle:kdzdcol_init 35573295 oracle:kdzdcol_get_minval 40057785 oracle:kdzdcol_prep 45411450 oracle:kdmoDecompGetRate 46526115 oracle:kdzdcol_has_nulls 55654635 oracle:kdzdcol_get_maxval 59688045 oracle:kdmoGetHeaderColCU 64548000 oracle:kdmlGetColCUPointer 64854240 oracle:kdzdcol_get_vals_unsep_one 76925550 libdax.so.1:dax_log_msg 80138355 oracle:kdzdcol_skip_slots_unsep 554166045
We can also trace DAX usage at the system level, using just DAX module from FBT provider – user stack examination can give us some information about functions which were offloaded. To do that, we can use this DTrace scirpt:
#!/usr/sbin/dtrace -s #pragma D option quiet fbt:dax::entry { self->t=timestamp; } fbt:dax::return /self->t!=0/ { @times[ustack(100,500),probemod,probefunc]=quantize(timestamp-self->t); self->t=0; } tick-1s { trunc(@times,10); printa(@times); trunc(@times); } END { printa(@times); trunc(@times); }
Sample results will look like this:
dax dax_ioctl value ------------- Distribution ------------- count 4096 | 0 8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1 16384 | 0 libc.so.1`ioctl+0x8 libdax.so.1`dax_read_results+0x264 libdax_query.so.1`dax_query_execute+0x160 libdax_query.so.1`extract_ozip+0x210 libdax_query.so.1`dax_extract+0x1a0 libshpkm712.so`kdzk_ozip_decode_dydi+0x8c oracle`kdzk_ozip_decode+0x100 dax dax_ccb_buffer_get_contig_block value ------------- Distribution ------------- count 1024 | 0 2048 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1 4096 | 0 libc.so.1`ioctl+0x8 libdax.so.1`dax_read_results+0x264 libdax_query.so.1`dax_query_execute+0x160 libdax_query.so.1`extract_ozip+0x210 libdax_query.so.1`dax_extract+0x1a0 libshpkm712.so`kdzk_ozip_decode_dydi+0x8c oracle`kdzk_ozip_decode+0x100 dax dax_validate_ca_dequeue_args
In the above example we can see that OZIP algorithm was offloaded by DAX coprocessors.
Another way to trace DAX usage is busstat:
root@m7:~/ks# busstat -e dax pic0 DAX_QRY0_output_valid DAX_QRY1_output_valid DAX_QRY2_output_valid DAX_QRY3_output_valid DAX_QRY0_input_valid DAX_QRY1_input_valid DAX_QRY2_input_valid DAX_QRY3_input_valid DAX_MSG_MDI_req DAX_MSG_MDI_nack DAX_MSG_data_xfer DAX_MSG_membus_PIO_pull DAX_MSG_membus_PIO_req DAX_MSG_sequencer_busy_cyc DAX_MSG_brkpt_hit DAX_STRM_addr_brkpt_hit DAX_STRM_qpipe_wr_req DAX_STRM_qpipe_rd_req DAX_STRM_mpipe_wr_req DAX_STRM_mpipe_rd_req DAX_STRM_mRTD_RMW_COU DAX_STRM_mRTD_RMW_MCU DAX_STRM_L3_probe_ack DAX_STRM_L3_probe DAX_STRM_SCC0_L3_no_credit_cyc DAX_STRM_SCC1_L3_no_credit_cyc DAX_STRM_SCC2_L3_no_credit_cyc DAX_STRM_SCC3_L3_no_credit_cyc DAX_STRM_SCC4_L3_no_credit_cyc DAX_STRM_SCC5_L3_no_credit_cyc DAX_STRM_SCC6_L3_no_credit_cyc DAX_STRM_SCC7_L3_no_credit_cyc DAX_STRM_COU0_no_wr_credit_cyc DAX_STRM_COU1_no_wr_credit_cyc DAX_STRM_COU2_no_wr_credit_cyc DAX_STRM_COU3_no_wr_credit_cyc DAX_STRM_COU0_no_rd_credit_cyc DAX_STRM_COU1_no_rd_credit_cyc DAX_STRM_COU2_no_rd_credit_cyc DAX_STRM_COU3_no_rd_credit_cyc DAX_STRM_RMW_req DAX_STRM_L3_WRA_req DAX_STRM_wr_req DAX_STRM_rd_req DAX_STRM_L3x_no_credit_cyc DAX_STRM_COUx_no_wr_credit_cyc DAX_STRM_COUx_no_rd_credit_cyc DAX_SCH_pipe_cmd_stall_ops DAX_SCH_pipe_cmd_stall_cyc DAX_SCH_pipe_cmd_sched DAX_SCH_remote_PIO DAX_SCH_msgq_full_cyc DAX_SCH_queryq_full_cyc DAX_SCH_ccb_fetch DAX_SCH_msg_cmd_stall_cyc DAX_SCH_msg_cmd_sched DAX_SCH_query_cmd_stall_cyc DAX_SCH_query_cmd_sched DAX_no_count DAX_all_cyc (...SAME OUTPUT FOR PIC1 AND PIC2...)
Those are physical CPU counters that can be used to determine the performance of DAX pipelines. There is no documentation (yet?) for those counters, but thanks to Martin Mueller, I know the meaning of two of them:
- DAX_SCH_query_cmd_sched (number of query cmds scheduled)
- DAX_SCH_query_cmd_stall_cyc (# of cycles a query cmd had to stall because of lack resources, a high number would indicate DAX overload)
You can monitor them using this command:
root@m7:~# busstat -w dax,pic0=DAX_SCH_query_cmd_sched,pic1=DAX_SCH_query_cmd_stall_cyc 5 time dev event0 pic0 event1 pic1 event2 pic2 5 dax0 DAX_SCH_query_cmd_sched 26 DAX_SCH_query_cmd_stall_cyc 0 DAX_all_cyc 8963253855 5 dax1 DAX_SCH_query_cmd_sched 26 DAX_SCH_query_cmd_stall_cyc 0 DAX_all_cyc 8963338617 5 dax2 DAX_SCH_query_cmd_sched 25 DAX_SCH_query_cmd_stall_cyc 0 DAX_all_cyc 8963366741 5 dax3 DAX_SCH_query_cmd_sched 24 DAX_SCH_query_cmd_stall_cyc 0 DAX_all_cyc 8963394841 5 dax4 DAX_SCH_query_cmd_sched 21 DAX_SCH_query_cmd_stall_cyc 0 DAX_all_cyc 8963417412 5 dax5 DAX_SCH_query_cmd_sched 27 DAX_SCH_query_cmd_stall_cyc 0 DAX_all_cyc 8963442430 5 dax6 DAX_SCH_query_cmd_sched 25 DAX_SCH_query_cmd_stall_cyc 0 DAX_all_cyc 8963464834 5 dax7 DAX_SCH_query_cmd_sched 28 DAX_SCH_query_cmd_stall_cyc 0 DAX_all_cyc 8963495126 10 dax0 DAX_SCH_query_cmd_sched 38 DAX_SCH_query_cmd_stall_cyc 0 DAX_all_cyc 8961987804
That’s it for now 🙂
P.S.
If you want to learn about dynamic tracing technics for analyzing Oracle behavior – you should read this awesome article by Luca Canali http://db-blog.web.cern.ch/blog/luca-canali/2016-01-linux-perf-probes-oracle-tracing