Thanks to Oracle I had a possibility to test the new Sparc M7 with DAX coprocessors to boost In-Memory performance.
You can read about it here and here
My first thought was – how to check if and when the DAX coprocessors are being used?
When you have a POC for Exadata, you want to check wait events for CELL SMART SCAN, to be sure that query offloading occurred.
Unfortunately there are no DAX-specific wait events, that you can check at the database level – so how will we know that DAX coprocessors are working? Should we just believe that queries are being offloaded? Well… there’s always DTrace 😉
It’s time for testing! I have three tables, loaded into In-Memory buffer:
2 | 1 select INMEMORY_SIZE/1024/1024 as im_size, bytes/1024/1024 as p_size, segment_name |
4 | 3 where segment_name in ('LINEORDER','CUSTOMER','SUPPLIER') |
8 | IM_SIZE P_SIZE SEGMENT_NAME |
9 | ---------- ---------- ------------------------------ |
10 | 412.1875 1148.03906 CUSTOMER |
11 | 63580.75 202806.672 LINEORDER |
12 | 31.1875 70.484375 SUPPLIER |
Let’s start with something simple – what modules/libraries are used, when querying a table stored in memory?
I will use this DTrace script
to trace the server process, which will execute this query:
2 | 1 select /*+ NOPARALLEL */ sum(LO_QUANTITY) |
4 | 3 where lo_tax between 1 and 10 |
5 | 4* and LO_SHIPMODE like '%MAIL%' |
and the results look like this:
1 | root@m7:~/ks# ./modules.d 18419 |
2 | dtrace: script './modules.d' matched 229096 probes |
7 | libdax_query.so.1 53925 |
WOW – it’s great! We can see that Oracle is using two DAX-specific libraries – libdax.so.1 and libdax_query.so.1.
Those libraries can be found in this path: /usr/lib/sparcv9/
After closer examination we also find out that libdax_query.so.1 contains functions for offloading OZIP operations – that’s why DAX coprocessors are most effective with MEMCOMPRESS FOR QUERY HIGH algorithm.
1 | root@m7:~/ks# nm /usr/lib/sparcv9/libdax_query.so.1 | grep ozip |
2 | [202] | 25220| 472|FUNC |LOCL |0 |15 |check_ozip_hdr |
3 | [224] | 40688| 660|FUNC |LOCL |0 |15 |extract_ozip |
4 | [221] | 40088| 600|FUNC |LOCL |0 |15 |unpack_ozip |
Another great finding was that the fbt provider, provides functions for tracing dax at system level.
1 | root@m7:~/ks# dtrace -m fbt:dax -l |
2 | ID PROVIDER MODULE FUNCTION NAME |
3 | 78803 fbt dax dax_hma_unload entry |
4 | 78804 fbt dax dax_hma_unload return |
5 | 78805 fbt dax dax_hma_unload_finish entry |
6 | 78806 fbt dax dax_hma_unload_finish return |
7 | 78807 fbt dax dax_hma_exit entry |
8 | 78808 fbt dax dax_hma_exit return |
9 | 78809 fbt dax dax_open entry |
10 | 78810 fbt dax dax_open return |
11 | 78811 fbt dax dax_close entry |
12 | 78812 fbt dax dax_close return |
13 | 78813 fbt dax dax_ioctl entry |
14 | 78814 fbt dax dax_ioctl return |
15 | 78815 fbt dax dax_devmap entry |
16 | 78816 fbt dax dax_devmap return |
17 | 78817 fbt dax dax_getinfo entry |
18 | 78818 fbt dax dax_getinfo return |
19 | 78819 fbt dax dax_attach entry |
20 | 78820 fbt dax dax_attach return |
21 | 78821 fbt dax dax_detach entry |
22 | 78822 fbt dax dax_detach return |
23 | 78823 fbt dax dax_devmap_map entry |
24 | 78824 fbt dax dax_devmap_map return |
25 | 78825 fbt dax dax_devmap_access entry |
26 | 78826 fbt dax dax_devmap_access return |
27 | 78827 fbt dax dax_devmap_dup entry |
28 | 78828 fbt dax dax_devmap_dup return |
29 | 78829 fbt dax dax_devmap_unmap entry |
30 | 78830 fbt dax dax_devmap_unmap return |
31 | 78831 fbt dax dax_minor_alloc entry |
32 | 78832 fbt dax dax_minor_alloc return |
33 | 78833 fbt dax dax_minor_free entry |
34 | 78834 fbt dax dax_minor_free return |
35 | 78835 fbt dax dax_minor_get entry |
36 | 78836 fbt dax dax_minor_get return |
37 | 78837 fbt dax dax_minor_rele entry |
38 | 78838 fbt dax dax_minor_rele return |
39 | 78839 fbt dax dax_minor_init entry |
40 | 78840 fbt dax dax_minor_init return |
41 | 78841 fbt dax dax_hsvc_register entry |
42 | 78842 fbt dax dax_hsvc_register return |
43 | 78843 fbt dax dax_hsvc_unregister entry |
44 | 78844 fbt dax dax_hsvc_unregister return |
45 | 78845 fbt dax dax_tsd_destructor entry |
46 | 78846 fbt dax dax_tsd_destructor return |
47 | 78847 fbt dax dax_set_suspend_callbacks entry |
48 | 78848 fbt dax dax_set_suspend_callbacks return |
49 | 78849 fbt dax dax_unset_suspend_callbacks entry |
50 | 78850 fbt dax dax_unset_suspend_callbacks return |
51 | 78851 fbt dax dax_hv_init entry |
52 | 78852 fbt dax dax_hv_init return |
53 | 78853 fbt dax dax_hma_init entry |
54 | 78854 fbt dax dax_hma_init return |
55 | 78855 fbt dax dax_hma_fini entry |
56 | 78856 fbt dax dax_hma_fini return |
57 | 78857 fbt dax dax_state_create entry |
58 | 78858 fbt dax dax_state_create return |
59 | 78859 fbt dax dax_sfmmu_hash_remove_state entry |
60 | 78860 fbt dax dax_sfmmu_hash_remove_state return |
61 | 78861 fbt dax dax_state_destroy entry |
62 | 78862 fbt dax dax_state_destroy return |
63 | 78863 fbt dax dax_ioctl_ccb_thr_init entry |
64 | 78864 fbt dax dax_ioctl_ccb_thr_init return |
65 | 78865 fbt dax dax_ioctl_ccb_thr_fini entry |
66 | 78866 fbt dax dax_ioctl_ccb_thr_fini return |
67 | 78867 fbt dax dax_ioctl_ca_dequeue entry |
68 | 78868 fbt dax dax_ioctl_ca_dequeue return |
69 | 78869 fbt dax dax_ctx_fini entry |
70 | 78870 fbt dax dax_ctx_fini return |
71 | 78871 fbt dax dax_ctx_alloc entry |
72 | 78872 fbt dax dax_ctx_alloc return |
73 | 78873 fbt dax dax_state_add_thread entry |
74 | 78874 fbt dax dax_state_add_thread return |
75 | 78875 fbt dax dax_state_add_sfmmu entry |
76 | 78876 fbt dax dax_state_add_sfmmu return |
77 | 78877 fbt dax dax_sfmmu_hash_add_state entry |
78 | 78878 fbt dax dax_sfmmu_hash_add_state return |
79 | 78879 fbt dax dax_hma_enable entry |
80 | 78880 fbt dax dax_hma_enable return |
81 | 78881 fbt dax dax_state_remove_thread entry |
82 | 78882 fbt dax dax_state_remove_thread return |
83 | 78883 fbt dax dax_ctx_flush_decommit_ccbs entry |
84 | 78884 fbt dax dax_ctx_flush_decommit_ccbs return |
85 | 78885 fbt dax dax_validate_ca_dequeue_args entry |
86 | 78886 fbt dax dax_validate_ca_dequeue_args return |
87 | 78887 fbt dax dax_ccb_buffer_get_contig_block entry |
88 | 78888 fbt dax dax_ccb_buffer_get_contig_block return |
89 | 78889 fbt dax dax_ccb_buffer_decommit entry |
90 | 78890 fbt dax dax_ccb_buffer_decommit return |
91 | 78891 fbt dax dax_ccb_info entry |
92 | 78892 fbt dax dax_ccb_info return |
93 | 78893 fbt dax dax_ccb_kill entry |
94 | 78894 fbt dax dax_ccb_kill return |
95 | 78895 fbt dax dax_ctx_flush_ccbs entry |
96 | 78896 fbt dax dax_ctx_flush_ccbs return |
97 | 78897 fbt dax dax_ccb_flush_contig entry |
98 | 78898 fbt dax dax_ccb_flush_contig return |
99 | 78899 fbt dax dax_ccb_wait entry |
100 | 78900 fbt dax dax_ccb_wait return |
101 | 78901 fbt dax dax_sfmmu_hash_ent_create entry |
102 | 78902 fbt dax dax_sfmmu_hash_ent_create return |
103 | 78903 fbt dax dax_sfmmu_hash_ent_add_state entry |
104 | 78904 fbt dax dax_sfmmu_hash_ent_add_state return |
105 | 78905 fbt dax dax_sfmmu_hash_ent_remove_state entry |
106 | 78906 fbt dax dax_sfmmu_hash_ent_remove_state return |
107 | 78907 fbt dax dax_sfmmu_hash_remove_sfmmu entry |
108 | 78908 fbt dax dax_sfmmu_hash_remove_sfmmu return |
109 | 78909 fbt dax dax_state_remove_sfmmu entry |
110 | 78910 fbt dax dax_state_remove_sfmmu return |
111 | 78911 fbt dax dax_sfmmu_hash_ent_destroy entry |
112 | 78912 fbt dax dax_sfmmu_hash_ent_destroy return |
113 | 78913 fbt dax dax_sfmmu_wait_matching_ccbs entry |
114 | 78914 fbt dax dax_sfmmu_wait_matching_ccbs return |
115 | 78915 fbt dax dax_thr_search_ccbs entry |
116 | 78916 fbt dax dax_thr_search_ccbs return |
117 | 78917 fbt dax dax_hma_ccb_wait entry |
118 | 78918 fbt dax dax_hma_ccb_wait return |
119 | 78919 fbt dax dax_ccb_search entry |
120 | 78920 fbt dax dax_ccb_search return |
121 | 78921 fbt dax dax_ccb_search_contig entry |
122 | 78922 fbt dax dax_ccb_search_contig return |
123 | 78923 fbt dax dax_ccb_contains_va entry |
124 | 78924 fbt dax dax_ccb_contains_va return |
125 | 78925 fbt dax dax_hma_save_state entry |
126 | 78926 fbt dax dax_hma_save_state return |
127 | 78927 fbt dax dax_ccb_flush_all entry |
128 | 78928 fbt dax dax_ccb_flush_all return |
129 | 78929 fbt dax dax_sfmmu_hash_walker entry |
130 | 78930 fbt dax dax_sfmmu_hash_walker return |
131 | 78931 fbt dax dax_sfmmu_flush_ccbs entry |
132 | 78932 fbt dax dax_sfmmu_flush_ccbs return |
133 | 78933 fbt dax dax_error_decode_cb entry |
134 | 78934 fbt dax dax_error_decode_cb return |
135 | 78935 fbt dax dax_pre_suspend_cb entry |
136 | 78936 fbt dax dax_pre_suspend_cb return |
137 | 78937 fbt dax dax_post_suspend_cb entry |
138 | 78938 fbt dax dax_post_suspend_cb return |
139 | 78939 fbt dax _info entry |
After some investigation, I’ve noticed that most functions responsible for In-Memory execution are called kdmo*, kdzdcol*, kdml* and *ozip*. So I’ve written a short DTrace script to reveal all the functions names and execution times (nanosec) used in my previous query:
16 | pid$1::kdzdcol*:return, |
20 | /pid==$1 && self->t!=0/ |
22 | @ftimes[probemod,probefunc]=sum(timestamp-self->t); |
27 | printa("%s:%-40s %@d\n",@ftimes); |
Here are the results:
1 | root@m7:~/ks# ./dax_inmem_f.d 18746 |
5 | dax:dax_state_add_sfmmu 1395 |
6 | dax:dax_devmap_access 1500 |
7 | dax:dax_state_create 1635 |
8 | dax:dax_sfmmu_hash_ent_add_state 1650 |
9 | dax:dax_state_add_thread 1680 |
10 | dax:dax_sfmmu_hash_ent_create 2115 |
13 | oracle:kdmoCostOverheadIMCScan 3915 |
14 | oracle:kdmoCostTraversalPerIMCU 4065 |
15 | oracle:kdmoCostIOPerIMCU 4110 |
16 | oracle:kdmoCostMinMaxPerCU 4140 |
17 | oracle:kdmoCostRowTxnJournal 4140 |
18 | oracle:kdmoCostRowStitchPerCol 4215 |
20 | oracle:kdmoInvalidInMemRowsFrn 5055 |
21 | oracle:kdzdcol_sizeof 5190 |
22 | libdax.so.1:dax_add_queue 6045 |
23 | oracle:kdmoCacheDecompKeyGenCB 6315 |
24 | oracle:kdmoCacheTabKeyCompCB 7230 |
25 | oracle:kdmoCachePredEvalKeyGenCB 8010 |
26 | oracle:kdmoMinMaxPruningPossible 8025 |
27 | oracle:kdmoCacheMinMaxPruningKeyGenCB 8565 |
29 | dax:dax_hma_unload 9915 |
30 | oracle:kdmoZeroSegStats 12690 |
31 | oracle:kdmoCacheTabKeyGenCB 15270 |
32 | dax:dax_ccb_contains_va 57000 |
33 | dax:dax_ccb_buffer_decommit 339030 |
34 | dax:dax_ccb_buffer_get_contig_block 362925 |
35 | dax:dax_validate_ca_dequeue_args 381630 |
37 | oracle:kdzdcol_skip_slots_rowid 6434055 |
38 | oracle:kdzdcol_has_agg_rowset 7371105 |
39 | oracle:kdzdcol_init_rowid 7478415 |
40 | oracle:kdzdcol_eva_likefast_imc_dict 8026725 |
41 | oracle:kdzdcol_has_eva 9398745 |
42 | libdax.so.1:dax_query_copy_scan_operand 10169670 |
43 | oracle:kdzdcol_agg_cols_imc_dict_fini_agg 10349370 |
44 | oracle:kdmoInitSegStatsInt 11979975 |
45 | oracle:kdmoMinMaxProcessOnePred 13855200 |
46 | oracle:kdzdcol_init_unsep 13935975 |
47 | oracle:kdzdcol_reset_unsep 18861735 |
48 | oracle:kdzdcol_get_vals_imc_dict 19224690 |
49 | oracle:kdzdcol_isnull_imc_dict 21221775 |
50 | oracle:kdzdcol_dict_binary_search 22327710 |
51 | oracle:kdzdcol_skip_slots 23420385 |
52 | libdax.so.1:dax_submit 25563135 |
53 | oracle:kdzdcol_end_imc 27531735 |
54 | oracle:kdzdcol_end 27557565 |
55 | libdax_query.so.1:dax_advantage 27981420 |
56 | libdax_query.so.1:dax_valid 28049730 |
57 | libdax_query.so.1:dax_available 28829775 |
58 | libdax.so.1:dax_ccbca_mark_read 28991550 |
59 | oracle:kdzdcol_dbas_create_valid_vec 29668080 |
60 | oracle:kdzdcol_init_imc_dict 31321560 |
61 | oracle:kdzdcol_reset 35010705 |
62 | oracle:kdzdcol_init 35573295 |
63 | oracle:kdzdcol_get_minval 40057785 |
64 | oracle:kdzdcol_prep 45411450 |
65 | oracle:kdmoDecompGetRate 46526115 |
66 | oracle:kdzdcol_has_nulls 55654635 |
67 | oracle:kdzdcol_get_maxval 59688045 |
68 | oracle:kdmoGetHeaderColCU 64548000 |
69 | oracle:kdmlGetColCUPointer 64854240 |
70 | oracle:kdzdcol_get_vals_unsep_one 76925550 |
71 | libdax.so.1:dax_log_msg 80138355 |
72 | oracle:kdzdcol_skip_slots_unsep 554166045 |
We can also trace DAX usage at the system level, using just DAX module from FBT provider – user stack examination can give us some information about functions which were offloaded. To do that, we can use this DTrace scirpt:
11 | @times[ustack(100,500),probemod,probefunc]=quantize(timestamp-self->t); |
Sample results will look like this:
2 | value ------------- Distribution ------------- count |
4 | 8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1 |
10 | libdax.so.1`dax_read_results+0x264 |
11 | libdax_query.so.1`dax_query_execute+0x160 |
12 | libdax_query.so.1`extract_ozip+0x210 |
13 | libdax_query.so.1`dax_extract+0x1a0 |
14 | libshpkm712.so`kdzk_ozip_decode_dydi+0x8c |
15 | oracle`kdzk_ozip_decode+0x100 |
16 | dax dax_ccb_buffer_get_contig_block |
17 | value ------------- Distribution ------------- count |
19 | 2048 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1 |
24 | libdax.so.1`dax_read_results+0x264 |
25 | libdax_query.so.1`dax_query_execute+0x160 |
26 | libdax_query.so.1`extract_ozip+0x210 |
27 | libdax_query.so.1`dax_extract+0x1a0 |
28 | libshpkm712.so`kdzk_ozip_decode_dydi+0x8c |
29 | oracle`kdzk_ozip_decode+0x100 |
30 | dax dax_validate_ca_dequeue_args |
In the above example we can see that OZIP algorithm was offloaded by DAX coprocessors.
Another way to trace DAX usage is busstat:
1 | root@m7:~/ks# busstat -e dax |
14 | DAX_MSG_membus_PIO_pull |
16 | DAX_MSG_sequencer_busy_cyc |
18 | DAX_STRM_addr_brkpt_hit |
27 | DAX_STRM_SCC0_L3_no_credit_cyc |
28 | DAX_STRM_SCC1_L3_no_credit_cyc |
29 | DAX_STRM_SCC2_L3_no_credit_cyc |
30 | DAX_STRM_SCC3_L3_no_credit_cyc |
31 | DAX_STRM_SCC4_L3_no_credit_cyc |
32 | DAX_STRM_SCC5_L3_no_credit_cyc |
33 | DAX_STRM_SCC6_L3_no_credit_cyc |
34 | DAX_STRM_SCC7_L3_no_credit_cyc |
35 | DAX_STRM_COU0_no_wr_credit_cyc |
36 | DAX_STRM_COU1_no_wr_credit_cyc |
37 | DAX_STRM_COU2_no_wr_credit_cyc |
38 | DAX_STRM_COU3_no_wr_credit_cyc |
39 | DAX_STRM_COU0_no_rd_credit_cyc |
40 | DAX_STRM_COU1_no_rd_credit_cyc |
41 | DAX_STRM_COU2_no_rd_credit_cyc |
42 | DAX_STRM_COU3_no_rd_credit_cyc |
47 | DAX_STRM_L3x_no_credit_cyc |
48 | DAX_STRM_COUx_no_wr_credit_cyc |
49 | DAX_STRM_COUx_no_rd_credit_cyc |
50 | DAX_SCH_pipe_cmd_stall_ops |
51 | DAX_SCH_pipe_cmd_stall_cyc |
55 | DAX_SCH_queryq_full_cyc |
57 | DAX_SCH_msg_cmd_stall_cyc |
59 | DAX_SCH_query_cmd_stall_cyc |
60 | DAX_SCH_query_cmd_sched |
64 | (...SAME OUTPUT FOR PIC1 AND PIC2...) |
Those are physical CPU counters that can be used to determine the performance of DAX pipelines. There is no documentation (yet?) for those counters, but thanks to Martin Mueller, I know the meaning of two of them:
- DAX_SCH_query_cmd_sched (number of query cmds scheduled)
- DAX_SCH_query_cmd_stall_cyc (# of cycles a query cmd had to stall because of lack resources, a high number would indicate DAX overload)
You can monitor them using this command:
1 | root@m7:~# busstat -w dax,pic0=DAX_SCH_query_cmd_sched,pic1=DAX_SCH_query_cmd_stall_cyc 5 |
2 | time dev event0 pic0 event1 pic1 event2 pic2 |
3 | 5 dax0 DAX_SCH_query_cmd_sched 26 DAX_SCH_query_cmd_stall_cyc 0 DAX_all_cyc 8963253855 |
4 | 5 dax1 DAX_SCH_query_cmd_sched 26 DAX_SCH_query_cmd_stall_cyc 0 DAX_all_cyc 8963338617 |
5 | 5 dax2 DAX_SCH_query_cmd_sched 25 DAX_SCH_query_cmd_stall_cyc 0 DAX_all_cyc 8963366741 |
6 | 5 dax3 DAX_SCH_query_cmd_sched 24 DAX_SCH_query_cmd_stall_cyc 0 DAX_all_cyc 8963394841 |
7 | 5 dax4 DAX_SCH_query_cmd_sched 21 DAX_SCH_query_cmd_stall_cyc 0 DAX_all_cyc 8963417412 |
8 | 5 dax5 DAX_SCH_query_cmd_sched 27 DAX_SCH_query_cmd_stall_cyc 0 DAX_all_cyc 8963442430 |
9 | 5 dax6 DAX_SCH_query_cmd_sched 25 DAX_SCH_query_cmd_stall_cyc 0 DAX_all_cyc 8963464834 |
10 | 5 dax7 DAX_SCH_query_cmd_sched 28 DAX_SCH_query_cmd_stall_cyc 0 DAX_all_cyc 8963495126 |
11 | 10 dax0 DAX_SCH_query_cmd_sched 38 DAX_SCH_query_cmd_stall_cyc 0 DAX_all_cyc 8961987804 |
That’s it for now 🙂
P.S.
If you want to learn about dynamic tracing technics for analyzing Oracle behavior – you should read this awesome article by Luca Canali http://db-blog.web.cern.ch/blog/luca-canali/2016-01-linux-perf-probes-oracle-tracing