Brendan Gregg - Sun Microsystems
2008-May-27 18:26 UTC
[dtrace-discuss] MySQL provider plan
G''Day Mikael, How''s the provider going? If you have a plan for probes and arguments, or if you can draw up a plan, can you please post it here? It should also help give context to some of the difficulties you''ve run into. cheers, Brendan -- Brendan [CA, USA]
Hi Brendan I have in this first patch added two new providers: The mysql provider for the MySQL Server and the ndbd provider for the ndbd process (which is the data node in the MySQL Cluster). I have working patches for both of these for the 5.1.24 and 5.1.24-ndb-6.3.14 versions. I also have merged these providers into a 6.0.5 version but there I''m still working out the details to get it working, so far the issues is more about 6.0.5 per se and not so much on DTrace problems. These are just initial probes and so I expect that both myself and others will add parameters to these probes as well as completely new probes. There have been a couple of issues in the DTrace environment and I''ll work directly with the DTrace guys to see what can be done about that, but for these providers that I have added things works just nicely so far with a very small amount of testing. I completed this work a week and a half ago and have since enjoyed travelling and vacation and I''m just getting started again so I will hopefully get the 6.0.5 version working as well this week. Originally the ndbd provider was 2 providers ndb_vm and ndb_blocks but that didn''t work, I also had some issues with a probe in an inline function which was used in long functions so I had to convert that function into a non-inlined function. I got the patches and the trees uploaded to www.iclaustron.com/ downloads_sun.html if any wants to try those out. Rgrds Mikael PS: Providing the ndbd and mysql provider code below with comments. The ndbd provider: -------------------------- provider ndbd { /* Job execution probes -------------------- Popular signals to keep track of: TCKEYREQ (12): Executed once for each primary key read/write in transaction coordinator LQHKEYREQ (316): Executed once for each primary key read/write in local query handler PACKED_SIGNAL (342) : Executes a bunch of COMMIT, COMMITTED, COMPLETE, COMPLETED signals SCAN_TABREQ (32): Start a full table scan or index scan SCAN_FRAGREQ (353): Start a scan/index scan of a partition SCAN_NEXTREQ (28): Execute next batch of records in a scan of a partition NEXT_SCANREQ (332): Execute next record in one batch of a scan of a partition NDB is designed as an engine with two major concepts, blocks and signals. Everything in the database engine is handled through signals received in blocks. Blocks do not share data structures. All communications between blocks happen through signals. These signals can either be asynchronous in which case they are scheduled with priority or sent directly as function calls. The start_execute_signal probe catches start of execution of all asynchronous signals. So this probe is very powerful and can be used to find out what happens in the database engine. The start_direct_signal probe catches start execution of a direct signal. These start and end probes together with the probes that handles sending of the signals makes it possible to trace most of the things happening in the database engine. The additional probes possible to define are found in the ndb_blocks provider. The remainder of the probes in this ndb_vm provider is about monitoring the internal scheduler of the NDB kernel. The gsn numbers are defined in the file GlobalSignalNumbers.h in the directory storage/ndb/include/kernel. A signal called TCKEYREQ is executed by a method called execTCKEYREQ in the code. */ probe start_execute_signal(int gsn, int ptr, void* signal); probe end_execute_signal(); probe start_direct_signal(int gsn, void *signal); probe end_direct_signal(); probe send_signal(int gsn, void *signal); probe send_signal_delayed(int gsn, void *signal); /* Scheduler probes ---------------- There is a set of probes handling the scheduler part of NDB that deals with signal handling locally and distributed, prioritisation, delayed signals and polling for new signals from other nodes. Changes to the scheduler has no impact on what gets executed only how signals are gathered into TCP/IP messages and how many signals are executed in a row and so forth. Thus it can have a major impact on performance but not on functionality. There are also various things done to improve real-time performance in new releases of MySQL Cluster. NDB executes a number of signals in a batch. It executes those messages from the job buffer and the start probe has a parameter that provides the number of signals in the job buffer on priority level B where almost all signals are executed. The end probe provides how many signals have been executed in this batch. Remember that a number of those could have been generated by signals executed in this batch. The end probe also reports the occupancy after finishing, this should normally be 0 but can be higher when many scan operations are operating and when the system is very heavy loaded. */ probe start_do_job(int occupancy); probe end_do_job(int num_signals, int occupancy); /* Every now and then we check for new API or data nodes connecting. This probe is activated before those are checked. */ probe update_connections(); /* These probes are activated before and after sending all distributed signals generated by executing a batch of signals. */ probe start_send(); probe end_send(); /* These probes are activated before and after poll on all sockets are performed. The start probe have a timeout of either 0 or 10 milliseconds. The end probe reports 0 if no socket had any data and 1 otherwise. */ probe start_poll(int timeout); probe end_poll(int); /* These probes are activated before and after receiving data on the sockets that reported data available in the poll call. */ probe start_receive(); probe end_receive(); /* These probes are activated before and after checking the time queue for delayed signals. */ probe start_scan_time_queue(); probe end_scan_time_queue(); /* DBTC probes ----------- The most interesting signals to trace in DBTC is the TCKEYREQ, TCINDXREQ and SCAN_TABREQ. In addition all the phases of transaction handling is interesting. These signals are sent piggybacked on a signal PACKED_SIGNAL and thus we made specific probes for each of those "signals" since those are handled internally in the DBTC block. */ probe start_committed(void* signal); probe end_committed(); probe start_completed(void* signal); probe end_completed(); probe start_tc_lqhkeyconf(void* signal); probe end_tc_lqhkeyconf(); /* DBLQH probes ------------ The most interesting signals to trace in DBLQH is LQHKEYREQ and SCAN_FRAGREQ. In addition the commit signals are handled by PACKED_SIGNAL in the same way as in DBTC and have their own probes. */ probe start_commit(void* signal); probe end_commit(); probe start_complete(void* signal); probe end_complete(); probe start_lqh_lqhkeyconf(void* signal); probe end_lqh_lqhkeyconf(); probe start_remove_marker_ord(void* signal); probe end_remove_marker_ord(); /* DBACC probes ------------ The most interesting signal to DBACC is ACCKEYREQ which is a direct signal. However when a transaction gets into a lock situation is such an interesting event to trace from a users perspective that we put in a special probe for lock and unlock. The third parameter in start_acckeyreq_lockwait tells us whether this is a scan being locked (otherwise a primary key). The second parameter in the end_acckeyreq_lockwait indicates whether the tuple was gone while waiting for the lock and the third indicates if it was a scan operation or not. */ probe start_acckeyreq_lockwait(void* op_rec, int op_ptr, int scan_flag); probe end_acckeyreq_lockwait(int op_ptr, int tuple_gone, int scan_flag); /* DBDIH probes ------------ The most interesting events in DBDIH is DIGETNODESREQ and DIVERIFYREQ. There is also numerous interesting things connected to checkpoints here. One interesting thing to measure in DBDIH is the time that a transaction can be blocked waiting for a global checkpoint to be completed after it started. This can be done by triggering the direct signal DIVERIFYREQ and at the return check if theData[0] == 1 in which case the wait is happening. theData[0] in DIVERIFYREQ is the reference of the transaction. The wait ends when DIVERIFYCONF is sent and also here theData[0] is the reference to the transaction. The signal number of DIVERIFYREQ is 241 and of DIVERIFYCONF is 239. All this can be handled with the ndb_vm probes so there are no specific DBDIH probes. */ /* DBTUP probes ------------ The most crucial signal to DBTUP is TUPKEYREQ where the actual read and write of database data happens. In addition the TUP_COMMITREQ is interesting and since this happens also as a normal function call we added special probes for this. */ probe start_tup_commit(int op_ptr); probe end_tup_commit(int type_of_return); }; The mysql provider: --------------------------- provider mysql { /* These set of probes are used when performing write operations towards any handler. */ probe insert_row_start(); probe insert_row_end(); probe update_row_start(); probe update_row_end(); probe delete_row_start(); probe delete_row_end(); /* This probe is activated when calling external_lock for any handler */ probe external_lock(int); /* These probes are activated when a filesort activity happens in a query. */ probe filesort_start(); probe filesort_end(); /* The query types SELECT, INSERT, INSERT AS SELECT, UPDATE, UPDATE with multiple tables, DELETE, DELETE with multiple tables are all probed. The start probe always contains the query text. */ probe select_start(char *query); probe select_end(); probe insert_start(char *query); probe insert_end(); probe insert_select_start(char *query); probe insert_select_end(); probe update_start(char *query); probe update_end(); probe multi_update_start(char *query); probe multi_update_end(); probe delete_start(char *query); probe delete_end(); probe multi_delete_start(char *query); probe multi_delete_end(); /* These probes are used to discover when there are query cache hits and misses. */ probe query_cache_hit(char *query); probe query_cache_miss(char *query); /* This probe is activated when a query is executed. This means also all independent queries of a stored procedure and prepared statements. Also the stored procedure itself is a query. This also has the query string as a parameter. */ probe query_execute_start(char *query); probe query_execute_end(); /* These probes are used to measure the time waiting for network traffic. */ probe net_read_start(); probe net_read_end(); }; 27 maj 2008 kl. 20.26 skrev Brendan Gregg - Sun Microsystems:> G''Day Mikael, > > How''s the provider going? If you have a plan for probes and > arguments, > or if you can draw up a plan, can you please post it here? It should > also help give context to some of the difficulties you''ve run into. > > cheers, > > Brendan > > -- > Brendan > [CA, USA]
Brendan Gregg - Sun Microsystems
2008-May-28 05:15 UTC
[dtrace-discuss] MySQL provider plan
G''Day Mikael, Thanks for the details - this is looking like an awsome set of probes - incredible visibility for MySQL internals! Here are some suggestions: On Tue, May 27, 2008 at 08:55:23PM +0200, Mikael Ronstr?m wrote:> Hi Brendan > I have in this first patch added two new providers:[...]> The ndbd provider: > -------------------------- > provider ndbd { > /* > Job execution probes[...]> */ > probe start_execute_signal(int gsn, int ptr, void* signal); > probe end_execute_signal(); > probe start_direct_signal(int gsn, void *signal); > probe end_direct_signal(); > probe send_signal(int gsn, void *signal); > probe send_signal_delayed(int gsn, void *signal);The convention is to end with ''-start'' and ''-end'', which also helps to visually group these. The convention is also to use dashes, not underscores (which is achived using "__" in the source). We also try to group components from most common to least common element. So the probe names should be: signal-execute-start signal-execute-end signal-direct-start signal-direct-end signal-send signal-send-delayed About the arguments - is there any other context that can be supplied? (I realise that these are asynchronous.) I''m thinking of database name, etc. Can the signals error in any way? If so, this should be provided as an argument in the end probe.> > /* > Scheduler probes[...]> */ > probe start_do_job(int occupancy); > probe end_do_job(int num_signals, int occupancy); > probe update_connections(); > probe start_send(); > probe end_send(); > probe start_poll(int timeout); > probe end_poll(int); > probe start_receive(); > probe end_receive(); > probe start_scan_time_queue(); > probe end_scan_time_queue();Probe names should be shuffled as before. There aren''t many args here - can we provide context args? Same suggestions for the other DBTC ... probes. [...]> The mysql provider: > --------------------------- > provider mysql { > /* > These set of probes are used when performing write operations towards > any handler. > */ > probe insert_row_start(); > probe insert_row_end(); > probe update_row_start(); > probe update_row_end(); > probe delete_row_start(); > probe delete_row_end(); > /* This probe is activated when calling external_lock for any handler */ > probe external_lock(int); > /* > These probes are activated when a filesort activity happens in a > query. > */ > probe filesort_start(); > probe filesort_end(); > /* > The query types SELECT, INSERT, INSERT AS SELECT, UPDATE, UPDATE with > multiple tables, DELETE, DELETE with multiple tables are all probed. > The start probe always contains the query text. > */ > probe select_start(char *query); > probe select_end(); > probe insert_start(char *query); > probe insert_end(); > probe insert_select_start(char *query); > probe insert_select_end(); > probe update_start(char *query); > probe update_end(); > probe multi_update_start(char *query); > probe multi_update_end(); > probe delete_start(char *query); > probe delete_end(); > probe multi_delete_start(char *query); > probe multi_delete_end(); > /* > These probes are used to discover when there are query cache hits and > misses. > */ > probe query_cache_hit(char *query); > probe query_cache_miss(char *query); > /* > This probe is activated when a query is executed. This means also all > independent queries of a stored procedure and prepared statements. > Also > the stored procedure itself is a query. This also has the query > string > as a parameter. > */ > probe query_execute_start(char *query); > probe query_execute_end(); > /* These probes are used to measure the time waiting for network > traffic. */ > probe net_read_start(); > probe net_read_end(); > };This is a great collection of probes (and start/end are around the right way for these, as are some of the word component ordering :). How about a consistent set of args like: (void *)thr THR Class pointer (for raw debugging) (char *)database Database name string (char *)user Username string (char *)host Hostname or IP address string (char *)query SQL Query string (int)result For -end probes (0 == success, -1 == failure) It will be great to easily trace events by database, user, host, as well as by query. cheers, Brendan -- Brendan [CA, USA]
Hi Mikael,> The mysql provider: > --------------------------- > provider mysql { > /* > These set of probes are used when performing write operations > towards > any handler. > */ > probe insert_row_start(); > probe insert_row_end(); > probe update_row_start(); > probe update_row_end(); > probe delete_row_start(); > probe delete_row_end(); > /* This probe is activated when calling external_lock for any > handler */ > probe external_lock(int);I think it would be useful here if, at the very least, the ID of the thread that triggered these operations was added into the probe. Otherwise it will be impossible to determine which operation or query triggers these functions.> > /* > These probes are activated when a filesort activity happens in a > query. > */ > probe filesort_start(); > probe filesort_end();Which query? How would you identify which query had triggered the filesort?> > /* > The query types SELECT, INSERT, INSERT AS SELECT, UPDATE, UPDATE > with > multiple tables, DELETE, DELETE with multiple tables are all probed. > The start probe always contains the query text. > */ > probe select_start(char *query); > probe select_end(); > probe insert_start(char *query); > probe insert_end(); > probe insert_select_start(char *query); > probe insert_select_end(); > probe update_start(char *query); > probe update_end(); > probe multi_update_start(char *query); > probe multi_update_end(); > probe delete_start(char *query); > probe delete_end(); > probe multi_delete_start(char *query); > probe multi_delete_end();I''m not sure the query alone will be enough for a user to identify the operation as it goes through the system. The same query may be raised by multiple clients simultaneous; how would you identify those queries that are causing problems and being blocked as opposed to those that aren''t? Those using DTrace on a busy server with multiple connections will need to be able to filter out the query and connections they do/dont want to a much finer level. For the probes I created and pushed through to Brian I included the thread ID, query, client user, client hostname and database. For select/insert/delete I included the number of rows inserted/ deleted. For Update, I included the number of rows matched and the number of rows actually changed. I think this should be a minimum and provides DTrace users with the best range of information for them to get some useful data from the probes. That enables you to generate useful statements like this one: Thread Time ms user at host DB Rows/f:u Query select 3 0 root at localhost test 0:0 update t1 set c1=9 where c1=10 select 3 0 root at localhost test 0:0 update t1 set c1=9 where c1=5 select 3 2870 root at localhost test 16384:16384 update t1 set c1=9 where c1=6 select 4 0 root at localhost test 3 show databases select 4 4 root at localhost test 6 show tables select 4 2 root at localhost test 1 select count(*) from t1 where c1=6 Can we have this original information added into your probes? MC -- Martin ''MC'' Brown, mc at mcslp.com Everything MCslp: http://planet.mcslp.com
Hi Brendan, Comments below. 28 maj 2008 kl. 07.15 skrev Brendan Gregg - Sun Microsystems:> G''Day Mikael, > > Thanks for the details - this is looking like an awsome set of > probes - > incredible visibility for MySQL internals! > > Here are some suggestions: > > On Tue, May 27, 2008 at 08:55:23PM +0200, Mikael Ronstr?m wrote: >> Hi Brendan >> I have in this first patch added two new providers: > [...] >> The ndbd provider: >> -------------------------- >> provider ndbd { >> /* >> Job execution probes > [...] >> */ >> probe start_execute_signal(int gsn, int ptr, void* signal); >> probe end_execute_signal(); >> probe start_direct_signal(int gsn, void *signal); >> probe end_direct_signal(); >> probe send_signal(int gsn, void *signal); >> probe send_signal_delayed(int gsn, void *signal); > > The convention is to end with ''-start'' and ''-end'', which also helps to > visually group these. The convention is also to use dashes, not > underscores > (which is achived using "__" in the source). We also try to group > components > from most common to least common element. So the probe names > should be: > > signal-execute-start > signal-execute-end > signal-direct-start > signal-direct-end > signal-send > signal-send-delayed > > About the arguments - is there any other context that can be supplied? > (I realise that these are asynchronous.) I''m thinking of database > name, etc. >All context can be found from the void* signal, this is essentially an array of upto 25 uint32 words. Signal is a message and the signal object is the message container. Most signals are fixed size and a number of uint32''s simply.> Can the signals error in any way? If so, this should be provided > as an > argument in the end probe. >No a signal can''t error, errors in a signal are sent as a new signal, signals can also assert. I''ve changed to use - instead of _ in all places.>> >> /* >> Scheduler probes > [...] >> */ >> probe start_do_job(int occupancy); >> probe end_do_job(int num_signals, int occupancy); >> probe update_connections();Added int num_new_connections, int num_dropped_connections>> probe start_send(); >> probe end_send();Added int num_sends, int num_bytes_sent>> probe start_poll(int timeout); >> probe end_poll(int);This parameter is a boolean if any data is to be received.>> probe start_receive(); >> probe end_receive();Added int num_receives, int num_bytes_received num_receives and num_sends means how many sockets have sent/received, the number of bytes is the total over all sockets sent/received.>> probe start_scan_time_queue(); >> probe end_scan_time_queue(); > > Probe names should be shuffled as before. There aren''t many args > here - > can we provide context args? Same suggestions for the other > DBTC ... probes. >> [...] >> The mysql provider: >> --------------------------- >> provider mysql { >> /* >> These set of probes are used when performing write operations >> towards >> any handler. >> */ >> probe insert_row_start(); >> probe insert_row_end(); >> probe update_row_start(); >> probe update_row_end(); >> probe delete_row_start(); >> probe delete_row_end(); >> /* This probe is activated when calling external_lock for any >> handler */ >> probe external_lock(int); >> /* >> These probes are activated when a filesort activity happens in a >> query. >> */ >> probe filesort_start(); >> probe filesort_end(); >> /* >> The query types SELECT, INSERT, INSERT AS SELECT, UPDATE, UPDATE >> with >> multiple tables, DELETE, DELETE with multiple tables are all >> probed. >> The start probe always contains the query text. >> */ >> probe select_start(char *query); >> probe select_end(); >> probe insert_start(char *query); >> probe insert_end(); >> probe insert_select_start(char *query); >> probe insert_select_end(); >> probe update_start(char *query); >> probe update_end(); >> probe multi_update_start(char *query); >> probe multi_update_end(); >> probe delete_start(char *query); >> probe delete_end(); >> probe multi_delete_start(char *query); >> probe multi_delete_end(); >> /* >> These probes are used to discover when there are query cache >> hits and >> misses. >> */ >> probe query_cache_hit(char *query); >> probe query_cache_miss(char *query); >> /* >> This probe is activated when a query is executed. This means >> also all >> independent queries of a stored procedure and prepared statements. >> Also >> the stored procedure itself is a query. This also has the query >> string >> as a parameter. >> */ >> probe query_execute_start(char *query); >> probe query_execute_end(); >> /* These probes are used to measure the time waiting for network >> traffic. */ >> probe net_read_start(); >> probe net_read_end(); >> }; > > This is a great collection of probes (and start/end are around the > right > way for these, as are some of the word component ordering :). How > about a > consistent set of args like: > > (void *)thr THR Class pointer (for raw debugging) > (char *)database Database name string > (char *)user Username string > (char *)host Hostname or IP address string > (char *)query SQL Query string > (int)result For -end probes (0 == success, -1 > == failure) > > It will be great to easily trace events by database, user, host, as > well as > by query. >I added the following parameters to the probe query__start(void *thd, char *db_name, char *user, char *hostname, char *query, int exec_type) where exec_type is 0 for top-level query, 1 for Prepared statement, 2 for cursor statement, 3 for query in stored procedure. NOTE: I''ve changed the name from query_execute_start to query__start I added int result to query__end probe. Since query_start is called before any other probe, the scripts can gather the parameters from this probe and thus I don''t have to add this long list of parameters in multiple places. The query__start and query__end should be aligned with Postgres and other DBMS DTrace probes hopefully. The thd void* in PostgresSQL can either be ignored or replaced by some Postgres internal data structure. At some point in time I''ll also add transaction-start and transaction- end probes. This is still just a first step, so I hope to add more DTrace probes later, however there are a couple of DTrace problems that have to be dealt with first. Will publish patches and trees for 5.1, MySQL CGE 6.3 and 6.0 as soon as they''re tested and ready. Rgrds Mikael> cheers, > > Brendan > > -- > Brendan > [CA, USA]