Brendan Gregg - Sun Microsystems
2008-May-27 18:26 UTC
[dtrace-discuss] MySQL provider plan
G''Day Mikael, How''s the provider going? If you have a plan for probes and arguments, or if you can draw up a plan, can you please post it here? It should also help give context to some of the difficulties you''ve run into. cheers, Brendan -- Brendan [CA, USA]
Hi Brendan
I have in this first patch added two new providers:
The mysql provider for the MySQL Server and the ndbd provider for the
ndbd
process (which is the data node in the MySQL Cluster). I have working
patches
for both of these for the 5.1.24 and 5.1.24-ndb-6.3.14 versions. I
also have merged
these providers into a 6.0.5 version but there I''m still working out
the details to get
it working, so far the issues is more about 6.0.5 per se and not so
much on DTrace
problems.
These are just initial probes and so I expect that both myself and
others will add
parameters to these probes as well as completely new probes.
There have been a couple of issues in the DTrace environment and I''ll
work directly
with the DTrace guys to see what can be done about that, but for
these providers that
I have added things works just nicely so far with a very small amount
of testing.
I completed this work a week and a half ago and have since enjoyed
travelling and
vacation and I''m just getting started again so I will hopefully get
the 6.0.5 version
working as well this week.
Originally the ndbd provider was 2 providers ndb_vm and ndb_blocks
but that didn''t
work, I also had some issues with a probe in an inline function which
was used in
long functions so I had to convert that function into a non-inlined
function.
I got the patches and the trees uploaded to www.iclaustron.com/
downloads_sun.html
if any wants to try those out.
Rgrds Mikael
PS: Providing the ndbd and mysql provider code below with comments.
The ndbd provider:
--------------------------
provider ndbd {
/*
Job execution probes
--------------------
Popular signals to keep track of:
TCKEYREQ (12):
Executed once for each primary key read/write in transaction
coordinator
LQHKEYREQ (316):
Executed once for each primary key read/write in local query
handler
PACKED_SIGNAL (342) :
Executes a bunch of COMMIT, COMMITTED, COMPLETE, COMPLETED signals
SCAN_TABREQ (32):
Start a full table scan or index scan
SCAN_FRAGREQ (353):
Start a scan/index scan of a partition
SCAN_NEXTREQ (28):
Execute next batch of records in a scan of a partition
NEXT_SCANREQ (332):
Execute next record in one batch of a scan of a partition
NDB is designed as an engine with two major concepts, blocks and
signals. Everything in the database engine is handled through
signals received in blocks. Blocks do not share data structures.
All communications between blocks happen through signals. These
signals can either be asynchronous in which case they are scheduled
with priority or sent directly as function calls.
The start_execute_signal probe catches start of execution of all
asynchronous signals. So this probe is very powerful and can be used
to find out what happens in the database engine.
The start_direct_signal probe catches start execution of a direct
signal.
These start and end probes together with the probes that handles
sending of the signals makes it possible to trace most of the
things happening in the database engine.
The additional probes possible to define are found in the ndb_blocks
provider. The remainder of the probes in this ndb_vm provider is
about monitoring the internal scheduler of the NDB kernel.
The gsn numbers are defined in the file GlobalSignalNumbers.h in the
directory storage/ndb/include/kernel. A signal called TCKEYREQ is
executed by a method called execTCKEYREQ in the code.
*/
probe start_execute_signal(int gsn, int ptr, void* signal);
probe end_execute_signal();
probe start_direct_signal(int gsn, void *signal);
probe end_direct_signal();
probe send_signal(int gsn, void *signal);
probe send_signal_delayed(int gsn, void *signal);
/*
Scheduler probes
----------------
There is a set of probes handling the scheduler part of NDB that
deals with signal handling locally and distributed, prioritisation,
delayed signals and polling for new signals from other nodes. Changes
to the scheduler has no impact on what gets executed only how
signals are gathered into TCP/IP messages and how many signals are
executed in a row and so forth. Thus it can have a major impact on
performance but not on functionality. There are also various things
done to improve real-time performance in new releases of MySQL
Cluster.
NDB executes a number of signals in a batch. It executes those
messages from the job buffer and the start probe has a parameter
that provides the number of signals in the job buffer on priority
level B where almost all signals are executed.
The end probe provides how many signals have been executed in this
batch. Remember that a number of those could have been generated
by signals executed in this batch. The end probe also reports the
occupancy after finishing, this should normally be 0 but can be
higher when many scan operations are operating and when the system
is very heavy loaded.
*/
probe start_do_job(int occupancy);
probe end_do_job(int num_signals, int occupancy);
/*
Every now and then we check for new API or data nodes connecting.
This probe is activated before those are checked.
*/
probe update_connections();
/*
These probes are activated before and after sending all distributed
signals generated by executing a batch of signals.
*/
probe start_send();
probe end_send();
/*
These probes are activated before and after poll on all sockets are
performed. The start probe have a timeout of either 0 or 10
milliseconds. The end probe reports 0 if no socket had any data and
1 otherwise.
*/
probe start_poll(int timeout);
probe end_poll(int);
/*
These probes are activated before and after receiving data on the
sockets
that reported data available in the poll call.
*/
probe start_receive();
probe end_receive();
/*
These probes are activated before and after checking the time queue
for delayed signals.
*/
probe start_scan_time_queue();
probe end_scan_time_queue();
/*
DBTC probes
-----------
The most interesting signals to trace in DBTC is the TCKEYREQ,
TCINDXREQ and SCAN_TABREQ. In addition all the phases of transaction
handling is interesting. These signals are sent piggybacked on a
signal PACKED_SIGNAL and thus we made specific probes for each of
those "signals" since those are handled internally in the DBTC
block.
*/
probe start_committed(void* signal);
probe end_committed();
probe start_completed(void* signal);
probe end_completed();
probe start_tc_lqhkeyconf(void* signal);
probe end_tc_lqhkeyconf();
/*
DBLQH probes
------------
The most interesting signals to trace in DBLQH is LQHKEYREQ and
SCAN_FRAGREQ. In addition the commit signals are handled by
PACKED_SIGNAL in the same way as in DBTC and have their own
probes.
*/
probe start_commit(void* signal);
probe end_commit();
probe start_complete(void* signal);
probe end_complete();
probe start_lqh_lqhkeyconf(void* signal);
probe end_lqh_lqhkeyconf();
probe start_remove_marker_ord(void* signal);
probe end_remove_marker_ord();
/*
DBACC probes
------------
The most interesting signal to DBACC is ACCKEYREQ which is a direct
signal. However when a transaction gets into a lock situation is such
an interesting event to trace from a users perspective that we put in
a special probe for lock and unlock.
The third parameter in start_acckeyreq_lockwait tells us whether this
is a scan being locked (otherwise a primary key).
The second parameter in the end_acckeyreq_lockwait indicates whether
the tuple was gone while waiting for the lock and the third indicates
if it was a scan operation or not.
*/
probe start_acckeyreq_lockwait(void* op_rec, int op_ptr, int scan_flag);
probe end_acckeyreq_lockwait(int op_ptr, int tuple_gone, int scan_flag);
/*
DBDIH probes
------------
The most interesting events in DBDIH is DIGETNODESREQ and
DIVERIFYREQ.
There is also numerous interesting things connected to checkpoints
here.
One interesting thing to measure in DBDIH is the time that a
transaction
can be blocked waiting for a global checkpoint to be completed after
it started. This can be done by triggering the direct signal
DIVERIFYREQ and at the return check if theData[0] == 1 in which case
the wait is happening. theData[0] in DIVERIFYREQ is the reference
of the
transaction.
The wait ends when DIVERIFYCONF is sent and also here theData[0]
is the
reference to the transaction.
The signal number of DIVERIFYREQ is 241 and of DIVERIFYCONF is 239.
All this can be handled with the ndb_vm probes so there are no
specific
DBDIH probes.
*/
/*
DBTUP probes
------------
The most crucial signal to DBTUP is TUPKEYREQ where the actual
read and
write of database data happens. In addition the TUP_COMMITREQ is
interesting and since this happens also as a normal function call we
added special probes for this.
*/
probe start_tup_commit(int op_ptr);
probe end_tup_commit(int type_of_return);
};
The mysql provider:
---------------------------
provider mysql {
/*
These set of probes are used when performing write operations towards
any handler.
*/
probe insert_row_start();
probe insert_row_end();
probe update_row_start();
probe update_row_end();
probe delete_row_start();
probe delete_row_end();
/* This probe is activated when calling external_lock for any handler */
probe external_lock(int);
/*
These probes are activated when a filesort activity happens in a
query.
*/
probe filesort_start();
probe filesort_end();
/*
The query types SELECT, INSERT, INSERT AS SELECT, UPDATE, UPDATE with
multiple tables, DELETE, DELETE with multiple tables are all probed.
The start probe always contains the query text.
*/
probe select_start(char *query);
probe select_end();
probe insert_start(char *query);
probe insert_end();
probe insert_select_start(char *query);
probe insert_select_end();
probe update_start(char *query);
probe update_end();
probe multi_update_start(char *query);
probe multi_update_end();
probe delete_start(char *query);
probe delete_end();
probe multi_delete_start(char *query);
probe multi_delete_end();
/*
These probes are used to discover when there are query cache hits and
misses.
*/
probe query_cache_hit(char *query);
probe query_cache_miss(char *query);
/*
This probe is activated when a query is executed. This means also all
independent queries of a stored procedure and prepared statements.
Also
the stored procedure itself is a query. This also has the query
string
as a parameter.
*/
probe query_execute_start(char *query);
probe query_execute_end();
/* These probes are used to measure the time waiting for network
traffic. */
probe net_read_start();
probe net_read_end();
};
27 maj 2008 kl. 20.26 skrev Brendan Gregg - Sun Microsystems:
> G''Day Mikael,
>
> How''s the provider going? If you have a plan for probes and
> arguments,
> or if you can draw up a plan, can you please post it here? It should
> also help give context to some of the difficulties you''ve run
into.
>
> cheers,
>
> Brendan
>
> --
> Brendan
> [CA, USA]
Brendan Gregg - Sun Microsystems
2008-May-28 05:15 UTC
[dtrace-discuss] MySQL provider plan
G''Day Mikael, Thanks for the details - this is looking like an awsome set of probes - incredible visibility for MySQL internals! Here are some suggestions: On Tue, May 27, 2008 at 08:55:23PM +0200, Mikael Ronstr?m wrote:> Hi Brendan > I have in this first patch added two new providers:[...]> The ndbd provider: > -------------------------- > provider ndbd { > /* > Job execution probes[...]> */ > probe start_execute_signal(int gsn, int ptr, void* signal); > probe end_execute_signal(); > probe start_direct_signal(int gsn, void *signal); > probe end_direct_signal(); > probe send_signal(int gsn, void *signal); > probe send_signal_delayed(int gsn, void *signal);The convention is to end with ''-start'' and ''-end'', which also helps to visually group these. The convention is also to use dashes, not underscores (which is achived using "__" in the source). We also try to group components from most common to least common element. So the probe names should be: signal-execute-start signal-execute-end signal-direct-start signal-direct-end signal-send signal-send-delayed About the arguments - is there any other context that can be supplied? (I realise that these are asynchronous.) I''m thinking of database name, etc. Can the signals error in any way? If so, this should be provided as an argument in the end probe.> > /* > Scheduler probes[...]> */ > probe start_do_job(int occupancy); > probe end_do_job(int num_signals, int occupancy); > probe update_connections(); > probe start_send(); > probe end_send(); > probe start_poll(int timeout); > probe end_poll(int); > probe start_receive(); > probe end_receive(); > probe start_scan_time_queue(); > probe end_scan_time_queue();Probe names should be shuffled as before. There aren''t many args here - can we provide context args? Same suggestions for the other DBTC ... probes. [...]> The mysql provider: > --------------------------- > provider mysql { > /* > These set of probes are used when performing write operations towards > any handler. > */ > probe insert_row_start(); > probe insert_row_end(); > probe update_row_start(); > probe update_row_end(); > probe delete_row_start(); > probe delete_row_end(); > /* This probe is activated when calling external_lock for any handler */ > probe external_lock(int); > /* > These probes are activated when a filesort activity happens in a > query. > */ > probe filesort_start(); > probe filesort_end(); > /* > The query types SELECT, INSERT, INSERT AS SELECT, UPDATE, UPDATE with > multiple tables, DELETE, DELETE with multiple tables are all probed. > The start probe always contains the query text. > */ > probe select_start(char *query); > probe select_end(); > probe insert_start(char *query); > probe insert_end(); > probe insert_select_start(char *query); > probe insert_select_end(); > probe update_start(char *query); > probe update_end(); > probe multi_update_start(char *query); > probe multi_update_end(); > probe delete_start(char *query); > probe delete_end(); > probe multi_delete_start(char *query); > probe multi_delete_end(); > /* > These probes are used to discover when there are query cache hits and > misses. > */ > probe query_cache_hit(char *query); > probe query_cache_miss(char *query); > /* > This probe is activated when a query is executed. This means also all > independent queries of a stored procedure and prepared statements. > Also > the stored procedure itself is a query. This also has the query > string > as a parameter. > */ > probe query_execute_start(char *query); > probe query_execute_end(); > /* These probes are used to measure the time waiting for network > traffic. */ > probe net_read_start(); > probe net_read_end(); > };This is a great collection of probes (and start/end are around the right way for these, as are some of the word component ordering :). How about a consistent set of args like: (void *)thr THR Class pointer (for raw debugging) (char *)database Database name string (char *)user Username string (char *)host Hostname or IP address string (char *)query SQL Query string (int)result For -end probes (0 == success, -1 == failure) It will be great to easily trace events by database, user, host, as well as by query. cheers, Brendan -- Brendan [CA, USA]
Hi Mikael,> The mysql provider: > --------------------------- > provider mysql { > /* > These set of probes are used when performing write operations > towards > any handler. > */ > probe insert_row_start(); > probe insert_row_end(); > probe update_row_start(); > probe update_row_end(); > probe delete_row_start(); > probe delete_row_end(); > /* This probe is activated when calling external_lock for any > handler */ > probe external_lock(int);I think it would be useful here if, at the very least, the ID of the thread that triggered these operations was added into the probe. Otherwise it will be impossible to determine which operation or query triggers these functions.> > /* > These probes are activated when a filesort activity happens in a > query. > */ > probe filesort_start(); > probe filesort_end();Which query? How would you identify which query had triggered the filesort?> > /* > The query types SELECT, INSERT, INSERT AS SELECT, UPDATE, UPDATE > with > multiple tables, DELETE, DELETE with multiple tables are all probed. > The start probe always contains the query text. > */ > probe select_start(char *query); > probe select_end(); > probe insert_start(char *query); > probe insert_end(); > probe insert_select_start(char *query); > probe insert_select_end(); > probe update_start(char *query); > probe update_end(); > probe multi_update_start(char *query); > probe multi_update_end(); > probe delete_start(char *query); > probe delete_end(); > probe multi_delete_start(char *query); > probe multi_delete_end();I''m not sure the query alone will be enough for a user to identify the operation as it goes through the system. The same query may be raised by multiple clients simultaneous; how would you identify those queries that are causing problems and being blocked as opposed to those that aren''t? Those using DTrace on a busy server with multiple connections will need to be able to filter out the query and connections they do/dont want to a much finer level. For the probes I created and pushed through to Brian I included the thread ID, query, client user, client hostname and database. For select/insert/delete I included the number of rows inserted/ deleted. For Update, I included the number of rows matched and the number of rows actually changed. I think this should be a minimum and provides DTrace users with the best range of information for them to get some useful data from the probes. That enables you to generate useful statements like this one: Thread Time ms user at host DB Rows/f:u Query select 3 0 root at localhost test 0:0 update t1 set c1=9 where c1=10 select 3 0 root at localhost test 0:0 update t1 set c1=9 where c1=5 select 3 2870 root at localhost test 16384:16384 update t1 set c1=9 where c1=6 select 4 0 root at localhost test 3 show databases select 4 4 root at localhost test 6 show tables select 4 2 root at localhost test 1 select count(*) from t1 where c1=6 Can we have this original information added into your probes? MC -- Martin ''MC'' Brown, mc at mcslp.com Everything MCslp: http://planet.mcslp.com
Hi Brendan, Comments below. 28 maj 2008 kl. 07.15 skrev Brendan Gregg - Sun Microsystems:> G''Day Mikael, > > Thanks for the details - this is looking like an awsome set of > probes - > incredible visibility for MySQL internals! > > Here are some suggestions: > > On Tue, May 27, 2008 at 08:55:23PM +0200, Mikael Ronstr?m wrote: >> Hi Brendan >> I have in this first patch added two new providers: > [...] >> The ndbd provider: >> -------------------------- >> provider ndbd { >> /* >> Job execution probes > [...] >> */ >> probe start_execute_signal(int gsn, int ptr, void* signal); >> probe end_execute_signal(); >> probe start_direct_signal(int gsn, void *signal); >> probe end_direct_signal(); >> probe send_signal(int gsn, void *signal); >> probe send_signal_delayed(int gsn, void *signal); > > The convention is to end with ''-start'' and ''-end'', which also helps to > visually group these. The convention is also to use dashes, not > underscores > (which is achived using "__" in the source). We also try to group > components > from most common to least common element. So the probe names > should be: > > signal-execute-start > signal-execute-end > signal-direct-start > signal-direct-end > signal-send > signal-send-delayed > > About the arguments - is there any other context that can be supplied? > (I realise that these are asynchronous.) I''m thinking of database > name, etc. >All context can be found from the void* signal, this is essentially an array of upto 25 uint32 words. Signal is a message and the signal object is the message container. Most signals are fixed size and a number of uint32''s simply.> Can the signals error in any way? If so, this should be provided > as an > argument in the end probe. >No a signal can''t error, errors in a signal are sent as a new signal, signals can also assert. I''ve changed to use - instead of _ in all places.>> >> /* >> Scheduler probes > [...] >> */ >> probe start_do_job(int occupancy); >> probe end_do_job(int num_signals, int occupancy); >> probe update_connections();Added int num_new_connections, int num_dropped_connections>> probe start_send(); >> probe end_send();Added int num_sends, int num_bytes_sent>> probe start_poll(int timeout); >> probe end_poll(int);This parameter is a boolean if any data is to be received.>> probe start_receive(); >> probe end_receive();Added int num_receives, int num_bytes_received num_receives and num_sends means how many sockets have sent/received, the number of bytes is the total over all sockets sent/received.>> probe start_scan_time_queue(); >> probe end_scan_time_queue(); > > Probe names should be shuffled as before. There aren''t many args > here - > can we provide context args? Same suggestions for the other > DBTC ... probes. >> [...] >> The mysql provider: >> --------------------------- >> provider mysql { >> /* >> These set of probes are used when performing write operations >> towards >> any handler. >> */ >> probe insert_row_start(); >> probe insert_row_end(); >> probe update_row_start(); >> probe update_row_end(); >> probe delete_row_start(); >> probe delete_row_end(); >> /* This probe is activated when calling external_lock for any >> handler */ >> probe external_lock(int); >> /* >> These probes are activated when a filesort activity happens in a >> query. >> */ >> probe filesort_start(); >> probe filesort_end(); >> /* >> The query types SELECT, INSERT, INSERT AS SELECT, UPDATE, UPDATE >> with >> multiple tables, DELETE, DELETE with multiple tables are all >> probed. >> The start probe always contains the query text. >> */ >> probe select_start(char *query); >> probe select_end(); >> probe insert_start(char *query); >> probe insert_end(); >> probe insert_select_start(char *query); >> probe insert_select_end(); >> probe update_start(char *query); >> probe update_end(); >> probe multi_update_start(char *query); >> probe multi_update_end(); >> probe delete_start(char *query); >> probe delete_end(); >> probe multi_delete_start(char *query); >> probe multi_delete_end(); >> /* >> These probes are used to discover when there are query cache >> hits and >> misses. >> */ >> probe query_cache_hit(char *query); >> probe query_cache_miss(char *query); >> /* >> This probe is activated when a query is executed. This means >> also all >> independent queries of a stored procedure and prepared statements. >> Also >> the stored procedure itself is a query. This also has the query >> string >> as a parameter. >> */ >> probe query_execute_start(char *query); >> probe query_execute_end(); >> /* These probes are used to measure the time waiting for network >> traffic. */ >> probe net_read_start(); >> probe net_read_end(); >> }; > > This is a great collection of probes (and start/end are around the > right > way for these, as are some of the word component ordering :). How > about a > consistent set of args like: > > (void *)thr THR Class pointer (for raw debugging) > (char *)database Database name string > (char *)user Username string > (char *)host Hostname or IP address string > (char *)query SQL Query string > (int)result For -end probes (0 == success, -1 > == failure) > > It will be great to easily trace events by database, user, host, as > well as > by query. >I added the following parameters to the probe query__start(void *thd, char *db_name, char *user, char *hostname, char *query, int exec_type) where exec_type is 0 for top-level query, 1 for Prepared statement, 2 for cursor statement, 3 for query in stored procedure. NOTE: I''ve changed the name from query_execute_start to query__start I added int result to query__end probe. Since query_start is called before any other probe, the scripts can gather the parameters from this probe and thus I don''t have to add this long list of parameters in multiple places. The query__start and query__end should be aligned with Postgres and other DBMS DTrace probes hopefully. The thd void* in PostgresSQL can either be ignored or replaced by some Postgres internal data structure. At some point in time I''ll also add transaction-start and transaction- end probes. This is still just a first step, so I hope to add more DTrace probes later, however there are a couple of DTrace problems that have to be dealt with first. Will publish patches and trees for 5.1, MySQL CGE 6.3 and 6.0 as soon as they''re tested and ready. Rgrds Mikael> cheers, > > Brendan > > -- > Brendan > [CA, USA]