Ben Rockwood
2009-Mar-12 05:04 UTC
[dtrace-discuss] Guidelines for Long Running DTrace Scripts
There are several forms of data collection that I simply cannot gather any other way than with DTrace, and am therefore considering implementing several perpetual scripts, potentially SMF controlled. To date I've been reluctant to just leave scripts running for very long periods of time (hours, days, weeks)... but I'm not entirely sure why. Are there any general guidelines or best practices for doing so? Possibly experiences from others who have done this?

benr.
Robert Milkowski
2009-Mar-16 23:36 UTC
[dtrace-discuss] Guidelines for Long Running DTrace Scripts
Hello Ben,

Thursday, March 12, 2009, 5:04:40 AM, you wrote:

BR> There are several forms of data collection that I simply can gather in
BR> no other way than to use DTrace, and am therefore considering
BR> implementing several perpetual scripts, potentially SMF controlled.
BR> To date I've been reluctant to just leave scripts running for very long
BR> periods of time (hours, days, weeks)... but I'm not entirely sure why.
BR> Are there any general guidelines or best practice for doing so?
BR> Possibly experiences from others who have done this?

Just some thoughts, probably obvious:

- make sure you discard (assign 0) all variables that are no longer used, otherwise your script will probably leak memory; it's common not to think about this for one-liners or short-lived scripts, but it can be an issue for long-running ones (see the first sketch after this message)

- you probably want to monitor whether any drops are happening

- using a cyclic (ring) buffer could be useful

- some of the dtrace buffers will probably need tuning - best to observe a script for some time and watch for errors from dtrace

- try to avoid any string comparisons in predicates (not necessarily linked only to long-running scripts...) and only use direct comparisons that don't need to dereference pointers, etc.

- depending on what you are monitoring, dtrace -Z could be useful, as your application might restart, etc.

- be very precise in your probe definitions so you monitor only what you really need, otherwise atypical app/OS behavior could induce a big overhead from dtrace

- watch the allocated and resident memory of the dtrace process for each script, to make sure they are not growing above acceptable levels

- monitor any other errors coming from dtrace

- libdtrace could be useful? (don't know, just a thought)

- the "system is unresponsive" abort could be tuned IIRC, and maybe you want it to be more aggressive so dtrace scripts exit quicker than by default (minimizing the bad impact on the system)

- dtrace speculations could be very useful to minimize the amount of output and focus only on the interesting events (see the second sketch after this message)

- use the walltimestamp and/or timestamp variables provided by dtrace on multi-CPU servers instead of relying on syslog or anything else external to dtrace if the real order of events needs to be known - everything else will fail sooner or later (again, not necessarily related only to long-running scripts)

- remember that dtrace can drop events by design and you need to take that into account - it is not an auditing framework after all

- when average numbers are good enough, profile-N could be much more lightweight than measuring every event, though you can miss some potentially interesting data...

- never forget that there are other tools besides dtrace, and sometimes it is easier (better) to achieve something by using them instead

--
Best regards,
Robert Milkowski
http://milek.blogspot.com
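To make the variable-hygiene and buffer-tuning points above concrete, here is a minimal sketch of a long-running script; the syscall::read probes stand in for whatever is actually traced, and the buffer sizes are illustrative assumptions, not recommendations:

    #!/usr/sbin/dtrace -s

    #pragma D option quiet
    #pragma D option bufsize=4m      /* principal buffer: headroom against drops */
    #pragma D option aggsize=4m      /* aggregation buffer */
    #pragma D option dynvarsize=16m  /* space for self-> (dynamic) variables */

    syscall::read:entry
    {
        self->ts = timestamp;
    }

    syscall::read:return
    /self->ts/
    {
        @lat["read"] = quantize(timestamp - self->ts);
        self->ts = 0;   /* zero it, or the dynamic-variable space leaks */
    }

    /* flush in-kernel state periodically so nothing grows without bound */
    tick-60sec
    {
        printf("%Y\n", walltimestamp);
        printa(@lat);
        trunc(@lat);
    }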
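And a minimal sketch of the speculation pattern: record detail speculatively and commit it only when the event turns out to be interesting - here, a read(2) slower than an arbitrary 100 ms threshold:

    #!/usr/sbin/dtrace -s

    #pragma D option quiet
    #pragma D option nspec=16        /* concurrent speculations available */

    syscall::read:entry
    {
        self->spec = speculation();
        self->ts = timestamp;
    }

    syscall::read:entry
    /self->spec/
    {
        speculate(self->spec);
        printf("%Y %s pid %d read fd %d\n", walltimestamp, execname, pid, arg0);
    }

    syscall::read:return
    /self->spec && timestamp - self->ts > 100000000/
    {
        commit(self->spec);      /* slow: keep the detail */
        self->spec = 0;
        self->ts = 0;
    }

    syscall::read:return
    /self->spec/
    {
        discard(self->spec);     /* fast: throw it away */
        self->spec = 0;
        self->ts = 0;
    }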
Marcelo Leal
2009-Mar-28 19:14 UTC
[dtrace-discuss] Guidelines for Long Running DTrace Scripts
Hello there... I need to implement something similar, and before starting I thought I would look here first. That is the good thing about being part of such a community. ;-)

As there is only one reply to Ben's question, and a good one, I wonder if we could work on a prototype together and maybe create a general framework for this, then publish the resulting FMA/DTrace scripts and the processing scripts. I'm thinking of using Orca to do the plotting, so even the steps needed for that could be shared.

Ben did not say specifically which dtrace script he wants to implement in "daemon" mode, so I will explain my case and ask Ben to do the same, so we can implement it together. I want the following information for each ZFS dataset (FS or VOL), regarding NFS operations, for all NFS servers:

1 - Total requests (reads and writes);
2 - Latency for each operation;
3 - Total sync operations (ZIL);
4 - And the spa_sync information too.

It would be nice to see whether for "some reason" we get more requests than we can handle... but I don't think that is directly observable (in the end we would just see everything completing with big latency times, I guess). Anyway... Obviously we do not need to capture, for example, *all* the NFS operations, but we do need something representative. Maybe aggregate in memory and persist to disk from time to time - I don't know if there is some kind of "timer" in dtrace to activate and deactivate probes (see the sketch after this message).

PS: A real-time monitor (like Analytics) would be very nice! ;-)

That's it; I await your comments, and thanks a lot for your time!

Leal
[ http://www.eall.com.br ]

--
This message posted from opensolaris.org
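On the "timer" question: the profile provider's tick-N probes fire at a fixed interval, so the usual pattern is to aggregate in memory and print-and-truncate from a tick clause, redirecting the consumer's output to disk. A rough sketch, assuming the nfsv3 provider from OpenSolaris (probe and argument names should be checked against the release in use; a per-dataset breakdown could key on args[1]->noi_curpath, and ZIL/spa_sync activity could be watched separately with fbt probes such as fbt::spa_sync:entry):

    #!/usr/sbin/dtrace -s

    #pragma D option quiet

    nfsv3:::op-read-start,
    nfsv3:::op-write-start
    {
        start[args[1]->noi_xid] = timestamp;
    }

    nfsv3:::op-read-done,
    nfsv3:::op-write-done
    /start[args[1]->noi_xid]/
    {
        @ops[probename] = count();
        @lat[probename] = quantize(timestamp - start[args[1]->noi_xid]);
        start[args[1]->noi_xid] = 0;   /* free the associative-array slot */
    }

    /* the "timer": flush aggregated data every 60 seconds */
    tick-60sec
    {
        printf("--- %Y ---\n", walltimestamp);
        printa(@ops);
        printa(@lat);
        trunc(@ops);
        trunc(@lat);
    }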