Ben Rockwood
2009-Mar-12 05:04 UTC
[dtrace-discuss] Guidelines for Long Running DTrace Scripts
There are several forms of data collection that I simply cannot gather any other way than with DTrace, and am therefore considering implementing several perpetual scripts, potentially SMF controlled. To date I've been reluctant to just leave scripts running for very long periods of time (hours, days, weeks)... but I'm not entirely sure why. Are there any general guidelines or best practices for doing so? Possibly experiences from others who have done this?

benr.
Robert Milkowski
2009-Mar-16 23:36 UTC
[dtrace-discuss] Guidelines for Long Running DTrace Scripts
Hello Ben,

Thursday, March 12, 2009, 5:04:40 AM, you wrote:

BR> There are several forms of data collection that I simply can gather in
BR> no other way than to use DTrace, and am therefore considering
BR> implementing several perpetual scripts, potentially SMF controlled.
BR> To date I've been reluctant to just leave scripts running for very long
BR> periods of time (hours, days, weeks)... but I'm not entirely sure why.
BR> Are there any general guidelines or best practice for doing so?
BR> Possibly experiences from others who have done this?

Just some thoughts, probably obvious:

- make sure you discard (assign 0) all variables that are no longer used, otherwise your script will probably leak memory; it's common not to think about this for one-liners or short-lived scripts, but it can be an issue for long-running ones (see the first sketch after this message)

- you probably want to monitor whether any drops are happening

- using a cyclic (ring) buffer could be useful

- some of the dtrace buffers will probably need tuning - best to observe a script for some time and watch for errors from dtrace

- try to avoid any string comparisons in predicates (not necessarily linked only to long-running scripts...) and only use direct comparisons that don't need to dereference pointers, etc.

- depending on what you are monitoring, dtrace -Z could be useful, as your application might restart, etc.

- be very precise in your probe definitions so you monitor only what you really need, otherwise atypical app/OS behavior could induce a big overhead from dtrace

- watch the allocated and resident memory of the dtrace process for each script, to make sure they are not growing above acceptable levels

- monitor any other errors coming from dtrace

- libdtrace could be useful? (don't know, just a thought)

- the "system is unresponsive" abort could be tuned IIRC, and maybe you want it to be more aggressive so dtrace scripts exit quicker than by default (minimizing the bad impact on the system)

- dtrace speculations could be very useful to minimize the amount of output and focus only on the interesting events (see the second sketch after this message)

- use the walltimestamp and/or timestamp variables provided by dtrace on multi-CPU servers instead of relying on syslog or anything else external to dtrace if the real order of events needs to be known - everything else will fail sooner or later (again, not necessarily related only to long-running scripts)

- remember that dtrace can drop events by design and you need to take that into account - it is not an auditing framework after all

- when average numbers are good enough, profile-N could be much more lightweight than measuring every event, though you can miss some potentially interesting data...

- never forget that there are other tools besides dtrace, and sometimes it is easier (better) to achieve something by using them instead

--
Best regards,
Robert Milkowski
http://milek.blogspot.com
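To make the variable-hygiene and buffer-tuning points above concrete, here is a minimal sketch of a long-running script; the syscall::read probes stand in for whatever is actually traced, and the buffer sizes are illustrative assumptions, not recommendations:

    #!/usr/sbin/dtrace -s

    #pragma D option quiet
    #pragma D option bufsize=4m      /* principal buffer: headroom against drops */
    #pragma D option aggsize=4m      /* aggregation buffer */
    #pragma D option dynvarsize=16m  /* space for self-> (dynamic) variables */

    syscall::read:entry
    {
        self->ts = timestamp;
    }

    syscall::read:return
    /self->ts/
    {
        @lat["read"] = quantize(timestamp - self->ts);
        self->ts = 0;   /* zero it, or the dynamic-variable space leaks */
    }

    /* flush in-kernel state periodically so nothing grows without bound */
    tick-60sec
    {
        printf("%Y\n", walltimestamp);
        printa(@lat);
        trunc(@lat);
    }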
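And a minimal sketch of the speculation pattern: record detail speculatively and commit it only when the event turns out to be interesting - here, a read(2) slower than an arbitrary 100 ms threshold:

    #!/usr/sbin/dtrace -s

    #pragma D option quiet
    #pragma D option nspec=16        /* concurrent speculations available */

    syscall::read:entry
    {
        self->spec = speculation();
        self->ts = timestamp;
    }

    syscall::read:entry
    /self->spec/
    {
        speculate(self->spec);
        printf("%Y %s pid %d read fd %d\n", walltimestamp, execname, pid, arg0);
    }

    syscall::read:return
    /self->spec && timestamp - self->ts > 100000000/
    {
        commit(self->spec);      /* slow: keep the detail */
        self->spec = 0;
        self->ts = 0;
    }

    syscall::read:return
    /self->spec/
    {
        discard(self->spec);     /* fast: throw it away */
        self->spec = 0;
        self->ts = 0;
    }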
Marcelo Leal
2009-Mar-28 19:14 UTC
[dtrace-discuss] Guidelines for Long Running DTrace Scripts
Hello there... I need to implement something similar, and before starting I thought I would look here first. That is the good thing about being part of such a community. ;-)

As there is only one reply to Ben's question, and a good one, I wonder if we could work on a prototype together and maybe create a general framework for this, then publish the resulting FMA/DTrace scripts and the processing scripts. I'm thinking of using Orca to do the plotting, so even the steps needed for that could be shared.

Ben did not say specifically which dtrace script he wants to implement in "daemon" mode, so I will explain my case and ask Ben to do the same, so we can implement it together. I want the following information for each ZFS dataset (FS or VOL), regarding NFS operations, for all NFS servers:

1 - Total requests (reads and writes);
2 - Latency for each operation;
3 - Total sync operations (ZIL);
4 - And the spa_sync information too.

It would be nice to see whether for "some reason" we get more requests than we can handle... but I don't think that is directly observable (in the end we would just see everything completing with big latency times, I guess). Anyway... Obviously we do not need to capture, for example, *all* the NFS operations, but we do need something representative. Maybe aggregate in memory and persist to disk from time to time - I don't know if there is some kind of "timer" in dtrace to activate and deactivate probes (see the sketch after this message).

PS: A real-time monitor (like Analytics) would be very nice! ;-)

That's it; I await your comments, and thanks a lot for your time!

Leal
[ http://www.eall.com.br ]

--
This message posted from opensolaris.org
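On the "timer" question: the profile provider's tick-N probes fire at a fixed interval, so the usual pattern is to aggregate in memory and print-and-truncate from a tick clause, redirecting the consumer's output to disk. A rough sketch, assuming the nfsv3 provider from OpenSolaris (probe and argument names should be checked against the release in use; a per-dataset breakdown could key on args[1]->noi_curpath, and ZIL/spa_sync activity could be watched separately with fbt probes such as fbt::spa_sync:entry):

    #!/usr/sbin/dtrace -s

    #pragma D option quiet

    nfsv3:::op-read-start,
    nfsv3:::op-write-start
    {
        start[args[1]->noi_xid] = timestamp;
    }

    nfsv3:::op-read-done,
    nfsv3:::op-write-done
    /start[args[1]->noi_xid]/
    {
        @ops[probename] = count();
        @lat[probename] = quantize(timestamp - start[args[1]->noi_xid]);
        start[args[1]->noi_xid] = 0;   /* free the associative-array slot */
    }

    /* the "timer": flush aggregated data every 60 seconds */
    tick-60sec
    {
        printf("--- %Y ---\n", walltimestamp);
        printa(@ops);
        printa(@lat);
        trunc(@ops);
        trunc(@lat);
    }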