Hi Guys,

I'm new (& self-taught) to DTrace and needed to write a program to track a specified dir and find out who/when/how etc if it got removed/renamed etc.
As you can see from the below code, I've been caught by 1 or 2 gotchas during my testing.
This is a serious prog, going into production asap, so any comments towards making it better/more robust would be appreciated.

Cheers
Chris

PS Sorry about the layout, can't see an option for code tags

# Desc : Track dir deletions of specified dir in specified zone.
#        Attempts to handle path issues on cmd and/or dir/dir.
#        Tries to catch any form of removal eg shell cmds:
#        rm, rmdir, unlink, mv and 'internal' code cmds inside Perl, C etc.
#        Note that normally this prog is controlled by
#        dt_dir_removal_mgr.pl, which reads the stdout & stderr,
#        filters false positives etc & logs and emails any alerts.
#        We do not allow the path of the tgt dir to be used,
#        as this may not be specified by the offending user/app...
#        thus we may get some false positives eg a file of the same name.
#        Local zonename is avail from DTrace, but filesystem and inode
#        are not avail from psinfo struct.
#        Not matching on zone because tgt dir can be deleted from global,
#        although the users should not be able to get in there.

usage()
{
    echo "USAGE: dt_dir_removal.sh -d dirname -z zonename
        -d dirname      # dirname to track : must NOT inc path
    eg,
        dt_dir_removal.sh -d testdir
"
    exit 1
}

# --- Process Arguments ---
#
# Arg supplied ?
if [[ $# -eq 0 ]]
then
    usage
fi

# Check switch value & arg value : see usage()
while getopts d: name
do
    case $name in
    d)  dirname=$(basename $OPTARG) ;;
    *)  usage ;;
    esac
done

#################################
# --- DTrace ---
#
# NB: seem to need the single quotes around the DTrace code ...
# This also means that even the contents of comment blocks CANNOT have single quotes
# in them eg don't, won't etc... (sigh...)
/usr/sbin/dtrace -n '
/* Params from shell input */
inline string DIRNAME = "'$dirname'";

#pragma D option quiet
#pragma D option switchrate=10hz

/*
 * Print header
 */
dtrace:::BEGIN
{
    /* print main headers: We cannot line up final arg hdr exactly
     * because the cmd len varies
     */
    printf("%-20s %-12s %5s %5s %6s %6s %s -> %s\n",
        "TIME", "ZONE", "GID", "UID", "PID", "PPID", "CMD", "TARGET") ;
}

/*
 * Check exec event type
 */

syscall::unlink:entry
{
    /* Grab the dirname in qn to test later: remove any preceding path */
    /* Experiment seems to indicate unlink will not have this value in the return state ;
     * contrast with rmdir below which may not have it in entry state
     */
    tgt = basename(copyinstr(arg0));
}

/* http://docs.sun.com/app/docs/doc/817-6223/6mlkidlrg?l=en&a=view#indexterm-458 :
 * Avoiding Errors
 * The copyin() and copyinstr() subroutines cannot read from user addresses which have not yet
 * been touched so even a valid address may cause an error if the page containing that address
 * has not yet been faulted in by being accessed.
 * To resolve this issue, wait for kernel or application to use the data before tracing it.
 * For example, you might wait until the system call returns to apply copyinstr()
 */
syscall::rmdir:entry, syscall::rename:entry
{
    /* Try saving a ptr to the relevant value for later, otherwise it gives invalid addr error
     * in return section below
     */
    self->file = arg0;
}

syscall::rmdir:return, syscall::rename:return
{
    /* Grab the dirname in qn to test later: remove any preceding path */
    tgt = basename(copyinstr(self->file));
}

/* Not matching on zone because tgt dir can be deleted from global,
 * although the users should not be able to get in there.
 */
syscall::rmdir:return, syscall::rename:return, syscall::unlink:return
/ DIRNAME == tgt /
{
    /* Print the field values. The TARGET tends not to line up as we
       print the cmd and the target name for completeness. For a shell level cmd,
       we will get the target name in the CMD field as well. For an "internal" cmd,
       eg rmdir() from within perl, the CMD field does not contain the target value.
    */
    printf("%-20Y %-12s %5d %5d %6d %6d %s -> %s\n",
        walltimestamp, zonename, gid, uid, pid, ppid,
        curpsinfo->pr_psargs, tgt ) ;

    /* Clear the self->file ptr to avoid dynamic variable drop errors */
    self->file = 0;
}
'

--
This message posted from opensolaris.org
Hi Guys,

Just got this error

<code>
dtrace: error on enabled probe ID 6 (ID 11792: syscall::rename:return): invalid address (0x0) in action #1 at DIF offset 28
</code>

which I thought I'd solved above. Apparently not :( Any pointers gratefully received.

Cheers
Chris
I'm on my phone so I'm not going to write much, but the first glaring bug is that your return probes are not predicated on the entry probe firing first. It is therefore possible that you are enabling probes whilst some process is in rmdir, unlink or rename, so that you see the return but not the entry. This is a common mistake for the DTrace novice.

Your return probe predicates should be:

/ self->file /

or

/ self->file && ...... /

Check out the DTrace Toolkit and reference manual for examples of how it's done.

Phil
http://harmanholistix.com

On 5 Oct 2010, at 04:50, chris moden <chrismoden at yahoo.com.au> wrote:
> Hi Guys,
>
> Just got this error
>
> <code>
> dtrace: error on enabled probe ID 6 (ID 11792: syscall::rename:return): invalid address (0x0) in action #1 at DIF offset 28
> </code>
> which I thought I'd solved above. Apparently not :( any ptrs gratefully received.
>
> Cheers
> Chris
Hi Phil,

Sorry I haven't replied sooner; I've been away.

I can sort of see what you're saying, but I thought that dtrace was triggered by the relevant event in the kernel.
This program works when I test it, and it is normally left to run indefinitely; I'm not continuously stopping and starting it.
The error only occurred after about 1 1/2 weeks of continuous running.
Would the condition you mentioned still obtain?
Am I missing something here?

Cheers
Chris
On 11/10/2010 04:25, chris moden wrote:
> Hi Phil,
>
> Sorry I haven't replied sooner; I've been away.
>
> I can sort of see what you're saying, but I thought that dtrace was triggered by the relevant event in the kernel.
> This program works when I test it, and it is normally left to run indefinitely; I'm not continuously stopping and starting it.
> The error only occurred after about 1 1/2 weeks of continuous running.
> Would the condition you mentioned still obtain?
> Am I missing something here?
>
> Cheers
> Chris

Sorry Chris, I'd assumed it was urgent, which is why I replied from my iPhone. So, as I mentioned, I didn't read the script that thoroughly (because all the lines were wrapped exceedingly short). The info about how you run it, and that your bug only shows up after 1 1/2 weeks would have been useful data.

Let's add some context...

> Just got this error
>
> <code>
> dtrace: error on enabled probe ID 6 (ID 11792: syscall::rename:return): invalid address (0x0) in action #1 at DIF offset 28
> </code>
> which I thought I'd solved above. Apparently not :( any ptrs gratefully received.

And your code...

> #################################
> # --- DTrace ---
> #
> # NB: seem to need the single quotes around the DTrace code ...

That's a shell script thing, not a DTrace issue.

> # This also means that even the contents of comment blocks CANNOT have single quotes
> # in them eg don't, won't etc... (sigh...)

Again this is a shell script thing. If you want a ' you can always do '"'"' inside the initial '.

> /usr/sbin/dtrace -n '

... or you can put the D in a separate script and start it with...

#!/usr/sbin/dtrace -s

... and use ' and " to your heart's content.

> /* Params from shell input */
> inline string DIRNAME = "'$dirname'";
>
> #pragma D option quiet
> #pragma D option switchrate=10hz
>
> /*
>  * Print header
>  */
> dtrace:::BEGIN
> {
>     /* print main headers: We cannot line up final arg hdr exactly
>      * because the cmd len varies
>      */
>     printf("%-20s %-12s %5s %5s %6s %6s %s -> %s\n",
>         "TIME", "ZONE", "GID", "UID", "PID", "PPID", "CMD", "TARGET") ;
> }
>
> /*
>  * Check exec event type
>  */
>
> syscall::unlink:entry
> {
>     /* Grab the dirname in qn to test later: remove any preceding path */
>     /* Experiment seems to indicate unlink will not have this value in the return state ;
>      * contrast with rmdir below which may not have it in entry state
>      */
>     tgt = basename(copyinstr(arg0));
> }
>
> /* http://docs.sun.com/app/docs/doc/817-6223/6mlkidlrg?l=en&a=view#indexterm-458 :
>  * Avoiding Errors
>  * The copyin() and copyinstr() subroutines cannot read from user addresses which have not yet
>  * been touched so even a valid address may cause an error if the page containing that address
>  * has not yet been faulted in by being accessed.
>  * To resolve this issue, wait for kernel or application to use the data before tracing it.
>  * For example, you might wait until the system call returns to apply copyinstr()
>  */

This can apply to unlink as well; your experiments were not exhaustive.

> syscall::rmdir:entry, syscall::rename:entry
> {
>     /* Try saving a ptr to the relevant value for later, otherwise it gives invalid addr error
>      * in return section below
>      */
>     self->file = arg0;
> }
>
> syscall::rmdir:return, syscall::rename:return
> {
>     /* Grab the dirname in qn to test later: remove any preceding path */
>     tgt = basename(copyinstr(self->file));
> }

The return probe needs to be matched with the entry probe. Also, tgt is a global variable, so in a race between two or more threads you cannot predict what tgt will contain.

> /* Not matching on zone because tgt dir can be deleted from global,
>  * although the users should not be able to get in there.
>  */
> syscall::rmdir:return, syscall::rename:return, syscall::unlink:return
> / DIRNAME == tgt /
> {
>     /* Print the field values. The TARGET tends not to line up as we
>        print the cmd and the target name for completeness. For a shell level cmd,
>        we will get the target name in the CMD field as well. For an "internal" cmd,
>        eg rmdir() from within perl, the CMD field does not contain the target value.
>     */
>     printf("%-20Y %-12s %5d %5d %6d %6d %s -> %s\n",
>         walltimestamp, zonename, gid, uid, pid, ppid,
>         curpsinfo->pr_psargs, tgt ) ;
>
>     /* Clear the self->file ptr to avoid dynamic variable drop errors */
>     self->file = 0;
> }

Again, this needs to match the corresponding entry probe.

I suggest something like the following ...

/* Remember the user address of the path, but only if it is non-NULL */
syscall::rmdir:entry, syscall::rename:entry, syscall::unlink:entry
/arg0/
{
    self->file = arg0;
}

/* Only copy the string in if this thread went through the entry probe */
syscall::rmdir:return, syscall::rename:return, syscall::unlink:return
/self->file/
{
    self->tgt = basename(copyinstr(self->file));
}

syscall::rmdir:return, syscall::rename:return, syscall::unlink:return
/self->tgt == DIRNAME/
{
    printf("%-20Y %-12s %5d %5d %6d %6d %s -> %s\n",
        walltimestamp, zonename, gid, uid, pid, ppid,
        curpsinfo->pr_psargs, self->tgt ) ;
}

/* Clean up the thread-local variables */
syscall::rmdir:return, syscall::rename:return, syscall::unlink:return
/self->file/
{
    self->tgt = "";
    self->file = 0;
}

Because you failed to match entry and return probes, you may have missed the event you were looking for in a race between two or more threads.

I think the issue you saw was probably that something called rename(2) with arg0 set to zero, which is why I predicate on arg0 being non-zero.

Phil
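
The shell-quoting point in the reply above can be tried without DTrace at all. What follows is only a minimal sketch, assuming a POSIX shell; the variable names d_comment and d_inline are made up for illustration and are not part of Chris's script. It shows the '"'"' idiom for getting an apostrophe into single-quoted text, and the same close-and-reopen splice the original script uses for $dirname.

```shell
#!/bin/sh
# Hypothetical stand-in for the value parsed from -d in the original script.
dirname=testdir

# '"'"' closes the single-quoted string, adds one apostrophe inside
# double quotes, then reopens the single-quoted string.
d_comment='/* comments can now say don'"'"'t and won'"'"'t */'

# Same close-and-reopen idea for splicing a shell variable into the D text.
# Double-quoting the expansion keeps a value containing spaces in one word.
d_inline='inline string DIRNAME = "'"$dirname"'";'

printf '%s\n' "$d_comment"
printf '%s\n' "$d_inline"
```

The double quotes around $dirname matter once the text is passed as an argument to dtrace -n: an unquoted expansion would be split on whitespace and the D program would arrive as several arguments.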
Hi Phil,

Thx for that stuff; I've implemented more or less what you said.

I do have another question. Is it possible to return the full path of the dir that's been removed? I'm tracking a home dir (home of a key app that got deleted sometime back), so it'd be nice to avoid false positives.

What I've tried so far is to check from the controlling Perl program whether the target dir (full path from the global zone into the 'local' zone) still exists.

[code]
if( -d $cfg::params{'TRACKED_DIR'} )
[/code]

Unfortunately, the Perl -d check still thinks the dir exists at the time it checks the returned output from the DTrace program.
(I've implemented the DTrace stuff as just D code now, ie no shell code reqd, thx)

I know a file/dir doesn't really go until all processes accessing it have closed the file/dir, but surely that doesn't apply here?

At the moment I'm checking just the final dir name, eg xxdir in /a/b/c/xxdir, because I couldn't see how to check the full name if eg someone does 'rmdir xxdir'.

Cheers
Chris
I am looking to analyse multiple threads from different processes, especially the locking mechanism. My target is to find out:

1. Which thread is waiting for a mutex?
2. Which thread is holding the mutex?

Would lockstat::mutex_enter:* work? One point to remember is that the participating threads are from different processes.

I have found some documentation at http://www.solarisinternals.com/wiki/index.php/DTrace_Topics_Locks but could not find how to capture condition variables.

I have the following in my system:

# dtrace -lP lockstat
   ID   PROVIDER   MODULE    FUNCTION           NAME
44083   lockstat   genunix   mutex_enter        adaptive-acquire
44084   lockstat   genunix   mutex_enter        adaptive-block
44085   lockstat   genunix   mutex_enter        adaptive-spin
44086   lockstat   genunix   mutex_exit         adaptive-release
44087   lockstat   genunix   mutex_destroy      adaptive-release
44088   lockstat   genunix   mutex_tryenter     adaptive-acquire
44089   lockstat   genunix   lock_set           spin-acquire
44090   lockstat   genunix   lock_set           spin-spin
44091   lockstat   genunix   lock_set_spl       spin-acquire
44092   lockstat   genunix   lock_set_spl       spin-spin
44093   lockstat   genunix   lock_try           spin-acquire
44094   lockstat   genunix   lock_clear         spin-release
44095   lockstat   genunix   lock_clear_splx    spin-release
44096   lockstat   genunix   CLOCK_UNLOCK       spin-release
44097   lockstat   genunix   rw_enter           rw-acquire
44098   lockstat   genunix   rw_enter           rw-block
44099   lockstat   genunix   rw_exit            rw-release
44100   lockstat   genunix   rw_tryenter        rw-acquire
44101   lockstat   genunix   rw_tryupgrade      rw-upgrade
44102   lockstat   genunix   rw_downgrade       rw-downgrade
44103   lockstat   genunix   thread_lock        thread-spin
44104   lockstat   genunix   thread_lock_high   thread-spin

Regards
Sudip
@Sudip: that's a new question; please open a new thread instead.
Ok, after it had been running for about 18 hrs I got a lot of "dynamic variable drops" and "dynamic variable drops with non-empty dirty list" messages.

Had a dig around Google and it seems that (according to the DTrace Guide) I should change

self->tgt = "";
self->file = 0;

to

self->tgt = 0;
self->file = 0;

[quote]
In general, one should always assign zero to thread-local variables that are no longer in use.
[/quote]

Playing with dynvarsize is advised on some threads I've seen.
It may(?) be relevant that the system I'm using is old: Solaris 10 3/05, so I'm guessing DTrace has been updated a few times since then...

Cheers
Chris
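
For anyone following along, here is a rough sketch of the change described above. It assumes the D program lives in its own file, as mentioned earlier in the thread. The file name, the 16m value for dynvarsize, the shortened printf and the use of the $$1 macro argument to pass in the directory name are all assumptions made for illustration, not Chris's actual script. The shell part only writes the file; it does not run dtrace.

```shell
#!/bin/sh
# Hypothetical location for the sketch; adjust to taste.
script=${TMPDIR:-/tmp}/dt_dir_removal_sketch.d

# The quoted heredoc delimiter stops the shell from expanding anything,
# so $$1 and any apostrophes reach the file verbatim.
cat > "$script" <<'EOF'
#!/usr/sbin/dtrace -s

#pragma D option quiet
/* Assumed value: enlarges the dynamic variable space from its default. */
#pragma D option dynvarsize=16m

syscall::rmdir:entry, syscall::rename:entry, syscall::unlink:entry
/arg0/
{
        self->file = arg0;
}

syscall::rmdir:return, syscall::rename:return, syscall::unlink:return
/self->file/
{
        self->tgt = basename(copyinstr(self->file));
}

/* $$1 is the first dtrace command-line argument, taken as a string. */
syscall::rmdir:return, syscall::rename:return, syscall::unlink:return
/self->file && self->tgt == $$1/
{
        printf("%Y %s %d %s -> %s\n", walltimestamp, zonename, pid,
            curpsinfo->pr_psargs, self->tgt);
}

/* Assign zero to both thread-locals so their storage can be released. */
syscall::rmdir:return, syscall::rename:return, syscall::unlink:return
/self->file/
{
        self->tgt = 0;
        self->file = 0;
}
EOF

echo "wrote $script"
```

It would then be started with something like: dtrace -s /tmp/dt_dir_removal_sketch.d testdir. Whether a larger dynvarsize is needed at all once both variables are zeroed is something only a long test run on the target system can show.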