Justin Lloyd
2007-Sep-19 23:12 UTC
[dtrace-discuss] Counting stat calls within a filesystem
I'm trying to write a script that will log the number of stat, stat64, lstat and lstat64 system calls whose targets (arg0) are in a particular filesystem, "/gold/tmcache". It will run (either as a dtrace script or as dtrace code embedded in a shell script) for some unknown length of time, possibly many hours or even days, so logging all paths and then post-processing the log becomes problematic because the log grows so quickly.

Since DTrace doesn't have pattern matching, I thought about trying something like

    syscall::stat:entry,
    syscall::lstat:entry,
    syscall::stat64:entry,
    syscall::lstat64:entry
    /something[5] == '/' && something[6] == 't' && something[7] == 'm'/
    {
            calls++;
    }

    /* log once per hour */
    tick-1s
    /++n == 3600/
    {
            printf("%Y %d\n", walltimestamp, calls);
            n = 0;
            calls = 0;
    }

but I'm not sure how to get something[i] out of arg0, especially within a predicate.

Any thoughts? Is this along the right line of thinking, or is there some simpler way that I'm not seeing?

Thanks,
Justin
Chip Bennett
2007-Sep-20 00:31 UTC
[dtrace-discuss] Counting stat calls within a filesystem
Hi Justin!

I made the following changes to your script. The explanations follow the script:

    /* Remember the user-space address of the path argument. */
    syscall::stat:entry,
    syscall::lstat:entry,
    syscall::stat64:entry,
    syscall::lstat64:entry
    {
            self->fp = arg0;
    }

    /* On return, copy the path in and compare its directory part. */
    syscall::stat:return,
    syscall::lstat:return,
    syscall::stat64:return,
    syscall::lstat64:return
    / self->fp && dirname(copyinstr(self->fp)) == "/gold/tmcache" /
    {
            calls++;
            self->fp = 0;
    }

    /* log once per hour */
    tick-1s
    /++n == 3600/
    {
            printf("%Y %d\n", walltimestamp, calls);
            n = 0;
            calls = 0;
    }

1) You shouldn't inspect user data that is passed to a system call on entry to the system call, because it might not be paged in yet. Since dtrace clauses run in the kernel and with interrupts turned off, they can't handle page faults. So the way we handle this is to save the string address in a thread-local variable, and then inspect the string on return from the system call. (Note that the thread-local gets assigned a 0 to clear it when we're done with it, because thread-locals will build up in memory otherwise.)

2) The "self->fp" variable is tested to see if it is non-zero at the beginning of the predicate, on the off chance that there was a stat call in the works at the time you started the script. If that scenario happened, a "return" would fire without a matching "entry", and the zero pointer would cause an error.

3) We use the "copyinstr" function because the arg0 pointer from "stat" is in user space. The string has to be copied into kernel space temporarily so that the D clause can see it. At the end of the clause, this storage is automatically released.

4) The "dirname" function strips away the filename, so that we can do the comparison. This function also creates a temporary string that is released at the end of the clause.

5) Just in case you didn't think of it, if the path passed to arg0 is a relative pathname instead of an absolute pathname, this algorithm won't work. In that situation, you'll need to do something involving the "cwd" built-in variable. I didn't go so far as to figure out what. Also, degenerate absolute pathnames, like "/gold/../gold/tmcache", won't match either.

Chip
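For readers following along, here is a minimal sketch of the same entry/return pattern with two changes that are assumptions of this sketch, not part of Chip's script: the counter is kept in an aggregation (@calls) rather than a global variable, since aggregations are the usual way to count safely across CPUs, and the path is matched by prefix with substr() so that files in subdirectories also count. substr() may not be present in older DTrace releases, and the trailing slash in the prefix means a stat of "/gold/tmcache" itself is not counted. The thread-local is cleared in a separate clause so that it is released for non-matching calls as well.

```d
#!/usr/sbin/dtrace -qs

/* Save the user-space path pointer; it is only read on return. */
syscall::stat:entry,
syscall::lstat:entry,
syscall::stat64:entry,
syscall::lstat64:entry
{
        self->fp = arg0;
}

/* Count paths that begin with "/gold/tmcache/" (14 characters). */
syscall::stat:return,
syscall::lstat:return,
syscall::stat64:return,
syscall::lstat64:return
/self->fp && substr(copyinstr(self->fp), 0, 14) == "/gold/tmcache/"/
{
        @calls = count();
}

/* Release the thread-local whether or not the path matched. */
syscall::stat:return,
syscall::lstat:return,
syscall::stat64:return,
syscall::lstat64:return
/self->fp/
{
        self->fp = 0;
}

/* Log once per hour, then reset the count. */
tick-1hour
{
        printf("%Y ", walltimestamp);
        printa("%@d\n", @calls);
        clear(@calls);
}
```

Clauses for the same probe run in program order, so the counting clause sees self->fp before the clearing clause zeroes it. Relative pathnames are still not handled, as point 5 notes.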
Mike Gerdts
2007-Sep-20 03:35 UTC
[dtrace-discuss] Counting stat calls within a filesystem
On 9/19/07, Chip Bennett <cbennett at laurustech.com> wrote:
> Also, degenerate absolute pathnames, like "/gold/../gold/tmcache" won't
> match either.

cleanpath() could help there.

--
Mike Gerdts
http://mgerdts.blogspot.com/
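A rough sketch of how cleanpath() might slot into Chip's predicate, assuming it is applied to the copied-in string before dirname(). The aggregation and the separate clearing clause are choices made for this sketch. cleanpath() only removes "/./" and "/../" components textually; it does not resolve symbolic links and does not make relative paths absolute, so those cases remain unmatched.

```d
#!/usr/sbin/dtrace -qs

syscall::stat:entry,
syscall::lstat:entry,
syscall::stat64:entry,
syscall::lstat64:entry
{
        self->fp = arg0;
}

/* Normalize the path first, then compare its directory part. */
syscall::stat:return,
syscall::lstat:return,
syscall::stat64:return,
syscall::lstat64:return
/self->fp && dirname(cleanpath(copyinstr(self->fp))) == "/gold/tmcache"/
{
        @calls = count();
}

/* Release the thread-local for every return, matched or not. */
syscall::stat:return,
syscall::lstat:return,
syscall::stat64:return,
syscall::lstat64:return
/self->fp/
{
        self->fp = 0;
}
```

The aggregation is printed automatically when the script exits.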
Wee Yeh Tan
2007-Sep-20 05:28 UTC
[dtrace-discuss] Counting stat calls within a filesystem
Chip,

You will also miss files in subdirectories of "/gold/tmcache". I was looking for a strncpy equivalent, but that will not solve point 5, so I decided to do more digging into the fbt probes...

Here's what I have: Since Justin is only interested in lstat & stat, we can hook into lookuppnat:return to catch (vnode_t **) compvpp. This can then lead us to exactly the file system that was looked up. This will work across symlink references, etc., but cannot be extended to arbitrary subdirectories within a file system.

The script is as follows:

    syscall::stat:entry,
    syscall::stat64:entry,
    syscall::lstat:entry,
    syscall::lstat64:entry
    {
            self->t = arg0;
    }

    /* Remember where lookuppnat() will store the resulting vnode. */
    lookuppnat:entry
    /self->t/
    {
            self->compvpp = (vnode_t **)arg4;
    }

    /* Follow the vnode to its vfs and pick up the mount point. */
    lookuppnat:return
    /self->compvpp && *(self->compvpp)/
    {
            this->compvpp = *self->compvpp;
            self->mntpt = (string)this->compvpp->v_vfsp->vfs_mntpt->rs_string;
    }

    /* $$1 is the mount point given on the command line. */
    lookuppnat:return
    / self->mntpt == $$1 /
    {
            printf("%s(%d) ->%s\n", execname, pid, copyinstr(self->t));
    }

    lookuppnat:return
    /self->compvpp/
    {
            self->compvpp = 0;
            self->mntpt = 0;
    }

    syscall::stat:return,
    syscall::stat64:return,
    syscall::lstat:return,
    syscall::lstat64:return
    / self->t /
    {
            self->t = 0;
    }

--
Just me, Wire ...
Blog: <prstat.blogspot.com>
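Since the original goal was an hourly count rather than a line per call, the lookuppnat approach could be adapted roughly as follows. This is only a sketch: the probe, the arg4 position of compvpp and the structure members are taken directly from the script in this message and have not been checked against any particular kernel build, while the aggregation and the tick-1hour clause are additions for illustration. The mount point is passed as the first script argument, for example `./statcount.d /gold/tmcache` (a hypothetical file name).

```d
#!/usr/sbin/dtrace -qs

syscall::stat:entry,
syscall::stat64:entry,
syscall::lstat:entry,
syscall::lstat64:entry
{
        self->t = arg0;
}

fbt::lookuppnat:entry
/self->t/
{
        self->compvpp = (vnode_t **)arg4;
}

fbt::lookuppnat:return
/self->compvpp && *(self->compvpp)/
{
        this->compvpp = *self->compvpp;
        self->mntpt = (string)this->compvpp->v_vfsp->vfs_mntpt->rs_string;
}

/* Count the lookup instead of printing it. */
fbt::lookuppnat:return
/self->mntpt == $$1/
{
        @calls = count();
}

fbt::lookuppnat:return
/self->compvpp/
{
        self->compvpp = 0;
        self->mntpt = 0;
}

syscall::stat:return,
syscall::stat64:return,
syscall::lstat:return,
syscall::lstat64:return
/self->t/
{
        self->t = 0;
}

/* Log once per hour, then reset the count. */
tick-1hour
{
        printf("%Y ", walltimestamp);
        printa("%@d\n", @calls);
        clear(@calls);
}
```

Because the comparison is on the mount point, every file in the filesystem counts regardless of depth, which is the property a prefix match on the path string cannot give for relative paths or symlinks.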
Justin Lloyd
2007-Sep-20 18:14 UTC
[dtrace-discuss] Counting stat calls within a filesystem
Chip: My attempts to use dirname() were resulting in a number of "invalid address (0x0) in predicate at DIF offset 40" messages. But I feel silly having overlooked that in the manual. However (as Wee points out), /gold/tmcache is a very deep tree (a 34 TB filesystem!), so dirname() wouldn't quite work in this situation. But I have other places where it will be useful.

Wee: Your modifications using lookuppnat() are working nicely. After the internals class recently (with Chip!), I should have thought to look at an fbt trace of a stat call to see if I could get to the filesystem information. I do want to see about adding fstat calls to the mix, but that's not as important right now.

Thanks to you both!

Justin
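On the fstat question, one possible starting point (a sketch, not something tested in this thread) is DTrace's built-in fds[] array, which maps a file descriptor to a fileinfo_t whose fi_mount member names the mount point. It assumes a DTrace release that provides fds[], and that the syscall probes are named fstat and fstat64 on the target system. Because fstat takes a descriptor rather than a user-space path, nothing needs to be copied in and the check can be made on entry.

```d
#!/usr/sbin/dtrace -qs

/* arg0 is the file descriptor; $$1 is the mount point to match. */
syscall::fstat:entry,
syscall::fstat64:entry
/fds[arg0].fi_mount == $$1/
{
        @calls = count();
}

/* Log once per hour, then reset the count. */
tick-1hour
{
        printf("%Y ", walltimestamp);
        printa("%@d\n", @calls);
        clear(@calls);
}
```

How fi_mount is derived varies between implementations, so it is worth printing a few values first to confirm that they match the mount point string passed on the command line.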