Justin Lloyd
2007-Sep-19 23:12 UTC
[dtrace-discuss] Counting stat calls within a filesystem
I'm trying to write a script that will log the number of stat, stat64,
lstat and lstat64 system calls whose targets (arg0) are in a particular
filesystem, "/gold/tmcache". This will run in a script (either a dtrace
script or dtrace code embedded in a shell script) that will be run
for some unknown length of time, possibly many hours or even days, so
the idea of logging all paths and then post-processing the log becomes
problematic due to its rapid growth.
Since DTrace doesn't have pattern matching, I thought about trying
something like
syscall::stat:entry,
syscall::lstat:entry,
syscall::stat64:entry,
syscall::lstat64:entry
/something[5] == '/' && something[6] == 't' && something[7] == 'm'/
{
calls++;
}
/* log once per hour */
tick-1s
/++n == 3600/
{
printf("%Y %d\n", walltimestamp, calls);
n = 0;
calls = 0;
}
but I'm not sure how to get something[i] out of arg0, especially within
a predicate.
Any thoughts? Is this along the right line of thinking or is there some
simpler way that I'm not seeing?
Thanks,
Justin
Chip Bennett
2007-Sep-20 00:31 UTC
[dtrace-discuss] Counting stat calls within a filesystem
Hi Justin!
I made the following changes to your script. The explanations follow
the script:
syscall::stat:entry,
syscall::lstat:entry,
syscall::stat64:entry,
syscall::lstat64:entry
{
self->fp = arg0;
}
syscall::stat:return,
syscall::lstat:return,
syscall::stat64:return,
syscall::lstat64:return
/ self->fp && dirname(copyinstr(self->fp)) == "/gold/tmcache" /
{
calls++;
self->fp = 0;
}
/* log once per hour */
tick-1s
/++n == 3600/
{
printf("%Y %d\n", walltimestamp, calls);
n = 0;
calls = 0;
}
1) You shouldn't inspect user data that is passed to a system call on
entry to the system call, because it might not be paged in yet. Since
dtrace clauses run in the kernel and with interrupts turned off, they
can't handle page faults. So the way we handle this is to save the
string address in a thread-local variable, and then inspect the string
on return from the system call. (Note that the thread-local gets
assigned a 0 to clear it when we're done with it, because thread-locals
will build up in memory otherwise.)
2) The "self->fp" variable is tested to see if it is non-zero at the
beginning of the predicate, on the off chance that there was a stat call
in the works at the time you started the script. If that scenario
happened, a "return" would fire without a matching "entry", and the
zero pointer would cause an error.
3) We use the "copyinstr" function because the arg0 pointer from "stat"
is in user space. The string has to be copied into kernel space
temporarily so that the D clause can see it. At the end of the clause,
this storage is automatically released.
4) The "dirname" function strips away the filename, so that we can do
the comparison. This function also creates a temporary string that is
released at the end of the clause.
5) Just in case you didn't think of it, if the path passed to arg0 is a
relative pathname instead of an absolute pathname, this algorithm won't
work. In that situation, you'll need to do something involving the
"cwd" built-in variable. I didn't go so far as to figure out what.
Also, degenerate absolute pathnames, like "/gold/../gold/tmcache", won't
match either.
Chip
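A possible variation on the script above, offered only as a sketch.
Global variables such as "calls" are not updated atomically across
CPUs, so an aggregation is the more usual way to count in D. The version
below also assumes that the profile provider's tick-1h probe is an
acceptable replacement for counting 3600 one-second ticks. The
aggregation name @calls, the BEGIN clause and the separate cleanup
clause are illustrative choices, not part of the original post.

```d
#!/usr/sbin/dtrace -qs

/* Seed the aggregation so the first report prints 0 rather than nothing. */
BEGIN
{
	@calls = sum(0);
}

/* Remember the user-space path pointer until the call returns. */
syscall::stat:entry,
syscall::lstat:entry,
syscall::stat64:entry,
syscall::lstat64:entry
{
	self->fp = arg0;
}

/* Count only calls whose parent directory is exactly /gold/tmcache. */
syscall::stat:return,
syscall::lstat:return,
syscall::stat64:return,
syscall::lstat64:return
/self->fp && dirname(copyinstr(self->fp)) == "/gold/tmcache"/
{
	@calls = sum(1);
}

/* Release the thread-local even when the path did not match. */
syscall::stat:return,
syscall::lstat:return,
syscall::stat64:return,
syscall::lstat64:return
/self->fp/
{
	self->fp = 0;
}

/* Log once per hour, then reset the running total to zero. */
tick-1h
{
	printf("%Y ", walltimestamp);
	printa("%@d\n", @calls);
	clear(@calls);
}
```

Clauses for the same probe run in program order, so the counting clause
has to stay ahead of the cleanup clause. The same limitations listed in
points 1 to 5 still apply.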
Mike Gerdts
2007-Sep-20 03:35 UTC
[dtrace-discuss] Counting stat calls within a filesystem
On 9/19/07, Chip Bennett <cbennett at laurustech.com> wrote:
> Also, degenerate absolute pathnames, like "/gold/../gold/tmcache" won't
> match either.
cleanpath() could help there.
--
Mike Gerdts
http://mgerdts.blogspot.com/
Wee Yeh Tan
2007-Sep-20 05:28 UTC
[dtrace-discuss] Counting stat calls within a filesystem
Chip,
You will also miss files in subdirectories of "/gold/tmcache". I
was looking for a strncpy equivalent, but that will not solve point 5, so
I decided to do more digging into fbt probes...
Here's what I have:
Since Justin is only interested in lstat & stat, we can hook into
lookuppnat:return to catch (vnode_t **) compvpp. This can then lead
us to exactly the file system that was looked up. This will work
across symlink references, etc., but cannot be extended to arbitrary
subdirectories within a file system.
The script is as follows:
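/* Remember the path pointer for the duration of the stat call. */
syscall::stat:entry,
syscall::stat64:entry,
syscall::lstat:entry,
syscall::lstat64:entry
{
	self->t = arg0;
}
/* Only trace lookups made on behalf of a stat call; arg4 is compvpp. */
lookuppnat:entry
/self->t/
{
	self->compvpp = (vnode_t **)arg4;
}
/* On return, follow the resolved vnode to its mount point. */
lookuppnat:return
/self->compvpp && *(self->compvpp)/
{
	this->compvpp = *self->compvpp;
	self->mntpt = (string)this->compvpp->v_vfsp->vfs_mntpt->rs_string;
}
/* $$1 is the mount point given as the first script argument. */
lookuppnat:return
/self->mntpt == $$1/
{
	printf("%s(%d) ->%s\n", execname, pid, copyinstr(self->t));
}
/* Release the per-lookup thread-locals. */
lookuppnat:return
/self->compvpp/
{
	self->compvpp = 0;
	self->mntpt = 0;
}
syscall::stat:return,
syscall::stat64:return,
syscall::lstat:return,
syscall::lstat64:return
/self->t/
{
	self->t = 0;
}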
--
Just me,
Wire ...
Blog: <prstat.blogspot.com>
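On the "strncpy equivalent" mentioned above: some DTrace releases
provide a substr() subroutine, and where it is available a prefix
comparison can be sketched as below. This is an assumption about the
installed DTrace version, not something confirmed in this thread. The
length 14 is simply the length of "/gold/tmcache/", and the aggregation
@calls is an illustrative name.

```d
#!/usr/sbin/dtrace -qs

/* Remember the user-space path pointer until the call returns. */
syscall::stat:entry,
syscall::lstat:entry,
syscall::stat64:entry,
syscall::lstat64:entry
{
	self->fp = arg0;
}

/*
 * Compare only the first 14 characters of the path, so that anything
 * below /gold/tmcache/ matches, however deep.
 * ASSUMPTION: substr() exists in this DTrace release.
 */
syscall::stat:return,
syscall::lstat:return,
syscall::stat64:return,
syscall::lstat64:return
/self->fp && substr(copyinstr(self->fp), 0, 14) == "/gold/tmcache/"/
{
	@calls = count();
}

/* Release the thread-local whether or not the path matched. */
syscall::stat:return,
syscall::lstat:return,
syscall::stat64:return,
syscall::lstat64:return
/self->fp/
{
	self->fp = 0;
}
```

As noted, this still does nothing for relative paths or for paths
containing "..", and a stat of "/gold/tmcache" itself (no trailing
slash) is not counted. The lookuppnat approach avoids those problems.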
Justin Lloyd
2007-Sep-20 18:14 UTC
[dtrace-discuss] Counting stat calls within a filesystem
Chip:
My attempts to use dirname() were resulting in a number of "invalid
address (0x0) in predicate at DIF offset 40" messages. But I feel silly
having overlooked that in the manual. However (as Wee points out),
/gold/tmcache is a very deep tree (a 34 TB filesystem!) so dirname()
wouldn't quite work in this situation. But I have other places where it
will be useful.
Wee:
Your modifications using lookuppnat() are working nicely. After the
internals class recently (with Chip!), I should have thought to look at
fbt trace of a stat call to see if I could get to the filesystem
information.
I do want to see about adding fstat calls to the mix, but that's not as
important right now.
Thanks to you both!
Justin
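On adding fstat: since fstat takes a file descriptor rather than a
path, one option, sketched here under assumptions, is the fds[] array
that some DTrace releases provide, whose fi_mount member holds the mount
point of the open file. Whether fds[] is available, and whether the
probes are named fstat and fstat64 on a given release, are both
assumptions. The aggregation @calls is an illustrative name.

```d
#!/usr/sbin/dtrace -qs

/*
 * arg0 is the file descriptor. No copyinstr() is needed because
 * nothing is read from user space.
 * ASSUMPTION: the fds[] array and its fi_mount member are available.
 * $$1 is the mount point passed as the first script argument.
 */
syscall::fstat:entry,
syscall::fstat64:entry
/fds[arg0].fi_mount == $$1/
{
	@calls[probefunc] = count();
}
```

This would be run with the mount point as its argument, in the same way
as the lookuppnat script, and prints the per-function counts when the
script exits.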