I am running a small script (attached) that timestamps file opens and
closes.

When I run it under real load I get tons of messages like

dtrace: 79575 drops on CPU 12
dtrace: 140709 drops on CPU 13
dtrace: 125092 drops on CPU 14
dtrace: 38129 drops on CPU 15
...

and sometimes

dtrace: error on enabled probe ID 4 (ID 10200: fbt:genunix:closeandsetf:entry):
out of scratch space in action #2 at DIF offset 508
dtrace: error on enabled probe ID 2 (ID 6227: fbt:genunix:copen:return): out of
scratch space in action #3 at DIF offset 28
dtrace: error on enabled probe ID 4 (ID 10200: fbt:genunix:closeandsetf:entry):
out of scratch space in action #2 at DIF offset 508
dtrace: error on enabled probe ID 4 (ID 10200: fbt:genunix:closeandsetf:entry):
out of scratch space in action #2 at DIF offset 508
dtrace: error on enabled probe ID 4 (ID 10200: fbt:genunix:closeandsetf:entry):
out of scratch space in action #2 at DIF offset 508

Unfortunately DTrace does not tell me what these drops are, so I don't
know which limit to tune (or why they are happening in the first place).

----------------------------------------------------------------------
#!/usr/sbin/dtrace -Cs

#pragma ident	$Id: openrace.d,v 1.2 2007/07/25 21:51:16 akolb Exp akolb $

#pragma D option aggsortrev
#pragma D option quiet
#pragma D option dynvarsize=4m

#include <sys/types.h>
#include <sys/file.h>

inline uf_info_t curufinfo = curthread->t_procp->p_user.u_finfo;
inline string curargs = curpsinfo->pr_psargs;

inline string f_name[int fd] =
    (fd >= 0 && fd < curthread->t_procp->p_user.u_finfo.fi_nfiles ?
        (curufinfo.fi_list[fd].uf_file ?
            (curufinfo.fi_list[fd].uf_file->f_vnode->v_path != NULL ?
                cleanpath(curufinfo.fi_list[fd].uf_file->f_vnode->v_path) :
                "<unknown>") :
            "<unknown>") :
        "<unknown>");

fbt::copen:entry
/curpsinfo->pr_projid == $1/
{
        self->opentime = walltimestamp;
        self->name = arg1;
        self->mode = !(args[2] & FREAD) ? 'W' :
            (args[2] == FREAD ? 'R' : 'M');
}

fbt::copen:return
/self->opentime && arg1 >= 0/
{
        printf("%d\t%c O %s %s %s\n", self->opentime, self->mode,
            copyinstr(self->name), cwd, curargs);
}

fbt::copen:return
/self->opentime/
{
        self->opentime = 0;
        self->name = 0;
        self->mode = 0;
}

fbt::closeandsetf:entry
/curpsinfo->pr_projid == $1/
{
        printf("%d\tC C %s %s %s\n", walltimestamp, f_name[arg0], cwd,
            curargs);
}
----------------------------------------------------------------------

- akolb
On Fri, Sep 21, 2007 at 07:36:20PM -0700, Alexander Kolbasov wrote:
> I am running a small script (attached) that timestamps file opens and
> closes.
>
> When I run it under real load I get tons of messages like
>
> dtrace: 79575 drops on CPU 12
> dtrace: 140709 drops on CPU 13
> dtrace: 125092 drops on CPU 14
> dtrace: 38129 drops on CPU 15
> ...
>
> and sometimes
>
> dtrace: error on enabled probe ID 4 (ID 10200: fbt:genunix:closeandsetf:entry):
> out of scratch space in action #2 at DIF offset 508
> dtrace: error on enabled probe ID 2 (ID 6227: fbt:genunix:copen:return): out of
> scratch space in action #3 at DIF offset 28
> dtrace: error on enabled probe ID 4 (ID 10200: fbt:genunix:closeandsetf:entry):
> out of scratch space in action #2 at DIF offset 508
> dtrace: error on enabled probe ID 4 (ID 10200: fbt:genunix:closeandsetf:entry):
> out of scratch space in action #2 at DIF offset 508
> dtrace: error on enabled probe ID 4 (ID 10200: fbt:genunix:closeandsetf:entry):
> out of scratch space in action #2 at DIF offset 508
>
> Unfortunately DTrace does not tell me what these drops are, so I don't
> know which limit to tune (or why they are happening in the first place).

Both of these messages are symptoms of the same problem: you're running
out of principal buffer space.  You can deal with this by either increasing
the buffer size (e.g. -x bufsize=20m), increasing the switch rate (e.g.
-x switchrate=123hz), or some combination of both.

	- Bryan

--------------------------------------------------------------------------
Bryan Cantrill, Solaris Kernel Development.       http://blogs.sun.com/bmc
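A sketch of how the two suggested knobs combine when invoking the attached
script -- 16m and 997hz are illustrative values, and the trailing 10 is only
a placeholder for the project ID the script expects as its $1 argument:

    # dtrace -x bufsize=16m -x switchrate=997hz -Cs openrace.d 10

The same tunables can also be set from within the script itself with
"#pragma D option bufsize=16m" and "#pragma D option switchrate=997hz".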
> > Unfortunately DTrace does not tell me what these drops are, so I don't
> > know which limit to tune (or why they are happening in the first place).
>
> Both of these messages are symptoms of the same problem: you're running
> out of principal buffer space.  You can deal with this by either increasing
> the buffer size (e.g. -x bufsize=20m), increasing the switch rate (e.g.
> -x switchrate=123hz), or some combination of both.

It seems that the largest bufsize I can get is 16m (otherwise I get

dtrace: could not enable tracing: File too large

message).  I tried a 16m bufsize with switchrates of 123, 257 and 1025hz
and I still see lots of drops.  Is there any other magic knob I can try?

- akolb
On Mon, Sep 24, 2007 at 04:51:24PM -0700, Alexander Kolbasov wrote:
> > > Unfortunately DTrace does not tell me what these drops are, so I don't
> > > know which limit to tune (or why they are happening in the first place).
> >
> > Both of these messages are symptoms of the same problem: you're running
> > out of principal buffer space.  You can deal with this by either increasing
> > the buffer size (e.g. -x bufsize=20m), increasing the switch rate (e.g.
> > -x switchrate=123hz), or some combination of both.
>
> It seems that the largest bufsize I can get is 16m (otherwise I get
>
> dtrace: could not enable tracing: File too large

That's a surprisingly low number.  Are there a tremendous number of CPUs
on this system?  (If this is a DEBUG kernel, try non-DEBUG -- kmem debugging
takes up a ton of wired-down memory.)

> message).  I tried a 16m bufsize with switchrates of 123, 257 and 1025hz
> and I still see lots of drops.  Is there any other magic knob I can try?

Are the numbers going down at least?  And is there any way that you can
use predicates and/or aggregations to prune or aggregate the data flow?

	- Bryan

--------------------------------------------------------------------------
Bryan Cantrill, Solaris Kernel Development.       http://blogs.sun.com/bmc
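One way to act on the aggregation suggestion is to replace the per-close
printf() with an aggregation, so only a periodic summary has to fit in the
principal buffer.  A minimal sketch, keeping the project-ID predicate from
the original script; the per-path key and the ten-second printa() interval
are illustrative choices, not anything prescribed in this thread:

    fbt::closeandsetf:entry
    /curpsinfo->pr_projid == $1/
    {
            /* count closes per file instead of emitting one record each */
            @closes[fds[arg0].fi_pathname] = count();
    }

    tick-10sec
    {
            printa("%-60s %@d\n", @closes);
            trunc(@closes);
    }

Aggregations are summarized per-CPU in the kernel, so they put far less
pressure on the principal buffer than per-event printf() records do, though
they also lose the per-event timestamps the original script wants.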
> On Mon, Sep 24, 2007 at 04:51:24PM -0700, Alexander Kolbasov wrote:
> > > > Unfortunately DTrace does not tell me what these drops are, so I don't
> > > > know which limit to tune (or why they are happening in the first place).
> > >
> > > Both of these messages are symptoms of the same problem: you're running
> > > out of principal buffer space.  You can deal with this by either increasing
> > > the buffer size (e.g. -x bufsize=20m), increasing the switch rate (e.g.
> > > -x switchrate=123hz), or some combination of both.
> >
> > It seems that the largest bufsize I can get is 16m (otherwise I get
> >
> > dtrace: could not enable tracing: File too large
>
> That's a surprisingly low number.  Are there a tremendous number of CPUs
> on this system?  (If this is a DEBUG kernel, try non-DEBUG -- kmem debugging
> takes up a ton of wired-down memory.)

This is a 16-CPU x86 system running non-DEBUG snv_73.

> > message).  I tried a 16m bufsize with switchrates of 123, 257 and 1025hz
> > and I still see lots of drops.  Is there any other magic knob I can try?
>
> Are the numbers going down at least?  And is there any way that you can
> use predicates and/or aggregations to prune or aggregate the data flow?

It is difficult to say whether they are going down.  It does seem that the
drops only happen when this probe is enabled:

fbt::closeandsetf:entry
/curpsinfo->pr_projid == $1/
{
        printf("%d\tC C %s %s %s\n", walltimestamp, fds[arg0].fi_pathname,
            cwd, curargs);
}

Once I comment out the printf the drops disappear.

I am trying to debug race conditions during an ON build, where I'd like to
get timestamps for when each file is opened and closed.  I can try using
global arrays to keep track of open files, but that seems even more prone
to issues.

- akolb
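A sketch of that global-array idea, under two assumptions that would need
checking: that copen()'s return value (arg1 in the return probe) really is
the new file descriptor, as the "arg1 >= 0" predicate in the original script
suggests, and that keying the array by (pid, fd) is sufficient.  Each entry
is freed at close time, since every live entry consumes dynvarsize space:

    fbt::copen:entry
    /curpsinfo->pr_projid == $1/
    {
            self->fname = arg1;     /* user address of the path argument */
    }

    fbt::copen:return
    /self->fname && arg1 >= 0/
    {
            /* remember the path of this (pid, fd) pair at open time */
            openpath[pid, arg1] = copyinstr(self->fname);
            printf("%d\tO %s\n", walltimestamp, openpath[pid, arg1]);
            self->fname = 0;
    }

    fbt::closeandsetf:entry
    /curpsinfo->pr_projid == $1 && openpath[pid, arg0] != ""/
    {
            printf("%d\tC %s\n", walltimestamp, openpath[pid, arg0]);
            openpath[pid, arg0] = 0;        /* free the dynamic variable */
    }

Because this stores one string per open file descriptor, it trades principal
buffer pressure for dynamic variable pressure, so dynvarsize may need to be
raised accordingly.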
Hi,

Same here when using the dvm agent and JDK 1.5: I am getting a lot of drops
on a Thinkpad T43.  Setting bufsize even up to the 128m limit does not help,
nor does specifying a switchrate; I have tried both options without success.

I am running Solaris Nevada snv_64a x86, and I am trying to trace, as an
example, JAlbum, a Java-based application.  This is a simple script:

dvm$target:::method-entry
{
        self->ts[copyinstr(arg0), copyinstr(arg1)] = vtimestamp;
}

dvm$target:::method-return
{
        @ts[copyinstr(arg0), copyinstr(arg1)] =
            sum(vtimestamp - self->ts[copyinstr(arg0), copyinstr(arg1)]);
}

$ dtrace -x bufsize=128m -s getime-methods.d -p 3447
dtrace: script 'getime-methods.d' matched 2 probes
dtrace: 2330 dynamic variable drops
dtrace: 7385 dynamic variable drops
dtrace: 9162 dynamic variable drops
dtrace: 10928 dynamic variable drops
dtrace: 10552 dynamic variable drops
dtrace: 6594 dynamic variable drops
dtrace: 2853 dynamic variable drops
^C

thanks,
stefan

--
This message posted from opensolaris.org
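These are dynamic variable drops rather than principal buffer drops, so
bufsize and switchrate do not address them.  Besides growing the dynamic
variable pool (see the follow-up below), the script above never frees its
self->ts entries, so each distinct (thread, class, method) tuple stays
allocated in the pool until it is explicitly set to zero.  A sketch of the
method-return clause that releases each entry once it has been consumed,
with a predicate that skips returns whose entry was never seen:

    dvm$target:::method-return
    /self->ts[copyinstr(arg0), copyinstr(arg1)]/
    {
            @ts[copyinstr(arg0), copyinstr(arg1)] =
                sum(vtimestamp - self->ts[copyinstr(arg0), copyinstr(arg1)]);
            /* free the dynamic variable so the pool can be reused */
            self->ts[copyinstr(arg0), copyinstr(arg1)] = 0;
    }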
Brrr, looks like I missed my coffee: I was talking about the dynamic
variable pool, so this is a bit off topic.  I increased it to 32m and the
script worked fine.

stefan

--
This message posted from opensolaris.org
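For reference, the dynamic variable pool is sized with the dynvarsize
option; 32m here simply mirrors the value reported to work above, and the
script name and pid are carried over from the earlier example:

    $ dtrace -x dynvarsize=32m -s getime-methods.d -p 3447

or, inside the script itself:

    #pragma D option dynvarsize=32m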
From what I understand on this thread:

http://forum.java.sun.com/thread.jspa?threadID=5133053&tstart=30

there is a hardcoded limit in DTrace of 16M for the buffer size when you
are not running as root.  Why, why, why??

--
This message posted from opensolaris.org
On Mon, Apr 21, 2008 at 12:33:47AM -0700, Lars Bruun-Hansen wrote:
> there is a hardcoded limit in DTrace of 16M for the buffer size when
> you are not running as root.  Why, why, why??

We don't want to let non-root users exhaust system resources.  A root user
can change this by modifying dtrace_nonroot_maxsize via 'mdb -k' or by
adding a set directive to /etc/system (check out system(4)).

Adam

--
Adam Leventhal, Fishworks                         http://blogs.sun.com/ahl
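A sketch of those two routes; the 0x4000000 (64 MB) value is only an
example, and the mdb write assumes a 64-bit kernel with the dtrace module
loaded, where dtrace_nonroot_maxsize is an 8-byte quantity.

In /etc/system (takes effect at the next reboot):

    set dtrace:dtrace_nonroot_maxsize = 0x4000000

or on the running kernel:

    # echo 'dtrace_nonroot_maxsize/Z 0x4000000' | mdb -kw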