I am running a small script (attached) that timestamps file opens and
closes.

When I run it under real load I get tons of messages like

dtrace: 79575 drops on CPU 12
dtrace: 140709 drops on CPU 13
dtrace: 125092 drops on CPU 14
dtrace: 38129 drops on CPU 15
...

and sometimes

dtrace: error on enabled probe ID 4 (ID 10200: fbt:genunix:closeandsetf:entry):
out of scratch space in action #2 at DIF offset 508
dtrace: error on enabled probe ID 2 (ID 6227: fbt:genunix:copen:return): out of
scratch space in action #3 at DIF offset 28
dtrace: error on enabled probe ID 4 (ID 10200: fbt:genunix:closeandsetf:entry):
out of scratch space in action #2 at DIF offset 508
dtrace: error on enabled probe ID 4 (ID 10200: fbt:genunix:closeandsetf:entry):
out of scratch space in action #2 at DIF offset 508
dtrace: error on enabled probe ID 4 (ID 10200: fbt:genunix:closeandsetf:entry):
out of scratch space in action #2 at DIF offset 508

Unfortunately DTrace does not tell me what these drops are, so I don't
know which limit to tune (or why they are happening in the first place).

----------------------------------------------------------------------
#!/usr/sbin/dtrace -Cs

#pragma ident	$Id: openrace.d,v 1.2 2007/07/25 21:51:16 akolb Exp akolb $

#pragma D option aggsortrev
#pragma D option quiet
#pragma D option dynvarsize=4m

#include <sys/types.h>
#include <sys/file.h>

inline uf_info_t curufinfo = curthread->t_procp->p_user.u_finfo;
inline string curargs = curpsinfo->pr_psargs;

inline string f_name[int fd] =
    (fd >= 0 && fd < curthread->t_procp->p_user.u_finfo.fi_nfiles ?
        (curufinfo.fi_list[fd].uf_file ?
            (curufinfo.fi_list[fd].uf_file->f_vnode->v_path != NULL ?
                cleanpath(curufinfo.fi_list[fd].uf_file->f_vnode->v_path) :
                "<unknown>") :
            "<unknown>") :
        "<unknown>");

fbt::copen:entry
/curpsinfo->pr_projid == $1/
{
        self->opentime = walltimestamp;
        self->name = arg1;
        self->mode = !(args[2] & FREAD) ? 'W' :
            (args[2] == FREAD ? 'R' : 'M');
}

fbt::copen:return
/self->opentime && arg1 >= 0/
{
        printf("%d\t%c O %s %s %s\n", self->opentime, self->mode,
            copyinstr(self->name), cwd, curargs);
}

fbt::copen:return
/self->opentime/
{
        self->opentime = 0;
        self->name = 0;
        self->mode = 0;
}

fbt::closeandsetf:entry
/curpsinfo->pr_projid == $1/
{
        printf("%d\tC C %s %s %s\n", walltimestamp, f_name[arg0], cwd,
            curargs);
}
----------------------------------------------------------------------

- akolb
On Fri, Sep 21, 2007 at 07:36:20PM -0700, Alexander Kolbasov wrote:
> I am running a small script (attached) that timestamps file opens and
> closes.
>
> When I run it under real load I get tons of messages like
>
> dtrace: 79575 drops on CPU 12
> dtrace: 140709 drops on CPU 13
> dtrace: 125092 drops on CPU 14
> dtrace: 38129 drops on CPU 15
> ...
>
> and sometimes
>
> dtrace: error on enabled probe ID 4 (ID 10200: fbt:genunix:closeandsetf:entry):
> out of scratch space in action #2 at DIF offset 508
> dtrace: error on enabled probe ID 2 (ID 6227: fbt:genunix:copen:return): out of
> scratch space in action #3 at DIF offset 28
> dtrace: error on enabled probe ID 4 (ID 10200: fbt:genunix:closeandsetf:entry):
> out of scratch space in action #2 at DIF offset 508
> dtrace: error on enabled probe ID 4 (ID 10200: fbt:genunix:closeandsetf:entry):
> out of scratch space in action #2 at DIF offset 508
> dtrace: error on enabled probe ID 4 (ID 10200: fbt:genunix:closeandsetf:entry):
> out of scratch space in action #2 at DIF offset 508
>
> Unfortunately DTrace does not tell me what these drops are, so I don't
> know which limit to tune (or why they are happening in the first place).

Both of these messages are symptoms of the same problem: you're running
out of principal buffer space.  You can deal with this by either increasing
the buffer size (e.g. -x bufsize=20m), increasing the switch rate (e.g.
-x switchrate=123hz), or some combination of both.

	- Bryan

--------------------------------------------------------------------------
Bryan Cantrill, Solaris Kernel Development.       http://blogs.sun.com/bmc
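A sketch of how the two suggested knobs combine when invoking the attached
script -- 16m and 997hz are illustrative values, and the trailing 10 is only
a placeholder for the project ID the script expects as its $1 argument:

    # dtrace -x bufsize=16m -x switchrate=997hz -Cs openrace.d 10

The same tunables can also be set from within the script itself with
"#pragma D option bufsize=16m" and "#pragma D option switchrate=997hz".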
> > Unfortunately DTrace does not tell me what these drops are, so I don't
> > know which limit to tune (or why they are happening in the first place).
>
> Both of these messages are symptoms of the same problem: you're running
> out of principal buffer space.  You can deal with this by either increasing
> the buffer size (e.g. -x bufsize=20m), increasing the switch rate (e.g.
> -x switchrate=123hz), or some combination of both.

It seems that the largest bufsize I can get is 16m (otherwise I get

dtrace: could not enable tracing: File too large

message).  I tried a 16m bufsize with switchrates of 123, 257 and 1025hz
and I still see lots of drops.  Is there any other magic knob I can try?

- akolb
On Mon, Sep 24, 2007 at 04:51:24PM -0700, Alexander Kolbasov wrote:
> > > Unfortunately DTrace does not tell me what these drops are, so I don't
> > > know which limit to tune (or why they are happening in the first place).
> >
> > Both of these messages are symptoms of the same problem: you're running
> > out of principal buffer space.  You can deal with this by either increasing
> > the buffer size (e.g. -x bufsize=20m), increasing the switch rate (e.g.
> > -x switchrate=123hz), or some combination of both.
>
> It seems that the largest bufsize I can get is 16m (otherwise I get
>
> dtrace: could not enable tracing: File too large

That's a surprisingly low number.  Are there a tremendous number of CPUs
on this system?  (If this is a DEBUG kernel, try non-DEBUG -- kmem debugging
takes up a ton of wired-down memory.)

> message).  I tried a 16m bufsize with switchrates of 123, 257 and 1025hz
> and I still see lots of drops.  Is there any other magic knob I can try?

Are the numbers going down at least?  And is there any way that you can
use predicates and/or aggregations to prune or aggregate the data flow?

	- Bryan

--------------------------------------------------------------------------
Bryan Cantrill, Solaris Kernel Development.       http://blogs.sun.com/bmc
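One way to act on the aggregation suggestion is to replace the per-close
printf() with an aggregation, so only a periodic summary has to fit in the
principal buffer.  A minimal sketch, keeping the project-ID predicate from
the original script; the per-path key and the ten-second printa() interval
are illustrative choices, not anything prescribed in this thread:

    fbt::closeandsetf:entry
    /curpsinfo->pr_projid == $1/
    {
            /* count closes per file instead of emitting one record each */
            @closes[fds[arg0].fi_pathname] = count();
    }

    tick-10sec
    {
            printa("%-60s %@d\n", @closes);
            trunc(@closes);
    }

Aggregations are summarized per-CPU in the kernel, so they put far less
pressure on the principal buffer than per-event printf() records do, though
they also lose the per-event timestamps the original script wants.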
> On Mon, Sep 24, 2007 at 04:51:24PM -0700, Alexander Kolbasov wrote:
> > > > Unfortunately DTrace does not tell me what these drops are, so I don't
> > > > know which limit to tune (or why they are happening in the first place).
> > >
> > > Both of these messages are symptoms of the same problem: you're running
> > > out of principal buffer space.  You can deal with this by either increasing
> > > the buffer size (e.g. -x bufsize=20m), increasing the switch rate (e.g.
> > > -x switchrate=123hz), or some combination of both.
> >
> > It seems that the largest bufsize I can get is 16m (otherwise I get
> >
> > dtrace: could not enable tracing: File too large
>
> That's a surprisingly low number.  Are there a tremendous number of CPUs
> on this system?  (If this is a DEBUG kernel, try non-DEBUG -- kmem debugging
> takes up a ton of wired-down memory.)

This is a 16-CPU x86 system running non-DEBUG snv_73.

> > message).  I tried a 16m bufsize with switchrates of 123, 257 and 1025hz
> > and I still see lots of drops.  Is there any other magic knob I can try?
>
> Are the numbers going down at least?  And is there any way that you can
> use predicates and/or aggregations to prune or aggregate the data flow?

It is difficult to say whether they are going down.  It does seem that the
drops only happen when this probe is enabled:

fbt::closeandsetf:entry
/curpsinfo->pr_projid == $1/
{
        printf("%d\tC C %s %s %s\n", walltimestamp, fds[arg0].fi_pathname,
            cwd, curargs);
}

Once I comment out the printf the drops disappear.

I am trying to debug race conditions during an ON build, where I'd like to
get timestamps for when each file is opened and closed.  I can try using
global arrays to keep track of open files, but that seems even more prone
to issues.

- akolb
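A sketch of that global-array idea, under two assumptions that would need
checking: that copen()'s return value (arg1 in the return probe) really is
the new file descriptor, as the "arg1 >= 0" predicate in the original script
suggests, and that keying the array by (pid, fd) is sufficient.  Each entry
is freed at close time, since every live entry consumes dynvarsize space:

    fbt::copen:entry
    /curpsinfo->pr_projid == $1/
    {
            self->fname = arg1;     /* user address of the path argument */
    }

    fbt::copen:return
    /self->fname && arg1 >= 0/
    {
            /* remember the path of this (pid, fd) pair at open time */
            openpath[pid, arg1] = copyinstr(self->fname);
            printf("%d\tO %s\n", walltimestamp, openpath[pid, arg1]);
            self->fname = 0;
    }

    fbt::closeandsetf:entry
    /curpsinfo->pr_projid == $1 && openpath[pid, arg0] != ""/
    {
            printf("%d\tC %s\n", walltimestamp, openpath[pid, arg0]);
            openpath[pid, arg0] = 0;        /* free the dynamic variable */
    }

Because this stores one string per open file descriptor, it trades principal
buffer pressure for dynamic variable pressure, so dynvarsize may need to be
raised accordingly.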
Hi,

Same here when using the dvm agent and JDK 1.5: I am getting a lot of drops
on a Thinkpad T43.  Setting bufsize even up to the 128m limit does not help,
nor does specifying a switchrate; I have tried both options without success.

I am running Solaris Nevada snv_64a x86, and I am trying to trace, as an
example, JAlbum, a Java-based application.  This is a simple script:

dvm$target:::method-entry
{
        self->ts[copyinstr(arg0), copyinstr(arg1)] = vtimestamp;
}

dvm$target:::method-return
{
        @ts[copyinstr(arg0), copyinstr(arg1)] =
            sum(vtimestamp - self->ts[copyinstr(arg0), copyinstr(arg1)]);
}

$ dtrace -x bufsize=128m -s getime-methods.d -p 3447
dtrace: script 'getime-methods.d' matched 2 probes
dtrace: 2330 dynamic variable drops
dtrace: 7385 dynamic variable drops
dtrace: 9162 dynamic variable drops
dtrace: 10928 dynamic variable drops
dtrace: 10552 dynamic variable drops
dtrace: 6594 dynamic variable drops
dtrace: 2853 dynamic variable drops
^C

thanks,
stefan

--
This message posted from opensolaris.org
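These are dynamic variable drops rather than principal buffer drops, so
bufsize and switchrate do not address them.  Besides growing the dynamic
variable pool (see the follow-up below), the script above never frees its
self->ts entries, so each distinct (thread, class, method) tuple stays
allocated in the pool until it is explicitly set to zero.  A sketch of the
method-return clause that releases each entry once it has been consumed,
with a predicate that skips returns whose entry was never seen:

    dvm$target:::method-return
    /self->ts[copyinstr(arg0), copyinstr(arg1)]/
    {
            @ts[copyinstr(arg0), copyinstr(arg1)] =
                sum(vtimestamp - self->ts[copyinstr(arg0), copyinstr(arg1)]);
            /* free the dynamic variable so the pool can be reused */
            self->ts[copyinstr(arg0), copyinstr(arg1)] = 0;
    }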
Brrr, looks like I missed my coffee: I was talking about the dynamic
variable pool, so this is a bit off topic.  I increased it to 32m and the
script worked fine.

stefan

--
This message posted from opensolaris.org
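For reference, the dynamic variable pool is sized with the dynvarsize
option; 32m here simply mirrors the value reported to work above, and the
script name and pid are carried over from the earlier example:

    $ dtrace -x dynvarsize=32m -s getime-methods.d -p 3447

or, inside the script itself:

    #pragma D option dynvarsize=32m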
From what I understand on this thread:

http://forum.java.sun.com/thread.jspa?threadID=5133053&tstart=30

there is a hardcoded limit in DTrace of 16M for the buffer size when you
are not running as root.  Why, why, why??

--
This message posted from opensolaris.org
On Mon, Apr 21, 2008 at 12:33:47AM -0700, Lars Bruun-Hansen wrote:
> there is a hardcoded limit in DTrace of 16M for the buffer size when
> you are not running as root.  Why, why, why??

We don't want to let non-root users exhaust system resources.  A root user
can change this by modifying dtrace_nonroot_maxsize via 'mdb -k' or by
adding a set directive to /etc/system (check out system(4)).

Adam

--
Adam Leventhal, Fishworks                         http://blogs.sun.com/ahl
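A sketch of those two routes; the 0x4000000 (64 MB) value is only an
example, and the mdb write assumes a 64-bit kernel with the dtrace module
loaded, where dtrace_nonroot_maxsize is an 8-byte quantity.

In /etc/system (takes effect at the next reboot):

    set dtrace:dtrace_nonroot_maxsize = 0x4000000

or on the running kernel:

    # echo 'dtrace_nonroot_maxsize/Z 0x4000000' | mdb -kw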