Hi,

On a T5220, using speculative tracing causes a significant increase in system load. I am using the examples from

http://wikis.sun.com/display/DTrace/Speculative+Tracing

with the fork system call traced instead of open64. Can that script cause such a load? The system itself, without the script running, is moderately loaded (usr+sys ~30-40%).

Also, I had to bump the number of speculative buffers up to 10 to avoid some of the warnings. What is a good rule for setting the number of speculative buffers? In this case, is it based on the number of fork calls that could be traced simultaneously?

Thanks
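For reference, the number of speculative buffers is controlled by DTrace's nspec option (it defaults to 1), and the size of each buffer by specsize. A minimal sketch of raising it inside a script; the value 16 is only an example, not a recommendation:

#pragma D option nspec=16

or, equivalently, when launching the script:

dtrace -x nspec=16 -s specopen.d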
Which example are you using? specopen.d (the script that instruments every fbt probe)? Please post it, or be more precise about which script you're using.

If you're using specopen.d, then you're enabling on the order of 30,000 probes. That's going to add up, even at the very reasonable cost of about 1 microsecond per probe (rough approximation).

In general, the cost of using DTrace in terms of probe effect is directly proportional to two things:

- the number of probes you enable;
- the activity of the code path of the enabled probe(s).

In other words, if you instrument the fork system call and the system is not executing forks, you won't feel a thing. If, on the other hand, the system is doing 1000 forks per second, you'll feel it.

The specopen.d script enables all the fbt probes. Even with the predicate ( / self->spec / ), the probe still has to fire before the predicate gets evaluated.

HTH,
/jim
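To get a feel for how large that probe count is on a given system, listing the fbt provider's probes works (the output includes a one-line header, so the count is off by one):

# dtrace -l -P fbt | wc -l

By comparison, a script restricted to just the fork-related syscall entry and return probes would enable only a handful of probes.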
Counting system calls per process during this interval, dtrace came out on top:

  ioctl     dtrace     10609

I am not sure if that is from the speculative dtrace script or from the script used to count the system calls.

Thanks
I'm sorry, but I am unable to parse this. What is the question here?

Thanks,
/jim
Ah, OK - I think I get it.

tester wrote:
> Counting system calls per process during this interval, dtrace came out on top:
>
>   ioctl     dtrace     10609

Got it. DTrace executed 10,609 system calls during your sampling period, more than any other process. I often filter dtrace out with a predicate:

  / execname != "dtrace" /

> I am not sure if that is from the speculative dtrace script or from the script used to count the system calls.

Both. If you want to know precisely which one, you can use pid as an aggregation key in the count aggregation and correlate the pid to the script you're starting. I'm not sure what you're chasing here, though...

/jim
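For what it's worth, a minimal sketch of that kind of counting script; the choice of aggregation keys (pid, execname, syscall name) is an assumption here, not necessarily what the original counting script used:

#!/usr/sbin/dtrace -s

#pragma D option quiet

/* Count system calls per process, leaving dtrace itself out. */
syscall:::entry
/ execname != "dtrace" /
{
        @calls[pid, execname, probefunc] = count();
}

/* Print the per-pid, per-syscall totals on exit (Ctrl-C). */
dtrace:::END
{
        printa("%6d  %-16s  %-16s  %@d\n", @calls);
}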
Jim,

Thanks. You are right, I was using specopen.d, but looking for fork errors instead of open errors. I did not know that the probe has to fire before the predicate gets evaluated; the 40% increase in load during tracing now makes sense.

I would like to see the code path during a fork failure (and ultimately the reason for it). Do you have any suggestions for doing that without incurring a significant overhead? We have an application that forks around 1000+ processes in 5-10 seconds, and occasionally the fork fails. I think it is from lack of VM, but I also have a feeling that there are some hardcoded limits in the kernel. There was plenty of RAM available during the fork failure.

Thanks
On 06/02/09 18:30, tester wrote:
> I would like to see the code path during a fork failure (and ultimately the
> reason for it). Do you have any suggestions for doing that without incurring
> a significant overhead?

I'd suggest you look at speculations - commit the speculation on fork failure, discard it on success.

HTH
Michael
--
Michael Schuster        http://blogs.sun.com/recursion
Recursion, n.: see 'Recursion'
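As a rough illustration of that suggestion, below is a sketch of specopen.d reworked for the fork path. The fork probe names used (fork*, vfork; the exact syscalls vary by release), the restriction of fbt to the genunix module (a guess aimed at cutting the probe count, at the price of missing calls into other modules), and the nspec value are all assumptions; this is not a tested script.

#!/usr/sbin/dtrace -Fs

#pragma D option nspec=32

syscall::fork*:entry,
syscall::vfork:entry
{
        /* Start a speculation for this fork attempt. */
        self->spec = speculation();
        speculate(self->spec);
        printf("%d %s forking", pid, execname);
}

fbt:genunix::
/self->spec/
{
        /* Speculatively trace the kernel code path taken by the fork. */
        speculate(self->spec);
}

syscall::fork*:return,
syscall::vfork:return
/self->spec && errno != 0/
{
        /* The fork failed: keep the speculated trace. */
        commit(self->spec);
        self->spec = 0;
}

syscall::fork*:return,
syscall::vfork:return
/self->spec && errno == 0/
{
        /* The fork succeeded: throw the speculated trace away. */
        discard(self->spec);
        self->spec = 0;
}

Even restricted to genunix this still enables several thousand fbt probes, so the probe-effect point made earlier in the thread still applies.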
Hi (ummm, Tester?) -

First and foremost, what's the errno on the fork failure? 99% of the time, the errno information is enough to figure out why forks are failing.

Second, make sure you look in /var/adm/messages - if fork is failing because of a system resource issue, you'll often get a syslog message that tells you what's wrong.

You really should not need to generate a kernel function call trace to root-cause the failure of any system call, including fork. It's like using an MRI and studying the resulting film when the problem is a small cut that just needs some disinfectant and a band-aid. You're looking much deeper than you need to.

Thanks,
/jim
Michael,

Thanks. I think that's what the script from wikis.sun.com (specopen.d) does. Did I misinterpret your suggestion?

Thanks
Hi Jim,

The app software doesn't log an errno (bad software, although from a leading vendor; I think they inherited it), but an error string says "not enough space". I tried grepping some of the header files but could not find a match.

/var/adm/messages: that's the first thing I looked at, but no help there either.

I tried to match the error string in src.opensolaris.org; some hits turned up, but I wasn't confident they were a result of this.

Thank you.
Try this:

#!/usr/sbin/dtrace -s

#pragma D option quiet

extern int errno;

syscall::forkall:return,
syscall::vfork:return,
syscall::forksys:return,
syscall::fork1:return
/ arg0 == -1 || arg1 == -1 /
{
        printf("FORK FAILED, errno: %d, arg0: %d, arg1: %d\n", errno, arg0, arg1);
}
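If it also helps to see which process hit the failure, a small variation (same probes; the pid and execname fields are an addition, not part of the script above):

syscall::forkall:return,
syscall::vfork:return,
syscall::forksys:return,
syscall::fork1:return
/ arg0 == -1 || arg1 == -1 /
{
        printf("FORK FAILED in pid %d (%s), errno: %d\n", pid, execname, errno);
}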
Thanks Jim. Will use this during the next testing window.
"not enough space" indicates an errno 28 ENOSPC, which isn''t listed is the fork man page under ERRORS. Are you sure it''s fork(2) that''s failing? It may be errno 12, ENOMEM. So what does a general memory health profile of the system look like? Lots of free memory? Plenty of swap space? How about the process executing the forks - is it a 32-bit or 64-bit process? If 32-bit, is it hitting the limits of it''s address space? /jim tester wrote:> Hi Jim, > > The app software doesn''t poduce a errno in its logs (bad software, although from a leading vendor, I think they inherited it, but a error string says "not enough space" I tried grepping some of the header files but could not find a match. > > /var/adm/messages: that''s the first thing I looked at, but no help there either. > > I tried to match the error string in src.opensolaris.org, some hits turned up but I wasn''t confident they were a result of this. > > Thank you. >
D'oh! Disregard that last question (address space) - my brain was thinking thread-create failures; it's not applicable to fork failures. My bad.

The system memory and swap space health checks still apply, as well as the process count - grab some "sar -v 1 60" samples...

/jim
Jim,

Thanks. Here is a snapshot of "vmstat 5" output during the failure. I am not sure how this will be formatted on the forum.

 kthr     memory              page             disk         faults        cpu
 r b w   swap     free     re  mf    pi po fr de sr rm s0 s1 s2  in    sy     cs    us sy id
 0 0 0   62386728 56443136 338 3941  0  0  0  0  0  0  0  0  0   19267 85670  37553 10  7 83
 0 0 0   56037624 55132248  52 13591 0  0  0  0  0  0  0  0  0    3674 21572   1283 30 46 24
 1 0 0   34962096 53029088  16 11773 0  0  0  0  0  0  0  0  0    4725 16248   1491 13 78  9
 1 0 0   22086880 51409376  15  8730 0  0  0  0  0  0  0  0  0    4976 14733   1741  9 76 16
 2 0 0   12591624 50035952  81  7687 0  0  0  0  0  0  0  0  0    4551 22864   1684  7 68 25
 2 0 0    4891536 48774296  33  6575 0  0  0  0  0  0  0  0  0    3883 25456   2479  6 57 37
 400 0 0  2664312 48843472  12  3134 0  0  0  0  0  0  0  0  0    4154  9433   2123  3 35 62  <== run queue spike
 143 0 0  9577832 49808640  31   662 0  0  0  0  0  0  0  0  0    8319  8140   1282  1 95  4
 0 0 0    9863824 49818808  64  5725 0  0  0  0  0  0  0  0  0   12599 41575  24975  7 25 69
 0 0 0    8854032 49553696  16    86 0  0  0  0  0  0  0  0  0    5499 114000 12349  5 35 60

The run queue is bombarded (r = 400 in the marked row).

Here is the progression of the number of application processes running, sampled every 5 seconds:

 5 8 7 7 7 7 21 51 55 132 622 896 1121 1293 1142

Thank you