Chris Debenham - RSD Engineer
2007-Jul-25 00:52 UTC
[dtrace-discuss] proc:::exit not firing
I have a funny issue whereby when a application exits the proc:::exit probe doesn''t seem to be firing. A customer has written a dtrace script that outputs some info whenever proc:::exit fires but on occasion (random, no pattern in behaviour) dtrace exits without outputting the data. They are running dtrace via ''dtrace -s exit_trace.d -p <pid>'' Can anyone think of any reason this should be so? From my reading of various documents, solaris internals and other spots it appears that proc:::exit should ALWAYS be fired regardless of how the process ends. Thanks Chris Debenham The script they are using is as follows: --- begin script --- proc:::signal-send /args[1]->pr_pid == $target/ { last_signal = args[2]; last_signal_sender_pid = pid; last_signal_sender_name = execname; } proc:::exit /pid == $target && args[0] == CLD_EXITED/ { printf( "%Y normal exit.", walltimestamp ); ustack(); } proc:::exit /pid == $target && args[0] == CLD_KILLED/ { printf( "%Y exited by signal. signal=%d, sent by=%s (%d)", walltimestamp, last_signal, last_signal_sender_name, last_signal_sender_pid ); } proc:::exit /pid == $target && args[0] == CLD_DUMPED/ { printf( "%Y core dump", walltimestamp ); ustack(); } proc:::exit /pid == $target && args[0] != CLD_EXITED && args[0] != CLD_KILLED && args[0] != CLD_CONTINUED && args[0] != CLD_DUMPED/ { printf( "%Y stop or cont. status = %d", walltimestamp, args[0] ); } --- end script --- -- ------------------------------------------------------------------------------- ,-_|\ TSC Engineer E-mail : debenham at sun.com / \ Sun Microsystems Australia Pty Ltd. Direct : +61 (2) 9844 5188 \_,-\_* 828 Pacific Highway Phone : +61 (2) 9844 5000 v Gordon, N.S.W. 2072 Fax : +61 (2) 9844 5189 Mobile : +61 (40) 9844 514 ------------------------------------------------------------------------------- Don''t Worry About Life; You''re Not Going To Survive It, Anyway.
Chris, Here''s what I think might be happening. The process specified as the -p argument fires the exit probe and the record of that goes in the switch buffer. Then it becomes a race as to which happens first: the script exits because its "process of interest" exited, or the probe clause executes when it sees the record in the buffer. By default, DTrace uses a switch buffer (one set per CPU). One buffer is active and being written to by firing probes, while the other is inactive and be read from by the DTrace consumer. By default the switch from active to inactive happens once a second, so the exit probe could be recorded a full second before the D script would get a chance to see it. You could reduce the switch time, but that just decreases the odds of a mishap. To see if this is the problem, try making the process of interest a regular argument rather than a "-p" argument. Then you refer to it as $1 rather than $target. That will prevent the script from automatically exiting when the process exits, but you could manually exit yourself in the last proc:::exit clause. It''s only a theory. ;-) Chip Chris Debenham - RSD Engineer wrote:> I have a funny issue whereby when a application exits the proc:::exit > probe doesn''t seem to be firing. > A customer has written a dtrace script that outputs some info whenever > proc:::exit fires but on occasion (random, no pattern in behaviour) > dtrace exits without outputting the data. > > They are running dtrace via ''dtrace -s exit_trace.d -p <pid>'' > > Can anyone think of any reason this should be so? > From my reading of various documents, solaris internals and other spots > it appears that proc:::exit should ALWAYS be fired regardless of how the > process ends. > > Thanks > Chris Debenham > > The script they are using is as follows: > > --- begin script --- > proc:::signal-send > /args[1]->pr_pid == $target/ > { > last_signal = args[2]; > last_signal_sender_pid = pid; > last_signal_sender_name = execname; > } > > proc:::exit > /pid == $target && args[0] == CLD_EXITED/ > { > printf( "%Y normal exit.", walltimestamp ); > ustack(); > } > > proc:::exit > /pid == $target && args[0] == CLD_KILLED/ > { > printf( "%Y exited by signal. signal=%d, sent by=%s (%d)", > walltimestamp, > last_signal, > last_signal_sender_name, > last_signal_sender_pid ); > } > > proc:::exit > /pid == $target && args[0] == CLD_DUMPED/ > { > printf( "%Y core dump", walltimestamp ); > ustack(); > } > > proc:::exit > /pid == $target && args[0] != CLD_EXITED && args[0] != CLD_KILLED && > args[0] != CLD_CONTINUED && args[0] != CLD_DUMPED/ > { > printf( "%Y stop or cont. status = %d", walltimestamp, args[0] ); > } > > --- end script --- > >
Chris, As I slept on it, I realized that there was one part of the explanation below that''s not quite accurate. Most of the guts of a probe clause runs in the kernel when the probe fires. The printf actions dictate the layout and length of the record that gets placed in the switch buffer. But this doesn''t really change the idea that the your DTrace process on the user side is probably exiting before it gets a chance to get the proc:::exit record out of the buffer. Chip Chip Bennett wrote:> Chris, > > Here''s what I think might be happening. The process specified as the -p > argument fires the exit probe and the record of that goes in the switch > buffer. Then it becomes a race as to which happens first: the script > exits because its "process of interest" exited, or the probe clause > executes when it sees the record in the buffer. > > By default, DTrace uses a switch buffer (one set per CPU). One buffer > is active and being written to by firing probes, while the other is > inactive and be read from by the DTrace consumer. By default the switch > from active to inactive happens once a second, so the exit probe could > be recorded a full second before the D script would get a chance to see it. > > You could reduce the switch time, but that just decreases the odds of a > mishap. To see if this is the problem, try making the process of > interest a regular argument rather than a "-p" argument. Then you refer > to it as $1 rather than $target. That will prevent the script from > automatically exiting when the process exits, but you could manually > exit yourself in the last proc:::exit clause. > > It''s only a theory. ;-) > > Chip > > Chris Debenham - RSD Engineer wrote: > >> I have a funny issue whereby when a application exits the proc:::exit >> probe doesn''t seem to be firing. >> A customer has written a dtrace script that outputs some info whenever >> proc:::exit fires but on occasion (random, no pattern in behaviour) >> dtrace exits without outputting the data. >> >> They are running dtrace via ''dtrace -s exit_trace.d -p <pid>'' >> >> Can anyone think of any reason this should be so? >> From my reading of various documents, solaris internals and other spots >> it appears that proc:::exit should ALWAYS be fired regardless of how the >> process ends. >> >> Thanks >> Chris Debenham >> >> The script they are using is as follows: >> >> --- begin script --- >> proc:::signal-send >> /args[1]->pr_pid == $target/ >> { >> last_signal = args[2]; >> last_signal_sender_pid = pid; >> last_signal_sender_name = execname; >> } >> >> proc:::exit >> /pid == $target && args[0] == CLD_EXITED/ >> { >> printf( "%Y normal exit.", walltimestamp ); >> ustack(); >> } >> >> proc:::exit >> /pid == $target && args[0] == CLD_KILLED/ >> { >> printf( "%Y exited by signal. signal=%d, sent by=%s (%d)", >> walltimestamp, >> last_signal, >> last_signal_sender_name, >> last_signal_sender_pid ); >> } >> >> proc:::exit >> /pid == $target && args[0] == CLD_DUMPED/ >> { >> printf( "%Y core dump", walltimestamp ); >> ustack(); >> } >> >> proc:::exit >> /pid == $target && args[0] != CLD_EXITED && args[0] != CLD_KILLED && >> args[0] != CLD_CONTINUED && args[0] != CLD_DUMPED/ >> { >> printf( "%Y stop or cont. status = %d", walltimestamp, args[0] ); >> } >> >> --- end script --- >> >> >> > > _______________________________________________ > dtrace-discuss mailing list > dtrace-discuss at opensolaris.org >