Just to let people know what my big picture is, I'm trying to write a script that will let me run a program, and name a progeny of that program that I want to debug. My script should find the first occurrence of that progeny, and run it until it finishes initializing the runtime linker, but stop it before it runs any shared library startup routines. (Failing that, I'd be okay with stopping it at main())

Anyway, my latest glitch looks like this. When I attach dtrace to a process that's being waited on by another process, and then call prun inside that dtrace script, my victim process gets a trace trap and core dumps.

If I continue the same stopped process using "prun" from the command line, all is fine.

Any ideas? (This is S10u1)

dtrace2='
BEGIN
{ printf("resuming %d\n", $target); system("/bin/prun %d\n",$target); }
'
dtrace -wqn "$dtrace2" -p $mypid

--chris
Chris,

I just tried running "yes > /dev/null" in background, did a "kill -STOP" on the pid, and then ran your DTrace script against it, and it worked fine. I wonder if the problem could be a function of the program that is being stopped, or where it's being stopped. Can you try what I did, and see if it works on your system? (BTW, I tried it on Sol 10 3/05, which could also be a difference.)

Chip
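The check Chip describes could be written out roughly as follows. This is only a sketch: the variable names (victim, resume_prog) are made up for illustration, the D program is the one from the original post, and it assumes a Solaris 10 system where dtrace and /bin/prun exist and the caller has the privileges needed for destructive actions (-w). Where dtrace is not installed, the attach step is skipped.

```shell
#!/bin/sh
# Sketch of the reproduction attempt; names here are illustrative only.

# Start a throwaway victim process in the background and stop it.
yes > /dev/null &
victim=$!
kill -STOP "$victim"

# D program from the original post: resume the target from the BEGIN clause.
resume_prog='
BEGIN
{ printf("resuming %d\n", $target); system("/bin/prun %d\n", $target); }
'

# Attach only where dtrace is available (assumed: Solaris 10, enough privilege).
# Note that dtrace keeps tracing until it is interrupted or the victim exits.
if command -v dtrace > /dev/null 2>&1; then
    dtrace -wqn "$resume_prog" -p "$victim"
else
    echo "dtrace not found; skipping the attach step"
fi

# Clean up the victim so no stopped process is left behind.
kill -KILL "$victim" 2> /dev/null
wait "$victim" 2> /dev/null
```

Running pstack or pflags on the victim before the cleanup step would show whether it was resumed normally or died with a trace trap.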
Thanks for the prodding, Chip. ;-)

My overall goal is to get a specific process advanced to the point right before it runs the shared library init sections, and then hand it off to a debugger.

It looks like dtrace will automatically resume the target process if it is stopped via /proc. The trace trap only happens if I try to resume the target process using prun in the BEGIN clause. But when I let dtrace resume the process, it looks like the stop() command is not being treated synchronously.

Here's my all-in-one script to demo the problem.

-------------------
#!/bin/ksh -v

# I get the same results using both 'sh' and 'ksh'
# for the 'qqscript' script.

rm -f qqscript
cat > qqscript << EOM
#!/bin/sh
yes > /dev/null
EOM

chmod +x qqscript

dtrace='
syscall::exec*:return
/progenyof($target) && execname == "yes"/
{ printf("%d\n",pid); stop(); exit(0); }
'

rm -f /tmp/qqpid
dtrace -wqn "$dtrace" -c ./qqscript > /tmp/qqpid

mypid=$(cat /tmp/qqpid)
echo PID = $mypid

if [ "$mypid" == "" ]; then
    echo "Process ($pname) not found."
    exit 1;
fi

echo stack1 of $mypid is:
pstack $mypid

#
# If I comment out the BEGIN/prun part, then the victim
# process still gets resumed (automatically?) by dtrace,
# but the rtld_db_preinit probe doesn't fire.  Very strange.
# If I leave in the prun command, then the 'yes' command
# inside the above shell script returns a trace/trap error
# to the ksh interpreter running the 'qqscript' script.

dtrace2='

/*
BEGIN
{ printf("resuming %d\n", $target); system("/bin/prun %d\n",$target); }
*/

pid$target::main:entry
{ printf("stopping %d\n",$target); stop(); exit(0); }
'

dtrace -wqn "$dtrace2" -p $mypid

echo stack2 of $mypid is:
pstack $mypid
echo flags of $mypid is:
pflags $mypid
I think you're right about continuing the process being automatic. The DTrace guide says that DTrace "grabs" the process identified with "-p", and the DTrace code leads to a call to "Pgrab" in libproc, which seemed to be doing a lot more than just acknowledging the existence of the process.

In the second DTrace script, instead of grabbing the process with "-p" I just passed it in as an argument. I think this gets closer to the results you're looking for, but the stack trace still doesn't look right. (BTW, I'm not following what you said about the rtld_db_preinit probe not firing. Is this connected with "pid$target::main:entry"?)

Here's the script I tried:
---------------------------
#!/bin/ksh -v

# I get the same results using both 'sh' and 'ksh'
# for the 'qqscript' script.

rm -f qqscript
cat > qqscript << EOM
#!/bin/sh
yes > /dev/null
EOM

chmod +x qqscript

dtrace='
syscall::exec*:return
/progenyof($target) && execname == "yes"/
{ printf("%d\n",pid); stop(); exit(0); }
'

rm -f /tmp/qqpid
dtrace -wqn "$dtrace" -c ./qqscript > /tmp/qqpid

mypid=$(cat /tmp/qqpid)
echo PID = $mypid

if [ "$mypid" == "" ]; then
    echo "Process ($pname) not found."
    exit 1;
fi

echo stack1 of $mypid is:
pstack $mypid

#
# If I comment out the BEGIN/prun part, then the victim
# process still gets resumed (automatically?) by dtrace,
# but the rtld_db_preinit probe doesn't fire.  Very strange.
# If I leave in the prun command, then the 'yes' command
# inside the above shell script returns a trace/trap error
# to the ksh interpreter running the 'qqscript' script.

dtrace2='

BEGIN
{ printf("resuming %d\n", $1); system("/bin/prun %d\n",$1); }

pid$1::main:entry
{ printf("stopping %d\n",$1); stop(); exit(0); }
'

dtrace -wqn "$dtrace2" $mypid

echo stack2 of $mypid is:
pstack $mypid
echo flags of $mypid is:
pflags $mypid

---------------------

Chip
I think we've hit another instance of this bug:

  6248750 dtrace -p, stop() and exec(2) collide

Description

[ahl 3.31.2005]
Drew Balfour recently hit a problem where a process didn't seem to stop when this DTrace invocation was applied:

  # dtrace -w -n 'proc:::exec-success/pid == $target/{ stop(); }' -p 1234

adam.leventhal at sun.com 2005-03-31 16:52:42 GMT
Entry 1 adam.leventhal [2005-03-31 16:52]

Comments

[ahl 3.31.2005]
The problem is that libdtrace traces the exec system call for processes it's monitoring:

        /*
         * We must trace exit from exec() system calls so that if the exec is
         * successful, we can reset our breakpoints and re-initialize libproc.
         */
        (void) Psysexit(P, SYS_exec, B_TRUE);
        (void) Psysexit(P, SYS_execve, B_TRUE);

But if you try stopping when you hit exec, libdtrace will get it running again, thinking that it stopped at libdtrace's behest rather than at that of the actual D program. Both requests to stop manifest themselves as PR_SYSEXIT.

This could probably be fixed with some cleverness in libdtrace, but it may be a better idea to add code to the systrace provider to check for stopping conditions on syscall exit with PR_REQUESTED, as we do for entry to syscalls:

        /*
         * We want to explicitly allow DTrace consumers to stop a process
         * before it actually executes the meat of the syscall.
         */
        p = ttoproc(curthread);
        mutex_enter(&p->p_lock);
        if (curthread->t_dtrace_stop && !curthread->t_lwp->lwp_nostop) {
                curthread->t_dtrace_stop = 0;
                stop(PR_REQUESTED, 0);
        }
        mutex_exit(&p->p_lock);

Adam

On Wed, Jun 07, 2006 at 11:44:35AM -0500, Chip Bennett wrote:
> I think you're right about continuing the process being automatic. The
> DTrace guide says that DTrace "grabs" the process identified with "-p",
> and the DTrace code leads to a call to "pgrab" in libproc, which seemed
> to be doing a lot more than just acknowledging the existence of the process.
--
Adam Leventhal, Solaris Kernel Development       http://blogs.sun.com/ahl
Good idea, Chip. I made that change and the script works better now. It seems that the process is left paused with the PC pointing at a staging area that dtrace has used to resume the process.

% dbx - 4770
Reading yes
Reading ld.so.1
Reading libc.so.1
Attached to process 4770
stopped in (unknown) at 0xff3a2040
0xff3a2040:     save     %g1, %g0, %sp
(dbx) si
stopped in main at 0x000106b4
0x000106b4: main+0x0004:        sethi    %hi(0x22000), %g1
(dbx) dis main
0x000106b0: main       :        save     %sp, -96, %sp

So it looks like the program was stopped with PC==<patch area> and NPC==main+4. This makes sense, but it's a little confusing if I attach to the process with another tool. Any reason why the PC can't be set to &main in this case?

WRT rtld_db_preinit: Instead of stopping at main, I really wanted to stop before any dynamic library inits are run, but after rtld has set up the link map. When I realized I could reproduce the same bug using a probe on main, I changed the script to use main, but I didn't fix the comment.

--chris
Adam Leventhal wrote:
> I think we've hit another instance of this bug:
>
> 6248750 dtrace -p, stop() and exec(2) collide
>

Yup, that looks like the bug I hit. Chip's suggestion is a workaround for this problem, I think.

--chris
> WRT rtld_db_preinit: Instead of stopping at main, I really
> wanted to stop before any dynamic library inits are run,
> but after rtld has set up the link map. When I realized
> I could reproduce the same bug using a probe on main, I
> changed the script to use main, but I didn't fix the comment.
>
> --chris

The bug Adam mentioned aside, we do have an option to control the above behavior when starting a process with dtrace -c. The -xevaltime option controls when the process is initially stopped to evaluate the D program and then begin tracing. The values are:

-x evaltime=exec     ... stop at the first instruction (after exec(2) returns)
-x evaltime=preinit  ... before .init sections run (default)
-x evaltime=postinit ... after .init sections run
-x evaltime=main     ... stop on first instruction of main()

-Mike

--
Mike Shapiro, Solaris Kernel Development. blogs.sun.com/mws/
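As a rough illustration of the option Mike describes, a wrapper along these lines could start a command under dtrace -c with an explicit evaltime. Only the -x evaltime=... option and its four values come from the message above; the helper name run_with_evaltime and the DRYRUN switch (which prints the command instead of running it) are inventions of this sketch.

```shell
#!/bin/sh
# Hypothetical helper (not part of dtrace): run a command under "dtrace -c"
# with a chosen evaltime.
# Usage: run_with_evaltime <exec|preinit|postinit|main> <D program> <command>
run_with_evaltime() {
    evaltime=$1
    dprog=$2
    cmd=$3

    # Accept only the four values listed in the message above.
    case $evaltime in
        exec|preinit|postinit|main) ;;
        *) echo "unknown evaltime: $evaltime" >&2; return 2 ;;
    esac

    if [ "${DRYRUN:-0}" = 1 ]; then
        # Dry run: show the command line that would be executed.
        echo "dtrace -x evaltime=$evaltime -wqn '$dprog' -c $cmd"
    else
        dtrace -x "evaltime=$evaltime" -wqn "$dprog" -c "$cmd"
    fi
}
```

For example, `DRYRUN=1 run_with_evaltime main 'BEGIN { exit(0); }' ./qqscript` prints the dtrace command line without executing it. As the next message points out, this only affects the process started with -c, not a descendant attached to later with -p.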
That's good to know, and it's close to what I want, but not quite. The program I'm starting with -c is not the real process I'm interested in. The one I attach to with -p (in pass 2) is the one I am really interested in.

I think a PREINIT and POSTINIT pseudo-section that works like BEGIN would be useful. Also, a "pidtreeNNN" provider that enables probes in all the progeny of the named process-id would have been useful for my script.

--chris