James McIlree
2007-May-02 02:53 UTC
[dtrace-discuss] Deadlock when child process is forking?
I''ve found what I think is a deadlock in the dtrace fork code. My only Solaris test machine is a single cpu box, where I have been unable to reproduce in the field so far. I''ve been able to reproduce on other platforms. Here is the scenario. You''ve got a target process, which has static probes. It is calling fork(), which causes it to end up at line 330 in fork.c:> 328 if (p->p_dtrace_helpers != NULL) { > 329 ASSERT(dtrace_helpers_fork != NULL); > 330 (*dtrace_helpers_fork)(p, cp); > 331 } > 332 > 333 mutex_enter(&p->p_lock); > 334 p->p_flag &= ~SFORKING; > 335 sprunlock(p);As you can see, the parent is sprlock''d. The call to dtrace_helpers_fork() ends up at line 13526 in dtrace.c:> 13526 mutex_enter(&dtrace_lock); > 13527 ASSERT(from->p_dtrace_helpers != NULL); > 13528 ASSERT(dtrace_helpers > 0);So we have "owns sprlock, wants dtrace_lock". At the same time this is happening, we have a userland dtrace either registering or removing probes in the parent that is forking. If the dtrace process is enabling probes, it makes an ioctl call which immediately grabs the dtrace_lock. For DTRACEIOC_ENABLE here is an example backtrace: dtrace_ioctl dtrace_enabling_match dtrace_probe_enable dtrace_match dtrace_ecb_create_enable fasttrap_pid_enable fasttrap_pid_enable attempts to sprlock, which leaves us at "owns dtrace_lock, wants sprlock" I''d like to verify that I haven''t missed something, is this really a problem for Solaris? If so, what would be the right fix? From my reading of the code, I don''t think the fork needs to hold the sprlock across the call to dtrace_helpers_fork(), only the call to dtrace_fasttrap_fork(). James M
Adam Leventhal
2007-May-12 01:20 UTC
[dtrace-discuss] Deadlock when child process is forking?
Hey James, Nice find. That does indeed open the window for a deadlock, but it''s worse than you thought because we''ve also got the deadlock possibility on p_lock (which is taken as part of sprlock()). Since dtrace_helpers_destroy() is the only place where we re-assign a non-NULL p->p_dtrace_helpers, we can check in cfork() with the lock held and make the call after dropping both sprlock and p->p_lock. I''ve filed this bug: 6556673 potential deadlock with USDT tear-down and enable I think the solution might be as simple as this: ------- usr/src/uts/common/os/fork.c ------- --- - Fri May 11 18:19:38 2007 +++ usr/src/uts/common/os/fork.c Fri May 11 18:13:12 2007 @@ -326,11 +326,14 @@ * sprlock() may fail because the child is being forked. */ if (p->p_dtrace_helpers != NULL) { + sprunlock(p); ASSERT(dtrace_helpers_fork != NULL); (*dtrace_helpers_fork)(p, cp); + sprlock_proc(p); + } else { + mutex_enter(&p->p_lock); } - mutex_enter(&p->p_lock); p->p_flag &= ~SFORKING; sprunlock(p); } I need to think through it a bit more before I''m convinced. Thanks for investigating this. We certainly appreciate your diligent inspection of DTrace. Adam On Tue, May 01, 2007 at 07:53:16PM -0700, James McIlree wrote:> > I''ve found what I think is a deadlock in the dtrace fork code. > > My only Solaris test machine is a single cpu box, where I have been > unable to reproduce in the field so far. I''ve been able to reproduce > on other platforms. > > Here is the scenario. You''ve got a target process, which has > static probes. It is calling fork(), which causes it to end up at line > 330 > in fork.c: > > > > 328 if (p->p_dtrace_helpers != NULL) { > > 329 ASSERT(dtrace_helpers_fork != NULL); > > 330 (*dtrace_helpers_fork)(p, cp); > > 331 } > > 332 > > 333 mutex_enter(&p->p_lock); > > 334 p->p_flag &= ~SFORKING; > > 335 sprunlock(p); > > As you can see, the parent is sprlock''d. The call to > dtrace_helpers_fork() > ends up at line 13526 in dtrace.c: > > > >13526 mutex_enter(&dtrace_lock); > >13527 ASSERT(from->p_dtrace_helpers != NULL); > >13528 ASSERT(dtrace_helpers > 0); > > So we have "owns sprlock, wants dtrace_lock". > > At the same time this is happening, we have a userland dtrace either > registering or removing probes in the parent that is forking. > > If the dtrace process is enabling probes, it makes an ioctl call > which > immediately grabs the dtrace_lock. > > For DTRACEIOC_ENABLE here is an example backtrace: > > dtrace_ioctl > dtrace_enabling_match > dtrace_probe_enable > dtrace_match > dtrace_ecb_create_enable > fasttrap_pid_enable > > fasttrap_pid_enable attempts to sprlock, which leaves us at > > "owns dtrace_lock, wants sprlock" > > I''d like to verify that I haven''t missed something, is this really a > problem for Solaris? > > If so, what would be the right fix? From my reading of the code, I > don''t think the fork > needs to hold the sprlock across the call to dtrace_helpers_fork(), > only the call to > dtrace_fasttrap_fork(). > > James M > > _______________________________________________ > dtrace-discuss mailing list > dtrace-discuss at opensolaris.org-- Adam Leventhal, Solaris Kernel Development http://blogs.sun.com/ahl
Adam Leventhal
2007-May-16 20:22 UTC
[dtrace-discuss] Deadlock when child process is forking?
It seems to always be the case that we hit these hypothetical problems on systems shortly after speculating about their existence. Indeed, we hit this yesterday during some unrelated testing. Here are the diffs I settled on: @@ -328,13 +328,20 @@ * sprlock() may fail because the child is being forked. */ if (p->p_dtrace_helpers != NULL) { + mutex_enter(&p->p_lock); + sprunlock(p); + ASSERT(dtrace_helpers_fork != NULL); (*dtrace_helpers_fork)(p, cp); + + mutex_enter(&p->p_lock); + p->p_flag &= ~SFORKING; + mutex_exit(&p->p_lock); - - mutex_enter(&p->p_lock); - p->p_flag &= ~SFORKING; - sprunlock(p); + } else { + mutex_enter(&p->p_lock); + p->p_flag &= ~SFORKING; + sprunlock(p); } } Adam On Fri, May 11, 2007 at 06:20:26PM -0700, Adam Leventhal wrote:> Hey James, > > Nice find. That does indeed open the window for a deadlock, but it''s worse > than you thought because we''ve also got the deadlock possibility on p_lock > (which is taken as part of sprlock()). > > Since dtrace_helpers_destroy() is the only place where we re-assign a > non-NULL p->p_dtrace_helpers, we can check in cfork() with the lock > held and make the call after dropping both sprlock and p->p_lock. > > I''ve filed this bug: > > 6556673 potential deadlock with USDT tear-down and enable > > I think the solution might be as simple as this: > > ------- usr/src/uts/common/os/fork.c ------- > > --- - Fri May 11 18:19:38 2007 > +++ usr/src/uts/common/os/fork.c Fri May 11 18:13:12 2007 > @@ -326,11 +326,14 @@ > * sprlock() may fail because the child is being forked. > */ > if (p->p_dtrace_helpers != NULL) { > + sprunlock(p); > ASSERT(dtrace_helpers_fork != NULL); > (*dtrace_helpers_fork)(p, cp); > + sprlock_proc(p); > + } else { > + mutex_enter(&p->p_lock); > } > > - mutex_enter(&p->p_lock); > p->p_flag &= ~SFORKING; > sprunlock(p); > } > > > I need to think through it a bit more before I''m convinced. > > Thanks for investigating this. We certainly appreciate your diligent > inspection of DTrace. > > Adam > > On Tue, May 01, 2007 at 07:53:16PM -0700, James McIlree wrote: > > > > I''ve found what I think is a deadlock in the dtrace fork code. > > > > My only Solaris test machine is a single cpu box, where I have been > > unable to reproduce in the field so far. I''ve been able to reproduce > > on other platforms. > > > > Here is the scenario. You''ve got a target process, which has > > static probes. It is calling fork(), which causes it to end up at line > > 330 > > in fork.c: > > > > > > > 328 if (p->p_dtrace_helpers != NULL) { > > > 329 ASSERT(dtrace_helpers_fork != NULL); > > > 330 (*dtrace_helpers_fork)(p, cp); > > > 331 } > > > 332 > > > 333 mutex_enter(&p->p_lock); > > > 334 p->p_flag &= ~SFORKING; > > > 335 sprunlock(p); > > > > As you can see, the parent is sprlock''d. The call to > > dtrace_helpers_fork() > > ends up at line 13526 in dtrace.c: > > > > > > >13526 mutex_enter(&dtrace_lock); > > >13527 ASSERT(from->p_dtrace_helpers != NULL); > > >13528 ASSERT(dtrace_helpers > 0); > > > > So we have "owns sprlock, wants dtrace_lock". > > > > At the same time this is happening, we have a userland dtrace either > > registering or removing probes in the parent that is forking. > > > > If the dtrace process is enabling probes, it makes an ioctl call > > which > > immediately grabs the dtrace_lock. > > > > For DTRACEIOC_ENABLE here is an example backtrace: > > > > dtrace_ioctl > > dtrace_enabling_match > > dtrace_probe_enable > > dtrace_match > > dtrace_ecb_create_enable > > fasttrap_pid_enable > > > > fasttrap_pid_enable attempts to sprlock, which leaves us at > > > > "owns dtrace_lock, wants sprlock" > > > > I''d like to verify that I haven''t missed something, is this really a > > problem for Solaris? > > > > If so, what would be the right fix? From my reading of the code, I > > don''t think the fork > > needs to hold the sprlock across the call to dtrace_helpers_fork(), > > only the call to > > dtrace_fasttrap_fork(). > > > > James M > > > > _______________________________________________ > > dtrace-discuss mailing list > > dtrace-discuss at opensolaris.org > > -- > Adam Leventhal, Solaris Kernel Development http://blogs.sun.com/ahl > _______________________________________________ > dtrace-discuss mailing list > dtrace-discuss at opensolaris.org-- Adam Leventhal, Solaris Kernel Development http://blogs.sun.com/ahl