Hi Rick,

looks like your latest patch nailed the issue. The box has been up for 3 days:

ck at noc3:~ % uptime
12:22PM up 3 days, 4:11, 1 user, load averages: 0.07, 0.10, 0.08
ck at noc3:~ %

If it does not crash over the weekend, this seems to be it:

ck at noc3:/usr/src % svn diff sys/netsmb/smb_iod.c
Index: sys/netsmb/smb_iod.c
===================================================================
--- sys/netsmb/smb_iod.c	(revision 289211)
+++ sys/netsmb/smb_iod.c	(working copy)
@@ -659,6 +659,11 @@
 			break;
 		tsleep(&iod->iod_flags, PWAIT, "90idle", iod->iod_sleeptimo);
 	}
+
+	/* We can now safely destroy the mutexes and free the iod structure. */
+	smb_sl_destroy(&iod->iod_rqlock);
+	smb_sl_destroy(&iod->iod_evlock);
+	free(iod, M_SMBIOD);
 	mtx_unlock(&Giant);
 	kproc_exit(0);
 }
@@ -695,9 +700,6 @@
 smb_iod_destroy(struct smbiod *iod)
 {
 	smb_iod_request(iod, SMBIOD_EV_SHUTDOWN | SMBIOD_EV_SYNC, NULL);
-	smb_sl_destroy(&iod->iod_rqlock);
-	smb_sl_destroy(&iod->iod_evlock);
-	free(iod, M_SMBIOD);
 	return 0;
 }
ck at noc3:/usr/src %

Can you get this committed into current and later stable?

Greetings
Christian

On Mon, 12 Oct 2015, Rick Macklem wrote:

> Christian Kratzer wrote:
>> Hi Rick,
>>
>> On Mon, 12 Oct 2015, Rick Macklem wrote:
>>
>>> Christian Kratzer wrote:
>>>> Hi Rick,
>>>>
>>>> there was also a second more recent crash in /var/crash
>>>>
>>>> Mon Oct 12 03:01:16 CEST 2015
>>>>
>>>> FreeBSD noc3.cksoft.de 10.2-STABLE FreeBSD 10.2-STABLE #2 r288980M: Sun Oct 11 08:37:40 CEST 2015
>>>>     ck at noc3.cksoft.de:/usr/obj/usr/src/sys/NOC amd64
>>>>
>>>> panic: Assertion mtx_unowned(m) failed at /usr/src/sys/kern/kern_mutex.c:955
>>>>
>>> Oops, I screwed up. I should have looked at this panic assertion when you
>>> reported it before. Ok, so if I understand the assertion correctly, it
>>> means that another thread has the mutex locked.
>>> If this is correct, I'll have to take another look at the code and figure
>>> out how to wait for these other threads to finish with the mutexes.
>>>
>>> I do think the patch fixes the race I saw, but there must be other races
>>> in the code.
>>>
>>> I'll take another look, but if anyone else is conversant with netsmb, feel
>>> free to jump in, because it is all new to me.
>>>
>>> Unfortunately, I won't have any way to do testing for the next month or
>>> so, so any patches I do come up with will be "try this untested..".
>>
>> That's no problem.
>>
>> Just keep the patches coming when you have time, and tell me when to reset
>> back to stable, current or whatever so we don't lose track of the status.
>>
> Well, you can try the attached one instead of the previous ones (i.e.
> against stable). It just delays destroying the mutexes until the iod
> thread is exiting.
>
> I can't quite see why the previous patches wouldn't fix it, but this one
> leaves smb_iod_main() unchanged, so it is a simpler patch and doesn't
> affect semantics except for a slight delay in destroying the mutexes.
>
>> As it looks like the race happens on unmount, I could try putting a
>> "sleep 60" into the script that does the "mount && rsync && umount" magic,
>> just before the umount. That would allow anything that is slow to go away
>> and perhaps release the mutexes before the umount.
>>
> If it still crashes with this patch, it might be worth a try.
>
> Or, if this patch still crashes, you could just delete the 3 lines that
> the patch moves, so the mutexes are never destroyed. This would result in
> a leak, but it would tell us if destroying these mutexes is the problem.
>
> Thanks for your willingness to test these, rick
>
>> Not a real fix of course, but it might help to verify what's going on.
>>
>> Greetings
>> Christian
>>
>>
>> --
>> Christian Kratzer                CK Software GmbH
>> Email:   ck at cksoft.de          Wildberger Weg 24/2
>> Phone:   +49 7032 893 997 - 0    D-71126 Gaeufelden
>> Fax:     +49 7032 893 997 - 9    HRB 245288, Amtsgericht Stuttgart
>> Mobile:  +49 171 1947 843        Geschaeftsfuehrer: Christian Kratzer
>> Web:     http://www.cksoft.de/
>> _______________________________________________
>> freebsd-stable at freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
>> To unsubscribe, send any mail to "freebsd-stable-unsubscribe at freebsd.org"
>>
>
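Outside the kernel, the ownership rule that the diff applies (the thread that last uses a lock is the one that destroys it) can be sketched with POSIX threads. This is a userspace sketch under stated assumptions, not netsmb code: `struct worker`, `worker_main()`, `worker_shutdown()`, `run_shutdown_demo()` and `freed_by_worker` are hypothetical names standing in roughly for `struct smbiod`, `smb_iod_thread()` and `smb_iod_destroy()`, and the `pthread_join()` call is a userspace convenience with no counterpart in the patch, where the iod thread simply calls `kproc_exit()`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/*
 * Hypothetical userspace stand-in for struct smbiod: an event lock,
 * a condition variable and a shutdown flag. This is not netsmb code.
 */
struct worker {
	pthread_mutex_t evlock;  /* plays the role of iod_evlock */
	pthread_cond_t evcond;   /* wakes the worker when an event is posted */
	int shutdown;            /* plays the role of SMBIOD_EV_SHUTDOWN */
};

static int freed_by_worker;  /* set by the worker just before it exits */

/*
 * Worker thread. As in the patched smb_iod_thread(), it is the last user
 * of its locks, so it destroys them and frees its own state on the way out.
 */
static void *worker_main(void *arg)
{
	struct worker *w = arg;
	int rc;

	pthread_mutex_lock(&w->evlock);
	while (!w->shutdown)
		pthread_cond_wait(&w->evcond, &w->evlock);
	pthread_mutex_unlock(&w->evlock);

	/* No other thread can own evlock now; some pthread implementations
	 * would report EBUSY here if one did (compare mtx_unowned()). */
	pthread_cond_destroy(&w->evcond);
	rc = pthread_mutex_destroy(&w->evlock);
	assert(rc == 0);
	(void)rc;
	free(w);
	freed_by_worker = 1;
	return NULL;
}

/*
 * Requester side, like the patched smb_iod_destroy(): post the shutdown
 * event, then never touch w again (the worker may already have freed it).
 */
static void worker_shutdown(struct worker *w, pthread_t tid)
{
	pthread_mutex_lock(&w->evlock);
	w->shutdown = 1;
	pthread_cond_signal(&w->evcond);
	pthread_mutex_unlock(&w->evlock);
	/* Userspace convenience only; the kernel thread just calls kproc_exit(). */
	pthread_join(tid, NULL);
}

/* Start one worker, shut it down, and report whether it freed itself. */
static int run_shutdown_demo(void)
{
	pthread_t tid;
	struct worker *w = malloc(sizeof(*w));

	if (w == NULL)
		return -1;
	pthread_mutex_init(&w->evlock, NULL);
	pthread_cond_init(&w->evcond, NULL);
	w->shutdown = 0;
	freed_by_worker = 0;
	if (pthread_create(&tid, NULL, worker_main, w) != 0) {
		pthread_cond_destroy(&w->evcond);
		pthread_mutex_destroy(&w->evlock);
		free(w);
		return -1;
	}
	worker_shutdown(w, tid);
	return freed_by_worker;
}
```

`run_shutdown_demo()` returns 1 once the worker has exited. The requester never destroys or frees anything, so it has no window in which it could tear down a lock the worker still holds.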
Christian Kratzer wrote:
> Hi Rick,
>
> looks like your latest patch nailed the issue. The box has been up for 3
> days:
> [...]
> If it does not crash over the weekend this seems to be it:
>
Sounds good. Although I wouldn't have thought it could happen in practice, I
did spot how a race could still occur with the last patch. Since
smb_iod_request() did the msleep() with PDROP, it could return any time after
the wakeup(evp), letting smb_iod_destroy() destroy the mutexes before the iod
thread was done with them.

Anyhow, if it fixes the problem, I guess we're happy.

Btw, I think PR 172942 and PR 201912 may both be caused by the same problem.
I've asked the people who reported these to test the patch.

> [...]
> Can you get this committed into current and later stable?
>
I should be able to do this, although not until mid-November. I'd also like
to hear from the folks who reported the PRs.

John, maybe you could review this?
Thanks for your help testing this, rick
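The race described above can be made concrete with a small single-threaded model that replays one worst-case interleaving step by step. It assumes, as the explanation suggests, that the iod thread calls wakeup(evp) while still holding iod_evlock and unlocks it afterwards, and that with PDROP the woken requester does not wait to reacquire that lock. Everything here is hypothetical: `struct model_mtx`, `model_lock()`, `model_unlock()`, `model_destroy_ok()`, `old_order_is_safe()` and `patched_order_is_safe()` only model the check behind "Assertion mtx_unowned(m) failed"; they are not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical single-threaded model of one worst-case interleaving.
 * A model mutex only records whether a thread owns it; nothing real
 * is locked, and these helpers are not kernel functions.
 */
enum owner { NOBODY, IOD_THREAD };

struct model_mtx {
	enum owner owner;
};

static void model_lock(struct model_mtx *m, enum owner who)
{
	assert(m->owner == NOBODY);   /* no contention in this model */
	m->owner = who;
}

static void model_unlock(struct model_mtx *m)
{
	assert(m->owner != NOBODY);   /* unlocking an unowned mutex is a bug */
	m->owner = NOBODY;
}

/* Mirrors the check behind "Assertion mtx_unowned(m) failed": destroying
 * is only allowed while nobody owns the mutex. Returns false where a
 * kernel built with INVARIANTS would panic. */
static bool model_destroy_ok(const struct model_mtx *m)
{
	return m->owner == NOBODY;
}

/* Pre-patch order: smb_iod_destroy() destroys iod_evlock as soon as the
 * msleep(..., PDROP) in smb_iod_request() returns. */
static bool old_order_is_safe(void)
{
	struct model_mtx evlock = { NOBODY };
	bool ok;

	model_lock(&evlock, IOD_THREAD);   /* iod thread takes evlock */
	/* iod thread: wakeup(evp). With PDROP the sleeper does not need
	 * evlock back, so it may run before the iod thread unlocks. */
	ok = model_destroy_ok(&evlock);    /* requester destroys evlock */
	model_unlock(&evlock);             /* iod thread unlocks: too late */
	return ok;
}

/* Patched order: the requester never destroys evlock; the iod thread does
 * it after its final unlock, just before kproc_exit(). */
static bool patched_order_is_safe(void)
{
	struct model_mtx evlock = { NOBODY };

	model_lock(&evlock, IOD_THREAD);   /* iod thread takes evlock */
	model_unlock(&evlock);             /* ...wakeup(evp), then unlock */
	return model_destroy_ok(&evlock);  /* iod thread destroys evlock */
}
```

With these definitions `old_order_is_safe()` returns false and `patched_order_is_safe()` returns true. The model deliberately forces the worst-case schedule; on a real machine the window between wakeup(evp) and the unlock is presumably short, which is consistent with the race being hard to hit in practice.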
Christian Kratzer wrote:
> Hi Rick,
>
> looks like your latest patch nailed the issue. The box has been up for 3
> days:
> [...]
> If it does not crash over the weekend this seems to be it:
>
On closer inspection, PR 172942 appears to be a different crash, which seems
to have been fixed by r264600. Your problem does not appear to be in the bugs
database. (I will commit the patch in mid-November anyhow, but creating a PR
for this might be useful for others.)

Btw, I think the attached patch (which includes this change) also fixes a
problem that caused a crash during mounting, reported via PR 201912. (If
you'd like to test this one, that would be appreciated. Apply it to sources
that do not already have the smb_iod.c patch you tested, since that change is
included in it.)

Thanks for your help with this, rick
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smbiod3.patch
Type: text/x-patch
Size: 1363 bytes
Desc: not available
URL: <http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20151018/998cbd79/attachment.bin>