Christian Kratzer wrote:
> Hi Rick,
>
> On Mon, 12 Oct 2015, Rick Macklem wrote:
>
> > Christian Kratzer wrote:
> >> Hi Rick,
> >>
> >> there was also a second more recent crash in /var/crash
> >>
> >> Mon Oct 12 03:01:16 CEST 2015
> >>
> >> FreeBSD noc3.cksoft.de 10.2-STABLE FreeBSD 10.2-STABLE #2 r288980M:
> >> Sun Oct 11 08:37:40 CEST 2015
> >> ck at noc3.cksoft.de:/usr/obj/usr/src/sys/NOC amd64
> >>
> >> panic: Assertion mtx_unowned(m) failed at
> >> /usr/src/sys/kern/kern_mutex.c:955
> >>
> > Oops, I screwed up. I should have looked at this panic assertion when you
> > reported it before. Ok, so if I understand the assertion correctly, it means
> > that another thread has the mutex locked. If this is correct, I'll have to
> > take another look at the code and figure out how to wait for these other
> > threads to finish with the mutexes.
> >
> > I do think the patch fixes the race I saw, but there must be other races in
> > the code.
> >
> > I'll take another look, but if anyone else is conversant with netsmb, feel
> > free to jump in, because it is all new to me.
> >
> > Unfortunately, I won't have any way to do testing for the next month or so,
> > so any patches I do come up with will be "try this untested..".
>
> that's no problem.
>
> Just keep the patches coming when you have time and tell me when to reset
> back to stable, current or whatever so we don't lose sync of the status.
>
Well, you can try the attached one instead of the previous ones (ie. against stable).
It just delays destroying the mutexes until the iod thread is exiting.

I can't quite see why the previous patches wouldn't fix it, but this one leaves
smb_iod_main() unchanged, so it is a simpler patch and doesn't affect semantics
except for a slight delay in destroying the mutexes.

> As it looks like the race happens on unmount I could try putting a sleep 60
> into the script that does the "mount && rsync && umount" magic just before
> the umount. That would allow anything that is slow to go away to perhaps
> release the mutexes before the umount.
>
If it still crashes with this patch, it might be worth a try.

Or, if this patch still crashes, you could just delete the 3 lines that the
patch moves, so the mutexes are never destroyed. This would result in a leak,
but it would tell us if destroying these mutexes is the problem.

Thanks for your willingness to test these, rick

> Not a real fix of course but might help to verify what's going on.
>
> Greetings
> Christian
>
> --
> Christian Kratzer                   CK Software GmbH
> Email:   ck at cksoft.de            Wildberger Weg 24/2
> Phone:   +49 7032 893 997 - 0       D-71126 Gaeufelden
> Fax:     +49 7032 893 997 - 9       HRB 245288, Amtsgericht Stuttgart
> Mobile:  +49 171 1947 843           Geschaeftsfuehrer: Christian Kratzer
> Web:     http://www.cksoft.de/

-------------- next part --------------
A non-text attachment was scrubbed...
Name: smbiod2.patch
Type: text/x-patch
Size: 688 bytes
Desc: not available
URL: <http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20151012/d642725c/attachment.bin>

Hi Rick,

On Mon, 12 Oct 2015, Rick Macklem wrote:
<snip>
> Well, you can try the attached one instead of the previous ones (ie. against stable).
> It just delays destroying the mutexes until the iod thread is exiting.
>
> I can't quite see why the previous patches wouldn't fix it, but this one leaves
> smb_iod_main() unchanged, so it is a simpler patch and doesn't affect semantics
> except for a slight delay in destroying the mutexes.

Patch applied this morning against plain 10-stable with witness enabled ...

>> As it looks like the race happens on unmount I could try putting a sleep 60
>> into the script that does the "mount && rsync && umount" magic just before
>> the umount. That would allow anything that is slow to go away to perhaps
>> release the mutexes before the umount.
>>
> If it still crashes with this patch, it might be worth a try.

I had a sleep 60 before the umount overnight and it did not crash. It could
have been too short a wait though. I have removed the sleep 60 in order to
give your patch a good testing.

> Or, if this patch still crashes, you could just delete the 3 lines that the
> patch moves, so the mutexes are never destroyed. This would result in a leak,
> but it would tell us if destroying these mutexes is the problem.

Good point.

> Thanks for your willingness to test these, rick

No problem. Thanks to you for wrapping your head around this.

Greetings
Christian

--
Christian Kratzer                   CK Software GmbH
Email:   ck at cksoft.de            Wildberger Weg 24/2
Phone:   +49 7032 893 997 - 0       D-71126 Gaeufelden
Fax:     +49 7032 893 997 - 9       HRB 245288, Amtsgericht Stuttgart
Mobile:  +49 171 1947 843           Geschaeftsfuehrer: Christian Kratzer
Web:     http://www.cksoft.de/

Hi Rick,
It looks like your latest patch nailed the issue. The box has been up for 3 days:
ck at noc3:~ % uptime
12:22PM up 3 days, 4:11, 1 user, load averages: 0.07, 0.10, 0.08
ck at noc3:~ %
If it does not crash over the weekend, this seems to be the fix:
ck at noc3:/usr/src % svn diff sys/netsmb/smb_iod.c
Index: sys/netsmb/smb_iod.c
===================================================================
--- sys/netsmb/smb_iod.c	(revision 289211)
+++ sys/netsmb/smb_iod.c	(working copy)
@@ -659,6 +659,11 @@
 			break;
 		tsleep(&iod->iod_flags, PWAIT, "90idle", iod->iod_sleeptimo);
 	}
+
+	/* We can now safely destroy the mutexes and free the iod structure. */
+	smb_sl_destroy(&iod->iod_rqlock);
+	smb_sl_destroy(&iod->iod_evlock);
+	free(iod, M_SMBIOD);
 	mtx_unlock(&Giant);
 	kproc_exit(0);
 }
@@ -695,9 +700,6 @@
 smb_iod_destroy(struct smbiod *iod)
 {
 	smb_iod_request(iod, SMBIOD_EV_SHUTDOWN | SMBIOD_EV_SYNC, NULL);
-	smb_sl_destroy(&iod->iod_rqlock);
-	smb_sl_destroy(&iod->iod_evlock);
-	free(iod, M_SMBIOD);
 	return 0;
 }
ck at noc3:/usr/src %
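
For anyone following along who does not know the netsmb code, the idea
behind the diff can be sketched in userland with POSIX threads. This is only
an illustration under my own assumptions, not the kernel code: struct worker,
worker_main() and worker_demo() are made-up names, and pthread mutexes stand
in for the kernel's smb_sl locks. The thread that requests shutdown never
destroys the lock. The worker does that itself once it has left its loop and
is the last user of the structure.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct smbiod: one lock and a shutdown flag. */
struct worker {
	pthread_mutex_t	lock;
	pthread_cond_t	cv;
	bool		shutdown;
};

/* Result of pthread_mutex_destroy(), recorded so the caller can check it. */
static int destroy_rc = -1;

static void *
worker_main(void *arg)
{
	struct worker *w = arg;

	/* Sleep until the requester sets the shutdown flag. */
	pthread_mutex_lock(&w->lock);
	while (!w->shutdown)
		pthread_cond_wait(&w->cv, &w->lock);
	pthread_mutex_unlock(&w->lock);

	/*
	 * The requester does not touch w after dropping the lock, so the
	 * worker is the last user and can tear everything down itself.
	 */
	destroy_rc = pthread_mutex_destroy(&w->lock);
	pthread_cond_destroy(&w->cv);
	free(w);
	return (NULL);
}

/* Starts a worker, asks it to shut down, returns the destroy result. */
static int
worker_demo(void)
{
	struct worker *w = malloc(sizeof(*w));
	pthread_t tid;

	if (w == NULL)
		return (-1);
	pthread_mutex_init(&w->lock, NULL);
	pthread_cond_init(&w->cv, NULL);
	w->shutdown = false;
	if (pthread_create(&tid, NULL, worker_main, w) != 0) {
		/* No worker exists, so cleaning up here is safe. */
		pthread_cond_destroy(&w->cv);
		pthread_mutex_destroy(&w->lock);
		free(w);
		return (-1);
	}

	/* Request shutdown; signal while still holding the lock. */
	pthread_mutex_lock(&w->lock);
	w->shutdown = true;
	pthread_cond_signal(&w->cv);
	pthread_mutex_unlock(&w->lock);
	/* From here on w belongs to the worker: no destroy, no free. */

	pthread_join(tid, NULL);
	return (destroy_rc);
}
```

Calling worker_demo() from a main() and checking for a return value of 0
exercises the whole sequence. The old smb_iod_destroy() corresponds to doing
the destroy and free() in the requester right after the unlock, which is
where another thread can still be using the lock.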
Can you get this committed to current and later merged to stable?
Greetings
Christian
--
Christian Kratzer CK Software GmbH
Email: ck at cksoft.de Wildberger Weg 24/2
Phone: +49 7032 893 997 - 0 D-71126 Gaeufelden
Fax: +49 7032 893 997 - 9 HRB 245288, Amtsgericht Stuttgart
Mobile: +49 171 1947 843 Geschaeftsfuehrer: Christian Kratzer
Web: http://www.cksoft.de/