thr3ads.net - freebsd stable - smbfs crashes since approx. 10.1-RELEASE [Oct 2015]

John Baldwin

2015-Oct-07 00:08 UTC

smbfs crashes since approx. 10.1-RELEASE

On Monday, October 05, 2015 06:16:54 PM Rick Macklem wrote:
> Christian Kratzer wrote:
> > Hi,
> > 
> > I run a regular rsync job that runs from cron and copies stuff that gets
> > created on a Windows smbfs share.
> > 
> > Starting about 10.1-RELEASE the VM has become unstable and started panicking.
> > 
> > I have narrowed the issue down to the aforementioned rsync job.
> > 
> > When I move the job to a different VM, the other VM starts crashing and
> > the VM without the job becomes stable again.
> > 
> > I have panics and crashinfos stored in /var/crash if anybody is interested:
> > 
> >      root@noc2:/var/crash # uname -a
> >      FreeBSD noc2.cksoft.de 10.2-RELEASE FreeBSD 10.2-RELEASE #0 r286666: Wed
> >      Aug 12 15:26:37 UTC 2015
> >      root@releng1.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64
> >      root@noc2:/var/crash # freebsd-version -u
> >      10.2-RELEASE-p5
> >      root@noc2:/var/crash # freebsd-version -k
> >      10.2-RELEASE
> >      root@noc2:/var/crash #
> > 
> > This is what I have in /var/crash/core.txt.0
> > 
> >      Fatal trap 12: page fault while in kernel mode
> >      cpuid = 0; apic id = 00
> >      fault virtual address   = 0x20
> >      fault code              = supervisor read data, page not present
> >      instruction pointer     = 0x20:0xffffffff80996c7c
> >      stack pointer           = 0x28:0xfffffe003d6c0ac0
> >      frame pointer           = 0x28:0xfffffe003d6c0af0
> >      code segment            = base 0x0, limit 0xfffff, type 0x1b
> >                              = DPL 0, pres 1, long 1, def32 0, gran 1
> >      processor eflags        = resume, IOPL = 0
> >      current process         = 1349 (smbiod10)
> >      trap number             = 12
> >      panic: page fault
> >      cpuid = 0
> >      KDB: stack backtrace:
> >      #0 0xffffffff80984e30 at kdb_backtrace+0x60
> >      #1 0xffffffff809489e6 at vpanic+0x126
> >      #2 0xffffffff809488b3 at panic+0x43
> >      #3 0xffffffff80d4aadb at trap_fatal+0x36b
> >      #4 0xffffffff80d4addd at trap_pfault+0x2ed
> >      #5 0xffffffff80d4a47a at trap+0x47a
> >      #6 0xffffffff80d307f2 at calltrap+0x8
> >      #7 0xffffffff8092ebe0 at __mtx_unlock_sleep+0x60
> >      #8 0xffffffff8092eb69 at __mtx_unlock_flags+0x69
> >      #9 0xffffffff81a1b724 at smb_iod_thread+0xb4
> >      #10 0xffffffff8091244a at fork_exit+0x9a
> >      #11 0xffffffff80d30d2e at fork_trampoline+0xe
> >      Uptime: 2h43m55s
> >      Dumping 103 out of 999 MB: (CTRL-C to abort)
> >      ..16%..31%..47%..62%..78%..93%
> > 
> This crash is occurring when doing an mtx_unlock(&Giant). Unfortunately, I'm not
> conversant w.r.t. this code. I've cc'd jhb@ in case he has some insight.
> If you don't get any responses, I'd suggest reposting to freebsd-current@ with
> "crashes in mtx_unlock(&Giant)" in the subject line.
> 
> Btw John, the code does tsleep() in a loop before the mtx_unlock(&Giant). I do
> remember that was once allowed, but am not sure if it still is (ie a tsleep() call
> while holding Giant)?
> 
> Hopefully someone who knows what is special about Giant that might cause this will
> respond.
> 
> Good luck with it, rick
tsleep() with Giant is still allowed.  However, this sort of panic usually means
you unlocked a mutex you didn't hold (but without INVARIANTS enabled or you'd get
an assertion failure earlier).

I don't see anything obviously wrong in smb_iod_thread() however.

If you have the crashdump, can you please run this in kgdb:

frame 9
p (struct mtx *)c
p *(struct mtx *)c

-- 
John Baldwin
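
John's point about unlocking a mutex that is not held can be illustrated with a
toy user-space model. This is only a sketch and not the FreeBSD implementation:
the class ToyMutex, its methods, and the "invariants" switch are hypothetical
names invented for the illustration. It shows why an ownership check fails
loudly at the bad unlock, while an unchecked release carries on silently and
leaves the damage to surface somewhere else.

```python
class ToyMutex:
    """Hypothetical toy lock; models ownership only, no blocking or recursion."""

    def __init__(self, name):
        self.name = name
        self.owner = None  # None stands for "unowned" in this toy model

    def lock(self, thread):
        if self.owner is not None:
            raise RuntimeError("toy model does not block; lock is busy")
        self.owner = thread

    def unlock(self, thread, invariants):
        # With checking on, a wrong-owner unlock is reported immediately.
        if invariants and self.owner != thread:
            raise AssertionError(
                "mutex %s not owned by %s" % (self.name, thread))
        # With checking off, the release goes ahead no matter who holds the lock.
        self.owner = None


def demo(invariants):
    """Lock as thread-A, then unlock as thread-B, and report what happens."""
    m = ToyMutex("toy")
    m.lock("thread-A")
    try:
        m.unlock("thread-B", invariants)
    except AssertionError as exc:
        return "caught early: %s" % exc
    return "silently released; owner is now %s" % m.owner


if __name__ == "__main__":
    print(demo(invariants=True))
    print(demo(invariants=False))
```

In the unchecked case thread-A still believes it holds the lock, which is the
kind of inconsistent state that tends to fail later and far from the real bug.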

Christian Kratzer

2015-Oct-07 06:52 UTC

smbfs crashes since approx. 10.1-RELEASE

Hi,

On Tue, 6 Oct 2015, John Baldwin wrote:
<snipp/>
>> This crash is occurring when doing an mtx_unlock(&Giant). Unfortunately, I'm not
>> conversant w.r.t. this code. I've cc'd jhb@ in case he has some insight.
>> If you don't get any responses, I'd suggest reposting to freebsd-current@ with
>> "crashes in mtx_unlock(&Giant)" in the subject line.
>>
>> Btw John, the code does tsleep() in a loop before the mtx_unlock(&Giant). I do
>> remember that was once allowed, but am not sure if it still is (ie a tsleep() call
>> while holding Giant)?
>>
>> Hopefully someone who knows what is special about Giant that might cause this will
>> respond.
>>
>> Good luck with it, rick
>
> tsleep() with Giant is still allowed.  However, this sort of panic usually means
> you unlocked a mutex you didn't hold (but without INVARIANTS enabled or you'd get
> an assertion failure earlier).
>
> I don't see anything obviously wrong in smb_iod_thread() however.
>
> If you have the crashdump, can you please run this in kgdb:
>
> frame 9
> p (struct mtx *)c
> p *(struct mtx *)c
Yes, I have. Here we go:

--snipp--
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x20
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80996c7c
stack pointer           = 0x28:0xfffffe004e79bac0
frame pointer           = 0x28:0xfffffe004e79baf0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                         = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 12235 (smbiod172)
trap number             = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff80984e30 at kdb_backtrace+0x60
#1 0xffffffff809489e6 at vpanic+0x126
#2 0xffffffff809488b3 at panic+0x43
#3 0xffffffff80d4aadb at trap_fatal+0x36b
#4 0xffffffff80d4addd at trap_pfault+0x2ed
#5 0xffffffff80d4a47a at trap+0x47a
#6 0xffffffff80d307f2 at calltrap+0x8
#7 0xffffffff8092ebe0 at __mtx_unlock_sleep+0x60
#8 0xffffffff8092eb69 at __mtx_unlock_flags+0x69
#9 0xffffffff81a1b724 at smb_iod_thread+0xb4
#10 0xffffffff8091244a at fork_exit+0x9a
#11 0xffffffff80d30d2e at fork_trampoline+0xe
Uptime: 1d18h34m4s
Dumping 161 out of 999 MB:..10%..20%..30%..40%..50%..60%..70%..80%..90%..100%

Reading symbols from /boot/kernel/smbfs.ko.symbols...done.
Loaded symbols for /boot/kernel/smbfs.ko.symbols
Reading symbols from /boot/kernel/libiconv.ko.symbols...done.
Loaded symbols for /boot/kernel/libiconv.ko.symbols
Reading symbols from /boot/kernel/libmchain.ko.symbols...done.
Loaded symbols for /boot/kernel/libmchain.ko.symbols
#0  doadump (textdump=<value optimized out>) at pcpu.h:219
219     pcpu.h: No such file or directory.
         in pcpu.h
(kgdb) frame 9
#9  0xffffffff8092ebe0 in __mtx_unlock_sleep (c=0xfffff8002f531790, opts=<value optimized out>,
     file=0xffffffff81a25801 "%s: Can't handle disordered parameters %d:%d\n", line=1) at /usr/src/sys/kern/kern_mutex.c:791
791     /usr/src/sys/kern/kern_mutex.c: No such file or directory.
         in /usr/src/sys/kern/kern_mutex.c
Current language:  auto; currently minimal
(kgdb) p (struct mtx *)c
$1 = (struct mtx *) 0xfffff8002f531790
(kgdb) p *(struct mtx *)c
$2 = {lock_object = {lo_name = 0x6 <Address 0x6 out of bounds>, lo_flags = 0, lo_data = 0, lo_witness = 0xfffff8002f531798},
   mtx_lock = 1444181401}
(kgdb)
--snipp--

I can build a GENERIC kernel with INVARIANTS enabled on the box to see if we get
a better assertion the next time this happens.

That is, assuming it happens at all with a debug build.

Greetings
Christian

-- 
Christian Kratzer                   CK Software GmbH
Email:   ck at cksoft.de               Wildberger Weg 24/2
Phone:   +49 7032 893 997 - 0       D-71126 Gaeufelden
Fax:     +49 7032 893 997 - 9       HRB 245288, Amtsgericht Stuttgart
Mobile:  +49 171 1947 843           Geschaeftsfuehrer: Christian Kratzer
Web:     http://www.cksoft.de/
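
The values kgdb printed for the mutex can be sanity-checked with a few lines of
Python. This is a rough sketch under stated assumptions, not a diagnosis. The
MTX_* flag bits are written from memory of FreeBSD's sys/mutex.h and should be
verified against the 10.2 sources. KERNEL_VA_MIN is a hypothetical threshold
meaning "upper half of the amd64 address space". The inputs are copied from the
kgdb output in the message above.

```python
from datetime import datetime, timezone

# Assumption: flag bits as recalled from sys/mutex.h; verify before relying on them.
MTX_RECURSED = 0x1
MTX_CONTESTED = 0x2
MTX_UNOWNED = 0x4
MTX_FLAGMASK = MTX_RECURSED | MTX_CONTESTED | MTX_UNOWNED

# Hypothetical cut-off: amd64 kernel addresses sit in the upper half.
KERNEL_VA_MIN = 0xFFFF800000000000


def decode_mtx_lock(word):
    """Split a lock word into the would-be owner pointer and its flag bits."""
    owner = word & ~MTX_FLAGMASK
    return {
        "owner": owner,
        "recursed": bool(word & MTX_RECURSED),
        "contested": bool(word & MTX_CONTESTED),
        "unowned": bool(word & MTX_UNOWNED),
        "owner_looks_like_kernel_pointer": owner >= KERNEL_VA_MIN,
    }


# Values taken from the kgdb session.
c = 0xFFFFF8002F531790
lo_witness = 0xFFFFF8002F531798
mtx_lock = 1444181401

info = decode_mtx_lock(mtx_lock)
witness_offset = lo_witness - c  # distance of lo_witness from the struct itself
as_utc = datetime.fromtimestamp(mtx_lock, tz=timezone.utc).isoformat()

if __name__ == "__main__":
    print(hex(info["owner"]), info)
    print("lo_witness - c =", witness_offset)
    print("mtx_lock read as Unix seconds:", as_utc)
```

Under these assumptions the owner field comes out as 0x56147598, which is not
in the kernel address range, and lo_witness points 8 bytes into the structure
itself. Read as Unix seconds, mtx_lock corresponds to 2015-10-07T01:30:01 UTC,
which may well be a coincidence. Taken together this is at least consistent
with the memory at c not holding a valid struct mtx at the time of the unlock.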
