I did a buildworld on this box to bring it up to RELENG_8 for the BIND
fixes. Unfortunately, the formerly solid box (April 13th kernel)
panic'd tonight with

Unread portion of the kernel message buffer:
spin lock 0xc0b1d200 (sched lock 1) held by 0xc5dac8a0 (tid 100107) too long
panic: spin lock held too long
cpuid = 0
Uptime: 13h30m4s
Physical memory: 2035 MB

It's a somewhat busy box taking in mail as well as backups for a few
servers over NFS. At the time, it would have been getting about 250Mb/s
inbound on its gigabit interface. Full core.txt file at

http://www.tancsa.com/core-jul8-2011.txt

#0  doadump () at pcpu.h:231
231     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) #0  doadump () at pcpu.h:231
#1  0xc06fd6d3 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:429
#2  0xc06fd937 in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:602
#3  0xc06ed95f in _mtx_lock_spin_failed (m=0x0)
    at /usr/src/sys/kern/kern_mutex.c:490
#4  0xc06ed9e5 in _mtx_lock_spin (m=0xc0b1d200, tid=3312388992, opts=0,
    file=0x0, line=0) at /usr/src/sys/kern/kern_mutex.c:526
#5  0xc0720254 in sched_add (td=0xc5dac5c0, flags=0)
    at /usr/src/sys/kern/sched_ule.c:1119
#6  0xc07203f9 in sched_wakeup (td=0xc5dac5c0)
    at /usr/src/sys/kern/sched_ule.c:1950
#7  0xc07061f8 in setrunnable (td=0xc5dac5c0)
    at /usr/src/sys/kern/kern_synch.c:499
#8  0xc07362af in sleepq_resume_thread (sq=0xca0da300, td=0xc5dac5c0,
    pri=Variable "pri" is not available.
) at /usr/src/sys/kern/subr_sleepqueue.c:751
#9  0xc0736e18 in sleepq_signal (wchan=0xc5fafe50, flags=1, pri=0, queue=0)
    at /usr/src/sys/kern/subr_sleepqueue.c:825
#10 0xc06b6764 in cv_signal (cvp=0xc5fafe50)
    at /usr/src/sys/kern/kern_condvar.c:422
#11 0xc08eaa0d in xprt_assignthread (xprt=Variable "xprt" is not available.
) at /usr/src/sys/rpc/svc.c:342
#12 0xc08ec502 in xprt_active (xprt=0xc95d9600) at /usr/src/sys/rpc/svc.c:378
#13 0xc08ee051 in svc_vc_soupcall (so=0xc6372ce0, arg=0xc95d9600, waitflag=1)
    at /usr/src/sys/rpc/svc_vc.c:747
#14 0xc075bbb1 in sowakeup (so=0xc6372ce0, sb=0xc6372d34)
    at /usr/src/sys/kern/uipc_sockbuf.c:191
#15 0xc08447bc in tcp_do_segment (m=0xcaa8d200, th=0xca6aa824, so=0xc6372ce0,
    tp=0xc63b4d20, drop_hdrlen=52, tlen=1448, iptos=0 '\0', ti_locked=2)
    at /usr/src/sys/netinet/tcp_input.c:1775
#16 0xc0847930 in tcp_input (m=0xcaa8d200, off0=20)
    at /usr/src/sys/netinet/tcp_input.c:1329
#17 0xc07ddaf7 in ip_input (m=0xcaa8d200)
    at /usr/src/sys/netinet/ip_input.c:787
#18 0xc07b8859 in netisr_dispatch_src (proto=1, source=0, m=0xcaa8d200)
    at /usr/src/sys/net/netisr.c:859
#19 0xc07b8af0 in netisr_dispatch (proto=1, m=0xcaa8d200)
    at /usr/src/sys/net/netisr.c:946
#20 0xc07ae5e1 in ether_demux (ifp=0xc56ed800, m=0xcaa8d200)
    at /usr/src/sys/net/if_ethersubr.c:894
#21 0xc07aeb5f in ether_input (ifp=0xc56ed800, m=0xcaa8d200)
    at /usr/src/sys/net/if_ethersubr.c:753
#22 0xc09977b2 in nfe_int_task (arg=0xc56ff000, pending=1)
    at /usr/src/sys/dev/nfe/if_nfe.c:2187
#23 0xc07387ca in taskqueue_run_locked (queue=0xc5702440)
    at /usr/src/sys/kern/subr_taskqueue.c:248
#24 0xc073895c in taskqueue_thread_loop (arg=0xc56ff130)
    at /usr/src/sys/kern/subr_taskqueue.c:385
#25 0xc06d1027 in fork_exit (callout=0xc07388a0 <taskqueue_thread_loop>,
    arg=0xc56ff130, frame=0xc538ed28) at /usr/src/sys/kern/kern_fork.c:861
#26 0xc09a5c24 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:275
(kgdb)

--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mike@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada   http://www.tancsa.com/
on 07/07/2011 08:55 Mike Tancsa said the following:
> I did a buildworld on this box to bring it up to RELENG_8 for the BIND
> fixes. Unfortunately, the formerly solid box (April 13th kernel)
> panic'd tonight with
>
> Unread portion of the kernel message buffer:
> spin lock 0xc0b1d200 (sched lock 1) held by 0xc5dac8a0 (tid 100107) too long
> panic: spin lock held too long
> cpuid = 0
> Uptime: 13h30m4s
> Physical memory: 2035 MB
>
> It's a somewhat busy box taking in mail as well as backups for a few
> servers over NFS. At the time, it would have been getting about 250Mb/s
> inbound on its gigabit interface. Full core.txt file at
>
> http://www.tancsa.com/core-jul8-2011.txt

I thought that this was supposed to contain the output of 'thread apply
all bt' in kgdb. Anyway, I think that the stack trace for tid 100107 may
have some useful information.

> #0  doadump () at pcpu.h:231
> [... full backtrace quoted in the previous message ...]
> (kgdb)

--
Andriy Gapon
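For reference, the usual way to pull those per-thread traces out of the
dump looks roughly like this. The object directory, kernel config name
and vmcore number below are placeholders for whatever your build
actually produced, and the 'tid' command assumes a FreeBSD kgdb new
enough to have it (otherwise find the matching thread in the output of
'info threads'):

  # cd /usr/obj/usr/src/sys/MYKERNEL         (your kernel config name)
  # kgdb kernel.debug /var/crash/vmcore.0    (match the dump number)
  (kgdb) thread apply all bt                 (per-thread backtraces)
  (kgdb) tid 100107                          (switch to the lock owner)
  (kgdb) bt                                  (its stack at panic time)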
On 8/17/2011 1:38 PM, Hiroki Sato wrote:
> Any progress on the investigation?

Unfortunately, I cannot reproduce it yet with a debugging kernel :(

        ---Mike

>
> --
> spin lock 0xffffffff80cb46c0 (sched lock 0) held by 0xffffff01900458c0 (tid 100489) too long
> panic: spin lock held too long
> cpuid = 1
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> kdb_backtrace() at kdb_backtrace+0x37
> panic() at panic+0x187
> _mtx_lock_spin_failed() at _mtx_lock_spin_failed+0x39
> _mtx_lock_spin() at _mtx_lock_spin+0x9e
> sched_add() at sched_add+0x117
> setrunnable() at setrunnable+0x78
> sleepq_signal() at sleepq_signal+0x7a
> cv_signal() at cv_signal+0x3b
> xprt_active() at xprt_active+0xe3
> svc_vc_soupcall() at svc_vc_soupcall+0xc
> sowakeup() at sowakeup+0x69
> tcp_do_segment() at tcp_do_segment+0x25e7
> tcp_input() at tcp_input+0xcdd
> ip_input() at ip_input+0xac
> netisr_dispatch_src() at netisr_dispatch_src+0x7e
> ether_demux() at ether_demux+0x14d
> ether_input() at ether_input+0x17d
> em_rxeof() at em_rxeof+0x1ca
> em_handle_que() at em_handle_que+0x5b
> taskqueue_run_locked() at taskqueue_run_locked+0x85
> taskqueue_thread_loop() at taskqueue_thread_loop+0x4e
> fork_exit() at fork_exit+0x11f
> fork_trampoline() at fork_trampoline+0xe
>
> -- Hiroki

--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mike@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada   http://www.tancsa.com/
2011/8/17 Hiroki Sato <hrs@freebsd.org>:
> Hi,
>
> Mike Tancsa <mike@sentex.net> wrote
>  in <4E15A08C.6090407@sentex.net>:
>
> mi> On 7/7/2011 7:32 AM, Mike Tancsa wrote:
> mi> > On 7/7/2011 4:20 AM, Kostik Belousov wrote:
> mi> >>
> mi> >> BTW, we had a similar panic, "spinlock held too long", the spinlock
> mi> >> is the sched lock N, on a busy 8-core box recently upgraded to
> mi> >> stable/8. Unfortunately, the machine hung dumping core, so the stack
> mi> >> trace for the owner thread was not available.
> mi> >>
> mi> >> I was unable to make any conclusion from the data that was present.
> mi> >> If the situation is reproducible, you could try to revert r221937.
> mi> >> This is pure speculation, though.
> mi> >
> mi> > Another crash just now after 5hrs uptime. I will try and revert r221937
> mi> > unless there is any extra debugging you want me to add to the kernel
> mi> > instead ??
>
> I am also suffering from a reproducible panic on an 8-STABLE box, an
> NFS server with heavy I/O load. I could not get a kernel dump
> because this panic locked up the machine just after it occurred, but
> according to the stack trace it was the same as the posted one.
> Switching to an 8.2R kernel can prevent this panic.
>
> Any progress on the investigation?

Hiroki,
how easily can you reproduce it?

It would be important to have a DDB textdump with this information:
- bt
- ps
- show allpcpu
- alltrace

Alternatively, a coredump from a kernel with the stop-cpu patch, which
Andriy can provide.

Thanks,
Attilio

--
Peace can only be achieved by understanding - A. Einstein
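If the box is too wedged at panic time for someone to type those at the
db> prompt, ddb(8) scripting plus textdump(4) can capture them
automatically. A minimal sketch, assuming the rc.d/ddb machinery on
8-STABLE; the script line is from memory, so compare it against the
stock /etc/ddb.conf and ddb(8) before relying on it:

In /etc/rc.conf:

  ddb_enable="YES"

In /etc/ddb.conf:

  script kdb.enter.panic=textdump set; capture on; show allpcpu; bt; ps; alltrace; capture off; call doadump; reset

With that in place, the next panic should leave a textdump tarball with
the captured output in /var/crash instead of needing console access.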
Attilio Rao <attilio@freebsd.org> wrote
 in <CAJ-FndCDOW0_B2MV0LZEo-tpEa9+7oAnJ7iHvKQsM4j4B0DLqg@mail.gmail.com>:

at> Hiroki,
at> how easily can you reproduce it?

It takes 5-10 hours. I installed another kernel for debugging just
now, so I think I will be able to collect more detailed information in
a couple of days.

at> It would be important to have a DDB textdump with this information:
at> - bt
at> - ps
at> - show allpcpu
at> - alltrace
at>
at> Alternatively, a coredump from a kernel with the stop-cpu patch,
at> which Andriy can provide.

Okay, I will post them once I can get another panic. Thanks!

-- Hiroki
Hiroki Sato <hrs@freebsd.org> wrote
 in <20110818.043332.27079545013461535.hrs@allbsd.org>:

hr> at> It would be important to have a DDB textdump with this information:
hr> at> - bt
hr> at> - ps
hr> at> - show allpcpu
hr> at> - alltrace
hr>
hr> Okay, I will post them once I can get another panic. Thanks!

I got the panic with a crash dump this time. The results of bt, ps,
allpcpu, and the traces can be found at the following URL:

http://people.allbsd.org/~hrs/FreeBSD/pool-panic_20110818-1.txt

-- Hiroki
Hiroki Sato
2011-Sep-09 20:10 UTC
ZFS panic on a RELENG_8 NFS server (Was: panic: spin lock held too long (RELENG_8 from today))
Hiroki Sato <hrs@freebsd.org> wrote
 in <20110907.094717.2272609566853905102.hrs@allbsd.org>:

hr> During this investigation a disk had to be replaced, and resilvering
hr> it is now in progress. A deadlock and a forced reboot after that
hr> make recovering the zfs datasets take a long time (for committing
hr> logs, I think), so I will try to reproduce the deadlock and get a
hr> core dump after it finishes.

I think I could reproduce the symptoms. I cannot tell whether these
are exactly the same as what occurred on my box before, because the
kernel was replaced with one with some debugging options, but they are
reproducible at least.

There are two symptoms. One is a panic. The DDB output when the panic
occurred is the following:

----
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x100000040
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff8065b926
stack pointer           = 0x28:0xffffff8257b94d70
frame pointer           = 0x28:0xffffff8257b94e10
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 992 (nfsd: service)
[thread pid 992 tid 100586 ]
Stopped at      witness_checkorder+0x246:       movl    0x40(%r13),%ebx

db> bt
Tracing pid 992 tid 100586 td 0xffffff00595d9000
witness_checkorder() at witness_checkorder+0x246
_sx_slock() at _sx_slock+0x35
dmu_bonus_hold() at dmu_bonus_hold+0x57
zfs_zget() at zfs_zget+0x237
zfs_dirent_lock() at zfs_dirent_lock+0x488
zfs_dirlook() at zfs_dirlook+0x69
zfs_lookup() at zfs_lookup+0x26b
zfs_freebsd_lookup() at zfs_freebsd_lookup+0x81
vfs_cache_lookup() at vfs_cache_lookup+0xf0
VOP_LOOKUP_APV() at VOP_LOOKUP_APV+0x40
lookup() at lookup+0x384
nfsvno_namei() at nfsvno_namei+0x268
nfsrvd_lookup() at nfsrvd_lookup+0xd6
nfsrvd_dorpc() at nfsrvd_dorpc+0x745
nfssvc_program() at nfssvc_program+0x447
svc_run_internal() at svc_run_internal+0x51b
svc_thread_start() at svc_thread_start+0xb
fork_exit() at fork_exit+0x11d
fork_trampoline() at fork_trampoline+0xe
--- trap 0xc, rip = 0x8006a031c, rsp = 0x7fffffffe6c8, rbp = 0x6 ---
----

The complete output can be found at:

http://people.allbsd.org/~hrs/zfs_panic_20110909_1/pool-zfs-20110909-1.txt

The other symptom is getting stuck on ZFS access. The kernel keeps
running with no panic, but any access to the ZFS datasets leaves the
accessing program unresponsive. The DDB output can be found at:

http://people.allbsd.org/~hrs/zfs_panic_20110909_2/pool-zfs-20110909-2.txt

The trigger for both was some access to a ZFS dataset from the NFS
clients. Because the access pattern was complex I could not narrow
down the culprit, but it seems timing-dependent, and simply doing
"rm -rf" locally on the server can sometimes trigger them.

The crash dump and the kernel can be found at the following URLs:

panic:
http://people.allbsd.org/~hrs/zfs_panic_20110909_1/

no panic but unresponsive:
http://people.allbsd.org/~hrs/zfs_panic_20110909_2/

kernel:
http://people.allbsd.org/~hrs/zfs_panic_20110909_kernel/

-- Hiroki
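For the no-panic, unresponsive case, the lock state at the time of the
hang is usually the interesting part. A sketch of what to capture at
the db> prompt, assuming WITNESS is compiled in (and DEBUG_LOCKS for
the locked-vnode list); check ddb(4) for which "show" commands your
kernel options actually provide:

  db> show allpcpu
  db> show alllocks        (all locks held by all threads; needs WITNESS)
  db> show lockedvnods     (locked vnodes; needs DEBUG_LOCKS)
  db> alltrace

Together these usually identify which thread holds the lock everyone
else is queued behind.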
Hiroki Sato <hrs@freebsd.org> wrote
 in <20110910.044841.232160047547388224.hrs@allbsd.org>:

hr> I think I could reproduce the symptoms. [...]
hr>
hr> There are two symptoms. One is a panic. The DDB output when the
hr> panic occurred is the following:

I am trying vfs.lookup_shared=0 and seeing how it goes. So far the box
seems to endure the kind of high load that quickly caused these
symptoms before.

-- Hiroki
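For anyone who wants to try the same experiment: vfs.lookup_shared
turns shared-lock VFS name lookups on and off. As far as I remember it
is both a loader tunable and a read-write sysctl on 8-STABLE, but
double-check with 'sysctl -d vfs.lookup_shared' on your branch:

  # sysctl vfs.lookup_shared=0     (disable shared lookups at runtime)

or, to have it set from boot, in /boot/loader.conf:

  vfs.lookup_shared=0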
Hiroki Sato <hrs@freebsd.org> wrote
 in <20110911.054601.1424617155148336027.hrs@allbsd.org>:

hr> I am trying vfs.lookup_shared=0 and seeing how it goes. So far the
hr> box seems to endure the kind of high load that quickly caused these
hr> symptoms before.

The knob made no difference after all. The same panic or
unresponsiveness still occurs in about 24-32 hours.

-- Hiroki