I did a buildworld on this box to bring it up to RELENG_8 for the BIND
fixes. Unfortunately, the formerly solid box (April 13th kernel)
panic'd tonight with

Unread portion of the kernel message buffer:
spin lock 0xc0b1d200 (sched lock 1) held by 0xc5dac8a0 (tid 100107) too long
panic: spin lock held too long
cpuid = 0
Uptime: 13h30m4s
Physical memory: 2035 MB

It's a somewhat busy box taking in mail as well as backups for a few
servers over NFS. At the time, it would have been getting about 250Mb/s
inbound on its gigabit interface. Full core.txt file at

http://www.tancsa.com/core-jul8-2011.txt

#0  doadump () at pcpu.h:231
231     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) #0  doadump () at pcpu.h:231
#1  0xc06fd6d3 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:429
#2  0xc06fd937 in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:602
#3  0xc06ed95f in _mtx_lock_spin_failed (m=0x0)
    at /usr/src/sys/kern/kern_mutex.c:490
#4  0xc06ed9e5 in _mtx_lock_spin (m=0xc0b1d200, tid=3312388992, opts=0,
    file=0x0, line=0) at /usr/src/sys/kern/kern_mutex.c:526
#5  0xc0720254 in sched_add (td=0xc5dac5c0, flags=0)
    at /usr/src/sys/kern/sched_ule.c:1119
#6  0xc07203f9 in sched_wakeup (td=0xc5dac5c0)
    at /usr/src/sys/kern/sched_ule.c:1950
#7  0xc07061f8 in setrunnable (td=0xc5dac5c0)
    at /usr/src/sys/kern/kern_synch.c:499
#8  0xc07362af in sleepq_resume_thread (sq=0xca0da300, td=0xc5dac5c0,
    pri=Variable "pri" is not available.
) at /usr/src/sys/kern/subr_sleepqueue.c:751
#9  0xc0736e18 in sleepq_signal (wchan=0xc5fafe50, flags=1, pri=0, queue=0)
    at /usr/src/sys/kern/subr_sleepqueue.c:825
#10 0xc06b6764 in cv_signal (cvp=0xc5fafe50)
    at /usr/src/sys/kern/kern_condvar.c:422
#11 0xc08eaa0d in xprt_assignthread (xprt=Variable "xprt" is not available.
) at /usr/src/sys/rpc/svc.c:342
#12 0xc08ec502 in xprt_active (xprt=0xc95d9600) at /usr/src/sys/rpc/svc.c:378
#13 0xc08ee051 in svc_vc_soupcall (so=0xc6372ce0, arg=0xc95d9600, waitflag=1)
    at /usr/src/sys/rpc/svc_vc.c:747
#14 0xc075bbb1 in sowakeup (so=0xc6372ce0, sb=0xc6372d34)
    at /usr/src/sys/kern/uipc_sockbuf.c:191
#15 0xc08447bc in tcp_do_segment (m=0xcaa8d200, th=0xca6aa824, so=0xc6372ce0,
    tp=0xc63b4d20, drop_hdrlen=52, tlen=1448, iptos=0 '\0', ti_locked=2)
    at /usr/src/sys/netinet/tcp_input.c:1775
#16 0xc0847930 in tcp_input (m=0xcaa8d200, off0=20)
    at /usr/src/sys/netinet/tcp_input.c:1329
#17 0xc07ddaf7 in ip_input (m=0xcaa8d200)
    at /usr/src/sys/netinet/ip_input.c:787
#18 0xc07b8859 in netisr_dispatch_src (proto=1, source=0, m=0xcaa8d200)
    at /usr/src/sys/net/netisr.c:859
#19 0xc07b8af0 in netisr_dispatch (proto=1, m=0xcaa8d200)
    at /usr/src/sys/net/netisr.c:946
#20 0xc07ae5e1 in ether_demux (ifp=0xc56ed800, m=0xcaa8d200)
    at /usr/src/sys/net/if_ethersubr.c:894
#21 0xc07aeb5f in ether_input (ifp=0xc56ed800, m=0xcaa8d200)
    at /usr/src/sys/net/if_ethersubr.c:753
#22 0xc09977b2 in nfe_int_task (arg=0xc56ff000, pending=1)
    at /usr/src/sys/dev/nfe/if_nfe.c:2187
#23 0xc07387ca in taskqueue_run_locked (queue=0xc5702440)
    at /usr/src/sys/kern/subr_taskqueue.c:248
#24 0xc073895c in taskqueue_thread_loop (arg=0xc56ff130)
    at /usr/src/sys/kern/subr_taskqueue.c:385
#25 0xc06d1027 in fork_exit (callout=0xc07388a0 <taskqueue_thread_loop>,
    arg=0xc56ff130, frame=0xc538ed28) at /usr/src/sys/kern/kern_fork.c:861
#26 0xc09a5c24 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:275
(kgdb)

--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mike@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada   http://www.tancsa.com/
on 07/07/2011 08:55 Mike Tancsa said the following:
> I did a buildworld on this box to bring it up to RELENG_8 for the BIND
> fixes. Unfortunately, the formerly solid box (April 13th kernel)
> panic'd tonight with
>
> Unread portion of the kernel message buffer:
> spin lock 0xc0b1d200 (sched lock 1) held by 0xc5dac8a0 (tid 100107) too long
> panic: spin lock held too long
> cpuid = 0
> Uptime: 13h30m4s
> Physical memory: 2035 MB
>
> It's a somewhat busy box taking in mail as well as backups for a few
> servers over NFS. At the time, it would have been getting about 250Mb/s
> inbound on its gigabit interface. Full core.txt file at
>
> http://www.tancsa.com/core-jul8-2011.txt

I thought that this was supposed to contain the output of 'thread apply
all bt' in kgdb. Anyway, I think that the stack trace for tid 100107 may
have some useful information.

> #0  doadump () at pcpu.h:231
> [... full backtrace quoted in the previous message ...]
> (kgdb)

--
Andriy Gapon
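For reference, the usual way to pull those per-thread traces out of the
dump looks roughly like this. The object directory, kernel config name
and vmcore number below are placeholders for whatever your build
actually produced, and the 'tid' command assumes a FreeBSD kgdb new
enough to have it (otherwise find the matching thread in the output of
'info threads'):

  # cd /usr/obj/usr/src/sys/MYKERNEL         (your kernel config name)
  # kgdb kernel.debug /var/crash/vmcore.0    (match the dump number)
  (kgdb) thread apply all bt                 (per-thread backtraces)
  (kgdb) tid 100107                          (switch to the lock owner)
  (kgdb) bt                                  (its stack at panic time)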
On 8/17/2011 1:38 PM, Hiroki Sato wrote:
> Any progress on the investigation?

Unfortunately, I cannot reproduce it yet with a debugging kernel :(

        ---Mike

>
> --
> spin lock 0xffffffff80cb46c0 (sched lock 0) held by 0xffffff01900458c0 (tid 100489) too long
> panic: spin lock held too long
> cpuid = 1
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> kdb_backtrace() at kdb_backtrace+0x37
> panic() at panic+0x187
> _mtx_lock_spin_failed() at _mtx_lock_spin_failed+0x39
> _mtx_lock_spin() at _mtx_lock_spin+0x9e
> sched_add() at sched_add+0x117
> setrunnable() at setrunnable+0x78
> sleepq_signal() at sleepq_signal+0x7a
> cv_signal() at cv_signal+0x3b
> xprt_active() at xprt_active+0xe3
> svc_vc_soupcall() at svc_vc_soupcall+0xc
> sowakeup() at sowakeup+0x69
> tcp_do_segment() at tcp_do_segment+0x25e7
> tcp_input() at tcp_input+0xcdd
> ip_input() at ip_input+0xac
> netisr_dispatch_src() at netisr_dispatch_src+0x7e
> ether_demux() at ether_demux+0x14d
> ether_input() at ether_input+0x17d
> em_rxeof() at em_rxeof+0x1ca
> em_handle_que() at em_handle_que+0x5b
> taskqueue_run_locked() at taskqueue_run_locked+0x85
> taskqueue_thread_loop() at taskqueue_thread_loop+0x4e
> fork_exit() at fork_exit+0x11f
> fork_trampoline() at fork_trampoline+0xe
>
> -- Hiroki

--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mike@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada   http://www.tancsa.com/
2011/8/17 Hiroki Sato <hrs@freebsd.org>:
> Hi,
>
> Mike Tancsa <mike@sentex.net> wrote
>  in <4E15A08C.6090407@sentex.net>:
>
> mi> On 7/7/2011 7:32 AM, Mike Tancsa wrote:
> mi> > On 7/7/2011 4:20 AM, Kostik Belousov wrote:
> mi> >>
> mi> >> BTW, we had a similar panic, "spinlock held too long", the spinlock
> mi> >> is the sched lock N, on a busy 8-core box recently upgraded to
> mi> >> stable/8. Unfortunately, the machine hung dumping core, so the stack
> mi> >> trace for the owner thread was not available.
> mi> >>
> mi> >> I was unable to make any conclusion from the data that was present.
> mi> >> If the situation is reproducible, you could try to revert r221937.
> mi> >> This is pure speculation, though.
> mi> >
> mi> > Another crash just now after 5hrs uptime. I will try and revert r221937
> mi> > unless there is any extra debugging you want me to add to the kernel
> mi> > instead ??
>
> I am also suffering from a reproducible panic on an 8-STABLE box, an
> NFS server with heavy I/O load. I could not get a kernel dump
> because this panic locked up the machine just after it occurred, but
> according to the stack trace it was the same as the posted one.
> Switching to an 8.2R kernel can prevent this panic.
>
> Any progress on the investigation?

Hiroki,
how easily can you reproduce it?

It would be important to have a DDB textdump with this information:
- bt
- ps
- show allpcpu
- alltrace

Alternatively, a coredump from a kernel with the stop-cpu patch, which
Andriy can provide.

Thanks,
Attilio

--
Peace can only be achieved by understanding - A. Einstein
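If the box is too wedged at panic time for someone to type those at the
db> prompt, ddb(8) scripting plus textdump(4) can capture them
automatically. A minimal sketch, assuming the rc.d/ddb machinery on
8-STABLE; the script line is from memory, so compare it against the
stock /etc/ddb.conf and ddb(8) before relying on it:

In /etc/rc.conf:

  ddb_enable="YES"

In /etc/ddb.conf:

  script kdb.enter.panic=textdump set; capture on; show allpcpu; bt; ps; alltrace; capture off; call doadump; reset

With that in place, the next panic should leave a textdump tarball with
the captured output in /var/crash instead of needing console access.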
Attilio Rao <attilio@freebsd.org> wrote
 in <CAJ-FndCDOW0_B2MV0LZEo-tpEa9+7oAnJ7iHvKQsM4j4B0DLqg@mail.gmail.com>:

at> Hiroki,
at> how easily can you reproduce it?

It takes 5-10 hours. I installed another kernel for debugging just
now, so I think I will be able to collect more detailed information in
a couple of days.

at> It would be important to have a DDB textdump with this information:
at> - bt
at> - ps
at> - show allpcpu
at> - alltrace
at>
at> Alternatively, a coredump from a kernel with the stop-cpu patch,
at> which Andriy can provide.

Okay, I will post them once I can get another panic. Thanks!

-- Hiroki
Hiroki Sato <hrs@freebsd.org> wrote
 in <20110818.043332.27079545013461535.hrs@allbsd.org>:

hr> at> It would be important to have a DDB textdump with this information:
hr> at> - bt
hr> at> - ps
hr> at> - show allpcpu
hr> at> - alltrace
hr>
hr> Okay, I will post them once I can get another panic. Thanks!

I got the panic with a crash dump this time. The results of bt, ps,
allpcpu, and the traces can be found at the following URL:

http://people.allbsd.org/~hrs/FreeBSD/pool-panic_20110818-1.txt

-- Hiroki
Hiroki Sato
2011-Sep-09 20:10 UTC
ZFS panic on a RELENG_8 NFS server (Was: panic: spin lock held too long (RELENG_8 from today))
Hiroki Sato <hrs@freebsd.org> wrote
 in <20110907.094717.2272609566853905102.hrs@allbsd.org>:

hr> During this investigation a disk had to be replaced, and resilvering
hr> it is now in progress. A deadlock and a forced reboot after that
hr> make recovering the zfs datasets take a long time (for committing
hr> logs, I think), so I will try to reproduce the deadlock and get a
hr> core dump after it finishes.

I think I could reproduce the symptoms. I cannot tell whether these
are exactly the same as what occurred on my box before, because the
kernel was replaced with one with some debugging options, but they are
reproducible at least.

There are two symptoms. One is a panic. The DDB output when the panic
occurred is the following:

----
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x100000040
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff8065b926
stack pointer           = 0x28:0xffffff8257b94d70
frame pointer           = 0x28:0xffffff8257b94e10
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 992 (nfsd: service)
[thread pid 992 tid 100586 ]
Stopped at      witness_checkorder+0x246:       movl    0x40(%r13),%ebx

db> bt
Tracing pid 992 tid 100586 td 0xffffff00595d9000
witness_checkorder() at witness_checkorder+0x246
_sx_slock() at _sx_slock+0x35
dmu_bonus_hold() at dmu_bonus_hold+0x57
zfs_zget() at zfs_zget+0x237
zfs_dirent_lock() at zfs_dirent_lock+0x488
zfs_dirlook() at zfs_dirlook+0x69
zfs_lookup() at zfs_lookup+0x26b
zfs_freebsd_lookup() at zfs_freebsd_lookup+0x81
vfs_cache_lookup() at vfs_cache_lookup+0xf0
VOP_LOOKUP_APV() at VOP_LOOKUP_APV+0x40
lookup() at lookup+0x384
nfsvno_namei() at nfsvno_namei+0x268
nfsrvd_lookup() at nfsrvd_lookup+0xd6
nfsrvd_dorpc() at nfsrvd_dorpc+0x745
nfssvc_program() at nfssvc_program+0x447
svc_run_internal() at svc_run_internal+0x51b
svc_thread_start() at svc_thread_start+0xb
fork_exit() at fork_exit+0x11d
fork_trampoline() at fork_trampoline+0xe
--- trap 0xc, rip = 0x8006a031c, rsp = 0x7fffffffe6c8, rbp = 0x6 ---
----

The complete output can be found at:

http://people.allbsd.org/~hrs/zfs_panic_20110909_1/pool-zfs-20110909-1.txt

The other symptom is getting stuck on ZFS access. The kernel keeps
running with no panic, but any access to the ZFS datasets leaves the
accessing program unresponsive. The DDB output can be found at:

http://people.allbsd.org/~hrs/zfs_panic_20110909_2/pool-zfs-20110909-2.txt

The trigger for both was some access to a ZFS dataset from the NFS
clients. Because the access pattern was complex I could not narrow
down the culprit, but it seems timing-dependent, and simply doing
"rm -rf" locally on the server can sometimes trigger them.

The crash dump and the kernel can be found at the following URLs:

panic:
http://people.allbsd.org/~hrs/zfs_panic_20110909_1/

no panic but unresponsive:
http://people.allbsd.org/~hrs/zfs_panic_20110909_2/

kernel:
http://people.allbsd.org/~hrs/zfs_panic_20110909_kernel/

-- Hiroki
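For the no-panic, unresponsive case, the lock state at the time of the
hang is usually the interesting part. A sketch of what to capture at
the db> prompt, assuming WITNESS is compiled in (and DEBUG_LOCKS for
the locked-vnode list); check ddb(4) for which "show" commands your
kernel options actually provide:

  db> show allpcpu
  db> show alllocks        (all locks held by all threads; needs WITNESS)
  db> show lockedvnods     (locked vnodes; needs DEBUG_LOCKS)
  db> alltrace

Together these usually identify which thread holds the lock everyone
else is queued behind.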
Hiroki Sato <hrs@freebsd.org> wrote
 in <20110910.044841.232160047547388224.hrs@allbsd.org>:

hr> I think I could reproduce the symptoms. [...]
hr>
hr> There are two symptoms. One is a panic. The DDB output when the
hr> panic occurred is the following:

I am trying vfs.lookup_shared=0 and seeing how it goes. So far the box
seems to endure the kind of high load that quickly caused these
symptoms before.

-- Hiroki
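For anyone who wants to try the same experiment: vfs.lookup_shared
turns shared-lock VFS name lookups on and off. As far as I remember it
is both a loader tunable and a read-write sysctl on 8-STABLE, but
double-check with 'sysctl -d vfs.lookup_shared' on your branch:

  # sysctl vfs.lookup_shared=0     (disable shared lookups at runtime)

or, to have it set from boot, in /boot/loader.conf:

  vfs.lookup_shared=0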
Hiroki Sato <hrs@freebsd.org> wrote
 in <20110911.054601.1424617155148336027.hrs@allbsd.org>:

hr> I am trying vfs.lookup_shared=0 and seeing how it goes. So far the
hr> box seems to endure the kind of high load that quickly caused these
hr> symptoms before.

The knob made no difference after all. The same panic or
unresponsiveness still occurs in about 24-32 hours.

-- Hiroki