Hi,

I am running a FreeBSD 9.1-REL system with GENERIC kernel:
FreeBSD xxxxx 9.1-RELEASE FreeBSD 9.1-RELEASE #0: Fri Jan 4 12:28:48 CET 2013 root at xxxxx:/usr/obj/usr/src/sys/GENERIC amd64

It is crashing a couple of times per week, without any real pattern. There are no hints in the syslog, and I only have the crash dump to work from...

It is a webserver, using an NFS-mounted docroot (in case it helps). Here's the backtrace:

<snip>
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
Sleeping thread (tid 100256, pid 85641) owns a non-sleepable lock
KDB: stack backtrace of thread 100256:
#0 0xffffffff808f2d46 at mi_switch+0x186
#1 0xffffffff8092bb52 at sleepq_wait+0x42
#2 0xffffffff808f34d6 at _sleep+0x376
#3 0xffffffff80b4f3ae at vm_object_page_remove+0x2ce
#4 0xffffffff80b5ac7d at vnode_pager_setsize+0x17d
#5 0xffffffff8082102c at nfscl_loadattrcache+0x2cc
#6 0xffffffff80818d37 at nfs_getattr+0x287
#7 0xffffffff8098f1c0 at vn_stat+0xb0
#8 0xffffffff809869d9 at kern_statat_vnhook+0xf9
#9 0xffffffff80986b55 at kern_statat+0x15
#10 0xffffffff80986c1a at sys_lstat+0x2a
#11 0xffffffff80bd7ae6 at amd64_syscall+0x546
#12 0xffffffff80bc3447 at Xfast_syscall+0xf7
panic: sleeping thread
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff809208a6 at kdb_backtrace+0x66
#1 0xffffffff808ea8be at panic+0x1ce
#2 0xffffffff8092ed22 at propagate_priority+0x1d2
#3 0xffffffff8092fa4e at turnstile_wait+0x1be
#4 0xffffffff808d8d48 at _mtx_lock_sleep+0xd8
#5 0xffffffff80820fa4 at nfscl_loadattrcache+0x244
#6 0xffffffff8081758c at ncl_readrpc+0xac
#7 0xffffffff80824c45 at ncl_getpages+0x485
#8 0xffffffff80b5aa0c at vnode_pager_getpages+0x9c
#9 0xffffffff80b3fc93 at vm_fault_hold+0x673
#10 0xffffffff80b41cc3 at vm_fault+0x73
#11 0xffffffff80bd84b4 at trap_pfault+0x124
#12 0xffffffff80bd8c6c at trap+0x49c
#13 0xffffffff80bc315f at calltrap+0x8
Uptime: 8d0h54m10s
Dumping 2381 out of 24547 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

Reading symbols from /boot/kernel/geom_mirror.ko...Reading symbols from /boot/kernel/geom_mirror.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/geom_mirror.ko
Reading symbols from /boot/kernel/geom_stripe.ko...Reading symbols from /boot/kernel/geom_stripe.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/geom_stripe.ko
Reading symbols from /boot/kernel/if_em.ko...Reading symbols from /boot/kernel/if_em.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/if_em.ko
Reading symbols from /boot/kernel/linprocfs.ko...Reading symbols from /boot/kernel/linprocfs.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/linprocfs.ko
Reading symbols from /boot/kernel/linux.ko...Reading symbols from /boot/kernel/linux.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/linux.ko
#0 doadump (textdump=Variable "textdump" is not available.
) at pcpu.h:224
224 pcpu.h: No such file or directory.
	in pcpu.h
(kgdb) bt
#0 doadump (textdump=Variable "textdump" is not available.
) at pcpu.h:224
#1 0xffffffff808ea3a1 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:448
#2 0xffffffff808ea897 in panic (fmt=0x1 <Address 0x1 out of bounds>) at /usr/src/sys/kern/kern_shutdown.c:636
#3 0xffffffff8092ed22 in propagate_priority (td=Variable "td" is not available.
) at /usr/src/sys/kern/subr_turnstile.c:227
#4 0xffffffff8092fa4e in turnstile_wait (ts=Variable "ts" is not available.
) at /usr/src/sys/kern/subr_turnstile.c:743
#5 0xffffffff808d8d48 in _mtx_lock_sleep (m=0xfffffe044a3c8238, tid=18446741888664231936, opts=Variable "opts" is not available.
)
    at /usr/src/sys/kern/kern_mutex.c:471
#6 0xffffffff80820fa4 in nfscl_loadattrcache (vpp=Variable "vpp" is not available.
) at /usr/src/sys/fs/nfsclient/nfs_clport.c:379
#7 0xffffffff8081758c in ncl_readrpc (vp=0xfffffe044a6cd780, uiop=0xffffff86962fc650, cred=Variable "cred" is not available.
)
    at /usr/src/sys/fs/nfsclient/nfs_clvnops.c:1369
#8 0xffffffff80824c45 in ncl_getpages (ap=0xffffff86962fc6f0) at /usr/src/sys/fs/nfsclient/nfs_clbio.c:171
#9 0xffffffff80b5aa0c in vnode_pager_getpages (object=0xfffffe016aa16570, m=0xffffff86962fc770, count=Variable "count" is not available.
)
    at vnode_if.h:1154
#10 0xffffffff80b3fc93 in vm_fault_hold (map=0xfffffe007f7e3188, vaddr=34366988288, fault_type=1 '\001', fault_flags=Variable "fault_flags" is not available.
)
    at vm_pager.h:128
#11 0xffffffff80b41cc3 in vm_fault (map=0xfffffe007f7e3188, vaddr=34366988288, fault_type=Variable "fault_type" is not available.
)
    at /usr/src/sys/vm/vm_fault.c:229
#12 0xffffffff80bd84b4 in trap_pfault (frame=0xffffff86962fcc40, usermode=1) at /usr/src/sys/amd64/amd64/trap.c:740
#13 0xffffffff80bd8c6c in trap (frame=0xffffff86962fcc40) at /usr/src/sys/amd64/amd64/trap.c:358
#14 0xffffffff80bc315f in calltrap () at /usr/src/sys/amd64/amd64/exception.S:228
#15 0x0000000802091386 in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb)
</snip>

Dump header from device /dev/mirror/gm0s1b
  Architecture: amd64
  Architecture Version: 2
  Dump Length: 2496667648B (2381 MB)
  Blocksize: 512
  Dumptime: Mon Mar 18 19:35:00 2013
  Hostname: xxxxxxxxx
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 9.1-RELEASE #0: Fri Jan 4 12:28:48 CET 2013
    root at xxxxx:/usr/obj/usr/src/sys/GENERIC
  Panic String: sleeping thread
  Dump Parity: 826144189
  Bounds: 5
  Dump Status: good

Any ideas?

Thanks,
/mich
On Tue, Mar 19, 2013 at 06:18:06PM +0100, Michael Landin Hostbaek wrote:
> Hi,
>
> I am running a FreeBSD 9.1-REL system with GENERIC kernel:
> [...]
> Any ideas?

The kernel panic is happening in NFS-related code. Rick Macklem (and/or John Baldwin) should be able to help with this; I've CC'd both here.

You're going to need to provide the following details:

1. Contents of /etc/rc.conf
2. Contents of /etc/sysctl.conf (if modified)
3. Contents of /etc/fstab
4. Output of "ifconfig -a"
5. OS used by the NFS server, and all configuration details pertaining to that system

You may also be asked to upgrade to 9.1-STABLE, as there may be fixes for whatever this is in base/stable/9 that are not in -RELEASE, but this is speculative on my part.

-- 
| Jeremy Chadwick jdc at koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |
On Wed, Mar 20, 2013 at 12:13:05PM +0100, Michael Landin Hostbaek wrote:
>
> On Mar 20, 2013, at 10:49 AM, Konstantin Belousov <kostikbel at gmail.com> wrote:
> >
> > I do not like it. As I said in the previous response to Andrey,
> > I think that moving the vnode_pager_setsize() after the unlock is
> > better, since it reduces races with other thread seeing half-done
> > attribute update or making attribute change simultaneously.
>
> OK - so should I wait for another patch - or?

I think the following is what I mean. As an additional note, why does the NFS client not trim the buffers when the server reports a node size change?

diff --git a/sys/fs/nfsclient/nfs_clport.c b/sys/fs/nfsclient/nfs_clport.c
index a07a67f..4fe2e35 100644
--- a/sys/fs/nfsclient/nfs_clport.c
+++ b/sys/fs/nfsclient/nfs_clport.c
@@ -361,6 +361,8 @@ nfscl_loadattrcache(struct vnode **vpp, struct nfsvattr *nap, void *nvaper,
 	struct nfsnode *np;
 	struct nfsmount *nmp;
 	struct timespec mtime_save;
+	u_quad_t nsize;
+	int setnsize;
 
 	/*
 	 * If v_type == VNON it is a new node, so fill in the v_type,
@@ -418,6 +420,7 @@ nfscl_loadattrcache(struct vnode **vpp, struct nfsvattr *nap, void *nvaper,
 	} else
 		vap->va_fsid = vp->v_mount->mnt_stat.f_fsid.val[0];
 	np->n_attrstamp = time_second;
+	setnsize = 0;
 	if (vap->va_size != np->n_size) {
 		if (vap->va_type == VREG) {
 			if (dontshrink && vap->va_size < np->n_size) {
@@ -444,10 +447,13 @@ nfscl_loadattrcache(struct vnode **vpp, struct nfsvattr *nap, void *nvaper,
 				np->n_size = vap->va_size;
 				np->n_flag |= NSIZECHANGED;
 			}
-			vnode_pager_setsize(vp, np->n_size);
 		} else {
 			np->n_size = vap->va_size;
 		}
+		if (vap->va_type == VREG || vap->va_type == VDIR) {
+			setnsize = 1;
+			nsize = vap->va_size;
+		}
 	}
 	/*
 	 * The following checks are added to prevent a race between (say)
@@ -480,6 +486,8 @@ nfscl_loadattrcache(struct vnode **vpp, struct nfsvattr *nap, void *nvaper,
 	KDTRACE_NFS_ATTRCACHE_LOAD_DONE(vp, vap, 0);
 #endif
 	NFSUNLOCKNODE(np);
+	if (setnsize)
+		vnode_pager_setsize(vp, nsize);
 	return (0);
 }
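The first backtrace in the report appears to show the root cause: nfscl_loadattrcache() calls vnode_pager_setsize() while still holding the NFS node mutex, vnode_pager_setsize() sleeps in vm_object_page_remove(), and when a second thread blocks on the same mutex in nfscl_loadattrcache(), propagate_priority() finds the lock owner asleep and panics with "sleeping thread". The patch records the new size under the lock and calls vnode_pager_setsize() only after NFSUNLOCKNODE(). Below is a userland sketch of that ordering, assuming pthreads. struct node, node_lock(), pager_setsize() and load_attrs() are hypothetical stand-ins, not kernel APIs, and the thread-local counter with its assert only loosely models the rule that a thread must not sleep while holding such a lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Per-thread count of "non-sleepable" locks held (hypothetical bookkeeping). */
static _Thread_local int nonsleep_locks_held;

/* Hypothetical, simplified analogue of struct nfsnode. */
struct node {
	pthread_mutex_t mtx;       /* plays the role of the NFS node mutex */
	uint64_t n_size;           /* cached file size, protected by mtx */
	pthread_mutex_t pager_mtx; /* separate lock for the "pager" state */
	uint64_t pager_size;       /* size last pushed to the "pager" */
};

static void node_lock(struct node *np)
{
	pthread_mutex_lock(&np->mtx);
	nonsleep_locks_held++;
}

static void node_unlock(struct node *np)
{
	nonsleep_locks_held--;
	pthread_mutex_unlock(&np->mtx);
}

/* Analogue of vnode_pager_setsize(): it may block, so callers must not
 * hold the node lock. The assert stands in for the kernel's rule. */
static void pager_setsize(struct node *np, uint64_t size)
{
	assert(nonsleep_locks_held == 0 && "may sleep with node lock held");
	pthread_mutex_lock(&np->pager_mtx);
	np->pager_size = size;
	pthread_mutex_unlock(&np->pager_mtx);
}

/* Patched ordering: update the cache and remember the new size under the
 * node lock, then make the possibly-blocking call after unlocking. */
static void load_attrs(struct node *np, uint64_t new_size)
{
	int setnsize = 0;
	uint64_t nsize = 0;

	node_lock(np);
	if (new_size != np->n_size) {
		np->n_size = new_size;
		setnsize = 1;
		nsize = new_size;
	}
	node_unlock(np);

	if (setnsize)
		pager_setsize(np, nsize);
}
```

The sketch deliberately omits the patch's VREG/VDIR type check and the dontshrink/NMODIFIED handling; it only shows the "decide under the lock, act after the unlock" ordering.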
On Wednesday, March 20, 2013 9:22:22 am Konstantin Belousov wrote:
> I think the following is what I mean. As an additional note, why does the NFS
> client not trim the buffers when the server reports a node size change?

Will changing the size always result in an mtime change forcing the client to throw away the data on the next read or fault anyway (or does it only affect ctime)?

-- 
John Baldwin
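John's question rests on mtime-based cache validation: when the server reports a different mtime from the one recorded when the data was cached, the client throws the cached data away, so a truncation that changed only ctime would not trigger a flush by itself. A minimal sketch of that rule follows; struct cached_file and revalidate() are hypothetical and heavily simplified, not the FreeBSD NFS client's actual code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical, simplified client-side cache record for one file. */
struct cached_file {
	struct timespec mtime; /* server mtime when the data was cached */
	uint64_t size;         /* last size reported by the server */
	bool has_data;         /* whether cached file data exists */
};

static bool timespec_equal(struct timespec a, struct timespec b)
{
	return a.tv_sec == b.tv_sec && a.tv_nsec == b.tv_nsec;
}

/* Apply fresh server attributes. Returns true if cached data was thrown
 * away because the server mtime changed. A size change with an unchanged
 * mtime does not invalidate anything here, which is the case John asks about. */
static bool revalidate(struct cached_file *cf, struct timespec srv_mtime,
    uint64_t srv_size)
{
	bool stale = cf->has_data && !timespec_equal(cf->mtime, srv_mtime);

	if (stale)
		cf->has_data = false;
	cf->mtime = srv_mtime;
	cf->size = srv_size;
	return stale;
}
```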
On Wed, Mar 20, 2013 at 09:43:20AM -0400, John Baldwin wrote:
> Will changing the size always result in an mtime change forcing the client to
> throw away the data on the next read or fault anyway (or does it only affect
> ctime)?

UFS only modifies ctime on truncation, it seems.
On Wed, Mar 20, 2013 at 08:58:08PM +0200, Konstantin Belousov wrote:
> On Wed, Mar 20, 2013 at 09:43:20AM -0400, John Baldwin wrote:
> > Will changing the size always result in an mtime change forcing the client to
> > throw away the data on the next read or fault anyway (or does it only affect
> > ctime)?
>
> UFS only modifies ctime on truncation, it seems.

No, I was wrong. ffs_truncate() sets both the IN_CHANGE and IN_UPDATE flags on the inode, and IN_UPDATE causes an mtime update in ufs_itimes(), called from UFS_UPDATE().
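The corrected reading matches what POSIX requires of ftruncate(): when the file size changes, the last data modification (mtime) and last status change (ctime) timestamps are marked for update. A small userland check of that behaviour on a local filesystem, as a sketch: truncate_bumps_mtime() is a hypothetical helper, the /tmp scratch path is an assumption, and it exercises only the local filesystem, not the NFS client's cache.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/*
 * Create a scratch file of old_size bytes, backdate its atime/mtime to a
 * fixed past value, truncate it to new_size, and report whether mtime moved.
 * Returns 1 if mtime advanced, 0 if it did not, -1 on error.
 */
static int truncate_bumps_mtime(off_t old_size, off_t new_size)
{
	char path[] = "/tmp/trunc-mtime-XXXXXX"; /* assumed writable scratch dir */
	const struct timespec past[2] = {
		{ .tv_sec = 1000000000, .tv_nsec = 0 }, /* atime */
		{ .tv_sec = 1000000000, .tv_nsec = 0 }, /* mtime */
	};
	struct stat st;
	int result = -1;
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path); /* the open descriptor keeps the file alive */

	if (ftruncate(fd, old_size) == 0 &&   /* set the starting size */
	    futimens(fd, past) == 0 &&        /* backdate the timestamps */
	    ftruncate(fd, new_size) == 0 &&   /* the size change under test */
	    fstat(fd, &st) == 0)
		result = st.st_mtime > past[1].tv_sec;

	close(fd);
	return result;
}
```

POSIX only requires the update when the size actually changes, so the helper is meant to be called with old_size != new_size, for example truncate_bumps_mtime(8192, 100).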