Jeff has been too busy to send this patch himself, but it fixed the
quota deadlock that I was able to provoke on my machine.  Does it also
solve the issue for others?

Kris

----- Forwarded message from Jeff Roberson <jroberson@chesapeake.net> -----

Date: Wed, 15 Feb 2006 17:46:31 -0800 (PST)
From: Jeff Roberson <jroberson@chesapeake.net>
To: Kris Kennaway <kris@obsecurity.org>
cc: ssouhlal@FreeBSD.org, kan@FreeBSD.org
Subject: Re: Quota deadlock

Please try this patch.  LK_NOWAIT is causing qsync() to spin forever.

Index: ufs_quota.c
===================================================================
RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_quota.c,v
retrieving revision 1.79
diff -u -r1.79 ufs_quota.c
--- ufs_quota.c	12 Feb 2006 13:20:06 -0000	1.79
+++ ufs_quota.c	16 Feb 2006 01:45:56 -0000
@@ -750,7 +750,7 @@
 			MNT_ILOCK(mp);
 			continue;
 		}
-		error = vget(vp, LK_EXCLUSIVE | LK_NOWAIT | LK_INTERLOCK, td);
+		error = vget(vp, LK_EXCLUSIVE | LK_INTERLOCK, td);
 		if (error) {
 			MNT_ILOCK(mp);
 			if (error == ENOENT) {

On Wed, 15 Feb 2006, Kris Kennaway wrote:

>Quotas are enabled on this machine:
>
>db> show lockedvnods
>Locked vnodes
>
>0xc6be4a80: tag ufs, type VREG
>    usecount 0, writecount 0, refcount 3 mountedhere 0
>    flags ()
>    v_object 0xc6bf4d98 ref 0 pages 2
>    lock type ufs: EXCL (count 1) by thread 0xcb814000 (pid 2031)
>#0 0xc0504d12 at lockmgr+0x55a
>#1 0xc056f19b at vop_stdlock+0x32
>#2 0xc06cffde at VOP_LOCK_APV+0xa6
>#3 0xc063d775 at ffs_lock+0x19
>#4 0xc06cffde at VOP_LOCK_APV+0xa6
>#5 0xc0587b58 at vn_lock+0xd3
>#6 0xc0586d5e at vn_close+0x7c
>#7 0xc0587cd1 at vn_closefile+0xf0
>#8 0xc04efc78 at fdrop_locked+0xb9
>#9 0xc04efbb9 at fdrop+0x3c
>#10 0xc04ee0ac at closef+0x428
>#11 0xc04eaf7c at close+0x245
>#12 0xc06b7359 at syscall+0x2e9
>#13 0xc06a140f at Xint0x80_syscall+0x1f
>
>    ino 23557, on dev da0s1a
>
>0xc7610540: tag ufs, type VDIR
>    usecount 1, writecount 0, refcount 4 mountedhere 0
>    flags ()
>    v_object 0xc9fb6078 ref 0 pages 1
>    lock type ufs: EXCL (count 1) by thread 0xcb8b2d00 (pid 2001)
>#0 0xc0504d12 at lockmgr+0x55a
>#1 0xc056f19b at vop_stdlock+0x32
>#2 0xc06cffde at VOP_LOCK_APV+0xa6
>#3 0xc063d775 at ffs_lock+0x19
>#4 0xc06cffde at VOP_LOCK_APV+0xa6
>#5 0xc0587b58 at vn_lock+0xd3
>#6 0xc057981f at vget+0xf0
>#7 0xc056c22f at cache_lookup+0x3d0
>#8 0xc056c8f6 at vfs_cache_lookup+0xa4
>#9 0xc06cd997 at VOP_LOOKUP_APV+0xa6
>#10 0xc057163b at lookup+0x47a
>#11 0xc0570efa at namei+0x431
>#12 0xc057cf38 at kern_statfs+0x6d
>#13 0xc057cea5 at statfs+0x35
>#14 0xc06b7359 at syscall+0x2e9
>#15 0xc06a140f at Xint0x80_syscall+0x1f
>
>    ino 1492256, on dev da0s1e
>VNASSERT failed
>0xcac88200: tag (null), type VMARKER
>    usecount 0, writecount 0, refcount 0 mountedhere 0
>    flags ()
>
>The only process running is umount:
>
>db> wh 2030
>Tracing pid 2030 tid 100176 td 0xcbaa41a0
>cpustop_handler(e8ef9a10,c06b648f,e8ef9998,e8ef9998,cbaa41a0) at cpustop_handler+0x2c
>ipi_nmi_handler(e8ef9998,e8ef9998,cbaa41a0,e8ef99a4,46) at ipi_nmi_handler+0x29
>trap(e8ef0008,e8ef0028,c06a0028,50,0) at trap+0x3f
>calltrap() at calltrap+0x5
>--- trap 0x13, eip = 0xc0504794, esp = 0xe8ef9a58, ebp = 0xe8ef9a78 ---
>acquire(e8ef9afc,50,1050000,b2,cbaa41a0) at acquire+0x124
>lockmgr(cd6bd838,2012,cd6bd8a8,cbaa41a0,e8ef9b28) at lockmgr+0x4df
>vop_stdlock(e8ef9b7c,c063c186,c057cc44,c0742080,e8ef9b7c) at vop_stdlock+0x32
>VOP_LOCK_APV(c07425c0,e8ef9b7c,e8ef9b54,c06cffde,e8ef9b7c) at VOP_LOCK_APV+0xa6
>ffs_lock(e8ef9b7c,ce51a758,87f,2012,cd6bd7e0) at ffs_lock+0x19
>VOP_LOCK_APV(c0742080,e8ef9b7c,138,ce51a690,ce51a690) at VOP_LOCK_APV+0xa6
>vn_lock(cd6bd7e0,2012,cbaa41a0,79d,2012) at vn_lock+0xd3
>vget(cd6bd7e0,2012,cbaa41a0,2ea,c6823488) at vget+0xf0
>qsync(c6823400,0,c070ad43,490,c6823488) at qsync+0x13e
>ffs_sync(c6823400,2,cbaa41a0,cbaa41a0,c6823400) at ffs_sync+0x2e0
>sync(cbaa41a0,e8ef9d04,cbaa41a0,cbaa41a0,c7a57108) at sync+0x100
>syscall(3b,3b,3b,2,bfbfe958) at syscall+0x2e9
>Xint0x80_syscall() at Xint0x80_syscall+0x1f
>--- syscall (36, FreeBSD ELF32, sync), eip = 0x280c260f, esp = 0xbfbfe8ac, ebp = 0xbfbfe918 ---
>db>

----- End forwarded message -----
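For readers unfamiliar with how a non-blocking lock request can turn
into a spin, here is a minimal user-space sketch in C.  It is an analogy
only, not the kernel code: the message above does not describe the exact
path inside qsync(), and toy_lock, toy_trylock(), toy_unlock() and
retry_nowait() are hypothetical names invented for this illustration.
The retry budget exists purely so the demonstration terminates.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a lock; not a FreeBSD kernel type. */
struct toy_lock {
	atomic_flag held;
};

/* Non-blocking acquire: true on success, false if the lock is held. */
static bool
toy_trylock(struct toy_lock *lk)
{
	/* test_and_set returns the previous value; "set" means busy. */
	return (!atomic_flag_test_and_set(&lk->held));
}

static void
toy_unlock(struct toy_lock *lk)
{
	atomic_flag_clear(&lk->held);
}

/*
 * Retry a non-blocking acquire until it succeeds or the budget is
 * exhausted, and return the number of failed attempts.  The caller
 * never sleeps, so if the holder does not release the lock, every
 * attempt fails.  Without the budget this loop would never exit.
 */
static int
retry_nowait(struct toy_lock *lk, int budget)
{
	int failures = 0;

	assert(budget >= 0);
	while (failures < budget) {
		if (toy_trylock(lk)) {
			toy_unlock(lk);	/* got it; release and stop */
			break;
		}
		failures++;		/* busy: try again immediately */
	}
	return (failures);
}
```

With the lock free, retry_nowait() returns 0.  With the lock held and
never released, every attempt fails and the loop consumes its whole
budget.  A blocking request, such as the patched vget() call without
LK_NOWAIT, instead puts the caller to sleep until the holder releases
the lock.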
On Tue, Feb 21, 2006 at 02:56:01PM -0500, Kris Kennaway wrote:
> Jeff has been too busy to send this patch himself, but it fixed the
> quota deadlock that I was able to provoke on my machine.  Does it also
> solve the issue for others?

I've seen 0 feedback about this so far.  Since a number of people have
reported that the quota deadlock is the only thing keeping them from
upgrading from 5.x to 6.x, I expected a more enthusiastic testing
response than this :-)

Please, those of you who have reported this problem, test the patch
and see if it works for you.  If there are further problems, we can't
fix them in time for 6.1 unless we hear about it in the next week or
so.

Kris

> ----- Forwarded message from Jeff Roberson <jroberson@chesapeake.net> -----
>
> Please try this patch.  LK_NOWAIT is causing qsync() to spin forever.
On 28. feb. 2006, at 06.23, Kris Kennaway wrote:
> On Tue, Feb 21, 2006 at 02:56:01PM -0500, Kris Kennaway wrote:
>> Jeff has been too busy to send this patch himself, but it fixed the
>> quota deadlock that I was able to provoke on my machine.  Does it
>> also solve the issue for others?
>
> I've seen 0 feedback about this so far.  Since a number of people have
> reported that the quota deadlock is the only thing keeping them from
> upgrading from 5.x to 6.x, I expected a more enthusiastic testing
> response than this :-)
>
> Please, those of you who have reported this problem, test the patch
> and see if it works for you.  If there are further problems, we can't
> fix them in time for 6.1 unless we hear about it in the next week or
> so.

I have not been tough enough to even try 6.x on our busy NFS servers
yet, so I did not know this problem existed.  This is what you get for
not taking the time to test future releases I guess :-/

I will put RELENG_6 with this patch on one of them next week and report
back ASAP!

Does anyone have a test-case that will provoke the problem at will, so
I can make some tests in a controlled environment before I put it out
in the wild?

Frode Nordahl
frode@nordahl.net
Hello,

We upgraded our FreeBSD 6 amd64 machine to 6-STABLE on Monday, 2006-03-06.
From there, we applied the quota-deadlock patch (which also seems to be in
the 1.80 version of ufs_quota.c).

Since doing so, we have had no deadlocks on the machine.  Before the patch,
we were experiencing the deadlocks about every 8 hours in the middle of the
day, which is the peak time for its operation.  The machine is a webserver
hosting nearly 1000 small sites, and is now able to do so quite well since
it has stopped crashing ;)

I'd say the patch works!  Thanks!

Matt
Systems Administrator
Successful Hosting
matt@successfulhosting.com
http://www.SuccessfulHosting.com
Toll-Free: +1.866.494.5096
The Success behind your web site!