On Fri, Mar 16, 2007 at 09:21:13AM +0100, Ulrich Spoerlein wrote:
> Hi,
>
> One of our fileservers deadlocked, again. It is running RELENG_6 from
> 2006-11-14 and was running dump(8) -L on an 11% filled 400GB UFS2
> volume. It has been hanging for 3 hours now, and there is no disk activity.
>
> # ps axl | grep snap
> 0 46 0 1 -4 0 0 8 snaplk DL ?? 98:58.88 [bufdaemon]
> 0 48 0 0 -4 0 0 8 snaplk DL ?? 68:22.58 [syncer]
> 0 15179 11192 5 8 0 1708 1044 wait I+ p1 0:00.00 sh -c /sbin/mksnap_ffs /export/
> 0 18738 15179 0 -8 0 2776 1756 getbuf D+ p1 0:04.07 /sbin/mksnap_ffs /export/homes
>
> Quotas are enabled in the server, but the filesystems are currently
> mounted without quota support (they were once mounted with userquota,
> though).
>
> Thanks,
> Uli
And what is the question? You know what is needed to debug the hang.
In addition to DDB, "options DEBUG_LOCKS" and "options DEBUG_VFS_LOCKS"
would be very helpful.
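Spelled out as kernel configuration lines, this could look roughly like
the following (a sketch only; it assumes a custom kernel config, and the
comments are a loose summary rather than the official option descriptions):

```
# Kernel debugger, needed to inspect the hung system
options 	KDB
options 	DDB
# Extra lock debugging information for the deadlock report
options 	DEBUG_LOCKS
# Extra vnode locking checks; has some impact on performance
options 	DEBUG_VFS_LOCKS
```

The kernel has to be rebuilt and installed for these options to take effect.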
From the wait channel for proc 18738 (getbuf), I suspect that the problem
might be the LOR (lock order reversal) between the cg buffer lock and
snaplk. The fix was committed to CURRENT some time ago, and I am waiting
for the re@ decision on whether the change can be MFCed.
In the meantime, if you can reproduce the problem systematically, I would
recommend that you, in addition to providing a proper deadlock report, try
the following patch (it was heavily reviewed and tested before being
committed to CURRENT):
http://people.freebsd.org/~kib/misc/bdwrite.8.patch
(just ignore the xfs chunk).
>
> PS: I can't break to DDB, as it is not configured for this server.
> What are the recommended DDB settings for _production_ servers? I want
> them to reboot on panic, but be able to grab the panic string via
> serial console. Is something like this gonna do the trick? Is there
> some kind of performance impact?
>
> options KDB
> options DDB
> options KDB_UNATTENDED
> options ALT_BREAK_TO_DEBUGGER
>
> It should *NOT* enter the debugger, if I plug/pull an RS232 cable. I
> read somewhere, that some controllers do send a break if the cable
> gets pulled, IIRC.
That seems to be a reasonable set of options (see above regarding
DEBUG_VFS_LOCKS, which would have some impact on performance).