Over the past couple of months, I've more or less regularly observed machines accumulating more and more processes stuck in the zfs wchan. The processes never recover, and trying to reboot only gets the entire system stuck, without any console messages. I can enter the debugger, and I have saved a couple of dumps.

The situation seems to be triggered by zfs receive'ing snapshots from the sister machine (both machines synchronize their active ZFS filesystems to each other, using zfs send and zfs receive). It appears to be the receiving side that causes trouble.

Both machines run 8-stable from mid-February, with a single-disk ZFS pool, ARC limited to 512M, and prefetch and ZIL disabled via loader.conf.

What should I be looking at to diagnose this further?

Thanks,
Stefan

--
Stefan Bethke <stb@lassitu.de> Fon +49 151 14070811
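One way to track the "more and more processes stuck" symptom over time is to count processes per wait channel. The following is only a minimal sketch, not something posted in this thread: the function names are made up for illustration, and it assumes the output of `ps -axo pid,wchan,comm` (one header row, then one process per line).

```python
import subprocess
from collections import Counter


def count_by_wchan(ps_output):
    """Count processes per wait channel from `ps -axo pid,wchan,comm` output."""
    counts = Counter()
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 2)  # pid, wchan, rest of the line
        if len(fields) < 3:
            continue  # skip empty or malformed lines
        counts[fields[1]] += 1
    return counts


def run_ps():
    """Return the raw text printed by ps (hypothetical helper)."""
    result = subprocess.run(
        ["ps", "-axo", "pid,wchan,comm"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling `count_by_wchan(run_ps()).most_common()` periodically and logging the result would show whether the count for the zfs wait channel keeps growing.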
On 2010-Mar-09 10:15:53 +0100, Stefan Bethke <stb@lassitu.de> wrote:
> Over the past couple of months, I've more or less regularly observed machines having more and more processes stuck in the zfs wchan. The processes never recover from that,

How long have you waited? There seems to be a problem with low-free-memory handling that causes ZFS to turn into cold molasses. The workaround is to run a program that allocates a decent-sized chunk of memory and then exits. The original suggestion was something like:

  perl -e '@x = (0) x 1000000;'

I've written a short program that allocates and dirties ~100MB and then exits, and I run it from cron.

--
Peter Jeremy
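The program mentioned above was not posted to the list. The following is only a sketch of the same idea, assuming that writing one byte per page is enough to dirty the memory; the function name and the use of Python are choices made here, not the author's.

```python
#!/usr/bin/env python3
"""Allocate and dirty a chunk of memory, then exit."""

import mmap

CHUNK_BYTES = 100 * 1024 * 1024  # ~100MB, the size mentioned in the post


def allocate_and_dirty(nbytes=CHUNK_BYTES, page_size=mmap.PAGESIZE):
    """Allocate nbytes, write one byte per page, return the pages touched."""
    buf = bytearray(nbytes)  # zero-filled; pages may not be resident yet
    touched = 0
    for offset in range(0, nbytes, page_size):
        buf[offset] = 1  # a write makes the page dirty
        touched += 1
    return touched  # buf is freed when the function returns


if __name__ == "__main__":
    allocate_and_dirty()
```

As in the post, a script like this could be run periodically from cron; the interval is not stated in the thread.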
On Tue, 9 Mar 2010 10:15:53 +0100, Stefan Bethke <stb@lassitu.de> wrote:
> Over the past couple of months, I've more or less regularly observed
> machines having more and more processes stuck in the zfs wchan. The
> processes never recover from that, and trying to reboot only gets the
> entire system stuck, without any console messages. I can enter the
> debugger, and I have saved a couple of dumps.
>
> The situation seems to be triggered by zfs receive'ing snapshots from
> the sister machine (both synchronize their active ZFS filesystems to
> each other, using zfs send and zfs receive). It appears it's the
> receiving causing trouble.
>
> Both machines run 8-stable from mid-February, with a single-disk ZFS
> pool, with ARC limited to 512M, prefetch and ZIL disabled via
> loader.conf.
>
> What should I be looking at to further diagnose?
>
> Thanks,
> Stefan

Hi,

I'm seeing almost the same problem with an 8-STABLE build from the same period. When I work a lot on files inside ~/, the directory gets locked: any command trying to access it (from "ls" to applications reading their configuration files) gets stuck.

The system is an amd64 desktop computer with 4GiB of memory, and vfs.zfs.prefetch_disable is set to 0.
On Tue, Mar 09, 2010 at 10:15:53AM +0100, Stefan Bethke wrote:
> Over the past couple of months, I've more or less regularly observed machines having more and more processes stuck in the zfs wchan. The processes never recover from that, and trying to reboot only gets the entire system stuck, without any console messages. I can enter the debugger, and I have saved a couple of dumps.
>
> The situation seems to be triggered by zfs receive'ing snapshots from the sister machine (both synchronize their active ZFS filesystems to each other, using zfs send and zfs receive). It appears it's the receiving causing trouble.
>
> Both machines run 8-stable from mid-February, with a single-disk ZFS pool, with ARC limited to 512M, prefetch and ZIL disabled via loader.conf.
>
> What should I be looking at to further diagnose?

What kind of hardware do you have there? There is a 3-way deadlock, which I have a fix for, that would be hard to trigger on single- or dual-core machines.

Feel free to try the fix:

  http://people.freebsd.org/~pjd/patches/zfs_3way_deadlock.patch

--
Pawel Jakub Dawidek                       http://www.wheelsystems.com
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
According to Stefan Bethke:
> The situation seems to be triggered by zfs receive'ing snapshots from the sister machine (both synchronize their active ZFS filesystems to each other, using zfs send and zfs receive). It appears it's the receiving causing trouble.

Have you tuned kern.maxvnodes in /etc/sysctl.conf? When I moved to this new machine, I forgot to raise it well above the default (I now use 200000), and the machine locked up pretty quickly. I have not had a single lockup since.

--
Ollivier ROBERT -=- FreeBSD: The Power to Serve! -=- roberto@keltia.freenix.fr
In memoriam to Ondine : http://ondine.keltia.net/
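For reference, the setting described above would look roughly like this in /etc/sysctl.conf. The value 200000 is the one quoted in the post, not a general recommendation; a suitable number depends on the machine and workload.

```
# /etc/sysctl.conf
# Value taken from the post above; adjust for the workload.
kern.maxvnodes=200000
```

The same value can be applied to a running system with `sysctl kern.maxvnodes=200000`, and the current value can be read with `sysctl kern.maxvnodes`.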
Quoting Ivan Voras <ivoras@freebsd.org> (from Thu, 11 Mar 2010 11:59:01 +0100):
> On 03/11/10 09:54, Borja Marcos wrote:
>
> I don't know about the rest but this:
>
>> CPU: Intel(R) Xeon(R) CPU L5420 @ 2.50GHz (2496.25-MHz K8-class CPU)
>
> does not agree with this:
>
>> FreeBSD/SMP: 1 package(s) x 8 core(s)
>
> The Xeon 54xx series does not come in 8 core packages. Either it is
> 2xquad-core or a Xeon 55xx.

Can also be a problem in the layout detection logic...

Bye,
Alexander.

--
If we can ever make red tape nutritional, we can feed the world.
    -- R. Schaeberle, "Management Accounting"

http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org       netchild @ FreeBSD.org  : PGP ID = 72077137