thr3ads.net - freebsd stable - Many processes stuck in zfs [Mar 2010]

Stefan Bethke

2010-Mar-09 09:16 UTC

Many processes stuck in zfs

Over the past couple of months, I've more or less regularly observed
machines having more and more processes stuck in the zfs wchan.  The processes
never recover from that, and trying to reboot only gets the entire system stuck,
without any console messages.  I can enter the debugger, and I have saved a
couple of dumps.

The situation seems to be triggered by zfs receive'ing snapshots from the
sister machine (both synchronize their active ZFS filesystems to each other,
using zfs send and zfs receive).  It appears it's the receiving causing
trouble.

Both machines run 8-stable from mid-February, with a single-disk ZFS pool, with
ARC limited to 512M, prefetch and ZIL disabled via loader.conf.

What should I be looking at to further diagnose?


Thanks,
Stefan

-- 
Stefan Bethke <stb@lassitu.de>   Fon +49 151 14070811

Peter Jeremy

2010-Mar-09 10:54 UTC

On 2010-Mar-09 10:15:53 +0100, Stefan Bethke <stb@lassitu.de> wrote:
> Over the past couple of months, I've more or less regularly observed
> machines having more and more processes stuck in the zfs wchan.  The
> processes never recover from that,

How long have you waited?

There seems to be a problem with low free memory handling that causes ZFS
to turn into cold molasses.  The work-around is to run a program that
allocates a decent size chunk of memory and then exits.  The original
suggestion was something like:
	perl -e '@x = (0) x 1000000;'
I've written a short program that allocates and dirties ~100MB and then
exits, and I run it from cron.
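
A minimal sketch of such a helper, written in Python. This is not the program mentioned above, which was not posted; the function name dirty_memory and the script layout are illustrative assumptions. It maps anonymous memory and writes one byte to every page so that the kernel has to back each page with real memory, then releases it all on exit.

```python
#!/usr/bin/env python3
"""Allocate a chunk of memory, dirty every page, then exit."""
import mmap
import sys

# ~100MB, the size mentioned in the message above.
DEFAULT_BYTES = 100 * 1024 * 1024


def dirty_memory(nbytes=DEFAULT_BYTES):
    """Map nbytes of anonymous memory and write one byte per page.

    Returns the number of pages touched. (Hypothetical helper name.)
    """
    touched = 0
    # fileno -1 requests an anonymous mapping not backed by a file.
    with mmap.mmap(-1, nbytes) as buf:
        for offset in range(0, nbytes, mmap.PAGESIZE):
            buf[offset] = 1  # writing forces the page to be allocated
            touched += 1
    # The mapping is released when the with-block ends.
    return touched


if __name__ == "__main__":
    # Optional first argument: number of bytes to allocate.
    size = int(sys.argv[1]) if len(sys.argv) > 1 else DEFAULT_BYTES
    dirty_memory(size)
```

To run it periodically, the script could be called from a crontab entry; the thread does not say which interval was used, so that has to be chosen by experiment.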

-- 
Peter Jeremy

Frédéric Bour

2010-Mar-09 10:58 UTC

On Tue, 9 Mar 2010 10:15:53 +0100,
Stefan Bethke <stb@lassitu.de> wrote:
> Over the past couple of months, I've more or less regularly observed
> machines having more and more processes stuck in the zfs wchan.  The
> processes never recover from that, and trying to reboot only gets the
> entire system stuck, without any console messages.  I can enter the
> debugger, and I have saved a couple of dumps.
> 
> The situation seems to be triggered by zfs receive'ing snapshots from
> the sister machine (both synchronize their active ZFS filesystems to
> each other, using zfs send and zfs receive).  It appears it's the
> receiving causing trouble.
> 
> Both machines run 8-stable from mid-February, with a single-disk ZFS
> pool, with ARC limited to 512M, prefetch and ZIL disabled via
> loader.conf.
> 
> What should I be looking at to further diagnose?
> 
> 
> Thanks,
> Stefan
> 
Hi,

I am seeing almost the same problem with an 8-STABLE build
from the same time. When working a lot on files inside ~/,
the directory gets locked, and any command trying to access it
(from "ls" to applications reading their configuration files)
gets stuck.

The system is an amd64 desktop computer with 4GiB of memory, and
vfs.zfs.prefetch_disable is set to 0.

Pawel Jakub Dawidek

2010-Mar-09 12:30 UTC

On Tue, Mar 09, 2010 at 10:15:53AM +0100, Stefan Bethke wrote:
> Over the past couple of months, I've more or less regularly observed
> machines having more and more processes stuck in the zfs wchan.  The
> processes never recover from that, and trying to reboot only gets the
> entire system stuck, without any console messages.  I can enter the
> debugger, and I have saved a couple of dumps.
> 
> The situation seems to be triggered by zfs receive'ing snapshots from
> the sister machine (both synchronize their active ZFS filesystems to
> each other, using zfs send and zfs receive).  It appears it's the
> receiving causing trouble.
> 
> Both machines run 8-stable from mid-February, with a single-disk ZFS
> pool, with ARC limited to 512M, prefetch and ZIL disabled via
> loader.conf.
> 
> What should I be looking at to further diagnose?

What kind of hardware do you have there? There is a 3-way deadlock that
I have a fix for; it would be hard to trigger on single- or dual-core
machines.

Feel free to try the fix:

	http://people.freebsd.org/~pjd/patches/zfs_3way_deadlock.patch

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!

Ollivier Robert

2010-Mar-10 11:35 UTC

According to Stefan Bethke:
> The situation seems to be triggered by zfs receive'ing snapshots from
> the sister machine (both synchronize their active ZFS filesystems to
> each other, using zfs send and zfs receive).  It appears it's the
> receiving causing trouble.

Have you tuned kern.maxvnodes in /etc/sysctl.conf?

When I moved to this new machine, I forgot to set it much higher than the
default (I now use 200000), and it was locking up pretty soon.  I have not
had a single lockup since.
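
One way to see whether a machine is running close to its vnode limit is to compare vfs.numvnodes against kern.maxvnodes. The following is only a sketch, assuming a FreeBSD system where sysctl(8) exposes both values; the helper names and the 90% warning threshold are illustrative assumptions, not something taken from this thread.

```python
#!/usr/bin/env python3
"""Report how close the system is to its vnode limit."""
import subprocess

# Hypothetical threshold; pick whatever suits the machine.
WARN_RATIO = 0.9


def read_sysctl(name):
    """Return the integer value of a sysctl, read via `sysctl -n`."""
    result = subprocess.run(
        ["sysctl", "-n", name],
        check=True,
        capture_output=True,
        text=True,
    )
    return int(result.stdout.strip())


def vnode_usage(numvnodes, maxvnodes):
    """Return the fraction of the vnode limit currently in use."""
    return numvnodes / maxvnodes


if __name__ == "__main__":
    in_use = read_sysctl("vfs.numvnodes")
    limit = read_sysctl("kern.maxvnodes")
    ratio = vnode_usage(in_use, limit)
    print(f"vnodes: {in_use} of {limit} ({ratio:.0%})")
    if ratio >= WARN_RATIO:
        print("close to kern.maxvnodes; consider raising it")
```

If the ratio stays near 1.0 under load, raising kern.maxvnodes in /etc/sysctl.conf as described above may be worth trying.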

-- 
Ollivier ROBERT -=- FreeBSD: The Power to Serve! -=- roberto@keltia.freenix.fr
In memoriam to Ondine : http://ondine.keltia.net/

Alexander Leidinger

2010-Mar-11 14:09 UTC

Quoting Ivan Voras <ivoras@freebsd.org> (from Thu, 11 Mar 2010  
11:59:01 +0100):
> On 03/11/10 09:54, Borja Marcos wrote:
>
> I don't know about the rest but this:
>
>> CPU: Intel(R) Xeon(R) CPU           L5420  @ 2.50GHz (2496.25-MHz  
>> K8-class CPU)
>
> does not agree with this:
>
>> FreeBSD/SMP: 1 package(s) x 8 core(s)
>
> The Xeon 54xx series does not come in 8 core packages. Either it is  
> 2xquad-core or a Xeon 55xx.

It could also be a problem in the layout detection logic...

Bye,
Alexander.

-- 
If we can ever make red tape nutritional, we can feed the world.
		-- R. Schaeberle, "Management Accounting"

http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org       netchild @ FreeBSD.org  : PGP ID = 72077137
