thr3ads.net - freebsd stable - NFS-exported ZFS instability [Jan 2013]

Hiroki Sato

2013-Jan-02 01:53 UTC

NFS-exported ZFS instability

Hello,

 I have been having trouble with my NFS server for a long time.  The
 symptom is that it stops working one or two weeks after boot.  I
 have not tracked down the cause yet, but the hang is reproducible and
 occurs only under very high I/O load.

 It did not panic, it just stopped working: it still responded to ping,
 but userland programs did not seem to be running.  I could break into
 DDB and get a kernel dump.  The following URLs hold the output of ps,
 trace, and so on:

  http://people.allbsd.org/~hrs/FreeBSD/pool.log.20130102
  http://people.allbsd.org/~hrs/FreeBSD/pool.dmesg.20130102

 Does anyone see how to debug this?  I guess it is due to a deadlock
 somewhere.  I have suffered from this problem for almost two years.
 The above log is from stable/9 as of Dec 19, but the problem has
 persisted since 8.X.

-- Hiroki

Perry Hutchison

2013-Jan-02 04:02 UTC

Hiroki Sato <hrs at freebsd.org> wrote:
>  I have been having trouble with my NFS server for a long time.
>  The symptom is that it stops working one or two weeks after
>  boot ...  It did not panic, it just stopped working: it still
>  responded to ping, but userland programs did not seem to be
>  running ...  Does anyone see how to debug this?  I guess it is
>  due to a deadlock somewhere ...

If you can afford the overhead, you could try running with some
of the kernel debug options enabled (e.g. WITNESS, INVARIANTS,
MUTEX_DEBUG).  See conf/NOTES for descriptions.
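
As a rough illustration of the suggestion above, a custom kernel configuration along the following lines would turn on the lock-order and consistency checks. This is only a sketch: the file name and ident NFSDEBUG are made up for the example, and the option set mirrors the debugging options commonly found in a FreeBSD GENERIC debug configuration rather than anything posted in this thread. MUTEX_DEBUG is left out here; check conf/NOTES on the branch in use before adding it.

```
# Hypothetical file: sys/<arch>/conf/NFSDEBUG
include		GENERIC
ident		NFSDEBUG		# made-up name for this example

options 	KDB			# kernel debugger framework
options 	DDB			# interactive debugger, needed to break in
options 	INVARIANTS		# extra run-time sanity checks
options 	INVARIANT_SUPPORT	# required by INVARIANTS
options 	WITNESS			# lock order verification
options 	WITNESS_SKIPSPIN	# skip spin locks to reduce overhead
```

The kernel would then be built with `make buildkernel KERNCONF=NFSDEBUG`. Expect a noticeable slowdown under load, which may also lengthen the time needed to reproduce the hang.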

Rick Macklem

2013-Jan-02 13:24 UTC

Hiroki Sato wrote:
> Hello,
> 
> I have been having trouble with my NFS server for a long time. The
> symptom is that it stops working one or two weeks after boot. I
> have not tracked down the cause yet, but the hang is reproducible and
> occurs only under very high I/O load.
> 
> It did not panic, it just stopped working: it still responded to ping,
> but userland programs did not seem to be running. I could break into
> DDB and get a kernel dump. The following URLs hold the output of ps,
> trace, and so on:
> 
> http://people.allbsd.org/~hrs/FreeBSD/pool.log.20130102
> http://people.allbsd.org/~hrs/FreeBSD/pool.dmesg.20130102
> 
> Does anyone see how to debug this? I guess it is due to a deadlock
> somewhere. I have suffered from this problem for almost two years.
> The above log is from stable/9 as of Dec 19, but the problem has
> persisted since 8.X.

Well, I took a quick glance at the log and there are a lot of processes
sleeping on "pfault" (in vm_waitpfault() in sys/vm/vm_page.c). I'm no
vm guy, so I'm not sure when/why that will happen. The comment on the
function suggests they are waiting for free pages.

Maybe something as simple as running out of swap space or a problem
talking to the disk(s) that has the swap partition(s) or ???
(I'm talking through my hat here, because I'm not conversant with
 the vm side of things.)

I might take a closer look this evening and see if I can spot anything
in the log, rick
ps: I hope Alan and Kostik don't mind being added to the cc list.
> -- Hiroki
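
The observation that many processes are sleeping on "pfault" can be checked mechanically by tallying the wait-message column of the ps output. The sketch below assumes each process line has whitespace-separated fields in the order pid, ppid, pgrp, uid, state, wmesg, wchan, cmd; that layout is an assumption and the indexes may need adjusting for the real log (per-thread lines, for example, are not handled). The sample lines are invented for the example and are not taken from the linked files.

```python
from collections import Counter


def count_wait_messages(lines):
    """Tally the wmesg column over lines that look like sleeping processes."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Assumed layout: pid ppid pgrp uid state wmesg wchan cmd...
        if len(fields) < 8:
            continue
        if not all(field.isdigit() for field in fields[:4]):
            continue  # header or non-process line
        if not fields[6].startswith("0x"):
            continue  # no wait channel address, so not a sleeping process
        counts[fields[5]] += 1
    return counts


# Made-up lines for illustration only; they are NOT copied from the linked log.
sample = [
    "  pid  ppid  pgrp   uid   state   wmesg     wchan    cmd",
    " 2001     1  2001     0  D       pfault   0xffffffff00000001 procA",
    " 2002     1  2002     0  D       pfault   0xffffffff00000001 procB",
    " 2003     1  2003     0  S       select   0xffffffff00000002 procC",
    " 2004     1  2004     0  D       pfault   0xffffffff00000001 procD",
    "   11     0     0     0  RL      [idle]",
]

# Print wait messages from most to least common.
for wmesg, n in count_wait_messages(sample).most_common():
    print(wmesg, n)
```

To run it against a saved copy of the real log, pass an open file object to `count_wait_messages()` instead of `sample`. A wait message that dominates the tally is a reasonable place to start reading the stack traces.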

Rick Macklem

2013-Jan-29 23:06 UTC

Andriy Gapon wrote:
> on 29/01/2013 23:44 Hiroki Sato said the following:
> >   http://people.allbsd.org/~hrs/FreeBSD/pool-20130130.txt
> >   http://people.allbsd.org/~hrs/FreeBSD/pool-20130130-info.txt
> 
> I recognize here a ZFS ARC deadlock that should have been prevented by
> r241773 and its MFCs (r242858 for 9, r242859 for 8).

Unfortunately, pool-20130130-info.txt shows a kernel built from r244417,
unless I somehow misread it.

rick
> See tid 100153 (arc reclaim thread), tid 100105 (pagedaemon) and tid
> 100639 (nfsd in kmem_back).
> 
> --
> Andriy Gapon
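
The revision argument in this exchange comes down to a simple comparison. Subversion revision numbers are repository-wide and only increase, so an unmodified checkout of a branch at revision N contains every commit made to that branch at or before N. The sketch below encodes the revisions quoted above; treating r241773 as the head commit and assuming the r244417 kernel was built from stable/9 (as in the first post) are both assumptions, and the function name is made up for the example.

```python
# Revisions quoted in this thread for the ARC deadlock fix.
# Mapping r241773 to "head" is an assumption based on the MFC wording.
FIX_REVISION = {"head": 241773, "stable/9": 242858, "stable/8": 242859}


def should_contain_fix(branch, kernel_revision):
    """True if a clean checkout of `branch` at `kernel_revision` includes the fix."""
    return kernel_revision >= FIX_REVISION[branch]


# Assumption: the r244417 kernel was built from stable/9, as in the first post.
print(should_contain_fix("stable/9", 244417))
```

If the comparison holds and the tree had no local changes, the deadlock seen in the new logs happened on a kernel that already had the fix, which is the point of the reply above.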
