TPCzfs at mklab.ph.rhul.ac.uk
2012-Jun-13 10:47 UTC
[zfs-discuss] (fwd) Re: ZFS NFS service hanging on Sunday morning problem
Dear All,
I have been advised to enquire here on zfs-discuss with the
ZFS problem described below, following discussion on Usenet NG
comp.unix.solaris. The full thread should be available here
https://groups.google.com/forum/#!topic/comp.unix.solaris/uEQzz1t-G1s
Many thanks
Tom Crane
-- forwarded message
cindy.swearingen at oracle.com wrote:
: On Tuesday, May 29, 2012 5:39:11 AM UTC-6, (unknown) wrote:
: > Dear All,
: > Can anyone give any tips on diagnosing the following recurring
problem?
: >
: > I have a Solaris box (server5, SunOS server5 5.10 Generic_147441-15
: > i86pc i386 i86pc ) whose ZFS FS NFS exported service fails every so
: > often, always in the early hours of Sunday morning. I am barely
: > familiar with Solaris but here what I have managed to discern when the
: > problem occurs;
: >
: > Jobs on other machines which access server5's shares (via automounter)
: > hang, and attempts to manually remote-mount shares just time out.
: >
: > Remotely, showmount -e server5 shows that all the exported FSs are available.
: >
: > On server5, the following services are running;
: >
: > root at server5:/var/adm# svcs | grep nfs
: > online May_25 svc:/network/nfs/status:default
: > online May_25 svc:/network/nfs/nlockmgr:default
: > online May_25 svc:/network/nfs/cbd:default
: > online May_25 svc:/network/nfs/mapid:default
: > online May_25 svc:/network/nfs/rquota:default
: > online May_25 svc:/network/nfs/client:default
: > online May_25 svc:/network/nfs/server:default
: >
: > On server5, I can list and read files on the affected FSs w/o problem,
: > but any attempt to write to the FS (e.g. cp a file to, or rm a file on,
: > the FS) just hangs the cp/rm process.
: >
: > On server5, 'zfs get sharenfs pptank/local_linux' displays the
: > expected list of hosts/IPs with remote ro & rw access.
: >
: > Here is the O/P from some other hopefully relevant commands;
: >
: > root at server5:/# zpool status
: > pool: pptank
: > state: ONLINE
: > status: The pool is formatted using an older on-disk format. The pool
: > can still be used, but some features are unavailable.
: > action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
: > pool will no longer be accessible on older software versions.
: > scan: none requested
: > config:
: >
: >         NAME        STATE     READ WRITE CKSUM
: >         pptank      ONLINE       0     0     0
: >           raidz1-0  ONLINE       0     0     0
: >             c3t0d0  ONLINE       0     0     0
: >             c3t1d0  ONLINE       0     0     0
: >             c3t2d0  ONLINE       0     0     0
: >             c3t3d0  ONLINE       0     0     0
: >             c3t4d0  ONLINE       0     0     0
: >             c3t5d0  ONLINE       0     0     0
: >             c3t6d0  ONLINE       0     0     0
: >
: > errors: No known data errors
: >
: > root at server5:/# zpool list
: > NAME     SIZE  ALLOC   FREE  CAP  HEALTH  ALTROOT
: > pptank  12.6T   384G  12.3T   2%  ONLINE  -
: >
: > root at server5:/# zpool history
: > History for 'pptank':
: > <just hangs here>
: >
: > root at server5:/# zpool iostat 5
: >                capacity     operations    bandwidth
: > pool        alloc   free   read  write   read  write
: > ----------  -----  -----  -----  -----  -----  -----
: > pptank       384G  12.3T     92    115  3.08M  1.22M
: > pptank       384G  12.3T  1.11K    629  35.5M  3.03M
: > pptank       384G  12.3T    886    889  27.1M  3.68M
: > pptank       384G  12.3T    837    677  24.9M  2.82M
: > pptank       384G  12.3T  1.19K    757  37.4M  3.69M
: > pptank       384G  12.3T  1.02K    759  29.6M  3.90M
: > pptank       384G  12.3T    952    707  32.5M  3.09M
: > pptank       384G  12.3T  1.02K    831  34.5M  3.72M
: > pptank       384G  12.3T    707    503  23.5M  1.98M
: > pptank       384G  12.3T    626    707  20.8M  3.58M
: > pptank       384G  12.3T    816    838  26.1M  4.26M
: > pptank       384G  12.3T    942    800  30.1M  3.48M
: > pptank       384G  12.3T    677    675  21.7M  2.91M
: > pptank       384G  12.3T    590    725  19.2M  3.06M
: >
: >
: > top shows the following runnable processes. Nothing excessive here
: > AFAICT?
: >
: > last pid: 25282; load avg: 1.98, 1.95, 1.86; up 1+09:02:05  07:46:29
: > 72 processes: 67 sleeping, 1 running, 1 stopped, 3 on cpu
: > CPU states: 81.5% idle, 0.1% user, 18.3% kernel, 0.0% iowait, 0.0% swap
: > Memory: 2048M phys mem, 32M free mem, 16G total swap, 16G free swap
: >
: >   PID USERNAME LWP PRI NICE  SIZE   RES STATE   TIME    CPU COMMAND
: >   748 root      18  60  -20  103M 9752K cpu/1  78:44  6.62% nfsd
: > 24854 root       1  54    0 1480K  792K cpu/1   0:42  0.69% cp
: > 25281 root       1  59    0 3584K 2152K cpu/0   0:00  0.02% top
: >
: > The cp job above is the one mentioned earlier, attempting to copy a
: > file to an affected FS; I've noticed it is apparently not completely
: > hung.
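One minimal way to tell a trickling copy from a fully wedged one is to sample the destination file's size at an interval. A sketch, assuming a POSIX shell; the example path is hypothetical:

```shell
#!/bin/sh
# Sketch: is a seemingly-hung copy still making progress? Sample the
# destination file's size twice, a few seconds apart, and compare.
# Usage: watch_growth <file> <interval-seconds>
watch_growth() {
    f=$1; interval=$2
    s1=$(ls -l "$f" | awk '{print $5}')
    sleep "$interval"
    s2=$(ls -l "$f" | awk '{print $5}')
    if [ "$s1" -eq "$s2" ]; then
        echo "no growth in ${interval}s"
    else
        echo "grew by $((s2 - s1)) bytes"
    fi
}

# Example (hypothetical target of the hung cp):
# watch_growth /pptank/local_linux/somefile 10
```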
: >
: > The only thing that appears specific to Sunday morning is a cronjob to
: > remove old .nfs* files,
: >
: > root at server5:/# crontab -l | grep nfsfind
: > 15 3 * * 0 /usr/lib/fs/nfs/nfsfind
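For what it's worth, the stock nfsfind script essentially walks each shared filesystem deleting week-old .nfs* temporary files that NFS clients leave behind; its core is roughly the find below (a sketch, not the actual script; the directory in the example is illustrative). On a multi-TB pool that traversal plus the resulting unlinks could account for a burst of Sunday-morning metadata I/O:

```shell
#!/bin/sh
# Sketch of nfsfind's core action: remove .nfs* files older than 7 days
# from one filesystem, without crossing mount points.
cleanup_nfs_tmpfiles() {
    dir=$1
    find "$dir" -name '.nfs*' -mtime +7 -mount -exec rm -f {} \;
}

# Example (illustrative path): cleanup_nfs_tmpfiles /pptank/local_linux
```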
: >
: > Any suggestions on how to proceed?
: >
: > Many thanks
: > Tom Crane
: >
: > Ps. The email address in the header is just a spam-trap.
: > --
: > Tom Crane, IT support, RHUL Particle Physics.,
: > Dept. Physics, Royal Holloway, University of London, Egham Hill,
: > Egham, Surrey, TW20 0EX, England.
: > Email: T.Crane at rhul dot ac dot uk
: Hi Tom,
Hi Cindy,
Thanks for the follow-up.
: I think SunOS server5 5.10 Generic_147441-15 is the Solaris 10 8/11
: release. Is this correct?
I think so,...
root at server5:/# cat /etc/release
Solaris 10 10/08 s10x_u6wos_07b X86
Copyright 2008 Sun Microsystems, Inc. All Rights Reserved.
Use is subject to license terms.
Assembled 27 October 2008
: We looked at your truss output briefly and it looks like it is hanging
: trying to allocate memory. At least, that's what the "br ...." statements
: at the end suggest.
: I will see if I can find out what diagnostic info would be helpful in
: this case.
Thanks. That would be much appreciated.
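Memory pressure would fit the top output above: 2048M physical with only 32M free, and on Solaris 10 the ZFS ARC by default grows to consume most of RAM. If ARC growth turns out to be the culprit, one common mitigation is to cap it in /etc/system and reboot; the 512 MB figure below is purely illustrative, and the current ARC size can be checked with kstat -p zfs:0:arcstats:size.

```
* /etc/system fragment (hypothetical cap): limit the ZFS ARC to 512 MB
set zfs:zfs_arc_max = 0x20000000
```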
: You might get a faster response on zfs-discuss as John suggested.
I will CC to zfs-discuss.
Best regards
Tom.
: Thanks,
: Cindy
Ps. The email address in the header is just a spam-trap.
--
Tom Crane, Dept. Physics, Royal Holloway, University of London, Egham Hill,
Egham, Surrey, TW20 0EX, England.
Email: T.Crane at rhul dot ac dot uk
-- end of forwarded message --