TPCzfs at mklab.ph.rhul.ac.uk
2012-Jun-13 10:47 UTC
[zfs-discuss] (fwd) Re: ZFS NFS service hanging on Sunday morning problem
Dear All,

I have been advised to enquire here on zfs-discuss about the ZFS problem
described below, following discussion on the Usenet newsgroup
comp.unix.solaris. The full thread should be available here:

https://groups.google.com/forum/#!topic/comp.unix.solaris/uEQzz1t-G1s

Many thanks,
Tom Crane

-- forwarded message --

cindy.swearingen at oracle.com wrote:
: On Tuesday, May 29, 2012 5:39:11 AM UTC-6, (unknown) wrote:
: > Dear All,
: > Can anyone give any tips on diagnosing the following recurring problem?
: >
: > I have a Solaris box (server5, SunOS server5 5.10 Generic_147441-15
: > i86pc i386 i86pc) whose NFS service for ZFS-exported filesystems fails
: > every so often, always in the early hours of Sunday morning. I am
: > barely familiar with Solaris, but here is what I have managed to
: > discern when the problem occurs:
: >
: > Jobs on other machines which access server5's shares (via automounter)
: > hang, and attempts to manually remote-mount the shares just time out.
: >
: > Remotely, 'showmount -e server5' shows that all the exported
: > filesystems are available.
: >
: > On server5, the following services are running:
: >
: > root at server5:/var/adm# svcs | grep nfs
: > online         May_25   svc:/network/nfs/status:default
: > online         May_25   svc:/network/nfs/nlockmgr:default
: > online         May_25   svc:/network/nfs/cbd:default
: > online         May_25   svc:/network/nfs/mapid:default
: > online         May_25   svc:/network/nfs/rquota:default
: > online         May_25   svc:/network/nfs/client:default
: > online         May_25   svc:/network/nfs/server:default
: >
: > On server5, I can list and read files on the affected filesystems
: > without problem, but any attempt to write to them (e.g. copying a
: > file to, or removing a file from, the filesystem) just hangs the
: > cp/rm process.
: >
: > On server5, 'zfs get sharenfs pptank/local_linux' displays the
: > expected list of hosts/IPs with remote ro & rw access.
: >
: > Here is the O/P from some other, hopefully relevant, commands:
: >
: > root at server5:/# zpool status
: >   pool: pptank
: >  state: ONLINE
: > status: The pool is formatted using an older on-disk format. The pool can
: >         still be used, but some features are unavailable.
: > action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
: >         pool will no longer be accessible on older software versions.
: >   scan: none requested
: > config:
: >
: >         NAME        STATE     READ WRITE CKSUM
: >         pptank      ONLINE       0     0     0
: >           raidz1-0  ONLINE       0     0     0
: >             c3t0d0  ONLINE       0     0     0
: >             c3t1d0  ONLINE       0     0     0
: >             c3t2d0  ONLINE       0     0     0
: >             c3t3d0  ONLINE       0     0     0
: >             c3t4d0  ONLINE       0     0     0
: >             c3t5d0  ONLINE       0     0     0
: >             c3t6d0  ONLINE       0     0     0
: >
: > errors: No known data errors
: >
: > root at server5:/# zpool list
: > NAME     SIZE  ALLOC   FREE   CAP  HEALTH  ALTROOT
: > pptank  12.6T   384G  12.3T    2%  ONLINE  -
: >
: > root at server5:/# zpool history
: > History for 'pptank':
: > <just hangs here>
: >
: > root at server5:/# zpool iostat 5
: >                capacity     operations    bandwidth
: > pool        alloc   free   read  write   read  write
: > ----------  -----  -----  -----  -----  -----  -----
: > pptank       384G  12.3T     92    115  3.08M  1.22M
: > pptank       384G  12.3T  1.11K    629  35.5M  3.03M
: > pptank       384G  12.3T    886    889  27.1M  3.68M
: > pptank       384G  12.3T    837    677  24.9M  2.82M
: > pptank       384G  12.3T  1.19K    757  37.4M  3.69M
: > pptank       384G  12.3T  1.02K    759  29.6M  3.90M
: > pptank       384G  12.3T    952    707  32.5M  3.09M
: > pptank       384G  12.3T  1.02K    831  34.5M  3.72M
: > pptank       384G  12.3T    707    503  23.5M  1.98M
: > pptank       384G  12.3T    626    707  20.8M  3.58M
: > pptank       384G  12.3T    816    838  26.1M  4.26M
: > pptank       384G  12.3T    942    800  30.1M  3.48M
: > pptank       384G  12.3T    677    675  21.7M  2.91M
: > pptank       384G  12.3T    590    725  19.2M  3.06M
: >
: > top shows the following runnable processes. Nothing excessive here,
: > AFAICT?
: >
: > last pid: 25282;  load avg: 1.98, 1.95, 1.86;  up 1+09:02:05  07:46:29
: > 72 processes: 67 sleeping, 1 running, 1 stopped, 3 on cpu
: > CPU states: 81.5% idle, 0.1% user, 18.3% kernel, 0.0% iowait, 0.0% swap
: > Memory: 2048M phys mem, 32M free mem, 16G total swap, 16G free swap
: >
: >   PID USERNAME LWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
: >   748 root      18  60  -20  103M 9752K cpu/1   78:44  6.62% nfsd
: > 24854 root       1  54    0 1480K  792K cpu/1    0:42  0.69% cp
: > 25281 root       1  59    0 3584K 2152K cpu/0    0:00  0.02% top
: >
: > The cp job above is the one mentioned earlier, attempting to copy a
: > file to an affected filesystem; I've noticed it is apparently not
: > completely hung.
: >
: > The only thing that appears specific to Sunday morning is a cron job
: > to remove old .nfs* files:
: >
: > root at server5:/# crontab -l | grep nfsfind
: > 15 3 * * 0 /usr/lib/fs/nfs/nfsfind
: >
: > Any suggestions on how to proceed?
: >
: > Many thanks
: > Tom Crane
: >
: > Ps. The email address in the header is just a spam-trap.
: > --
: > Tom Crane, IT support, RHUL Particle Physics.,
: > Dept. Physics, Royal Holloway, University of London, Egham Hill,
: > Egham, Surrey, TW20 0EX, England.
: > Email: T.Crane at rhul dot ac dot uk

: Hi Tom,

Hi Cindy,
Thanks for the followup.

: I think SunOS server5 5.10 Generic_147441-15 is the Solaris 10 8/11
: release. Is this correct?

I think so,...

root at server5:/# cat /etc/release
                      Solaris 10 10/08 s10x_u6wos_07b X86
          Copyright 2008 Sun Microsystems, Inc.  All Rights Reserved.
                       Use is subject to license terms.
                           Assembled 27 October 2008

: We looked at your truss output briefly and it looks like it is hanging
: trying to allocate memory. At least, that's what the "br ...." statements
: at the end suggest.
: I will see if I can find out what diagnostic info would help in
: this case.

Thanks. That would be much appreciated.

: You might get a faster response on zfs-discuss as John suggested.

I will CC to zfs-discuss.

Best regards,
Tom.

: Thanks,
: Cindy

Ps. The email address in the header is just a spam-trap.

--
Tom Crane, Dept. Physics, Royal Holloway, University of London, Egham Hill,
Egham, Surrey, TW20 0EX, England.
Email: T.Crane at rhul dot ac dot uk

-- end of forwarded message --
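[Editor's note] Since the truss output was reportedly hanging in memory
allocation ("br ...", i.e. brk calls) on a box showing only 32M free of
2048M physical memory, it may help to capture kernel memory, ARC, and
process-stack state on server5 while a hang is actually in progress. The
sketch below is hypothetical (the script name, log path, and structure are
illustrative, not from the thread); it assumes the standard Solaris 10
tools mdb, kstat, and pstack, and skips any section whose tool is missing,
so it is harmless to run elsewhere:

```shell
#!/bin/sh
# Hypothetical diagnostic collector for the Sunday-morning NFS/ZFS hang.
# Assumes Solaris 10 tools (mdb, kstat, pstack, pgrep); every section is
# skipped when its tool is not installed.

collect_diag() {
  OUT=${1:-./zfs-hang.log}       # illustrative default log path
  {
    echo "=== kernel memory summary ==="
    if command -v mdb >/dev/null 2>&1; then
      echo "::memstat" | mdb -k           # where is physical memory going?
    fi

    echo "=== ZFS ARC statistics ==="
    if command -v kstat >/dev/null 2>&1; then
      kstat -n arcstats                   # ARC size vs. the 2 GB of RAM
    fi

    echo "=== stacks of hung cp/rm processes ==="
    if command -v pstack >/dev/null 2>&1; then
      for pid in `pgrep -x cp` `pgrep -x rm`; do
        pstack "$pid"                     # where is the writer stuck?
      done
    fi
  } > "$OUT" 2>&1
  echo "diagnostics written to $OUT"
}

collect_diag ./zfs-hang-demo.log
```

Comparing the ::memstat and arcstats snapshots taken during a hang against
the nfsfind cron window (15 3 * * 0, i.e. 03:15 every Sunday) would show
whether ARC or other kernel memory pressure coincides with the point where
writes start to hang.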