TPCzfs at mklab.ph.rhul.ac.uk
2012-Jun-14 14:13 UTC
[zfs-discuss] (fwd) Re: ZFS NFS service hanging on Sunday morning
> Offlist/OT - Sheer guess, straight out of my parts - maybe a cronjob to
> rebuild the locate db or something similar is hammering it once a week?

In the problem condition, there appears to be very little going on on the system, e.g.,

root at server5:/tmp# /usr/local/bin/top
last pid:  3828;  load avg:  4.29,  3.95,  3.84;  up 6+23:11:44    07:12:47
79 processes: 78 sleeping, 1 on cpu
CPU states: 73.0% idle,  0.0% user, 27.0% kernel,  0.0% iowait,  0.0% swap
Memory: 2048M phys mem, 32M free mem, 16G total swap, 16G free swap

   PID USERNAME LWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
   784 root      17  60  -20   88M  632K sleep  270:03 13.02% nfsd
  2694 root       1  59    0 1376K  672K sleep    1:45  0.69% touch
  3814 root       5  59    0   30M 3928K sleep    0:00  0.32% pkgserv
  3763 root       1  60    0 8400K 1256K sleep    0:02  0.20% zfs
  3826 root       1  52    0 3516K 2004K cpu/1    0:00  0.05% top
  3811 root       1  59    0 7668K 1732K sleep    0:00  0.02% pkginfo
  1323 noaccess  18  59    0  119M 1660K sleep    4:47  0.01% java
   174 root      50  59    0 8796K 1208K sleep    1:47  0.01% nscd
   332 root       1  49    0 2480K  456K sleep    0:06  0.01% dhcpagent
     8 root      15  59    0   14M  640K sleep    0:07  0.01% svc.startd
  1236 root       1  59    0   15M 5172K sleep    2:06  0.01% Xorg
  1281 root       1  59    0   11M  544K sleep    1:00  0.00% dtgreet
 26068 root       1 100  -20 2680K 1416K sleep    0:01  0.00% xntpd
   582 root       4  59    0 6884K 1232K sleep    1:22  0.00% inetd
   394 daemon     2  60  -20 2528K  508K sleep    5:54  0.00% lockd

Regards
Tom Crane

> On 6/13/12 3:47 AM, TPCzfs at mklab.ph.rhul.ac.uk wrote:
> > Dear All,
> > I have been advised to enquire here on zfs-discuss with the
> > ZFS problem described below, following discussion on Usenet NG
> > comp.unix.solaris. The full thread should be available here
> > https://groups.google.com/forum/#!topic/comp.unix.solaris/uEQzz1t-G1s
> >
> > Many thanks
> > Tom Crane

--
Tom Crane, Dept. Physics, Royal Holloway, University of London, Egham Hill,
Egham, Surrey, TW20 0EX, England.
Email: T.Crane at rhul.ac.uk
Fax: +44 (0) 1784 472794
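A minimal sketch for checking the cron-job guess quoted above, assuming the stock Solaris crontab and cron-log locations:

  # crontab -l root                  (what root has scheduled)
  # ls /var/spool/cron/crontabs      (any other users with crontabs)
  # tail -100 /var/cron/log          (what cron actually ran, and when)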
John D Groenveld
2012-Jun-14 14:37 UTC
[zfs-discuss] (fwd) Re: ZFS NFS service hanging on Sunday morning
In message <201206141413.q5EEDvZq017439 at mklab.ph.rhul.ac.uk>, TPCzfs at mklab.ph.rhul.ac.uk writes:
> Memory: 2048M phys mem, 32M free mem, 16G total swap, 16G free swap

My WAG is that your "zpool history" is hanging due to lack of RAM.

John
groenveld at acm.org
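If a "zpool history" (or some other zfs/zpool command) from a periodic job is indeed wedging, a rough sketch for seeing where it is stuck while the box is in the bad state ("<pid>" is whatever pgrep reports, not a literal value):

  # pgrep -fl zpool ; pgrep -fl zfs    (find any stuck zpool/zfs commands)
  # pstack <pid>                       (userland stack of the stuck process)
  # echo "0t<pid>::pid2proc | ::walk thread | ::findstack -v" | mdb -k
                                       (kernel-side stacks of its threads)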
TPCzfs at mklab.ph.rhul.ac.uk
2012-Jun-14 15:11 UTC
[zfs-discuss] (fwd) Re: ZFS NFS service hanging on Sunday
> In message <201206141413.q5EEDvZq017439 at mklab.ph.rhul.ac.uk>, TPCzfs at mklab.ph.rhul.ac.uk writes:
> > Memory: 2048M phys mem, 32M free mem, 16G total swap, 16G free swap
>
> My WAG is that your "zpool history" is hanging due to lack of RAM.

Interesting. In the problem state the system is usually quite responsive, i.e. not thrashing. Under Linux, which I'm more familiar with, 'used memory' (= 'total memory' - 'free memory') covers physical memory the kernel is using for data caching, which is still available for processes to allocate as needed, together with memory already allocated to processes, as opposed to only physical memory that is genuinely in use. Does this mean something different under Solaris?

Cheers
Tom

> John
> groenveld at acm.org
Jim Klimov
2012-Jun-14 16:27 UTC
[zfs-discuss] (fwd) Re: ZFS NFS service hanging on Sunday
2012-06-14 19:11, TPCzfs at mklab.ph.rhul.ac.uk wrote:
>>
>> In message <201206141413.q5EEDvZq017439 at mklab.ph.rhul.ac.uk>, TPCzfs at mklab.ph.rhul.ac.uk writes:
>>> Memory: 2048M phys mem, 32M free mem, 16G total swap, 16G free swap
>>
>> My WAG is that your "zpool history" is hanging due to lack of RAM.
>
> Interesting. In the problem state the system is usually quite responsive, i.e. not thrashing. Under Linux, which I'm more
> familiar with, 'used memory' (= 'total memory' - 'free memory') covers physical memory the kernel is using for data caching,
> which is still available for processes to allocate as needed, together with memory already allocated to processes, as opposed
> to only physical memory that is genuinely in use. Does this mean something different under Solaris?

Well, it is roughly similar. In Solaris there is a general notion of "swap" or "virtual memory" (so as not to confuse adepts of other systems), which is the combination of the "RAM" and "disk swap" spaces. Tools imported from other environments, like "top" above, use the common notions of "physical memory" and "on-disk swap"; tools like "vmstat" under Solaris would print the "swap" (= VM) and "free" (= RAM) columns.

Processes are allocated their memory requirements from the generic "swap" (= virtual memory), though some tricks are possible: some pages may be marked as not "swappable" to disk, others may require a reservation of on-disk swap space even if all the data still lives in RAM. Kernel memory, for example that used by ZFS, does not go into on-disk swap (which can cause system freezes due to shortage of RAM for operations if some big ZFS task is not ready to just release that virtual memory). The ZFS ARC cache may release its memory "on request" for RAM from other processes, but this takes some time (and some programs check for lack of free memory, think they can't get more, and break without even trying), so a reserve of free memory is usually kept by the OS. To drive the free RAM as low as the 32MB low watermark, some heavy hammering must be going on.

Now, back to the 2GB RAM problem: ZFS has lots of metadata. Both reads and writes to the pool have to traverse a large tree of block pointers, with the leaves of the tree containing pieces of your user data. Updates to user data cause rewriting of the whole path through the tree from the updated blocks to the root (metadata blocks must be read, modified, and re-checksummed at their parents; recurse to the root). Metadata blocks are also stored on disk in several copies per block (double or triple the IOPS cost).

ZFS works fast when the "hot" paths through the needed portions of the block-pointer tree, or, even better, the whole tree, are cached in RAM. Otherwise the least-used blocks are evicted to accommodate the recent newcomers. If you are low on RAM and useful blocks get evicted, this causes re-reads from disk to get them back (and evict some others), which may cause the lags you're seeing.

The high proportion of kernel time also indicates that it is not some userspace computation hogging the CPUs, but more likely waiting on hardware I/O. Running "iostat 1" or "zpool iostat 1" can help you see some patterns (at least, whether there are many disk reads when the system is "hung"). Perhaps the pool is getting scrubbed, or the slocate database gets updated, or some machines begin dumping their backups onto the fileserver at once, and with so little cache the machine nearly dies, in terms of performance and responsiveness at least.
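A minimal sketch of the checks suggested above, to be run while the machine is in the hung state (the arcstats kstat name assumes the usual zfs:0:arcstats module):

  # iostat -xn 1                                    (per-disk load; watch the %b and service-time columns)
  # zpool iostat 1                                  (pool-level bandwidth and IOPS)
  # kstat -p zfs:0:arcstats:size zfs:0:arcstats:c   (current ARC size and its target, in bytes)
  # zpool status                                    (any scrub or resilver in progress?)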
This lack of RAM is especially deadly upon writes into deduped pools, because DDTs tend to be large (tens of GB for moderately sized pools of tens of TB). Your box seems to have a 12TB pool with just a little bit used, yet the shortage of RAM is already well seen...

Hope this helps (understanding at least),
//Jim Klimov
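If there is any doubt about whether dedup is in play here, checking is cheap. A sketch, where "tank" stands in for the real pool name and the flags assume a reasonably recent zdb:

  # zfs get -r dedup tank    (is dedup enabled on any dataset in the pool?)
  # zdb -DD tank             (DDT histogram and estimated table size; can itself be slow on a big pool)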
TPCzfs at mklab.ph.rhul.ac.uk
2012-Jun-25 09:58 UTC
[zfs-discuss] (fwd) Re: ZFS NFS service hanging on Sunday
> 2012-06-14 19:11, TPCzfs at mklab.ph.rhul.ac.uk wrote:
> >> In message <201206141413.q5EEDvZq017439 at mklab.ph.rhul.ac.uk>, TPCzfs at mklab.ph.rhul.ac.uk writes:
> >>> Memory: 2048M phys mem, 32M free mem, 16G total swap, 16G free swap
> >> My WAG is that your "zpool history" is hanging due to lack of RAM.
> >
> > Interesting. In the problem state the system is usually quite responsive [...]
>
> Well, it is roughly similar. In Solaris there is a general notion

[snipped]

Dear Jim,
Thanks for the detailed explanation of ZFS memory usage. Special thanks also to John D Groenveld for the initial suggestion of a lack-of-RAM problem. Since upping the RAM from 2GB to 4GB the machine has sailed through the last two Sunday mornings without problems. I was interested to subsequently discover the Solaris command 'echo ::memstat | mdb -k', which reveals just how much memory ZFS can use.

Best regards
Tom.

--
Tom Crane, Dept. Physics, Royal Holloway, University of London, Egham Hill,
Egham, Surrey, TW20 0EX, England.
Email: T.Crane at rhul dot ac dot uk
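For anyone following along, a sketch of the command Tom mentions, plus the usual follow-up if the ARC turns out to be crowding out everything else:

  # echo ::memstat | mdb -k    (breaks physical memory down by consumer: kernel, ZFS file data, anon, page cache, free)

To cap the ARC, a line like the following can be added to /etc/system (followed by a reboot); 0x20000000 is 512MB and is only an example value, not a recommendation:

  set zfs:zfs_arc_max = 0x20000000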
Hung-Sheng Tsao (LaoTsao) Ph.D
2012-Jun-25 11:37 UTC
[zfs-discuss] (fwd) Re: ZFS NFS service hanging on Sunday
In Solaris, ZFS caches many things, so you should have more RAM. If you set up 18GB of swap then, IMHO, RAM should be higher than 4GB.

regards

Sent from my iPad

On Jun 25, 2012, at 5:58, TPCzfs at mklab.ph.rhul.ac.uk wrote:

>>
>> 2012-06-14 19:11, TPCzfs at mklab.ph.rhul.ac.uk wrote:
>>>> In message <201206141413.q5EEDvZq017439 at mklab.ph.rhul.ac.uk>, TPCzfs at mklab.ph.rhul.ac.uk writes:
>>>>> Memory: 2048M phys mem, 32M free mem, 16G total swap, 16G free swap
>>>> My WAG is that your "zpool history" is hanging due to lack of RAM.
>>>
>>> Interesting. In the problem state the system is usually quite responsive [...]
>>
>> Well, it is roughly similar. In Solaris there is a general notion
>
> [snipped]
>
> Dear Jim,
> Thanks for the detailed explanation of ZFS memory usage. Special thanks also to John D Groenveld for the initial suggestion of a
> lack-of-RAM problem. Since upping the RAM from 2GB to 4GB the machine has sailed through the last two Sunday mornings without
> problems. I was interested to subsequently discover the Solaris command 'echo ::memstat | mdb -k', which reveals just how much
> memory ZFS can use.
>
> Best regards
> Tom.
>
> --
> Tom Crane, Dept. Physics, Royal Holloway, University of London, Egham Hill,
> Egham, Surrey, TW20 0EX, England.
> Email: T.Crane at rhul dot ac dot uk
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
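For completeness, the stock tools for checking installed RAM and how the swap is actually laid out (a sketch):

  # prtconf | grep Memory    (installed physical memory)
  # swap -l                  (configured swap devices)
  # swap -s                  (virtual memory reserved/allocated/available)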