We have an IMAP e-mail server running on a Solaris 10 10/09 system. It uses six ZFS filesystems built on a single zpool with 14 daily snapshots. Every day at 11:56, a cron command destroys the oldest snapshots and creates new ones, both recursively. For about four minutes thereafter, the load average drops and I/O to the disk devices drops to almost zero. Then, the load average shoots up to about ten times normal and then declines to normal over about four minutes, as disk activity resumes. The statistics return to their normal state about ten minutes after the cron command runs.

Is it destroying old snapshots or creating new ones that causes this dead time? What does each of these procedures do that could affect the system? What can I do to make this less visible to users?

-- 
-Gary Mills- -Unix Group- -Computer and Network Services-
Gary Mills wrote:
> We have an IMAP e-mail server running on a Solaris 10 10/09 system.
> It uses six ZFS filesystems built on a single zpool with 14 daily
> snapshots. Every day at 11:56, a cron command destroys the oldest
> snapshots and creates new ones, both recursively. For about four
> minutes thereafter, the load average drops and I/O to the disk devices
> drops to almost zero. Then, the load average shoots up to about ten
> times normal and then declines to normal over about four minutes, as
> disk activity resumes. The statistics return to their normal state
> about ten minutes after the cron command runs.
>
> Is it destroying old snapshots or creating new ones that causes this
> dead time? What does each of these procedures do that could affect
> the system? What can I do to make this less visible to users?

I have a couple of Solaris 10 boxes that do something similar (hourly snaps) and I've never seen any lag in creating and destroying snapshots. One system with 16 filesystems takes 5 seconds to destroy the 16 oldest snaps and create 5 recursive new ones. I logged load average on these boxes and there is a small spike on the hour, but this is down to sending the snaps, not creating them.

-- 
Ian.
Andrew Gabriel
2010-Mar-04 22:45 UTC
[zfs-discuss] Snapshot recycle freezes system activity
Gary Mills wrote:
> We have an IMAP e-mail server running on a Solaris 10 10/09 system.
> It uses six ZFS filesystems built on a single zpool with 14 daily
> snapshots. Every day at 11:56, a cron command destroys the oldest
> snapshots and creates new ones, both recursively. For about four
> minutes thereafter, the load average drops and I/O to the disk devices
> drops to almost zero. Then, the load average shoots up to about ten
> times normal and then declines to normal over about four minutes, as
> disk activity resumes. The statistics return to their normal state
> about ten minutes after the cron command runs.
>
> Is it destroying old snapshots or creating new ones that causes this
> dead time? What does each of these procedures do that could affect
> the system? What can I do to make this less visible to users?

Creating a snapshot shouldn't do anything much more than a regular transaction group commit, which should be happening at least every 30 seconds anyway.

Deleting a snapshot potentially results in freeing up the space occupied by files/blocks which aren't in any other snapshots. One way to think of this is that when you're using regular snapshots, the freeing up of space which happens when you delete files is in effect all deferred until you destroy the snapshot(s) which also refer to that space, which has the effect of bunching all your space freeing.

If this is the cause (a big _if_, as I'm just speculating), then it might be a good idea to:
a) spread out the deleting of the snapshots, and
b) create more snapshots more often (and conversely delete more snapshots, more often), so each one contains less accumulated space to be freed.

-- 
Andrew
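Suggestion (a) above, spreading out the deletions, might be sketched in shell as follows. This is only an illustration under stated assumptions and was not posted in the thread: the function name `staggered_destroy`, the `ZFS` and `PAUSE` variables and the 60-second default are all hypothetical, and a POSIX shell (for example ksh or /usr/xpg4/bin/sh) is assumed.

```shell
#!/bin/sh
# Sketch only: destroy snapshots one at a time with a pause between them,
# instead of one large recursive destroy.
# ZFS and PAUSE are illustrative knobs, not part of the original thread.
ZFS=${ZFS:-zfs}        # set ZFS=echo for a dry run
PAUSE=${PAUSE:-60}     # seconds between destroys (assumed value)

# Reads full snapshot names (filesystem@suffix), one per line, on stdin.
staggered_destroy() {
    first=yes
    while read -r snap; do
        [ -n "$snap" ] || continue
        # Pause before every destroy except the first.
        if [ "$first" = yes ]; then
            first=no
        else
            sleep "$PAUSE"
        fi
        # </dev/null keeps the command from consuming the loop's input.
        $ZFS destroy "$snap" </dev/null
    done
}
```

It could be fed from a listing such as `zfs list -rH -t snapshot -o name tank/mail | grep '@2010050$' | staggered_destroy`, where `tank/mail` and the suffix are made-up names. Whether spacing the destroys out actually shortens the stall is untested here.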
Giovanni Tirloni
2010-Mar-04 22:51 UTC
[zfs-discuss] Snapshot recycle freezes system activity
On Thu, Mar 4, 2010 at 7:28 PM, Ian Collins <ian at ianshome.com> wrote:
> I have a couple of Solaris 10 boxes that do something similar (hourly
> snaps) and I've never seen any lag in creating and destroying snapshots.
> One system with 16 filesystems takes 5 seconds to destroy the 16 oldest
> snaps and create 5 recursive new ones. I logged load average on these boxes
> and there is a small spike on the hour, but this is down to sending the
> snaps, not creating them.

We've seen the behaviour that Gary describes while destroying datasets recursively (>600GB and with 7 snapshots). It seems that close to the end the server stalls for 10-15 minutes and NFS activity stops. For small datasets/snapshots that doesn't happen or is harder to notice.

Does ZFS have to do something special when it's done releasing the data blocks at the end of the destroy operation?

-- 
Giovanni Tirloni
sysdroid.com
On Thu, Mar 04, 2010 at 07:51:13PM -0300, Giovanni Tirloni wrote:
> We've seen the behaviour that Gary describes while destroying datasets
> recursively (>600GB and with 7 snapshots). It seems that close to the
> end the server stalls for 10-15 minutes and NFS activity stops. For
> small datasets/snapshots that doesn't happen or is harder to notice.
> Does ZFS have to do something special when it's done releasing the
> data blocks at the end of the destroy operation ?

That does sound similar to the problem here. The zpool is 3 TB in size with about 1.4 TB used.

It does sound as if the stall happens during the `zfs destroy -r' rather than during the `zfs snapshot -r'. What can zfs be doing when the CPU load average drops and disk I/O is close to zero?

I also had a peculiar problem here recently when I was upgrading the ZFS filesystems on our test server from version 3 to version 4. When I tried `zfs upgrade -a', the command hung for a long time and could not be interrupted, killed, or traced. Eventually it terminated on its own. Only the two upper-level filesystems had been upgraded. I upgraded the lower-level ones individually with `zfs upgrade' with no further problems. I had previously upgraded the zpool with no problems. I don't know if this behavior is related to the stall on the production server. I haven't attempted the upgrades there yet.

-- 
-Gary Mills- -Unix Group- -Computer and Network Services-
On Thu, Mar 04, 2010 at 04:20:10PM -0600, Gary Mills wrote:
> We have an IMAP e-mail server running on a Solaris 10 10/09 system.
> It uses six ZFS filesystems built on a single zpool with 14 daily
> snapshots. Every day at 11:56, a cron command destroys the oldest
> snapshots and creates new ones, both recursively. For about four
> minutes thereafter, the load average drops and I/O to the disk devices
> drops to almost zero. Then, the load average shoots up to about ten
> times normal and then declines to normal over about four minutes, as
> disk activity resumes. The statistics return to their normal state
> about ten minutes after the cron command runs.

I should mention that this seems to be a new problem. We've been using the same scheme to cycle snapshots for several years. The complaints of an unresponsive interval have only happened recently. I'm still waiting for our help desk to report on when the complaints started. It may be the result of some recent change we made, but so far I can't tell what that might have been.

-- 
-Gary Mills- -Unix Group- -Computer and Network Services-
>>>>> "gm" == Gary Mills <mills at cc.umanitoba.ca> writes:

    gm> destroys the oldest snapshots and creates new ones, both
    gm> recursively.

I'd be curious: if you try taking the same snapshots non-recursively instead, does the pause go away?

Because recursive snapshots are special: they're supposed to atomically synchronize the cut-point across all the filesystems involved, AIUI. I don't see that recursive destroys should be anything special though.

    gm> Is it destroying old snapshots or creating new ones that
    gm> causes this dead time?

Sort of seems like you should tell us this, not the other way around. :) Seriously though, isn't that easy to test? And I'm curious myself too.
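The non-recursive experiment suggested above could look roughly like the sketch below. It is an assumption-laden illustration, not code from the thread: `snapshot_each` and the `ZFS` variable are hypothetical names, and a POSIX shell is assumed. Note the trade-off mentioned above: taking the snapshots one filesystem at a time gives up the atomic cut-point that `zfs snapshot -r' provides.

```shell
#!/bin/sh
# Sketch only: snapshot each filesystem separately instead of using -r.
ZFS=${ZFS:-zfs}    # set ZFS=echo for a dry run

# $1 = snapshot suffix; filesystem names arrive one per line on stdin.
snapshot_each() {
    suffix=$1
    while read -r fs; do
        [ -n "$fs" ] || continue
        $ZFS snapshot "$fs@$suffix" </dev/null
    done
}
```

A possible invocation, reusing the `$FS` and `$JD` variables from Gary's script, would be `zfs list -rH -t filesystem -o name $FS | snapshot_each $JD`.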
On 08 March, 2010 - Miles Nordin sent me these 1,8K bytes:
> >>>>> "gm" == Gary Mills <mills at cc.umanitoba.ca> writes:
>
>     gm> destroys the oldest snapshots and creates new ones, both
>     gm> recursively.
>
> I'd be curious if you try taking the same snapshots non-recursively
> instead, does the pause go away?

According to my testing, that would give you a much longer period of "slightly slower", but a shorter period of "per filesystem reallyslowness", given recursive snapshots over lots of "independent" filesystems.

> Because recursive snapshots are special: they're supposed to
> atomically synchronize the cut-point across all the filesystems
> involved, AIUI. I don't see that recursive destroys should be
> anything special though.

From my experiences on a homedir file server with about 700 filesystems and ~65 snapshots on each, giving about 45k snapshots:

In the beginning, the snapshots took zero time to create. Now that we have snapshots spanning over a year, it's not as fast.

We then turned to only doing daily snapshots (for online backups in addition to regular backups), but they could take up to 45 minutes sometimes, with "regular nfs work" being abysmal. So we started tuning some stuff, and doing hourly snapshots actually helped (probably keeping some data structures warm in ARC). Down to 2-3 minutes or so for a recursive snapshot. So we tried adding 2x 4GB USB sticks (Kingston Data Traveller Mini Slim) as metadata L2ARC and that seems to have pushed the snapshot times down to about 30 seconds.

acc.umu.se/~stric/tmp/snaptimes.png

y axis is mmss, so a value of 450 is 4 minutes, 50 seconds.. not all linear ;)
x axis is just snapshot number, higher == newer.
Large spikes are snapshots at the same time as daily backups.
In snapshot 67..100 in the picture, I removed the L2ARC USB sticks and the times increased and started fluctuating. I'll give it a few days and put the L2ARC back.

Even cheap $10 USB sticks can help it seems.

/Tomas
-- 
Tomas Ögren, stric at acc.umu.se, acc.umu.se/~stric
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
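The message above does not say how the USB sticks were configured. One plausible way to set up a metadata-only L2ARC, sketched here purely as an assumption, is to add the devices as cache vdevs with `zpool add ... cache` and restrict the pool's `secondarycache` property to `metadata`. The function name, the `ZPOOL` and `ZFS` variables, and the pool and device names in the usage note are hypothetical.

```shell
#!/bin/sh
# Sketch only: add cache devices to a pool and keep only metadata in L2ARC.
ZPOOL=${ZPOOL:-zpool}   # set ZPOOL=echo and ZFS=echo for a dry run
ZFS=${ZFS:-zfs}

# $1 = pool name, remaining arguments = cache devices
add_metadata_l2arc() {
    pool=$1
    shift
    $ZPOOL add "$pool" cache "$@" &&
        $ZFS set secondarycache=metadata "$pool"
}
```

For example, `add_metadata_l2arc tank c5t0d0 c6t0d0`, with made-up pool and device names. Setting the property on the pool's top-level dataset lets descendant filesystems inherit it.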
Bill Sommerfeld
2010-Mar-08 21:23 UTC
[zfs-discuss] Snapshot recycle freezes system activity
On 03/08/10 12:43, Tomas Ögren wrote:
> So we tried adding 2x 4GB USB sticks (Kingston Data
> Traveller Mini Slim) as metadata L2ARC and that seems to have pushed the
> snapshot times down to about 30 seconds.

Out of curiosity, how much physical memory does this system have?
On 08 March, 2010 - Bill Sommerfeld sent me these 0,4K bytes:
> On 03/08/10 12:43, Tomas Ögren wrote:
>> So we tried adding 2x 4GB USB sticks (Kingston Data
>> Traveller Mini Slim) as metadata L2ARC and that seems to have pushed the
>> snapshot times down to about 30 seconds.
>
> Out of curiosity, how much physical memory does this system have?

System Memory:
    Physical RAM: 6134 MB
    Free Memory : 190 MB
    LotsFree: 94 MB

ARC Size:
    Current Size: 1890 MB (arcsize)
    Target Size (Adaptive): 2910 MB (c)
    Min Size (Hard Limit): 638 MB (zfs_arc_min)
    Max Size (Hard Limit): 5110 MB (zfs_arc_max)

ARC Size Breakdown:
    Most Recently Used Cache Size: 67% 1959 MB (p)
    Most Frequently Used Cache Size: 32% 950 MB (c-p)

It does some mail server stuff as well.

The two added USB sticks grew to about 3.2GB of metadata L2ARC, with about 6.5M files on the system in total.

/Tomas
-- 
Tomas Ögren, stric at acc.umu.se, acc.umu.se/~stric
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
On Mon, Mar 08, 2010 at 03:18:34PM -0500, Miles Nordin wrote:
> >>>>> "gm" == Gary Mills <mills at cc.umanitoba.ca> writes:
>
>     gm> destroys the oldest snapshots and creates new ones, both
>     gm> recursively.
>
> I'd be curious if you try taking the same snapshots non-recursively
> instead, does the pause go away?

I'm still collecting statistics, but that is one of the things I'd like to try.

> Because recursive snapshots are special: they're supposed to
> atomically synchronize the cut-point across all the filesystems
> involved, AIUI. I don't see that recursive destroys should be
> anything special though.
>
>     gm> Is it destroying old snapshots or creating new ones that
>     gm> causes this dead time?
>
> sortof seems like you should tell us this, not the other way
> around. :) Seriously though, isn't that easy to test? And I'm curious
> myself too.

Yes, that's another thing I'd like to try. I'll just put a `sleep' in the script between the two actions to see if the dead time moves later in the day.

-- 
-Gary Mills- -Unix Group- -Computer and Network Services-
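The `sleep' experiment described above might be sketched as below: do the destroy, wait, then take the new snapshot, logging a timestamp at each step so the stall can be matched to one phase. This is only a sketch under assumptions. `recycle_with_gap`, `log`, `ZFS`, `GAP` and the 30-minute default are hypothetical, and a POSIX shell is assumed for `$(...)`.

```shell
#!/bin/sh
# Sketch only: separate the destroy and snapshot phases in time.
ZFS=${ZFS:-zfs}      # set ZFS=echo for a dry run
GAP=${GAP:-1800}     # seconds between the two phases (assumed value)

# Timestamped progress messages go to stderr.
log() {
    echo "$(date '+%H:%M:%S') $*" >&2
}

# $1 = top filesystem, $2 = suffix to destroy, $3 = suffix to create
recycle_with_gap() {
    log "destroy -r $1@$2 begins"
    $ZFS destroy -r "$1@$2"
    log "destroy finished, sleeping $GAP seconds"
    sleep "$GAP"
    log "snapshot -r $1@$3 begins"
    $ZFS snapshot -r "$1@$3"
    log "snapshot finished"
}
```

If the dead time keeps starting right after the first log line, the destroy is implicated. If it moves to follow the third log line, the snapshot is.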
On Mon, Mar 08, 2010 at 01:23:10PM -0800, Bill Sommerfeld wrote:
> On 03/08/10 12:43, Tomas Ögren wrote:
> > So we tried adding 2x 4GB USB sticks (Kingston Data
> > Traveller Mini Slim) as metadata L2ARC and that seems to have pushed the
> > snapshot times down to about 30 seconds.
>
> Out of curiosity, how much physical memory does this system have?

Mine has 64 GB of memory with the ARC limited to 32 GB. The Cyrus IMAP processes, thousands of them, use memory mapping extensively. I don't know if this design affects the snapshot recycle behavior.

-- 
-Gary Mills- -Unix Group- -Computer and Network Services-
On Thu, Mar 04, 2010 at 04:20:10PM -0600, Gary Mills wrote:
> We have an IMAP e-mail server running on a Solaris 10 10/09 system.
> It uses six ZFS filesystems built on a single zpool with 14 daily
> snapshots. Every day at 11:56, a cron command destroys the oldest
> snapshots and creates new ones, both recursively. For about four
> minutes thereafter, the load average drops and I/O to the disk devices
> drops to almost zero. Then, the load average shoots up to about ten
> times normal and then declines to normal over about four minutes, as
> disk activity resumes. The statistics return to their normal state
> about ten minutes after the cron command runs.

I'm pleased to report that I found the culprit and the culprit was me! Well, ZFS peculiarities may be involved as well. Let me explain:

We had a single second-level filesystem and five third-level filesystems, all with 14 daily snapshots. The snapshots were maintained by a cron command that did a `zfs list -rH -t snapshot -o name' to get the names of all of the snapshots, extracted the part after the `@', and then sorted them uniquely to get a list of suffixes that were older than 14 days. The suffixes were Julian dates so they sorted correctly. It then did a `zfs destroy -r' to delete them. The recursion was always done from the second-level filesystem. The top-level filesystem was empty and had no snapshots. Here's a portion of the script:

    zfs list -rH -t snapshot -o name $FS | \
        cut -d@ -f2 | \
        sort -ur | \
        sed 1,${NR}d | \
        xargs -I '{}' zfs destroy -r $FS@'{}'
    zfs snapshot -r $FS@$JD

Just over two weeks ago, I rearranged the filesystems so that the second-level filesystem was newly-created and initially had no snapshots. It did have a snapshot taken every day thereafter, so that eventually it also had 14 of them. It was during that interval that the complaints started. My statistics clearly showed the performance stall and subsequent recovery. Once that filesystem reached 14 snapshots, the complaints stopped and the statistics showed only a modest increase in CPU activity, but no stall.

During this interval, the script was doing a recursive destroy for a snapshot that didn't exist at the specified level, but only existed in the descendant filesystems. I'm assuming that that unusual situation was the cause of the stall, although I don't have good evidence. By the time the complaints reached my ears, and I was able to refine my statistics gathering sufficiently, the problem had gone away.

-- 
-Gary Mills- -Unix Group- -Computer and Network Services-
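One way to sidestep the situation described above, offered only as a sketch and not as a confirmed fix, is to select expired suffixes as the script does but then destroy only the snapshots that actually appear in the listing, one by one and without `-r'. `expired_suffixes`, `destroy_matching` and the `ZFS` variable are hypothetical names. The first function mirrors the `cut | sort -ur | sed' stage of the script, and a POSIX shell is assumed.

```shell
#!/bin/sh
# Sketch only: never ask for a recursive destroy of a snapshot that does
# not exist at the top level; destroy only names that were really listed.
ZFS=${ZFS:-zfs}    # set ZFS=echo for a dry run

# $1 = number of newest suffixes to keep.
# Reads snapshot names on stdin, prints the suffixes that are older.
expired_suffixes() {
    cut -d@ -f2 | sort -ur | sed "1,${1}d"
}

# $1 = suffix. Reads snapshot names on stdin and destroys, without -r,
# exactly those that end in @suffix.
destroy_matching() {
    grep "@$1\$" | while read -r snap; do
        $ZFS destroy "$snap" </dev/null
    done
}
```

The two pieces could be combined by saving the `zfs list -rH -t snapshot -o name $FS' output to a temporary file, running `expired_suffixes' on it, and then feeding the same file to `destroy_matching' once per suffix. Whether this would have avoided the stall is untested, since the cause itself was only an assumption.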