I've been watching the ZFS ARC cache on our IMAP server while the backups are running, and also when user activity is high. The two seem to conflict. Fast response for users seems to depend on their data being in the cache when it's needed. Most of the disk I/O seems to be writes in this situation. However, the backup needs to stat all files and read many of them. I'm assuming that all of this information is also added to the ARC cache, even though it may never be needed again. It must also evict user data from the cache, causing it to be reloaded every time it's needed.

We use Networker for backups now. Is there some way to configure ZFS so that backups don't churn the cache? Is there a different way to perform backups to avoid this problem? We do keep two weeks of daily ZFS snapshots to use for restores of recently-lost data. We still need something for longer-term backups.

-- 
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-
Gary Mills wrote:
> I've been watching the ZFS ARC cache on our IMAP server while the
> backups are running, and also when user activity is high. The two
> seem to conflict. Fast response for users seems to depend on their
> data being in the cache when it's needed. Most of the disk I/O seems
> to be writes in this situation. However, the backup needs to stat
> all files and read many of them. I'm assuming that all of this
> information is also added to the ARC cache, even though it may never
> be needed again. It must also evict user data from the cache, causing
> it to be reloaded every time it's needed.
>
> We use Networker for backups now. Is there some way to configure ZFS
> so that backups don't churn the cache? Is there a different way to
> perform backups to avoid this problem? We do keep two weeks of daily
> ZFS snapshots to use for restores of recently-lost data. We still
> need something for longer-term backups.

Hi Gary,

Find out whether you have a problem first. If not, don't worry, but read on. If you do have a problem, add memory or an L2ARC device.

The ARC was designed to mitigate the effect of any single burst of sequential I/O, but the size of the cache dedicated to more Frequently used pages (the current working set) will still be reduced, depending on the amount of activity on either side of the cache.

As the ARC maintains a shadow list of recently evicted pages from both sides of the cache, such pages that are accessed again will then return to the 'Frequent' side of the cache.

There will be continuous competition between the 'Recent' and 'Frequent' sides of the ARC (and for convenience, I'm glossing over the existence of 'Locked' pages).

Several things might cause pathological behaviour. For example, a backup process might access the same metadata multiple times, causing that data to be promoted to 'Frequent' and flushing out application-related data. (ZFS does not differentiate between data and metadata for resource allocation; they all use the same I/O mechanism and cache.)

On the other hand, you might just not have sufficient memory to keep most of your metadata in the cache, or the backup process is just too aggressive. Adding memory or an L2ARC device might help.

Cheers,
Henk
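The 'Recent'/'Frequent' behaviour described above can be sketched with a toy model. This is not the ZFS implementation: it follows the textbook adaptive replacement cache algorithm, tracks keys only (no data, no block sizes, no 'Locked' pages), and the names ToyARC and surviving_mail_keys, the cache size of 8, and the 100-file backup are all made up for illustration.

```python
from collections import OrderedDict


class ToyARC:
    """Toy adaptive replacement cache over keys only (illustrative, not ZFS)."""

    def __init__(self, capacity):
        self.c = capacity
        self.p = 0               # adaptive target size of the 'Recent' list
        self.t1 = OrderedDict()  # 'Recent': cached keys seen once
        self.t2 = OrderedDict()  # 'Frequent': cached keys seen at least twice
        self.b1 = OrderedDict()  # shadow list of keys evicted from t1
        self.b2 = OrderedDict()  # shadow list of keys evicted from t2

    def _evict(self, hit_in_b2):
        """Move one cached key to its shadow list, guided by the target p."""
        if self.t1 and (len(self.t1) > self.p
                        or (hit_in_b2 and len(self.t1) == self.p)):
            key, _ = self.t1.popitem(last=False)
            self.b1[key] = None
        else:
            key, _ = self.t2.popitem(last=False)
            self.b2[key] = None

    def access(self, key):
        """Touch a key; return True on a cache hit, False on a miss."""
        if key in self.t1:            # second access: promote to 'Frequent'
            del self.t1[key]
            self.t2[key] = None
            return True
        if key in self.t2:            # already 'Frequent': refresh its position
            self.t2.move_to_end(key)
            return True
        if key in self.b1:            # recently evicted from 'Recent': grow p
            self.p = min(self.c, self.p + max(1, len(self.b2) // len(self.b1)))
            self._evict(False)
            del self.b1[key]
            self.t2[key] = None
            return False
        if key in self.b2:            # recently evicted from 'Frequent': shrink p
            self.p = max(0, self.p - max(1, len(self.b1) // len(self.b2)))
            self._evict(True)
            del self.b2[key]
            self.t2[key] = None
            return False
        # Key never seen (or long forgotten): make room, then insert in 'Recent'.
        if len(self.t1) + len(self.b1) == self.c:
            if len(self.t1) < self.c:
                self.b1.popitem(last=False)
                self._evict(False)
            else:
                self.t1.popitem(last=False)
        else:
            total = len(self.t1) + len(self.t2) + len(self.b1) + len(self.b2)
            if total >= self.c:
                if total == 2 * self.c:
                    self.b2.popitem(last=False)
                self._evict(False)
        self.t1[key] = None
        return False


def surviving_mail_keys(reads_per_backup_file, capacity=8, files=100):
    """Count working-set keys that still hit after a simulated backup pass."""
    cache = ToyARC(capacity)
    mail = [("mail", i) for i in range(capacity // 2)]
    for key in mail:                  # touch twice so each key is 'Frequent'
        cache.access(key)
        cache.access(key)
    for i in range(files):            # the backup walks many distinct files
        for _ in range(reads_per_backup_file):
            cache.access(("backup", i))
    return sum(cache.access(key) for key in mail)


if __name__ == "__main__":
    print("backup reads each file once: ", surviving_mail_keys(1))
    print("backup reads each file twice:", surviving_mail_keys(2))
```

In this toy model, a backup that touches each file once only cycles through the 'Recent' side and the working set survives, while a backup that touches each file twice promotes its own keys to 'Frequent' and pushes the working set out, which is the pathological case mentioned above.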
On Thu, Apr 09, 2009 at 04:25:58PM +0200, Henk Langeveld wrote:
> Gary Mills wrote:
> > I've been watching the ZFS ARC cache on our IMAP server while the
> > backups are running, and also when user activity is high. The two
> > seem to conflict. Fast response for users seems to depend on their
> > data being in the cache when it's needed. Most of the disk I/O seems
> > to be writes in this situation. However, the backup needs to stat
> > all files and read many of them. I'm assuming that all of this
> > information is also added to the ARC cache, even though it may never
> > be needed again. It must also evict user data from the cache, causing
> > it to be reloaded every time it's needed.
>
> Find out whether you have a problem first. If not, don't worry, but
> read on. If you do have a problem, add memory or an L2ARC device.

We do have a problem, but not with the backup itself. The backup is slow, but I expect that's just because it's reading a very large number of small files. Our problem is with normal IMAP operations becoming quite slow at times. I'm wondering if the backup is contributing to this problem.

> The ARC was designed to mitigate the effect of any single burst of
> sequential I/O, but the size of the cache dedicated to more Frequently
> used pages (the current working set) will still be reduced, depending
> on the amount of activity on either side of the cache.

That's a nice design, better than a simple cache.

> As the ARC maintains a shadow list of recently evicted pages from both
> sides of the cache, such pages that are accessed again will then return
> to the 'Frequent' side of the cache.
>
> There will be continuous competition between the 'Recent' and 'Frequent'
> sides of the ARC (and for convenience, I'm glossing over the existence
> of 'Locked' pages).
>
> Several things might cause pathological behaviour. For example, a
> backup process might access the same metadata multiple times, causing
> that data to be promoted to 'Frequent' and flushing out
> application-related data. (ZFS does not differentiate between data and
> metadata for resource allocation; they all use the same I/O mechanism
> and cache.)

That might be possible in our case.

> On the other hand, you might just not have sufficient memory to keep
> most of your metadata in the cache, or the backup process is just too
> aggressive. Adding memory or an L2ARC device might help.

We've added memory. That did seem to help, although the problem's still there. I assume the L2ARC is not available in Solaris 10.

-- 
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-
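One way to check whether the backup is contributing is to compare the ARC hit ratio over an interval inside the backup window with one outside it. The sketch below assumes the counters are captured as text in the "module:instance:name:statistic value" form that `kstat -p zfs:0:arcstats` prints on Solaris, and that the `hits` and `misses` counters are present. The function names and the sample numbers are made up for illustration and are not measurements from this server.

```python
def parse_arcstats(text):
    """Parse 'module:instance:name:statistic<whitespace>value' lines into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue                      # skip blank or unexpected lines
        name = parts[0].rsplit(":", 1)[-1]
        try:
            stats[name] = int(parts[1])
        except ValueError:
            continue                      # skip non-integer statistics
    return stats


def interval_hit_ratio(before, after):
    """Hit ratio for the interval between two counter samples, or None if idle."""
    hits = after["hits"] - before["hits"]
    misses = after["misses"] - before["misses"]
    total = hits + misses
    return hits / total if total else None


# Made-up sample text standing in for two captures taken some minutes apart.
SAMPLE_BEFORE = "zfs:0:arcstats:hits\t1000\nzfs:0:arcstats:misses\t100\n"
SAMPLE_AFTER = "zfs:0:arcstats:hits\t1900\nzfs:0:arcstats:misses\t400\n"

if __name__ == "__main__":
    ratio = interval_hit_ratio(parse_arcstats(SAMPLE_BEFORE),
                               parse_arcstats(SAMPLE_AFTER))
    print("hit ratio over interval:", ratio)
```

Using the difference between two samples rather than the raw counters matters here, because the counters accumulate since boot and would otherwise hide a drop that only occurs while the backup runs.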