I've got a bit of a strange problem with snapshot sizes. First, some
background: For ages our DBA backed up all the company databases to a
directory NFS-mounted from a NetApp filer. That directory would then get
dumped to tape. About a year ago, I built an OpenSolaris (technically
Nexenta) machine with 24 x 1.5TB drives, for about 24TB of usable space. I
am using this to back up OS images using BackupPC. I was also backing up
the DBA's backup volume from the NetApp to the (ZFS) backup server, using
a combination of rsync + snapshots. The snapshots were using about
50GB/day. The backup volume is about 600GB total, so this wasn't bad,
especially on a box with 24TB of space available.

I decided to cut out the middleman, and save some of that expensive NetApp
disk space, by having the DBA back up directly to the backup server. I
repointed the NFS mounts on our DB servers to point to the backup server
instead of the NetApp. Then I ran a simple cron job to snapshot that ZFS
filesystem daily.

My problem is that the snapshots started taking around 500GB instead of
50GB. After a bit of thinking, I realized that the backup system my DBA
was using must have been writing new files and moving them into place, or
possibly writing a whole new file even if only part changed. I think this
is the problem because ZFS never overwrites files in place; instead it
allocates new blocks. But rsync does a byte-by-byte comparison, and only
updates the blocks that have changed.

Because it's easier to change what I'm doing than what my DBA does, I
decided that I would put rsync back in place, but locally. So I changed
things so that the backups go to a staging FS, and then are rsync'ed over
to another FS that I take snapshots on. The only problem is that the
snapshots are still in the 500GB range.

So, I need to figure out why these snapshots are taking so much more room
than they were before.

This, BTW, is the rsync command I'm using (and essentially the same
command I was using when I was rsync'ing from the NetApp):

rsync -aPH --inplace --delete /staging/oracle_backup/ /backups/oracle_backup/

This is the old system (rsync'ing from a NetApp and taking snapshots):

zfs list -t snapshot -r bpool/snapback
NAME                              USED  AVAIL  REFER  MOUNTPOINT
...
bpool/snapback@20100310-182713   53.7G      -   868G  -
bpool/snapback@20100312-000318   59.8G      -   860G  -
bpool/snapback@20100312-182552   54.0G      -   840G  -
bpool/snapback@20100313-184834   71.7G      -   884G  -
bpool/snapback@20100314-123024   17.5G      -   832G  -
bpool/snapback@20100315-173609   72.6G      -   891G  -
bpool/snapback@20100316-165527   24.3G      -   851G  -
bpool/snapback@20100317-171304   56.2G      -   884G  -
bpool/snapback@20100318-170250   50.9G      -   865G  -
bpool/snapback@20100319-181131   53.9G      -   874G  -
bpool/snapback@20100320-183617   80.8G      -   902G  -
...

This is from the new system (backing up directly to one volume, rsync'ing
to and snapshotting another one):

root@backup02:~# zfs list -t snapshot -r bpool/backups/oracle_backup
NAME                                          USED  AVAIL  REFER  MOUNTPOINT
bpool/backups/oracle_backup@20100411-023130   479G      -   681G  -
bpool/backups/oracle_backup@20100411-104428   515G      -   721G  -
bpool/backups/oracle_backup@20100412-144700      0      -   734G  -

Thanks for any help,

Paul
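P.S. The snapshot cron job, in case it matters, is just a one-liner along
these lines (typed from memory, so the exact entry may differ slightly):

30 18 * * * /usr/sbin/zfs snapshot bpool/backups/oracle_backup@`date +\%Y\%m\%d-\%H\%M\%S`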
Paul Archer wrote:
> Because it's easier to change what I'm doing than what my DBA does, I
> decided that I would put rsync back in place, but locally. So I changed
> things so that the backups go to a staging FS, and then are rsync'ed
> over to another FS that I take snapshots on. The only problem is that
> the snapshots are still in the 500GB range.
>
> So, I need to figure out why these snapshots are taking so much more
> room than they were before.
>
> This, BTW, is the rsync command I'm using (and essentially the same
> command I was using when I was rsync'ing from the NetApp):
>
> rsync -aPH --inplace --delete /staging/oracle_backup/ /backups/oracle_backup/

Try adding --no-whole-file to rsync. When both the source and the
destination are local paths, rsync disables its block-by-block
delta-transfer algorithm by default, so every changed file gets rewritten
in full and ZFS has to allocate new blocks for all of it.
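Untested here, but the full command should then be:

rsync -aPH --inplace --no-whole-file --delete /staging/oracle_backup/ /backups/oracle_backup/

With --inplace and --no-whole-file together, rsync seeks within the
existing destination file and rewrites only the regions that actually
differ, which is exactly what keeps the snapshot deltas small.

--Arne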
Though the rsync switch is probably the answer to your problem... you
might want to consider upgrading to Nexenta 3.0, switching checksums from
fletcher to sha1, and then enabling block-level deduplication. You'd
probably use fewer GB per snapshot even with rsync running inefficiently.
Oops, I meant SHA256. My mind just maps SHA->SHA1, totally forgetting that
ZFS actually uses SHA256 (a SHA-2 variant).

More on ZFS dedup, checksums and collisions:
http://blogs.sun.com/bonwick/entry/zfs_dedup
http://www.c0t0d0s0.org/archives/6349-Perceived-Risk.html
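If you do go that route, the switch itself is just two property changes.
Note they only apply to newly written blocks, so existing data keeps its
old checksums until it's rewritten (filesystem name taken from your
listing):

zfs set checksum=sha256 bpool/backups/oracle_backup
zfs set dedup=on bpool/backups/oracle_backup

As I understand it, dedup=on implies sha256 checksums for deduped blocks
anyway, so the first line mostly makes the rest of the filesystem
consistent with that.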
Now is probably a good time to mention that dedup likes LOTS of RAM, based
on experiences described here. 8 GiB minimum is a good start. And to avoid
those obscenely long removal times due to updating the DDT, an SSD-based
L2ARC device seems to be highly recommended as well.

That is, of course, if the OP decides to go the dedup route. I get the
feeling there is an actual solution to, or at least an intelligent reason
for, the symptoms he's experiencing. I'm just not sure what either of
those might be.

On Tue, Apr 13, 2010 at 03:09, Peter Tripp <petertripp at gmail.com> wrote:
> Oops, I meant SHA256. My mind just maps SHA->SHA1, totally forgetting
> that ZFS actually uses SHA256 (a SHA-2 variant).
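For what it's worth, attaching an SSD as L2ARC is a one-liner once it's
physically in the box (the device name here is invented; substitute your
own from format):

zpool add bpool cache c2t0d0

That gives the DDT somewhere much faster than spinning disk to spill to
once it outgrows RAM.

--
"You can choose your friends, you can choose the deals." - Equity Private
"If Linux is faster, it's a Solaris bug." - Phil Harman
Blog - http://whatderass.blogspot.com/
Twitter - @khyron4eva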
Yesterday, Arne Jansen wrote:
> Try adding --no-whole-file to rsync. When both the source and the
> destination are local paths, rsync disables its block-by-block
> delta-transfer algorithm by default [...]

Thanks for the tip. I didn't realize rsync had that behavior. It looks
like that got my snapshots back to the 50GB range. I'm going to try dedup
on the staging FS as well, so I can do a side-by-side comparison of which
approach gives me the better space savings.

Paul
Paul Archer
2010-Apr-14 13:48 UTC
[zfs-discuss] dedup causing problems with NFS? (was Re: snapshots taking too much space)
So I turned deduplication on for my staging FS (the one that gets mounted
on the database servers) yesterday, and since then I've been seeing the
mount hang for short periods of time off and on. (It lights nagios up like
a Christmas tree 'cause the disk checks hang and time out.)

I haven't turned dedup off again yet, because I'd like to figure out how
to get past this problem.

Can anyone give me an idea of why the mounts might be hanging, or where to
look for clues? And has anyone had this problem with dedup and NFS before?
FWIW, the clients are a mix of Solaris and Linux.

Paul

Yesterday, Paul Archer wrote:
> Thanks for the tip. I didn't realize rsync had that behavior. It looks
> like that got my snapshots back to the 50GB range. I'm going to try
> dedup on the staging FS as well, so I can do a side-by-side comparison
> of which approach gives me the better space savings.
>
> Paul
Bruno Sousa
2010-Apr-14 14:32 UTC
[zfs-discuss] dedup causing problems with NFS? (was Re: snapshots taking too much space)
Hi,

On 14-4-2010 15:48, Paul Archer wrote:
> So I turned deduplication on for my staging FS (the one that gets
> mounted on the database servers) yesterday, and since then I've been
> seeing the mount hang for short periods of time off and on. (It lights
> nagios up like a Christmas tree 'cause the disk checks hang and time out.)
>
> Can anyone give me an idea of why the mounts might be hanging, or where
> to look for clues? And has anyone had this problem with dedup and NFS
> before? FWIW, the clients are a mix of Solaris and Linux.

Maybe your ZFS box used for dedup is under heavy load, and that is what's
making the nagios checks time out? I ask because I see the same effect on
a system with 2 Intel Xeon 3.0GHz CPUs ;)

Bruno
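P.S. A quick way to check is to watch the box while a backup is running
(both tools ship with OpenSolaris/Nexenta):

prstat -mL 5
iostat -xn 5

prstat's microstate columns show whether threads are pegged on CPU, and
iostat's %b column shows whether the disks are saturated.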
Daniel Carosone
2010-Apr-15 05:08 UTC
[zfs-discuss] dedup causing problems with NFS? (was Re: snapshots taking too much space)
On Wed, Apr 14, 2010 at 08:48:42AM -0500, Paul Archer wrote:
> So I turned deduplication on for my staging FS (the one that gets
> mounted on the database servers) yesterday, and since then I've been
> seeing the mount hang for short periods of time off and on. (It lights
> nagios up like a Christmas tree 'cause the disk checks hang and time out.)

Does it have enough (really, lots) of memory? Do you have an l2arc cache
device attached (as well)?

Dedup has a significant memory requirement, or it has to go to disk for
lots of DDT entries. While it's doing that, NFS requests can time out.
Lengthening the timeouts on the client (for the fs mounted as a backup
destination) might help you around the edges of the problem.

As a related issue, are your staging (export) and backup filesystems in
the same pool? If they are, moving from staging to final will involve
another round of updating lots of DDT entries.

What might be worthwhile trying:
 - turning dedup *off* on the staging filesystem, so NFS isn't waiting
   for it, and then deduping later as you move to the backup area at
   leisure (effectively, asynchronously to the nfs writes).
 - or, perhaps eliminating this double work by writing directly to the
   main backup fs.
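To see how big the DDT has actually grown (and hence roughly how much RAM
it wants), something like this should show you; the output format varies
between builds, so treat it as a sketch:

zdb -DD bpool

The histogram it prints includes a total entry count; at a rough few
hundred bytes of core per entry, that tells you whether the table fits in
ARC or is being fetched from disk on every write.

--
Dan.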
Erik Trimble
2010-Apr-15 05:21 UTC
[zfs-discuss] dedup causing problems with NFS? (was Re: snapshots taking too much space)
Daniel Carosone wrote:
> On Wed, Apr 14, 2010 at 08:48:42AM -0500, Paul Archer wrote:
>> So I turned deduplication on for my staging FS (the one that gets
>> mounted on the database servers) yesterday, and since then I've been
>> seeing the mount hang for short periods of time off and on. [...]
>
> Does it have enough (really, lots) of memory? Do you have an l2arc
> cache device attached (as well)?

The OP said he had 8GB of RAM, and I suspect that a cheap SSD in the
40-60GB range for L2ARC would actually be the best choice to speed things
up in the future, rather than adding another 8GB of RAM.

> Dedup has a significant memory requirement, or it has to go to disk for
> lots of DDT entries. While it's doing that, NFS requests can time out.
> Lengthening the timeouts on the client (for the fs mounted as a backup
> destination) might help you around the edges of the problem.

Also, destroying the zpool where the deduped snapshots exist is fast,
though not really an option if there are other filesystems on it that
matter.

--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
Paul Archer
2010-Apr-15 16:12 UTC
[zfs-discuss] dedup causing problems with NFS? (was Re: snapshots taking too much space)
3:08pm, Daniel Carosone wrote:
> What might be worthwhile trying:
>  - turning dedup *off* on the staging filesystem, so NFS isn't waiting
>    for it, and then deduping later as you move to the backup area at
>    leisure (effectively, asynchronously to the nfs writes).
>  - or, perhaps eliminating this double work by writing directly to the
>    main backup fs.

Thanks for the info. FWIW, I have turned off dedup on the staging
filesystem, but the dedup'ed data is still there, so it's a bit late now.

The reason I can't write directly to the main backup FS is that the backup
process (RMAN run by my Oracle DBA) writes new files in place, so my
snapshots were taking up 500GB each, vs. the 50GB I get if I use rsync
instead. I had enabled dedup on the staging FS so that I could compare
snapshots of it (deduped) against the final FS (not deduped, but populated
via rsync) to see which gives the better space savings.

I guess I'll have to wait until I can get some more RAM on the box.

Paul
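P.S. For the archives: since the dedup property only affects newly written
blocks, the only way I can see to un-dedup the existing data short of
destroying the FS is to rewrite it, e.g. something like this (untested,
and the dataset names here are made up):

zfs snapshot bpool/staging@flatten
zfs send bpool/staging@flatten | zfs recv bpool/staging_new

with dedup off on the target, then destroy the old FS and zfs rename the
new one into place.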
Paul Archer
2010-Apr-15 16:13 UTC
[zfs-discuss] dedup causing problems with NFS? (was Re: snapshots taking too much space)
Yesterday, Erik Trimble wrote:
> The OP said he had 8GB of RAM, and I suspect that a cheap SSD in the
> 40-60GB range for L2ARC would actually be the best choice to speed
> things up in the future, rather than adding another 8GB of RAM.

I think I'm going to try both. It's easier to get one request for upgrades
approved than to get a second one approved if the first one doesn't cut it.

Paul