Hi All,

Just curious about how incremental send works. Is it changed blocks or files, and how are the changed blocks or files identified?

Regards,
Vic
I'm curious as well. I'm trying to set up a near-line backup using two ZFS-based machines, and am a bit confused about how to set it up properly.

First time around, create a snapshot and send it to the remote:

  zfs snapshot master/fs@mirror
  zfs send master/fs@mirror | ssh mirror zfs recv backup/mirrorfs

Once that's done, fs@mirror == mirrorfs, correct? So if I wanted to start an incremental backup, I'd have to start creating mirror1, mirror2, etc.?

  zfs snapshot master/fs@mirror1
  zfs send -i mirror master/fs@mirror1 | ssh mirror zfs recv backup/mirrorfs
  (make changes)
  zfs snapshot master/fs@mirror2
  zfs send -i mirror1 master/fs@mirror2 | ssh mirror zfs recv backup/mirrorfs
  etc.

So now fs@mirror2 == mirrorfs. I could probably create a script that does a rename, snapshot, and send. But let's say it was running from cron and some run failed (for whatever reason). So now I'm running the script manually and it's complaining that the incremental source doesn't match, and apparently there is no way to tell which fs@mirrorX is the source short of trial-and-error, even if I kept all my incremental snapshots. Is there a clean way around this?

Perhaps a better example scenario: I accidentally destroyed the last fs@mirrorX (so there is no source snapshot to give to -i). Would my only recourse be to bite the bullet and do a zfs send of fs@mirrorX+1 (aka a full send)?

Another question: if I wanted to send a snapshot as a snapshot, I can do:

  zfs snapshot master/fs@savesnap
  zfs send master/fs@savesnap | ssh mirror zfs recv backup/mirrorfs@savesnap

And now fs@savesnap == mirrorfs@savesnap, right? But that would involve sending the whole stream, which is however much data the filesystem was consuming at that point in time? How would mirrorfs handle the difference between it and mirrorfs@savesnap? Would it take as much space as the difference between fs@mirrorX and fs@savesnap, or more?

Sorry for all these questions (and for somewhat derailing the thread), but I'd really like to set this up the right way the first time around. I'd love to be able to say

  zfs mirror master/fs[@snapshot] backup@mirror/mirrorfs

and have it sync a filesystem, either the current fs or a snapshot, but right now it seems as though the only way to do it is with snapshots, send/recv, and fiddling with incremental scripting...

Thanks,
-- Starfox
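For what it's worth, a minimal sketch of the kind of wrapper script described above (untested; the dataset names master/fs and backup/mirrorfs, the host name "mirror", and the state-file path are all taken from the example or made up) might look like:

  #!/bin/sh
  # take a new date-stamped snapshot of master/fs and send it incrementally
  # to backup/mirrorfs on the host "mirror", remembering the previous
  # snapshot name in a state file so a failed cron run can simply be retried
  FS=master/fs
  REMOTE=mirror
  DEST=backup/mirrorfs
  STATE=/var/tmp/last-mirror-snap

  new=mirror-`date +%Y%m%d%H%M%S`
  last=`cat $STATE 2>/dev/null`

  zfs snapshot $FS@$new || exit 1

  if [ -n "$last" ]; then
      # incremental from the last snapshot that was sent successfully
      # (the recv may need -F if the destination has been modified, eg by atime)
      zfs send -i $last $FS@$new | ssh $REMOTE zfs recv $DEST
  else
      # no state yet: send a full stream the first time around
      zfs send $FS@$new | ssh $REMOTE zfs recv $DEST
  fi

  # only advance the state file if the send/recv pipeline succeeded,
  # so the next run retries from the same source snapshot after a failure
  if [ $? -eq 0 ]; then
      echo $new > $STATE
  fi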
Oddly enough, I posted a script that does what you want (albeit without sending to a remote system) on Friday to my blog (http://blogs.sun.com/chrisg/entry/rolling_incremental_backups), which I use to back up my system to an external USB drive.

--chris
Yes, I've read through tons of blogs.sun.com/* entries, gone through the mailing list looking for the proper way to do it, etc. Unfortunately, zfs send/recv remains a hack that requires an elaborate script wrapper, precisely because zfs send/recv is by nature a send-only/recv-only operation (it was renamed from backup/restore a while ago). It's fine if you are backing the incrementals up to tape and restoring them at a later date, but it is lacking if you have two ZFS machines that are able to communicate. So on the high end, you have something like:

ZFS-exported-shareiscsi/nfs-put-in-fault-tolerant-ZFS-pool: This basically puts one big file on the exporting ZFS, which is required to be at least as large as the smallest pool device of the mirror/raidx pool. So if your local devices are x GB in size, you probably need an x*1.5 GB remote device in order for the exported flat file to allow the remote ZFS to do what it needs to do (checksums, etc.).

AVS-over-network-to-fault-tolerant-ZFS-pool: From what I can tell (and I'm having problems loading the Flash movie for the demos), AVS basically sits between ZFS and the pool device, monitors any block commands, saves them, and sends them to the remote. If you have identical setups it works fine, but it will not work if you don't have an identical setup of x devices doing y mirror/raidx on z pool, because it just sends block commands. So I can't have a near-line backup of a pool unless I mirror the pool setup, even with II; nothing like raidz for pool-a and mirror for pool-b, even though both pool-a and pool-b might be an identical size from the perspective of ZFS.

And on the "low" end, you have:

Ghetto-lofiadm-NFS-mirror-and-let-ZFS-bitch-at-you: This was off one of the blogs, where the author (you) basically exported a file via NFS, used lofiadm to create a "device", and added that to a pool as a mirror (see the sketch after this list). When you needed a "consistent" state, you just connected the device and let it resilver, and when it was done you disconnected it. After this you tried using iSCSI at a later date. The issues with this setup are that it a) requires slicing off a portion of the fs tree where you actually want a mirrored pool, which basically means ZFS won't be able to use the write cache (goes against the let-ZFS-manage-whole-devices philosophy), b) still needs a lot more disk space than the pool "device" size (same problem as the iscsi-device pool), and c) cannot be used where raidx is involved, because the "remote" device is liable to be disconnected at any time.

Nonexistent-export-a-whole-device-over-network: This I could not find. Basically, let the drive sit on one machine and let it be used as a pool device on another machine. This solves issue (b) of the lofiadm setup because it doesn't have the overhead of the underlying ZFS, but it still runs into issues (a) and (c).

Script-a-mirror: I've seen a couple of different ways of doing this. One is yours, another is ZetaBack, and another is zfs-auto-snapshot. This is why I asked all these weird questions: it seems that at this point in time this is the only way of doing it on a per-filesystem basis. But since (as I said earlier) send is send-only and recv is recv-only, ZFS will just happily complain if you recv the wrong source incremental; even though both the send and recv sides might have a common snapshot they could work from, they will never know, because they don't communicate with each other.
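For reference, the ghetto-lofiadm approach above boils down to something like the following (a rough, untested sketch; the host name, paths, pool name, and device names are all made up):

  # on the backup machine: share a directory over NFS
  share -F nfs -o rw /export/backing

  # on the master: create a backing file on the NFS share and turn it into
  # a block device with lofiadm (lofiadm prints the device it creates,
  # eg /dev/lofi/1)
  mkfile 40g /net/backuphost/export/backing/mirror.img
  lofiadm -a /net/backuphost/export/backing/mirror.img

  # attach the lofi device as an extra mirror of an existing pool device,
  # let it resilver (watch zpool status), then offline it again before the
  # NFS server goes away
  zpool attach tank c1t1d0 /dev/lofi/1
  zpool offline tank /dev/lofi/1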
Ideal setup: Ideally, this is what I want to have.

Put a couple of large HDs into a server and let ZFS administer the entire devices (so it can manage the write cache). Create a Mirror/RAIDx/hot-spare pool with as much as your budget allows (in my case, very little - probably a 2x160G or so mirror), and create file systems as needed. Put another large HD into another machine, and connect it via a dedicated network segment (which is a wise thing to do with any SAN stuff). Create a Mirror/RAIDx/hot-spare pool with whatever is left in your budget (in my case, I'll just take a 60G hard drive from another machine).

Now, from my perspective, I have no need or desire to let my 60G near-line the entire content of the 160G. And if I ended up lofiadm/iscsi'ing the 60GB and creating a pool with the 160G, I'd end up wasting 120GB of the 160s unless I slice off that 40GB "near-line mirror" and lose the write cache by doing so (correct me if I'm wrong here).

I don't need fail-over or anything. All I want is for what I consider important (ie, Documents, settings, etc.) off a portion of a pool to be replicated onto another machine, so that if a catastrophic failure happens where I lose both mirrors due to a PS frying, or ZFS bit-rots an important document during a save, I have access to a "recent" copy of the file on another machine.

There was a discussion on this forum recently (when I searched) that said that doing a ghetto-mirror is a lot "easier" than setting up a script-a-mirror, and it is. Since ZFS can see the content of all the devices in the pool, it can take whatever steps are necessary to get the mirror consistent and up to date. Now, the point was raised that doing a ghetto-mirror was not recommended because ZFS has no way to ensure that NFS/NIC/the target machine didn't corrupt the stream midway, which is why they recommended a ZFS-backed iSCSI/NFS share used as a ZFS device on another machine. For me, I just don't see the advantage in this. If NFS bit-flips something, then the backing ZFS store just wrote a bit-flipped stream of ZFS raw data, which won't help it one bit when it comes time to read it back; the ZFS mirror will just throw a checksum error and ignore that block. Doing the export-a-device seems no different from the results point of view (it'll still throw a checksum error), albeit with a lot more "potential" paths of error than having the device local, _if_ export-a-device can even actually be done.

So what's preventing ZFS from saying "the user wants to mirror xyz filesystem on another ZFS with a matching ZFS version"? Is that truly so different from a device mirror that it can't track changes made to that file system (not pool!) - be it a modification, snapshot creation, etc. - and send those changes over the wire, be it to another ZFS pool on the same machine or on another machine, every so often?

-- Starfox
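A concrete sketch of that layout might look like the following (the pool and device names here are hypothetical):

  # on the master: give ZFS whole disks so it can enable the write cache
  zpool create tank mirror c1t0d0 c1t1d0
  zfs create tank/documents
  zfs create tank/settings

  # on the backup machine: a small single-disk pool is enough to hold a
  # near-line copy of just the important filesystems
  zpool create backup c2t0d0
  zfs create backup/mirrorfs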
Starfox wrote:
> I don't need fail-over or anything. All I want is for what I consider
> important (ie, Documents, settings, etc.) off a portion of a pool to be
> replicated onto another machine, so that if a catastrophic failure happens
> where I lose both mirrors due to a PS frying, or ZFS bit-rots an important
> document during a save, I have access to a "recent" copy of the file on
> another machine.

filesync(1) was designed for this task.
 -- richard
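For anyone unfamiliar with it, a minimal filesync(1) invocation along those lines (the paths are made up, and the flags are from memory of the man page, so double-check them before relying on this) might be:

  # one-way sync of a documents subtree to an NFS-mounted backup area:
  # -s and -d name the source and destination base directories, -o src
  # forces one-way reconciliation from source to destination, and
  # "documents" is the subtree to keep in sync
  filesync -o src -s /tank -d /net/backuphost/backup documents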
Vic Engle wrote:
> Hi All,
>
> Just curious about how incremental send works. Is it changed blocks or
> files, and how are the changed blocks or files identified?

It's done at the DMU layer, based on blocks of objects. We use the block-pointer relationships (ie, the on-disk structure of files) to quickly find only the changed blocks.

--matt
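A rough way to see this behaviour from the command line (on a scratch dataset; the pool and file names here are made up) is to compare the size of a full stream with that of an incremental stream after a small change:

  # snapshot, change a small amount of data, snapshot again
  zfs snapshot tank/scratch@a
  dd if=/dev/urandom of=/tank/scratch/smallfile bs=128k count=8
  zfs snapshot tank/scratch@b

  # the full stream is roughly the size of the whole filesystem...
  zfs send tank/scratch@b | wc -c

  # ...while the incremental stream is roughly the size of the blocks that
  # changed between @a and @b
  zfs send -i tank/scratch@a tank/scratch@b | wc -c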
Starfox wrote:
> First time around, create a snapshot and send it to the remote:
>
>   zfs snapshot master/fs@mirror
>   zfs send master/fs@mirror | ssh mirror zfs recv backup/mirrorfs
>
> Once that's done, fs@mirror == mirrorfs, correct?

More accurately, master/fs@mirror == backup/mirrorfs@mirror

> So now I'm running the script manually and it's complaining that the
> incremental source doesn't match, and apparently there is no way to tell
> which fs@mirrorX is the source short of trial-and-error, even if I kept
> all my incremental snapshots. Is there a clean way around this?

I'd say either (a) have your script check to see if the send|recv was successful, or (b) have it check what snapshots are available at the other side, and start sending incrementals from there. Eg:

  # adjust the dataset name for whatever recv -d created on the far side
  lastsnap=`ssh remote zfs list -H -r -t snapshot -o name $fs | tail -1 | cut -d@ -f2`
  while :; do
      newsnap=...        # pick a new snapshot name (eg, a date stamp)
      zfs snapshot $fs@$newsnap
      zfs send -i $lastsnap $fs@$newsnap | ssh remote zfs recv -d pool/recvd || break
      lastsnap=$newsnap
  done

> Perhaps a better example scenario: I accidentally destroyed the last
> fs@mirrorX (so there is no source snapshot to give to -i). Would my only
> recourse be to bite the bullet and do a zfs send of fs@mirrorX+1 (aka a
> full send)?

No, if you have the old snaps on both sides, you can simply send an incremental from the last common snap (eg, zfs send -i mirrorX-1 fs@mirrorX+1).

> Another question: if I wanted to send a snapshot as a snapshot, I can do:
>
>   zfs snapshot master/fs@savesnap
>   zfs send master/fs@savesnap | ssh mirror zfs recv backup/mirrorfs@savesnap
>
> And now fs@savesnap == mirrorfs@savesnap, right? But that would involve
> sending the whole stream, which is however much data the filesystem was
> consuming at that point in time?

Yes. But this will create a new filesystem backup/mirrorfs on the receiving side. I'm not sure I understand your goal -- zfs send always sends a snapshot, whether incremental or full.

--matt
> Starfox wrote:
> > zfs snapshot master/fs@mirror
> > zfs send master/fs@mirror | ssh mirror zfs recv backup/mirrorfs
> >
> > Once that's done, fs@mirror == mirrorfs, correct?
>
> More accurately, master/fs@mirror == backup/mirrorfs@mirror

Okay, that makes a lot more sense now.

> I'd say either (a) have your script check to see if the send|recv was
> successful, or (b) have it check what snapshots are available at the
> other side, and start sending incrementals from there. Eg:
>
>   lastsnap=`ssh remote zfs list -H -r -t snapshot -o name $fs | tail -1 | cut -d@ -f2`
>   while :; do
>       newsnap=...        # pick a new snapshot name (eg, a date stamp)
>       zfs snapshot $fs@$newsnap
>       zfs send -i $lastsnap $fs@$newsnap | ssh remote zfs recv -d pool/recvd || break
>       lastsnap=$newsnap
>   done

None of the scripts that I looked at seemed to offer any sort of error recovery. I think I'll be able to use this as a starting point (and maybe the man pages could be updated to note that you can use any common snapshot as the send -i source; that fact is not obvious to those who are unfamiliar with the capabilities of ZFS).

> > Another question: if I wanted to send a snapshot as a snapshot, I can do:
> >
> >   zfs snapshot master/fs@savesnap
> >   zfs send master/fs@savesnap | ssh mirror zfs recv backup/mirrorfs@savesnap
>
> Yes. But this will create a new filesystem backup/mirrorfs on the
> receiving side. I'm not sure I understand your goal -- zfs send always
> sends a snapshot, whether incremental or full.

I guess I'll have to play around with it once I get the systems up and running. But my request to allow two ZFS boxes to sync "a" filesystem still stands.

-- Starfox
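Since the "any common snapshot works as the -i source" point isn't spelled out in the man page, a rough way for a script to discover the last common snapshot (untested; assumes date-stamped snapshot names so that zfs list order matches creation order, and reuses the dataset and host names from earlier in the thread) could be:

  common=""
  for s in `zfs list -H -r -t snapshot -o name master/fs | sed 's/.*@//'`; do
      # keep the latest local snapshot that also exists on the backup side
      if ssh mirror zfs list backup/mirrorfs@$s >/dev/null 2>&1; then
          common=$s
      fi
  done
  echo "last common snapshot: $common"
  # then: zfs send -i $common master/fs@<newsnap> | ssh mirror zfs recv backup/mirrorfs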
Starfox wrote:
> None of the scripts that I looked at seemed to offer any sort of error
> recovery. I think I'll be able to use this as a starting point (and maybe
> the man pages could be updated to note that you can use any common
> snapshot as the send -i source; that fact is not obvious to those who are
> unfamiliar with the capabilities of ZFS).

My experience of handling errors in zfs_backup is that, since the script duplicates the snapshots on the receiving file system, if it fails I simply run it again and it picks up where it left off. That said, it has never actually failed so far; I've interrupted it, or the system has crashed due to CR 6566921, but in both cases running the script again does just what it should, no more, no less.

--chris
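One way a wrapper can be made safe to re-run in that fashion (this is not Chris's actual script, just a sketch with made-up variable names, shown for a local destination pool as in his USB-drive setup) is to skip any snapshot the destination already has:

  # skip the send if the destination already has this snapshot
  if zfs list $destfs@$snap >/dev/null 2>&1; then
      echo "$snap already received, skipping"
  else
      zfs send -i $prev $srcfs@$snap | zfs recv -d $destpool
  fi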