ZFS + rsync, backup on steroids.

I was thinking today about backing up filesystems, and came up with an awesome idea: use the power of rsync and ZFS together.

Start with one or two large SATA/PATA drives. If you use two and don't need the space, you can mirror them; otherwise just use them as in RAID-0. Enable compression unless your files are mostly precompressed, and use rsync as the backup tool. The first time, you just copy the data over. After you are done, take a snapshot, export the pool, and uninstall the drives until next time. When next time rolls around, have rsync update the changed files; since it transfers only the changed blocks of each file, only a small part of the data needs to be copied. After that is done, take a snapshot.

Now, thanks to ZFS, you have complete access to incremental backups; just look at the desired snapshots. For now rsync doesn't support NFSv4 ACLs, but at least you have the data.

The best part of this solution is that it's completely free, uses tools that you are most likely already familiar with, and has features that are otherwise only available in commercial apps.
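A minimal sketch of one backup cycle as described above, assuming a pool named "backup" on two spare disks (c2t0d0/c2t1d0) and a source tree under /export/home; all names are placeholders:

    # first run: create the pool (drop "mirror" for a raid0-style config),
    # enable compression, copy, snapshot, export
    zpool create backup mirror c2t0d0 c2t1d0
    zfs set compression=on backup
    rsync -a /export/home/ /backup/home/
    zfs snapshot backup@20060829
    zpool export backup

    # later runs: re-attach the disks, update, snapshot, export again
    zpool import backup
    rsync -a --delete /export/home/ /backup/home/
    zfs snapshot backup@20060905
    zpool export backup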
On Aug 29, 2006, at 12:17 PM, James Dickens wrote:
> ZFS + rsync, backup on steroids.
> [...]
> Now, thanks to ZFS, you have complete access to incremental backups;
> just look at the desired snapshots.

Yes, I concur. This is how we do our backups: rsync + rolling over snapshots. For example,

# ls -l /backups-4/pacific/.zfs/snapshot/
total 105
drwxr-xr-x  12 root  sys  12 Feb 19  2006 20060620/
drwxr-xr-x  12 root  sys  12 Feb 19  2006 20060621/
drwxr-xr-x  12 root  sys  12 Feb 19  2006 20060622/
drwxr-xr-x  12 root  sys  12 Feb 19  2006 20060623/
drwxr-xr-x  12 root  sys  12 Feb 19  2006 20060624/
drwxr-xr-x  12 root  sys  12 Feb 19  2006 20060625/
drwxr-xr-x  12 root  sys  12 Feb 19  2006 20060626/
drwxr-xr-x  12 root  sys  12 Feb 19  2006 20060627/
drwxr-xr-x  12 root  sys  12 Feb 19  2006 20060628/
drwxr-xr-x  12 root  sys  12 Feb 19  2006 20060629/

[emi:/] root# zpool list
NAME       SIZE   USED   AVAIL  CAP  HEALTH  ALTROOT
backups4   3.81T  3.39T  434G   88%  ONLINE  -

Regards, J

Jason A. Hoffman, PhD | Founder, CTO, Joyent Inc.
Applications => http://joyent.com/
Hosting => http://textdrive.com/
Backups => http://strongspace.com/
Weblog => http://joyeur.com/
Email => jason at joyent.com or jason at textdrive.com
Mobile => (858)342-2179
On August 29, 2006 2:17:06 PM -0500 James Dickens <jamesd.wi at gmail.com> wrote:
> ZFS + rsync, backup on steroids.

Seems to me 'zfs send | zfs recv' would be both faster and more efficient.

-frank
On 8/29/06, Frank Cusack <fcusack at fcusack.com> wrote:
> On August 29, 2006 2:17:06 PM -0500 James Dickens <jamesd.wi at gmail.com> wrote:
> > ZFS + rsync, backup on steroids.
>
> Seems to me 'zfs send | zfs recv' would be both faster and more efficient.

Only if you assume the source is ZFS. With rsync and ZFS you could do Linux/Unix and maybe even Windows backups as well.

James
On August 29, 2006 3:17:21 PM -0500 James Dickens <jamesd.wi at gmail.com> wrote:
> Only if you assume the source is ZFS. With rsync and ZFS you could do
> Linux/Unix and maybe even Windows backups as well.

nice!

-frank
On Tue, Aug 29, 2006 at 02:17:06PM -0500, James Dickens wrote:
> ZFS + rsync, backup on steroids.

I've long thought that network filesystem protocols could implement a portion of the rsync algorithm, namely:

 - servers could compute rsync rolling CRC file checksums
 - ZFS could do it at the lowest layer, and might even cache this on-disk
 - servers could compute and send back to clients file diffs based on rsync file checksums sent by the client
 - clients could apply rsync diffs to server-side files efficiently if the server provided operations such as "read fh X, offset Y, length Z, write to some other fh at some other offset"

I would love to see extensions to NFSv4 and SFTP for efficient distributed data synchronization based on the rsync algorithm.

Additionally, since ZFS tracks differences between snapshots and filesystems, there's no reason that ZFS could not export block-level diffs for individual files (with the same dnode/generation numbers). (Finding differences like renames/links/unlinks is another story.)

Nico
On 30/08/2006, at 5:17 AM, James Dickens wrote:
> ZFS + rsync, backup on steroids.
> [...]
> The best part of this solution is that it's completely free, uses
> tools that you are most likely already familiar with, and has
> features that are otherwise only available in commercial apps.

I've been doing this for a while (although I don't remove the disks, just keep them on the other side of the network). I got the idea from the tool I was using before (http://www.rsnapshot.org/), which uses hard links to reduce the space usage at the destination.

You might like to consider the --inplace option to rsync, which should reduce the space usage for files that change in place, since rsync will then just write the changed blocks rather than making a copy and applying the changes. The latter results in all unchanged blocks of the file being duplicated (in snapshots) on ZFS.

Boyd
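A minimal illustration of the --inplace variant, using the same placeholder paths as the earlier sketch. Note that --no-whole-file matters when both source and destination are local paths, since rsync otherwise skips its delta algorithm and rewrites whole files:

    # delta-update files in place so snapshots keep only the changed blocks
    rsync -a --inplace --no-whole-file /export/home/ /backup/home/
    zfs snapshot backup@20060830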
On 8/29/06, James Dickens <jamesd.wi at gmail.com> wrote:
> ZFS + rsync, backup on steroids.

If you combine this with a de-duplication algorithm you could get really space-efficient backups. Suppose you have 100 (or 1000, or 10000) machines to back up that are the same 3 GB OS image + mixed bag of apps + various prod/non-prod copies of databases + per-machine customization. Wouldn't it be nice if the backup server would figure out that each machine is mostly the same and store one copy? Perhaps have it store a per-block checksum in a database, then look for matches by checksum (aka hash) each time a block is written. Hash collisions should be verified with a full block compare.

Then you could create your restore procedure as a CGI or similar web magic that generates a flar based upon the URL+args provided. That URL can then be used in a jumpstart profile as "archive_location http://backupserver.mycompany.com/flar/...". A finish script would be responsible for using rsync or similar to copy the sysidcfg-related files that jumpstart/flar refuses to preserve.

FWIW, de-duplication seems to be a hot topic in VTLs (Virtual Tape Libraries). This would be an awesome feature to have in ZFS, even if the de-duplication happens as a later pass similar to zfs scrub.

Mike

--
Mike Gerdts
http://mgerdts.blogspot.com/
Hello Jason,

Tuesday, August 29, 2006, 9:35:13 PM, you wrote:

JAH> Yes, I concur. This is how we do our backups: rsync + rolling over
JAH> snapshots.

Why not make snapshots on production and then send incremental backups over the net? Especially with a lot of files it should be MUCH faster than rsync.

--
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                http://milek.blogspot.com
On 8/30/06, Robert Milkowski <rmilkowski at task.gda.pl> wrote:
> Why not make snapshots on production and then send incremental
> backups over the net? Especially with a lot of files it should be MUCH
> faster than rsync.

Because it's a ZFS-limited solution; if the source is not ZFS it won't work. And I'm not sure how much faster incrementals would be than rsync, since rsync only exchanges checksums until it finds a block that has changed.

James
James Dickens wrote:
>> Why not make snapshots on production and then send incremental
>> backups over the net? Especially with a lot of files it should be MUCH
>> faster than rsync.
>
> Because it's a ZFS-limited solution; if the source is not ZFS it won't
> work. And I'm not sure how much faster incrementals would be than
> rsync, since rsync only exchanges checksums until it finds a block that
> has changed.

'zfs send' is *incredibly* faster than rsync.

rsync needs to traverse all the metadata, so it is fundamentally O(all metadata). It needs to read every directory and stat every file to figure out what's been changed. Then it needs to read all of every changed file to figure out what parts of it have been changed.

In contrast, 'zfs send' essentially only needs to read the changed data, so it is O(changed data). We can do this by leveraging our knowledge of the ZFS internal structure, eg. block birth times.

That said, there is still a bunch of low-hanging performance fruit in 'zfs send', which I'll be working on over the next few months. And of course if you need a cross-filesystem tool then 'zfs send' is not for you. But give it a try if you can, and let us know how it works for you!

--matt
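A minimal sketch of the incremental send/receive cycle described above; pool, filesystem, snapshot, and host names are placeholders:

    # initial full replication to the backup host
    zfs snapshot pool/fs@monday
    zfs send pool/fs@monday | ssh backuphost zfs recv -d backuppool

    # later: send only the blocks changed since the previous snapshot
    zfs snapshot pool/fs@tuesday
    zfs send -i pool/fs@monday pool/fs@tuesday | ssh backuphost zfs recv -d backuppool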
On 30/08/06, Matthew Ahrens <Matthew.Ahrens at sun.com> wrote:
> 'zfs send' is *incredibly* faster than rsync.

That's interesting. We had considered it as a replacement for a certain task (publishing a master docroot to multiple webservers), but a quick test with ~500MB of data showed the zfs send/recv to be about 5x slower than rsync for the initial copy.

You're saying subsequent copies (zfs send -i?) should be faster?

--
Rasputin :: Jack of All Trades - Master of Nuns
http://number9.hellooperator.net/
Dick Davies wrote:
> That's interesting. We had considered it as a replacement for a
> certain task (publishing a master docroot to multiple webservers),
> but a quick test with ~500MB of data showed the zfs send/recv
> to be about 5x slower than rsync for the initial copy.
>
> You're saying subsequent copies (zfs send -i?) should be faster?

Yes. The architectural benefits of 'zfs send' over rsync only apply to sending incremental changes. When sending a full backup, both schemes have to traverse all the metadata and send all the data, so they *should* be about the same speed.

However, as I mentioned, there are still some low-hanging performance issues with 'zfs send', although I'm surprised that it was 5x slower than rsync! I'd like to look into that issue some more... What type of files were you sending? Eg. approximately what size files, how many files, how many files per directory?

--matt
On Wed, Aug 30, 2006 at 07:51:45PM +0100, Dick Davies wrote:
> You're saying subsequent copies (zfs send -i?) should be faster?

Yes and no. It depends on the nature of the changes that have happened. Re-writing the same file contents would result in the whole file's contents appearing in the incremental ZFS backup, but the rsync overhead for synchronizing the same file would be minimal (just the size of the rsync checksums, which is proportional to the file size, but much smaller).

ZFS tracks changes transactionally and block-wise, whereas rsync detects changes and produces remove+insert deltas. So ZFS snapshots/'zfs send -i' and rsync are very different things; complementary things perhaps, but different.

Nico
On 30/08/06, Matthew Ahrens <Matthew.Ahrens at sun.com> wrote:
> Yes. The architectural benefits of 'zfs send' over rsync only apply to
> sending incremental changes. When sending a full backup, both schemes
> have to traverse all the metadata and send all the data, so they *should*
> be about the same speed.

Cool! I'll retry it then.

> However, as I mentioned, there are still some low-hanging performance
> issues with 'zfs send', although I'm surprised that it was 5x slower
> than rsync! I'd like to look into that issue some more... What type of
> files were you sending?

It was a copy of /usr/ports from FreeBSD, so around 500MB of small text files. Bear in mind I'm talking from memory, and it was just a quick test. I'll retry and let you know if I see a similar problem; if you don't hear anything, I couldn't replicate it.

Thanks!

--
Rasputin :: Jack of All Trades - Master of Nuns
http://number9.hellooperator.net/
Dick Davies wrote:
> That's interesting. We had considered it as a replacement for a
> certain task (publishing a master docroot to multiple webservers),
> but a quick test with ~500MB of data showed the zfs send/recv
> to be about 5x slower than rsync for the initial copy.
>
> You're saying subsequent copies (zfs send -i?) should be faster?

There's definitely something anomalous going on if you are seeing 'zfs send|recv' being 5x slower than rsync. I just did some quick tests on some mediocre machines and send|recv is much faster than rsync. So it would be great if you could describe the setup where you are seeing it be 5x slower, so that we can try to diagnose it.

I used two different source filesystems:
  "workspace" has 98,384 files totaling 1.86GB
  "big files" has 2 files totaling 2.13GB

I ran zfs send|recv as:
  ptime zfs send pool/fs@snap | \
      ssh -c blowfish hostname zfs recv -dv pool/recvd

I ran rsync as:
  ptime rsync -a -e "ssh -c blowfish" /pool/fs hostname:/pool/rsync/fs

And the results:

                          zfs send|recv      rsync
  full workspace:         220s (8.6MB/s)     312s (6.1MB/s)
  incremental workspace:  <1s                51s
  full big files:         207s (10.6MB/s)    204s (10.7MB/s)
  incremental big files:  <1s                <1s

So as you can see, we are somewhat constrained by the 100mbit/sec link between these machines, but on the workspace, the full backup is 40% faster with send|recv, and the incremental > 50x faster.

To eliminate the network, I tried it between two pools on the same machine (without using ssh):

                          zfs send|recv      rsync
  full workspace:         133s (14.9MB/s)    339s (5.6MB/s)
  incremental workspace:  <1s                56s
  full big files:         74s (29.6MB/s)     70s (31.1MB/s)

Here, the full workspace backup was 2.5x faster with send|recv, the incremental >50x faster, and the full big files 5% slower.

--matt
> > Why not make snapshots on production and then send incremental
> > backups over the net? Especially with a lot of files it should be MUCH
> > faster than rsync.
>
> Because it's a ZFS-limited solution; if the source is not ZFS it won't
> work. And I'm not sure how much faster incrementals would be than
> rsync, since rsync only exchanges checksums until it finds a block that
> has changed.

At small sizes, everything may be fine. But here are two things I would watch out for when doing this.

#1 Rsync can run into problems on very large (# of files) filesystems. I've used rsync to copy some pretty big datasets. I had some where rsync would take a couple of hours before even starting to send data, because it had to walk the entire filesystem first. (This has nothing to do with ZFS.)

#2 Rsync tries to minimize the network transport, not local I/O. If you have small files, this isn't a problem. If you have large DB files, it might be. While rsync can detect that only a small number of blocks have changed, it will attempt to update that file atomically on the target machine. It does this by using a temporary file and renaming it after completing any updates. ZFS will see this as a completely new file, losing potential space savings from snapshots.

--
Darren Dunham                                           ddunham at taos.com
Senior Technical Consultant         TAOS            http://www.taos.com/
Got some Dr Pepper?                           San Francisco, CA bay area
         < This line left intentionally blank to confuse you. >
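One way to see whether such whole-file rewrites are inflating the backups is to check the USED column of the snapshot listing after each run, which shows the space held only by that snapshot; rsync's --inplace option, mentioned earlier in the thread, avoids the temporary-file-plus-rename step and keeps that figure close to the amount of data that really changed:

    # space pinned by each backup snapshot (pool name is a placeholder)
    zfs list -t snapshot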
Richard L. Hamilton
2006-Aug-31 06:17 UTC
[zfs-discuss] Re: ZFS + rsync, backup on steroids.
Are both of you doing a umount/mount (or export/import, I guess) of the source filesystem before both the first and second test? Otherwise, there might still be a fair bit of cached data left over from the first test, which would give the 2nd an unfair advantage. I'm fairly sure unmounting a filesystem invalidates all cached pages associated with files on that filesystem, as well as any cached [iv]node entries, all of which is needed to ensure both tests are starting from the most similar situation possible. Ideally, all this would even be done in single-user mode, so that nothing else could interfere.

If there were a list of precautions to take that would put comparisons like this on firmer ground, it might provide a good starting point for such comparisons to be more than anecdotes, saving time for all concerned: both those attempting to replicate a prior casual observation for reporting, and those looking at the report.
Hello Richard,

Thursday, August 31, 2006, 8:17:41 AM, you wrote:

RLH> Are both of you doing a umount/mount (or export/import, I guess) of the
RLH> source filesystem before both the first and second test? Otherwise, there might
RLH> still be a fair bit of cached data left over from the first test, which would
RLH> give the 2nd an unfair advantage.

IIRC unmounting a ZFS filesystem won't flush its caches; you've got to export the entire pool.

--
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                http://milek.blogspot.com
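A sketch of the benchmark precaution under discussion, with placeholder names; exporting and re-importing the pool between runs discards its cached data so each timed run starts cold:

    zpool export testpool
    zpool import testpool
    ptime zfs send testpool/fs@snap > /dev/null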
Robert Milkowski wrote:
> IIRC unmounting a ZFS filesystem won't flush its caches; you've got to
> export the entire pool.

That's correct. And I did ensure that the data was not cached before each of my tests.

--matt
Matthew Ahrens writes:
> Robert Milkowski wrote:
> > IIRC unmounting a ZFS filesystem won't flush its caches; you've got to
> > export the entire pool.
>
> That's correct. And I did ensure that the data was not cached before
> each of my tests.

Matt?

It seems to me that (at least in the past) unmount would actually cause the data to not be accessible (a read would issue an I/O), even if potentially the memory associated with previously cached data was not quite reaped back to the OS.

I'm currently going on: umount to clear the cache, export to free up the memory. Does this sound correct?

-r
Roch wrote:
> It seems to me that (at least in the past) unmount would actually cause
> the data to not be accessible (a read would issue an I/O), even if
> potentially the memory associated with previously cached data was not
> quite reaped back to the OS.

Looks like you're right, we do (mostly) evict the data when a filesystem is unmounted. The exception is if some of its cached data is being shared with another filesystem (eg, via a clone fs); then that data will not be evicted.

--matt
I'm working on replication of ZFS, using a Perl script and SSH access with an authorized key.

My script automatically creates a new snapshot for each ZFS filesystem. It's too slow to send the snapshots to the remote server. Not because of the size of the snapshots; it's because SSH with an authorized key takes several seconds to complete each connection.

Anybody have suggestions? Thank you in advance.
hi there,

On Sun, 2006-09-10 at 23:49 -0700, Bui Minh Truong wrote:
> I'm working on replication of ZFS, using a Perl script
> and SSH access with an authorized key.

Cool - I did exactly the same thing last week, adding send/receive functionality to the SMF service I had been playing with. More at
http://blogs.sun.com/timf/entry/zfs_automatic_snapshots_now_with

I've used ssh there to send/receive between servers with no problems.

> My script automatically creates a new snapshot for each ZFS filesystem.
> It's too slow to send the snapshots to the remote server. Not because of
> the size of the snapshots; it's because SSH with an authorized key takes
> several seconds to complete each connection.

Can you give more details: what's the ssh machine you're logging into (OS and version of ssh, and perhaps the amount of encryption you're doing)? How fast is the network between the two machines? Does "ssh -v" tell you any more?

cheers,
tim
--
Tim Foster, Sun Microsystems Inc, Solaris Engineering Ops
http://blogs.sun.com/timf
Bui Minh Truong
2006-Sep-11 13:39 UTC
[zfs-discuss] Re: Re: ZFS + rsync, backup on steroids.
Hi, thanks for your reply.

> Can you give more details: what's the ssh machine you're logging into
> (OS and version of ssh, and perhaps the amount of encryption you're
> doing)?
Two of my machines are T2000s (Sun servers) running SPARC Solaris.

> How fast is the network between the two machines?
The network is good too: a broadband LAN.

> Does "ssh -v" tell you any more?
I don't think the problem is ZFS send/recv. I think it takes a lot of time to connect over SSH. I tried to access SSH by typing: ssh remote_machine. It also takes several seconds (half a second to a second) to connect, maybe because of Solaris SSH. If you have 1000 files, it may take 1000 x 0.5 = 500 seconds.

So I gave up on that solution. I wrote two pieces of Perl script, a client and a server. Their roles are similar to ssh and sshd, but I can connect faster.

Do you have any suggestions?
On Mon, 2006-09-11 at 06:39 -0700, Bui Minh Truong wrote:
> Do you have any suggestions?

Yeah, I think we need more information to debug this. I'm seeing:

timf at haiiro[508] ptime ssh usuki hostname
usuki

real        0.600
user        0.065
sys         0.013

Oh, and yeah - what James said :-)

cheers,
tim
--
Tim Foster, Sun Microsystems Inc, Solaris Engineering Ops
http://blogs.sun.com/timf
James C. McPherson
2006-Sep-11 13:51 UTC
[zfs-discuss] Re: Re: ZFS + rsync, backup on steroids.
Bui Minh Truong wrote:
>> Can you give more details: what's the ssh machine you're logging into
>> (OS and version of ssh, and perhaps the amount of encryption you're
>> doing)?
> Two of my machines are T2000s (Sun servers) running SPARC Solaris.
>
>> How fast is the network between the two machines?
> The network is good too: a broadband LAN.
>
>> Does "ssh -v" tell you any more?
> I don't think the problem is ZFS send/recv. I think it takes a lot of
> time to connect over SSH. I tried to access SSH by typing: ssh
> remote_machine. It also takes several seconds (half a second to a
> second) to connect, maybe because of Solaris SSH. If you have 1000
> files, it may take 1000 x 0.5 = 500 seconds.

Not necessarily the correct conclusion to make.

> So I gave up on that solution. I wrote two pieces of Perl script, a
> client and a server. Their roles are similar to ssh and sshd, but I
> can connect faster.

Are your perl scripts doing name lookups? Is your DNS server external to either of your client / server machines?

> Do you have any suggestions?

Yes, get some real statistics which actually measure what is going on with your system. I suggest you make use of Brendan Gregg's DTrace Toolkit (http://www.brendangregg.com/dtrace.html#DTraceToolkit).

James C. McPherson
A tool like 'hardlink' will only work for a read-only repository, or one in which files can never be overwritten, only replaced. For true deduplication you really want the underlying file system to have support for 'breaking' the hard link when one file is changed; basically, copy-on-write semantics.
Nicolas Williams
2006-Sep-11 15:28 UTC
[zfs-discuss] Re: Re: ZFS + rsync, backup on steroids.
On Mon, Sep 11, 2006 at 06:39:28AM -0700, Bui Minh Truong wrote:
> > Does "ssh -v" tell you any more?
> I don't think the problem is ZFS send/recv. I think it takes a lot of
> time to connect over SSH. I tried to access SSH by typing: ssh
> remote_machine. It also takes several seconds (half a second to a
> second) to connect, maybe because of Solaris SSH. If you have 1000
> files, it may take 1000 x 0.5 = 500 seconds.

You're not making an SSH connection for every file, though; you're making an SSH connection for every snapshot.

Now, if you're taking snapshots every second, and each SSH connection takes on the order of .5 seconds, then you might have a problem.

> So I gave up on that solution. I wrote two pieces of Perl script, a
> client and a server. Their roles are similar to ssh and sshd, but I
> can connect faster.

But is that secure?

> Do you have any suggestions?

Yes.

First, let's see if SSH connection establishment latency is a real problem.

Second, you could adapt your Perl scripts to work over a persistent SSH connection, e.g., by using SSH port forwarding:

% ssh -N -L 12345:localhost:56789 remote-host

Now you have a persistent SSH connection to remote-host that forwards connections to localhost:12345 to port 56789 on remote-host. So now you can use your Perl scripts more securely.

Nico
On 12/09/2006, at 1:28 AM, Nicolas Williams wrote:
> Second, you could adapt your Perl scripts to work over a persistent SSH
> connection, e.g., by using SSH port forwarding:
>
> % ssh -N -L 12345:localhost:56789 remote-host
>
> Now you have a persistent SSH connection to remote-host that forwards
> connections to localhost:12345 to port 56789 on remote-host. So now you
> can use your Perl scripts more securely.

It would be *so* nice if we could get some of the OpenSSH behaviour in this area. Recent versions include the ability to open a persistent connection and then automatically re-use it for subsequent connections to the same host/user.
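The OpenSSH feature being referred to is connection multiplexing (ControlMaster/ControlPath), available in recent OpenSSH clients but not in the SSH bundled with Solaris at the time. A minimal sketch of a client-side setup, with a placeholder host name:

    # ~/.ssh/config (OpenSSH client)
    Host backuphost
        ControlMaster auto
        ControlPath ~/.ssh/master-%r@%h:%p

    # the first connection becomes the master; later ssh invocations to the
    # same host reuse its channel and skip the connection setup cost
    ssh -N -f backuphost
    zfs send pool/fs@snap | ssh backuphost zfs recv -d backuppool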
Bui Minh Truong
2006-Sep-12 10:36 UTC
[zfs-discuss] Re: Re: Re: ZFS + rsync, backup on steroids.
Thank you all for your advice. In the end, I chose to write the two scripts (client and server) to work over port forwarding via SSH, for security reasons.
Nicolas Williams
2006-Sep-12 14:29 UTC
[zfs-discuss] Re: Re: ZFS + rsync, backup on steroids.
On Tue, Sep 12, 2006 at 05:57:33PM +1000, Boyd Adamson wrote:
> It would be *so* nice if we could get some of the OpenSSH behaviour
> in this area. Recent versions include the ability to open a
> persistent connection and then automatically re-use it for subsequent
> connections to the same host/user.

There's an RFE for this.