Donald Pearson
2011-Jul-11  20:01 UTC
Feature request, or HowTo? State-full resume rsync transfer
I am looking to do a stateful resume of rsync transfers.

My network environment is an unreliable and slow satellite infrastructure, and the files I need to send are approaching 10 GB in size. In this environment links often cannot be maintained for more than a few minutes at a time, and bandwidth is at a premium, which is why rsync was chosen as ideal for the job.

The problem I am encountering is that the connection between the client and server drops due to network instability before rsync can transfer the entire file.

Upon retries, rsync starts from the beginning, re-checking data that has already been sent and rebuilding the checksum in its entirety every time. Eventually I reach an impasse where the frequency of link loss prevents rsync from ever getting any new data to the destination.

I've been reading through the switches in the man page to try to find a combination that will work. My thinking was to use --partial on the first attempt and both --partial and --append on subsequent attempts. The idea was that rsync would build a new "partial" file and, on each retry, resume building it on the assumption that the existing partial file, however large, was assembled correctly and does not need to be checked.

However, in practice rsync does not work this way. I did not find any other switches or methods that would let rsync literally pick up where it left off without destroying the original destination file, so that its blocks can be used to minimize transferred data and rsync need not always start from block #1. The goal is for the aggregate of multiple rsync attempts to complete the transfer as a whole while keeping the amount of data "on the wire" as small as if the file had been sent in a single rsync session.
If this is possible with rsync's current feature set, I would be very appreciative of someone's time to reply with an example.

Or, if this is not currently possible, an idea that comes to mind, and ultimately a feature request, would be a switch that tells rsync, upon session drop, to dump its checksum list and the last completed block to a file name given with the switch. With a second switch, rsync could then be run again and reference this dump file instead of rebuilding the checksum list, picking up where it left off ("restore previous state") instead of starting over from block #1.

Best regards,
Donald
Eberhard Moenkeberg
2011-Jul-11  20:13 UTC
Feature request, or HowTo? State-full resume rsync transfer
Hi,

On Mon, 11 Jul 2011, Donald Pearson wrote:

> Upon retries, rsync starts from the beginning. Re-checking data that has
> already been sent, as well as re-building the checksum in its entirety
> every time. Eventually I reach an impasse where the frequency of link loss
> prevents rsync from ever getting any new data to the destination.

In my experience, re-checking the already received "partial" blocks takes about 3 minutes for a 4 GB partial file.

Best regards,
Eberhard Moenkeberg (emoenke at gwdg.de, em at kki.org)
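To put that figure next to the file size in the original question, here is a rough extrapolation. It assumes the re-check time grows linearly with the size of the partial file and that the rate reported above (about 180 seconds per 4 GB) carries over to other hardware, neither of which is confirmed in this thread. The function name is only illustrative.

```shell
#!/bin/sh
# Assumption: re-check time scales linearly with the partial file's size,
# at the reported rate of about 180 seconds per 4 GB.
recheck_seconds() {
    size_gb=$1
    echo $(( size_gb * 180 / 4 ))
}

recheck_seconds 10   # prints 450 (about 7.5 minutes)
```

Under those assumptions, a nearly complete 10 GB partial file would need several minutes of re-checking per attempt, which is in the same range as the link lifetime described in the original message.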
Matthias Schniedermeyer
2011-Jul-12  07:52 UTC
Feature request, or HowTo? State-full resume rsync transfer
On 11.07.2011 16:01, Donald Pearson wrote:

> My thinking was to use a combination of
> --partial and --append. With the first attempt using the --partial switch,
> and subsequent attempts using both --partial and --append. The idea being
> rsync would build a new "partial" file, and be able to resume building that
> file while making the assumption upon subsequent retries that the existing
> partial file, however large it may be, was assembled correctly and does not
> need to be checked.
>
> However in practice rsync does not work in this way.

I think you didn't wait for the target rsync to complete. If a connection breaks, you have two parts left hanging. The less visible target side is the important one here: that rsync has to "complete" before you make another attempt. Depending on how your connection drops, it MAY hang for some time. I don't remember whether rsync does "the right thing" if you just kill it, or whether you have to wait for it. In the latter case, "--timeout" sounds like it can be used to expedite matters.

Also, --inplace, with or without --append, reads like it is what you want, if you can live with its caveats.

See you
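A minimal sketch of how repeated attempts could be automated, for anyone wanting to try the switches mentioned in this message. The `retry` function and the `RETRY_DELAY` variable are hypothetical helpers, not part of rsync, and the thread does not confirm which combination of --partial, --inplace, --append and --timeout actually gives the desired resume behaviour.

```shell
#!/bin/sh
# Hypothetical helper: run a command until it exits 0 or the attempt
# limit is reached. RETRY_DELAY (seconds between attempts) is an
# illustrative variable, not an rsync setting.
retry() {
    max_tries=$1
    shift
    try=1
    while :; do
        if "$@"; then
            return 0
        fi
        echo "attempt $try of $max_tries failed" >&2
        if [ "$try" -ge "$max_tries" ]; then
            return 1
        fi
        try=$((try + 1))
        sleep "${RETRY_DELAY:-30}"
    done
}

# Example call (SRC and DEST are placeholders; the flag choice is an
# untested assumption based on the suggestions in this thread):
#   retry 100 rsync --partial --timeout=60 "$SRC" "$DEST"
```

The delay between attempts is there to give the receiving-side rsync time to finish or time out before the next attempt starts, which is the concern raised above.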
Larry Irwin
2011-Jul-12  18:50 UTC
How to call rsync client so that it detects that server has gone away?
Hi All,
I have an rsync client that is not re-starting and/or timing out when 
the server processes die/go away...
The client is rsync version 3.0.4, protocol version 30, natively 
compiled on Debian 6, kernel 2.6.32.5-amd64.
The server is a QNAP NAS device running rsync version 3.0.6 protocol 
version 30.
(I can change the client version if I need to, but not the version on 
the QNAP device)
I am trying to rsync ~700 GB to the QNAP device, hopefully once per day.
There are ~40,000,000 files, many of them rsync-snapshot hard-links.
It takes a long time, so I'm using LVM2 snapshots to get static views of 
the data partitions.
I need to preserve the links or the destination size will multiply by a 
factor of 6 or so...
The command I am using on the client is:
nice -n 19 rsync -ave "ssh -l ${LOGIN} -p ${PORT}" --links --hard-links \
    --perms --times --owner --group --delete --stats \
    --exclude-from=${RSHOME}/cfg/rsyncbackup.exclude --link-dest=${LINKDEST} \
    / ${LOGIN}@${IPADDRESS}:..${DEST} 2>&1 | gzip -9 > ${LOGNAME}
The client and the QNAP are connected on the LAN with 1 Gb NICs on Gb 
switches.
When/if the server's rsync processes die (for an unknown reason at this 
point), I'd love for the client to recognize this fact and jump-start 
the server processes to pick up where they left off.
If not that, at least to quit and exit in a reasonable fashion...
At this moment, it is 2:45 PM. The rsync started at 9:00 AM, the server 
processes went away at 12:33 PM, and the client processes are still running...
Any ideas would be greatly appreciated.
Thanks,
Larry Irwin
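The command in this message sets neither rsync's --timeout nor any SSH keepalive options, so one thing that could be tried is adding both. The sketch below is an assumption about what might help, not a confirmed fix for this hang. `ServerAliveInterval` and `ServerAliveCountMax` are standard OpenSSH client options and --timeout is an existing rsync option; the `ssh_transport` function name and all numeric values are illustrative.

```shell
#!/bin/sh
# Build the remote-shell string for rsync's -e option with SSH keepalives.
# With these illustrative values the ssh client gives up after roughly
# 3 unanswered probes sent 60 seconds apart.
ssh_transport() {
    login=$1
    port=$2
    echo "ssh -l $login -p $port -o ServerAliveInterval=60 -o ServerAliveCountMax=3"
}

# Example (LOGIN and PORT as in the command above; 600 is an untuned guess):
#   rsync -av --timeout=600 -e "$(ssh_transport "$LOGIN" "$PORT")" ...
```

Because the original command pipes rsync into gzip, the pipeline's exit status in plain sh is gzip's, not rsync's, so a wrapper that wants to react to an rsync failure would need to capture rsync's own status separately.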