Donald Pearson
2011-Jul-11 20:01 UTC
Feature request, or HowTo? Stateful resume of rsync transfer
I am looking to do stateful resume of rsync transfers.

My network environment is an unreliable and slow satellite infrastructure, and the files I need to send are approaching 10 GB in size. On this network, links often cannot be maintained for more than a few minutes at a time, and bandwidth is at a premium, which is why rsync was chosen as ideal for the job.

The problem I am encountering under these conditions is that the connection between the client and server drops due to network instability before rsync can transfer the entire file. Upon retry, rsync starts from the beginning, re-checking data that has already been sent and rebuilding the checksum list in its entirety every time. Eventually I reach an impasse where the frequency of link loss prevents rsync from ever getting any new data to the destination.

I've been reading through the various switches in the man page to try to find a combination that will work. My thinking was to use --partial on the first attempt and both --partial and --append on subsequent attempts. The idea was that rsync would build a new "partial" file and then resume building that file, assuming on each retry that the existing partial file, however large it may be, was assembled correctly and does not need to be re-checked. In practice, however, rsync does not work this way.

I did not find any other switches or methods that would let rsync literally pick up where it left off, without destroying the original destination file (so that its blocks can still be used to minimize transferred data) and without always starting again from block #1. The goal is for the aggregate of multiple rsync attempts to complete the transfer as a whole while putting no more data "on the wire" than if the file had been sent in a single rsync session.

If this is possible with rsync's current feature set, I would very much appreciate someone's time to reply with an example.

If it is not currently possible, an idea that comes to mind, and ultimately a feature request, would be a switch that tells rsync, upon session drop, to dump its checksum list and the last completed block to a file named by that switch. With a second switch, rsync could then be executed again and reference this dump file instead of rebuilding a new checksum list, using it to "restore previous state" and pick up where it left off instead of starting over from block #1.

Best regards,
Donald
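For concreteness, a minimal retry-loop sketch of the scheme described above, assuming a POSIX shell and rsync 3.x over ssh; the host and path names are hypothetical, and, as the post notes, in practice rsync did not treat the existing partial file as trusted:

    #!/bin/sh
    # Hypothetical names, for illustration only.
    SRC=/data/bigfile.img
    DEST=remotehost:/data/bigfile.img

    # First attempt: keep whatever arrives as a visible partial file.
    if ! rsync --partial "$SRC" "$DEST"; then
        # Retries: trust the partial data already received and append.
        until rsync --partial --append "$SRC" "$DEST"; do
            sleep 30   # let the flaky link settle before the next try
        done
    fi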
Eberhard Moenkeberg
2011-Jul-11 20:13 UTC
Feature request, or HowTo? Stateful resume of rsync transfer
Hi,

On Mon, 11 Jul 2011, Donald Pearson wrote:

> I am looking to do stateful resume of rsync transfers. [...]
> Upon retries, rsync starts from the beginning. Re-checking data that
> has already been sent, as well as re-building the checksum in its
> entirety every time.

In my experience, re-checking the already received "partial" blocks takes about 3 minutes for a 4 GB partial file.

Best regards
Eberhard Moenkeberg (emoenke at gwdg.de, em at kki.org)
Matthias Schniedermeyer
2011-Jul-12 07:52 UTC
Feature request, or HowTo? Stateful resume of rsync transfer
On 11.07.2011 16:01, Donald Pearson wrote:

> My thinking was to use a combination of --partial and --append. [...]
> However in practice rsync does not work in this way.

I think you didn't wait for the target rsync to complete. If a connection breaks, you have the two ends of the transfer left hanging, and the less visible target side is the important one here: that rsync has to "complete" before you make another try. Depending on how your connection drops, it MAY hang for some time. I don't remember whether rsync does "the right thing" if you just kill it, or whether you have to wait for it; in the latter case, "--timeout" sounds like it can be used to expedite matters.

Also, --inplace, with or without --append, reads like it is what you want, if you can live with its caveats.

So long

--
Real Programmers consider "what you see is what you get" to be just as bad a concept in Text Editors as it is in women. No, the Real Programmer wants a "you asked for it, you got it" text editor -- complicated, cryptic, powerful, unforgiving, dangerous.
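A minimal sketch of that suggestion, assuming a POSIX shell and rsync 3.x; the names and the timeout value are hypothetical:

    #!/bin/sh
    # --timeout=60: either end gives up after 60 seconds without I/O,
    #   so a dead link does not leave the receiving rsync hanging.
    # --inplace --append: write directly into the destination file and
    #   resume each retry by appending past its current length.
    until rsync --timeout=60 --inplace --append \
            /data/bigfile.img remotehost:/data/bigfile.img; do
        sleep 30   # wait out the link drop before retrying
    done

Note that --append trusts whatever data is already in the destination file; rsync 3.0.0 and later also offer --append-verify, which checksums the existing data before appending.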
Larry Irwin
2011-Jul-12 18:50 UTC
How to call rsync client so that it detects that the server has gone away?
Hi All,

I have an rsync client that is not re-starting and/or timing out when the server processes die or go away. The client is rsync version 3.0.4, protocol version 30, natively compiled on Debian 6, kernel 2.6.32.5-amd64. The server is a QNAP NAS device running rsync version 3.0.6, protocol version 30. (I can change the client version if I need to, but not the version on the QNAP device.)

I am trying to rsync ~700 GB to the QNAP device, hopefully once per day. There are ~40,000,000 files, many of them rsync-snapshot hard links. It takes a long time, so I'm using LVM2 snapshots to get static views of the data partitions. I need to preserve the links, or the destination size will multiply by a factor of 6 or so.

The command I am using on the client is:

    nice -n 19 rsync -ave "ssh -l ${LOGIN} -p ${PORT}" \
        --links --hard-links --perms --times --owner --group \
        --delete --stats \
        --exclude-from=${RSHOME}/cfg/rsyncbackup.exclude \
        --link-dest=${LINKDEST} \
        / ${LOGIN}@${IPADDRESS}:..${DEST} 2>&1 | gzip -9 > ${LOGNAME}

The client and the QNAP are connected on the LAN with 1 Gb NICs on gigabit switches.

When the server's rsync processes die (for an unknown reason at this point), I'd love for the client to recognize this, jump-start the server processes, and pick up where they left off; failing that, at least to quit and exit in a reasonable fashion. At this moment it is 2:45 PM: the rsync started at 9:00 AM, the server processes went away at 12:33 PM, and the client processes are still running.

Any ideas would be greatly appreciated.

Thanks,
Larry Irwin
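One possible mitigation, sketched under the assumption of an OpenSSH client and rsync 3.x; the keepalive and timeout values below are guesses, not something tested against a QNAP. It is the command above, abbreviated slightly (since -a already implies --links, --perms, --times, --owner, and --group), with two additions:

    # ServerAliveInterval/CountMax: ssh probes the peer every 30 s and
    #   tears down the connection after ~4 missed replies, which makes
    #   the local rsync exit instead of waiting forever.
    # --timeout=600: rsync itself aborts after 10 minutes without I/O.
    nice -n 19 rsync -av \
        -e "ssh -l ${LOGIN} -p ${PORT} -o ServerAliveInterval=30 -o ServerAliveCountMax=4" \
        --timeout=600 --hard-links --delete --stats \
        --exclude-from=${RSHOME}/cfg/rsyncbackup.exclude \
        --link-dest=${LINKDEST} \
        / ${LOGIN}@${IPADDRESS}:..${DEST} 2>&1 | gzip -9 > ${LOGNAME}

A wrapper script could then inspect rsync's exit status (30 indicates a timeout in data send/receive) and restart the transfer.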