thr3ads.net - rsync - disconnected synchronization (mostly unidirectional) [Mar 2007]


Konrad Karl

2007-Mar-10 19:43 UTC

disconnected synchronization (mostly unidirectional)

Hi List,

please consider the following scenario:
(and please forgive me if I have not googled enough, but I was unable to
find an app that does exactly what I want)

Machine A (office) is where most file changes/downloads etc. happen,
                   and it has limited internet access (only proxy possible).
 
Machine B (home) has low bandwidth, is NATed etc.

So I would like to use a USB hard drive as the transport medium.

First let's assume A and B have been brought into sync somehow.

Afterwards I could do a disconnected rsync operation as follows:
(assuming transfer A -> B)

1.) get B's hard disk contents metadata  - perhaps the contents of
   rsync's file list can be stored onto the transportation medium.

2.) (in the office) let rsync generate its source file list from A,
    but instead of connecting to B to get the destination's file
    list, load B's list from the USB drive.

3.) store the files which need to be transferred and all the
    other information (what has to be deleted, file attrs etc.)
    in a shadow directory on the USB drive.

4.) drive home :-)

5.) again using rsync, copy the files to their final destination
    and do all the other things required using the info stored
    on the USB drive.

6.) result: B should be equal to A 
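
Step 1 above (capturing B's metadata on the transport medium) could be
sketched roughly as follows.  This is only an illustration, not something
rsync provides: the function name write_manifest and the one-line-per-file
"size mtime_ns path" format are assumptions made for the example, and file
names containing newlines are not handled.

```python
import os
import stat


def write_manifest(root, manifest_path):
    """Write one 'size mtime_ns relative/path' line per regular file under root.

    Returns the number of files recorded.
    """
    count = 0
    with open(manifest_path, "w", encoding="utf-8", errors="surrogateescape") as out:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # make the traversal order deterministic
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                st = os.lstat(full)
                if not stat.S_ISREG(st.st_mode):
                    continue  # skip symlinks, sockets, devices, ...
                rel = os.path.relpath(full, root)
                out.write(f"{st.st_size} {st.st_mtime_ns} {rel}\n")
                count += 1
    return count
```

Run on machine B with the manifest written to the USB drive, this gives
machine A a list to compare against without any network connection.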

For a first approach it would be fine to not use the
"rsync" algorithm (transfer file differences only) because
implementing this will perhaps require considerable work and
hard drives are cheap nowadays.

What do you think?

Regards,
Konrad

Paul Slootman

2007-Mar-10 20:00 UTC

On Sat 10 Mar 2007, Konrad Karl wrote:
> please consider the following scenario:
> (and please forgive me if I have not googled enough, but I was unable to
> find an app that does exactly what I want)
It sounds like unison may be what you want, although using a USB stick
as an intermediary transport is something that may not fit into unison's
scheme of things.


Paul Slootman

Wayne Davison

2007-Mar-11 14:41 UTC

On Sat, Mar 10, 2007 at 08:43:01PM +0100, Konrad Karl wrote:
> machine A (office) is, where most file changes/downloads etc happen.
>                    and it has limited internet access (only proxy possible)
>  
> Machine B (home) has low bandwidth, is NATed etc.
To copy data from A -> B using a USB drive as the data transport, rsync
supports batch-file creation.  However, rsync doesn't support a fake file
list.  So, to use batch files, you have two options:

1) Actually connect from A to B, but use the --only-write-batch=FILE
option.  This will only use the connection to transfer the file list and
receive checksums for changed files from the remote host.  (You can turn
off the checksum sending via --whole-file, which makes the batch contain
each changed file's whole data, not just the differences, to reduce the
data sent over the wire.)

2) Have a copy of your home data somewhere in your office setup, and use
rsync with the --write-batch=FILE option to update that office copy.
The batch it creates will contain all the necessary changes to update
the home files as long as the home system and the extra office copy were
identical at the start of the rsync.

In both cases you'd put the batch file that rsync created onto your USB
memory, and then run rsync --read-batch=FILE using that file when you
got home to do the actual update of machine B.
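
The two options and the final apply step can be written out as command
lines.  The sketch below only assembles and prints them; nothing is
executed.  The batch options are the ones named above, while the -a flag
and all paths and host names are assumptions made for illustration.

```python
import shlex


def batch_commands(src, office_copy, remote_dest, home_dest, batch):
    """Return the rsync invocations as argument lists (nothing is run)."""
    return {
        # Option 1: talk to B, but write the changes to a batch file only.
        "option_1": ["rsync", "-a", "--whole-file",
                     f"--only-write-batch={batch}", src, remote_dest],
        # Option 2: update a local stand-in for B and record the batch.
        "option_2": ["rsync", "-a", f"--write-batch={batch}", src, office_copy],
        # At home: replay the batch against the real destination.
        "apply_at_home": ["rsync", "-a", f"--read-batch={batch}", home_dest],
    }


# Hypothetical paths; adjust to the real layout.
for name, argv in batch_commands("/data/", "/office/home-copy/", "home:/data/",
                                 "/data/", "/mnt/usb/sync.batch").items():
    print(name + ":", shlex.join(argv))
```

Note that the read step names only the destination, since the batch file
stands in for the source.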

..wayne..

Wayne Davison

2007-Mar-12 18:21 UTC

On Mon, Mar 12, 2007 at 11:12:58AM +0100, Konrad Karl wrote:
> Given this capability together with the batch mode it should be
> possible to do what I want.
Not really, because rsync still needs to ask for the checksums to see
what has changed.  If you're fine sending whole files, it would be easy
to code something up in perl that just compared size+mtime to a list and
copied each whole file somewhere.  For instance, there's a perl script
in the support dir, file-attr-restore, that uses a "find ... -ls" file
to restore attributes in a hierarchy.  That could be adapted to do what
you want, especially if the find output was customized to output the
modified time value in a full-resolution format:

    find . -printf '%s %T@ %p\n'
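
A size+mtime comparison against such a listing might look like the
following.  It is written in Python rather than perl and is only a rough
sketch: the function names are made up for this example, and it assumes
the listing was produced by the find command above on the other machine,
with no newlines in file names.

```python
import os


def parse_find_line(line):
    """Parse one '%s %T@ %p' line into (path, size, mtime in nanoseconds)."""
    size, mtime, path = line.rstrip("\n").split(" ", 2)
    secs, _, frac = mtime.partition(".")
    # Pad or cut the fraction to 9 digits so no float rounding is involved.
    mtime_ns = int(secs) * 1_000_000_000 + int((frac + "000000000")[:9])
    return os.path.normpath(path), int(size), mtime_ns


def compare_with_listing(listing_lines, root):
    """Return (files to copy whole, paths to delete on the other side)."""
    remote = {}
    for line in listing_lines:
        if line.strip():
            path, size, mtime_ns = parse_find_line(line)
            remote[path] = (size, mtime_ns)

    to_copy = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.normpath(os.path.relpath(full, root))
            st = os.lstat(full)
            # New file, or size/mtime differs from the listed values.
            if remote.get(rel) != (st.st_size, st.st_mtime_ns):
                to_copy.append(rel)

    # Listed on the other side but no longer present here.
    to_delete = sorted(p for p in remote
                       if not os.path.lexists(os.path.join(root, p)))
    return sorted(to_copy), to_delete
```

The first list is what would be copied to the USB drive as whole files;
the second is what would have to be removed on the other machine.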

..wayne..

Wayne Davison

2007-Mar-12 22:26 UTC

On Mon, Mar 12, 2007 at 07:31:52PM +0100, Konrad Karl wrote:
> it just would need the fileinfo of another_local_directory from
> somewhere (database, whatever) in order to generate rsync batch 
> files with --whole-file or am I missing something?
Yes, if you want to code up a fuse filesystem that makes it
appear that there is a hierarchy of files present, that would work
with an unmodified rsync --whole-file using --write-batch (assuming
that you made the fuse filesystem discard the file data and update
the file info).  Otherwise, you'd need to dig into the various
stat(), readdir(), etc. functions that the receiving side calls and
direct them to a DB.

..wayne..

Phil Howard

2007-Mar-14 16:02 UTC

On Sat, Mar 10, 2007 at 08:43:01PM +0100, Konrad Karl wrote:

| please consider the following scenario:
| (and please forgive me if I have not googled enough, but I was unable to
| find an app that does exactly what I want)
| 
| machine A (office) is, where most file changes/downloads etc happen.
|                    and it has limited internet access (only proxy possible)
|  
| Machine B (home) has low bandwidth, is NATed etc.
| 
| So I would like to use an USB hard drive as transportation medium.
| 
| First lets assume, A and B have been brought to sync somehow.
| 
| Afterwards I could do a disconnected rsync operation like follows:
| (assuming transfer A -> B)
| 
| 1.) get B's hard disk contents metadata  - perhaps the contents of
|    rsync's file list can be stored onto the transportation medium.
| 
| 2.) (in the office) let rsync generate its source file list from A
|     but instead of connecting to B and get the destinations file
|     list load B's list from the USB drive.
| 
| 3.) store the files which needed to be transfered and all the
|     other information (what has to be deleted, file attrs etc)
|     onto a shadow directory on the USB drive.
| 
| 4.) drive home :-)
| 
| 5.) again using rsync, copy the files to their final destination
|     and do all the other things required using the info stored
|     on the USB drive.
| 
| 6.) result: B should be equal to A 
| 
| For a first approach it would be fine to not use the
| "rsync" algorithm (transfer file differences only) because
| implementing this will perhaps require considerable work and
| hard drives are cheap nowadays.
| 
| What do you think?

I have a somewhat different scenario, but one I think is sufficiently
close that it could be adapted to yours.

I have the entire Gentoo mirror (around 50 GB) synchronized at home which
is on low bandwidth (28.8K dialup).  I keep it in sync with rsync in the
following way.  I take a list of all the files I do have at home (which
can be carried over the USB device, though I send that from home to office
ahead of time over the net).  I run rsync using the --exclude-from option
giving it the name of that file.  It downloads files that are not in that
list (new files and files I accidentally removed).  I then create a tar
file from the downloaded directory and copy that tarball directly to the
USB flash drive (no filesystem or mounting is involved).

You may be able to do something similar by substituting your office files
that need to be replicated at home for the remote mirror I used above.
The possible scenario might then be (assuming Unix/BSD/Linux hosts):

1.  Bring/get list of files already at home to office.

2.  Use rsync to make replica of office files to a temporary area using
    --exclude-from to limit to new files

3.  Save the replica subset to USB flash drive.

4.  At home, extract files from USB flash drive.

The big issue here is that files that merely CHANGE are not detected.  To
get better synchronization, the dates in the list of files could be
cross-checked against the dates of the actual files.  Remove anything that
has changed from the exclude list, and proceed as above.
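
The date cross-check could be sketched like this.  It is an illustration
under assumptions, not the script actually used: the function name and
the path-to-mtime mapping are made up, mtimes are compared in whole
seconds, and names containing rsync wildcard characters are simply left
out of the exclude list (so they get copied again), which avoids having
to get pattern escaping right.

```python
import os

# Characters with special meaning in rsync exclude patterns.
WILDCARDS = set("*?[\\")


def build_exclude_list(home_mtimes, office_root):
    """Return anchored exclude patterns for files that are unchanged.

    home_mtimes maps a relative path to the mtime (whole seconds) that
    the file has at home.
    """
    patterns = []
    for rel, home_mtime in sorted(home_mtimes.items()):
        if WILDCARDS & set(rel):
            continue  # conservative: never exclude, always recopy
        try:
            st = os.stat(os.path.join(office_root, rel))
        except FileNotFoundError:
            continue  # not present at the office, nothing to exclude
        if st.st_mtime_ns // 1_000_000_000 == home_mtime:
            # A leading "/" anchors the pattern at the top of the transfer.
            patterns.append("/" + rel.replace(os.sep, "/"))
    return patterns
```

Writing the returned patterns one per line gives a file suitable for
--exclude-from; anything that changed or is new stays off the list and is
therefore transferred.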

Konrad Karl

2007-Mar-14 18:59 UTC

Hello,

On Wed, Mar 14, 2007 at 11:02:04AM -0500, Phil Howard wrote:
> On Sat, Mar 10, 2007 at 08:43:01PM +0100, Konrad Karl wrote:
> 
[ deleted ]
> | machine A (office) is, where most file changes/downloads etc happen.
> |                    and it has limited internet access (only proxy possible)
> |  
> | Machine B (home) has low bandwidth, is NATed etc.
> | 
> | So I would like to use an USB hard drive as transportation medium.
[ deleted]
> I have a somewhat different scenario, but one I think is sufficiently
> close that it could be adapted to yours.
> 
> I have the entire Gentoo mirror (around 50 GB) synchronized at home which
> is on low bandwidth (28.8K dialup).  I keep it in sync with rsync in the
> following way.  I take a list of all the files I do have at home (which
> can be carried over the USB device, though I send that from home to office
> ahead of time over the net).  I run rsync using the --exclude-from option
> giving it the name of that file.  It downloads files that are not in that
> list (new files and files I accidentally removed).  I then create a tar
> file from the downloaded directory and copy that tarball directly to the
> USB flash drive (no filesystem or mounting is involved).
I see, but I want to get closer to a directly connected rsync with all its
benefits - looking at file length and attribute changes etc.

Right now I am really close using my hacked cpio, which generates
sparse files (please see my earlier post). I am still trying to optimize
the space requirements and speed, and have played with a hacked
fuse-dbfs-0.6 (it does not store the file contents), but fuse-dbfs-0.6
becomes really slow if you have more than a few thousand files in one
directory (I have up to 25000 or so, unfortunately) - it implements
directories as a linear and unsorted list...
> 
> You may be able to do something similar by substituting your office files
> that need to be replicated at home for the remote mirror I used above.
> The possible scenario might then be (assuming Unix/BSD/Linux hosts):
> 
> 1.  Bring/get list of files already at home to office.
> 
> 2.  Use rsync to make replica of office files to a temporary area using
>     --exclude-from to limit to new files
> 
> 3.  Save the replica subset to USB flash drive.
> 
> 4.  At home, extract files from USB flash drive.
> 
> The big issue here is files that merely CHANGE are not detected.  To get
> better synchronization, dates in the list of files could be used to cross
> check dates of actual files.  Remove anything that has changed from the
> exclude list, and proceed as above.
Using my sparse-file mirror, rsync detects the changes quite nicely.
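
The idea of a sparse-file mirror can be illustrated roughly as below.
This is not the hacked cpio mentioned above, just a sketch of the same
idea under assumptions: the function names and the (path, size, mtime_ns)
entry format are made up, and whether the placeholders really occupy no
disk space depends on the filesystem supporting sparse files.

```python
import os


def make_placeholder(path, size, mtime_ns):
    """Create an empty-content file with the given size and mtime."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(path, "wb") as f:
        # Extending by truncate leaves a hole instead of writing data.
        f.truncate(size)
    os.utime(path, ns=(mtime_ns, mtime_ns))


def build_mirror(entries, mirror_root):
    """Recreate a tree of placeholders from (rel_path, size, mtime_ns)."""
    for rel, size, mtime_ns in entries:
        make_placeholder(os.path.join(mirror_root, rel), size, mtime_ns)
```

Because size and mtime match the real files at home, a size+mtime
comparison against such a mirror skips unchanged files and picks up only
those that differ.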

Thanks for your input,
Konrad
