Hi List,

please consider the following scenario (and please forgive me if I have not googled enough, but I was unable to find an app that does exactly what I want):

Machine A (office) is where most file changes/downloads etc. happen, and it has limited internet access (only via proxy).

Machine B (home) has low bandwidth, is NATed, etc.

So I would like to use a USB hard drive as the transport medium.

First, let's assume A and B have been brought into sync somehow.

Afterwards I could do a disconnected rsync operation as follows (assuming transfer A -> B):

1.) Get B's hard disk contents metadata; perhaps the contents of rsync's file list can be stored on the transport medium.

2.) (In the office) let rsync generate its source file list from A, but instead of connecting to B to get the destination's file list, load B's list from the USB drive.

3.) Store the files which need to be transferred and all the other information (what has to be deleted, file attrs etc.) in a shadow directory on the USB drive.

4.) Drive home :-)

5.) Again using rsync, copy the files to their final destination and do all the other things required, using the info stored on the USB drive.

6.) Result: B should be equal to A.

For a first approach it would be fine not to use the "rsync" algorithm (transfer file differences only), because implementing this would perhaps require considerable work and hard drives are cheap nowadays.

What do you think?

Regards,
Konrad
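As a rough illustration of steps 1 to 3, here is a minimal Python sketch that snapshots a tree's metadata (size and mtime only), stores it as JSON on the USB drive, and later compares it with the office tree to decide what to copy and what to delete. The function names and the JSON format are assumptions made for this sketch, not anything rsync provides, and it ignores permissions, ownership and symlink semantics.

```python
import json
import os
from dataclasses import dataclass

# Relative path -> (size in bytes, mtime in whole seconds).
Meta = dict[str, tuple[int, int]]


@dataclass
class SyncPlan:
    copy: list[str]    # new or changed on A: carry these to B
    delete: list[str]  # present on B but gone from A


def snapshot(root: str) -> Meta:
    """Step 1: record size and mtime of every file below root."""
    meta: Meta = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            # Whole seconds tolerate filesystems with coarse timestamps.
            meta[rel] = (st.st_size, int(st.st_mtime))
    return meta


def save_listing(meta: Meta, path: str) -> None:
    with open(path, "w", encoding="utf-8") as f:
        json.dump(meta, f)


def load_listing(path: str) -> Meta:
    # JSON turns tuples into lists, so convert them back for comparison.
    with open(path, encoding="utf-8") as f:
        return {p: (int(s), int(t)) for p, (s, t) in json.load(f).items()}


def plan_transfer(source: Meta, dest: Meta) -> SyncPlan:
    """Steps 2-3: compare A's live listing with B's saved listing."""
    copy = sorted(p for p, m in source.items() if dest.get(p) != m)
    delete = sorted(p for p in dest if p not in source)
    return SyncPlan(copy, delete)
```

At home, the files listed in `plan.copy` would be copied into place with their mtimes preserved, and the paths in `plan.delete` removed.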
On Sat 10 Mar 2007, Konrad Karl wrote:
> please consider the following scenario (and please forgive me if I have
> not googled enough, but I was unable to find an app that does exactly
> what I want):

It sounds like unison may be what you want, although using a USB stick as an intermediary transport is something that may not fit into unison's scheme of things.

Paul Slootman
On Sat, Mar 10, 2007 at 08:43:01PM +0100, Konrad Karl wrote:
> Machine A (office) is where most file changes/downloads etc. happen,
> and it has limited internet access (only via proxy).
>
> Machine B (home) has low bandwidth, is NATed, etc.

To copy data from A -> B using a USB drive as the data transport, rsync supports batch-file creation. However, rsync doesn't support a fake file list. So, to use batch files, you have two options:

1) Actually connect from A to B, but use the --only-write-batch=FILE option. This will use the connection only to transfer the file list and to receive checksums for changed files from the remote host. (You can turn off the checksum sending via --whole-file, which makes the batch contain each changed file's whole data rather than just the differences, reducing the data sent over the wire.)

2) Keep a copy of your home data somewhere in your office setup, and use rsync with the --write-batch=FILE option to update that office copy. The batch it creates will contain all the changes necessary to update the home files, as long as the home system and the extra office copy were identical at the start of the rsync.

In both cases you'd put the batch file that rsync created onto your USB device and then, when you got home, run rsync --read-batch=FILE using that file to do the actual update of machine B.

..wayne..
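The two options above, plus the home-side step, can be written down as command lines. The sketch below only builds argument lists for Python's subprocess module; the paths in the usage example are hypothetical placeholders, and -a and --delete are assumptions about what a mirror-style sync would want rather than something the message prescribes. The reading side is given the same options that were used to write the batch.

```python
import subprocess


def _dir(path: str) -> str:
    # A trailing slash makes rsync copy the directory's contents.
    return path.rstrip("/") + "/"


def only_write_batch_cmd(src: str, remote_dest: str, batch: str) -> list[str]:
    """Option 1: contact B for the file list only; B itself is not modified."""
    return ["rsync", "-a", "--delete", "--whole-file",
            f"--only-write-batch={batch}", _dir(src), remote_dest]


def write_batch_cmd(src: str, office_copy: str, batch: str) -> list[str]:
    """Option 2: update the office copy of B's data and record the changes."""
    return ["rsync", "-a", "--delete",
            f"--write-batch={batch}", _dir(src), _dir(office_copy)]


def read_batch_cmd(batch: str, dest: str, whole_file: bool = False) -> list[str]:
    """At home: replay the batch onto B with the options used to write it."""
    opts = ["-a", "--delete"] + (["--whole-file"] if whole_file else [])
    return ["rsync", *opts, f"--read-batch={batch}", _dir(dest)]


def run(cmd: list[str]) -> None:
    subprocess.run(cmd, check=True)
```

For option 2 that would be `run(write_batch_cmd("/data/A", "/data/B-copy", "/media/usb/sync.batch"))` in the office and `run(read_batch_cmd("/media/usb/sync.batch", "/data/B"))` at home. rsync also writes a FILE.sh helper script next to the batch file.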
On Mon, Mar 12, 2007 at 11:12:58AM +0100, Konrad Karl wrote:
> Given this capability together with the batch mode it should be
> possible to do what I want.

Not really, because rsync still needs to ask for the checksums to see what has changed.

If you're fine sending whole files, it would be easy to code something up in perl that just compared size+mtime to a list and copied each whole file somewhere. For instance, there's a perl script in the support dir, file-attr-restore, that uses a "find ... -ls" file to restore attributes in a hierarchy. That could be adapted to do what you want, especially if the find output were customized to output the modification time in a full-resolution format:

    find . -printf '%s %T@ %p\n'

..wayne..
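Wayne suggests perl; the same idea is sketched here in Python instead. It parses a listing produced at home by the find command above (assuming `-type f` is added so only regular files are listed) and copies each new or changed office file, whole, into a staging directory on the USB drive. The function names and the tolerance constant are hypothetical, and filenames containing newlines would break this line-based format.

```python
import os
import shutil

# Hypothetical tolerance: absorbs float rounding between find's decimal
# %T@ output and os.stat(); it is not meant to hide real changes.
MTIME_TOLERANCE = 0.001


def parse_find_listing(lines):
    """Parse lines from: find . -type f -printf '%s %T@ %p\\n'"""
    meta = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        size, mtime, path = line.split(" ", 2)  # path may contain spaces
        if path.startswith("./"):
            path = path[2:]
        meta[path] = (int(size), float(mtime))
    return meta


def stage_changed(src_root, home_meta, stage_root):
    """Copy each new or changed file (size+mtime test) whole into stage_root."""
    copied = []
    for dirpath, _dirs, files in os.walk(src_root):
        for name in files:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, src_root).replace(os.sep, "/")
            st = os.stat(full)
            old = home_meta.get(rel)
            if (old is not None and old[0] == st.st_size
                    and abs(old[1] - st.st_mtime) < MTIME_TOLERANCE):
                continue  # unchanged since the home listing was taken
            dst = os.path.join(stage_root, *rel.split("/"))
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(full, dst)  # copy2 keeps the mtime
            copied.append(rel)
    return sorted(copied)
```

Because `shutil.copy2` preserves the mtime, copying the staged files into place at home leaves them matching the office originals on the next size+mtime comparison.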
On Mon, Mar 12, 2007 at 07:31:52PM +0100, Konrad Karl wrote:
> it just would need the fileinfo of another_local_directory from
> somewhere (database, whatever) in order to generate rsync batch
> files with --whole-file or am I missing something?

Yes. If you were to code up a fuse filesystem that makes it appear that a hierarchy of files is present, that would work with an unmodified rsync using --whole-file and --write-batch (assuming that you made the fuse filesystem discard the file data and update the file info). Other than that, you'd need to dig into the various stat(), readdir(), etc. functions that the receiving side calls and redirect them to a DB.

..wayne..
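For either route, the core piece is a database that can answer stat()- and readdir()-style questions. The Python sketch below is not fuse and not part of rsync; the class and method names are hypothetical. It keeps per-file size, mtime and mode in an sqlite3 table whose composite primary key on (dir, name) is indexed, so looking up one entry or listing one directory does not require scanning every row. It tracks regular files only; directories exist implicitly through the paths stored.

```python
import posixpath
import sqlite3
from typing import Optional


class MetaDB:
    """Hypothetical stand-in for the receiver's file tree, backed by sqlite3."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        # The composite primary key gives an index on (dir, name).
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS files ("
            " dir TEXT NOT NULL, name TEXT NOT NULL,"
            " size INTEGER NOT NULL, mtime INTEGER NOT NULL,"
            " mode INTEGER NOT NULL, PRIMARY KEY (dir, name))"
        )

    @staticmethod
    def _split(path: str) -> tuple[str, str]:
        # "docs/a.txt" -> ("docs", "a.txt"); "top.txt" -> ("", "top.txt")
        return posixpath.split(path.strip("/"))

    def add(self, path: str, size: int, mtime: int, mode: int) -> None:
        d, n = self._split(path)
        with self.conn:  # commits on success
            self.conn.execute(
                "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?, ?)",
                (d, n, size, mtime, mode),
            )

    def stat(self, path: str) -> Optional[tuple[int, int, int]]:
        """Return (size, mtime, mode), or None if the path is unknown."""
        d, n = self._split(path)
        return self.conn.execute(
            "SELECT size, mtime, mode FROM files WHERE dir = ? AND name = ?",
            (d, n),
        ).fetchone()

    def readdir(self, dirpath: str) -> list[str]:
        """Names of the files directly inside dirpath, sorted."""
        rows = self.conn.execute(
            "SELECT name FROM files WHERE dir = ? ORDER BY name",
            (dirpath.strip("/"),),
        )
        return [row[0] for row in rows]
```

A fuse layer or a patched receiver would then call `stat()` and `readdir()` on this object instead of touching a real disk.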
On Sat, Mar 10, 2007 at 08:43:01PM +0100, Konrad Karl wrote:
| please consider the following scenario (and please forgive me if I have
| not googled enough, but I was unable to find an app that does exactly
| what I want):
|
| Machine A (office) is where most file changes/downloads etc. happen,
| and it has limited internet access (only via proxy).
|
| Machine B (home) has low bandwidth, is NATed, etc.
|
| So I would like to use a USB hard drive as the transport medium.
|
| First, let's assume A and B have been brought into sync somehow.
|
| Afterwards I could do a disconnected rsync operation as follows
| (assuming transfer A -> B):
|
| 1.) Get B's hard disk contents metadata; perhaps the contents of
|     rsync's file list can be stored on the transport medium.
|
| 2.) (In the office) let rsync generate its source file list from A,
|     but instead of connecting to B to get the destination's file
|     list, load B's list from the USB drive.
|
| 3.) Store the files which need to be transferred and all the
|     other information (what has to be deleted, file attrs etc.)
|     in a shadow directory on the USB drive.
|
| 4.) Drive home :-)
|
| 5.) Again using rsync, copy the files to their final destination
|     and do all the other things required, using the info stored
|     on the USB drive.
|
| 6.) Result: B should be equal to A.
|
| For a first approach it would be fine not to use the
| "rsync" algorithm (transfer file differences only), because
| implementing this would perhaps require considerable work and
| hard drives are cheap nowadays.
|
| What do you think?

I have a somewhat different scenario, but one I think is sufficiently close that it could be adapted to yours.

I have the entire Gentoo mirror (around 50 GB) synchronized at home, which is on low bandwidth (28.8K dialup). I keep it in sync with rsync in the following way.
I take a list of all the files I already have at home (this can be carried over on the USB device, though I send it from home to the office ahead of time over the net). I run rsync with the --exclude-from option, giving it the name of that file. It downloads files that are not in that list (new files and files I accidentally removed). I then create a tar file from the downloaded directory and copy that tarball directly to the USB flash drive (no filesystem or mounting is involved).

You may be able to do something similar by substituting your office files that need to be replicated at home for the remote mirror I used above.

The possible scenario might then be (assuming Unix/BSD/Linux hosts):

1. Bring/get the list of files already at home to the office.

2. Use rsync to make a replica of the office files in a temporary area, using --exclude-from to limit it to new files.

3. Save the replica subset to the USB flash drive.

4. At home, extract the files from the USB flash drive.

The big issue here is that files that merely CHANGE are not detected. To get better synchronization, dates in the list of files could be used to cross-check the dates of the actual files. Remove anything that has changed from the exclude list, and proceed as above.
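The date cross-check suggested above can be sketched in Python: a file stays in the exclude list only if it exists at home with the same mtime as in the office, so both new and changed files get transferred. The dictionaries (relative path to mtime in whole seconds) and the function names are assumptions for illustration. Each pattern gets a leading "/" so rsync anchors it at the top of the transfer; names containing wildcard characters such as * or [ would need extra escaping, which this sketch does not do.

```python
def build_exclude_list(home: dict[str, int], office: dict[str, int]) -> list[str]:
    """Exclude only files that are already at home and unchanged in the office."""
    unchanged = sorted(p for p, mtime in home.items() if office.get(p) == mtime)
    # A leading "/" anchors each pattern at the root of the transfer.
    return ["/" + p for p in unchanged]


def write_exclude_file(patterns: list[str], path: str) -> None:
    """Write one pattern per line, the format --exclude-from reads."""
    with open(path, "w", encoding="utf-8") as f:
        for pattern in patterns:
            f.write(pattern + "\n")
```

The resulting file would then be passed to something like `rsync -a --exclude-from=exclude.txt /office/data/ /tmp/replica/` (paths hypothetical), after which the replica is tarred onto the USB drive as described.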
Hello,

On Wed, Mar 14, 2007 at 11:02:04AM -0500, Phil Howard wrote:
> On Sat, Mar 10, 2007 at 08:43:01PM +0100, Konrad Karl wrote:
>[ deleted ]
> | Machine A (office) is where most file changes/downloads etc. happen,
> | and it has limited internet access (only via proxy).
> |
> | Machine B (home) has low bandwidth, is NATed, etc.
> |
> | So I would like to use a USB hard drive as the transport medium.
[ deleted ]
> I have a somewhat different scenario, but one I think is sufficiently
> close that it could be adapted to yours.
>
> I have the entire Gentoo mirror (around 50 GB) synchronized at home,
> which is on low bandwidth (28.8K dialup). I keep it in sync with rsync
> in the following way. I take a list of all the files I already have at
> home (this can be carried over on the USB device, though I send it from
> home to the office ahead of time over the net). I run rsync with the
> --exclude-from option, giving it the name of that file. It downloads
> files that are not in that list (new files and files I accidentally
> removed). I then create a tar file from the downloaded directory and
> copy that tarball directly to the USB flash drive (no filesystem or
> mounting is involved).

I see, but I want to get closer to a directly connected rsync with all its benefits: looking at file length and attribute changes, etc.

Right now I am really close, using my hacked cpio which generates sparse files (please see my earlier post). I am still trying to optimize the space requirements and speed, and have played with a hacked fuse-dbfs-0.6 (it does not store the file contents), but fuse-dbfs-0.6 becomes really slow if you have more than a few thousand files in one directory (I have up to 25000 or so, unfortunately): it implements directories as a linear, unsorted list...

>
> You may be able to do something similar by substituting your office files
> that need to be replicated at home for the remote mirror I used above.
> The possible scenario might then be (assuming Unix/BSD/Linux hosts):
>
> 1. Bring/get the list of files already at home to the office.
>
> 2. Use rsync to make a replica of the office files in a temporary area,
>    using --exclude-from to limit it to new files.
>
> 3. Save the replica subset to the USB flash drive.
>
> 4. At home, extract the files from the USB flash drive.
>
> The big issue here is that files that merely CHANGE are not detected.
> To get better synchronization, dates in the list of files could be used
> to cross-check the dates of the actual files. Remove anything that has
> changed from the exclude list, and proceed as above.

Using my sparse-file mirror, rsync detects the changes quite nicely.

Thanks for your input,
Konrad
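Konrad's hacked cpio is not shown in the thread, so the following is only an assumed Python equivalent of a sparse-file mirror: it recreates a tree from saved metadata with files of the recorded size but no real data, plus the recorded mtimes. Since rsync's default quick check compares size and mtime, such a mirror can stand in for B's real tree in the office. The function name and the metadata format are hypothetical.

```python
import os


def build_sparse_mirror(root: str, meta: dict[str, tuple[int, int]]) -> None:
    """Recreate files with the recorded size and mtime but no real data.

    meta maps a relative path (with "/" separators) to (size, mtime).
    """
    for rel, (size, mtime) in meta.items():
        dst = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        with open(dst, "wb") as f:
            # Extending with truncate() leaves a hole where sparse files
            # are supported; elsewhere the space is zero-filled.
            f.truncate(size)
        os.utime(dst, (mtime, mtime))  # set atime and mtime
```

A batch could then be produced against the mirror with, for instance, `rsync -a --whole-file --only-write-batch=FILE`, which leaves the mirror itself unmodified; that pairing is an assumption, not something stated in the thread.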