I was planning to use rsync to back up to a second drive, but I ran out of swap space. No problem: I will let rsync do it a few directories at a time overnight, when the computer usually sits idle except for incoming email. I have two questions:

1. How much memory does each file to be copied need? Obviously I have too many files.

2. Why does this command work:

   rsync -ax /usr/xx /backup/usr/

   when:

   rsync -ax /usr/xx/ /backup/usr/

   refuses to create the directory xx in /backup/usr and copies the contents of the directory to /backup?

Happens on both 2.4.7Pre4 and the just-released 2.5.0.

Thanks
Ian
On 29 Nov 2001, Ian Kettleborough <ian@idk.com> wrote:

> 1. How much memory does each file to be copied need? Obviously I have
> too many files.

Hard to say exactly. On the order of a hundred bytes per file.

> 2. Why does this command work:
>
>    rsync -ax /usr/xx /backup/usr/
>
> when:
>
>    rsync -ax /usr/xx/ /backup/usr/
>
> refuses to create the directory xx in /backup/usr and copies
> the contents of the directory to /backup

Actually that's a feature, not a bug:

  /usr/xx  means "the directory xx", so it creates /backup/usr/xx

  /usr/xx/ means "the contents of xx", so it copies the contents
  directly into /backup/usr/ without creating an xx destination
  directory.

Just use whichever one is appropriate.

--
Martin
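[Editorial note: the trailing-slash rule Martin describes can be modeled with a tiny helper. This is a hedged sketch for illustration only; `dest_path` is a made-up name and not part of rsync.]

```python
import os

def dest_path(src, dest):
    """Model rsync's trailing-slash rule for a single directory source.

    Without a trailing slash, rsync recreates the source directory
    itself under dest; with a trailing slash, only its contents are
    copied directly into dest.
    """
    if src.endswith("/"):
        # "contents of xx": files land directly in dest
        return dest.rstrip("/")
    # "the directory xx": xx is created under dest
    return os.path.join(dest, os.path.basename(src))

print(dest_path("/usr/xx", "/backup/usr/"))   # /backup/usr/xx
print(dest_path("/usr/xx/", "/backup/usr/"))  # /backup/usr
```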
tim.conway@philips.com
2001-Dec-01 01:42 UTC
Why does one of these work and the other doesn't
From "man rsync":

    a trailing slash on the source changes this behavior to transfer all
    files from the directory src/bar on the machine foo into the
    /data/tmp/. A trailing / on a source name means "copy the contents
    of this directory". Without a trailing slash it means "copy the
    directory". This difference becomes particularly important when
    using the --delete option.

Wonderful things, those manuals.

Warning: in my experience, this gives unpredictable results. It does NOT, in fact, always detect all the content of the directory, and as a result a --delete can have catastrophic consequences. I have not had time to try to figure out why this happens, but my few tests aren't even repeatable: if there are more than maybe 10 entries in the directory, something is always left out, but rarely the same thing twice. Needless to say, I never use that syntax.

    >ls -a src
    .dotfile dir1 file2 dir3 file4
    >ls -a dest
    .dotfile dir1 file2 dir3 file4
    >rsync -a --delete --force src/ dest
    >ls -a dest
    .dotfile dir1 file2 file4
    >update resume

Tim Conway
tim.conway@philips.com
303.682.4917
Philips Semiconductor - Longmont TC
1880 Industrial Circle, Suite D
Longmont, CO 80501
Available via SameTime Connect within Philips, n9hmg on AIM
perl -e 'print pack(nnnnnnnnnnnn, 19061,29556,8289,28271,29800,25970,8304,25970,27680,26721,25451,25970), ".\n" '
"There are some who call me.... Tim?"
From: Randy Kramer [mailto:rhkramer@fast.net]

> I am not sure which end the 100 bytes per file applies to, and I guess
> that is the RAM memory footprint? Does rsync need 100 bytes for each
> file that might be transferred during a session (all files in the
> specified directory(ies)), or does it need only 100 bytes as it does
> one file at a time?

Yes, the ~100 bytes is in RAM. I think a key point, though, is that the storage to hold the file list grows exponentially (doubling each time it fills), so if you have a lot of files, in the worst case you can use almost twice as much memory as needed.

Here's an analysis I posted to the list a while back that I think is still probably valid for the current versions of rsync. A later followup noted that it didn't include an ~28 byte structure for each entry in the include/exclude list:

- - - - - - - - - - - - - - - - - - - - - - - - -

> (a) How much memory, in bytes/file, does rsync allocate?

This is only based on my informal code peeks in the past, so take it with a grain of salt; I don't know if anyone has done a more formal memory analysis. I believe the major driving factors in memory usage are:

1. The per-file overhead in the file list for each file in the system. The memory is kept for all files for the life of the rsync process. I believe this is 56 bytes per file (it's a file_list structure), but a critical point is that it is allocated initially for 1000 files, and then grows exponentially (doubling). So the space will grow as 1000, 2000, 4000, 8000, etc. until it has enough room for the files necessary. This means you might, worst case, allocate just about twice as much memory as necessary, but it reduces the reallocation calls quite a bit. At ~56K per 1000 files, if you've got a file system with 10000 files in it, you'll allocate room for 16000 and use up 896K.
This growth pattern seems to occur on both sender and receiver of any given file list (e.g., I don't see a transfer of the total count over the wire used to optimize the allocation on the receiver).

2. The per-block overhead for the checksums for each file as it is processed. This memory exists only for the duration of one file. This is 32 bytes per block (a sum_buf), allocated as one memory chunk. This exists on the receiver as it is computed and transmitted, and on the sender as it receives it and uses it to match against the new file.

3. The match tables built to determine the delta between the original file and the new file. I haven't looked closely at this section of code, but I believe we're basically talking about the hash table, which is going to be a one-time (during rsync execution) 256K for the tag table, and then 8 (or maybe 6, if your compiler doesn't pad the target struct) bytes per block of the file being worked on, which only exists for the duration of the file. This only occurs on the sender.

There is also some fixed space for various things; I think the largest is up to 256K for the buffer used to map files.

> (b) Is this the same for the rsyncs on both ends, or is there
> some asymmetry there?

There's asymmetry. Both sides need the memory to handle the lists of files involved. But while the receiver just constructs the checksums and sends them, and then waits for instructions on how to build the new file (either new data or pulling from the old file), the sender also constructs the hash of those checksums to use while walking through the new file. So in general, on any given transfer, I think the sender will end up using a bit more memory.

> (c) Does it matter whether pushing or pulling?

Yes, inasmuch as the asymmetry is based on who is sending and who is receiving a given file. It doesn't matter who initiates the contact, but the direction that the files are flowing.
This is due to the algorithm (the sender is the component that has to construct the mapping for the new file using portions of the old file, as transmitted by the receiver).

- - - - - - - - - - - - - - - - - - - - - - - - -

-- David

David Bolen, FitLinxx, Inc.
860 Canal Street, Stamford, CT 06902
E-mail: db3l@fitlinxx.com
Phone: (203) 708-5192
Fax: (203) 316-5150
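[Editorial note: the doubling allocation and per-block checksum overhead described above can be turned into a back-of-the-envelope calculator. This is a sketch only; the 56-byte, 32-byte, and 1000-entry figures come from the analysis above, the 700-byte block size is an assumption, and all of these may differ across rsync versions.]

```python
def filelist_bytes(nfiles, per_file=56, initial=1000):
    """Estimate rsync file-list memory: capacity starts at `initial`
    entries and doubles until it can hold `nfiles` entries."""
    capacity = initial
    while capacity < nfiles:
        capacity *= 2
    return capacity * per_file

def per_file_checksum_bytes(file_size, block_size=700, per_block=32):
    """Estimate the transient per-file checksum memory (one sum_buf
    per block); exists only while that one file is being processed."""
    blocks = -(-file_size // block_size)  # ceiling division
    return blocks * per_block

# 10000 files: capacity grows 1000 -> 2000 -> 4000 -> 8000 -> 16000
print(filelist_bytes(10000))           # 896000, i.e. ~896K as above
print(per_file_checksum_bytes(700000)) # 32000 for a ~700KB file
```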
tim.conway@philips.com
2001-Dec-01 10:01 UTC
Why does one of these work and the other doesn't
100 bytes/file of file information, for every file, whether it is to be transferred or not. YMMV.

Tim Conway
tim.conway@philips.com

Randy Kramer <rhkramer@fast.net> wrote:

Martin Pool wrote:
> Ian Kettleborough <ian@idk.com> wrote:
> > 1. How much memory does each file to be copied need? Obviously I
> > have too many files.
>
> Hard to say exactly. On the order of a hundred bytes per file.

I may have misunderstood the question, but maybe we should point out that, on the receiving end, each file needs at least an amount of *disk space* equal in size to the file, as a new file is constructed before the old file is deleted.

I am not sure which end the 100 bytes per file applies to, and I guess that is the RAM memory footprint? Does rsync need 100 bytes for each file that might be transferred during a session (all files in the specified directory(ies)), or does it need only 100 bytes as it does one file at a time?

Trying to learn, also,
Randy Kramer
tim.conway@philips.com
2001-Dec-04 03:55 UTC
Why does one of these work and the other doesn't
rsync already has a memory-hogging issue. Imagine having it search your entire directory tree, checksumming all files, storing and sending them all, and comparing both lists looking for matching date/time/checksums to guess where you've moved files to. You'd be better off to use a wrapper around the tools you move files with, keeping a replayable log, and have your mirrors retrieve and replay that log before doing the rsync.

Tim Conway
tim.conway@philips.com

Phil Howard <phil-rsync@ipal.net> wrote:

On Mon, Dec 03, 2001 at 12:09:16AM +1100, Martin Pool wrote:
| On 30 Nov 2001, Randy Kramer <rhkramer@fast.net> wrote:
|
| > I am not sure which end the 100 bytes per file applies to, and I guess
| > that is the RAM memory footprint? Does rsync need 100 bytes for each
| > file that might be transferred during a session (all files in the
| > specified directory(ies)), or does it need only 100 bytes as it does one
| > file at a time?
|
| At the moment that is 100B for all files to be transferred in the
| whole session. This is a big limit to scalability at the moment, and
| a goal of mine is to reduce it to at most holding file information
| from a single directory in memory.
It would still be nice to have an option to gather all files at once, but this will be of value if it also gathers all the checksums and synchronizes file moves that have happened on the source end, by doing the synchronization of the moved file to its new location using the old (checksum-matched) file on the destination end. Right now, if a file gets moved from one location to another (especially into a different directory, which is often the case with a reorganization), things get retransferred even though almost every file already exists somewhere on the destination.

--
Phil Howard - KA9WGN | Dallas, Texas, USA
phil-nospam@ipal.net | http://linuxhomepage.com/ | http://phil.ipal.org/