Hi,
I got stuck on a weird problem with my 2-node Linux cluster and
the synchronisation tool at hand (rsync-2.5.1pre1).
I have to copy a structure of 70 directories where the data in these
directories is hardlinked to the data of the 1st directory.  Within this
"orig data" directory I have about 30,000 files, so the number of files
to sync is approx. 2,100,000.  The overall size is about 9.2GB.
The method to synchronize is to have an "rsync --daemon" running on the
production server and to pull the data onto the backup server via rsync::.
I dedicated a separate 100Mbit network link exclusively to this task.
The systems are
- linux-2.4.16
- glibc-2.2
- i686 (Coppermine, 900MHz) with 512MB RAM, plus 400MB swap on the
    main server and 128MB swap on the backup server (I know this is
    stupid, but at the moment I can't help it)
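For reference, the pull setup looks roughly like this (the module name
"data" and all paths here are placeholders, not the real configuration):

  # /etc/rsyncd.conf on the production server
  [data]
      path = /export/data
      read only = yes

  # on the backup server, pulling over the dedicated link
  rsync -auvH mainserver::data/ /backup/data/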
What happens?
The synchronization starts and gobbles up approx. 300MB of RAM/swap
while calculating the file list on the server.  On the client system,
approx. 620MB (i.e. nearly all) of the memory is allocated to compare
the file list (the sync is set up with -auvH).  The files are
transferred - when running it the 1st time, all files are transferred,
of course - and the transfer stops at the client after an hour with
 
  rsync.c:sig_int() called.
  rsync error: received SIGUSR1 or SIGINT (code 20) at rsync.c(230)
  rsync error: received SIGUSR1 or SIGINT (code 20) at main.c(741)
where I cannot see *anything* that is interfering (and it isn't me, either).
OK, I say, better luck next time.  However, while no rsync process
remains on the client (backup server) side, the "rsync --daemon" on the
main server - which did reduce its memory usage over the course of the
transfer - still has a child hanging around with 200MB of memory in use
after the client broke off communications!
So when running rsync the next time, 500MB of memory is eaten up by the
two rsyncs on the main server (the new one and the old one), which is
quite a lot.  Unfortunately, the second (and third) attempts to sync
break off after some time with messages similar to those shown above,
and the hanging processes on the main server happily keep 700-800MB of
memory to themselves.
The result?  The production server dies a slow and painful out-of-
memory death unless I do a "killall -9 rsync" after some time....
Any comments on how to debug this?
My idea is that maybe the kernel on the client side is silently sending
a signal to the rsync process due to excess memory usage?
How can I avoid that behaviour?  (The client system was quite happy, as
it does nothing else than rsyncing...)
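If it is the 2.4 OOM killer, the kernel log on the client should show
it; something along these lines is a quick check (the exact message
wording varies between kernel versions):

  dmesg | grep -i 'out of memory'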
Regards,
- Birger
Do you see any error messages in the log file on the server?  Running
out of memory may be the problem.
As a temporary workaround, perhaps you can create an additional swap
file (rather than a partition) on the backup server?
The rsync hlink code seems to use a bit more memory than is really
necessary.  Perhaps we can improve it...
-- Martin
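Adding a swap file needs no repartitioning; a minimal sketch (the size
and path are only an example):

  dd if=/dev/zero of=/var/tmp/swapfile bs=1024k count=512
  mkswap /var/tmp/swapfile
  swapon /var/tmp/swapfile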
On Sun, Dec 09, 2001 at 03:51:29PM +0100, birger@takatukaland.de wrote:
> Hi,
>
> I got stuck on a weird problem with my 2-node Linux cluster and
> the synchronisation tool at hand (rsync-2.5.1pre1).
>
> I have to copy a structure of 70 directories where the data in these
> directories is hardlinked to the data of the 1st directory.  Within this
> "orig data" directory I have about 30,000 files, so the number of files
> to sync is approx. 2,100,000.  The overall size is about 9.2GB.
...
[runs out of memory]
...
In case you weren't aware, rsync uses a lot less memory overall if you
can split up the copies into smaller pieces, because it keeps some
memory for every file it touches in a run.  In your case it may be
especially difficult to break up the copy into smaller pieces because
of all the hardlinks.
Ideas:
    1. Would it be possible to use symlinks instead of hardlinks?  That
       would give you more flexibility to split things up however you
       like.
    2. Perhaps you could break it up into ~70 copies, where each time
       you give it the first directory that contains the data and
       another one that contains one of the hardlinks.
- Dave Dykstra
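A back-of-the-envelope check of that per-file cost (the ~100 bytes per
file-list entry is an assumption, not a measured figure):

  2,100,000 files x ~100 bytes/entry  ~=  210MB

which is in the same ballpark as the 300MB observed on the server; the
extra bookkeeping for -H comes on top of that.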
I have to post a correction to this:
"
* Why would symlinks eat up disk space?
Correction, of course they don't.  I have to use hardlinks because every
"
A symbolic link does, in fact, consume disk space.  Unlike a hard link
(which is just another pointer to the file's data), a symbolic link is
a pointer to the file's name: it fits into the directory's data blocks
and also takes up an allocation of filesystem data space in which to
store the arbitrary-length name, rather than just an inode number.  One
effect of this is that a symbolic link can point out of the filesystem
it's in, whereas a directory entry points to an inode and has no way to
indicate that the inode is in another filesystem.
hard link = directory entry = a name and an inode.
symbolic link = (like a) file containing the name of what it points to.
I know, technically, a symlink is not a file.  That distinction is
maintained by the 0xF000 nybble of the mode of the object.  Nevertheless,
a directory takes space, a file takes space, a symbolic link takes space.
An additional hard link to an existing file takes only directory space;
if the new name is not enough of an addition to the directory's existing
data to make the filesystem driver add another allocation to the
directory's data space, it takes up no more disk space at all.  A
symlink, however, has the same effect on the directory but, in addition,
gets its own data space, and an inode, as well.
In this example, an empty file is created.  It takes an inode, but no
space.  Adding another hard link to it takes neither space nor an inode.
Adding a symbolic link to it takes up both an inode and space; another
symlink, to the other hard link, does the same.
Adding 256 new hard links to the original file takes up no more inodes,
but it does take up disk space, by causing the directory to expand to
hold all those names.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
tconway@atlas
/var/tmp/test>df .
/var               (/dev/dsk/c0t0d0s3 ): 1662512 blocks   250481 files
/var/tmp/test>ls -ld
drwxrwxrwx   2 tconway  Vlsieng     512 Dec 12 07:59 .
tconway@atlas
/var/tmp/test>touch emptyfile
tconway@atlas
/var/tmp/test>df .
/var               (/dev/dsk/c0t0d0s3 ): 1662512 blocks   250480 files
tconway@atlas
/var/tmp/test>ln emptyfile emptyfile2
tconway@atlas
/var/tmp/test>df .
/var               (/dev/dsk/c0t0d0s3 ): 1662512 blocks   250480 files
tconway@atlas
/var/tmp/test>ln -s emptyfile linktoemptyfile
tconway@atlas
/var/tmp/test>df . 
/var               (/dev/dsk/c0t0d0s3 ): 1662510 blocks   250479 files
tconway@atlas
/var/tmp/test>ln -s emptyfile2 linktoemptyfile2
tconway@atlas
/var/tmp/test>df .
/var               (/dev/dsk/c0t0d0s3 ): 1662508 blocks   250478 files
tconway@atlas
/var/tmp/test>ls -li
total 4
     27376 -rw-rw-rw-   2 tconway  Vlsieng        0 Dec 12 07:59 emptyfile
     27376 -rw-rw-rw-   2 tconway  Vlsieng        0 Dec 12 07:59 emptyfile2
     27377 lrwxrwxrwx   1 tconway  Vlsieng        9 Dec 12 08:00 linktoemptyfile -> emptyfile
     27380 lrwxrwxrwx   1 tconway  Vlsieng       10 Dec 12 08:00 linktoemptyfile2 -> emptyfile2
tconway@atlas
/var/tmp/test>ls -ld
drwxrwxrwx   2 tconway  Vlsieng      512 Dec 12 08:00 .
tconway@atlas
/var/tmp/test>for a in 0 1 2 3 4 5 6 7 8 9 a b c d e f
> do
> for b in 0 1 2 3 4 5 6 7 8 9 a b c d e f
> do
> ln emptyfile $a$b
> done
> done
tconway@atlas
/var/tmp/test>ls -ld
drwxrwxrwx   2 tconway  Vlsieng     3584 Dec 12 08:03 .
tconway@atlas
/var/tmp/test>df .
/var               (/dev/dsk/c0t0d0s3 ): 1662502 blocks   250478 files
tconway@atlas
/var/tmp/test>
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
I know it's not rsync-specific, but we're mostly unix guys, and need to
be correct.
Tim Conway
tim.conway@philips.com
303.682.4917
Philips Semiconductor - Longmont TC
1880 Industrial Circle, Suite D
Longmont, CO 80501
Available via SameTime Connect within Philips, n9hmg on AIM
perl -e 'print pack(nnnnnnnnnnnn, 19061,29556,8289,28271,29800,25970,8304,25970,27680,26721,25451,25970), ".\n"'
"There are some who call me.... Tim?"
birger@takatukaland.de
Sent by: rsync-admin@lists.samba.org
12/11/2001 11:42 PM
 
        To:     Dave Dykstra <dwd@bell-labs.com>, rsync@lists.samba.org
        cc:     (bcc: Tim Conway/LMT/SC/PHILIPS)
        Subject:        Re: Problems with rsync 2.5.1pre1 and hardlinks
Dave Dykstra wrote on Mon, Dec 10, 2001 at 02:21:46PM -0600:
[...]
* > * Ideas:
* > *     1. Would it be possible to use symlinks instead of hardlinks?
* > *        That would give you more flexibility to split things up
* > *        however you like.
* > *     2. Perhaps you could break it up into ~70 copies, where each
* > *        time you give it the first directory that contains the data
* > *        and another one that contains one of the hardlinks.
* >
* > Both alternatives will eat up huge amounts of disk space as the
* > numbers above suggest.  I will therefore consider plugging in more
* > mem/swap before trying them.
*
* Why would symlinks eat up disk space?
Correction, of course they don't.  I have to use hardlinks because every
directory provides a chrooted environment that would break when using
symlinks.
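To illustrate the chroot problem (all paths here are invented):

  # a symlink into the original tree...
  ln -s /export/d001/bin/tool /export/d002/bin/tool
  # ...breaks as soon as d002 becomes the root:
  chroot /export/d002 /bin/tool
  # inside the chroot, the target /export/d001/bin/tool is resolved
  # against the new root and no longer exists; a hard link would keep
  # working because it references the inode directly.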
* 
* I only suggested the second alternative because I thought that it would
* end up with all the destination files hardlinked together as on the
* original system.  I hadn't tested it, but now I did and it works:
* 
*     $ mkdir s s/d1 s/d2 s/d3 t
*     $ touch s/d1/l1
*     $ ln s/d1/l1 s/d2/l1
*     $ ln s/d1/l1 s/d3/l1
*     $ ls -li s/*/l1
*     286226 -rw-rw-r--   3 dwd      dwd            0 Dec 10 14:20 s/d1/l1
*     286226 -rw-rw-r--   3 dwd      dwd            0 Dec 10 14:20 s/d2/l1
*     286226 -rw-rw-r--   3 dwd      dwd            0 Dec 10 14:20 s/d3/l1
*     $ rsync -aH s/d1 s/d2 t
*     $ rsync -aH s/d1 s/d3 t
*     $ ls -li t/*/l1
*     622728 -rw-rw-r--   3 dwd      dwd            0 Dec 10 14:20 t/d1/l1
*     622728 -rw-rw-r--   3 dwd      dwd            0 Dec 10 14:20 t/d2/l1
*     622728 -rw-rw-r--   3 dwd      dwd            0 Dec 10 14:20 t/d3/l1
Ah, now I understand what you mean.  Yes, that will surely work and
minimize memory utilisation, at the cost of a somewhat complex rsync
mechanism.  I'll definitely fall back on that technique if I'm still
short on memory after some upgrades.
("somewhat complex": the backup server does not know anything about the
 directories to be synced - there may be new ones - so you have to
 provide a comprehensive list of the dirs from the main server to set
 up the logic above; see the sketch below)
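A rough sketch of that logic (all names are invented: the trees are
assumed to live under /export on the main server, with d001 holding the
original data):

  # on the backup server: fetch the current directory list, then sync
  # each hardlink directory together with the original one
  ssh mainserver ls /export > dirlist
  for d in `grep -v '^d001$' dirlist`; do
      rsync -auvH mainserver::export/d001 mainserver::export/$d /backup/export/
  done

I have not checked whether 2.5.1pre1 accepts two daemon-mode sources in
one run; if it refuses, the same pairing works with rsync over rsh/ssh.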
- Birger
Well, I'll be damned.  I'd never run into that trick.  My apologies.
Tim Conway
tim.conway@philips.com
303.682.4917
Philips Semiconductor - Longmont TC
1880 Industrial Circle, Suite D
Longmont, CO 80501
Available via SameTime Connect within Philips, n9hmg on AIM
perl -e 'print pack(nnnnnnnnnnnn, 19061,29556,8289,28271,29800,25970,8304,25970,27680,26721,25451,25970), ".\n"'
"There are some who call me.... Tim?"
Martin Pool <mbp@samba.org>
12/12/2001 03:27 PM
 
        To:     Tim Conway/LMT/SC/PHILIPS@AMEC
        cc:     birger@takatukaland.de, Dave Dykstra <dwd@bell-labs.com>,
                rsync@samba.org
        Subject:        Re: Problems with rsync 2.5.1pre1 and hardlinks
On 12 Dec 2001, tim.conway@philips.com wrote:
> An additional hard link to an existing file takes only directory
> space, which, if it's not enough of an addition to that directories
> existing data to cause the filesystem driver to add another
> allocation to the directories data space, takes up no more disk
> space.  A symlink, however, has the same effect in the directory,
> but, in addition, gets its own data space, and inode, as well.
Actually on Linux (and some others?) the content of a symlink is
normally stored *inside* the inode, in the space that would otherwise
be used for pointers to the data and indirect blocks.  This only helps
if the target of the symlink, as a string, is short enough to fit into
that field of fifteen 32-bit block pointers, in other words 60 bytes.
These are called "fast symlinks", and you can see them listed
separately in the output of e2fsck.  As well as not occupying any data
blocks, they have the advantage of being read faster and of not
spending any time modifying block allocation maps, etc.
If the symlink is longer, data blocks are allocated as on classic
Unix.
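One way to see the difference on an ext2 filesystem (a sketch; assumes
GNU stat is available):

  # fast symlink: a 5-byte target fits inside the inode
  ln -s short fastlink
  # slow symlink: an 80-byte target forces a real data block
  ln -s `perl -e 'print "x" x 80'` slowlink
  stat -c '%n: %b blocks' fastlink slowlink
  # expect fastlink at 0 blocks and slowlink with a nonzero allocation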