Christian Iversen
2012-Nov-02 13:33 UTC
vanilla rsync 3.0.9 hangs after transferring ~2000 files
Hello rsync folks,

I'm trying to use rsync for backing up our servers. This mostly works extremely well, with no problems. However, one server is giving me a lot of trouble. It has a directory with (currently) 734088 files in it, and every time I try to back up this directory, rsync hangs after transferring roughly 2000 files. Sometimes it's around 1800, sometimes it's over 2100 (I think), but it's in that ballpark.

If I exclude the large directory, rsync completes the backup successfully (albeit incompletely, of course).

I'm running Debian Stable on both the client and server, fully updated. I thought maybe a Debian patch could be interfering, so I've tried vanilla rsync 3.0.9 built straight from the tgz, but that gives the same problem.

Things I've already tried:

- Different MTU
- Disabling/enabling compression in rsync (-z)
- Using --protocol=29
- Other variations on arguments to rsync
- Simply waiting for it to finish (it will sit there for literally days)

This is what it looks like with -vvv:

    ...
    false_alarms=0 hash_hits=0 matches=0
    sender finished data/www/virtual/_www.ageforce.dk/wwwroot/usrPictures/110x140_9f272336db751064bc808f3c5131f84f.jpg
    recv_generator(data/www/virtual/_www.ageforce.dk/wwwroot/usrPictures/110x140_9f2820d5a1c9d1d2b01988138523fe9c.jpg,54940)
    send_files(54940, data/www/virtual/_www.ageforce.dk/wwwroot/usrPictures/110x140_9f2820d5a1c9d1d2b01988138523fe9c.jpg)
    send_files mapped data/www/virtual/_www.ageforce.dk/wwwroot/usrPictures/110x140_9f2820d5a1c9d1d2b01988138523fe9c.jpg of size 4866
    calling match_sums data/www/virtual/_www.ageforce.dk/wwwroot/usrPictures/110x140_9f2820d5a1c9d1d2b01988138523fe9c.jpg
    data/www/virtual/_www.ageforce.dk/wwwroot/usrPictures/110x140_9f2820d5a1c9d1d2b01988138523fe9c.jpg
    sending file_sum
    false_alarms=0 hash_hits=0 matches=0
    sender finished data/www/virtual/_www.ageforce.dk/wwwroot/usrPictures/110x140_9f2820d5a1c9d1d2b01988138523fe9c.jpg
    recv_generator(data/www/virtual/_www.ageforce.dk/wwwroot/usrPictures/110x140_9f299dc6bd996d789b3fc7e73b6e86dd.jpg,54941)
    send_files(54941, data/www/virtual/_www.ageforce.dk/wwwroot/usrPictures/110x140_9f299dc6bd996d789b3fc7e73b6e86dd.jpg)
    send_files mapped data/www/virtual/_www.ageforce.dk/wwwroot/usrPictures/110x140_9f299dc6bd996d789b3fc7e73b6e86dd.jpg of size 4488
    calling match_sums data/www/virtual/_www.ageforce.dk/wwwroot/usrPictures/110x140_9f299dc6bd996d789b3fc7e73b6e86dd.jpg
    data/www/virtual/_www.ageforce.dk/wwwroot/usrPictures/110x140_9f299dc6bd996d789b3fc7e73b6e86dd.jpg
    sending file_sum
    false_alarms=0 hash_hits=0 matches=0
    sender finished data/www/virtual/_www.ageforce.dk/wwwroot/usrPictures/110x140_9f299dc6bd996d789b3fc7e73b6e86dd.jpg

After this, it simply hangs. With strace on the client and server, I can see that they are both stuck in a select() loop. I also tried running the client with ltrace, and after a good long while, I got this output: http://i.imgur.com/wYRDO.png (I couldn't get copy-paste to work from that terminal).
Do you have any ideas what the problem might be? Or how I can help debug it? Right now we have a customer we simply cannot do a backup for, which is pretty bad :-)

Thanks in advance for any input.

--
Best regards,
Christian Iversen
System administrator, Meebox.net
-------
This e-mail may contain confidential information. If you are not the intended recipient, please return and delete this e-mail.
-------
Karl O. Pinc
2012-Nov-02 16:12 UTC
On 11/02/2012 08:33:13 AM, Christian Iversen wrote:
> Hello rsync folks
>
> Do you have any ideas what the problem might be? Or how I can help
> debug it?

I have no idea regarding the problem, but it never hurts to do a tcpdump (-w, possibly -s) and look at what's on the wire. Perhaps you can figure out more by using Wireshark to view the end of the tcpdumps and compare both ends. Make sure ACKs are getting through, and so forth. This would at least eliminate the network as the source of the problem.

But, as I say, I don't know what I'm doing. This is just a thought that could cause you to spend time going down a blind alley.

Regards,
Karl <kop at meme.com>
Free Software: "You don't pay back, you pay forward."
 -- Robert A. Heinlein
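Comparing the tail ends of the client and server captures is the heart of the suggestion above. As a rough illustration, here is a minimal Python sketch, assuming the captures were written with `tcpdump -w` in the classic libpcap format with microsecond timestamps (not pcapng). `tail_packets` is a hypothetical helper, not part of tcpdump or Wireshark; it only lists the timestamp and length of the last few packets so the two ends can be lined up by eye.

```python
import struct
from collections import deque
from typing import BinaryIO, List, Tuple

# Classic libpcap layout: a 24-byte global header, then for each packet a
# 16-byte record header (ts_sec, ts_usec, incl_len, orig_len) plus the data.
MAGIC_USEC = 0xA1B2C3D4


def tail_packets(f: BinaryIO, n: int = 20) -> List[Tuple[float, int, int]]:
    """Return (timestamp, captured_len, original_len) for the last n packets."""
    header = f.read(24)
    if len(header) < 24:
        raise ValueError("too short to be a pcap file")
    # The magic number is stored in the capturing host's byte order.
    if struct.unpack("<I", header[:4])[0] == MAGIC_USEC:
        endian = "<"
    elif struct.unpack(">I", header[:4])[0] == MAGIC_USEC:
        endian = ">"
    else:
        raise ValueError("not a classic microsecond-resolution pcap file")

    last = deque(maxlen=n)  # keeps only the newest n records in memory
    while True:
        rec = f.read(16)
        if len(rec) < 16:
            break  # clean EOF, or a record cut short when tcpdump was stopped
        ts_sec, ts_usec, incl_len, orig_len = struct.unpack(endian + "IIII", rec)
        f.seek(incl_len, 1)  # skip the packet bytes; only metadata is needed
        last.append((ts_sec + ts_usec / 1_000_000, incl_len, orig_len))
    return list(last)
```

For example, calling `tail_packets(open("client.pcap", "rb"), 10)` and the same on the server's file (both file names hypothetical) gives two short lists to compare. Both lengths are returned because `-s` can make the captured length smaller than the length on the wire.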
Karl O. Pinc
2012-Nov-02 16:15 UTC
On 11/02/2012 08:33:13 AM, Christian Iversen wrote:
> Hello rsync folks
>
> However, 1 server is giving me a lot of trouble. It has a directory
> with (currently) 734088 files in it, and every time I try to backup
> this dir, rsync hangs after transferring roughly 2000 files.

Since it might be filesystem-related, you could also tell us what the filesystem is, and try running fsck on it to see if that helps.

Karl
Justin T Pryzby
2012-Nov-02 18:32 UTC
On Fri, Nov 02, 2012 at 02:33:13PM +0100, Christian Iversen wrote:
> However, 1 server is giving me a lot of trouble. It has a directory
> with (currently) 734088 files in it, and every time I try to backup
> this dir, rsync hangs after transferring roughly 2000 files.
> Sometimes it's around 1800, sometimes it's over 2100 (I think), but
> it's in that ballpark.

Is that using rsync:// or rsync over ssh? What is the command line?

There are probably two rsync processes on the server. Could you strace both? How much virtual memory are they using?

    ps e -O vsz -C rsync

What filesystem is it?

Can you create a tarball of the directory?

    time tar c /path/to/dir | wc

Justin
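The tarball test takes rsync and the network out of the picture: if merely reading every file in the directory also stalls, that points at the server itself. A rough Python stand-in for piping a tarball of the directory into `wc`, assuming Python 3 is available on the server; `CountingSink` and `time_tar` are hypothetical names used only for illustration.

```python
import tarfile
import time
from typing import Tuple


class CountingSink:
    """Write-only file-like object that discards data and counts bytes."""

    def __init__(self) -> None:
        self.count = 0

    def write(self, data: bytes) -> int:
        self.count += len(data)
        return len(data)


def time_tar(path: str) -> Tuple[int, float]:
    """Stream a tar archive of `path` into a byte counter.

    Returns (archive_bytes, elapsed_seconds). Every regular file under
    `path` is opened and read in full while the archive is built.
    """
    sink = CountingSink()
    start = time.monotonic()
    # "w|" is tarfile's streaming mode: it only ever calls write() on the
    # target, so no seekable file (and no disk space) is needed.
    with tarfile.open(fileobj=sink, mode="w|") as tar:
        tar.add(path)  # recurses into subdirectories by default
    return sink.count, time.monotonic() - start
```

For example, `time_tar("/path/to/dir")` returns the archive size in bytes and the elapsed time. Streaming into a counter rather than a real file means the test needs no extra disk space, even for a directory with hundreds of thousands of files.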
Christian Iversen
2012-Nov-12 09:52 UTC
On 2012-11-09 02:55, Karl O. Pinc wrote:
> On 11/08/2012 04:40:55 PM, Christian Iversen wrote:
>> On 2012-11-02 17:12, Karl O. Pinc wrote:
>>> On 11/02/2012 08:33:13 AM, Christian Iversen wrote:
>>
>> Thank you very much for the reply. Have to say, I think it's a bit of
>> a red alley / blind herring.
>>
>> Everything is over SSH (which means it's encrypted, and thus appears
>> to be random data). I have also tried with and without compression, no
>> difference. Also, it seems to hang after approx 2000 files each time,
>> which could hardly be because of the network.
>>
>> It's not that I don't want to debug it, but looking at megabytes upon
>> megabytes of encrypted SSH traffic is just not my idea of a good time
>> ;-)
>
> I'm thinking syn, ack, and rst. You can learn a lot from a good
> rst. :-)

I dumped the entire transmission until it stopped:

client: http://ifile.dk/d/f12d6711-e623-4092-a50c-92bb96d67e5b/
server: http://ifile.dk/d/659a775e-7f3c-45ca-a537-68d90503a5b4/

There is no RST when it hangs, but the server does seem to be receiving only 52-byte packets after some incorrect checksum near the end. Any reason to be worried? :-)

--
Best regards,
Christian Iversen
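On the 52-byte packets: assuming the 52 bytes is the IPv4 total length (as `tcpdump -v` reports it, rather than an Ethernet frame length) and that the TCP timestamps option is in use, the arithmetic is 20 bytes of IPv4 header plus 32 bytes of TCP header (20 fixed plus 12 of options), which leaves no room for data. A minimal Python sketch of that calculation from the raw header fields; `tcp_payload_len` is a hypothetical helper that expects a bare IPv4 packet with no link-layer header.

```python
import struct


def tcp_payload_len(packet: bytes) -> int:
    """Bytes of TCP payload in a raw IPv4 packet (no link-layer header)."""
    if packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    ip_header_len = (packet[0] & 0x0F) * 4             # IHL counts 32-bit words
    (total_len,) = struct.unpack("!H", packet[2:4])     # IPv4 total length field
    if packet[9] != 6:                                   # protocol 6 = TCP
        raise ValueError("not a TCP segment")
    tcp_header_len = (packet[ip_header_len + 12] >> 4) * 4  # TCP data offset
    return total_len - ip_header_len - tcp_header_len
```

Under those assumptions a 52-byte packet yields a payload length of 0, so such packets carry no rsync data at all, only TCP control information such as ACKs or window updates.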