Hi all,

today we had a performance issue transferring a large amount of data where one file was over 50GB. Rsync was tunneled over SSH and we expected the data to be synced within hours. However, after over 10 hours the data is still not synced ... The sending box has rsync running with 60-80% CPU load (2GHz Pentium 4) while the receiver is nearly idle.

So far I had no access to the problematic setup but I will have to analyze this soon. I would like to ask beforehand if there are known performance hits syncing such huge files?

Sincerely,
René Rebe
Hi,

in reply to my previous post: I can reproduce the issue locally here. I produced a 50146750688-byte /home/test.dat by cat'ing a lot of data files together (needed some input data ...). The initial rsync took over an hour, saturating the 100 Mbit Ethernet.

I then used shred on the first GB of the file, cat'ed some more data onto the end, and reran rsync:

    rsync -arvPe ssh 192.168.2.45:/home/test.dat test.dat

The sending Athlon XP 2500+ is saturated while the receiver gets:

    619708416   1%    1.03MB/s   13:02:58

Of course the dual-CPU ppc64 receiver is idling, waiting for any data to arrive. The sender:

    22952 root      18   0 12988  11m  624 D 95.0  2.2   5:24.75 rsync

Oprofile shows:

    samples   %        symbol name
    9273739   87.1946  match_sums
    633910     5.9602  map_ptr
    459817     4.3233  mdfour64
    217974     2.0495  copy64
    32467      0.3053  mdfour_update
    11310      0.1063  get_checksum2
    2206       0.0207  writefd_unbuffered
    871        0.0082  sum_update
    638        0.0060  writefd
    522        0.0049  mplex_write
    306        0.0029  compare_targets
    256        0.0024  io_flush
    253        0.0024  matched
    241        0.0023  send_token
    229        0.0022  msg_list_push
    198        0.0019  mdfour_tail
    128        0.0012  write_int
    122        0.0011  readfd_unbuffered
    105        9.9e-04 readfd
    94         8.8e-04 send_files
    69         6.5e-04 mdfour_begin
    63         5.9e-04 mdfour_result
    37         3.5e-04 write_buf
    33         3.1e-04 .plt
    25         2.4e-04 copy4
    25         2.4e-04 read_int
    19         1.8e-04 read_buf
    11         1.0e-04 read_timeout
    10         9.4e-05 get_checksum1
    1          9.4e-06 _fini
    1          9.4e-06 clean_flist
    1          9.4e-06 deflate_fast
    1          9.4e-06 parse_arguments

So far ... I will continue to analyze the issue; maybe some rsync developer already comes to a conclusion while I start reading through the source.

Best regards,

On Monday 09 January 2006 23:38, René Rebe wrote:
> today we had a performance issue transferring a large amount of data where
> one file was over 50GB. Rsync was tunneled over SSH and we expected the data
> to be synced within hours. However, after over 10 hours the data is still not
> synced ... The sending box has rsync running with 60-80% CPU load (2GHz
> Pentium 4) while the receiver is nearly idle.
>
> So far I had no access to the problematic setup but I will have to analyze this
> soon. I would like to ask beforehand if there are known performance hits
> syncing such huge files?

--
René Rebe - Rubensstr. 64 - 12157 Berlin (Europe / Germany)
exactcode.de | t2-project.org
+49 (0)30 255 897 45
On Tue, Jan 10, 2006 at 07:46:19PM +0100, René Rebe wrote:
> of course the dual cpu ppc64 receiver is idling waiting for any data
> to arrive.

There is a known problem with really large numbers of blocks: the hash search algorithm gets too many collisions, and the search routine bogs down. This sounds like what is happening to you. Unfortunately, I can offer no suggestions for relief other than breaking up the file(s) into smaller parts.

..wayne..
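
To put rough numbers on the "too many blocks" problem described above, here is a back-of-the-envelope sketch in Python. It is not rsync's code. The bucket count (2^16), the square-root block-size heuristic and the 700-byte minimum are assumptions made for illustration only, and every function name is hypothetical. It also assumes block checksums spread perfectly evenly over the buckets, so the real collision rate can only be worse than what it prints.

```python
import math

# Assumption: the sender's block lookup table has 2**16 buckets.
ASSUMED_BUCKETS = 1 << 16

# Assumption: smallest block size the heuristic below will pick.
ASSUMED_MIN_BLOCK = 700


def assumed_block_size(file_size: int) -> int:
    """Hypothetical heuristic: about sqrt(file size), rounded down to a
    multiple of 8, but never below ASSUMED_MIN_BLOCK."""
    size = math.isqrt(file_size) & ~7
    return max(size, ASSUMED_MIN_BLOCK)


def block_count(file_size: int, block_size: int) -> int:
    """Number of blocks needed to cover the file (ceiling division)."""
    return -(-file_size // block_size)


def average_chain_length(file_size: int, buckets: int = ASSUMED_BUCKETS) -> float:
    """Best-case average number of blocks sharing one bucket, assuming the
    checksums are distributed perfectly evenly."""
    blocks = block_count(file_size, assumed_block_size(file_size))
    return blocks / buckets


if __name__ == "__main__":
    cases = (
        ("whole file", 50146750688),  # size of /home/test.dat from the thread
        ("1 GiB part", 1 << 30),      # hypothetical part after splitting
    )
    for label, size in cases:
        bsize = assumed_block_size(size)
        print(f"{label}: block size {bsize}, "
              f"{block_count(size, bsize)} blocks, "
              f"{average_chain_length(size):.2f} blocks per bucket")
```

Under these assumptions the whole file ends up with a little over three blocks per bucket even in the best case, while a 1 GiB part stays at 0.5. Since the sender does a lookup at nearly every byte offset of a changed region, longer chains multiply directly into search time, which would fit with match_sums dominating the profile and with the advice to break the file into smaller parts.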