You didn't say if you were networking or what features of rsync you are
using, but if you aren't networking and aren't doing anything fancy you
are probably better off with cp -au, which is essentially the same as
rsync -au except faster.

Anyways, smaller reads and writes are usually better handled by the OS's
caches than really big ones.

On 04/11/2016 07:00 PM, Greg Freemyer wrote:
> All,
>
> One big thing I failed to mention is I was running rsync inside a
> cygwin Windows 8.1 setup.
>
> I moved it to a Linux box and the behavior is much better. I get
> a nice smooth 85-90 MB/sec. That might be the max speed of the
> source drive.
>
> I'd still like a way to improve rsync's performance in cygwin, but
> I can understand it is a low priority.
>
> Thanks
> Greg
>
> On Mon, Apr 11, 2016 at 4:08 PM, Greg Freemyer
> <greg.freemyer at gmail.com> wrote:
>> I hope this isn't a FAQ.
>>
>> Per the man page I see ways to control the blocksize for hash
>> comparison reasons, but no way to control it for I/O performance
>> reasons.
>>
>> I'm using rsync to copy folder trees full of large files and I'd
>> like to have control of how much data is read / written at a
>> time. Maybe read 10 MB, write 10 MB, etc.
>>
>> Is there an existing way to do that?
>>
>> == details ==
>>
>> When copying a bunch of 1.5 GB files with rsync, I'm only seeing
>> 50% of the throughput I hope to see.
>>
>> I haven't looked at the code, or even run strace, but it seems
>> like the code is doing something like:
>>
>>     while (files) {
>>         read 1.5 GB file to ram
>>         write 1.5 GB file from ram
>>         fsync()   /* ensure 1.5 GB file is on disk */
>>     }
>>
>> I say that because I see several seconds of high-speed reading,
>> then no reads.
>>
>> When the reads stop, I see writes kick in, then they stop and
>> reads start up again.
>>
>> The end result is I'm only using 50% of the available bandwidth.
>>
>> Note that I'm copying my source folder tree to a newly created
>> folder tree, so there is not any reading of the destination
>> needed. My ultimate would be something like:
>>
>>     while (files) {
>>         while (data_in_file) {
>>             read user_defined_blocksize to ram from file
>>             write user_defined_blocksize from ram to file
>>         }
>>         fsync()   /* ensure 1.5 GB file is on disk */
>>     }
>>
>> Thanks
>> Greg
>> --
>> Greg Freemyer
>> www.IntelligentAvatar.net

--
Kevin Korb                    Phone: (407) 252-6853
Systems Administrator
FutureQuest, Inc.             Kevin at FutureQuest.net (work)
Orlando, Florida              kmk at sanitarium.net (personal)
Web page: http://www.sanitarium.net/
PGP public key available on web site.
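The user-defined-blocksize loop Greg asks for is essentially what dd
already exposes through its bs= operand. A minimal runnable sketch of
that chunked copy (the file is shrunk to 4 MB for the demo and the /tmp
paths are made up; conv=fsync flushes the output data before dd exits):

```shell
# Chunked copy with an explicit I/O block size, as in Greg's pseudocode.
src=/tmp/chunk_demo_src.bin
dst=/tmp/chunk_demo_dst.bin
dd if=/dev/zero of="$src" bs=1M count=4 2>/dev/null    # create 4 MB of test data
dd if="$src" of="$dst" bs=10M conv=fsync 2>/dev/null   # 10 MB reads/writes, fsync at end
cmp -s "$src" "$dst" && echo "copies match"
```

This is not a replacement for rsync's tree handling; it only shows that
the read/write block size is directly controllable at this level.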
I'm just doing a local copy:

    rsync -avp --progress <source_dir> <dest_dir>

The source and dest are on different spindles.

Some of my copies are a TB or more (I just started one that is 1.5 TB).

It is my assumption (possibly faulty) that rsync is more robust at
handling any aborted copies that have to get restarted after the copy
failed, thus my preference for rsync.

3 performance numbers, all with the exact same drives. They are USB-3 and
I'm moving them between a Windows and a Linux computer:

- Robocopy on a beefy Windows box - 105 MB/sec
- rsync on the Windows box - 70 MB/sec
- rsync on an old Linux laptop - 90 MB/sec

It seems to me rsync could run faster on both boxes, but 70 MB/sec is
particularly bad.

> Anyways, smaller reads and writes are usually better handled by the
> OS's caches than really big ones.

Exactly. Watching resource manager in Windows made me think rsync was
reading in the full 1.5 GB file before writing anything. Maybe it is
just some weird Windows kernel behavior?

As a test, in Linux I started up 2 rsyncs running in parallel.
Different source media, but the same destination (it's a faster drive
than the source media).

I got 120 MB/sec write speeds to the destination in that mode. Both of
the source drives slowed down to 60 MB/sec to compensate.

I was very pleased with the parallel rsync test.

Greg
--
Greg Freemyer
www.IntelligentAvatar.net

On Mon, Apr 11, 2016 at 7:05 PM, Kevin Korb <kmk at sanitarium.net> wrote:
> [snip]

--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Yeah, you aren't doing anything there that cp -auv wouldn't do better.

In terms of an abort, rsync with your command line would delete the
partial file it was working on, so no help there. If you did add
--partial to prevent that, it wouldn't matter, because rsync forces
--whole-file on local copies, so it is going to redo the file anyway.

On 04/11/2016 07:33 PM, Greg Freemyer wrote:
> [snip]

--
Kevin Korb
At 01:33 12.04.2016, Greg Freemyer wrote:
> I'm just doing a local copy:
> rsync -avp --progress <source_dir> <dest_dir>

Just as side information: in local copies all files are copied wholly;
the diff algorithm is not in effect. So if a file changes, it is still
copied completely (without --partial, --no-whole-file, etc.).

Second thing: from what I remember, rsync does a lot of stat calls to
get every file's properties. This is more expensive on cygwin/Windows
than on Linux directly. Rsync also uses processes/threads, which are
easier and faster to create and switch to on Linux than on Windows. A
Windows-native implementation of rsync could run faster than the
original rsync with the cygwin layer.

Some time ago somebody announced a new program using the rsync
algorithm, but I never used it so I don't know about the features or
speed: http://www.acrosync.com/windows.html

bye  Fabi
On Mon, Apr 11, 2016 at 7:05 PM, Kevin Korb <kmk at sanitarium.net> wrote:
> You didn't say if you were networking or what features of rsync you
> are using but if you aren't networking and aren't doing anything fancy
> you are probably better off with cp -au which is essentially the same
> as rsync -au except faster.

I was curious if "cp -au" was indeed as robust as rsync.

No, it isn't. My test:

1. Create a folder with numerous files in it (a dozen in my case).
   Have one of them be 9 GB (or anything relatively big).
2. cp -au <src-folder> <dest-folder>
3. Look in the destination folder, and when you see the 9 GB file
   growing, kill "cp -au" (I just did a control-C).
4. Restart "cp -au".

I ended up with a truncated copy of the 9 GB file (roughly a 3 GB file).

The copy I did yesterday was about 1200 files. Almost all were about
1.5 GB in size, so that was a multi-hour process to make the copy.

Using rsync, I can kill the copy at any time (by desire or system
issue) and just restart it.

Using the simple "rsync -avp --progress" command I end up recopying the
file that was in progress when rsync was aborted, but 1.5 GB files only
take 10 or 15 seconds to copy, so that is a minimal wasted effort when
considering a copy process that runs for hours.

fyi: In my job I work with 100 GB+ read-only datasets all the time. The
tools are all designed to segment the data into 1.5 GB files. One
advantage is that if a file becomes corrupt, just that segment file has
to be replaced. All the large files are validated via MD5 hash (or
SHA-256, etc.). I keep a minimum of two copies of all datasets.
Yesterday I was making a third copy of several of the datasets, so I
had almost 2 TB of data to copy.

Thanks
Greg
--
Greg Freemyer
www.IntelligentAvatar.net
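The segment-validation workflow Greg mentions is typically a manifest
of hashes checked with md5sum -c. A tiny sketch, with a made-up segment
file standing in for a 1.5 GB dataset chunk:

```shell
# Hash each segment once, then re-verify any copy against the manifest.
segdir=$(mktemp -d)
cd "$segdir"
printf 'segment payload' > seg001.bin
md5sum seg001.bin > manifest.md5    # record the hash
md5sum -c manifest.md5              # verify; prints "seg001.bin: OK"
```

Re-running the md5sum -c step against each copy is what lets a single
corrupt segment be identified and replaced without recopying the whole
dataset.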
In that instance you would need to delete the incomplete file. The same
would happen if you used -u on rsync, but -u is cp's only method of
avoiding files that are already there.

On 04/12/2016 02:54 PM, Greg Freemyer wrote:
> [snip]

--
Kevin Korb
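Kevin's warning can be seen directly: cp -u decides by timestamp alone,
so a truncated leftover that is not older than its source gets skipped.
A minimal sketch with throwaway files (names are made up):

```shell
# Simulate an aborted copy: dst is a truncated leftover, not older than src.
d=$(mktemp -d)
echo "full contents" > "$d/src"
echo "trunc" > "$d/dst"     # pretend this is the aborted partial copy
touch "$d/dst"              # its mtime is now >= the source's
cp -u "$d/src" "$d/dst"     # -u copies only if source is newer -> skipped
cat "$d/dst"                # still the truncated contents
```

rsync with default options avoids this trap because it compares size as
well as mtime, so the truncated file fails the size check and gets
recopied.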