You didn't say if you were networking or what features of rsync you are
using, but if you aren't networking and aren't doing anything fancy you
are probably better off with cp -au, which is essentially the same as
rsync -au except faster.

Anyways, smaller reads and writes are usually better handled by the OS's
caches than really big ones.

On 04/11/2016 07:00 PM, Greg Freemyer wrote:
> All,
>
> One big thing I failed to mention is I was running rsync inside a
> cygwin Windows 8.1 setup.
>
> I moved it to a Linux box and the behavior is much better. I get
> a nice smooth 85-90 MB/sec. That might be the max speed of the
> source drive.
>
> I'd still like a way to improve rsync's performance in cygwin, but
> I can understand it is a low priority.
>
> Thanks
> Greg
>
> On Mon, Apr 11, 2016 at 4:08 PM, Greg Freemyer
> <greg.freemyer at gmail.com> wrote:
>> I hope this isn't a FAQ.
>>
>> Per the man page I see ways to control the blocksize for hash
>> comparison reasons, but no way to control it for I/O performance
>> reasons.
>>
>> I'm using rsync to copy folder trees full of large files and I'd
>> like to have control of how much data is read / written at a
>> time. Maybe read 10 MB, write 10 MB, etc.
>>
>> Is there an existing way to do that?
>>
>> == details ==
>>
>> When copying a bunch of 1.5 GB files with rsync, I'm only seeing
>> 50% of the throughput I hope to see.
>>
>> I haven't looked at the code, or even run strace, but it seems
>> like the code is doing something like:
>>
>>     while (files) {
>>         read 1.5 GB file to ram
>>         write 1.5 GB file from ram
>>         fsync()   /* ensure 1.5 GB file is on disk */
>>     }
>>
>> I say that because I see several seconds of high-speed reading,
>> then no reads.
>>
>> When the reads stop, I see writes kick in, then they stop and
>> reads start up again.
>>
>> The end result is I'm only using 50% of the available bandwidth.
>>
>> Note that I'm copying my source folder tree to a newly created
>> folder tree, so there is not any reading of the destination
>> needed. My ultimate would be something like:
>>
>>     while (files) {
>>         while (data_in_file) {
>>             read user_defined_blocksize to ram from file
>>             write user_defined_blocksize from ram to file
>>         }
>>         fsync()   /* ensure 1.5 GB file is on disk */
>>     }
>>
>> Thanks
>> Greg
>> --
>> Greg Freemyer
>> www.IntelligentAvatar.net

--
Kevin Korb                    Phone: (407) 252-6853
Systems Administrator
FutureQuest, Inc.             Kevin at FutureQuest.net (work)
Orlando, Florida              kmk at sanitarium.net (personal)
Web page: http://www.sanitarium.net/
PGP public key available on web site.
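The user-defined-blocksize loop Greg asks for is essentially what dd
already exposes through its bs= operand. A minimal runnable sketch of
that chunked copy (the file is shrunk to 4 MB for the demo and the /tmp
paths are made up; conv=fsync flushes the output data before dd exits):

```shell
# Chunked copy with an explicit I/O block size, as in Greg's pseudocode.
src=/tmp/chunk_demo_src.bin
dst=/tmp/chunk_demo_dst.bin
dd if=/dev/zero of="$src" bs=1M count=4 2>/dev/null    # create 4 MB of test data
dd if="$src" of="$dst" bs=10M conv=fsync 2>/dev/null   # 10 MB reads/writes, fsync at end
cmp -s "$src" "$dst" && echo "copies match"
```

This is not a replacement for rsync's tree handling; it only shows that
the read/write block size is directly controllable at this level.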
I'm just doing a local copy:

    rsync -avp --progress <source_dir> <dest_dir>

The source and dest are on different spindles.

Some of my copies are a TB or more (I just started one that is 1.5 TB).

It is my assumption (possibly faulty) that rsync is more robust at
handling any aborted copies that have to get restarted after the copy
failed, thus my preference for rsync.

3 performance numbers, all with the exact same drives. They are USB-3 and
I'm moving them between a Windows and a Linux computer:

- Robocopy on a beefy Windows box - 105 MB/sec
- rsync on the Windows box - 70 MB/sec
- rsync on an old Linux laptop - 90 MB/sec

It seems to me rsync could run faster on both boxes, but 70 MB/sec is
particularly bad.

> Anyways, smaller reads and writes are usually better handled by the
> OS's caches than really big ones.

Exactly. Watching resource manager in Windows made me think rsync was
reading in the full 1.5 GB file before writing anything. Maybe it is
just some weird Windows kernel behavior?

As a test, in Linux I started up 2 rsyncs running in parallel.
Different source media, but the same destination (it's a faster drive
than the source media).

I got 120 MB/sec write speeds to the destination in that mode. Both of
the source drives slowed down to 60 MB/sec to compensate.

I was very pleased with the parallel rsync test.

Greg
--
Greg Freemyer
www.IntelligentAvatar.net

On Mon, Apr 11, 2016 at 7:05 PM, Kevin Korb <kmk at sanitarium.net> wrote:
> [snip]

--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Yeah, you aren't doing anything there that cp -auv wouldn't do better.

In terms of an abort, rsync with your command line would delete the
partial file it was working on, so no help there. If you did add
--partial to prevent that, it wouldn't matter, because rsync forces
--whole-file on local copies, so it is going to redo the file anyway.

On 04/11/2016 07:33 PM, Greg Freemyer wrote:
> [snip]

--
Kevin Korb
At 01:33 12.04.2016, Greg Freemyer wrote:
> I'm just doing a local copy:
> rsync -avp --progress <source_dir> <dest_dir>

Just as side information: in local copies all files are copied wholly;
the diff algorithm is not in effect. So if a file changes, it is still
copied completely (without --partial, --no-whole-file, etc.).

Second thing: from what I remember, rsync does a lot of stat calls to
get every file's properties. This is more expensive on cygwin/Windows
than on Linux directly. Rsync also uses processes/threads, which are
easier and faster to create and switch to on Linux than on Windows. A
Windows-native implementation of rsync could run faster than the
original rsync with the cygwin layer.

Some time ago somebody announced a new program using the rsync
algorithm, but I never used it so I don't know about the features or
speed: http://www.acrosync.com/windows.html

bye  Fabi
On Mon, Apr 11, 2016 at 7:05 PM, Kevin Korb <kmk at sanitarium.net> wrote:
> You didn't say if you were networking or what features of rsync you
> are using but if you aren't networking and aren't doing anything fancy
> you are probably better off with cp -au which is essentially the same
> as rsync -au except faster.

I was curious if "cp -au" was indeed as robust as rsync.

No, it isn't. My test:

1. Create a folder with numerous files in it (a dozen in my case).
   Have one of them be 9 GB (or anything relatively big).
2. cp -au <src-folder> <dest-folder>
3. Look in the destination folder, and when you see the 9 GB file
   growing, kill "cp -au" (I just did a control-C).
4. Restart "cp -au".

I ended up with a truncated copy of the 9 GB file (roughly a 3 GB file).

The copy I did yesterday was about 1200 files. Almost all were about
1.5 GB in size, so that was a multi-hour process to make the copy.

Using rsync, I can kill the copy at any time (by desire or system
issue) and just restart it.

Using the simple "rsync -avp --progress" command I end up recopying the
file that was in progress when rsync was aborted, but 1.5 GB files only
take 10 or 15 seconds to copy, so that is a minimal wasted effort when
considering a copy process that runs for hours.

fyi: In my job I work with 100 GB+ read-only datasets all the time. The
tools are all designed to segment the data into 1.5 GB files. One
advantage is that if a file becomes corrupt, just that segment file has
to be replaced. All the large files are validated via MD5 hash (or
SHA-256, etc.). I keep a minimum of two copies of all datasets.
Yesterday I was making a third copy of several of the datasets, so I
had almost 2 TB of data to copy.

Thanks
Greg
--
Greg Freemyer
www.IntelligentAvatar.net
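The segment-validation workflow Greg mentions is typically a manifest
of hashes checked with md5sum -c. A tiny sketch, with a made-up segment
file standing in for a 1.5 GB dataset chunk:

```shell
# Hash each segment once, then re-verify any copy against the manifest.
segdir=$(mktemp -d)
cd "$segdir"
printf 'segment payload' > seg001.bin
md5sum seg001.bin > manifest.md5    # record the hash
md5sum -c manifest.md5              # verify; prints "seg001.bin: OK"
```

Re-running the md5sum -c step against each copy is what lets a single
corrupt segment be identified and replaced without recopying the whole
dataset.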
In that instance you would need to delete the incomplete file. The same
would happen if you used -u on rsync, but -u is cp's only method of
avoiding files that are already there.

On 04/12/2016 02:54 PM, Greg Freemyer wrote:
> [snip]

--
Kevin Korb
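Kevin's warning can be seen directly: cp -u decides by timestamp alone,
so a truncated leftover that is not older than its source gets skipped.
A minimal sketch with throwaway files (names are made up):

```shell
# Simulate an aborted copy: dst is a truncated leftover, not older than src.
d=$(mktemp -d)
echo "full contents" > "$d/src"
echo "trunc" > "$d/dst"     # pretend this is the aborted partial copy
touch "$d/dst"              # its mtime is now >= the source's
cp -u "$d/src" "$d/dst"     # -u copies only if source is newer -> skipped
cat "$d/dst"                # still the truncated contents
```

rsync with default options avoids this trap because it compares size as
well as mtime, so the truncated file fails the size check and gets
recopied.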