thr3ads.net - Gluster users - [Gluster-users] Disastrous performance with rsync to mounted Gluster volume. [Apr 2015]

Ben Turner

2015-Apr-27 21:56 UTC

[Gluster-users] Disastrous performance with rsync to mounted Gluster volume.

----- Original Message -----
> From: "David Robinson" <david.robinson at corvidtec.com>
> To: "Ben Turner" <bturner at redhat.com>, "Ernie Dunbar" <maillist at lightspeed.ca>
> Cc: "Gluster Users" <gluster-users at gluster.org>
> Sent: Monday, April 27, 2015 5:21:08 PM
> Subject: Re[2]: [Gluster-users] Disastrous performance with rsync to mounted Gluster volume.
> 
> I am also having a terrible time with rsync and gluster.  The vast
> majority of my time is spent figuring out what to sync...  This sync
> takes 17 hours even though very little data is being transferred.
> 
> sent 120,523 bytes  received 74,485,191,265 bytes  1,210,720.02 bytes/sec
> total size is 27,589,660,889,910  speedup is 370.40
> 
Maybe we could try something to confirm / deny my theory.  What about asking
rsync to ignore anything that could differ between bricks in a replicated pair?
A couple of options I see are:

--size-only means that rsync will skip files that match in size, even if the
timestamps differ. This means it will synchronise fewer files than the default
behaviour. It will miss any file with changes that don't affect the overall
file size.

--ignore-times means that rsync will not skip any file, even if the timestamps
and file sizes match; every file goes through the transfer algorithm. This means
it will synchronise more files than the default behaviour. It will pick up
changes to files even where the file size is the same and the modification
date/time has been reset to the original value (resetting the date/time is
unlikely to be done in practice, but it could happen).
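
For reference, here is a minimal sketch of the per-file decision those two options change. It does not call rsync; it only mimics the documented comparison (size plus modification time by default, size alone with --size-only). GNU stat is assumed, and the function names quick_check and size_only_check are made up for illustration. On a real run the option is simply added to the rsync command line, for example `rsync -av --size-only gromm/ /mnt/gromm/`.

```shell
#!/bin/sh
# Sketch only: mimics the comparisons, it is not rsync's actual code.
# Assumes GNU stat (-c '%s' = size in bytes, '%Y' = mtime in epoch seconds).

# Default quick check: a file is skipped only if size AND mtime both match.
quick_check() {
    src_meta=$(stat -c '%s %Y' "$1")
    dst_meta=$(stat -c '%s %Y' "$2")
    if [ "$src_meta" = "$dst_meta" ]; then echo skip; else echo transfer; fi
}

# --size-only: a file is skipped whenever the sizes match, whatever the mtime.
size_only_check() {
    if [ "$(stat -c '%s' "$1")" = "$(stat -c '%s' "$2")" ]; then
        echo skip
    else
        echo transfer
    fi
}

# --ignore-times needs no comparison at all: every file is treated as "transfer".
```

If quick_check reports "transfer" for a pair of files whose contents are known to match while size_only_check reports "skip", then differing timestamps are what is triggering the extra work.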

These may also help, though they look to be aimed more at recovering from brick failures:

http://blog.gluster.org/category/rsync/
https://mjanja.ch/2014/07/parallelizing-rsync/?utm_source=rss&utm_medium=rss&utm_campaign=parallelizing-rsync#sync_brick

I'll try some stuff in the lab and see if I can come up with an RCA or
something that helps.

-b
 > 
> ------ Original Message ------
> From: "Ben Turner" <bturner at redhat.com>
> To: "Ernie Dunbar" <maillist at lightspeed.ca>
> Cc: "Gluster Users" <gluster-users at gluster.org>
> Sent: 4/27/2015 4:52:35 PM
> Subject: Re: [Gluster-users] Disastrous performance with rsync to
> mounted Gluster volume.
> 
> >----- Original Message -----
> >>  From: "Ernie Dunbar" <maillist at lightspeed.ca>
> >>  To: "Gluster Users" <gluster-users at gluster.org>
> >>  Sent: Monday, April 27, 2015 4:24:56 PM
> >>  Subject: Re: [Gluster-users] Disastrous performance with rsync to
> >>  mounted Gluster volume.
> >>
> >>  On 2015-04-24 11:43, Joe Julian wrote:
> >>
> >>  >> This should get you where you need to be.  Before you start to
> >>  >> migrate the data maybe do a couple DDs and send me the output so
> >>  >> we can get an idea of how your cluster performs:
> >>  >>
> >>  >> time (dd if=/dev/zero of=<gluster-mount>/myfile bs=1024k count=1000; sync)
> >>  >> echo 3 > /proc/sys/vm/drop_caches
> >>  >> dd if=<gluster-mount>/myfile of=/dev/null bs=1024k count=1000
> >>  >>
> >>  >> If you are using gigabit and glusterfs mounts with replica 2 you
> >>  >> should get ~55 MB / sec writes and ~110 MB / sec reads.  With NFS
> >>  >> you will take a bit of a hit since NFS doesn't know where files
> >>  >> live like glusterfs does.
> >>
> >>  After copying our data and doing a couple of very slow rsyncs, I did
> >>  your speed test and came back with these results:
> >>
> >>  1048576 bytes (1.0 MB) copied, 0.0307951 s, 34.1 MB/s
> >>  root at backup:/home/webmailbak# dd if=/dev/zero of=/mnt/testfile
> >>  count=1024 bs=1024; sync
> >>  1024+0 records in
> >>  1024+0 records out
> >>  1048576 bytes (1.0 MB) copied, 0.0298592 s, 35.1 MB/s
> >>  root at backup:/home/webmailbak# dd if=/dev/zero of=/mnt/testfile
> >>  count=1024 bs=1024; sync
> >>  1024+0 records in
> >>  1024+0 records out
> >>  1048576 bytes (1.0 MB) copied, 0.0501495 s, 20.9 MB/s
> >>  root at backup:/home/webmailbak# echo 3 > /proc/sys/vm/drop_caches
> >>  root at backup:/home/webmailbak# # dd if=/mnt/testfile of=/dev/null
> >>  bs=1024k count=1000
> >>  1+0 records in
> >>  1+0 records out
> >>  1048576 bytes (1.0 MB) copied, 0.0124498 s, 84.2 MB/s
> >>
> >>
> >>  Keep in mind that this is an NFS share over the network.
> >>
> >>  I've also noticed that if I increase the count of those writes, the
> >>  transfer speed increases as well:
> >>
> >>  2097152 bytes (2.1 MB) copied, 0.036291 s, 57.8 MB/s
> >>  root at backup:/home/webmailbak# dd if=/dev/zero of=/mnt/testfile
> >>  count=2048 bs=1024; sync
> >>  2048+0 records in
> >>  2048+0 records out
> >>  2097152 bytes (2.1 MB) copied, 0.0362724 s, 57.8 MB/s
> >>  root at backup:/home/webmailbak# dd if=/dev/zero of=/mnt/testfile
> >>  count=2048 bs=1024; sync
> >>  2048+0 records in
> >>  2048+0 records out
> >>  2097152 bytes (2.1 MB) copied, 0.0360319 s, 58.2 MB/s
> >>  root at backup:/home/webmailbak# dd if=/dev/zero of=/mnt/testfile
> >>  count=10240 bs=1024; sync
> >>  10240+0 records in
> >>  10240+0 records out
> >>  10485760 bytes (10 MB) copied, 0.127219 s, 82.4 MB/s
> >>  root at backup:/home/webmailbak# dd if=/dev/zero of=/mnt/testfile
> >>  count=10240 bs=1024; sync
> >>  10240+0 records in
> >>  10240+0 records out
> >>  10485760 bytes (10 MB) copied, 0.128671 s, 81.5 MB/s
> >
> >This is correct, there is overhead that happens with small files and
> >the smaller the file the less throughput you get.  That said, since
> >files are smaller you should get more files / second but less MB /
> >second.  I have found that when you go under 16k changing file size
> >doesn't matter, you will get the same number of 16k files / second as
> >you do 1k files.
> >
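
One rough way to see the files-per-second behaviour described above is to write a batch of equally sized small files and time it. This is only a sketch under assumptions: small_file_test and the smallfile.N names are made up, GNU date is assumed for sub-second timestamps, and the directory argument would be a scratch directory on the Gluster mount.

```shell
#!/bin/sh
# Hypothetical helper: write COUNT files of SIZE bytes into DIR and
# print the achieved rate in files/second.
small_file_test() {
    dir=$1
    count=$2
    size=$3
    start=$(date +%s.%N)              # GNU date: %N is nanoseconds
    i=0
    while [ "$i" -lt "$count" ]; do
        # head -c copies exactly SIZE zero bytes into each file
        head -c "$size" /dev/zero > "$dir/smallfile.$i"
        i=$((i + 1))
    done
    sync                              # flush dirty pages before stopping the clock
    end=$(date +%s.%N)
    awk -v n="$count" -v s="$start" -v e="$end" 'BEGIN {
        t = e - s
        if (t <= 0) t = 0.000001      # guard against a zero interval
        printf "%.1f files/sec\n", n / t
    }'
}
```

Running it once with a size of 1024 and once with 16384 against the same directory would show whether the files/second figure really stays flat below 16k.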
> >>
> >>
> >>  However, the biggest stumbling block for rsync seems to be changes to
> >>  directories. I'm unsure about what exactly it's doing (probably changing
> >>  last access times?) but these minor writes seem to take a very long time
> >>  when normally they would not. Actual file copies (as in the very files
> >>  that are actually new within those same directories) appear to take
> >>  quite a lot less time than the directory updates.
> >
> >Dragons be here!  Access time is not kept in sync across the
> >replicas (IIRC, someone correct me if I am wrong!) and each time a dir
> >is read from a different brick I bet the access time is different.
> >
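
A quick way to test that guess would be to stat the same directory on both bricks and compare the timestamps. Sketch only: compare_brick_times is a made-up helper, GNU stat is assumed, and the two arguments are the brick-side paths of one directory, which will differ per setup.

```shell
#!/bin/sh
# Hypothetical check: report whether atime and mtime of one directory
# differ between two brick paths.
# Assumes GNU stat ('%X' = atime, '%Y' = mtime, both in epoch seconds).
compare_brick_times() {
    a_atime=$(stat -c '%X' "$1")
    b_atime=$(stat -c '%X' "$2")
    a_mtime=$(stat -c '%Y' "$1")
    b_mtime=$(stat -c '%Y' "$2")
    if [ "$a_atime" = "$b_atime" ]; then
        echo "atime same"
    else
        echo "atime differs"
    fi
    if [ "$a_mtime" = "$b_mtime" ]; then
        echo "mtime same"
    else
        echo "mtime differs"
    fi
}
```

Since stat reports each timestamp separately, the output shows whether only the access time differs between the bricks or the modification time does as well.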
> >>
> >>  For example:
> >>
> >>  # time rsync -av --inplace --whole-file --ignore-existing --delete-after gromm/* /mnt/gromm/
> >>  building file list ... done
> >>  Maildir/                        ## This part takes a long time.
> >>  Maildir/.INBOX.Trash/
> >>  Maildir/.INBOX.Trash/cur/
> >>  Maildir/.INBOX.Trash/cur/1429836077.H817602P21531.pop.lightspeed.ca:2,S
> >>  Maildir/.INBOX.Trash/tmp/       ## The previous three lines took nearly
> >>  no time at all.
> >>  Maildir/cur/                    ## This takes a long time.
> >>  Maildir/cur/1430160436.H952679P13870.pop.lightspeed.ca:2,S
> >>  Maildir/new/
> >>  Maildir/tmp/                    ## The previous lines again take no time
> >>  at all.
> >>  deleting Maildir/cur/1429836077.H817602P21531.pop.lightspeed.ca:2,S
> >>  ## This delete did take a while.
> >>  sent 1327634 bytes  received 75 bytes  59009.29 bytes/sec
> >>  total size is 624491648  speedup is 470.35
> >>
> >>  real 0m26.110s
> >>  user 0m0.140s
> >>  sys 0m1.596s
> >>
> >>
> >>  So, rsync reports that it wrote 1327634 bytes at 59 kBytes/sec, and the
> >>  whole operation took 26 seconds. To write 2 files that were around 20-30
> >>  kBytes each and delete 1.
> >>
> >>  The last rsync took around 56 minutes, when normally such an rsync would
> >>  have taken 5-10 minutes, writing over the network via ssh.
> >
> >It may have something to do with the access times not being in sync
> >across replicated pairs.  Maybe someone has experience with this?
> >Could this be tripping up rsync?
> >
> >-b
> >
> >>  _______________________________________________
> >>  Gluster-users mailing list
> >>  Gluster-users at gluster.org
> >>  http://www.gluster.org/mailman/listinfo/gluster-users
> >>
> 
>

David Robinson

2015-Apr-27 22:17 UTC


[Gluster-users] Disastrous performance with rsync to mounted Gluster volume.

Do you think this issue is related to the one seen when you have 'ls' 
aliased to 'ls -F' or 'ls --color=auto'?
I included a snippet from a previous email that I had sent to the 
gluster devels (see below).

David


 > My code developers were moved over to the gluster 3.6.1 system and were
 > struggling to use it due to extremely poor performance. The issue was
 > when you went into a directory for the first time, the system would hang
 > for 5-10 seconds before letting you list the contents of the
 > directories. This was worse for directories with larger numbers of
 > files (approx. 200 files). I noticed that this only happened for
 > certain users and ended up tracing it out to alias settings for the "ls"
 > command.
 >
 > 'ls -F'
 > or
 > alias ls='ls --color=auto' #.... Default alias setting for bash
 >
 > Without these settings, the ls on a new directory takes less than a
 > second. With either of these alias settings, it can take 5-10 seconds
 > which makes code development extremely difficult. Note that after you
 > do an ls in a directory once, you can repeat it without the severe lag.
 > I assume it is caching this information.

Ahhh yeah. I'm pretty sure this one is a known problem, with the root cause
being that some options for ls cause it to do a stat against every file in
the directory (which then has to reach out to every server, for each file,
to find out which one has the latest info to report back).
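
To check whether an alias is responsible on a given account, something like the following bypasses it. plain_ls is a hypothetical helper; it relies on `command` skipping shell aliases and functions, and on the assumption that a plain `ls -1` normally does not need to stat every entry (which should hold for GNU ls, but is worth confirming on your system).

```shell
#!/bin/sh
# Hypothetical helper: list names one per line while bypassing any
# alias or function named "ls", so no -F or --color=auto is applied.
plain_ls() {
    command ls -1 "$@"
}

# Inspect what "ls" currently resolves to in an interactive shell:
#   type ls
# Drop an alias for the current session only:
#   unalias ls
```

Timing `plain_ls <dir>` against the aliased `ls <dir>` on a directory that has not been listed yet should show whether the per-file stat accounts for the 5-10 second hang.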

