Ben,

I suspect meta-data / 'ls -l' performance is very important for my svn use-case.

Having said that, what do you mean by small-file performance? I thought what people meant by this was really the overhead of meta-data, with an 'ls -l' being a sort of extreme case (pure meta-data). Obviously if you also have to read and write actual data (albeit not much at all per file), then the effect of the meta-data overhead would get diluted to a degree, but potentially still be very present.

Would there be an easy way to tell how much time is spent on meta-data vs. data in a profile output?

One thing I wonder: do your comments apply to both native FUSE and NFS mounts?

Finally, all this brings me back to my initial question: are there any configuration tuning recommendations for my requirement (small-file reads/writes on a pair of nodes with replication) beyond the thread counts and lookup-optimize? Or are those by far the most important in this scenario?

Thx,
Thibault.

----- Original Message -----
> From: hmlth at t-hamel.fr
> To: abauer at magix.net
> Cc: gluster-users at gluster.org
> Sent: Monday, September 28, 2015 7:40:52 AM
> Subject: Re: [Gluster-users] Tuning for small files
>
> I'm also quite interested in small-file performance optimization, but
> I'm a bit confused about the best option between 3.6/3.7.
>
> Ben Turner was saying that 3.6 might give the best performance:
> http://www.gluster.org/pipermail/gluster-users/2015-September/023733.html
>
> What kind of gain is expected (with consistent-metadata) if this
> regression is solved?

Just to be clear, the issue I am talking about is metadata only (think 'ls -l' or file browsing). It doesn't affect small-file perf much (I'm sure a little, but I have never quantified it). With server and client event threads set to 4 plus lookup-optimize, I see a 200-300% gain on my systems on 3.7 vs. 3.6 builds. If I needed fast metadata I would go with 3.6; if I needed fast smallfile I would go with 3.7. If I needed both I would pick the lesser of the two evils, go with that one, and upgrade when the fix is released.

-b

> I tried 3.6.5 (the last version for Debian jessie), and it's a bit better
> than 3.7.4, but not by much (10-15%).
>
> I was also wondering if there are recommendations for the underlying file
> system of the bricks (xfs, ext4, tuning...).
>
> Regards
>
> Thomas HAMEL
>
> On 2015-09-28 12:04, André Bauer wrote:
> > If you're not already on GlusterFS 3.7.x I would recommend an update
> > first.
> >
> > On 25.09.2015 at 17:49, Thibault Godouet wrote:
> >> Hi,
> >>
> >> There are quite a few tuning parameters for Gluster (as seen in
> >> 'gluster volume get XYZ all'), but I didn't find much documentation
> >> on those. Some people do seem to set at least some of them, so the
> >> knowledge must be somewhere...
> >>
> >> Is there a good source of information to understand what they mean,
> >> and recommendations on how to set them to get good small-file
> >> performance?
> >>
> >> Basically what I'm trying to optimize is svn operations (e.g. svn
> >> checkout, or svn branch) on a replicated 2 x 1 volume (hosted on
> >> 2 VMs, 16GB RAM, 4 cores each, 10Gb/s network tested at full speed),
> >> using an NFS mount, which appears much faster than FUSE in this case
> >> (but still much slower than when served by a normal NFS server).
> >> Any recommendation for such a setup?
> >>
> >> Thanks,
> >> Thibault.
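For reference, the setup under discussion maps to commands along these lines. This is only a sketch with assumed device, server, volume, and mount-point names; the XFS inode-size option reflects common Gluster guidance (512-byte inodes leave room for Gluster's extended attributes) rather than anything confirmed in this thread:

    # Format and mount a brick (device name assumed)
    mkfs.xfs -i size=512 /dev/sdb1
    mount /dev/sdb1 /bricks/brick1

    # Native FUSE mount of the volume
    mount -t glusterfs server1:/shared /mnt/shared-fuse

    # Gluster NFS mount; Gluster's built-in NFS server speaks NFSv3
    mount -t nfs -o vers=3,tcp server1:/shared /mnt/shared-nfs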
----- Original Message -----
> From: "Thibault Godouet" <tibo92 at godouet.net>
> To: "Ben Turner" <bturner at redhat.com>
> Cc: hmlth at t-hamel.fr, gluster-users at gluster.org
> Sent: Tuesday, September 29, 2015 1:36:20 PM
> Subject: Re: [Gluster-users] Tuning for small files
>
> Ben,
>
> I suspect meta-data / 'ls -l' performance is very important for my svn
> use-case.
>
> Having said that, what do you mean by small-file performance? I thought
> what people meant by this was really the overhead of meta-data, with an
> 'ls -l' being a sort of extreme case (pure meta-data).
> Obviously if you also have to read and write actual data (albeit not much
> at all per file), then the effect of the meta-data overhead would get
> diluted to a degree, but potentially still be very present.

Where you run into problems with smallfiles on Gluster is the latency of sending data over the wire. For every smallfile create there are a bunch of different file operations we have to do on every file. For example, we have to do at least 1 lookup per brick to make sure that the file doesn't exist anywhere before we create it. We actually got it down to 1 per brick with lookup-optimize on; it's 2 IIRC (maybe more?) with it disabled. The time we spend waiting for those lookups to complete adds latency, which lowers the number of files that can be created in a given period of time. Lookup-optimize was implemented in 3.7 and, like I said, it's now at the optimal 1 lookup per brick on creates.

The other problem with small files that we had in 3.6 is that we were using a single-threaded event listener (epoll is what we call it). This single thread would spike a CPU to 100% (called a hot thread) and glusterfs would become CPU-bound. The solution was to make the event listener multi-threaded so that we could spread the epoll load across CPUs, thereby eliminating the CPU bottleneck and allowing us to process more events in a given time. FYI, epoll defaults to 2 threads in 3.7, but I have seen cases where I was still bottlenecked on CPU without 4 threads in my envs, so I usually do 4. This was implemented in upstream 3.7 but was backported to RHGS 3.0.4 if you have a RH-based version.

Fixing these two issues led to the performance gains I was talking about with smallfile creates. You are probably thinking from a distributed-FS-with-metadata-server (MDS) perspective, where the MDS is the bottleneck for smallfiles. Since Gluster doesn't have an MDS, that load is transferred to the clients / servers, and this led to a CPU bottleneck when epoll was single-threaded. I think this is the piece you may have been missing.

> Would there be an easy way to tell how much time is spent on meta-data
> vs. data in a profile output?

Yep! Can you gather some profiling info and send it to me?

> One thing I wonder: do your comments apply to both native FUSE and NFS
> mounts?
>
> Finally, all this brings me back to my initial question: are there any
> configuration tuning recommendations for my requirement (small-file
> reads/writes on a pair of nodes with replication) beyond the thread
> counts and lookup-optimize?
> Or are those by far the most important in this scenario?

For creating a bunch of small files, those are the only two that I know of that will have a large impact; maybe some others from the list can give some input on anything else we can do here.

-b

> Thx,
> Thibault.
> [rest of quoted thread trimmed; identical to the first message above]
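Ben's two recommendations translate to standard 3.7 volume options. A sketch, assuming the volume is named 'shared' as in the profiling steps below:

    # Spread the epoll event-listener load across 4 threads per side
    gluster volume set shared client.event-threads 4
    gluster volume set shared server.event-threads 4

    # Reduce creates to a single lookup per brick
    gluster volume set shared cluster.lookup-optimize on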
Right, so what I did is:

- on one node (gluster 3.7.3), run 'gluster volume profile shared start'
- on the client mount, run the test
- on the node, run 'gluster volume profile shared info' (and copy the output)
- finally, run 'gluster volume profile shared stop'

I repeated this for two different tests (a simple rm followed by an svn checkout, and a more complete build test), on an NFS mount and on a FUSE mount.

To my surprise, the svn checkout is actually a lot faster (3x) on the FUSE mount than NFS. However, the build test is a lot slower on the FUSE mount (+50%, which is a lot considering the compilation is CPU-intensive, not just I/O!).

Ben, I will send you the profile outputs separately now...

On 29 Sep 2015 9:40 pm, "Ben Turner" <bturner at redhat.com> wrote:
> [Ben's reply quoted in full trimmed; identical to the previous message above]
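On the meta-data vs. data split in a profile: each brick section of 'gluster volume profile ... info' reports a %-latency figure per FOP, and the FOP names themselves separate metadata work (LOOKUP, STAT, GETXATTR, ...) from data movement (READ, WRITE). A rough sketch of the workflow and a crude tally, with the volume and output-file names assumed:

    gluster volume profile shared start
    # ... run the workload on the client mount ...
    gluster volume profile shared info > profile-run.txt
    gluster volume profile shared stop

    # Crude split: sum %-latency ($1) by FOP name ($NF). Rough numbers
    # only -- this adds up every brick section and both the Cumulative
    # and Interval stats in the file.
    awk '$NF ~ /^(LOOKUP|STAT|FSTAT|GETXATTR|OPENDIR|READDIRP?)$/ {meta += $1}
         $NF ~ /^(READ|WRITE)$/ {data += $1}
         END {print "metadata %-latency:", meta, " data %-latency:", data}' profile-run.txt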