John Madden
2010-Feb-15 19:13 UTC
[Gluster-users] another look at high concurrency and cpu usage
I've made a few swings at using glusterfs for the php session store for a heavily-used web app (~6 million pages daily) and I've found time and again that cpu usage and odd load characteristics make glusterfs entirely unsuitable for this use case, at least given my configuration. I posted on this earlier, but I'm hoping I can get some input on this as things are way better than they were but still not good enough. I'm on v2.0.9 as the 3.0.x series doesn't seem to be fully settled yet, though feel free to correct me on that.

I have a two-node replicate setup and four clients. Configs are below. What I see is that one brick gets pegged (load avg of 8) while the other sits much more idle (load avg of 1). The pegged node ends up with high run queues and i/o-blocked processes. CPU usage on the clients for the glusterfs processes gets pretty high, consuming at least an entire cpu when not spiking to consume both. I have very high thread counts on the clients to hopefully avoid thread waits on i/o requests. All six machines are identical xen instances.

When one of the bricks is down, cpu usage across the board goes way down, interactivity goes way up, and things seem overall to be a whole lot better. Why is that? I would think that having two nodes would at least result in better read rates.

I've gone through various caching schemes and tried readahead, writebehind, quick-read, and stat-prefetch. I found quick-read caused a ton of memory consumption and didn't help on performance. I didn't see much of a change at all with stat-prefetch.

...Any thoughts?

### fsd.vol:

volume sessions
  type storage/posix
  option directory /var/glusterfs/sessions
  option o-direct off
end-volume

volume data
  type storage/posix
  option directory /var/glusterfs/data
  option o-direct off
end-volume

volume locks0
  type features/locks
  option mandatory-locks on
  subvolumes data
end-volume

volume locks1
  type features/locks
  option mandatory-locks on
  subvolumes sessions
end-volume

volume brick0
  type performance/io-threads
  option thread-count 32  # default is 16
  subvolumes locks0
end-volume

volume brick1
  type performance/io-threads
  option thread-count 32  # default is 16
  subvolumes locks1
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option transport.socket.nodelay on
  subvolumes brick0 brick1
  option auth.addr.brick0.allow ip's...
  option auth.addr.brick1.allow ip's...
end-volume

### client.vol (just one connection shown here)

volume glusterfs0-hs
  type protocol/client
  option transport-type tcp
  option remote-host "ip"
  option ping-timeout 10
  option transport.socket.nodelay on
  option remote-subvolume brick1
end-volume

volume glusterfs1-hs
  type protocol/client
  option transport-type tcp
  option remote-host "ip"
  option ping-timeout 10
  option transport.socket.nodelay on
  option remote-subvolume brick1
end-volume

volume replicated
  type cluster/replicate
  subvolumes glusterfs0-hs glusterfs1-hs
end-volume

volume iocache
  type performance/io-cache
  option cache-size 512MB
  option cache-timeout 10
  subvolumes replicated
end-volume

volume writeback
  type performance/write-behind
  option cache-size 128MB
  option flush-behind off
  subvolumes iocache
end-volume

volume iothreads
  type performance/io-threads
  option thread-count 100
  subvolumes writeback
end-volume

--
John Madden
Sr UNIX Systems Engineer
Ivy Tech Community College of Indiana
jmadden at ivytech.edu
Harshavardhana
2010-Feb-15 19:44 UTC
[Gluster-users] another look at high concurrency and cpu usage
Hi John,

* replies inline *

On Tue, Feb 16, 2010 at 12:43 AM, John Madden <jmadden at ivytech.edu> wrote:
> [...]
> I have a two-node replicate setup and four clients. Configs are below.
> What I see is that one brick gets pegged (load avg of 8) while the other
> sits much more idle (load avg of 1). The pegged node ends up with high
> run queues and i/o-blocked processes. CPU usage on the clients for the
> glusterfs processes gets pretty high, consuming at least an entire cpu
> when not spiking to consume both. I have very high thread counts on the
> clients to hopefully avoid thread waits on i/o requests. All six
> machines are identical xen instances.

Comments on your vol file are below:

1. A write-behind cache-size of 128MB is overkill; that much aggressiveness
over an ethernet link will not get you good performance.

2. A thread count of 100 is way beyond the actual use case; in our tests and
deployments we have seen that 16 threads cater to almost all cases.

3. quick-read and stat-prefetch will help if you have a large number of
small files. 3.0.2 has proper enhancements for this functionality.

My suggestion is to divide "fsd.vol" into two vol files so that each backend
export gets its own server process (a sketch of what that might look like is
at the end of this message). In production deployments this has been seen to
help with scalability and performance. Volume files generated with
"glusterfs-volgen" are also a better fit for all your needs.
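For reference, a rough sketch of what that split might look like for the
sessions export, reusing John's options unchanged. The file name is
hypothetical, and the data export would get its own matching vol file served
by a second glusterfsd process; treat this as an illustration of the
suggestion, not a tested config:

### fsd-sessions.vol (hypothetical name; a second, matching vol file would
### carry the data export and be served by its own glusterfsd process)

volume sessions
  type storage/posix
  option directory /var/glusterfs/sessions
  option o-direct off
end-volume

volume locks1
  type features/locks
  option mandatory-locks on
  subvolumes sessions
end-volume

volume brick1
  type performance/io-threads
  option thread-count 32  # default is 16
  subvolumes locks1
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option transport.socket.nodelay on
  subvolumes brick1
  option auth.addr.brick1.allow ip's...
end-volume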
Arvids Godjuks
2010-Feb-16 15:59 UTC
[Gluster-users] another look at high concurrency and cpu usage
John Madden,

Storing PHP sessions on GlusterFS really isn't a good option, because the
session file is locked while a script accesses it, so you will definitely
hit a performance problem. My advice is to set up a memcached server and use
the memcache (or memcached) PHP module's ability to store session data in
memcached. If you set up a few memcached servers it can also give you a
fault-tolerant service, and it's lightning fast. (A minimal php.ini sketch
is below.)
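To make the memcached suggestion concrete, here is a minimal php.ini sketch,
assuming the pecl "memcache" extension and two hypothetical memcached
instances on the default port; the addresses are placeholders, not a tested
setup:

; keep sessions in memcached instead of on a shared filesystem
session.save_handler = memcache
; listing more than one server is what gives the fault tolerance mentioned above
session.save_path = "tcp://192.168.0.10:11211, tcp://192.168.0.11:11211"

With the newer "memcached" (libmemcached-based) extension the handler name is
"memcached" and the save_path entries drop the tcp:// prefix.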