Remi Broemeling
2011-Jul-18  17:53 UTC
[Gluster-users] GlusterFS v3.1.5 Stable Configuration
Hi,
We've been using GlusterFS to manage shared files across a number of hosts
over the past few months and have run into a few problems -- roughly one per
month.  The problems are occasionally extremely difficult to trace back to
GlusterFS, as they often masquerade as something else in our application log
files.  So far we have seen one instance of split-brain, a number of instances
of "stuck" files (i.e. any stat call would block for an hour and then time out
with an error), and a couple of instances of "ghost" files (the file is
removed, but GlusterFS continues to show it for a little while until the cache
times out).
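As an aside, for the split-brain case on a two-brick replica like this one,
the pending-change extended attributes can be inspected directly on each brick
to see which side believes the other has un-applied writes.  This is only a
sketch -- the file path below is a placeholder, and the exact attribute names
should be confirmed on the bricks themselves:

# Run on both web01 and web02, against the brick path (not the FUSE mount).
$ sudo getfattr -d -m . -e hex /var/glusterfs/bricks/shared/path/to/suspect-file

As far as I understand it, non-zero trusted.afr.shared-application-data-client-*
values on both bricks for the same file generally indicate split-brain on that
file.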
We do *not* place a large amount of load on GlusterFS, and don't have any
significant performance issues to deal with.  With that in mind, the core
question of this e-mail is: "How can I modify our configuration to be the
absolute *most* stable (problem-free) that it can be, even if it means
sacrificing performance?"  In sum, I don't have any particular performance
concerns at this moment, but the GlusterFS bugs that we encounter are quite
problematic -- so I'm willing to entertain any suggested stability
improvement, even if it has a negative impact on performance.  (I suspect that
the answer here is just "turn off all performance-enhancing Gluster caching",
but I wanted to validate that before going so far.)  Please suggest anything
that could be done to improve the stability of our setup -- as an aside, I
think that this would be an advantageous addition to the FAQ.  Right now the
FAQ contains information for *performance* tuning, but not for *stability*
tuning.
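For concreteness, the sort of change I imagine is disabling the client-side
caching/performance translators through the CLI, something like the following.
The option names are the stock performance-translator toggles; I have not
verified that every one of them is settable in 3.1.5, so please treat this as
a sketch rather than a tested recipe:

# Disable the caching/performance translators on the volume, trading
# performance for predictability.
$ sudo gluster volume set shared-application-data performance.quick-read off
$ sudo gluster volume set shared-application-data performance.stat-prefetch off
$ sudo gluster volume set shared-application-data performance.io-cache off
$ sudo gluster volume set shared-application-data performance.read-ahead off
$ sudo gluster volume set shared-application-data performance.write-behind off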
Thanks for any help you can give or suggestions you can make.
Here are the details of our environment:
OS: RHEL5
GlusterFS Version: 3.1.5
Mount method: glusterfsd/FUSE
GlusterFS Servers: web01, web02
GlusterFS Clients: web01, web02, dj01, dj02
$ sudo gluster volume info
Volume Name: shared-application-data
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: web01:/var/glusterfs/bricks/shared
Brick2: web02:/var/glusterfs/bricks/shared
Options Reconfigured:
network.ping-timeout: 5
nfs.disable: on
Configuration File Contents:
*/etc/glusterd/vols/shared-application-data/shared-application-data-fuse.vol*
volume shared-application-data-client-0
    type protocol/client
    option remote-host web01
    option remote-subvolume /var/glusterfs/bricks/shared
    option transport-type tcp
    option ping-timeout 5
end-volume
volume shared-application-data-client-1
    type protocol/client
    option remote-host web02
    option remote-subvolume /var/glusterfs/bricks/shared
    option transport-type tcp
    option ping-timeout 5
end-volume
volume shared-application-data-replicate-0
    type cluster/replicate
    subvolumes shared-application-data-client-0 shared-application-data-client-1
end-volume
volume shared-application-data-write-behind
    type performance/write-behind
    subvolumes shared-application-data-replicate-0
end-volume
volume shared-application-data-read-ahead
    type performance/read-ahead
    subvolumes shared-application-data-write-behind
end-volume
volume shared-application-data-io-cache
    type performance/io-cache
    subvolumes shared-application-data-read-ahead
end-volume
volume shared-application-data-quick-read
    type performance/quick-read
    subvolumes shared-application-data-io-cache
end-volume
volume shared-application-data-stat-prefetch
    type performance/stat-prefetch
    subvolumes shared-application-data-quick-read
end-volume
volume shared-application-data
    type debug/io-stats
    subvolumes shared-application-data-stat-prefetch
end-volume
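Purely for illustration (glusterd regenerates this file, so any real change
would have to go through the gluster CLI rather than hand-editing): if the
five performance translators were removed, I believe the client graph would
reduce to the two protocol/client volumes above feeding cluster/replicate,
with debug/io-stats sitting directly on top of it, roughly:

volume shared-application-data-replicate-0
    type cluster/replicate
    subvolumes shared-application-data-client-0 shared-application-data-client-1
end-volume

volume shared-application-data
    type debug/io-stats
    subvolumes shared-application-data-replicate-0
end-volume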
*/etc/glusterfs/glusterd.vol*
volume management
    type mgmt/glusterd
    option working-directory /etc/glusterd
    option transport-type socket,rdma
    option transport.socket.keepalive-time 10
    option transport.socket.keepalive-interval 2
end-volume
-- 
Remi Broemeling
System Administrator
Clio - Practice Management Simplified
1-888-858-2546 x(2^5) | remi at goclio.com
www.goclio.com | blog <http://www.goclio.com/blog> | twitter <http://www.twitter.com/goclio> | facebook <http://www.facebook.com/goclio>
On Mon, Jul 18, 2011 at 10:53 AM, Remi Broemeling <remi at goclio.com> wrote:
> How can I modify our configuration to be the absolute *most* stable
> (problem free) that it can be, even if it means sacrificing performance?

It depends on the kind of bugs or issues you are encountering. There might be
a solution for some bugs, and there may not be for others.