Christopher Hawkins
2010-May-13 12:23 UTC
[Gluster-users] Best Practices for Gluster Replication
I have followed that debate before. The impression I got was that if you handle replication on the client side, the clients will be able to fail over from one server to the next if the original goes down (after the timeout). But if you handle it on the server side, then when a server goes down the only way to get HA for the clients is to externally implement round-robin DNS or something like that. Other than this issue, I think either way is technically acceptable. If memory serves, this is the reason why client-side is the "default" or preferred setup.

Chris

----- "James Burnash" <jburnash at knight.com> wrote:

> I know it's only been a day, and I understand that people are busy -
> but nobody has anything to share on this subject?
>
> It seems like it would be a good thing to be able to at least understand
> why implementing it on the back end would be a bad idea ...
>
> -----Original Message-----
> From: gluster-users-bounces at gluster.org
> [mailto:gluster-users-bounces at gluster.org] On Behalf Of Burnash, James
> Sent: Wednesday, May 12, 2010 10:30 AM
> To: gluster-users at gluster.org
> Subject: [Gluster-users] Best Practices for Gluster Replication
>
> Greetings List,
>
> I've searched through the Gluster wiki and a lot of threads to try to
> answer this question, but so far no real luck.
>
> Simply put - is it better to have replication handled by the clients,
> or by the bricks themselves?
>
> Volgen for a RAID 1 solution creates a config file that does the
> mirroring on the client side - which I would take as an implicit
> endorsement from the Gluster team (great team, BTW). However, it seems
> to me that if the bricks replicated between themselves on our 10Gb
> storage network, it could save a lot of bandwidth for the clients and
> conceivably save them CPU cycles and I/O as well.
>
> Client machines have 1Gb connections to the storage network, and are
> running CentOS 5.2.
> Server machines have 10Gb connections to the storage network, and are
> running CentOS 5.4.
> Glusterfs.vol:
>
> ## file auto generated by /usr/bin/glusterfs-volgen (mount.vol)
> # Cmd line:
> # $ /usr/bin/glusterfs-volgen --name testfs --raid 1
> #     jc1letgfs13-pfs1:/export/read-write jc1letgfs14-pfs1:/export/read-write
> #     jc1letgfs15-pfs1:/export/read-write jc1letgfs16-pfs1:/export/read-write
> #     jc1letgfs17-pfs1:/export/read-write jc1letgfs18-pfs1:/export/read-write
>
> # RAID 1
> # TRANSPORT-TYPE tcp
> volume jc1letgfs17-pfs1-1
>   type protocol/client
>   option transport-type tcp
>   option remote-host jc1letgfs17-pfs1
>   option transport.socket.nodelay on
>   option transport.remote-port 6996
>   option remote-subvolume brick1
> end-volume
>
> volume jc1letgfs18-pfs1-1
>   type protocol/client
>   option transport-type tcp
>   option remote-host jc1letgfs18-pfs1
>   option transport.socket.nodelay on
>   option transport.remote-port 6996
>   option remote-subvolume brick1
> end-volume
>
> volume jc1letgfs13-pfs1-1
>   type protocol/client
>   option transport-type tcp
>   option remote-host jc1letgfs13-pfs1
>   option transport.socket.nodelay on
>   option transport.remote-port 6996
>   option remote-subvolume brick1
> end-volume
>
> volume jc1letgfs15-pfs1-1
>   type protocol/client
>   option transport-type tcp
>   option remote-host jc1letgfs15-pfs1
>   option transport.socket.nodelay on
>   option transport.remote-port 6996
>   option remote-subvolume brick1
> end-volume
>
> volume jc1letgfs16-pfs1-1
>   type protocol/client
>   option transport-type tcp
>   option remote-host jc1letgfs16-pfs1
>   option transport.socket.nodelay on
>   option transport.remote-port 6996
>   option remote-subvolume brick1
> end-volume
>
> volume jc1letgfs14-pfs1-1
>   type protocol/client
>   option transport-type tcp
>   option remote-host jc1letgfs14-pfs1
>   option transport.socket.nodelay on
>   option transport.remote-port 6996
>   option remote-subvolume brick1
> end-volume
>
> volume mirror-0
>   type cluster/replicate
>   subvolumes jc1letgfs13-pfs1-1 jc1letgfs14-pfs1-1
> end-volume
>
> volume mirror-1
>   type cluster/replicate
>   subvolumes jc1letgfs15-pfs1-1 jc1letgfs16-pfs1-1
> end-volume
>
> volume mirror-2
>   type cluster/replicate
>   subvolumes jc1letgfs17-pfs1-1 jc1letgfs18-pfs1-1
> end-volume
>
> volume distribute
>   type cluster/distribute
>   subvolumes mirror-0 mirror-1 mirror-2
> end-volume
>
> volume readahead
>   type performance/read-ahead
>   option page-count 4
>   subvolumes distribute
> end-volume
>
> volume iocache
>   type performance/io-cache
>   option cache-size `echo $(( $(grep 'MemTotal' /proc/meminfo | sed 's/[^0-9]//g') / 5120 ))`MB
>   option cache-timeout 1
>   subvolumes readahead
> end-volume
>
> volume quickread
>   type performance/quick-read
>   option cache-timeout 1
>   option max-file-size 64kB
>   subvolumes iocache
> end-volume
>
> volume writebehind
>   type performance/write-behind
>   option cache-size 4MB
>   subvolumes quickread
> end-volume
>
> volume statprefetch
>   type performance/stat-prefetch
>   subvolumes writebehind
> end-volume
>
> Glusterfsd.vol:
>
> ## file auto generated by /usr/bin/glusterfs-volgen (export.vol)
> # Cmd line:
> # $ /usr/bin/glusterfs-volgen --name testfs
> #     jc1letgfs13-pfs1:/export/read-write jc1letgfs14-pfs1:/export/read-write
> #     jc1letgfs15-pfs1:/export/read-write
>
> volume posix1
>   type storage/posix
>   option directory /export/read-write
> end-volume
>
> volume locks1
>   type features/locks
>   subvolumes posix1
> end-volume
>
> volume brick1
>   type performance/io-threads
>   option thread-count 8
>   subvolumes locks1
> end-volume
>
> volume server-tcp
>   type protocol/server
>   option transport-type tcp
>   option auth.addr.brick1.allow *
>   option transport.socket.listen-port 6996
>   option transport.socket.nodelay on
>   subvolumes brick1
> end-volume
>
> James Burnash
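For comparison with the volgen-generated, client-side-replicating volfiles quoted above, here is a rough, untested sketch of what the server-side alternative James asks about could look like in the volfile syntax of this era. It is hand-written rather than volgen output, it reuses the host, brick, and port names from James's configuration purely for illustration, and the peer server (jc1letgfs14-pfs1) would need the mirror-image file.

    # glusterfsd.vol on jc1letgfs13-pfs1 - illustrative sketch only, not volgen output
    volume posix1
      type storage/posix
      option directory /export/read-write
    end-volume

    volume locks1
      type features/locks
      subvolumes posix1
    end-volume

    # raw local brick, still exported so the peer server can reach it
    volume brick1
      type performance/io-threads
      option thread-count 8
      subvolumes locks1
    end-volume

    # connection to the peer server's raw brick over the 10Gb storage network
    volume peer-brick
      type protocol/client
      option transport-type tcp
      option remote-host jc1letgfs14-pfs1
      option transport.remote-port 6996
      option remote-subvolume brick1
    end-volume

    # replication now happens here, on the server, instead of on the client
    volume mirror-0
      type cluster/replicate
      subvolumes brick1 peer-brick
    end-volume

    volume server-tcp
      type protocol/server
      option transport-type tcp
      option transport.socket.listen-port 6996
      option auth.addr.brick1.allow *
      option auth.addr.mirror-0.allow *
      # export both: mirror-0 for the clients, brick1 for the peer server
      subvolumes mirror-0 brick1
    end-volume

Each client would then mount a single protocol/client pointing at mirror-0 on whichever server it happens to reach, which is exactly why this layout needs round-robin DNS or a similar front end for HA, as Chris describes above.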
Burnash, James
2010-May-13 13:43 UTC
[Gluster-users] Best Practices for Gluster Replication
Chris,

Excellent, and thanks - that was exactly what I was looking for.

James

-----Original Message-----
From: gluster-users-bounces at gluster.org
[mailto:gluster-users-bounces at gluster.org] On Behalf Of Christopher Hawkins
Sent: Thursday, May 13, 2010 8:23 AM
To: gluster-users
Subject: Re: [Gluster-users] Best Practices for Gluster Replication

> [Quoted text of Christopher's reply and of the original message, including the
> volume files, snipped - identical to the messages above.]
Marcus et al. -

Good discussion all around. A couple of points to clear up some of the terminology, and answers to a couple of architecture questions that haven't been addressed yet.

1. The Gluster File System client is designed to be installed on the devices that are consuming the storage. By installing the client there you get:

   1a. Mirror on write - simultaneous writes to any number of mirrors.
   1b. Storage server failure that is transparent to your application.
   1c. Significant performance benefits.

2. In the majority of installations the user runs the Gluster File System client wherever possible, but often also needs to access the Gluster cluster via NFS, CIFS, or some other NAS-style protocol. Gluster is designed to support those needs. There are some concepts that are important to understanding Gluster's behavior when the Gluster client isn't being used:

   2a. Any file can be accessed from any node at any time. The physical location of the file is irrelevant.
   2b. The entire distributed filesystem can be accessed by all protocols at the same time.
   2c. Only the Gluster client can communicate with the Gluster server daemon.
   2d. Only the Gluster client can mirror or replicate.
   2e. The Gluster client can be installed on a Gluster server.
   2f. Fundamental to NFS, CIFS, etc. is the idea that their clients access a single IP address for storage. (The Gluster client is a solution to this problem!) If the remote storage server they have mounted fails, they have no way to access the storage.
   2g. The user is expected to provide some method of ensuring that when clients access the Gluster cluster via NFS and the like, the number of connections to any one node is about the same as to every other node. The user is also expected to provide a method of ensuring that if a storage server fails, the NFS, CIFS, etc. client has the opportunity to connect to another storage server. Customers usually use RRDNS, UCARP, HAProxy, or enterprise load-balancing hardware (F5, ACE, etc.) for this IP failover / balancing layer.

That sounds more complicated than it is. We install the Gluster client on the server, mount the distributed filesystem just as we would on any other host, and then re-export that mount as NFS, CIFS, etc. We install that stack on every storage node. A user-supplied layer on top of that balances inbound connections among the nodes.

I've got a new pretty picture that tries to simplify some of this. It is a really rough draft; your feedback is appreciated. We (Gluster Inc) are working hard to find better ways to describe the big-picture Gluster architecture to you, our users. Any ideas, language, concepts, pictures, questions you can't find the answers to (42!), anything at all you think might help - please send it my way!

--
Craig Carl
Gluster, Inc.
Cell - (408) 829-9953 (California, USA)
Gtalk - craig.carl at gmail.com
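As a concrete illustration of the stack Craig describes - the Gluster FUSE client mounted locally on every storage node, that mount re-exported over NFS, and round-robin DNS spreading the NAS clients across the nodes - a minimal sketch follows. The hostname, IP addresses, and mount point are invented for the example, the volfile path is assumed to match the volgen output earlier in the thread, and kernel NFS re-export of a FUSE mount needs an explicit fsid (some deployments of this era used unfs3 for the re-export instead).

    # On each storage node, mount the volume with the Gluster FUSE client,
    # using the same client volfile the ordinary clients use (path assumed):
    mount -t glusterfs /etc/glusterfs/glusterfs.vol /mnt/testfs

    # /etc/exports on each storage node - re-export the FUSE mount over NFS.
    # An explicit fsid is required when exporting FUSE; the subnet is an example.
    /mnt/testfs  10.0.0.0/24(rw,no_subtree_check,fsid=10)

    # Round-robin DNS zone fragment (invented name and addresses): the NAS
    # clients all mount gfs-nas:/mnt/testfs and land on different nodes.
    gfs-nas    IN A 10.0.0.13
    gfs-nas    IN A 10.0.0.14
    gfs-nas    IN A 10.0.0.15
    gfs-nas    IN A 10.0.0.16
    gfs-nas    IN A 10.0.0.17
    gfs-nas    IN A 10.0.0.18

Round-robin DNS only spreads the initial mounts around; it does not move an already-mounted NFS client off a dead node, which is where UCARP, heartbeat-managed addresses, or a load balancer come into play.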
----- Original Message -----
From: "Marcus Bointon" <marcus at synchromedia.co.uk>
To: "gluster-users at gluster.org Users" <gluster-users at gluster.org>
Sent: Thursday, May 13, 2010 7:43:26 AM GMT -08:00 US/Canada Pacific
Subject: Re: [Gluster-users] Best Practices for Gluster Replication

On 13 May 2010, at 16:28, Burnash, James wrote:

> I'm also not sure how I would go about setting this up with 2 NFS servers - would this
> be some sort of load balancing solution (using round robin DNS or an actual load
> balancer), or would this be implemented by having each NFS server responsible for only
> exporting a given portion of the whole Glusterfs backend storage.

I'm not really sure of the best way to do it - NFS isn't really my thing. I assume that there are load balancing / failover solutions (haproxy, pound, heartbeat, etc.) that can deal with NFS - it would help if the balancer understood NFS at some kind of transactional level (as they can for HTTP). I would export each of the different gluster portions you want as separate NFS share points.

Marcus
--
Marcus Bointon
Synchromedia Limited: Creators of http://www.smartmessages.net/
UK resellers of info at hand CRM solutions
marcus at synchromedia.co.uk | http://www.synchromedia.co.uk/
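For the balancer route Marcus mentions, HAProxy can only treat NFS as opaque TCP - it has no NFS-level (transactional) awareness - so a fragment along the following lines is really only clean for NFSv4, where everything rides on port 2049; NFSv3 also needs portmapper, mountd, and the lock/status daemons handled somehow. The virtual IP, backend addresses, and server names are invented for illustration.

    # /etc/haproxy/haproxy.cfg (fragment) - plain TCP pass-through, no NFS awareness
    defaults
        mode tcp
        timeout connect 5s
        timeout client  1h
        timeout server  1h

    # single virtual IP the NAS clients mount (address invented)
    frontend nfs_in
        bind 10.0.0.100:2049
        default_backend gluster_nfs

    # one entry per storage node re-exporting the Gluster mount
    backend gluster_nfs
        balance leastconn
        server jc1letgfs13 10.0.0.13:2049 check
        server jc1letgfs14 10.0.0.14:2049 check
        server jc1letgfs15 10.0.0.15:2049 check
        server jc1letgfs16 10.0.0.16:2049 check
        server jc1letgfs17 10.0.0.17:2049 check
        server jc1letgfs18 10.0.0.18:2049 check

The proxy itself then becomes a single point of failure, so in practice its address is usually kept available with UCARP or heartbeat on a pair of balancer hosts.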
Trying the attachment again, email is so complicated!

Craig

----- Original Message -----
From: "Craig Carl" <craig at gluster.com>
To: "Marcus Bointon" <marcus at synchromedia.co.uk>
Cc: "gluster-users at gluster.org Users" <gluster-users at gluster.org>
Sent: Thursday, May 13, 2010 10:01:11 PM GMT -08:00 US/Canada Pacific
Subject: Re: [Gluster-users] Best Practices for Gluster Replication

> [Quoted text of Craig's previous message and of Marcus's reply snipped -
> identical to the message above.]