Dear all,

I'm rather new to GlusterFS but have some experience running larger Lustre and BeeGFS installations. These filesystems provide active/active failover. Now I discovered that I can also do this in GlusterFS, although I didn't find detailed documentation about it. (I'm using glusterfs 3.10.8.)

So my question is: can I really use GlusterFS to do failover in the way described below, or am I misusing GlusterFS (and potentially corrupting my data)?

My setup is: I have two servers (qlogin and gluster2) that access a shared SAN storage. Both servers connect to the same SAN (SAS multipath), and I implement locking via lvm2 and sanlock, so I can mount the same storage on either server. The idea is that normally each server serves one brick, but in case one server fails, the other server can serve both bricks. (I'm not interested in automatic failover; I'll always do this manually. I could also use this to do maintenance on one server, with only minimal downtime.)

# normal setup:
[root@qlogin ~]# gluster volume info g2
# ...
# Volume Name: g2
# Type: Distribute
# Brick1: qlogin:/glust/castor/brick
# Brick2: gluster2:/glust/pollux/brick

# failover: let's artificially fail one server by killing one glusterfsd:
[root@qlogin] systemctl status glusterd
[root@qlogin] kill -9 <pid/of/glusterfsd/running/brick/castor>

# unmount brick
[root@qlogin] umount /glust/castor/

# deactivate LV
[root@qlogin] lvchange -a n vgosb06vd05/castor

### now do the failover:

# activate the same storage on the other server:
[root@gluster2] lvchange -a y vgosb06vd05/castor

# mount on the other server
[root@gluster2] mount /dev/mapper/vgosb06vd05-castor /glust/castor

# now move the "failed" brick to the other server
[root@gluster2] gluster volume replace-brick g2 qlogin:/glust/castor/brick gluster2:/glust/castor/brick commit force
### The last line is the one I have doubts about

# now I'm in failover state:
# both bricks on one server:
[root@qlogin ~]# gluster volume info g2
# ...
# Volume Name: g2
# Type: Distribute
# Brick1: gluster2:/glust/castor/brick
# Brick2: gluster2:/glust/pollux/brick

Is it intended to work this way?

Thanks a lot!

best wishes,
Stefan
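For the maintenance scenario described above, the failback would presumably mirror the same steps in reverse once qlogin is healthy again. A rough sketch only, assuming the same volume group, LV, and mount point names; this sequence is not part of the original post:

# failback sketch: move the brick back to qlogin
# (the glusterfsd process serving the brick on gluster2 would first have to be stopped,
#  just as in the failover above)
[root@gluster2] umount /glust/castor
[root@gluster2] lvchange -a n vgosb06vd05/castor

# re-activate and mount the shared storage on qlogin
[root@qlogin] lvchange -a y vgosb06vd05/castor
[root@qlogin] mount /dev/mapper/vgosb06vd05-castor /glust/castor

# point the volume back at qlogin
[root@qlogin] gluster volume replace-brick g2 gluster2:/glust/castor/brick qlogin:/glust/castor/brick commit force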
Hi Stefan,

I think what you propose will work, though you should test it thoroughly.

I think more generally, "the GlusterFS way" would be to use 2-way replication instead of a distributed volume; then you can lose one of your servers without an outage, and re-synchronize when it comes back up.

Chances are, if you weren't using the SAN volumes, you could have purchased two servers, each with enough disk to hold a full copy of the data, for fewer dollars...

Regards,
Alex

On Mon, Dec 11, 2017 at 12:52 PM, Stefan Solbrig <stefan.solbrig at ur.de> wrote:
> [original message quoted in full; snipped]
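What Alex describes would look roughly like the following. This is only a sketch: the volume name and brick paths are hypothetical, and a replica-2 layout is just one way to read "2-way replication" here:

# create a replicated volume instead of a distributed one
# (each server then holds a full copy of the data)
# note: gluster may warn that plain replica 2 is prone to split-brain
# and suggest an arbiter or replica 3; adjust as appropriate
gluster volume create g2rep replica 2 qlogin:/glust/brick1 gluster2:/glust/brick1
gluster volume start g2rep

# after a failed server comes back, trigger and inspect self-heal
gluster volume heal g2rep
gluster volume heal g2rep info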
Hi Alex,

Thank you for the quick reply!

Yes, I'm aware that using "plain" hardware with replication is more what GlusterFS is for. I cannot discuss prices here in detail, but for me it more or less evens out. Moreover, I have more SAN storage that I'd rather re-use (because of Lustre) than buy new hardware.

I'll test more to understand what precisely "replace-brick" changes. I understand the mode of operation in the case of replicated volumes, but I was surprised (in a good way) that it was also working for distributed volumes, i.e., I was surprised that gluster does not complain that the new brick already contains data. Is there some technical documentation of the inner workings of GlusterFS?

This leads me to the question: if I wanted to extend my current installation (the one that uses SANs) with more standard hardware, is it possible to mix replicated and non-replicated bricks? (I assume no... but I still dare to ask.)

best wishes,
Stefan

> On 11.12.2017 at 23:07, Alex Chekholko <alex at calicolabs.com> wrote:
>
> [quoted text snipped]
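One way to see what replace-brick actually changed, offered here only as a suggestion and not something from this thread, is to compare the brick list and the gluster extended attributes on the brick root before and after the operation:

# brick list as glusterd now sees it
gluster volume info g2

# gluster's xattrs on the brick root (e.g. trusted.glusterfs.volume-id)
getfattr -d -m . -e hex /glust/castor/brick

# the generated volfiles under /var/lib/glusterd/vols/g2/ also reflect the new brick host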