Strahil Nikolov
2022-Mar-06 08:06 UTC
[Gluster-users] proper way to temporarily remove brick server from replica cluster to avoid kvm guest disruption
It seems that only vh1-4 provide bricks, so vh5, vh6, vh7 and vh8 can be removed.

First check why vh5 is offline. Changes are propagated to all nodes, and in this
case vh5 is down and won't receive the peer detach commands. Once you fix vh5,
you can safely 'gluster peer detach' any of the nodes that is not in the volume.

Keep in mind that it's always best practice to have an odd number of nodes in
the TSP (3, 5, 7, 9, etc.).

Best Regards,
Strahil Nikolov

On Sun, Mar 6, 2022 at 4:06, Todd Pfaff <pfaff at rhpcs.mcmaster.ca> wrote:

> [root at vh1 ~]# gluster volume info vol1
>
> Volume Name: vol1
> Type: Replicate
> Volume ID: dfd681bb-5b68-4831-9863-e13f9f027620
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 4 = 4
> Transport-type: tcp
> Bricks:
> Brick1: vh1:/pool/gluster/brick1/data
> Brick2: vh2:/pool/gluster/brick1/data
> Brick3: vh3:/pool/gluster/brick1/data
> Brick4: vh4:/pool/gluster/brick1/data
> Options Reconfigured:
> transport.address-family: inet
> nfs.disable: on
> performance.client-io-threads: off
>
> [root at vh1 ~]# gluster pool list
> UUID                                    Hostname        State
> 75fc4258-fabd-47c9-8198-bbe6e6a906fb    vh2             Connected
> 00697e28-96c0-4534-a314-e878070b653d    vh3             Connected
> 2a9b891b-35d0-496c-bb06-f5dab4feb6bf    vh4             Connected
> 8ba6fb80-3b13-4379-94cf-22662cbb48a2    vh5             Disconnected
> 1298d334-3500-4b40-a8bd-cc781f7349d0    vh6             Connected
> 79a533ac-3d89-44b9-b0ce-823cfec8cf75    vh7             Connected
> 4141cd74-9c13-404c-a02c-f553fa19bc22    vh8             Connected
>
> On Sat, 5 Mar 2022, Strahil Nikolov wrote:
>> Hey Todd,
>>
>> can you provide 'gluster volume info <VOLUME>' ?
>>
>> Best Regards,
>> Strahil Nikolov
>>
>> On Sat, Mar 5, 2022 at 18:17, Todd Pfaff <pfaff at rhpcs.mcmaster.ca> wrote:
>> I have a replica volume created as:
>>
>> gluster volume create vol1 replica 4 \
>>   host{1,2,3,4}:/mnt/gluster/brick1/data \
>>   force
>>
>> All hosts host{1,2,3,4} mount this volume as:
>>
>> localhost:/vol1 /mnt/gluster/vol1 glusterfs defaults
>>
>> Some other hosts are trusted peers but do not contribute bricks, and
>> they also mount vol1 in the same way:
>>
>> localhost:/vol1 /mnt/gluster/vol1 glusterfs defaults
>>
>> All hosts run CentOS 7.9, and all are running glusterfs 9.4 or 9.5 from
>> centos-release-gluster9-1.0-1.el7.noarch.
>>
>> All hosts run kvm guests that use qcow2 files for root filesystems that
>> are stored on gluster volume vol1.
>>
>> This is all working well, as long as none of host{1,2,3,4} go offline.
>>
>> I want to take one of host{1,2,3,4} offline temporarily for maintenance.
>> I'll refer to this as hostX.
>>
>> I understand that hostX will need to be healed when it comes back online.
>>
>> I would, of course, migrate guests from hostX to another host, in which
>> case hostX would then only be participating as a gluster replica brick
>> provider and serving gluster client requests.
>>
>> What I've experienced is that if I take one of host{1,2,3,4} offline, this
>> can disrupt some of the VM guests on various other hosts such that their
>> root filesystems go to read-only.
>>
>> What I'm looking for here are suggestions as to how to properly take one
>> of host{1,2,3,4} offline to avoid such disruption or how to tune the
>> libvirt kvm hosts and guests to be sufficiently resilient in the face of
>> taking one gluster replica node offline.
>>
>> Thanks,
>> Todd
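A minimal command sketch of the detach procedure described above, assuming the
vh1-vh8 hostnames from the 'gluster pool list' output and that vh5 has been
brought back online first (run from any connected peer, e.g. vh1); whether you
actually want to detach the non-brick peers, rather than just fix vh5, is a
judgment call left to you:

    # confirm every peer is back in "Connected" state before detaching anything
    gluster pool list
    gluster peer status

    # detach the peers that do not host bricks for vol1
    gluster peer detach vh5
    gluster peer detach vh6
    gluster peer detach vh7
    gluster peer detach vh8

    # verify the trusted storage pool now contains only the brick hosts vh1-vh4
    gluster pool list

One hedged caution: vh5-vh8 mount vol1 via localhost:, which relies on the
local glusterd knowing about the volume. After detaching them from the pool,
those mounts would presumably need to point at one of the brick hosts instead
(possibly with the backup-volfile-servers mount option) - an assumption worth
testing, not something confirmed in this thread.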
Strahil Nikolov
2022-Mar-06 08:06 UTC
[Gluster-users] proper way to temporarily remove brick server from replica cluster to avoid kvm guest disruption
Is this an oVirt cluster?

Best Regards,
Strahil Nikolov

On Sun, Mar 6, 2022 at 10:06, Strahil Nikolov <hunter86_bg at yahoo.com> wrote:

> It seems that only vh1-4 provide bricks, so vh5, vh6, vh7 and vh8 can be removed.
>
> First check why vh5 is offline. Changes are propagated to all nodes, and in
> this case vh5 is down and won't receive the peer detach commands. Once you fix
> vh5, you can safely 'gluster peer detach' any of the nodes that is not in the
> volume.
>
> Keep in mind that it's always best practice to have an odd number of nodes in
> the TSP (3, 5, 7, 9, etc.).
>
> Best Regards,
> Strahil Nikolov
>
> [...]
Todd Pfaff
2022-Mar-06 21:31 UTC
[Gluster-users] proper way to temporarily remove brick server from replica cluster to avoid kvm guest disruption
On Sun, 6 Mar 2022, Strahil Nikolov wrote:

> It seems that only vh1-4 provide bricks, so vh5, vh6, vh7 and vh8 can be removed.

Right, that was the point of my question: how to properly shut down any one of
vh1-4 for maintenance without disrupting any VMs that may be running on any of
vh1-8.

When I did a test of taking vh1 offline several days ago, all of the VMs on vh4
went root-fs-read-only, which surprised me. I suppose it's possible that there
was something else at play that I haven't realized, and that taking the vh1
gluster peer offline was not the root cause of the vh4 VM failure. I haven't
tried another such test yet - I was holding off until I'd gotten some advice
here first.

> First check why vh5 is offline. Changes are propagated to all nodes, and in
> this case vh5 is down and won't receive the peer detach commands.

Ok, interesting, but I have to admit that I don't understand that requirement.
I knew that vh5 was offline, but I didn't know that I'd have to bring it back
online in order to properly shut down one of vh1-4. Are you certain about that?
That is, if vh5 stays offline and I take vh4 offline, and then I bring vh5
online, will the quorum of peers not set vh5 straight?

> Once you fix vh5, you can safely 'gluster peer detach' any of the nodes that
> is not in the volume.

Ok, I'll try peer detach to take any of vh1-4 offline in a controlled manner.

I take this to mean that if any one of the vh1-4 replica members were to go
offline in an uncontrolled manner, the gluster peers may have a problem, which
could lead to the sort of VM behaviour that I experienced. Frankly this
surprises me - I expected that my setup was more resilient in the face of
losing gluster replica members as long as there was still a quorum of members
operating normally.

> Keep in mind that it's always best practice to have an odd number of nodes in
> the TSP (3, 5, 7, 9, etc.).

Do you know why that's the case? I understand that 3 or more are recommended
(it could be 2 plus an arbiter), but why an odd number? What benefit does 3
provide that 4 does not?

Thanks,
Todd

> Best Regards,
> Strahil Nikolov
>
> [...]
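A minimal, hedged sketch of pre- and post-maintenance checks for taking one
brick host (hostX) offline, using the vol1 volume from this thread; the option
names below are standard GlusterFS volume options, but whether adjusting them
addresses the read-only-root symptom seen here is an assumption, not something
confirmed in this thread:

    # before stopping hostX: make sure there are no pending heals, and check
    # which client/server quorum and timeout settings are currently in effect
    gluster volume heal vol1 info
    gluster volume get vol1 cluster.quorum-type
    gluster volume get vol1 cluster.server-quorum-type
    gluster volume get vol1 network.ping-timeout

    # after hostX is back: kick off and monitor self-heal before touching
    # the next replica member
    gluster volume heal vol1
    gluster volume heal vol1 info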