kup at kg-fds.de
2024-Jun-09 13:00 UTC
[Gluster-users] Replace broken host, keeping the existing bricks
Hi all,
I know there are many tutorials on how to replace a gluster host that has become
unusable. But they all seem to assume that the bricks of the respective host are
gone, too.
My problem is different and (I hope) more easily solved: the disk with the
host's root file system died and cannot be recovered. However, all of its bricks
are on separate disks and completely undamaged.
I'm seeking your advice on what is best practice for replacing such a host.
My notion is that it should be possible to set up a new root system, configure it
and have it use the existing bricks.
My questions are:
1) Is this a good idea at all, or am I missing anything? Would it be better to
format the existing bricks and start over with a completely clean new host, like
most of the tutorials do?
2) If it is feasible to use the existing bricks, two scenarios come to my mind:
a) Set up a new root file system for a gluster host and copy/adjust the gluster
configuration from one of the existing hosts so that the newly set up host
actually thinks it is the old host (that died), i.e., copying over the gluster
UUID, volume configurations, hostname, IP, etc. (What else would it need?)
The pool would then recognize the new host as identical to the old one that
died and accept it just as if the old host had come online again.
b) Set up a new root file system for a gluster host and probe it into the
trusted pool, with a new name and a new gluster UUID. Transfer the bricks of the
old host that died to the new one using "change-brick". There would be no need
for lengthy syncing, as most of the data already exists and is up to date on the
new host (which has the bricks of the old host); only self-heal would take place.
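Roughly, I imagine b) would look something like this (volume name and brick
paths are just placeholders, and I assume the operation I mean is actually
called "replace-brick"):
# probe the new host from a server that is already in the pool
gluster peer probe new-host
# hand the existing brick over to the new host
gluster volume replace-brick myvol old-host:/bricks/brick1 new-host:/bricks/brick1 commit force
# let self-heal pick up whatever is still missing
gluster volume heal myvol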
Do these scenarios sound sane to you and which one would be best practice in
this situation? This is a production system, so safety is relevant.
Thanks for any helpful comments and opinions!
Best, R. Kupper
Stefan Solbrig
2024-Jun-11 11:51 UTC
[Gluster-users] [EXT] Replace broken host, keeping the existing bricks
Hi,
The method depends a bit on whether you use a distributed-only system (like me) or a
replicated setting.
I'm using a distributed-only setting (many bricks on different servers, but
no replication). All my servers boot via the network, i.e., on every start it's
like a new host.
To rescue the old bricks, just set up a new server with the same OS, the same IP,
and the same hostname (very important!). The simplest thing would be if you
could retrieve the files in /var/lib/glusterd.
If you install a completely new server (but with the same IP and the same
hostname) and _then_ restore the files in /var/lib/glusterd, you can just use it
as before. It will be recognised as the previous peer, without any additional
commands.
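If you have a backup of that directory, a minimal sketch would be (assuming a
systemd-based distribution; the backup path is just a placeholder):
systemctl stop glusterd
rsync -a /backup/FailedServer/var/lib/glusterd/ /var/lib/glusterd/
systemctl start glusterd
gluster peer status
# the other peers should now show up as connected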
In fact, I think that /var/lib/glusterd/*... should be identical on all
servers, except
/var/lib/glusterd/glusterd.info
which holds the UUID of the server. However, you should be able to retrieve the
UUID from the command:
gluster pool list
This is your scenario 2a)
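If there is no backup of /var/lib/glusterd at all, recreating glusterd.info by
hand might look roughly like this (the UUID, op-version and server names are
placeholders; I haven't needed this myself, so treat it as a sketch only):
gluster pool list
# run on a surviving server: note the UUID it lists for the failed server
cat > /var/lib/glusterd/glusterd.info <<EOF
UUID=UuidOfFailedServer
operating-version=OpVersionOfPool
EOF
systemctl start glusterd
gluster peer probe SurvivingServer
# so that peer and volume data get synced back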
Note that if it's __not__ a distributed-only system, other steps might be
necessary.
Your 2b) scenario should also work, but slightly differently (again, only for
distributed-only setups). I use it occasionally for failover, but I haven't
tested it extensively:
gluster v reset-brick NameOfVolume FailedServer:/path/to/brick start
gluster v add-brick NameOfVolume NewServer:/path/to/brick force
# Order is important!
# If the brick is removed before the new brick is added,
# it will lead to duplicate files.
gluster v remove-brick NameOfVolume FailedServer:/path/to/brick force
gluster v rebalance NameOfVolume fix-layout start
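To watch the fix-layout run, this should work (same placeholder volume name):
gluster v rebalance NameOfVolume status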
If it's also replicated or striped or using sharding, then other steps might
be necessary.
best wishes,
Stefan Solbrig
--
Dr. Stefan Solbrig
Universität Regensburg
Fakultät für Informatik und Data Science
93040 Regensburg