kup at kg-fds.de
2024-Jun-09 13:00 UTC
[Gluster-users] Replace broken host, keeping the existing bricks
Hi all,

I know there are many tutorials on how to replace a gluster host that has become unusable. But they all seem to assume that the bricks of the respective host are gone, too.

My problem is different and (I hope) more easily solved: the disk with the host's root file system died and cannot be recovered. However, all of its bricks are on separate disks and completely undamaged.

I'm seeking your advice on what is best practice for replacing such a host. My notion is that it should be possible to set up a new root system, configure it, and have it use the existing bricks.

My questions are:

1) Is this a good idea at all, or am I missing anything? Would it be better to format the existing bricks and start over with a completely clean new host, like most of the tutorials do?

2) If it is feasible to use the existing bricks, two scenarios come to my mind:

a) Set up a new root file system for a gluster host and copy/adjust the gluster configuration from one of the existing hosts, so that the newly set-up host actually thinks it is the old host that died, i.e. copying over the gluster UUID, volume configurations, hostname, IP, etc. (What else would it need? A sketch of where this identity lives follows below.) The pool would then recognize the new host as identical to the old one that died and accept it just as if the old host had come online again.

b) Set up a new root file system for a gluster host and probe it into the trusted pool, with a new name and a new gluster UUID. Transfer the bricks of the old host that died to the new one using "replace-brick". There would be no need for lengthy syncing, as most of the data already exists and is up to date on the new host (which has the bricks of the old host); only self-heal would take place.

Do these scenarios sound sane to you, and which one would be best practice in this situation? This is a production system, so safety is relevant.

Thanks for any helpful comments and opinions!

Best, R. Kupper
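For scenario a), a minimal sketch of where this identity lives on a stock glusterd installation; the paths are the usual defaults, and the UUID shown is a placeholder, not taken from the poster's setup:

# On any surviving peer: list all pool members and their UUIDs,
# including the UUID of the dead host that a replacement would have to adopt.
gluster pool list

# glusterd's own identity lives in glusterd.info on each host, e.g.:
cat /var/lib/glusterd/glusterd.info
#   UUID=6a8d2f1c-1111-2222-3333-444444444444
#   operating-version=...

# Volume definitions and the list of known peers live alongside it:
ls /var/lib/glusterd/vols/
ls /var/lib/glusterd/peers/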
Stefan Solbrig
2024-Jun-11 11:51 UTC
[Gluster-users] [EXT] Replace broken host, keeping the existing bricks
Hi,

The method depends a bit on whether you use a distributed-only system (like me) or a replicated setting. I'm using a distributed-only setting (many bricks on different servers, but no replication). All my servers boot via network, i.e., on a start, it's like a new host.

To rescue the old bricks, just set up a new server with the same OS, the same IP and the same hostname (very important!). The simplest thing would be if you could retrieve the files in /var/lib/glusterd. If you install a completely new server (but with the same IP and the same hostname) and _then_ restore the files in /var/lib/glusterd, you can just use it as before. It will be recognised as the previous peer, without any additional commands.

In fact, I think that /var/lib/glusterd/* should be identical on all servers, except /var/lib/glusterd/glusterd.info, which holds the UUID of the server. However, you should be able to retrieve the UUID with the command:

gluster pool list

This is your scenario 2a). (A sketch of this restore sequence follows after this message.) Note that if it's __not__ a distributed-only system, other steps might be necessary.

Your 2b) scenario should also work, but slightly differently. (Again, distributed-only.) I use it occasionally for failover mode, but I haven't tested it extensively:

gluster v reset-brick NameOfVolume FailedServer:/path/to/brick start
gluster v add-brick NameOfVolume NewServer:/path/to/brick force
# Order is important!
# If the brick is removed before the other brick is added,
# this will lead to duplicate files.
gluster v remove-brick NameOfVolume FailedServer:/path/to/brick force
gluster v rebalance NameOfVolume fix-layout start

If it's also replicated or striped or using sharding, then other steps might be necessary.

best wishes,
Stefan Solbrig

--
Dr. Stefan Solbrig
Universität Regensburg
Fakultät für Informatik und Data Science
93040 Regensburg
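A minimal sketch of the 2a) restore described above, assuming the replacement host already has the old hostname and IP, the brick disks mounted at their old paths, and a retrievable copy of the dead host's /var/lib/glusterd; /backup/glusterd is a placeholder path, and service handling may differ by distribution:

# Gluster is installed but glusterd is not yet running on the fresh host.
systemctl stop glusterd

# Restore the old state directory, including glusterd.info (the host's UUID)
# and the vols/ and peers/ definitions.
cp -a /backup/glusterd/. /var/lib/glusterd/

systemctl start glusterd

# The other peers should now recognise the host as the old one,
# and its bricks should come online:
gluster peer status
gluster volume status

# On a replicated volume, pending self-heal can be checked with:
gluster volume heal NameOfVolume info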