Strahil Nikolov
2022-Aug-18 21:38 UTC
[Gluster-users] Directory in split brain does not heal - Gfs 9.2
If you refer to /<path_to_brick>/.glusterfs/<gfid_first_2_characters>/<gfid_second_2_characters>/<gfid> - it's a hard link to the file on the brick. Directories in the .glusterfs tree are just symbolic links.

Can you clarify what you are planning to delete?

Best Regards,
Strahil Nikolov
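For illustration, a minimal way to check this on one brick; the brick root and file name below are made-up placeholders:

    # Run on a brick root (not on the FUSE mount); paths are examples only.
    BRICK=/data/brick1
    FILE="$BRICK/folder/somefile"

    # A regular file has link count >= 2: the extra link is its gfid file.
    stat -c 'links=%h inode=%i %n' "$FILE"

    # Locate the gfid hard link by inode identity:
    find "$BRICK/.glusterfs" -samefile "$FILE"

    # For a directory, the .glusterfs entry is a symlink, not a hard link:
    # readlink "$BRICK/.glusterfs/<g1g2>/<g3g4>/<gfid-of-directory>"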
On Wed, Aug 17, 2022 at 14:35, Ilias Chasapakis forumZFD <chasapakis at forumZFD.de> wrote:

> Hi Thomas,
>
> Thanks again for your replies and patience :)
>
> We also have offline backups of the files.
>
> So, just to verify I understood this correctly: deleting a .glusterfs gfid file does not inherently carry the risk of losing the complete brick, right? I saw you already applied this for your own purposes and it worked for you, but just as a confirmation. Of course it is fully understood that the operational risk is on our side. It is just an "information-wise" question :)
>
> Best regards
> Ilias
>
> On 17.08.22 at 12:47, Thomas Bätzler wrote:
>
>> Hello Ilias,
>>
>> Please note that you can and should back up all of the file(s) involved in the split-brain by accessing them via the brick root instead of the gluster mount. That is also the reason why you're not in danger of a failure cascade wiping out your data.
>>
>> Be careful when replacing bricks, though. You want that heal to go in the right direction ;-)
>>
>> Kind regards,
>> i.A. Thomas Bätzler
>>
>> --
>> BRINGE Informationstechnik GmbH
>> Zur Seeplatte 12
>> D-76228 Karlsruhe
>> Germany
>>
>> Fon: +49 721 94246-0
>> Fon: +49 171 5438457
>> Fax: +49 721 94246-66
>> Web: http://www.bringe.de/
>>
>> Managing Director (Geschäftsführer): Dipl.-Ing. (FH) Martin Bringe
>> Ust.Id: DE812936645, HRB 108943 Mannheim
>>
>> From: Gluster-users <gluster-users-bounces at gluster.org> On Behalf Of Ilias Chasapakis forumZFD
>> Sent: Wednesday, 17 August 2022 11:18
>> To: gluster-users at gluster.org
>> Subject: Re: [Gluster-users] Directory in split brain does not heal - Gfs 9.2
>>
>> Thanks for the suggestions. My question is whether the risk is limited to losing the file/dir in question, or whether it extends to creating inconsistencies that span the bricks and "break everything". Of course we have to act anyway so this does not spread (we now have a second entry that has developed an "unhealable" directory split-brain), so it is just a matter of evaluating before acting.
>>
>> On 12.08.22 at 18:12, Thomas Bätzler wrote:
>>
>>> On 12.08.2022 at 17:12, Ilias Chasapakis forumZFD wrote:
>>>
>>> Dear fellow gluster users,
>>>
>>> we are facing a problem with our replica 3 setup. Glusterfs version is 9.2.
>>>
>>> We have a problem with a directory that is in split-brain and we cannot manage to heal it with:
>>>
>>> gluster volume heal gfsVol split-brain latest-mtime /folder
>>>
>>> The command throws the following error: "failed: Transport endpoint is not connected."
>>>
>>> So the split-brain directory entry remains, the whole healing process does not complete, and other entries get stuck.
>>>
>>> I saw there is a python script available at https://github.com/joejulian/glusterfs-splitbrain - would that be a good solution to try? To be honest, we are a bit concerned about deleting the gfid and the files from the brick manually, as it seems that can create inconsistencies and break things... I can of course give you more information about our setup and situation, but if you already have a tip, that would be fantastic.
>>
>> You could at least verify what's going on: go to your brick roots and list /folder on each. You have 3n bricks with n replica sets. Find the replica set where you can spot a difference; it's most likely a file or directory that's missing or different. If it's a file, do an ls -ain on the file on each brick in the replica set. It'll report an inode number. Then do a find .glusterfs -inum <inode> from the brick root. You'll likely see that you have different gfid files.
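As a hedged sketch of that comparison - host names, brick root, and file path below are placeholders, not values from this thread:

    # On each brick of the replica set, note the inode and look up the
    # matching gfid file; differing gfid file names mean diverged replicas.
    REL="folder/disputed-file"            # placeholder path
    for host in node1 node2 node3; do     # placeholder hosts
        echo "== $host =="
        ssh "$host" "cd /data/brick1 && ls -ain $REL && \
            find .glusterfs -inum \$(stat -c %i $REL)"
    done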
>> To fix the problem, you have to help gluster along by cleaning up the mess. This is completely "do it at your own risk, it worked for me, ymmv": cp (not mv!) a copy of the file you want to keep. On each brick in the replica set, delete the gfid file and the data file. Run a heal on the volume and verify that you can access the path in question through the glusterfs mount. Then copy your salvaged file back in via the glusterfs mount.
>>
>> We had this happen quite often on a heavily loaded glusterfs shared filesystem that held a mail spool. There would be parallel accesses trying to mv files, and sometimes we'd end up with mismatched data on the bricks of the replica set. I reported this on GitHub, but apparently it wasn't seen as a serious problem. We've moved on to CephFS now. That surely has bugs too, but hopefully not as aggravating ones.
>>
>> Kind regards,
>> i.A. Thomas Bätzler
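For reference, a sketch of the at-your-own-risk cleanup described above; paths and mount point are placeholders, only the volume name gfsVol comes from this thread. Verify your backups before attempting anything like it:

    # 1. Salvage the version you want to keep (cp, not mv!):
    BRICK=/data/brick1                    # placeholder brick root
    REL="folder/disputed-file"            # placeholder path
    cp -a "$BRICK/$REL" /root/salvage-file

    # 2. On EACH brick of the replica set, remove the gfid link and the data file:
    rm "$(find "$BRICK/.glusterfs" -samefile "$BRICK/$REL")" "$BRICK/$REL"

    # 3. Trigger a heal and check the path through the gluster mount:
    gluster volume heal gfsVol
    ls -l /mnt/gfsVol/folder/             # placeholder mount point

    # 4. Copy the salvaged file back in via the mount:
    cp -a /root/salvage-file /mnt/gfsVol/folder/disputed-file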
Ilias Chasapakis forumZFD
2022-Aug-31 12:30 UTC
[Gluster-users] Directory in split brain does not heal - Gfs 9.2
Hi all,

so we went further and deleted the entries (data and gfid). The split-brain is now gone, but when we triggered a heal again (simple and full) we have many entries stuck in healing (no split-brain items). They have been there for days/weeks and are still appearing. We would like to heal single files, but as they are not in split-brain I guess this is not possible, right? The "source-brick" technique works only in that case, I think?

A concrete example of one of the files stuck in the healing queue: I checked the attributes with getfattr and saw that one of the nodes has neither the data nor the gfid - missing completely. How could I trigger a replication from the "good copy" to the gluster node that does not have the file? Is that possible for entries *not* in split-brain? Doing a listing (ls) of the affected directory on the mount side did not seem to trigger a heal.

Also, the shd logs have some entries that are ambiguous (to me). The sink value is empty - shouldn't it be a number indicating what is being healed?

    [2022-08-28 17:22:11.098604 +0000] I [MSGID: 108026] [afr-self-heal-common.c:1742:afr_log_selfheal] 0-vol-replicate-0: Completed metadata selfheal on 94503c97-7731-4aa1-8a14-2c6ea5a84a15. sources=1 [2] sinks=
    [2022-08-28 17:22:16.227091 +0000] I [MSGID: 108026] [afr-self-heal-common.c:1742:afr_log_selfheal] 0-gv-ho-replicate-0: Completed metadata selfheal on 94503c97-7731-4aa1-8a14-2c6ea5a84a15. sources=1 [2] sinks=

I tried to use the guide here: https://docs.gluster.org/en/main/Troubleshooting/troubleshooting-afr/#ii-self-heal-is-stuck-not-getting-completed but find it difficult to apply. Do you have any suggestions on how to "unblock" these stuck entries, and what a methodical approach to troubleshooting this situation looks like?

Finally, I would like to ask whether updating the gluster nodes (we have pending updates now) would be too dangerous without first fixing the unhealed entries. Our hope is that an update might eventually fix the problems.

Best regards.
Ilias

On 18.08.22 at 23:38, Strahil Nikolov wrote:

> If you refer to /<path_to_brick>/.glusterfs/<gfid_first_2_characters>/<gfid_second_2_characters>/<gfid> - it's a hard link to the file on the brick. Directories in the .glusterfs are just symbolic links.
>
> Can you clarify what you are planning to delete?

--
forumZFD
Entschieden für Frieden | Committed to Peace

Ilias Chasapakis
Referent IT | IT Consultant

Forum Ziviler Friedensdienst e.V. | Forum Civil Peace Service
Am Kölner Brett 8 | 50825 Köln | Germany

Tel 0221 91273243 | Fax 0221 91273299 | http://www.forumZFD.de

Vorstand nach § 26 BGB, einzelvertretungsberechtigt | Executive Board:
Oliver Knabe (Vorsitz | Chair), Jens von Bargen, Alexander Mauz
VR 17651 Amtsgericht Köln

Spenden | Donations: IBAN DE37 3702 0500 0008 2401 01 BIC BFSWDE33XXX
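For illustration, a minimal sketch of the per-brick state check described above for one stuck entry; host names and paths are placeholders:

    # Dump the gluster xattrs for the same path on every brick; a brick
    # where the file (and its trusted.gfid) is missing entirely is a sink.
    REL="folder/stuck-file"               # placeholder path
    for host in node1 node2 node3; do     # placeholder hosts
        echo "== $host =="
        ssh "$host" "getfattr -d -m . -e hex /data/brick1/$REL"
    done

    # Non-zero trusted.afr.* counters on the good bricks mark the pending
    # heal; `gluster volume heal <volname> info` should list the entry.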
Strahil Nikolov
2022-Sep-03 06:17 UTC
[Gluster-users] Directory in split brain does not heal - Gfs 9.2
I would start by reading the three blog posts from Ravi:

https://ravispeaks.wordpress.com/2019/04/05/glusterfs-afr-the-complete-guide/
https://ravispeaks.wordpress.com/2019/04/15/gluster-afr-the-complete-guide-part-2/
https://ravispeaks.wordpress.com/2019/05/14/gluster-afr-the-complete-guide-part-3/

All pending heals are hard links created in .glusterfs/indices/xattrop. Check the attributes of the gfids there (one by one) for differences between the bricks; if they are the same, you can delete the entries from .glusterfs/indices/xattrop (the root entry must stay!). If not, the attributes can hint at what happened and which is the good copy.

Best Regards,
Strahil Nikolov
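A hedged sketch of walking that index on one brick; the brick root is a placeholder, and the xattrop-<uuid> base entry must be left alone:

    # Entries in indices/xattrop are named after the gfid awaiting heal.
    BRICK=/data/brick1                    # placeholder brick root
    cd "$BRICK/.glusterfs/indices/xattrop"
    for gfid in $(ls -A | grep -v '^xattrop-'); do
        echo "== $gfid =="
        # A file gfid's backing entry lives at .glusterfs/<g1g2>/<g3g4>/<gfid>:
        getfattr -d -m . -e hex \
            "$BRICK/.glusterfs/${gfid:0:2}/${gfid:2:2}/$gfid" 2>/dev/null \
            || echo "no backing gfid file on this brick"
    done

    # Compare the trusted.afr.* lines for the same gfid across all bricks:
    # identical all-zero counters everywhere suggest a stale index entry
    # that can be removed from xattrop on each brick (never the xattrop-*
    # base file).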