dpgluster at posteo.de
2022-Sep-14 11:08 UTC
[Gluster-users] gluster volume not healing - remote operation failed
Hi folks,

my gluster volume isn't fully healing. We had an outage a couple of days ago and all other files were healed successfully. Now - days later - I can see there are still two gfids per node remaining in the heal info list.

root at storage-001~# for i in `gluster volume list`; do gluster volume heal $i info; done
Brick storage-003.mydomain.com:/mnt/bricks/g-volume-myvolume
<gfid:612ebae7-3df2-467f-aa02-47d9e3bafc1a>
<gfid:876597cd-702a-49ec-a9ed-46d21f90f754>
Status: Connected
Number of entries: 2

Brick storage-002.mydomain.com:/mnt/bricks/g-volume-myvolume
<gfid:a4babc5a-bd5a-4429-b65e-758651d5727c>
<gfid:48791313-e5e7-44df-bf99-3ebc8d4cf5d5>
Status: Connected
Number of entries: 2

Brick storage-001.mydomain.com:/mnt/bricks/g-volume-myvolume
<gfid:a4babc5a-bd5a-4429-b65e-758651d5727c>
<gfid:48791313-e5e7-44df-bf99-3ebc8d4cf5d5>
Status: Connected
Number of entries: 2

In the log I can see that the glustershd process is invoked to heal the remaining files but fails with "remote operation failed":

[2022-09-14 10:56:50.007978 +0000] I [MSGID: 108026] [afr-self-heal-entry.c:1053:afr_selfheal_entry_do] 0-g-volume-myvolume-replicate-0: performing entry selfheal on 48791313-e5e7-44df-bf99-3ebc8d4cf5d5
[2022-09-14 10:56:50.008428 +0000] I [MSGID: 108026] [afr-self-heal-entry.c:1053:afr_selfheal_entry_do] 0-g-volume-myvolume-replicate-0: performing entry selfheal on a4babc5a-bd5a-4429-b65e-758651d5727c
[2022-09-14 10:56:50.015005 +0000] E [MSGID: 114031] [client-rpc-fops_v2.c:214:client4_0_mkdir_cbk] 0-g-volume-myvolume-client-2: remote operation failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]
[2022-09-14 10:56:50.015007 +0000] E [MSGID: 114031] [client-rpc-fops_v2.c:214:client4_0_mkdir_cbk] 0-g-volume-myvolume-client-3: remote operation failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]
[2022-09-14 10:56:50.015138 +0000] E [MSGID: 114031] [client-rpc-fops_v2.c:214:client4_0_mkdir_cbk] 0-g-volume-myvolume-client-4: remote operation failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]
[2022-09-14 10:56:50.614082 +0000] E [MSGID: 114031] [client-rpc-fops_v2.c:214:client4_0_mkdir_cbk] 0-g-volume-myvolume-client-2: remote operation failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]
[2022-09-14 10:56:50.614108 +0000] E [MSGID: 114031] [client-rpc-fops_v2.c:214:client4_0_mkdir_cbk] 0-g-volume-myvolume-client-3: remote operation failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]
[2022-09-14 10:56:50.614099 +0000] E [MSGID: 114031] [client-rpc-fops_v2.c:214:client4_0_mkdir_cbk] 0-g-volume-myvolume-client-4: remote operation failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]
[2022-09-14 10:56:51.619623 +0000] E [MSGID: 114031] [client-rpc-fops_v2.c:214:client4_0_mkdir_cbk] 0-g-volume-myvolume-client-2: remote operation failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]
[2022-09-14 10:56:51.619630 +0000] E [MSGID: 114031] [client-rpc-fops_v2.c:214:client4_0_mkdir_cbk] 0-g-volume-myvolume-client-3: remote operation failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]
[2022-09-14 10:56:51.619632 +0000] E [MSGID: 114031] [client-rpc-fops_v2.c:214:client4_0_mkdir_cbk] 0-g-volume-myvolume-client-4: remote operation failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]

The gluster is running with op-version 90000 on CentOS. There are no entries in split brain.

How can I get these files finally healed?

Thanks in advance.
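The gfid entries from the heal info list can also be inspected directly on a storage node: GlusterFS keeps a hardlink (or, for directories, a symlink) to every file under <brick>/.glusterfs/<aa>/<bb>/<gfid>, where aa and bb are the first two character pairs of the gfid. A minimal sketch, reusing one pending gfid and the brick path from the output above (adjust to your environment; the getfattr invocation is printed rather than executed):

```shell
#!/usr/bin/env bash
# One of the gfids still pending heal, and the brick path from the thread.
gfid="48791313-e5e7-44df-bf99-3ebc8d4cf5d5"
brick="/mnt/bricks/g-volume-myvolume"

# .glusterfs shards entries by the first two character pairs of the gfid.
entry="$brick/.glusterfs/${gfid:0:2}/${gfid:2:2}/$gfid"
echo "$entry"

# On the node itself, dump that entry's xattrs
# (the trusted.afr.* attributes carry the pending-heal counters):
echo "getfattr -d -m . -e hex $entry"
```

For regular files the .glusterfs entry is a hardlink, so a `find "$brick" -samefile "$entry"` (pruning .glusterfs itself) reveals the real path on the brick.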
Strahil Nikolov
2022-Sep-14 19:54 UTC
[Gluster-users] gluster volume not healing - remote operation failed
Have you checked which files those are and what attributes they have?

Usually I use method 2 (don't forget to mount the volume on another location with the aux-gfid-mount option) to find the file's path on the brick:
https://docs.gluster.org/en/main/Troubleshooting/gfid-to-path/

Then check the attributes (check on all nodes):
getfattr -d -m . -e hex </brick/path/to/file>

Best Regards,
Strahil Nikolov
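Condensed, method 2 from the linked page looks roughly like the sketch below; the mount point /mnt/aux is an assumption, and the server/volume names are taken from the thread. The commands are printed rather than executed, since they need a live cluster:

```shell
#!/usr/bin/env bash
# Assumed names based on the thread; adjust to your environment.
volume="g-volume-myvolume"
gfid="a4babc5a-bd5a-4429-b65e-758651d5727c"
aux="/mnt/aux"

# 1. Mount the volume a second time with the aux-gfid-mount option:
echo "mount -t glusterfs -o aux-gfid-mount storage-001.mydomain.com:/$volume $aux"

# 2. Ask gluster for the real path behind the gfid via the virtual .gfid/ tree:
echo "getfattr -n trusted.glusterfs.pathinfo -e text $aux/.gfid/$gfid"
```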
________

Community Meeting Calendar:
Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users at gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
Eli V
2022-Nov-02 11:39 UTC
[Gluster-users] gluster volume not healing - remote operation failed
On Wed, Sep 14, 2022 at 7:08 AM <dpgluster at posteo.de> wrote:
> [original message quoted in full - snipped]

I've seen this too.
The only way I've found to fix it is to run a find under each of my bricks and run getfattr -n trusted.gfid -e hex on all the files, saving the output to a text file, and then grepping for the problematic gfids to identify which file each one is. Accessing the files through the gluster fuse mount can sometimes heal them, but I've had symlinks I just had to rm and recreate, and other files that were just failed removals - present on one brick and no others - that had to be removed by hand. This happens often enough that I wrote a script that traverses all files under a brick and recursively removes the file in the brick and its gfid counterpart under .glusterfs. I can dig it up if you're still interested; I don't have it handy at the moment.
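For reference, a rough sketch of the scan described above (this is not the original script; the brick path and output file are assumptions, and getfattr comes from the attr package):

```shell
#!/usr/bin/env bash
# Assumed paths; adjust to your environment.
brick="/mnt/bricks/g-volume-myvolume"
out="/tmp/brick-gfids.txt"

# Dump the trusted.gfid xattr of every file under the brick,
# skipping the internal .glusterfs directory.
find "$brick" -path "$brick/.glusterfs" -prune -o -type f -print0 \
  | xargs -0 getfattr -n trusted.gfid -e hex --absolute-names > "$out" 2>/dev/null

# Heal info prints gfids with dashes; the hex xattr dump has none,
# so strip them before grepping. -B1 shows the matching "# file:" line.
gfid="48791313-e5e7-44df-bf99-3ebc8d4cf5d5"
grep -i -B1 "${gfid//-/}" "$out"
```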