Kingsley Tart
2022-May-27 08:45 UTC
[Gluster-users] transport endpoint not connected on just 2 files
Hi, thanks.

OK that's interesting. Picking one of the files, on bricks A and B I see
this (and all of the values are identical between bricks A and B):

trusted.afr.dirty=0x000000000000000000000000
trusted.afr.gw-runqueues-client-2=0x000000010000000200000000
trusted.gfid=0xa40bb83ff3784ae09c997d272296a7a9
trusted.gfid2path.06eddbe9be9c7c75=0x30323665396561652d613661662d346365642d623863632d6261353037333339646364372f677733
trusted.glusterfs.mdata=0x01000000000000000000000000628ec57700000000007168bb00000000628ec576000000000000000000000000628ec5760000000000000000

and on brick C I see this:

trusted.gfid=0xd73992aee03e4021824b1baced973df3
trusted.gfid2path.06eddbe9be9c7c75=0x30323665396561652d613661662d346365642d623863632d6261353037333339646364372f677733
trusted.glusterfs.mdata=0x01000000000000000000000000628ec5230000000030136ca000000000628ec523000000000000000000000000628ec5230000000000000000

So brick C is missing the trusted.afr attributes and the trusted.gfid
and mdata differ.

What do I need to do to fix this?

Cheers,
Kingsley.

On Fri, 2022-05-27 at 03:59 +0000, Strahil Nikolov wrote:
> Check the file attributes on all bricks:
>
> getfattr -d -e hex -m. /data/brick/gw-runqueues/<path to file>
>
> Best Regards,
> Strahil Nikolov
>
> On Thu, May 26, 2022 at 16:05, Kingsley Tart
> <gluster at gluster.dogwind.com> wrote:
> > Hi,
> >
> > I've got a strange issue where on all clients I've tested on
> > (tested on 4) I have "transport endpoint is not connected" on two
> > files in a directory, whereas other files can be read fine.
> >
> > Any ideas?
> >
> > On one of the servers (all same version):
> >
> > # gluster --version
> > glusterfs 9.1
> >
> > On one of the clients (same thing with all of them) - problem with
> > files "gw3" and "gw11":
> >
> > [root@gw6 btl]# cd /mnt/runqueues/runners/
> > [root@gw6 runners]# ls -la
> > ls: cannot access gw11: Transport endpoint is not connected
> > ls: cannot access gw3: Transport endpoint is not connected
> > total 8
> > drwxr-xr-x  2 root root 4096 May 26 09:48 .
> > drwxr-xr-x 13 root root 4096 Apr 12  2021 ..
> > -rw-r--r--  1 root root    0 May 26 09:49 gw1
> > -rw-r--r--  1 root root    0 May 26 09:49 gw10
> > -?????????  ? ?    ?       ?            ? gw11
> > -rw-r--r--  1 root root    0 May 26 09:49 gw2
> > -?????????  ? ?    ?       ?            ? gw3
> > -rw-r--r--  1 root root    0 May 26 09:49 gw4
> > -rw-r--r--  1 root root    0 May 26 09:49 gw6
> > -rw-r--r--  1 root root    0 May 26 09:49 gw7
> > [root@gw6 runners]# cat *
> > cat: gw11: Transport endpoint is not connected
> > cat: gw3: Transport endpoint is not connected
> > [root@gw6 runners]#
> >
> > Querying on a server shows those two problematic files:
> >
> > # gluster volume heal gw-runqueues info
> > Brick gluster9a:/data/brick/gw-runqueues
> > /runners
> > /runners/gw11
> > /runners/gw3
> > Status: Connected
> > Number of entries: 3
> >
> > Brick gluster9b:/data/brick/gw-runqueues
> > /runners
> > /runners/gw11
> > /runners/gw3
> > Status: Connected
> > Number of entries: 3
> >
> > Brick gluster9c:/data/brick/gw-runqueues
> > Status: Connected
> > Number of entries: 0
> >
> > However several hours later there's no obvious change. The servers
> > have hardly any load and the volume is tiny. From a client:
> >
> > # find /mnt/runqueues | wc -l
> > 35
> >
> > glfsheal-gw-runqueues.log from server gluster9a:
> > https://pastebin.com/7mPszBBM
> >
> > glfsheal-gw-runqueues.log from server gluster9b:
> > https://pastebin.com/rxXs5Tcv
> >
> > Any pointers would be much appreciated!
> >
> > Cheers,
> > Kingsley.
> >
> > ________
> >
> > Community Meeting Calendar:
> >
> > Schedule -
> > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> > Bridge: https://meet.google.com/cpu-eiue-hvk
> > Gluster-users mailing list
> > Gluster-users at gluster.org
> > https://lists.gluster.org/mailman/listinfo/gluster-users
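For anyone reading the attribute dump above, the hex values can be decoded offline. The following is a minimal sketch, not an official Gluster tool: it assumes the commonly documented AFR layout in which a trusted.afr.* value holds three big-endian 32-bit pending counters (data, metadata, entry), and that a trusted.gfid2path.* value is the ASCII string "<parent-gfid>/<basename>". The function names are hypothetical.

```python
import struct
from typing import Dict, Tuple


def _strip_0x(hex_value: str) -> str:
    """Drop the leading 0x that getfattr -e hex prints."""
    return hex_value[2:] if hex_value.startswith("0x") else hex_value


def decode_afr_changelog(hex_value: str) -> Dict[str, int]:
    """Split a trusted.afr.* value into its pending counters.

    Assumption: 12 bytes = three big-endian uint32 values in the
    order data, metadata, entry.
    """
    raw = bytes.fromhex(_strip_0x(hex_value))
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


def decode_gfid2path(hex_value: str) -> Tuple[str, str]:
    """Return (parent directory gfid, file name) from a gfid2path value."""
    text = bytes.fromhex(_strip_0x(hex_value)).decode("ascii")
    parent_gfid, _, basename = text.partition("/")
    return parent_gfid, basename


if __name__ == "__main__":
    # Values copied from the getfattr output on bricks A and B.
    print(decode_afr_changelog("0x000000010000000200000000"))
    print(decode_gfid2path(
        "0x30323665396561652d613661662d346365642d623863632d"
        "6261353037333339646364372f677733"
    ))
```

Read this way, bricks A and B each record pending data and metadata operations against client-2, which by the usual zero-based brick numbering would be the third brick (brick C), and the gfid2path value shows the file inspected was gw3.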
Strahil Nikolov
2022-May-30 18:41 UTC
[Gluster-users] transport endpoint not connected on just 2 files
Make a backup from all bricks. Based on the info, 2 of the bricks have
the same copy while brick C has another copy (gfid mismatch).

I would use mtime to identify the latest version and use that, but I
have no clue what kind of application you have.

Usually, it's not recommended to manipulate bricks directly, but in
this case it might be necessary. The simplest way is to move the file
on brick C (the only one that is different) away, but if you need
exactly that one, you can rsync/scp it to the other 2 bricks.

Best Regards,
Strahil Nikolov
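The "2 bricks agree, 1 differs" reasoning above can be checked mechanically once the trusted.gfid values have been collected. This is only an illustrative sketch: the helper names are hypothetical, and the mapping of bricks A, B and C to gluster9a, gluster9b and gluster9c is an assumption based on the heal info output earlier in the thread. It does not touch any brick; it just groups values that were already read with getfattr.

```python
import uuid
from collections import defaultdict
from typing import Dict, List, Tuple


def format_gfid(gfid_hex: str) -> str:
    """Turn 0xa40bb83f... into the dashed UUID form used in Gluster logs."""
    digits = gfid_hex[2:] if gfid_hex.startswith("0x") else gfid_hex
    return str(uuid.UUID(hex=digits))


def find_gfid_outliers(gfid_by_brick: Dict[str, str]) -> Tuple[str, List[str]]:
    """Return (majority gfid, bricks whose gfid differs from it).

    With a tie there is no real majority; max() then simply picks the
    first group it saw, so treat the result as a hint, not a verdict.
    """
    groups = defaultdict(list)
    for brick, gfid_hex in gfid_by_brick.items():
        groups[format_gfid(gfid_hex)].append(brick)
    majority = max(groups, key=lambda gfid: len(groups[gfid]))
    outliers = sorted(
        brick
        for gfid, bricks in groups.items()
        if gfid != majority
        for brick in bricks
    )
    return majority, outliers


if __name__ == "__main__":
    # trusted.gfid values quoted in the thread; host names are assumed.
    observed = {
        "gluster9a": "0xa40bb83ff3784ae09c997d272296a7a9",
        "gluster9b": "0xa40bb83ff3784ae09c997d272296a7a9",
        "gluster9c": "0xd73992aee03e4021824b1baced973df3",
    }
    print(find_gfid_outliers(observed))
```

Which copy to keep is still a judgement call (mtime, application knowledge), as noted above; the sketch only identifies which brick disagrees.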
Kingsley Tart
2022-Jun-07 11:50 UTC
[Gluster-users] transport endpoint not connected on just 2 files
Hi,

Thanks - sorry for the late reply - I was suddenly swamped with other
work then it was a UK holiday.

I've tried rsync -A -X with the volume stopped, then restarted it. Will
see whether it heals.

Cheers,
Kingsley.
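To watch whether the volume heals without reading the output by eye, the text printed by "gluster volume heal gw-runqueues info" can be parsed. A rough sketch follows, assuming the output keeps the shape shown earlier in this thread (a "Brick ..." line, then one line per pending entry, then "Status:" and "Number of entries:" lines). The function names are hypothetical and the sample text is copied from the thread rather than produced by running the command.

```python
from typing import Dict, List

SAMPLE_HEAL_INFO = """\
Brick gluster9a:/data/brick/gw-runqueues
/runners
/runners/gw11
/runners/gw3
Status: Connected
Number of entries: 3

Brick gluster9b:/data/brick/gw-runqueues
/runners
/runners/gw11
/runners/gw3
Status: Connected
Number of entries: 3

Brick gluster9c:/data/brick/gw-runqueues
Status: Connected
Number of entries: 0
"""


def parse_heal_info(output: str) -> Dict[str, List[str]]:
    """Map each brick to the list of entries it reports as pending heal."""
    pending: Dict[str, List[str]] = {}
    current = None
    for raw_line in output.splitlines():
        line = raw_line.strip()
        if line.startswith("Brick "):
            current = line[len("Brick "):]
            pending[current] = []
        elif not line or current is None:
            continue
        elif line.startswith(("Status:", "Number of entries:")):
            continue
        else:
            pending[current].append(line)
    return pending


def is_healed(output: str) -> bool:
    """True when no brick lists any pending entry."""
    return all(not entries for entries in parse_heal_info(output).values())


if __name__ == "__main__":
    for brick, entries in parse_heal_info(SAMPLE_HEAL_INFO).items():
        print(brick, len(entries), entries)
    print("healed:", is_healed(SAMPLE_HEAL_INFO))
```

In practice the text would come from running the heal info command on one of the servers after the volume restart; a run where every brick shows zero entries would suggest the heal has completed.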