Pasi Kärkkäinen
2016-Sep-21 17:24 UTC
[Gluster-users] gluster 3.7 healing errors (no data available, buf->ia_gfid is null)
Hi,

On Wed, Sep 21, 2016 at 10:12:44PM +0530, Ravishankar N wrote:
> On 09/21/2016 06:45 PM, Pasi Kärkkäinen wrote:
> >Hello,
> >
> >I have a pretty basic two-node gluster 3.7 setup, with a volume replicated/mirrored to both servers.
> >
> >One of the servers was down for hardware maintenance, and later when it got back up, the healing process started, re-syncing files.
> >In the beginning there was some 200 files that need to be synced, and now the number of files is down to 10, but it seems the last 10 files don't seem to get synced..
> >
> >So the problem is the healing/re-sync never ends for these files..
> >
> >
> ># gluster volume heal gvol1 info
> >Brick gnode1:/bricks/vol1/brick1
> >/foo
> >/ - Possibly undergoing heal
> >
> >/foo6
> >/foo8
> >/foo7
> >/foo9
> >/foo2
> >/foo5
> >/foo4
> >/foo3
> >Status: Connected
> >Number of entries: 10
> >
> >Brick gnode2:/bricks/vol1/brick1
> >/
> >Status: Connected
> >Number of entries: 1
> >
> >
> >In the brick logs for the volume I see these errors repeating:
> >
> >[2016-09-21 12:41:43.063209] E [MSGID: 113002] [posix.c:252:posix_lookup] 0-gvol1-posix: buf->ia_gfid is null for /bricks/vol1/brick1/foo [No data available]
> >[2016-09-21 12:41:43.063266] E [MSGID: 115050] [server-rpc-fops.c:179:server_lookup_cbk] 0-gvol1-server: 1484202: LOOKUP /foo (00000000-0000-0000-0000-000000000001/foo) ==> (No data available) [No data available]
> >
> >
> >Any idea what might cause those errors? (/foo is exactly the file that is being healed, but fails to heal)
> >Any tricks to try?
>
> Can you check if the 'trusted.gfid' xattr is present for those files
> on the bricks and the files also have the associated hardlink inside
> .glusterfs? You can refer to https://joejulian.name/blog/what-is-this-new-glusterfs-directory-in-33/
> if you are not familiar with the .glusterfs directory.
>

Let's see.

# getfattr -m . -d -e hex /bricks/vol1/brick1/foo
getfattr: Removing leading '/' from absolute path names
# file: bricks/vol1/brick1/foo
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a756e6c6162656c65645f743a733000

So hmm.. no trusted.gfid it seems.. is that perhaps because this node was down when the file was created?

On another node:

# getfattr -m . -d -e hex /bricks/vol1/brick1/foo
getfattr: Removing leading '/' from absolute path names
# file: bricks/vol1/brick1/foo
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.gvol1-client-1=0x000016620000000100000000
trusted.bit-rot.version=0x020000000000000057e00db5000624ed
trusted.gfid=0xc1ca778ed2af4828b981171c0c5bd45e

So there we have the gfid..

How do I fix this and allow healing process to continue/finish.. ?

Thanks,

-- Pasi

> -Ravi
> >
> >
> >Software versions: CentOS 7 with gluster37 repo (running Gluster 3.7.15), and nfs-ganesha 2.3.3.
> >
> >
> >Thanks a lot,
> >
> >-- Pasi
> >
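
For readers trying to interpret the trusted.afr.gvol1-client-1 value in the getfattr output above, here is a minimal sketch of how it can be decoded. It assumes the usual AFR changelog layout of three big-endian 32-bit counters (pending data, metadata and entry operations, in that order). The function name decode_afr is made up for illustration and is not part of Gluster.

```shell
#!/bin/sh
# Decode an AFR changelog xattr value (e.g. trusted.afr.gvol1-client-1).
# Assumption: 12 bytes = three big-endian 32-bit pending counters,
# ordered data, metadata, entry. decode_afr is a hypothetical helper.
decode_afr() {
    hex=${1#0x}                      # strip the leading 0x printed by getfattr -e hex
    if [ "${#hex}" -ne 24 ]; then
        echo "expected 24 hex digits, got ${#hex}" >&2
        return 1
    fi
    data=$(printf '%s' "$hex" | cut -c1-8)
    meta=$(printf '%s' "$hex" | cut -c9-16)
    entry=$(printf '%s' "$hex" | cut -c17-24)
    # Shell arithmetic accepts 0x-prefixed hexadecimal constants.
    printf 'data=%d metadata=%d entry=%d\n' \
        "$((0x$data))" "$((0x$meta))" "$((0x$entry))"
}

decode_afr 0x000016620000000100000000
# prints: data=5730 metadata=1 entry=0
```

Under that assumption, the non-zero counters on this brick would mean it still records pending data and metadata operations against the other brick (client-1), which fits a heal that has not completed, while the all-zero trusted.afr.dirty records nothing pending.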
Ravishankar N
2016-Sep-22 04:28 UTC
[Gluster-users] gluster 3.7 healing errors (no data available, buf->ia_gfid is null)
On 09/21/2016 10:54 PM, Pasi Kärkkäinen wrote:
> Let's see.
>
> # getfattr -m . -d -e hex /bricks/vol1/brick1/foo
> getfattr: Removing leading '/' from absolute path names
> # file: bricks/vol1/brick1/foo
> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a756e6c6162656c65645f743a733000
>
> So hmm.. no trusted.gfid it seems.. is that perhaps because this node was down when the file was created?

No, even if that were the case, the gfid should have been set while healing the file to this node. Can you try doing a setfattr -n trusted.gfid -v 0xc1ca778ed2af4828b981171c0c5bd45e on the file, and launch heal again? What about the .glusterfs hardlink - does that exist?

-Ravi

>
>
> On another node:
>
> # getfattr -m . -d -e hex /bricks/vol1/brick1/foo
> getfattr: Removing leading '/' from absolute path names
> # file: bricks/vol1/brick1/foo
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> trusted.afr.dirty=0x000000000000000000000000
> trusted.afr.gvol1-client-1=0x000016620000000100000000
> trusted.bit-rot.version=0x020000000000000057e00db5000624ed
> trusted.gfid=0xc1ca778ed2af4828b981171c0c5bd45e
>
> So there we have the gfid..
>
> How do I fix this and allow healing process to continue/finish.. ?
>
>
> Thanks,
>
> -- Pasi
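
To check the .glusterfs hardlink asked about above, it helps to know where that link should be. The sketch below assumes the layout described in the blog post linked earlier in the thread: the gfid is written as a UUID and stored under .glusterfs/<first two hex digits>/<next two hex digits>/. The function name gfid_to_path is hypothetical, and the script only prints a path; it changes nothing on the brick.

```shell
#!/bin/sh
# Map a trusted.gfid hex value to its expected path under .glusterfs.
# Assumption: layout is <brick>/.glusterfs/<aa>/<bb>/<uuid>, where aa and bb
# are the first two byte pairs of the gfid. gfid_to_path is a hypothetical helper.
gfid_to_path() {
    brick=$1
    hex=${2#0x}                      # strip the 0x printed by getfattr -e hex
    if [ "${#hex}" -ne 32 ]; then
        echo "expected 32 hex digits, got ${#hex}" >&2
        return 1
    fi
    # Split the 32 hex digits into the 8-4-4-4-12 UUID groups.
    p1=$(printf '%s' "$hex" | cut -c1-8)
    p2=$(printf '%s' "$hex" | cut -c9-12)
    p3=$(printf '%s' "$hex" | cut -c13-16)
    p4=$(printf '%s' "$hex" | cut -c17-20)
    p5=$(printf '%s' "$hex" | cut -c21-32)
    d1=$(printf '%s' "$hex" | cut -c1-2)
    d2=$(printf '%s' "$hex" | cut -c3-4)
    printf '%s/.glusterfs/%s/%s/%s-%s-%s-%s-%s\n' \
        "$brick" "$d1" "$d2" "$p1" "$p2" "$p3" "$p4" "$p5"
}

gfid_to_path /bricks/vol1/brick1 0xc1ca778ed2af4828b981171c0c5bd45e
# prints: /bricks/vol1/brick1/.glusterfs/c1/ca/c1ca778e-d2af-4828-b981-171c0c5bd45e
```

If /foo is a regular file, running stat -c '%i %h' (GNU coreutils) on both /bricks/vol1/brick1/foo and the printed path on the same brick should show the same inode number and a link count of at least 2 when the hardlink is in place.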