Shawn Heisey
2014-Mar-04 23:46 UTC
[Gluster-users] Fixing heal / split-brain when the entry is a directory
I have a bunch of heal problems on a volume. For this email, I won't speculate about what caused them - that's a whole other discussion that I may have at some point in the future. This will concentrate on fixing the immediate problems so I can move forward.

Thanks to JoeJulian's blog posts and talking to him in the IRC channel, I have a pretty good handle on how to fix entries in the 'heal $vol info' output ... but only if the entry given refers to a real *file* or a gluster link file. Almost all of the entries in my report are directories, and I have no idea how to fix them.

All I have for these entries is gfid values, so I first locate the entry in .glusterfs. In this case, it's a symlink.

[root@slc01dfs001a ~]# stat /bricks/d00v00/mdfs/.glusterfs/fe/93/fe93de6e-5b91-4193-a31c-786726886ff1
  File: `/bricks/d00v00/mdfs/.glusterfs/fe/93/fe93de6e-5b91-4193-a31c-786726886ff1' -> `../../a7/30/a730505c-84f3-407f-ac27-d45465a17f40/331'
  Size: 52  Blocks: 0  IO Block: 4096  symbolic link
Device: fd06h/64774d  Inode: 2152112572  Links: 1
Access: (0777/lrwxrwxrwx)  Uid: ( 0/ root)  Gid: ( 0/ root)
Access: 2013-06-21 03:17:27.740839811 -0600
Modify: 2013-06-21 03:17:27.740839811 -0600
Change: 2013-06-21 03:17:27.740839811 -0600

To figure out what the actual directory name is, I use readlink:

[root@slc01dfs001a ~]# readlink -f /bricks/d00v00/mdfs/.glusterfs/fe/93/fe93de6e-5b91-4193-a31c-786726886ff1
/bricks/d00v00/mdfs/REDACTED/mdfs/RTR/rtrphotosfour/docs/331

I can get the extended attributes. I know from talking to Joe Julian that the following output means both copies think the other needs healing. If I compare 'ls -al' output from the brick directory on both copies, they are the same.

[root@slc01dfs001a ~]# getfattr -m . -d -e hex /bricks/d00v00/mdfs/REDACTED/mdfs/RTR/rtrphotosfour/docs/331
getfattr: Removing leading '/' from absolute path names
# file: bricks/d00v00/mdfs/REDACTED/mdfs/RTR/rtrphotosfour/docs/331
trusted.afr.mdfs-client-0=0x00000000000000000000006e
trusted.afr.mdfs-client-1=0x00000000000000000000006e
trusted.gfid=0xfe93de6e5b914193a31c786726886ff1
trusted.glusterfs.dht=0x00000001000000003ffffffc4ffffffa

Now for the big question ... what do I do, in a step-by-step format, to eliminate this entry from the heal info output?

On another entry, I tried deleting the second trusted.afr entry on both copies, I tried deleting them both, I tried deleting one and setting the other to zero, and I tried changing them both to zero. In between each of these, I did a stat on the directory via the FUSE mount. It did not change the heal info output.

Thanks,
Shawn
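
For anyone following along, the two lookups in this message (gfid to .glusterfs path, and reading a trusted.afr value) can be sketched in a few lines of shell. This is only a sketch: the helper names gfid_path and decode_afr are made up for illustration and are not GlusterFS tools. It also assumes the usual description of the AFR changelog value as three 32-bit counters in the order data, metadata, entry.

```shell
#!/bin/sh
# Sketch only: gfid_path and decode_afr are hypothetical helper names,
# not commands shipped with GlusterFS.

# Print where a gfid lives under a brick's .glusterfs directory.
# $1 = brick root, $2 = gfid in dashed form
gfid_path() {
    first=$(printf '%s' "$2" | cut -c1-2)    # first two hex digits
    second=$(printf '%s' "$2" | cut -c3-4)   # next two hex digits
    printf '%s/.glusterfs/%s/%s/%s\n' "$1" "$first" "$second" "$2"
}

# Split a trusted.afr value, as printed by "getfattr -e hex", into
# three counters. Assumed layout: data, metadata, entry (8 hex digits each).
decode_afr() {
    hex=${1#0x}
    if [ "${#hex}" -ne 24 ]; then
        echo "expected 24 hex digits, got ${#hex}" >&2
        return 1
    fi
    data=$(printf '%s' "$hex" | cut -c1-8)
    meta=$(printf '%s' "$hex" | cut -c9-16)
    entry=$(printf '%s' "$hex" | cut -c17-24)
    printf 'data=%d metadata=%d entry=%d\n' \
        "$((0x$data))" "$((0x$meta))" "$((0x$entry))"
}

# Values taken from the message above.
gfid_path /bricks/d00v00/mdfs fe93de6e-5b91-4193-a31c-786726886ff1
decode_afr 0x00000000000000000000006e
```

With the values from this message, the first call prints the same .glusterfs path that was passed to stat, and the second prints data=0 metadata=0 entry=110. Under the assumed layout, only the entry counter is pending, which fits the item being a directory.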
Viktor Villafuerte
2014-Mar-05 00:20 UTC
[Gluster-users] Fixing heal / split-brain when the entry is a directory
You may have tried this already, but what if you leave both trusted.afr entries, change only one to '0', and then self-heal?

v

--
Regards

Viktor Villafuerte
Optus Internet Engineering
t: 02 808-25265
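
The suggestion above could look roughly like the following. This is a dry-run sketch, not a confirmed fix: reset_afr_cmd is a hypothetical helper that only prints a setfattr command, the choice of trusted.afr.mdfs-client-1 is arbitrary (the thread does not say which attribute to zero, or on which brick), and the volume name mdfs is inferred from the attribute names.

```shell
#!/bin/sh
# Sketch only: reset_afr_cmd is a hypothetical helper. It prints the
# setfattr command rather than running it, so nothing on the brick changes.
# $1 = xattr name, $2 = directory path on the brick (not the FUSE mount)
reset_afr_cmd() {
    zero=$(printf '%024d' 0)    # 24 hex zeros = three zeroed 32-bit counters
    printf "setfattr -n %s -v 0x%s '%s'\n" "$1" "$zero" "$2"
}

# Which attribute to reset is an assumption made for illustration.
reset_afr_cmd trusted.afr.mdfs-client-1 \
    /bricks/d00v00/mdfs/REDACTED/mdfs/RTR/rtrphotosfour/docs/331

# After running the printed command by hand, a heal could be triggered
# with something like (volume name assumed to be "mdfs"):
#   gluster volume heal mdfs
```

Printing the command first makes it easy to check the attribute name and path, and to save the current getfattr output, before touching anything on the brick.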