Shawn Heisey
2014-Mar-09 02:45 UTC
[Gluster-users] Problems with .gluster structure - bad symlinks
Some background:

----
On version 3.3.1, we tried to rebalance after adding storage. It blew up badly due to this bug:

https://bugzilla.redhat.com/show_bug.cgi?id=859387

We have now upgraded to 3.4.2. A new rebalance attempt resulted in several dozen entries showing up in the 'gluster volume heal $vol info' output.
----

With the help of Joe Julian in the IRC channel, I made my way through the heal problems, but I continue to get errors in my server logs. I have now learned that there are a bunch of bad symlinks in the .glusterfs structure on each of my bricks. All of them fail with "Too many levels of symbolic links". I do not believe they are loops. When I manually checked a couple of them, each one did eventually resolve to a valid target, but the chain contained more symlinks than the system allows.

cat: /bricks/d00v00/mdfs/.glusterfs/65/30/6530ce82-310d-4c7c-8d14-135655328a77: Too many levels of symbolic links

What do I need to do to fix this problem? Is there something I can do for each of the bad symlinks? Would a 'heal full' do anything useful? Do I need to do something more drastic, like take the volume down and entirely remove (or rename) the .glusterfs structure from all 32 bricks (16x2 distributed-replicate)?

I don't want to cause myself more problems, but I want to get the volume in a completely pristine state and NOT risk losing any of the 52 terabytes of data that's in the volume.

Thanks,
Shawn
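For anyone wanting to enumerate the affected links on a brick, here is a minimal read-only sketch in Python. It walks a directory tree and reports every symlink whose resolution fails with ELOOP, which is the errno behind "Too many levels of symbolic links". The function name and the idea of pointing it at a brick's .glusterfs directory are assumptions for illustration, not anything Gluster ships. It cannot tell a true loop from an over-long chain, because the kernel reports both the same way.

```python
import errno
import os


def find_looping_symlinks(root):
    """Yield (path, target) for each symlink under root that fails with ELOOP.

    Read-only: nothing on disk is modified.
    """
    # followlinks=False keeps the walk from descending through symlinks.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        # Symlinks can show up in either list, so check both.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            try:
                os.stat(path)  # follows the whole symlink chain
            except OSError as exc:
                if exc.errno == errno.ELOOP:
                    yield path, os.readlink(path)


if __name__ == "__main__":
    import sys

    # Example (path is taken from the command line):
    #   python find_eloop.py /bricks/d00v00/mdfs/.glusterfs
    for link, target in find_looping_symlinks(sys.argv[1]):
        print(f"{link} -> {target}")
```

The output only lists candidates for inspection; deciding what to do with each link is a separate question.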
Shawn Heisey
2014-Mar-09 16:39 UTC
[Gluster-users] Problems with .gluster structure - bad symlinks
On 3/8/2014 7:45 PM, Shawn Heisey wrote:
> cat:
> /bricks/d00v00/mdfs/.glusterfs/65/30/6530ce82-310d-4c7c-8d14-135655328a77:
> Too many levels of symbolic links
>
> What do I need to do to fix this problem? Is there something I can do
> for each of the bad symlinks? Would a 'heal full' do anything useful?
> Do I need to do something more drastic, like take the volume down and
> entirely remove (or rename) the .glusterfs structure from all 32 bricks
> (16x2 distributed-replicate)? I don't want to cause myself more
> problems, but I want to get the volume in a completely pristine state
> and NOT risk losing any of the 52 terabytes of data that's in the volume.

Some additional info:

http://fpaste.org/83806/43825451/

This is from nfs.log on the server that all my clients contact for NFS mounts. It is peered with the other servers, but has no bricks.

So far I have determined the following about my bricks:

* There are no stray directories under .glusterfs/??/??/
* There is nothing remaining with nonzero trusted.afr* attributes
* There *are* broken symlinks (too many levels)

I will run another check to make sure there are no files with one hardlink outside of the indices directory. I will also check for files that have more than two hardlinks. I do not use hardlinks in my data, so I think that this should never happen.

Is there anything else I can look for, and if I find something, where can I go for information about how to fix it?

Thanks,
Shawn
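The hardlink check described above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: that every regular file on a brick is expected to have exactly two links (the data file plus its entry under .glusterfs), and that .glusterfs/indices should be skipped. The function name and the default skip path are hypothetical. The scan is read-only.

```python
import os
import stat


def scan_link_counts(brick_root, skip_rel=os.path.join(".glusterfs", "indices")):
    """Return (single, many): regular files with 1 link, and with more than 2.

    skip_rel is a directory, relative to brick_root, that is not scanned.
    """
    skip = os.path.join(brick_root, skip_rel)
    single, many = [], []
    for dirpath, dirnames, filenames in os.walk(brick_root):
        # Prune the skipped directory in place so os.walk never enters it.
        dirnames[:] = [d for d in dirnames if os.path.join(dirpath, d) != skip]
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: do not follow symlinks
            if not stat.S_ISREG(st.st_mode):
                continue  # ignore symlinks, devices, sockets, etc.
            if st.st_nlink == 1:
                single.append(path)
            elif st.st_nlink > 2:
                many.append(path)
    return single, many


if __name__ == "__main__":
    import sys

    # Example (path is taken from the command line):
    #   python linkcount.py /bricks/d00v00/mdfs
    lonely, extra = scan_link_counts(sys.argv[1])
    for p in lonely:
        print("1 link:  ", p)
    for p in extra:
        print(">2 links:", p)
```

A file that appears in either list is only a candidate for a closer look; the script makes no judgment about which copy, if any, is wrong.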