edrock200
2023-Nov-10  18:37 UTC
[Gluster-users] Dispersed Volume Errors after failed expansion
Hello,
I've run into an issue with Gluster 11.1 and need some assistance. I have a
4+1 dispersed Gluster setup consisting of 20 nodes and 200 bricks. This setup
was 15 nodes and 150 bricks until last week and was working flawlessly. We
needed more space, so we expanded the volume by adding 5 more nodes and 50
bricks.
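For reference, the expansion itself was done roughly along these lines (the host
names and brick paths below are placeholders rather than the exact ones we used;
the volume name is media, as it appears in the logs further down):

  gluster peer probe node16        # repeated for each of the five new nodes
  gluster volume add-brick media \
      node16:/data/brick1/volume node17:/data/brick1/volume \
      node18:/data/brick1/volume node19:/data/brick1/volume \
      node20:/data/brick1/volume
  # ...and so on for the remaining bricks, always in multiples of 5 so each
  # new set of bricks forms a complete 4+1 disperse subvolume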
We added the nodes and triggered a fix-layout command. Unknown to us at the
time, one of the five new nodes had a hardware issue: the CPU cooling fan was
bad. This caused the node to throttle down to 500 MHz on all cores and eventually
shut itself down mid-fix-layout. Due to how our ISP works, we could only replace
the entire node, so we did, and then executed a replace-brick command.
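Those two steps were the standard commands, roughly as follows (node names and
the brick path are placeholders):

  gluster volume rebalance media fix-layout start
  gluster volume replace-brick media \
      failed-node:/data/brick1/volume replacement-node:/data/brick1/volume \
      commit force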
This is the state we are in at present, and I'm not sure how best to proceed to
fix the errors and behavior I'm seeing. I'm not sure whether running another
fix-layout should be the next step, given that hundreds of objects are stuck in
a persistent heal state, and that doing just about any command other than volume
status, volume info, or volume heal info results in all client mounts hanging
for ~5 minutes or bricks starting to drop. The client logs show numerous
anomalies as well, such as:
[2023-11-10 17:41:52.153423 +0000] W [MSGID: 122040]
[ec-common.c:1262:ec_prepare_update_cbk] 0-media-disperse-30: Failed to get size
and version : FOP : 'XATTROP' failed on '/path/to/folder' with
gfid 0d295c94-5577-4445-9e57-6258f24d22c5. Parent FOP: OPENDIR [Input/output
error]
[2023-11-10 17:48:46.965415 +0000] E [MSGID: 122038]
[ec-dir-read.c:398:ec_manager_readdir] 0-media-disperse-36: EC is not winding
readdir: FOP : 'READDIRP' failed on gfid
f8ad28d0-05b4-4df3-91ea-73fabf27712c. Parent FOP: No Parent [File descriptor in
bad state]
[2023-11-10 17:39:46.076149 +0000] I [MSGID: 109018]
[dht-common.c:1840:dht_revalidate_cbk] 0-media-dht: Mismatching layouts for
/path/to/folder2, gfid = f04124e5-63e6-4ddf-9b6b-aa47770f90f2
[2023-11-10 17:39:18.463421 +0000] E [MSGID: 122034]
[ec-common.c:662:ec_log_insufficient_vol] 0-media-disperse-4: Insufficient
available children for this request: Have : 0, Need : 4 : Child UP : 11111 Mask:
00000, Healing : 00000 : FOP : 'XATTROP' failed on
'/path/to/another/folder with gfid f04124e5-63e6-4ddf-9b6b-aa47770f90f2.
Parent FOP: SETXATTR
[2023-11-10 17:36:21.565681 +0000] W [MSGID: 122006]
[ec-combine.c:188:ec_iatt_combine] 0-media-disperse-39: Failed to combine iatt
(inode: 13324146332441721129-13324146332441721129, links: 2-2, uid: 1000-1000,
gid: 1000-1001, rdev: 0-0, size: 10-10, mode: 40775-40775), FOP :
'LOOKUP' failed on '/path/to/yet/another/folder'. Parent FOP: No
Parent
[2023-11-10 17:39:46.147299 +0000] W [MSGID: 114031]
[client-rpc-fops_v2.c:2563:client4_0_lookup_cbk] 0-media-client-1: remote
operation failed. [{path=/path/to/folder3},
{gfid=00000000-0000-0000-0000-000000000000}, {errno=13}, {error=Permission
denied}]
[2023-11-10 17:39:46.093069 +0000] W [MSGID: 114061]
[client-common.c:1232:client_pre_readdirp_v2] 0-media-client-14: remote_fd is
-1. EBADFD [{gfid=f04124e5-63e6-4ddf-9b6b-aa47770f90f2}, {errno=77}, {error=File
descriptor in bad state}]
[2023-11-10 17:55:11.407630 +0000] E [MSGID: 122038]
[ec-dir-read.c:398:ec_manager_readdir] 0-media-disperse-30: EC is not winding
readdir: FOP : 'READDIRP' failed on gfid
2bba7b7e-7a4b-416a-80f0-dd50caffd2c2. Parent FOP: No Parent [File descriptor in
bad state]
[2023-11-10 17:39:46.076179 +0000] W [MSGID: 109221]
[dht-selfheal.c:2023:dht_selfheal_directory] 0-media-dht: Directory selfheal
failed [{path=/path/to/folder7}, {misc=2}, {unrecoverable-errors},
{gfid=f04124e5-63e6-4ddf-9b6b-aa47770f90f2}]
Something about this failed expansion has caused these errors, and I'm not sure
how to proceed. Right now doing just about anything causes the client mounts to
hang for up to 5 minutes, including restarting a node or trying to use a volume
set command. When I tried increasing a cache timeout value, ~153 of the 200
bricks dropped offline.
I've tried the following (rough command forms are sketched after this list):
Running gluster volume heal volumename full - it causes mounts to hang for 3-5
minutes but seems to proceed
Running ls -alhR against the volume to trigger heals
Removing the new bricks, which triggers a rebalance that fails almost
immediately, while most of the self-heal daemons go offline as well
Turning off bit-rot detection to reduce load on the system
Replacing a brick with a new brick (same drive, new directory); attempted force
as well
Changing the heal mode from diff to full
Lowering the parallel heal count to 4
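For clarity, the rough forms of those commands were as follows. The volume name
is media, the brick paths are placeholders, and the two option names at the end
are from memory, so treat them as my best recollection rather than the exact
settings:

  gluster volume heal media full
  gluster volume heal media info
  gluster volume remove-brick media node16:/data/brick1/volume ... start
  gluster volume remove-brick media node16:/data/brick1/volume ... status
  gluster volume remove-brick media node16:/data/brick1/volume ... stop
  gluster volume bitrot media disable
  gluster volume replace-brick media node16:/data/brick1/volume \
      node16:/data/brick1-new/volume commit force
  gluster volume set media cluster.data-self-heal-algorithm full
  gluster volume set media disperse.shd-max-threads 4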
When I replaced the one brick, the heal count on that brick dropped from ~100 to
~6; however, those 6 entries are folders in the root of the volume rather than
subfolders many layers deep. I suspect this is causing a lot of the issues I'm
seeing, and I don't know how to resolve it without damaging any of the existing
data. I'm hoping it's just due to the fix-layout failing and that it simply
needs to run again, but I wanted to seek guidance from the group so as not to
make things worse. I'm not opposed to losing the data already copied to the new
bricks; I just need to know how to do so without damaging the data on the
original 150 bricks.
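If it helps with diagnosis, I can pull the extended attributes of those stuck
root-level folders directly from the brick backends. I assume the comparison
would look something like this on each brick of the affected subvolume (the
folder name is a placeholder), looking at trusted.ec.version, trusted.ec.size,
trusted.ec.dirty, and trusted.glusterfs.dht:

  getfattr -d -m . -e hex /data/brick1/volume/stuck-folder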
I did notice something else odd as well, which I'm not sure is pertinent or not:
on one of the original 15 nodes, if I go to the /data/brick1/volume directory
and do an ls -l, the ownership shows 1000:1000, which is how it is on the actual
FUSE mount as well. If I do the same on one of the new bricks, it shows
root:root. I didn't alter any of this, again so as not to cause more problems.
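In case it's useful, an equivalent check with stat against the brick roots (run
on an original node and then on a new node) would be something like:

  stat -c '%U:%G (%u:%g) %a' /data/brick1/volume
  # original nodes report 1000:1000; the new bricks report root:root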
Thanks in advance for any guidance/help.
-Ed