fredrik ronnvall
2012-May-29 16:58 UTC
[Gluster-users] Problems with Gluster NFS export (unable to stat files)
Hi,

We're seeing some random errors quite frequently when mounting one of our volumes via NFS. At random, a client will fail to access certain files/directories; they show up like this:

$ ls -l
ls: cannot access xxx: No such file or directory
ls: cannot access yyy: No such file or directory
l????????? ? ?    ?     ?  ?                xxx
l????????? ? ?    ?     ?  ?                yyy
drwxrwxrwx 2 user group 95 2012-05-08 18:11 zzz

Tracing the NFS mount back to one of the gluster servers, this shows up in nfs.log:

[2012-05-09 14:47:32.807853] E [client3_1-fops.c:411:client3_1_stat_cbk] 0-glustervol1-client-2: remote operation failed: No such file or directory
[2012-05-09 14:47:32.808430] E [client3_1-fops.c:411:client3_1_stat_cbk] 0-glustervol1-client-3: remote operation failed: No such file or directory
[2012-05-09 14:47:32.841125] E [client3_1-fops.c:411:client3_1_stat_cbk] 0-glustervol1-client-3: remote operation failed: No such file or directory
[2012-05-09 14:47:32.841762] E [client3_1-fops.c:411:client3_1_stat_cbk] 0-glustervol1-client-2: remote operation failed: No such file or directory

Restarting the gluster server seems to fix the issue, though I am unhappy with this as a solution. Today this showed up in the logs, following the same symptoms:

[2012-05-29 10:19:04.332031] E [afr-self-heal-metadata.c:561:afr_sh_metadata_post_nonblocking_inodelk_cbk] 0-glustervol1-replicate-3: Non Blocking metadata inodelks failed for <path>.
[2012-05-29 10:19:04.332059] E [afr-self-heal-metadata.c:563:afr_sh_metadata_post_nonblocking_inodelk_cbk] 0-glustervol1-replicate-3: Metadata self-heal failed for <path>.
[2012-05-29 10:19:04.332503] E [afr-self-heal-metadata.c:561:afr_sh_metadata_post_nonblocking_inodelk_cbk] 0-glustervol1-replicate-2: Non Blocking metadata inodelks failed for <path>.
[2012-05-29 10:19:04.332534] E [afr-self-heal-metadata.c:563:afr_sh_metadata_post_nonblocking_inodelk_cbk] 0-glustervol1-replicate-2: Metadata self-heal failed for <path>.
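One workaround I am considering, short of restarting glusterd, is forcing a self-heal crawl from a client-side mount by stat-ing every file in the volume. A sketch; /mnt/glustervol1 is a placeholder for the real mount point:

```shell
# Walk the whole tree through a client mount and stat every entry,
# which should prompt the replicate translator to re-check (and
# self-heal) each file. /mnt/glustervol1 is a placeholder path.
find /mnt/glustervol1 -noleaf -print0 | xargs --null stat > /dev/null
```

On a large volume this crawl can take a while, so it would likely need to run off-peak.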
A restart of gluster on the server the client was connected to solved the issue. This happens several times a day and is becoming a serious problem. Symlinks are affected most often, but regular files are hit as well.

The volume in question is configured across 4 servers (OpenSUSE 11.3) with 2 bricks per server as distributed-replicate. The Gluster version is 3.2.5.

Has anyone experienced similar issues? Is there a sanity check of sorts that I could carry out?

Fredrik