thr3ads.net - Gluster users - [Gluster-users] nfs-ganesha logs [Feb 2017]


Mahdi Adnan

2017-Feb-28 07:59 UTC

[Gluster-users] nfs-ganesha logs

Hi,


We have a Gluster volume hosting VMs for ESXi exported via Ganesha.

I'm getting the following messages in ganesha-gfapi.log and ganesha.log:



====
[2017-02-28 07:44:55.194621] E [MSGID: 109040]
[dht-helper.c:1198:dht_migration_complete_check_task] 0-vmware2-dht:
<gfid:ec846aeb-50f9-4b39-b0c9-24a8b833afe6>: failed to lookup the file on
vmware2-dht [Stale file handle]
[2017-02-28 07:44:55.194660] E [MSGID: 133014]
[shard.c:1129:shard_common_stat_cbk] 0-vmware2-shard: stat failed:
ec846aeb-50f9-4b39-b0c9-24a8b833afe6 [Stale file handle]
[2017-02-28 07:44:55.207154] W [MSGID: 108008] [afr-read-txn.c:228:afr_read_txn]
0-vmware2-replicate-5: Unreadable subvolume -1 found with event generation 8 for
gfid 4a50127e-4403-49a5-9886-80541a76299c. (Possible split-brain)
[2017-02-28 07:44:55.209205] E [MSGID: 109040]
[dht-helper.c:1198:dht_migration_complete_check_task] 0-vmware2-dht:
<gfid:4a50127e-4403-49a5-9886-80541a76299c>: failed to lookup the file on
vmware2-dht [Stale file handle]
[2017-02-28 07:44:55.209265] E [MSGID: 133014]
[shard.c:1129:shard_common_stat_cbk] 0-vmware2-shard: stat failed:
4a50127e-4403-49a5-9886-80541a76299c [Stale file handle]
[2017-02-28 07:44:55.212556] W [MSGID: 108008] [afr-read-txn.c:228:afr_read_txn]
0-vmware2-replicate-4: Unreadable subvolume -1 found with event generation 2 for
gfid cec80035-1f51-434a-9dbf-8bcdd5f4a8f7. (Possible split-brain)
[2017-02-28 07:44:55.214702] E [MSGID: 109040]
[dht-helper.c:1198:dht_migration_complete_check_task] 0-vmware2-dht:
<gfid:cec80035-1f51-434a-9dbf-8bcdd5f4a8f7>: failed to lookup the file on
vmware2-dht [Stale file handle]
[2017-02-28 07:44:55.214741] E [MSGID: 133014]
[shard.c:1129:shard_common_stat_cbk] 0-vmware2-shard: stat failed:
cec80035-1f51-434a-9dbf-8bcdd5f4a8f7 [Stale file handle]
[2017-02-28 07:44:55.259729] I [MSGID: 108031]
[afr-common.c:2154:afr_local_discovery_cbk] 0-vmware2-replicate-0: selecting
local read_child vmware2-client-0
[2017-02-28 07:44:55.259937] I [MSGID: 108031]
[afr-common.c:2154:afr_local_discovery_cbk] 0-vmware2-replicate-4: selecting
local read_child vmware2-client-8

====
28/02/2017 06:27:54 : epoch 58b05af4 : gluster01 :
ganesha.nfsd-2015[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status
is unhealthy.  Not sending heartbeat
28/02/2017 06:33:36 : epoch 58b05af4 : gluster01 : ganesha.nfsd-2015[work-9]
cache_inode_avl_qp_insert :INODE :CRIT :cache_inode_avl_qp_insert_s: name
conflict (access, access)
====

The volume is hosting a few VMs without any noticeable workload, and all bricks
are SSDs.

I'm concerned about these log messages because I have another cluster where
Ganesha keeps crashing every few days, with the following message spamming the log:


28/02/2017 08:02:45 : epoch 58b1e2f5 : gfs01 :
ganesha.nfsd-31929[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status
is unhealthy.  Not sending heartbeat
28/02/2017 08:41:08 : epoch 58b1e2f5 : gfs01 :
ganesha.nfsd-31929[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status
is unhealthy.  Not sending heartbeat
28/02/2017 08:48:38 : epoch 58b1e2f5 : gfs01 :
ganesha.nfsd-31929[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status
is unhealthy.  Not sending heartbeat
28/02/2017 08:48:52 : epoch 58b1e2f5 : gfs01 :
ganesha.nfsd-31929[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status
is unhealthy.  Not sending heartbeat
28/02/2017 09:16:27 : epoch 58b1e2f5 : gfs01 :
ganesha.nfsd-31929[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status
is unhealthy.  Not sending heartbeat
28/02/2017 09:46:54 : epoch 58b1e2f5 : gfs01 :
ganesha.nfsd-31929[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status
is unhealthy.  Not sending heartbeat
28/02/2017 09:50:02 : epoch 58b1e2f5 : gfs01 :
ganesha.nfsd-31929[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status
is unhealthy.  Not sending heartbeat
28/02/2017 09:57:03 : epoch 58b1e2f5 : gfs01 :
ganesha.nfsd-31929[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status
is unhealthy.  Not sending heartbeat
28/02/2017 09:57:14 : epoch 58b1e2f5 : gfs01 :
ganesha.nfsd-31929[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status
is unhealthy.  Not sending heartbeat
28/02/2017 10:48:41 : epoch 58b1e2f5 : gfs01 :
ganesha.nfsd-31929[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status
is unhealthy.  Not sending heartbeat



The SSD volumes are running Gluster 3.8.9 and Ganesha V2.3.3, and the other cluster
is running Gluster 3.7.19 and Ganesha V2.3.0.


Also, how can I get I/O statistics from Ganesha?

I appreciate your help.




--

Respectfully
Mahdi A. Mahdi


Soumya Koduri

2017-Mar-01 12:55 UTC


[Gluster-users] nfs-ganesha logs

I am not sure whether there are any outstanding issues with exposing a sharded
volume via gfapi. CCing Krutika.

On 02/28/2017 01:29 PM, Mahdi Adnan wrote:
> Hi,
>
>
> We have a Gluster volume hosting VMs for ESXi exported via Ganesha.
>
> Im getting the following messages in ganesha-gfapi.log and ganesha.log
>
>
>
> ====
> [2017-02-28 07:44:55.194621] E [MSGID: 109040]
> [dht-helper.c:1198:dht_migration_complete_check_task] 0-vmware2-dht:
> <gfid:ec846aeb-50f9-4b39-b0c9-24a8b833afe6>: failed to lookup the file
> on vmware2-dht [Stale file handle]
This "Stale file handle" error suggests that the file may have just been
removed at the back-end. Someone more familiar with DHT (cc'ed Nithya)
can probably confirm whether there are other possibilities.
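
To see which files are affected, it may help to group the message IDs in ganesha-gfapi.log by gfid. The following is only a minimal sketch: the function name `msgids_by_gfid` is hypothetical, and the only assumption about the log format is what the excerpt in this thread shows (each entry starts with a bracketed timestamp and carries a `[MSGID: N]` tag).

```python
import re
from collections import defaultdict

# Assumption taken from the excerpt: an entry starts with a bracketed
# timestamp such as [2017-02-28 07:44:55.194621]; lines that follow
# without a timestamp are treated as wrapped parts of the same entry.
ENTRY_START = re.compile(r"^\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+\]", re.M)
MSGID = re.compile(r"\[MSGID: (\d+)\]")
GFID = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")


def msgids_by_gfid(log_text):
    """Map each gfid to the set of MSGIDs logged for it (hypothetical helper)."""
    starts = [m.start() for m in ENTRY_START.finditer(log_text)]
    result = defaultdict(set)
    for begin, end in zip(starts, starts[1:] + [len(log_text)]):
        # Fold a wrapped entry back into a single line before matching.
        entry = " ".join(log_text[begin:end].split())
        msgid = MSGID.search(entry)
        gfid = GFID.search(entry)
        if msgid and gfid:
            result[gfid.group(0)].add(int(msgid.group(1)))
    return dict(result)
```

In the excerpt above, MSGID 108008 is the "Possible split-brain" warning and 109040 is the failed DHT lookup, so a gfid that maps to both is one where the two messages occur together.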
> [2017-02-28 07:44:55.194660] E [MSGID: 133014]
> [shard.c:1129:shard_common_stat_cbk] 0-vmware2-shard: stat failed:
> ec846aeb-50f9-4b39-b0c9-24a8b833afe6 [Stale file handle]
> [2017-02-28 07:44:55.207154] W [MSGID: 108008]
> [afr-read-txn.c:228:afr_read_txn] 0-vmware2-replicate-5: Unreadable
> subvolume -1 found with event generation 8 for gfid
> 4a50127e-4403-49a5-9886-80541a76299c. (Possible split-brain)
> [2017-02-28 07:44:55.209205] E [MSGID: 109040]
> [dht-helper.c:1198:dht_migration_complete_check_task] 0-vmware2-dht:
> <gfid:4a50127e-4403-49a5-9886-80541a76299c>: failed to lookup the file
> on vmware2-dht [Stale file handle]
> [2017-02-28 07:44:55.209265] E [MSGID: 133014]
> [shard.c:1129:shard_common_stat_cbk] 0-vmware2-shard: stat failed:
> 4a50127e-4403-49a5-9886-80541a76299c [Stale file handle]
> [2017-02-28 07:44:55.212556] W [MSGID: 108008]
> [afr-read-txn.c:228:afr_read_txn] 0-vmware2-replicate-4: Unreadable
> subvolume -1 found with event generation 2 for gfid
> cec80035-1f51-434a-9dbf-8bcdd5f4a8f7. (Possible split-brain)
> [2017-02-28 07:44:55.214702] E [MSGID: 109040]
> [dht-helper.c:1198:dht_migration_complete_check_task] 0-vmware2-dht:
> <gfid:cec80035-1f51-434a-9dbf-8bcdd5f4a8f7>: failed to lookup the file
> on vmware2-dht [Stale file handle]
> [2017-02-28 07:44:55.214741] E [MSGID: 133014]
> [shard.c:1129:shard_common_stat_cbk] 0-vmware2-shard: stat failed:
> cec80035-1f51-434a-9dbf-8bcdd5f4a8f7 [Stale file handle]
> [2017-02-28 07:44:55.259729] I [MSGID: 108031]
> [afr-common.c:2154:afr_local_discovery_cbk] 0-vmware2-replicate-0:
> selecting local read_child vmware2-client-0
> [2017-02-28 07:44:55.259937] I [MSGID: 108031]
> [afr-common.c:2154:afr_local_discovery_cbk] 0-vmware2-replicate-4:
> selecting local read_child vmware2-client-8
>
> ====
> 28/02/2017 06:27:54 : epoch 58b05af4 : gluster01 :
> ganesha.nfsd-2015[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health
> status is unhealthy.  Not sending heartbeat
> 28/02/2017 06:33:36 : epoch 58b05af4 : gluster01 :
> ganesha.nfsd-2015[work-9] cache_inode_avl_qp_insert :INODE :CRIT
> :cache_inode_avl_qp_insert_s: name conflict (access, access)
> ====
>
> The volume is hosting a few VMs without any noticeable workload, and all
> bricks are SSDs.
>
> I'm concerned about these log messages because I have another cluster
> where Ganesha keeps crashing every few days, with the following message
> spamming the log:
>
Do you happen to have a core dump? If so, could you please check the
backtrace (bt)? The messages below are just heartbeat warnings, typically
thrown when the outstanding request queue is above a certain threshold and
the nfs-ganesha server is taking a while to process the requests. Also, you
seem to be using nfs-ganesha 2.3.x, which is not being actively maintained.
Many improvements and fixes have gone into nfs-ganesha 2.4.x. I suggest
trying that version if possible.
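
To judge how often the server falls behind, one option is to count these heartbeat warnings per hour in ganesha.log. This is a rough sketch only: the function name `unhealthy_heartbeats_per_hour` is hypothetical, and the timestamp layout (`DD/MM/YYYY HH:MM:SS : epoch ...`) is assumed from the excerpt in this thread.

```python
import re
from collections import Counter
from datetime import datetime

# Assumption taken from the excerpt: each entry starts with a timestamp
# like "28/02/2017 08:02:45 : epoch 58b1e2f5 : ..."; lines that follow
# without a timestamp are treated as wrapped parts of the same entry.
ENTRY_START = re.compile(r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) : epoch", re.M)


def unhealthy_heartbeats_per_hour(log_text):
    """Count 'Health status is unhealthy' entries per hour (hypothetical helper)."""
    matches = list(ENTRY_START.finditer(log_text))
    counts = Counter()
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(log_text)
        # Fold a wrapped entry back into a single line before matching.
        entry = " ".join(log_text[match.start():end].split())
        if "Health status is unhealthy" in entry:
            stamp = datetime.strptime(match.group(1), "%d/%m/%Y %H:%M:%S")
            counts[stamp.strftime("%Y-%m-%d %H:00")] += 1
    return counts
```

A sudden jump in the hourly count shortly before a crash would point toward request backlog; a flat count would suggest looking elsewhere, for example at the core dump.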

> 28/02/2017 08:02:45 : epoch 58b1e2f5 : gfs01 :
> ganesha.nfsd-31929[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health
> status is unhealthy.  Not sending heartbeat
> 28/02/2017 08:41:08 : epoch 58b1e2f5 : gfs01 :
> ganesha.nfsd-31929[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health
> status is unhealthy.  Not sending heartbeat
> 28/02/2017 08:48:38 : epoch 58b1e2f5 : gfs01 :
> ganesha.nfsd-31929[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health
> status is unhealthy.  Not sending heartbeat
> 28/02/2017 08:48:52 : epoch 58b1e2f5 : gfs01 :
> ganesha.nfsd-31929[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health
> status is unhealthy.  Not sending heartbeat
> 28/02/2017 09:16:27 : epoch 58b1e2f5 : gfs01 :
> ganesha.nfsd-31929[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health
> status is unhealthy.  Not sending heartbeat
> 28/02/2017 09:46:54 : epoch 58b1e2f5 : gfs01 :
> ganesha.nfsd-31929[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health
> status is unhealthy.  Not sending heartbeat
> 28/02/2017 09:50:02 : epoch 58b1e2f5 : gfs01 :
> ganesha.nfsd-31929[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health
> status is unhealthy.  Not sending heartbeat
> 28/02/2017 09:57:03 : epoch 58b1e2f5 : gfs01 :
> ganesha.nfsd-31929[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health
> status is unhealthy.  Not sending heartbeat
> 28/02/2017 09:57:14 : epoch 58b1e2f5 : gfs01 :
> ganesha.nfsd-31929[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health
> status is unhealthy.  Not sending heartbeat
> 28/02/2017 10:48:41 : epoch 58b1e2f5 : gfs01 :
> ganesha.nfsd-31929[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health
> status is unhealthy.  Not sending heartbeat
>
>
> SSDs volumes are running Gluster 3.8.9 and Ganesha V2.3.3 and the other
> cluster is running Gluster 3.7.19 and Ganesha V2.3.0.
>
>
> Also, how can I get I/O statistics from Ganesha?
AFAIK, there are no tools integrated with nfs-ganesha that monitor and
display I/O statistics. I'd request Frank and others to comment.

Thanks,
Soumya
>
> I appreciate your help.
>
>
>
>
>
> --
>
> Respectfully
> Mahdi A. Mahdi
