Maurits Lamers
2019-Feb-07 12:31 UTC
[Gluster-users] glusterfs 4.1.7 + nfs-ganesha 2.7.1 freeze during write
Hi all,

I am trying to find out more about why an nfs mount through nfs-ganesha of a glusterfs volume freezes.

A little bit of background: the system consists of one glusterfs volume across 5 nodes. Every node runs Ubuntu 16.04, gluster 4.1.7 and nfs-ganesha 2.7.1. The gluster volume is exported using the setup described in the first half of https://docs.gluster.org/en/latest/Administrator%20Guide/NFS-Ganesha%20GlusterFS%20Integration/

The node which freezes runs Nextcloud in a docker setup, with the entire application stored on a path that is an nfs-ganesha mount of the glusterfs volume. When I run a synchronisation operation with this Nextcloud instance, at some point the entire system freezes, and the only solution is to completely restart the node. Just before this freeze, the /var/log/ganesha/ganesha-gfapi.log file contains an error, which seems to result in timeouts after a short while. The node running the Nextcloud instance is the only one freezing; the rest of the cluster seems not to be affected.
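For reference, an export following that guide has roughly the shape sketched below. This is an illustrative reconstruction, not the exact configuration used here: the volume name gv0 is taken from the log messages further down, and the export id, paths and squash setting are assumptions.

    # /etc/ganesha/ganesha.conf (illustrative sketch)
    EXPORT {
        Export_Id = 1;               # assumed; must be unique per export
        Path = "/gv0";               # assumed export path
        Pseudo = "/gv0";             # assumed NFSv4 pseudo path
        Access_Type = RW;
        Squash = No_root_squash;     # assumed
        FSAL {
            Name = GLUSTER;
            Hostname = localhost;    # any server in the trusted pool
            Volume = "gv0";          # volume name taken from the logs below
        }
    }

    # and, illustratively, the client-side mount on the Nextcloud node:
    # mount -t nfs -o vers=4 <node>:/gv0 /path/to/nextcloud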
[2019-02-07 10:11:17.342132] W [dict.c:671:dict_ref] (-->/usr/lib/x86_64-linux-gnu/glusterfs/4.1.7/xlator/performance/quick-read.so(+0x59b4) [0x7f2f035139b4] -->/usr/lib/x86_64-linux-gnu/glusterfs/4.1.7/xlator/performance/io-cache.so(+0xa2cd) [0x7f2f037242cd] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_ref+0x50) [0x7f2f0f312370] ) 0-dict: dict is NULL [Invalid argument]
[2019-02-07 10:11:17.345776] W [dict.c:671:dict_ref] (-->/usr/lib/x86_64-linux-gnu/glusterfs/4.1.7/xlator/performance/quick-read.so(+0x59b4) [0x7f2f035139b4] -->/usr/lib/x86_64-linux-gnu/glusterfs/4.1.7/xlator/performance/io-cache.so(+0xa2cd) [0x7f2f037242cd] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_ref+0x50) [0x7f2f0f312370] ) 0-dict: dict is NULL [Invalid argument]
[2019-02-07 10:11:17.346079] W [dict.c:671:dict_ref] (-->/usr/lib/x86_64-linux-gnu/glusterfs/4.1.7/xlator/performance/quick-read.so(+0x59b4) [0x7f2f035139b4] -->/usr/lib/x86_64-linux-gnu/glusterfs/4.1.7/xlator/performance/io-cache.so(+0xa2cd) [0x7f2f037242cd] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_ref+0x50) [0x7f2f0f312370] ) 0-dict: dict is NULL [Invalid argument]
[2019-02-07 10:11:17.396853] W [dict.c:671:dict_ref] (-->/usr/lib/x86_64-linux-gnu/glusterfs/4.1.7/xlator/performance/quick-read.so(+0x59b4) [0x7f2f035139b4] -->/usr/lib/x86_64-linux-gnu/glusterfs/4.1.7/xlator/performance/io-cache.so(+0xa2cd) [0x7f2f037242cd] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_ref+0x50) [0x7f2f0f312370] ) 0-dict: dict is NULL [Invalid argument]
[2019-02-07 10:11:17.397650] W [dict.c:671:dict_ref] (-->/usr/lib/x86_64-linux-gnu/glusterfs/4.1.7/xlator/performance/quick-read.so(+0x59b4) [0x7f2f035139b4] -->/usr/lib/x86_64-linux-gnu/glusterfs/4.1.7/xlator/performance/io-cache.so(+0xa2cd) [0x7f2f037242cd] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_ref+0x50) [0x7f2f0f312370] ) 0-dict: dict is NULL [Invalid argument]
[2019-02-07 10:11:17.398036] W [dict.c:671:dict_ref] (-->/usr/lib/x86_64-linux-gnu/glusterfs/4.1.7/xlator/performance/quick-read.so(+0x59b4) [0x7f2f035139b4] -->/usr/lib/x86_64-linux-gnu/glusterfs/4.1.7/xlator/performance/io-cache.so(+0xa2cd) [0x7f2f037242cd] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_ref+0x50) [0x7f2f0f312370] ) 0-dict: dict is NULL [Invalid argument]
[2019-02-07 10:11:17.407839] W [dict.c:671:dict_ref] (-->/usr/lib/x86_64-linux-gnu/glusterfs/4.1.7/xlator/performance/quick-read.so(+0x59b4) [0x7f2f035139b4] -->/usr/lib/x86_64-linux-gnu/glusterfs/4.1.7/xlator/performance/io-cache.so(+0xa2cd) [0x7f2f037242cd] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_ref+0x50) [0x7f2f0f312370] ) 0-dict: dict is NULL [Invalid argument]
[2019-02-07 10:11:24.812606] E [MSGID: 104055] [glfs-fops.c:4955:glfs_cbk_upcall_data] 0-gfapi: Synctak for Upcall event_type(1) and gfid(y???? Mz???SL4_@) failed
[2019-02-07 10:11:24.819376] E [MSGID: 104055] [glfs-fops.c:4955:glfs_cbk_upcall_data] 0-gfapi: Synctak for Upcall event_type(1) and gfid(eTn?EU?H.<Y4failed
[2019-02-07 10:11:24.833299] E [MSGID: 104055] [glfs-fops.c:4955:glfs_cbk_upcall_data] 0-gfapi: Synctak for Upcall event_type(1) and gfid(g?L??F??0b??k) failed
[2019-02-07 10:25:01.642509] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-gv0-client-2: server [node1]:49152 has not responded in the last 42 seconds, disconnecting.
[2019-02-07 10:25:01.642805] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-gv0-client-1: server [node2]:49152 has not responded in the last 42 seconds, disconnecting.
[2019-02-07 10:25:01.642946] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-gv0-client-4: server [node3]:49152 has not responded in the last 42 seconds, disconnecting.
[2019-02-07 10:25:02.643120] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-gv0-client-3: server 127.0.1.1:49152 has not responded in the last 42 seconds, disconnecting.
[2019-02-07 10:25:02.643314] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-gv0-client-0: server [node4]:49152 has not responded in the last 42 seconds, disconnecting.

The log in /var/log/glusterfs/glusterd.log contains nothing after the default messages at startup.

The log in /var/log/ganesha/ganesha.log starts warning about its health after the messages above:

07/02/2019 09:42:17 : epoch 5c5bef56 : [myhost] : ganesha.nfsd-1384[main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
07/02/2019 09:42:17 : epoch 5c5bef56 : [myhost] : ganesha.nfsd-1384[main] nfs_start :NFS STARTUP :EVENT : NFS SERVER INITIALIZED
07/02/2019 09:42:17 : epoch 5c5bef56 : [myhost] : ganesha.nfsd-1384[main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
07/02/2019 09:43:28 : epoch 5c5bef56 : [myhost] : ganesha.nfsd-1384[reaper] nfs_lift_grace_locked :STATE :EVENT :NFS Server Now NOT IN GRACE
07/02/2019 11:26:17 : epoch 5c5bef56 : [myhost] : ganesha.nfsd-1384[dbus_heartbeat] nfs_health :DBUS :WARN :Health status is unhealthy. enq new: 170321, old: 170320; deq new: 170314, old: 170314
07/02/2019 11:28:17 : epoch 5c5bef56 : [myhost] : ganesha.nfsd-1384[dbus_heartbeat] nfs_health :DBUS :WARN :Health status is unhealthy. enq new: 170325, old: 170324; deq new: 170317, old: 170317
07/02/2019 11:29:21 : epoch 5c5bef56 : [myhost] : ganesha.nfsd-1384[dbus_heartbeat] nfs_health :DBUS :WARN :Health status is unhealthy. enq new: 170327, old: 170326; deq new: 170318, old: 170318

Any hints on how to solve this would be greatly appreciated.

cheers

Maurits
Soumya Koduri
2019-Feb-07 17:51 UTC
[Gluster-users] glusterfs 4.1.7 + nfs-ganesha 2.7.1 freeze during write
On 2/7/19 6:01 PM, Maurits Lamers wrote:
> Hi all,
>
> I am trying to find out more about why an nfs mount through nfs-ganesha of a glusterfs volume freezes.
> [...]
> [2019-02-07 10:11:17.342132] W [dict.c:671:dict_ref] (-->/usr/lib/x86_64-linux-gnu/glusterfs/4.1.7/xlator/performance/quick-read.so(+0x59b4) [0x7f2f035139b4] -->/usr/lib/x86_64-linux-gnu/glusterfs/4.1.7/xlator/performance/io-cache.so(+0xa2cd) [0x7f2f037242cd] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_ref+0x50) [0x7f2f0f312370] ) 0-dict: dict is NULL [Invalid argument]
> [...]

There is a patch [1] submitted and under review which fixes the above error messages.

> [2019-02-07 10:11:24.812606] E [MSGID: 104055] [glfs-fops.c:4955:glfs_cbk_upcall_data] 0-gfapi: Synctak for Upcall event_type(1) and gfid(y???? Mz???SL4_@) failed
> [...]
> [2019-02-07 10:25:01.642509] C [rpc-clnt-ping.c:166:rpc_clnt_ping_timer_expired] 0-gv0-client-2: server [node1]:49152 has not responded in the last 42 seconds, disconnecting.
> [...]

Strange that the synctask failed. Could you please turn off the features.cache-invalidation volume option and check whether the issue still persists?
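For example, assuming the volume is named gv0 (as the 0-gv0-client names in the logs suggest):

    # gluster volume set gv0 features.cache-invalidation off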
Also, the next time the system freezes, please check the following:

1) Check whether all the brick servers of that volume are up and running. Sometimes, when the brick servers are not reachable, it takes a while for outstanding requests to time out and for the application (nfs-ganesha) to get back to a normal state. Please wait for a while and check if the mount becomes accessible again.

2) Check whether the nfs-ganesha server is responding to other requests:

   # showmount -e localhost   (on the node the server is running on)

   and try a mount and some I/O from any other client.

3) If the ganesha server isn't responding to any client, please collect 2-3 cores/stack traces of the nfs-ganesha server, with a brief interval (say 5 min) in between, using the commands below; a combined sketch follows this list. This is to check whether any threads are stuck in a deadlock.

   To collect a core:
   # gcore <nfs-ganesha-pid>

   To collect a stack trace:
   # gstack <nfs-ganesha-pid>
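A minimal combined sketch for step 3, assuming a single nfs-ganesha instance whose process name is ganesha.nfsd (as seen in ganesha.log above) and /tmp as the scratch directory; gcore and gstack ship with gdb:

    #!/bin/sh
    # Collect three stack traces and cores of nfs-ganesha, ~5 minutes apart.
    PID=$(pidof ganesha.nfsd)
    for i in 1 2 3; do
        gstack "$PID" > "/tmp/ganesha-gstack-$i.txt"   # stack traces of all threads
        gcore -o "/tmp/ganesha-core-$i" "$PID"         # writes /tmp/ganesha-core-$i.$PID
        sleep 300
    done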
Thanks,
Soumya

[1] https://review.gluster.org/#/c/glusterfs/+/22126/