On 01/12/2016 at 13:12, Yannick Perret wrote:
> Hello,
> I have a client machine that mounts a replica-2 volume over NFS.
> Practically, this is configured with automount such as:
> DIR-NAME -rw,soft,intr server1,server2:/VOLUME
>
> Gluster servers are using 3.6.7.
> Sometimes NFS blocks on the client with
> "server server2 not responding, timed out" (here it was connected to
> server2),
> but network communication is fine between the two machines (they are
> connected to the same switch, I can ssh into each, and they ping each other…).
>
> I can also see a few "xs_tcp_setup_socket: connect returned unhandled
> error -107" messages on the client.
> On the 'server2' side, I can see the following in the gluster NFS logs:
>
> [2016-12-01 10:50:15.887927] W [rpcsvc.c:261:rpcsvc_program_actor]
> 0-rpc-service: RPC program version not available (req 100003 2)
> [2016-12-01 10:50:15.887965] E
> [rpcsvc.c:544:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed
> to complete successfully
> [2016-12-01 10:50:15.901880] W [rpcsvc.c:261:rpcsvc_program_actor]
> 0-rpc-service: RPC program version not available (req 100003 4)
> [2016-12-01 10:50:15.901900] E
> [rpcsvc.c:544:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed
> to complete successfully
> [2016-12-01 10:51:03.777145] W [rpcsvc.c:261:rpcsvc_program_actor]
> 0-rpc-service: RPC program version not available (req 100003 2)
> [2016-12-01 10:51:03.777191] E
> [rpcsvc.c:544:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed
> to complete successfully
> [2016-12-01 10:51:03.790561] W [rpcsvc.c:261:rpcsvc_program_actor]
> 0-rpc-service: RPC program version not available (req 100003 4)
> [2016-12-01 10:51:03.790580] E
> [rpcsvc.c:544:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed
> to complete successfully
>
It looks like these correspond to NFS re-connection attempts (the client
probing NFSv2 and NFSv4, I think); a possible mitigation is sketched below.
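If it really is version probing, one thing that might help (untested here,
just a sketch reusing the placeholder names from the automount map above)
is to pin the mount to NFSv3, since the Gluster built-in NFS server only
serves v3, and to check which versions the server actually registers:

  # which NFS versions does server2 register for RPC program 100003?
  rpcinfo -p server2 | grep 100003

  # automount map entry pinned to NFSv3 over TCP
  DIR-NAME -rw,soft,intr,vers=3,proto=tcp server1,server2:/VOLUME

With vers=3 forced, the client should no longer probe v2/v4 on reconnect,
which seems to be what triggers those "RPC program version not available"
warnings.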
Just before those rpcsvc errors, here are the logs:
l_layout_new_directory] 0-HOME-LIRIS-dht: assigning range size
0xffe76e40 to HOME-LIRIS-replicate-0
[2016-12-01 10:48:36.990028] W
[client-rpc-fops.c:2145:client3_3_setattr_cbk] 0-HOME-LIRIS-client-1:
remote operation failed: Opération non permise (Operation not permitted)
[2016-12-01 10:48:36.990303] W
[client-rpc-fops.c:2145:client3_3_setattr_cbk] 0-HOME-LIRIS-client-0:
remote operation failed: Opération non permise
The message "I [MSGID: 109036]
[dht-common.c:6296:dht_log_new_layout_for_dir_selfheal]
0-HOME-LIRIS-dht: Setting layout of
<gfid:6f8bb427-eea5-4dd5-b004-9db8582bdda2>/_indexer.lock with
[Subvol_name: HOME-LIRIS-replicate-0, Err: -1 , Start: 0 , Stop:
4294967295 ], " repeated 2 times between [2016-12-01 10:48:36.404738]
and [2016-12-01 10:48:36.949907]
[2016-12-01 10:48:36.990728] I [MSGID: 109036]
[dht-common.c:6296:dht_log_new_layout_for_dir_selfheal]
0-HOME-LIRIS-dht: Setting layout of
<gfid:6f8bb427-eea5-4dd5-b004-9db8582bdda2>/39132555496bb098708af2d5e7b56d67
with [Subvol_name: HOME-LIRIS-replicate-0, Err: -1 , Start: 0 , Stop:
4294967295 ],
[2016-12-01 10:50:10.360020] I [dht-rename.c:1344:dht_rename]
0-HOME-LIRIS-dht: renaming
<gfid:2a1f640e-ff3e-4a56-8019-64ec6d803fc1>/tmp_km1NUe
(hash=HOME-LIRIS-replicate-0/cache=HOME-LIRIS-replicate-0) =>
<gfid:2a1f640e-ff3e-4a56-8019-64ec6d803fc1>/general.php
(hash=HOME-LIRIS-replicate-0/cache=HOME-LIRIS-replicate-0)
[2016-12-01 10:50:10.423561] I [dht-rename.c:1344:dht_rename]
0-HOME-LIRIS-dht: renaming
<gfid:2a1f640e-ff3e-4a56-8019-64ec6d803fc1>/tmp_2pOZ5T
(hash=HOME-LIRIS-replicate-0/cache=HOME-LIRIS-replicate-0) =>
<gfid:2a1f640e-ff3e-4a56-8019-64ec6d803fc1>/1.php
(hash=HOME-LIRIS-replicate-0/cache=HOME-LIRIS-replicate-0)
[2016-12-01 10:50:10.485882] I [dht-rename.c:1344:dht_rename]
0-HOME-LIRIS-dht: renaming
<gfid:2a1f640e-ff3e-4a56-8019-64ec6d803fc1>/tmp_86Lmpz
(hash=HOME-LIRIS-replicate-0/cache=HOME-LIRIS-replicate-0) =>
<gfid:2a1f640e-ff3e-4a56-8019-64ec6d803fc1>/general.php
(hash=HOME-LIRIS-replicate-0/cache=HOME-LIRIS-replicate-0)
I also tried to set "nfs.mount-rmtab /dev/shm/glusterfs.rmtab" as I read
in an old thread. I will check whether it changes anything.
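For reference, assuming the usual volume-set syntax (with VOLUME standing
in for the real volume name), that option is set with:

  gluster volume set VOLUME nfs.mount-rmtab /dev/shm/glusterfs.rmtab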
Regards,
--
Y.
> at a time that corresponds to the NFS timeouts.
>
> This problem occurs "often" (at least once every day or two), and
> neither the client nor the servers are under heavy load (memory and CPU
> are far from full).
>
> Any idea what the reason could be, and how to prevent it from occurring?
> I reduced the autofs timeout in order to limit the impact, but that is not a
> very nice solution… Note: I can't use the glusterfs client instead of
> NFS because of the memory leaks that still exist in it.
>
> Thanks.
>
> Regards,
> --
> Y.