Karli Sjöberg
2018-Aug-10 12:08 UTC
[Gluster-users] ganesha.nfsd process dies when copying files
Hey all!

I am playing around on my computer with setting up a virtual mini-cluster of five VMs:

1x router
1x client
3x Gluster/NFS-Ganesha servers

The router is pfSense, the client is Xubuntu 18.04, and the servers are CentOS 7.5. I set up the cluster using 'gdeploy' with configuration snippets taken from the oVirt/Cockpit HCI setup, plus another snippet for setting up the NFS-Ganesha part of it. The configuration is successful apart from some minor details I debugged, and I'm fairly sure I haven't made any obvious misses.

All of the VMs are registered in pfSense's DNS, as are the VIPs for the NFS-Ganesha nodes; that works great and the client has no issues resolving any of the names:

hv01.localdomain   192.168.1.101
hv02.localdomain   192.168.1.102
hv03.localdomain   192.168.1.103
hv01v.localdomain  192.168.1.110
hv02v.localdomain  192.168.1.111
hv03v.localdomain  192.168.1.112

The cluster status is HEALTHY according to '/usr/libexec/ganesha/ganesha-ha.sh' before I start my tests:

client# mount -t nfs -o vers=4.1 hv01v.localdomain:/data /mnt
client# dd if=/dev/urandom of=/var/tmp/test.bin bs=1M count=1024
client# while true; do rsync /var/tmp/test.bin /mnt/; rm -f /mnt/test.bin; done

Then, after a while, the 'nfs-ganesha' service unexpectedly dies and does not restart by itself. The copy loop gets picked up a little later on 'hv02', and history repeats itself until the 'nfs-ganesha' services on all of the nodes are dead. With normal logging, the dead node says nothing before dying; sudden heart attack syndrome, so no clues there, and the remaining nodes only say that they have taken over... Right now I'm running with FULL_DEBUG, which makes testing very slow since the throughput is down to a crawl. Nothing strange about that, it just takes a lot more time to provoke the crash.

Please don't hesitate to ask for more information in case there's something else you'd like me to share! I'm hoping someone recognizes this behaviour and knows what I'm doing wrong :)

glusterfs-client-xlators-3.10.12-1.el7.x86_64
glusterfs-api-3.10.12-1.el7.x86_64
nfs-ganesha-2.4.5-1.el7.x86_64
centos-release-gluster310-1.0-1.el7.centos.noarch
glusterfs-3.10.12-1.el7.x86_64
glusterfs-cli-3.10.12-1.el7.x86_64
nfs-ganesha-gluster-2.4.5-1.el7.x86_64
glusterfs-server-3.10.12-1.el7.x86_64
glusterfs-libs-3.10.12-1.el7.x86_64
glusterfs-fuse-3.10.12-1.el7.x86_64
glusterfs-ganesha-3.10.12-1.el7.x86_64

Thanks in advance!
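One thing that might help pin this down (a sketch only, not applied in this setup; it assumes systemd manages the nfs-ganesha.service unit and that core dumps are not already enabled) is letting ganesha.nfsd leave a core file and come back automatically after a crash:

# Sketch: allow core dumps and auto-restart for nfs-ganesha (assumed unit name)
mkdir -p /etc/systemd/system/nfs-ganesha.service.d
cat > /etc/systemd/system/nfs-ganesha.service.d/debug.conf <<'EOF'
[Service]
# Let ganesha.nfsd write a core file when it dies
LimitCORE=infinity
# Bring the daemon back instead of leaving it dead
Restart=on-failure
RestartSec=5
EOF
systemctl daemon-reload
systemctl restart nfs-ganesha

With a core file in hand, a backtrace (for example 'gdb /usr/bin/ganesha.nfsd <corefile>' followed by 'bt'; the binary path is an assumption for this distribution) is typically the most useful thing to attach to a report.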
/K
-------------- next part --------------
#gdeploy configuration generated by cockpit-gluster plugin
[hosts]
hv01.localdomain
hv02.localdomain
hv03.localdomain

[yum]
action=install
repolist=
gpgcheck=no
update=no
packages=glusterfs-server,glusterfs-api,glusterfs-ganesha,nfs-ganesha,nfs-ganesha-gluster,policycoreutils-python,device-mapper-multipath,corosync,pacemaker,pcs

[script1:hv01.localdomain]
action=execute
ignore_script_errors=no
file=/usr/share/gdeploy/scripts/grafton-sanity-check.sh -d vdb -h hv01.localdomain,hv02.localdomain,hv03.localdomain

[script1:hv02.localdomain]
action=execute
ignore_script_errors=no
file=/usr/share/gdeploy/scripts/grafton-sanity-check.sh -d vdb -h hv01.localdomain,hv02.localdomain,hv03.localdomain

[script1:hv03.localdomain]
action=execute
ignore_script_errors=no
file=/usr/share/gdeploy/scripts/grafton-sanity-check.sh -d vdb -h hv01.localdomain,hv02.localdomain,hv03.localdomain

[disktype]
jbod

[diskcount]
12

[stripesize]
256

[service1]
action=enable
service=chronyd

[service2]
action=restart
service=chronyd

[script3]
action=execute
file=/usr/share/gdeploy/scripts/blacklist_all_disks.sh
ignore_script_errors=no

[pv1:hv01.localdomain]
action=create
devices=vdb
ignore_pv_errors=no

[pv1:hv02.localdomain]
action=create
devices=vdb
ignore_pv_errors=no

[pv1:hv03.localdomain]
action=create
devices=vdb
ignore_pv_errors=no

[vg1:hv01.localdomain]
action=create
vgname=gluster_vg_vdb
pvname=vdb
ignore_vg_errors=no

[vg1:hv02.localdomain]
action=create
vgname=gluster_vg_vdb
pvname=vdb
ignore_vg_errors=no

[vg1:hv03.localdomain]
action=create
vgname=gluster_vg_vdb
pvname=vdb
ignore_vg_errors=no

[lv1:hv01.localdomain]
action=create
poolname=gluster_thinpool_vdb
ignore_lv_errors=no
vgname=gluster_vg_vdb
lvtype=thinpool
size=450GB
poolmetadatasize=3GB

[lv2:hv02.localdomain]
action=create
poolname=gluster_thinpool_vdb
ignore_lv_errors=no
vgname=gluster_vg_vdb
lvtype=thinpool
size=450GB
poolmetadatasize=3GB

[lv3:hv03.localdomain]
action=create
poolname=gluster_thinpool_vdb
ignore_lv_errors=no
vgname=gluster_vg_vdb
lvtype=thinpool
size=45GB
poolmetadatasize=1GB

[lv4:hv01.localdomain]
action=create
lvname=gluster_lv_data
ignore_lv_errors=no
vgname=gluster_vg_vdb
mount=/gluster_bricks/data
lvtype=thinlv
poolname=gluster_thinpool_vdb
virtualsize=450GB

[lv5:hv02.localdomain]
action=create
lvname=gluster_lv_data
ignore_lv_errors=no
vgname=gluster_vg_vdb
mount=/gluster_bricks/data
lvtype=thinlv
poolname=gluster_thinpool_vdb
virtualsize=450GB

[lv6:hv03.localdomain]
action=create
lvname=gluster_lv_data
ignore_lv_errors=no
vgname=gluster_vg_vdb
mount=/gluster_bricks/data
lvtype=thinlv
poolname=gluster_thinpool_vdb
virtualsize=45GB

[selinux]
yes

[service3]
action=restart
service=glusterd
slice_setup=yes

[firewalld]
action=add
ports=111/tcp,2049/tcp,54321/tcp,5900/tcp,5900-6923/tcp,5666/tcp,16514/tcp,662/tcp,662/udp,892/tcp,892/udp,2020/tcp,2020/udp,875/tcp,875/udp
services=glusterfs,nfs,rpc-bind,high-availability,mountd

[script2]
action=execute
file=/usr/share/gdeploy/scripts/disable-gluster-hooks.sh

[volume1]
action=create
volname=data
transport=tcp
replica=yes
replica_count=3
key=group,storage.owner-uid,storage.owner-gid,network.ping-timeout,performance.strict-o-direct,network.remote-dio,cluster.granular-entry-heal
value=virt,0,0,30,on,off,enable
brick_dirs=hv01.localdomain:/gluster_bricks/data/data,hv02.localdomain:/gluster_bricks/data/data,hv03.localdomain:/gluster_bricks/data/data
ignore_volume_errors=no
arbiter_count=1

[nfs-ganesha]
action=create-cluster
ha-name=ganesha-ha-360
cluster-nodes=hv01.localdomain,hv02.localdomain,hv03.localdomain
vip=192.168.1.110,192.168.1.111,192.168.1.112
volname=data
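For reference, the health check mentioned above is typically run roughly like this from any node (a sketch; the shared-storage path is the default assumed by ganesha-ha and may differ on a given setup):

# Sketch: check HA cluster state (assumed default shared-storage path)
/usr/libexec/ganesha/ganesha-ha.sh --status /var/run/gluster/shared_storage/nfs-ganesha
pcs status    # pacemaker's view of the VIPs and ganesha resources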
Kaleb S. KEITHLEY
2018-Aug-10 12:39 UTC
[Gluster-users] ganesha.nfsd process dies when copying files
On 08/10/2018 08:08 AM, Karli Sjöberg wrote:
> Hey all!
> ...
>
> glusterfs-client-xlators-3.10.12-1.el7.x86_64
> glusterfs-api-3.10.12-1.el7.x86_64
> nfs-ganesha-2.4.5-1.el7.x86_64
> centos-release-gluster310-1.0-1.el7.centos.noarch
> glusterfs-3.10.12-1.el7.x86_64
> glusterfs-cli-3.10.12-1.el7.x86_64
> nfs-ganesha-gluster-2.4.5-1.el7.x86_64
> glusterfs-server-3.10.12-1.el7.x86_64
> glusterfs-libs-3.10.12-1.el7.x86_64
> glusterfs-fuse-3.10.12-1.el7.x86_64
> glusterfs-ganesha-3.10.12-1.el7.x86_64
>

For nfs-ganesha problems you'd really be better served by posting to support@ or devel at lists.nfs-ganesha.org.

Both glusterfs-3.10 and nfs-ganesha-2.4 are really old. glusterfs-3.10 is even officially EOL. Ganesha isn't really organized enough to have done anything as bold as officially declaring 2.4 as having reached EOL. The nfs-ganesha devs are currently working on 2.7; maintaining and supporting 2.6, and less so 2.5, is pretty much at the limit of what they might be willing to help debug.

I strongly encourage you to update to a more recent version of both glusterfs and nfs-ganesha. glusterfs-4.1 and nfs-ganesha-2.6 would be ideal. Then, if you still have problems, you're much more likely to get help.

--
Kaleb
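As an illustration of that advice, a rough upgrade path on CentOS 7 via the Storage SIG might look like the following (a sketch only; the release-package name and the nfs-ganesha version it pulls in are assumptions to verify before running anything):

# Sketch: move to newer SIG builds (assumed release package name)
yum install centos-release-gluster41
yum update 'glusterfs*' 'nfs-ganesha*'
# then restart glusterd and nfs-ganesha on each node, one at a time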