Hi,

you shouldn't do that, as it is intentional - glusterd is just a management layer and you might need to restart it in order to reconfigure a node. You don't want to kill your bricks to introduce a change, right?
For details, you can check https://access.redhat.com/solutions/1313303 (you can obtain a subscription from developers.redhat.com).

In CentOS there is a dedicated service that takes care of shutting down all processes and avoids such a freeze.
Here it is, in case your distro doesn't provide it:

user at system:~/Gluster/usr/lib/systemd/system> cat glusterfsd.service
[Unit]
Description=GlusterFS brick processes (stopping only)
After=network.target glusterd.service

[Service]
Type=oneshot
# glusterd starts the glusterfsd processes on-demand
# /bin/true will mark this service as started, RemainAfterExit keeps it active
ExecStart=/bin/true
RemainAfterExit=yes
# if there are no glusterfsd processes, a stop/reload should not give an error
ExecStop=/bin/sh -c "/bin/killall --wait glusterfsd || /bin/true"
ExecReload=/bin/sh -c "/bin/killall -HUP glusterfsd || /bin/true"

[Install]
WantedBy=multi-user.target

Of course you can also use '/usr/share/glusterfs/scripts/stop-all-gluster-processes.sh' to prevent the freeze, as it will kill all gluster processes (including FUSE mounts on the system) and thus allow the FUSE clients accessing the bricks' processes and the rest of the TSP to act accordingly.

Both the glusterfsd.service and the stop-all-gluster-processes.sh are provided by the glusterfs-server package.

Best Regards,
Strahil Nikolov


On Wednesday, 2 September 2020 at 21:59:45 GMT+3, Ward Poelmans <wpoely86 at gmail.com> wrote:

Hi,

I've been playing with glusterfs on a couple of VMs to get some feeling with it. The setup is 2 bricks with replication with a thin arbiter. I've noticed something 'odd' with the systemd unit file for glusterd. It has

KillMode=process

which means that on a 'systemctl stop glusterd' it will only kill the glusterd daemon and not any of the subprocesses started by glusterd (like glusterfs and glusterfsd).

Does anyone know the reason for this? The git history of the file doesn't help. It was added in 2013 but the commit doesn't mention anything about it.

The reason I'm asking is that I noticed a write hanging when I rebooted one of the brick VMs: a client was doing 'dd if=/dev/zero of=/some/file' on gluster when I did a clean shutdown of one of the brick VMs. This caused the dd to hang for the duration of network.ping-timeout (42 seconds by default). When I changed the kill mode to 'control-group' (which kills all processes started by glusterd too), this didn't happen any more.

I was not expecting any 'hangs' on a proper shutdown of one of the bricks when replication is used. Is this a bug or is something wrong with my setup?

Ward
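For reference, the stop-only unit quoted above can be put in place with standard systemd commands. A rough sketch only, assuming the unit file is saved as /etc/systemd/system/glusterfsd.service (the exact path, and whether your package already ships the unit, will vary per distro):

    # install and enable the stop-only unit so its ExecStop runs on shutdown
    cp glusterfsd.service /etc/systemd/system/glusterfsd.service
    systemctl daemon-reload
    systemctl enable --now glusterfsd.service

    # or, before a planned reboot, run the helper script shipped by glusterfs-server;
    # note that it also kills FUSE client mounts on the node
    sh /usr/share/glusterfs/scripts/stop-all-gluster-processes.sh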
And it seems gdeploy is deprecated in favour of gluster-ansible -> gluster/gluster-ansible (a core library of gluster-specific roles and modules for ansible/ansible tower).

Best Regards,
Strahil Nikolov
On 9/2/20 12:30 PM, Strahil Nikolov wrote:
> In CentOS there is a dedicated service that takes care of shutting down all processes and avoids such a freeze.

If you didn't stop your network interfaces as part of the shutdown, this wouldn't happen either. The final kill will kill the glusterfsd processes, closing the TCP connections properly and preventing the clients from waiting for the server to come back.

The problem you're seeing is that the network is being shut down, preventing the clients from getting the proper TCP termination.
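The behaviour being discussed is easy to inspect on a node with plain systemctl; a quick check (standard systemd commands, nothing gluster-specific):

    # show how glusterd's unit handles 'systemctl stop'
    systemctl show glusterd.service -p KillMode
    # -> KillMode=process : only the management daemon is killed, brick processes keep running

    # the brick (glusterfsd) and self-heal/FUSE (glusterfs) processes stay in the unit's
    # control group and are listed under the CGroup tree in the status output
    systemctl status glusterd.service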
Hi Strahil,

On 2/09/2020 21:30, Strahil Nikolov wrote:
> you shouldn't do that, as it is intentional - glusterd is just a management layer and you might need to restart it in order to reconfigure a node. You don't want to kill your bricks to introduce a change, right?

Starting up daemons in one systemd unit and killing them with another is a bit weird, though. Can't a reconfigure happen through an ExecReload? Or let the management daemon and the actual brick daemons run under different systemd units?

> In CentOS there is a dedicated service that takes care of shutting down all processes and avoids such a freeze.

Thanks, that should fix the issue too.

Ward
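For completeness, the change Ward describes trying in his first message (killing the whole control group on 'systemctl stop glusterd') can be tested without editing the packaged unit, via a systemd drop-in. A rough sketch only - the advice in this thread is that KillMode=process is intentional, so this is for experimentation rather than a recommended fix, and the drop-in path simply follows the usual systemd convention:

    # /etc/systemd/system/glusterd.service.d/override.conf
    [Service]
    KillMode=control-group

    # apply the drop-in
    systemctl daemon-reload
    # after this, 'systemctl stop glusterd' (and a shutdown) will also stop the brick processes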