Simon Deziel
2020-May-28 18:00 UTC
[nsd-users] NSD still shows permission errors on Debian 10 Buster
On 2020-05-28 5:20 a.m., Anand Buddhdev wrote:
> On 28/05/2020 02:34, Simon Deziel via nsd-users wrote:
>> I like the idea. Since Debian wants to preserve compatibility with
>> both systemd and init, I proposed a slightly different fix to
>> Debian for nsd [1] and unbound [2]. Thanks!
>
> I also noticed one other deficiency in the Debian unit file. It's
> missing "KillMode=process".

Indeed, the default is KillMode=control-group, which sends SIGTERM to
every process in the cgroup, waits 90s by default, and then sends
SIGKILL to whatever remains.

> NSD starts with a main process, and that then spawns child processes
> to handle queries. When you want to kill NSD cleanly, you send a
> TERM signal to the main process, which takes care of killing its
> children.
>
> However, systemd by default will send a TERM signal to all the
> processes. This causes a haphazard termination of NSD. With the
> KillMode setting as above, systemd sends a TERM signal only to the
> main process, and NSD handles its shutdown cleanly.

I only manage a small fleet of nsd servers, so that's probably why I
never noticed any problem with cgroup-based killing. However, I did try
to simulate this:

  # ps faux
  nsd 12972 0.0 21.3 109664 53340 ? Ss 17:05 0:00 /usr/sbin/nsd -d -P
  nsd 12990 0.0  8.5  42000 21396 ? S  17:05 0:00  \_ /usr/sbin/nsd -d -P
  nsd 13011 0.0  1.4  57596  3636 ? S  17:05 0:00  \_ /usr/sbin/nsd -d -P

  kill -SIGTERM 13011
    -> does absolutely nothing, the child ignores it

  kill -SIGTERM 12972 or kill -SIGTERM 12990
    -> triggers a clean shutdown:
       nsd[12990]: warning: signal received, shutting down...

Also, sending SIGTERM to all 3 triggers an orderly shutdown. The above
seems to match what the code intends to do, but take that with a grain
of salt as I can barely read C.

Anand, could you please provide some instructions on how to reproduce
the issue you are/were having with the cgroup-based killing? My test
scenario was likely too simplistic. Thanks

Regards,
Simon
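
P.S. The same comparison can probably be run through systemd instead of
raw kill, since "systemctl kill" can target either the main PID or every
process in the unit. A minimal sketch, assuming the unit is called
nsd.service (check yours); the helper names term_nsd and show_killmode
are made up for illustration. Note that this only approximates the
first step of a real "systemctl stop": it does not reproduce the
timeout and follow-up SIGKILL.

```shell
#!/bin/sh
# Sketch only. "nsd.service" is an assumed unit name; term_nsd and
# show_killmode are hypothetical helper names, not part of NSD or systemd.
NSD_UNIT="nsd.service"

# term_nsd WHO
#   WHO=main : SIGTERM only the main process, roughly what a stop does
#              under KillMode=process
#   WHO=all  : SIGTERM every process in the unit, roughly the first step
#              of a stop under the default KillMode=control-group
term_nsd() {
    case "$1" in
        main|all)
            systemctl kill --kill-who="$1" --signal=SIGTERM "$NSD_UNIT"
            ;;
        *)
            echo "usage: term_nsd main|all" >&2
            return 2
            ;;
    esac
}

# show_killmode: print the KillMode currently in effect for the unit.
show_killmode() {
    systemctl show -p KillMode "$NSD_UNIT"
}
```

Running "term_nsd all" on a busy server and then "term_nsd main" after a
restart, while watching the log, should show whether the two paths shut
down differently.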
Anand Buddhdev
2020-May-28 18:44 UTC
[nsd-users] NSD still shows permission errors on Debian 10 Buster
On 28/05/2020 20:00, Simon Deziel wrote:

Hi Simon,

>> I also noticed one other deficiency in the Debian unit file. It's
>> missing "KillMode=process".
>
> Indeed, the default is KillMode=control-group, which sends SIGTERM to
> every process in the cgroup, waits 90s by default, and then sends
> SIGKILL to whatever remains.

Correct.

[snip]

> Anand, could you please provide some instructions on how to reproduce
> the issue you are/were having with the cgroup-based killing? My test
> scenario was likely too simplistic. Thanks

I don't have a reproducible scenario on hand, but on the servers I
manage there are often up to 32 child processes, and the servers are
busy answering thousands of queries per second and often doing zone
transfers as well. I noticed that sometimes when I wanted to shut down
NSD on such servers, there would be temporary files left over from
incomplete zone transfers. There may also have been something else, but
I can't remember it now.

Anyway, I realised this was caused by systemd sending TERM to all
processes at the same time. That's why I fixed it with
"KillMode=process".

Maybe you can try increasing the server count to a higher value, then
forcing some zone transfers, and then terminating NSD.

Regards,
Anand
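
P.S. For reference, the setting can be applied without editing the
packaged unit file by using a drop-in. This is only a sketch: the
drop-in directory shown in the comments assumes the unit is named
nsd.service, and the function name write_killmode_dropin and the file
name killmode.conf are arbitrary choices for illustration.

```shell
#!/bin/sh
# Sketch only. write_killmode_dropin and killmode.conf are hypothetical
# names; the directory is passed in so nothing is written until you call it.

# write_killmode_dropin DIR
#   Create DIR if needed and write a drop-in that sets KillMode=process.
write_killmode_dropin() {
    dropin_dir=$1
    mkdir -p "$dropin_dir" || return 1
    printf '[Service]\nKillMode=process\n' > "$dropin_dir/killmode.conf"
}

# Typical use as root, assuming the unit is nsd.service (left commented
# out so that sourcing this file has no side effects):
#   write_killmode_dropin /etc/systemd/system/nsd.service.d
#   systemctl daemon-reload
#   systemctl show -p KillMode nsd.service
```

After the daemon-reload, the "systemctl show" line should report the new
value, which is a quick way to confirm the drop-in was picked up.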