Stephane Bortzmeyer
2020-Jan-06 19:39 UTC
[nsd-users] Repeated crashes of NSD, without a clear explanation
For a week now, NSD on one machine has been crashing after a few hours of running, corrupting nsd.db.

The log (verbosity 4) says:

    Jan 06 20:31:30 ada nsd[1974]: process 1975 exited with status 9
    Jan 06 20:31:30 ada nsd[1974]: [2020-01-06 19:31:30.892] nsd[1974]: error: process 1975 exited with status 9
    Jan 06 20:31:30 ada nsd[1974]: rmdir /tmp/nsd-xfr-1974 failed: Directory not empty
    Jan 06 20:31:30 ada nsd[1974]: [2020-01-06 19:31:30.909] nsd[1974]: warning: rmdir /tmp/nsd-xfr-1974 failed: Directory not empty
    Jan 06 20:31:31 ada nsd[2195]: nsd starting (NSD 4.1.26)
    Jan 06 20:31:31 ada nsd[2195]: [2020-01-06 19:31:31.418] nsd[2195]: notice: nsd starting (NSD 4.1.26)
    Jan 06 20:31:31 ada nsd[2195]: setup SSL certificates
    Jan 06 20:31:31 ada nsd[2195]: [2020-01-06 19:31:31.421] nsd[2195]: info: setup SSL certificates
    Jan 06 20:31:31 ada nsd[2196]: /var/lib/nsd/nsd.db: not cleanly closed 0
    Jan 06 20:31:31 ada nsd[2195]: [2020-01-06 19:31:31.798] nsd[2196]: warning: /var/lib/nsd/nsd.db: not cleanly closed 0
    Jan 06 20:31:31 ada nsd[2195]: [2020-01-06 19:31:31.798] nsd[2196]: warning: can not use /var/lib/nsd/nsd.db, will create anew

And then NSD stops. I have to restart it manually, after which it works for a few more hours.

This machine worked fine with the same set of zones for several years. (Yes, of course, the software was upgraded, but another Debian machine, with the same Debian version, the same NSD and almost the same set of zones, has no problem.)

Debian "stable" 10.2, Linux kernel 4.19.0, NSD 4.1.26. As I said, a very similar machine works fine.

    % ls -alt /var/lib/nsd
    total 552
    -rw-------  1 nsd  nsd  589824 Jan  6 20:33 nsd.db
    -rw-r--r--  1 nsd  nsd    6605 Jan  6 20:31 xfrd.state
    drwxr-xr-x  2 nsd  nsd    4096 Jan  6 20:31 .
    drwxr-xr-x 70 root root   4096 Jan  6 20:18 ..

Deleting everything in /var/lib/nsd and starting from a fresh directory changes nothing.

What can I investigate?
Anand Buddhdev
2020-Jan-07 06:11 UTC
[nsd-users] Repeated crashes of NSD, without a clear explanation
On 06/01/2020 22:39, Stephane Bortzmeyer via nsd-users wrote:

Hi Stephane,

> For now one week, one machine has NSD crashing after a few hours of
> running, corrupting nsd.db.
>
> The log (verbosity 4) says:
>
> Jan 06 20:31:30 ada nsd[1974]: process 1975 exited with status 9
> Jan 06 20:31:30 ada nsd[1974]: [2020-01-06 19:31:30.892] nsd[1974]: error: process 1975 exited with status 9
> Jan 06 20:31:30 ada nsd[1974]: rmdir /tmp/nsd-xfr-1974 failed: Directory not empty

This suggests that an incoming XFR is triggering a bug. Have you saved the contents of the /tmp/nsd-xfr-1974 directory? If not, perhaps you can save it the next time this happens; it may help the developers figure out what causes the crash. Also, is there anything in the log before these lines that indicates which zone might be involved?

Note that several newer versions of NSD have been released since 4.1.26, so this bug may already have been fixed. If you can upgrade, you may want to do that.

Finally, the database mode is no longer recommended. Could you try running your NSD instance with:

    database: ""

Regards,
Anand Buddhdev
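For reference, a minimal nsd.conf sketch of where the database: "" suggestion would go. It assumes the option is placed in the server: clause and that the rest of the existing configuration stays unchanged; with the database disabled, NSD works from the zone files instead of nsd.db.

    server:
        # An empty value disables the nsd.db database file;
        # zones are read from their zone files instead.
        database: ""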