Folks, running nsd 4.1.14 as the hidden master for the domain "exim.org", noticed something weird after rebooting for kernel security updates. Have checked doc/ChangeLog for 4.1.15 and don't see anything relevant. Seems to be a race in startup which can result in returning NOT IMPL? System is Ubuntu (14.04.5 LTS), has static IPv4 and IPv6 assigned as fixed values in /etc/network/interfaces but the default route is learnt via route advertisements. Because I need to wait for some read-only bind-mounts constructing the chroot to be available, I have /etc/init/nsd.override anyway to specify the start-up condition and it's guarded on `... and net-device-up IFACE!=lo`. We have unbound listening on localhost:53, so nsd is being given explicit IPs to bind. Since the last-reboot-before-today I also have a `/etc/network/if-up.d/010wait-for-ipv6` which is trying to wait on IPv6 being configured, as Upstart seems to emit as soon as IPv4 is ready, before even the statically configured IPv6 is ready. In previous reboots, nsd would fail to start because it couldn't bind the addresses for serving, because they weren't yet available. Because the wait-script is buggy (I now know what's needed to fix it) we only got a "bit" of a delay, so the first attempt nsd failed as before, but the third attempt succeeded: the IPv6 address was present, so nsd could start. But routing was not yet present. [2017-03-09 02:42:02.037] nsd[1081]: notice: nsd starting (NSD 4.1.14) [2017-03-09 02:42:02.050] nsd[1081]: info: setup SSL certificates [2017-03-09 02:42:02.277] nsd[1087]: info: zonefile zonefiles/db.exim.org is not modified [2017-03-09 02:42:02.277] nsd[1087]: info: zonefile zonefiles/db.testdns.exim.org is not modified [2017-03-09 02:42:02.277] nsd[1087]: info: zonefile zonefiles/unsigned.invalid254.testdns.exim.org is not modified [2017-03-09 02:42:02.277] nsd[1087]: notice: nsd started (NSD 4.1.14), pid 1081 [2017-03-09 02:42:02.278] nsd[1081]: error: xfrd: sendto 2a02:898:31::53:0 failed Network is unreachable [2017-03-09 02:42:02.278] nsd[1081]: error: xfrd: zone exim.org: could not send notify #1 to 2a02:898:31::53:0 [2017-03-09 02:42:02.278] nsd[1081]: error: xfrd: sendto 2a02:898:31::53:0 failed Network is unreachable [2017-03-09 02:42:02.278] nsd[1081]: error: xfrd: zone testdns.exim.org: could not send notify #1 to 2a02:898:31::53:0 [2017-03-09 02:42:02.278] nsd[1081]: error: xfrd: sendto 2a02:898:31::53:0 failed Network is unreachable [2017-03-09 02:42:02.278] nsd[1081]: error: xfrd: zone invalid254.testdns.exim.org: could not send notify #1 to 2a02:898:31::53:0 Seeing this, I check that things are working with a NOTIFY: [2017-03-09 02:46:57.186] nsd[1081]: info: new control connection from 127.0.0.1 [2017-03-09 02:46:57.210] nsd[1081]: info: remote control connection authenticated [2017-03-09 02:46:57.247] nsd[1081]: info: control cmd: notify testdns.exim.org [2017-03-09 02:46:57.248] nsd[1081]: info: remote control operation completed [2017-03-09 02:46:57.745] nsd[3709]: info: axfr for testdns.exim.org. from 2604:a880:800:a1::419:1001 [2017-03-09 02:46:57.908] nsd[3709]: error: failed reading from 2604:a880:800:a1::419:1001 tcp: Connection reset by peer [2017-03-09 02:46:58.740] nsd[3709]: info: axfr for testdns.exim.org. from 2604:a880:800:a1::419:1001 [2017-03-09 02:46:58.901] nsd[3709]: error: failed reading from 2604:a880:800:a1::419:1001 tcp: Connection reset by peer [2017-03-09 02:46:59.743] nsd[3709]: info: axfr for testdns.exim.org. from 2604:a880:800:a1::419:1001 [2017-03-09 02:46:59.906] nsd[3709]: error: failed reading from 2604:a880:800:a1::419:1001 tcp: Connection reset by peer I check that server's logs; it's "NSD 4.1.7" from OS vendor packages, and it sees: [2017-03-09 02:46:57.649] nsd[1692]: error: xfrd: zone testdns.exim.org received error code NOT IMPL from [hidden-master] ... and similarly before that for the startup notifies: they did seem to make it out in the end, but the secondary had been unable to transfer. I restart nsd 4.1.14 on the hidden master and the problem disappears. I can't prove the two are connected in causation, but what I see is "server experiences issues sending on IPv6 at startup" is followed by "secondary gets NOT IMPL on transfer". Are they linked? Is there an issue here? (Meanwhile, I'm going to defer nsd startup until we also have an IPv6 route). Thanks, -Phil
Anand Buddhdev
2017-Mar-09 08:56 UTC
[nsd-users] startup race leading to NOT IMPL on AXFR?
On 09/03/2017 04:25, Phil Pennock wrote: Hi Phil, There are 2 different things here. See my responses below.> In previous reboots, nsd would fail to start because it couldn't bind > the addresses for serving, because they weren't yet available. Because > the wait-script is buggy (I now know what's needed to fix it) we only > got a "bit" of a delay, so the first attempt nsd failed as before, but > the third attempt succeeded: the IPv6 address was present, so nsd could > start. But routing was not yet present.For this you may want to use the option "ip-transparent: yes" to allow NSD to bind to addresses not yet available. As soon as they become available, NSD can use them.> [2017-03-09 02:46:57.649] nsd[1692]: error: xfrd: zone > testdns.exim.org received error code NOT IMPL from [hidden-master]This might be a red herring. NSD as a slave tries IXFR first. However, the master is also NSD, and it cannot provide IXFR (because it doesn't have any journals), so it returns NOTIMPL. The slave then falls back to AXFR. If you want to avoid the initial IXFR (because it will never work), set this on the slave: request-xfr: AXFR <master> <key> Regards, Anand Buddhdev