Hello.
We had a network outage today, which caused quite a bit of grief for us,
and one of the reasons was NSD, especially `nsdc notify' and nsd-notify.
We have about 30 zones defined locally, for which there are 2 local and 2
remote servers. We had to modify some data in these zones (actually
just refreshing dnssec keys because I forgot to do that earlier, but
it does not matter). After the modification, I ran `nsdc notify', which
made multiple attempts to notify the two unreachable nameservers with
increasing timeouts (even though the network layer immediately returned
"no route to host"). As a result, I failed to fix our DNS promptly,
since I waited for quite some time for it to reach the zones which
are really important -- so I interrupted it, copied the config,
removed all references to the unreachable nameservers from it,
re-ran the notify part, and restored the config.
Now, the questions.
Should nsd-notify perhaps implement the functionality of the
nsdc script in this case, by scanning the conffile and sending
notifies for all zones found there to all their nameservers, just
the same way as `nsdc notify' does, but doing it all in parallel,
not one after another?
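
For illustration, a rough sketch of the parallel approach in Python.
This is not how nsd-notify is written; `build_notify', `send_notify',
`notify_all' and the zone map are made-up names. It builds a bare
RFC 1996 NOTIFY query (no TSIG, IPv4 only), sends it once without
waiting for a reply, and hands one task per (zone, server) pair to a
bounded thread pool, so one dead server does not hold up the rest.

```python
import random
import socket
import struct
from concurrent.futures import ThreadPoolExecutor

def build_notify(zone, msg_id):
    """Build a minimal DNS NOTIFY query (RFC 1996) for `zone`, no TSIG."""
    flags = (4 << 11) | 0x0400          # opcode 4 = NOTIFY, AA bit set
    header = struct.pack("!HHHHHH", msg_id, flags, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in zone.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 6, 1)   # QTYPE=SOA, QCLASS=IN

def send_notify(zone, server, port=53):
    """Send one NOTIFY to `server` (IPv4 assumed); return None or error text."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.sendto(build_notify(zone, random.getrandbits(16)), (server, port))
        return None
    except OSError as e:
        return str(e)

def notify_all(zones, send=send_notify, workers=16):
    """zones: {zone: [servers]}. Run all sends concurrently, `workers` at a time."""
    pairs = [(z, srv) for z, servers in zones.items() for srv in servers]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda p: send(*p), pairs)
        return dict(zip(pairs, results))
```

The `send' parameter is there only so that a different sender (one
with retries, say) can be plugged in without touching the fan-out.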
And should nsd-notify really wait so long and make so many
attempts for each server? Maybe do just two attempts (the second
after a 1-second interval) and be done with it? Or maybe there
should be some option for that?
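
A minimal sketch of such a policy in Python, with the attempt count
and the interval as parameters. `send_with_retries' is a made-up
name, and the defaults are just the "two attempts, one second"
suggestion from above. It takes any reply as success and does not
check the message ID or the sender, which a real implementation
would have to do. An error raised by the send itself (for example
"network is unreachable") is not caught here and goes to the caller.

```python
import socket

def send_with_retries(payload, addr, attempts=2, interval=1.0):
    """Send `payload` to `addr` up to `attempts` times, waiting `interval`
    seconds for a reply after each send. Returns the reply or None."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(interval)
        for _ in range(attempts):
            s.sendto(payload, addr)
            try:
                reply, _peer = s.recvfrom(512)
                return reply
            except socket.timeout:
                continue      # no answer within `interval`, try again
    return None
```

The worst case per server is then attempts * interval seconds, so
2 seconds with the defaults.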
Or maybe it is better for nsd itself to send the notifies, e.g.
as triggered by nsd-notify - so that nsd-notify does not send
notifies itself but sends a trigger to the running daemon, which
maintains a list of "pending" notifications? (Probably too
complicated for the daemon)
Why does nsd-notify not detect the ICMP errors which are being
returned by the operating system, and instead wait until the
timeout expires?
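
One common way to get at these errors, sketched in Python: if the
UDP socket is connect()ed to the peer, the kernel (Linux at least;
other systems may differ) reports an incoming ICMP error as an error
on a later send or recv, instead of letting the caller sit until the
timeout. `probe' is a made-up name, and whether nsd-notify could be
restructured this way is an assumption.

```python
import socket

def probe(payload, addr, timeout=1.0):
    """Return "reply", "timeout", or "error: ..." for a UDP send to `addr`."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        try:
            s.connect(addr)     # sends nothing, only fixes the peer address
            s.send(payload)
            s.recv(512)
            return "reply"
        except socket.timeout:  # must come first: it is an OSError subclass
            return "timeout"
        except OSError as e:    # e.g. ECONNREFUSED, EHOSTUNREACH, ENETUNREACH
            return "error: " + str(e)
```

With this, a server that answers with "port unreachable" or a router
that answers with "no route to host" can be given up on at once, and
only a server that is truly silent costs the full timeout.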
Right now I "fixed" this issue by adding an ampersand (&)
to the end of the nsd-notify command line in nsdc, and adding one
`wait' call at the very end - this is not really portable,
but at least this way it works, unlike the original, which
takes ages to complete. Obviously this is also wrong
if the number of zones is large - too many processes
may be spawned. But the "right" behavior can't be coded
in shell easily, since standard /bin/sh does not have controls
for that - hence I asked if maybe nsd-notify itself should
parse the conffile and do it all in parallel...
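
To show what the missing "control" would look like outside of
/bin/sh, here is a small Python sketch that runs a list of commands
with a cap on how many run at once, and with an overall time limit
per command. `run_bounded' and its defaults are made up for the
example; it is a stand-in for the `&' plus `wait' hack, not a
proposal for how nsdc should be written.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_bounded(commands, max_procs=8, timeout=30):
    """Run each argv list in `commands`, at most `max_procs` at a time.
    Returns the exit codes in order; None means the command timed out."""
    def run(argv):
        try:
            return subprocess.run(argv, timeout=timeout).returncode
        except subprocess.TimeoutExpired:
            return None       # subprocess.run has already killed the child
    with ThreadPoolExecutor(max_workers=max_procs) as pool:
        return list(pool.map(run, commands))
```

Each entry in `commands' would be one nsd-notify invocation, with
whatever arguments nsdc already passes for that zone.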
BTW, this is nsd version 3.2.8, I haven't seen 3.2.9 yet.
Thanks!
/mjt