01.08.2021 12:19, Michael Tokarev via nsd-users wrote:> 01.08.2021 12:12, Anand Buddhdev via nsd-users wrote:
>
>> TSIG requires the time on the primary and secondary to be synchronised
>> to within 5 minutes. Check the system time on your two servers. One of
>> them has probably drifted more than 5 minutes. If you're not
already
>> running something like ntp or chrony, you should do that to keep the
>> time accurate on these servers.
>
> The time is syncronized within microseconds. We're running time sync
> daemons on both, and both works as expected. This is the first thing
> I checked - if the time is okay - yes it is, ticking exactly in line
> on both servers.
>
> I'm trying to rebuild NSD with #ifdebug turned on in the place where
> BADTIME is returned, but so far it doesn't work, - apparently it tries
> to print to a closed filedescriptor :)
And after actually enabling the debug stuff in there I found out that
the time differs by exactly 3 hours - which is our GMT offset. Which
lead me to realize that the problem apparently is wrong timezone
picked by one of the NSDs, - after restart of NSD on master, the
secondary were finally able to transfer the zones.
There was an issue still, - but this looks more like a prob is with
systemd integration (debian), not in nsd itself:
Aug 01 12:24:15 panda.tls.msk.ru systemd[1]: Starting Name Server Daemon...
Aug 01 12:24:15 panda.tls.msk.ru nsd[3195]: nsd starting (NSD 4.3.5)
Aug 01 12:24:15 panda.tls.msk.ru nsd[3195]: [2021-08-01 12:24:15.242] nsd[3195]:
notice: nsd starting (NSD 4.3.5)
Aug 01 12:24:15 panda.tls.msk.ru nsd[3196]: nsd started (NSD 4.3.5), pid 3195
Aug 01 12:24:15 panda.tls.msk.ru nsd[3196]: [2021-08-01 12:24:15.274] nsd[3196]:
notice: nsd started (NSD 4.3.5), pid 3195
Aug 01 12:24:16 panda.tls.msk.ru nsd[3195]: zone corpit.ru serial 2020071628 is
updated to 2020071632
Aug 01 12:24:16 panda.tls.msk.ru nsd[3195]: [2021-08-01 12:24:16.375] nsd[3195]:
info: zone corpit.ru serial 2020071628 is updated to 2020071632
.... a few more zones updated ....
Aug 01 12:25:45 panda.tls.msk.ru systemd[1]: nsd.service: start operation timed
out. Terminating.
Aug 01 12:25:45 panda.tls.msk.ru nsd[3196]: signal received, shutting down...
Aug 01 12:25:45 panda.tls.msk.ru nsd[3196]: [2021-08-01 12:25:45.228] nsd[3196]:
warning: signal received, shutting down...
Aug 01 12:25:45 panda.tls.msk.ru systemd[1]: nsd.service: Failed with result
'timeout'.
Aug 01 12:25:45 panda.tls.msk.ru systemd[1]: Failed to start Name Server Daemon.
Aug 01 12:25:45 panda.tls.msk.ru systemd[1]: nsd.service: Scheduled restart job,
restart counter is at 1.
Aug 01 12:25:45 panda.tls.msk.ru systemd[1]: Stopped Name Server Daemon.
Aug 01 12:25:45 panda.tls.msk.ru systemd[1]: Starting Name Server Daemon...
Aug 01 12:25:45 panda.tls.msk.ru nsd[3240]: nsd starting (NSD 4.3.5)
Aug 01 12:25:45 panda.tls.msk.ru nsd[3240]: [2021-08-01 12:25:45.512] nsd[3240]:
notice: nsd starting (NSD 4.3.5)
Aug 01 12:25:45 panda.tls.msk.ru nsd[3241]: nsd started (NSD 4.3.5), pid 3240
Aug 01 12:25:45 panda.tls.msk.ru nsd[3241]: [2021-08-01 12:25:45.543] nsd[3241]:
notice: nsd started (NSD 4.3.5), pid 3240
Aug 01 12:27:15 panda.tls.msk.ru systemd[1]: nsd.service: start operation timed
out. Terminating.
Aug 01 12:27:15 panda.tls.msk.ru nsd[3241]: signal received, shutting down...
Aug 01 12:27:15 panda.tls.msk.ru nsd[3241]: [2021-08-01 12:27:15.728] nsd[3241]:
warning: signal received, shutting down...
Aug 01 12:27:15 panda.tls.msk.ru systemd[1]: nsd.service: Failed with result
'timeout'.
Aug 01 12:27:15 panda.tls.msk.ru systemd[1]: Failed to start Name Server Daemon.
Aug 01 12:27:15 panda.tls.msk.ru systemd[1]: nsd.service: Scheduled restart job,
restart counter is at 2.
Aug 01 12:27:15 panda.tls.msk.ru systemd[1]: Stopped Name Server Daemon.
Aug 01 12:27:15 panda.tls.msk.ru systemd[1]: Starting Name Server Daemon...
Aug 01 12:27:16 panda.tls.msk.ru nsd[3266]: nsd starting (NSD 4.3.5)
(for some reason some lines are repeated in the log but with different prefix,
this is yet another thing to look at).
Anyway, the initial - main - issue is now solved, and I'm trying to
understand
what actually happened.
Thank you!
/mjt