We experienced a funny problem with an nsd name server. Serial numbers
seem to oscillate between an old value and the current one:

% check_soa ma
ns2.nic.fr has serial number 2005112202
% check_soa ma
ns2.nic.fr has serial number 2005111902

nsd logs show nothing and the zone did not change between the two tests.

I noticed old daemons on the machines:

[bortzmeyer at ns2 ~]$ ps auxww|grep nsd
nsd  24699  0.0 21.9 459176 455580 ?  S  Nov21   0:43 /usr/local/nsd/sbin/nsd -a 192.93.0.4 -a 2001:660:3005:1::1:2 -n 15
nsd  24720  4.3 21.9 459752 455964 ?  S  Nov21  71:25 /usr/local/nsd/sbin/nsd -a 192.93.0.4 -a 2001:660:3005:1::1:2 -n 15
nsd  31290  0.0  0.0      0      0 ?  Z  Nov21   0:43 [nsd] <defunct>
nsd  18064  1.9 21.9 459844 456164 ?  S  10:25   0:44 /usr/local/nsd/sbin/nsd -a 192.93.0.4 -a 2001:660:3005:1::1:2 -n 15
nsd  18074  5.2 21.9 460304 456408 ?  S  10:25   1:55 /usr/local/nsd/sbin/nsd -a 192.93.0.4 -a 2001:660:3005:1::1:2 -n 15

Killing them all (they require a -KILL) apparently solved the
problem. Is it possible that the "old" daemons were still receiving
some of the UDP requests (I did not test with TCP, unfortunately) and
replied with old data?

NSD 2.3.0
CentOS release 4.2 (Final)
Linux 2.6.9-22.0.1.ELsmp
On Tue, Nov 22, 2005 at 11:14:38AM +0100, Stephane Bortzmeyer wrote:
> Is it possible that the "old" daemons were still receiving some of the
> UDP requests

If you had managed to get a list of which processes had which files
open before you did the kill, it would have told you whether it was
technically possible for a process to receive traffic on the port.

I use lsof myself for that information. I know others exist, but I
forgot their names. Try this "next time". (lsof compiles on most
systems.)

This is how it looks on Solaris 9:

robert at thunder (0)$ lsof -Pn|head -1; lsof -Pn|grep UDP.\*:53
COMMAND  PID  USER  FD   TYPE  DEVICE         SIZE/OFF  NODE  NAME
named    398  root  20u  IPv4  0x300038a99b0  0t0       UDP   127.0.0.1:53 (Idle)
named    398  root  24u  IPv4  0x300039990d0  0t0       UDP   172.16.11.53:53 (Idle)
named    398  root  26u  IPv4  0x30003827b68  0t0       UDP   *:53 (Idle)

Whether nsd itself has a feature which enables the behaviour you
experienced, I don't know. I will leave it to others to elaborate on
that.

-- 
Robert Martin-Legène
IT-sikkerhedschef - IT security manager
DK Hostmaster A/S - the DK TLD Registry
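Why the list of open files matters can be illustrated with a small C sketch. It is not NSD code and does not claim to reproduce the incident; it only shows the general Unix behaviour that a UDP socket inherited across fork() stays open in the child after the parent closes its own copy, so the child can still receive datagrams sent to that port. The function name inherited_socket_still_receives() and the use of a loopback address with a kernel-chosen port are assumptions made for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a forked child still receives a
 * datagram on an inherited UDP socket after the parent has closed its
 * own copy, 0 if it does not, and -1 on a setup error. */
int inherited_socket_still_receives(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    struct timeval tv = { 2, 0 };   /* child gives up after 2 seconds */
    int report[2];
    char got = 0;
    int sock, sender, status;
    pid_t pid;

    sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (sock < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;              /* let the kernel pick a free port */
    if (bind(sock, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(sock, (struct sockaddr *)&addr, &len) < 0 ||
        setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0 ||
        pipe(report) < 0) {
        close(sock);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(sock);
        close(report[0]);
        close(report[1]);
        return -1;
    }
    if (pid == 0) {
        /* Stand-in for an "old" server process: it keeps the
         * descriptor it inherited and waits for one datagram. */
        char buf[16];
        char ok = recvfrom(sock, buf, sizeof(buf), 0, NULL, NULL) > 0;
        close(report[0]);
        if (write(report[1], &ok, 1) != 1)
            _exit(1);
        _exit(0);
    }

    /* Parent: close our copy of the socket. The port stays bound
     * because the child still holds the descriptor. */
    close(sock);
    close(report[1]);
    sender = socket(AF_INET, SOCK_DGRAM, 0);
    if (sender >= 0) {
        sendto(sender, "ping", 4, 0, (struct sockaddr *)&addr, sizeof(addr));
        close(sender);
    }
    if (read(report[0], &got, 1) != 1)
        got = 0;
    close(report[0]);
    waitpid(pid, &status, 0);
    return got ? 1 : 0;
}
```

A tool such as lsof would show the child in this sketch as the holder of the UDP socket, which is exactly the information that was lost once the old daemons were killed.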
[On 22 Nov, @11:14, Stephane Bortzmeyer wrote in "An old NSD daemon, stuck with ..."]
> We experienced a funny problem with a nsd name server. Serial numbers
> seem to oscillate between an old value and the current one:
>
> % check_soa ma
> ns2.nic.fr has serial number 2005112202
> % check_soa ma
> ns2.nic.fr has serial number 2005111902
>
> nsd logs show nothing and the zone did not change between the two tests.

We will try to simulate this here, but more details would certainly be
helpful. What I gathered from Jaap was that something went haywire
with a failed AXFR (TSIG related) and that possibly the reload failed?

-- 
grtz,
  - Miek

  http://www.miek.nl              http://www.nlnetlabs.nl
  PGP Key ID: 3880 D0F6
  fingerprint: 6A3C F450 6D4E 7C6B C23C F982 258B 85CF 3880 D0F6
[On 22 Nov, @11:14, Stephane Bortzmeyer wrote in "An old NSD daemon, stuck with ..."]
> problem. Is it possible that the "old" daemons were still receiving
> some of the UDP requests (I did not test with TCP, unfortunately) and

You are hit by a tricky race condition in the NSD code. The following
patch minimizes the race window. This patch will be in nsd 2.3.2.

For nsd 2.3.3 I will refactor the sig_handler.

Index: nsd.c
===================================================================
--- nsd.c       (revision 1724)
+++ nsd.c       (working copy)
@@ -167,6 +167,7 @@
 sig_handler (int sig)
 {
        size_t i;
+       /* To avoid race cond. We really don't want to use log_msg() in this handler */

        /* Are we a child server? */
        if (nsd.server_kind != NSD_SERVER_MAIN) {
@@ -197,7 +198,7 @@
        case SIGCHLD:
                return;
        case SIGHUP:
-               log_msg(LOG_WARNING, "signal %d received, reloading...", sig);
+               /* log_msg(LOG_WARNING, "signal %d received, reloading...", sig); */
                nsd.mode = NSD_RELOAD;
                return;
        case SIGALRM:
@@ -223,7 +224,7 @@
        case SIGTERM:
        default:
                nsd.mode = NSD_SHUTDOWN;
-               log_msg(LOG_WARNING, "signal %d received, shutting down...", sig);
+               /* log_msg(LOG_WARNING, "signal %d received, shutting down...", sig); */
                sig = SIGTERM;
                break;
        }
Index: server.c
===================================================================
--- server.c    (revision 1724)
+++ server.c    (working copy)
@@ -460,6 +460,8 @@
                        break;
                }

+               log_msg(LOG_WARNING, "signal received, reloading...");
+
                reload_pid = fork();
                switch (reload_pid) {
                case -1:
@@ -520,6 +522,7 @@
                        server_shutdown(nsd);
                        break;
                case NSD_SHUTDOWN:
+                       log_msg(LOG_WARNING, "signal received, shutting down...");
                        break;
                default:
                        log_msg(LOG_WARNING, "NSD main server mode invalid: %d", nsd->mode);

-- 
grtz,
  - Miek

  http://www.miek.nl              http://www.nlnetlabs.nl
  PGP Key ID: 3880 D0F6
  fingerprint: 6A3C F450 6D4E 7C6B C23C F982 258B 85CF 3880 D0F6
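The idea behind the patch, as far as the diff shows, is that sig_handler() should only record the requested mode and leave the logging to the main server loop. A minimal, self-contained C sketch of that general pattern follows. It is not the NSD source: the names pending_mode, on_signal(), install_handlers() and handle_pending() and the MODE_* constants are hypothetical, and fprintf() stands in for log_msg().

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mode values, loosely modelled on NSD_RELOAD/NSD_SHUTDOWN. */
enum mode { MODE_RUN = 0, MODE_RELOAD, MODE_SHUTDOWN };

/* The only state the handler touches. sig_atomic_t is the type that
 * can be written safely from a signal handler. */
static volatile sig_atomic_t pending_mode = MODE_RUN;

/* Handler: record the request and return. No logging, no stdio,
 * no syslog; those are not async-signal-safe. */
static void on_signal(int sig)
{
    switch (sig) {
    case SIGHUP:
        pending_mode = MODE_RELOAD;
        break;
    case SIGTERM:
        pending_mode = MODE_SHUTDOWN;
        break;
    default:
        break;
    }
}

/* Install the handler for SIGHUP and SIGTERM. Returns 0 on success. */
int install_handlers(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGHUP, &sa, NULL) < 0)
        return -1;
    if (sigaction(SIGTERM, &sa, NULL) < 0)
        return -1;
    return 0;
}

/* Called from the main loop, outside signal context: pick up the
 * pending request, log it here, and return the mode that was pending. */
int handle_pending(void)
{
    int mode = pending_mode;

    pending_mode = MODE_RUN;
    if (mode == MODE_RELOAD)
        fprintf(stderr, "signal received, reloading...\n");
    else if (mode == MODE_SHUTDOWN)
        fprintf(stderr, "signal received, shutting down...\n");
    return mode;
}
```

A main loop would call install_handlers() once at startup and handle_pending() on each iteration. A real server would also block the signals around the read-and-reset in handle_pending(), since a signal arriving between the two statements would be lost in this simplified form.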