Hello! We dynamically add secondary zones to NSD, and then test if NSD has fetched the zone (asking for the SOA). If the zone was not fetched yet, we try to push NSD to transfer zone faster. But sometimes that does not seem to work. For example in this case it took 15s to fetch the zone from the primary (running on localhost), although NSD was idle at this moment. 00:19:13 nsd[16628]: control cmd: addzone example.com slave_zone_from_local_pdns 00:19:13 nsd[16632]: zonefile slave_zone_from_local_pdns/example.com does not exist 00:19:14 nsd[16628]: new control connection from 127.0.0.1 00:19:14 nsd[16628]: control cmd: force_transfer example.com 00:19:15 nsd[16628]: new control connection from 127.0.0.1 00:19:15 nsd[16628]: control cmd: force_transfer example.com 00:19:16 nsd[16628]: new control connection from 127.0.0.1 00:19:16 nsd[16628]: control cmd: force_transfer example.com 00:19:18 nsd[16628]: new control connection from 127.0.0.1 00:19:18 nsd[16628]: control cmd: force_transfer example.com 00:19:19 nsd[16628]: new control connection from 127.0.0.1 00:19:19 nsd[16628]: control cmd: force_transfer example.com 00:19:20 nsd[16628]: new control connection from 127.0.0.1 00:19:20 nsd[16628]: control cmd: force_transfer example.com 00:19:21 nsd[16628]: new control connection from 127.0.0.1 00:19:21 nsd[16628]: control cmd: force_transfer example.com 00:19:22 nsd[16628]: new control connection from 127.0.0.1 00:19:22 nsd[16628]: control cmd: force_transfer example.com 00:19:23 nsd[16628]: new control connection from 127.0.0.1 00:19:23 nsd[16628]: control cmd: force_transfer example.com 00:19:24 nsd[16628]: new control connection from 127.0.0.1 00:19:24 nsd[16628]: control cmd: force_transfer example.com 00:19:28 nsd[16628]: xfrd: zone example.com committed "received update to serial 2018091101 at 2023-04-03T00:19:28 from 127.0.0.1 at 14018" 00:19:28 nsd[16632]: zone example.com. received update to serial 2018091101 at 2023-04-03T00:19:28 from 127.0.0.1 at 14018 of 417 bytes in 0.00071 seconds 00:19:28 nsd[16628]: zone example.com serial 0 is updated to 2018091101 I can think of 2 possibilites: a) the primary name server is overloaded and does not accept AXFR requests immediately b) NSD has some timers that stop him from acting immediately Are there any such timers in NSD? Can you please give me some hints how to debug this issue? Thanks Klaus -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.nlnetlabs.nl/pipermail/nsd-users/attachments/20230403/52fa9121/attachment.htm>
Hi Klaus, The log message says the transfer happened at 00:19:28. That is about 15 seconds after the 00:19:13 from the first command. That means it the upstream connection only happened at that time. Something must have delayed it. High verbosity can log more details if this is desired. What could have happened is that the upstream server refused, or failed, to answer, and the 15 second delay is the retry timeout. The value of 15 seconds for that is fairly normal for that kind of thing, from SOA or nsd configuration on retry timeout minimum and maximum. Without a SOA record, since there is no zone loaded it logs, NSD also uses a value of around 10 seconds for retries. So it could very likely be this kind of thing. Perhaps the initial fetch is too early, the upstream server has not actually loaded the zone in the way that AXFRs can get an answer yet, and connections fail. A wait for that could maybe help with that, so that an AXFR request is answered. Otherwise, I also do not see how it would be delayed, possibly, NSD is so overloaded and suffers from a lack of resource, that upstream connections for AXFR are delayed to get resources. This can be improved by allowing more file descriptors to xfrd, configurable in nsd.conf with xfrd-tcp-max. Best regards, Wouter On 4/3/23 09:54, Klaus Darilion via nsd-users wrote:> Hello! > > We dynamically add secondary zones to NSD, and then test if NSD has > fetched the zone (asking for the SOA). If the zone was not fetched yet, > we try to push NSD to transfer zone faster. But sometimes that does not > seem to work. For example in this case it took 15s to fetch the zone > from the primary (running on localhost), although NSD was idle at this > moment. > > 00:19:13 nsd[16628]: control cmd:? addzone example.com > slave_zone_from_local_pdns > > 00:19:13 nsd[16632]: zonefile slave_zone_from_local_pdns/example.com > does not exist > > 00:19:14 nsd[16628]: new control connection from 127.0.0.1 > > 00:19:14 nsd[16628]: control cmd:? force_transfer example.com > > 00:19:15 nsd[16628]: new control connection from 127.0.0.1 > > 00:19:15 nsd[16628]: control cmd:? force_transfer example.com > > 00:19:16 nsd[16628]: new control connection from 127.0.0.1 > > 00:19:16 nsd[16628]: control cmd:? force_transfer example.com > > 00:19:18 nsd[16628]: new control connection from 127.0.0.1 > > 00:19:18 nsd[16628]: control cmd:? force_transfer example.com > > 00:19:19 nsd[16628]: new control connection from 127.0.0.1 > > 00:19:19 nsd[16628]: control cmd:? force_transfer example.com > > 00:19:20 nsd[16628]: new control connection from 127.0.0.1 > > 00:19:20 nsd[16628]: control cmd:? force_transfer example.com > > 00:19:21 nsd[16628]: new control connection from 127.0.0.1 > > 00:19:21 nsd[16628]: control cmd:? force_transfer example.com > > 00:19:22 nsd[16628]: new control connection from 127.0.0.1 > > 00:19:22 nsd[16628]: control cmd:? force_transfer example.com > > 00:19:23 nsd[16628]: new control connection from 127.0.0.1 > > 00:19:23 nsd[16628]: control cmd:? force_transfer example.com > > 00:19:24 nsd[16628]: new control connection from 127.0.0.1 > > 00:19:24 nsd[16628]: control cmd:? force_transfer example.com > > 00:19:28 nsd[16628]: xfrd: zone example.com committed "received update > to serial 2018091101 at 2023-04-03T00:19:28 from 127.0.0.1 at 14018" > > 00:19:28 nsd[16632]: zone example.com. received update to serial > 2018091101 at 2023-04-03T00:19:28 from 127.0.0.1 at 14018 of 417 bytes in > 0.00071 seconds > > 00:19:28 nsd[16628]: zone example.com serial 0 is updated to 2018091101 > > I can think of 2 possibilites: > a) the primary name server is overloaded and does not accept AXFR > requests immediately > > b) NSD has some timers that stop him from acting immediately > > Are there any such timers in NSD? Can you please give me some hints how > to debug this issue? > > Thanks > > Klaus > > > _______________________________________________ > nsd-users mailing list > nsd-users at lists.nlnetlabs.nl > https://lists.nlnetlabs.nl/mailman/listinfo/nsd-users
Hi Wouter! Thanks for the detailed answer. This NSD instance was really idle. So, I suspect the upstream server was busy (the zone was for sure already available). Looks like I have to start tcpdump and wait until it happens again to have a proof. Thanks Klaus -- Klaus Darilion, Head of Operations nic.at GmbH, Jakob-Haringer-Stra?e 8/V 5020 Salzburg, Austria> -----Urspr?ngliche Nachricht----- > Von: Wouter Wijngaards <wouter at nlnetlabs.nl> > Gesendet: Donnerstag, 6. April 2023 11:25 > An: Klaus Darilion <klaus.darilion at nic.at>; nsd-users at lists.nlnetlabs.nl > Betreff: Re: [nsd-users] NSD slow fetching new zones > > Hi Klaus, > > The log message says the transfer happened at 00:19:28. That is about 15 > seconds after the 00:19:13 from the first command. That means it the > upstream connection only happened at that time. Something must have > delayed it. High verbosity can log more details if this is desired. What > could have happened is that the upstream server refused, or failed, to > answer, and the 15 second delay is the retry timeout. The value of 15 > seconds for that is fairly normal for that kind of thing, from SOA or > nsd configuration on retry timeout minimum and maximum. Without a SOA > record, since there is no zone loaded it logs, NSD also uses a value of > around 10 seconds for retries. So it could very likely be this kind of > thing. Perhaps the initial fetch is too early, the upstream server has > not actually loaded the zone in the way that AXFRs can get an answer > yet, and connections fail. A wait for that could maybe help with that, > so that an AXFR request is answered. Otherwise, I also do not see how it > would be delayed, possibly, NSD is so overloaded and suffers from a lack > of resource, that upstream connections for AXFR are delayed to get > resources. This can be improved by allowing more file descriptors to > xfrd, configurable in nsd.conf with xfrd-tcp-max. > > Best regards, Wouter > > On 4/3/23 09:54, Klaus Darilion via nsd-users wrote: > > Hello! > > > > We dynamically add secondary zones to NSD, and then test if NSD has > > fetched the zone (asking for the SOA). If the zone was not fetched > yet, > > we try to push NSD to transfer zone faster. But sometimes that does > not > > seem to work. For example in this case it took 15s to fetch the zone > > from the primary (running on localhost), although NSD was idle at this > > moment. > > > > 00:19:13 nsd[16628]: control cmd:? addzone example.com > > slave_zone_from_local_pdns > > > > 00:19:13 nsd[16632]: zonefile slave_zone_from_local_pdns/example.com > > does not exist > > > > 00:19:14 nsd[16628]: new control connection from 127.0.0.1 > > > > 00:19:14 nsd[16628]: control cmd:? force_transfer example.com > > > > 00:19:15 nsd[16628]: new control connection from 127.0.0.1 > > > > 00:19:15 nsd[16628]: control cmd:? force_transfer example.com > > > > 00:19:16 nsd[16628]: new control connection from 127.0.0.1 > > > > 00:19:16 nsd[16628]: control cmd:? force_transfer example.com > > > > 00:19:18 nsd[16628]: new control connection from 127.0.0.1 > > > > 00:19:18 nsd[16628]: control cmd:? force_transfer example.com > > > > 00:19:19 nsd[16628]: new control connection from 127.0.0.1 > > > > 00:19:19 nsd[16628]: control cmd:? force_transfer example.com > > > > 00:19:20 nsd[16628]: new control connection from 127.0.0.1 > > > > 00:19:20 nsd[16628]: control cmd:? force_transfer example.com > > > > 00:19:21 nsd[16628]: new control connection from 127.0.0.1 > > > > 00:19:21 nsd[16628]: control cmd:? force_transfer example.com > > > > 00:19:22 nsd[16628]: new control connection from 127.0.0.1 > > > > 00:19:22 nsd[16628]: control cmd:? force_transfer example.com > > > > 00:19:23 nsd[16628]: new control connection from 127.0.0.1 > > > > 00:19:23 nsd[16628]: control cmd:? force_transfer example.com > > > > 00:19:24 nsd[16628]: new control connection from 127.0.0.1 > > > > 00:19:24 nsd[16628]: control cmd:? force_transfer example.com > > > > 00:19:28 nsd[16628]: xfrd: zone example.com committed "received update > > to serial 2018091101 at 2023-04-03T00:19:28 from 127.0.0.1 at 14018" > > > > 00:19:28 nsd[16632]: zone example.com. received update to serial > > 2018091101 at 2023-04-03T00:19:28 from 127.0.0.1 at 14018 of 417 bytes in > > 0.00071 seconds > > > > 00:19:28 nsd[16628]: zone example.com serial 0 is updated to > 2018091101 > > > > I can think of 2 possibilites: > > a) the primary name server is overloaded and does not accept AXFR > > requests immediately > > > > b) NSD has some timers that stop him from acting immediately > > > > Are there any such timers in NSD? Can you please give me some hints > how > > to debug this issue? > > > > Thanks > > > > Klaus > > > > > > _______________________________________________ > > nsd-users mailing list > > nsd-users at lists.nlnetlabs.nl > > https://lists.nlnetlabs.nl/mailman/listinfo/nsd-users