Stephane Bortzmeyer
2014-Jul-30 11:40 UTC
[nsd-users] "update failed (acquired: 1406646354), restarting transfer (notified zone)"
An NSD 3.2.16 server, secondary for a large zone, suddenly stopped updating the zone:

xfrd: zone foobar.example: soa serial 2222549638 update failed (acquired: 1406646354), restarting transfer (notified zone)

Other (smaller) zones on the same machine are updated properly. Checking the source code (xfrd.c) does not give me many hints. "acquired" is the time at which the SOA was acquired, but it is not clear what exactly is wrong, or how to solve it. Has anyone seen such behavior?
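The "acquired" field in the log line is a raw integer. Assuming it is seconds since the Unix epoch (an assumption, not something the log line states), a quick conversion shows when that SOA was acquired. Comparing it against the posting time of this message only bounds how old the acquired SOA was by then; the log line's own timestamp is not shown.

```python
from datetime import datetime, timezone

# Value copied from the xfrd log line; assumed to be seconds since the Unix epoch.
ACQUIRED = 1406646354

acquired_at = datetime.fromtimestamp(ACQUIRED, tz=timezone.utc)
print(acquired_at.isoformat())  # 2014-07-29T15:05:54+00:00

# Posting time of this message (2014-Jul-30 11:40 UTC), used only as an upper bound.
posted = datetime(2014, 7, 30, 11, 40, tzinfo=timezone.utc)
print(posted - acquired_at)  # 20:34:06
```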
Michael Braunoeder
2014-Sep-26 09:37 UTC
Hi Stephane,

On 30.07.2014 at 13:40, Stephane Bortzmeyer wrote:
> A NSD 3.2.16, secondary for a large zone, suddenly stopped updating
> the zone:
>
> xfrd: zone foobar.example: soa serial 2222549638 update failed (acquired: 1406646354), restarting transfer (notified zone)
>
> Other (smaller) zones on the same machine are properly updated.
>
> Checking the source code (xfrd.c) does not give me many
> hints. "acquired" is the time where the SOA was acquired but it it not
> clear what is wrong exactly, and how to solve it. Did anyone see such
> behavior?

We see the same behavior (also with NSD 3.2.16). It ran stably until we started making a lot of updates to the zone. One IXFR out of four fails, and the automatic restart of the transfer doesn't work either. It takes an extra NOTIFY to trigger the transfer again (and then it works).

Did you discover anything? As you mentioned, the source code does not give many hints.

Best,
Michael
Michael Braunoeder
2014-Oct-01 09:27 UTC
On 26.09.2014 at 11:37, Michael Braunoeder wrote:
[...]
> We see the same behavior (also with nsd 3.2.16). It run stable until we
> started to make a lot of updates to the zone. 1 IXFR out of 4 fails and
> the automatic restart of the transfer won't also work. It needs an extra
> notify to trigger the transfer again (and then it works).

I did some debugging and found some strange behaviors. The bug is triggered by a race condition involving a big zone transfer, a slow connection (which results in a long-running transfer) and the nsd-patch job. I can trigger the error by running the nsd-patch job during an active transfer, but not every time: there is a small time window in which the nsd-patch job kills the reload of the zone.

Is there a recommendation for how often the nsd-patch job should run? What happens if the job runs during an active IXFR? I noticed that the ixfr.db gets merged into the nsd.db while the transfer is still running, and a new ixfr.db is then started. Is the nsd.db now in an inconsistent state (with an incomplete zone transfer applied)? Wouldn't it be better to trigger the patch job after a successful transfer rather than on a timer?

Best,
Michael
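To make the question about a mid-transfer merge concrete, here is a toy model in hypothetical Python. It is not NSD code and does not reflect NSD's file formats or how xfrd and nsd-patch actually coordinate; the `Secondary` class, `db` (standing in for nsd.db), `journal` (standing in for ixfr.db) and the `wait_for_transfer` flag are all invented for illustration. It shows that a timer-driven merge firing during a multi-packet IXFR can expose a state matching no SOA serial, while gating the merge on transfer completion (the suggestion above) does not.

```python
from dataclasses import dataclass, field


@dataclass
class Secondary:
    """Toy model (hypothetical): a database plus an append-only journal."""
    db: dict = field(default_factory=dict)       # stands in for nsd.db
    journal: list = field(default_factory=list)  # stands in for ixfr.db
    transfer_active: bool = False

    def start_ixfr(self):
        self.transfer_active = True

    def receive(self, name, value):
        # Each received IXFR part is appended to the journal immediately.
        self.journal.append((name, value))

    def finish_ixfr(self):
        self.transfer_active = False

    def patch(self, wait_for_transfer=False):
        """Merge the journal into db, like a timer-driven patch job would."""
        if wait_for_transfer and self.transfer_active:
            return False  # skip this run: the journal is still incomplete
        for name, value in self.journal:
            self.db[name] = value
        self.journal.clear()
        return True


def run(wait_for_transfer):
    s = Secondary(db={"a": 1, "b": 1})
    s.start_ixfr()               # one IXFR that changes both a and b
    s.receive("a", 2)
    s.patch(wait_for_transfer)   # the timer fires mid-transfer
    snapshot = dict(s.db)        # what the database holds at that moment
    s.receive("b", 2)
    s.finish_ixfr()
    s.patch(wait_for_transfer)
    return snapshot, s.db


print(run(False))  # ({'a': 2, 'b': 1}, {'a': 2, 'b': 2})  half-applied snapshot
print(run(True))   # ({'a': 1, 'b': 1}, {'a': 2, 'b': 2})  old or new, never mixed
```

In both runs the final database converges; the difference is the intermediate snapshot. If a reload reads the database during that window, the ungated version hands it a mix of old and new data.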