jad at nominet.org.uk
2006-May-31 14:47 UTC
no response from nsd 2.3.4 with more than 12 server processes
Hi,
I just thought I would try and load test nsd 2.3.4 on a Sun T2000 running
Solaris 10 01/06 but I am having a few problems.
If I specify the number of servers to be more than 12 I get no response
from nsd when I try to query it. With 12 or fewer servers it works fine. I
am specifying the number of servers by editing nsdc and adding -N 12 to
the flags line. The server I am using has 32 virtual processors so in the
end I want to use -N 32. For example, after starting with -N 13, dig gives
the following:
dig @localhost jadjadjad.example.com soa
; <<>> DiG 9.2.4 <<>> @localhost jadjadjad.example.com soa
;; global options: printcmd
;; connection timed out; no servers could be reached
with -N 12 I get a response.
Has anyone else tried running nsd with so many servers?
BTW - I also noticed that if I start nsd using nsdc start and then try and
start it again nsdc correctly reports that nsd is already running. However
for some reason the pid file gets removed. For example
bash-3.00# /opt/nsd/sbin/nsdc start
bash-3.00# ls -l /opt/nsd/etc/nsd/nsd.pid
-rw-r--r-- 1 nsd other 6 May 31 15:27 /opt/nsd/etc/nsd/nsd.pid
bash-3.00# /opt/nsd/sbin/nsdc start
[1149085664] nsd[26337]: warning: nsd is already running as 26320, continuing
bash-3.00# ls -l /opt/nsd/etc/nsd/nsd.pid
/opt/nsd/etc/nsd/nsd.pid: No such file or directory
bash-3.00# /opt/nsd/sbin/nsdc stop
nsd is not running
bash-3.00# ps -ef | grep nsd
nsd 26324 26320 0 15:27:35 ? 0:00 /opt/nsd/sbin/nsd -f /opt/nsd/etc/nsd/nsd.db -P /opt/nsd/etc/nsd/nsd.pid
root 26343 26314 0 15:27:59 pts/2 0:00 grep nsd
nsd 26326 26320 0 15:27:35 ? 0:00 /opt/nsd/sbin/nsd -f /opt/nsd/etc/nsd/nsd.db -P /opt/nsd/etc/nsd/nsd.pid
...
Thanks
John
dr. W.C.A. Wijngaards
2006-Jun-02 11:10 UTC
no response from nsd 2.3.4 with more than 12 server processes
Hi Jad,
I've tried to reproduce this, and on an AMD Linux 2.6.16 system I get
replies using 9 servers, but not with 10 servers (with the freshly
released nsd 2.3.5, by the way).
The code does nothing special with the number of servers it forks. Each
server select()s on the port. With 10 servers, none of the servers come
out of select(). With 9 or fewer, one comes out of select(), handles the
UDP message, and goes back into select(). There is no immediate solution;
I cannot unblock select(). So both Solaris and Linux have this feature.
So, I can reproduce your problem, and I will be looking into it. Entered
as http://www.nlnetlabs.nl/bugs/show_bug.cgi?id=134
Thank you for the report.
As for your problem with killing them off: when you start it twice, i.e.
you start one while the old one is still running, NSD detects the old NSD
but continues to attempt to start as well. It overwrites the pidfile with
its own pid. It then fails because it cannot bind to the port (it is in
use by the old NSD, and you start on the same port), so it exits and
unlinks the pidfile. This removes the pidfile. I cannot easily fix this,
as there is only one pid file and two NSDs running.
Thank you for the report,
Wouter
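The pid file sequence described in this reply (overwrite, fail to bind, unlink) can be illustrated with a small sketch. This is not NSD's code: the function names `read_pidfile`, `write_pidfile`, `start_instance` and `remove_pidfile_if_owner`, and the `bind_ok` flag standing in for the result of bind(), are all hypothetical. It only shows one possible ordering, assuming a daemon is free to write its pid file after the port is bound, in which a failed second start leaves the first instance's pid file alone. Whether NSD can be reordered this way is not something this thread establishes.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Read the pid recorded in a pid file; returns -1 if missing or unreadable. */
long read_pidfile(const char *path)
{
    FILE *f = fopen(path, "r");
    long pid = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &pid) != 1)
        pid = -1;
    fclose(f);
    return pid;
}

/* Write a pid to the pid file, replacing its contents; 0 on success. */
int write_pidfile(const char *path, long pid)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (f == NULL)
        return -1;
    ok = fprintf(f, "%ld\n", pid) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/*
 * Hypothetical startup step: bind_ok stands in for the result of bind().
 * The pid file is only written once the port is ours, so a second
 * instance that cannot bind never touches the first instance's file.
 */
int start_instance(const char *path, long self, int bind_ok)
{
    assert(path != NULL);
    if (!bind_ok)
        return -1; /* leave the running instance's pid file untouched */
    return write_pidfile(path, self);
}

/*
 * Hypothetical shutdown step: unlink the pid file only if it records
 * our own pid. Returns 1 if removed, 0 if left alone, -1 on error.
 */
int remove_pidfile_if_owner(const char *path, long self)
{
    if (read_pidfile(path) != self)
        return 0;
    return unlink(path) == 0 ? 1 : -1;
}
```

With the pids from the report, a failed second start as 26337 would leave the file holding 26320, so a later stop could still find the running server.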
dr. W.C.A. Wijngaards
2006-Jun-02 14:10 UTC
no response from nsd 2.3.4 with more than 12 server processes
Hi,
Bug has been found with Miek heroically running -N 300 on his machine.
Array out of bounds. Fix is to change i to 0 in line 608:
#ifdef INET6
if (hints[i].ai_family == AF_UNSPEC) {
# ifdef IPV6_V6ONLY
----
#ifdef INET6
if (hints[0].ai_family == AF_UNSPEC) {
# ifdef IPV6_V6ONLY
in nsd.c. This fix is in subversion for 2.3.6 and for 3.0pre.
Best regards,
Wouter
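For readers following the fix quoted in this message, here is a minimal sketch of the general kind of bug, an index that has run past the end of a fixed-size array. The thread does not show how `i` got its value in nsd.c, so everything here is an assumption for illustration: `MAX_HINTS`, `struct hint`, `index_after_loop`, `index_in_bounds` and `first_hint_family` are hypothetical names, and the idea that `i` is a leftover loop counter that grows with -N is a guess, not something the source confirms.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HINTS 8 /* hypothetical capacity of the hints array */

struct hint {
    int family; /* stands in for ai_family */
};

/*
 * Simulates a loop that runs once per server and leaves its counter
 * behind; after the loop the counter equals the server count.
 */
size_t index_after_loop(size_t server_count)
{
    size_t i;

    for (i = 0; i < server_count; ++i) {
        /* per-server setup would go here */
    }
    return i;
}

/* Returns 1 if index i is a valid position in an array of n elements. */
int index_in_bounds(size_t i, size_t n)
{
    return i < n;
}

/*
 * The shape of the fix: read element 0, which always exists for a
 * non-empty array, instead of an element selected by a stale index.
 */
int first_hint_family(const struct hint *hints, size_t n)
{
    assert(hints != NULL && n > 0);
    return hints[0].family;
}
```

Under that assumption, a small server count leaves the stale index inside the array and reads some valid element, while a larger count reads past the end, which is undefined behaviour and would fit the different thresholds seen on Solaris and Linux.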