jad at nominet.org.uk
2006-May-31 14:47 UTC
no response from nsd 2.3.4 with more than 12 server processes
Hi,

I just thought I would try and load test nsd 2.3.4 on a Sun T2000 running Solaris 10 01/06, but I am having a few problems.

If I specify the number of servers to be more than 12, I get no response from nsd when I try to query it. With 12 or fewer servers it works fine. I am specifying the number of servers by editing nsdc and adding -N 12 to the flags line. The server I am using has 32 virtual processors, so in the end I want to use -N 32. For example, after starting with -N 13, dig gives the following:

    dig @localhost jadjadjad.example.com soa
    ; <<>> DiG 9.2.4 <<>> @localhost jadjadjad.example.com soa
    ;; global options: printcmd
    ;; connection timed out; no servers could be reached

With -N 12 I get a response.

Has anyone else tried running nsd with so many servers?

BTW - I also noticed that if I start nsd using nsdc start and then try to start it again, nsdc correctly reports that nsd is already running. However, for some reason the pid file gets removed. For example:

    bash-3.00# /opt/nsd/sbin/nsdc start
    bash-3.00# ls -l /opt/nsd/etc/nsd/nsd.pid
    -rw-r--r-- 1 nsd other 6 May 31 15:27 /opt/nsd/etc/nsd/nsd.pid
    bash-3.00# /opt/nsd/sbin/nsdc start
    [1149085664] nsd[26337]: warning: nsd is already running as 26320, continuing
    bash-3.00# ls -l /opt/nsd/etc/nsd/nsd.pid
    /opt/nsd/etc/nsd/nsd.pid: No such file or directory
    bash-3.00# /opt/nsd/sbin/nsdc stop
    nsd is not running
    bash-3.00# ps -ef | grep nsd
    nsd 26324 26320 0 15:27:35 ? 0:00 /opt/nsd/sbin/nsd -f /opt/nsd/etc/nsd/nsd.db -P /opt/nsd/etc/nsd/nsd.pid
    root 26343 26314 0 15:27:59 pts/2 0:00 grep nsd
    nsd 26326 26320 0 15:27:35 ? 0:00 /opt/nsd/sbin/nsd -f /opt/nsd/etc/nsd/nsd.db -P /opt/nsd/etc/nsd/nsd.pid
    ...

Thanks
John
dr. W.C.A. Wijngaards
2006-Jun-02 11:10 UTC
no response from nsd 2.3.4 with more than 12 server processes
Hi Jad,

I've tried to reproduce this, and on an AMD Linux 2.6.16 system I get replies using 9 servers, but not with 10 servers (with the freshly released nsd 2.3.5, by the way).

The code does nothing special with the number of servers it forks. Each server select()s on the port. With 10 servers, none of the servers come out of select(). With 9 or fewer, one comes out of select(), handles the UDP message, and goes back into select().

There is no immediate solution; I cannot unblock select(). So both Solaris and Linux show this behaviour.

So, I can reproduce your problem, and I will be looking into it. Entered as http://www.nlnetlabs.nl/bugs/show_bug.cgi?id=134

Thank you for the report.

As for your problem with killing them off: when you start it doubly, i.e. you start one while the old one is still running, NSD detects the old NSD but continues to attempt to start as well. It overwrites the pidfile with its own pid, but then fails because it cannot bind to the port (it is in use by the old NSD and you start on the same port). It then exits and unlinks the pidfile, which is how the pidfile gets removed. I cannot easily fix this, as there is only one pid file and two NSDs running.

Thank you for the report,
Wouter
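The per-server behaviour described in this message (wait in select(), handle one UDP message, go back into select()) can be sketched roughly as below. This is only an illustration and not code from nsd.c: the helper name serve_one_datagram, the 512-byte buffer, the timeout parameter, and the echo reply standing in for real DNS handling are all assumptions.

```c
#include <assert.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/*
 * Hypothetical helper, not taken from nsd.c.
 * Waits up to timeout_sec seconds in select() for one datagram on fd and
 * echoes it back to the sender (the echo stands in for answering a query).
 * Returns the number of bytes sent, 0 on timeout, or -1 on error.
 */
static ssize_t serve_one_datagram(int fd, int timeout_sec)
{
    fd_set readfds;
    struct timeval tv;
    struct sockaddr_storage peer;
    socklen_t peerlen = sizeof(peer);
    char buf[512];
    ssize_t n;
    int ready;

    /* FD_SET is only defined for descriptors below FD_SETSIZE. */
    assert(fd >= 0 && fd < FD_SETSIZE);

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;

    /* Block until the socket is readable or the timeout expires. */
    ready = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (ready < 0)
        return -1;
    if (ready == 0)
        return 0;

    memset(&peer, 0, sizeof(peer));
    n = recvfrom(fd, buf, sizeof(buf), 0, (struct sockaddr *)&peer, &peerlen);
    if (n < 0)
        return -1;

    /* A connected socket may report no sender address; in that case pass
     * NULL so the reply goes to the connected peer. */
    return sendto(fd, buf, (size_t)n, 0,
                  peerlen ? (struct sockaddr *)&peer : NULL, peerlen);
}
```

In a pre-forked server, each child would presumably call a helper like this in a loop on the socket it inherited from the parent, so whichever child leaves select() first handles the query and the others keep waiting.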
dr. W.C.A. Wijngaards
2006-Jun-02 14:10 UTC
no response from nsd 2.3.4 with more than 12 server processes
Hi,

The bug has been found, with Miek heroically running -N 300 on his machine. It is an array index out of bounds. The fix is to change i to 0 in line 608 of nsd.c, from:

    #ifdef INET6
    	if (hints[i].ai_family == AF_UNSPEC) {
    # ifdef IPV6_V6ONLY

to:

    #ifdef INET6
    	if (hints[0].ai_family == AF_UNSPEC) {
    # ifdef IPV6_V6ONLY

This fix is in subversion for 2.3.6 and for 3.0pre.

Best regards,
Wouter