W.C.A. Wijngaards
2016-Apr-06 09:04 UTC
[nsd-users] NSD4 goes unresponsive with lots of TCP connection!
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256 Hi Kabindra, I have not heard of this before, how is TCP affecting NSD? NSD has a fixed number of tcp connections, configured in tcp-count: 100 from the nsd.conf file. That should be what is services. You should increase that count to increase responsiveness to TCP. UDP should be unaffected. The backlog is for tcp connections waiting to be accepted. 256 is reasonably portable, reasonably large. I don't see how that value is your problem. Is your kernel and networking subsystem failing? The OS can return EMFILE or ENFILE to accept(), nsd starts to stop accepting TCP connections to relieve buffer stress on the OS. But again, UDP should not have been impacted? Are you using so-reuseport: yes? I have had reports that it disrupts connectivity (depending on OS, particular version of the OS, and more recent versions of NSD do not use reuseport on TCP anymore). Best regards, Wouter On 05/04/16 18:28, Kabindra Shrestha wrote:> Hi, > > We are seeing some large number of TCP connections to our DNS > servers (in thousands) and NSD goes unresponsive after certain time > and doesn't recover, it stops responding to UDP as well. We tried > increasing the number of tcp-counts but it doesn't help. I noticed > the TCP backlog is hardcoded to 256 in NSD config, so even with > customised TCP backlogs on the system its still being throttled at > around 256. Is there anyway we can change this value without > recompiling the NSD. > > > [kabindra at 05 nsd-4.1.8]$ grep BACKLOG * config.h.in:#undef > TCP_BACKLOG configure:#define TCP_BACKLOG 256 > configure.ac:AC_DEFINE_UNQUOTED([TCP_BACKLOG], [256], [Define to > the backlog to be used with listen.]) > > > We are using NSD4.1.8. > > ( From one of the servers that went unresponsive, we have seen that > TCP number closing to 10k. ) > > #ss -s Total: 5591 (kernel 5640) TCP: 5067 (estab 4968, closed 4, > orphaned 0, synrecv 0, timewait 3/0), ports 28 > > Transport Total IP IPv6 * 5640 - - RAW > 0 0 0 UDP 122 63 59 TCP 5063 > 5017 46 INET 5185 5080 105 FRAG 0 0 > 0 > > > Thanks. > > Regards, Kabindra Shrestha > > > > _______________________________________________ nsd-users mailing > list nsd-users at NLnetLabs.nl > https://open.nlnetlabs.nl/mailman/listinfo/nsd-users >-----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAEBCAAGBQJXBNE1AAoJEJ9vHC1+BF+N7YYP/2CCwjlGFYvh3hQ5pWpMMRNT NOgwyENf2HGATR2uFvuASJixQ9SLGU2/YHRpaBDS5dP6zKdde6abH1FTPIGK5U94 6tcG2IgsDbm4eGFJ//ndEts1MRT+xvTOqgiR8cLzshu3ENJJ+dyYgDG0clhNyQPb gFr0yxvpjVbE56a2DA/NTTM5N7vP5Zdg5YyEq57me0fSGl2GHcUiRWy/zgE/iCjk IqP0Sygf2mz5Ig+PQm0QcD82GjAdPuWUb/+06b6P4m0s2QuZCelSLP2hqaKGGBRx tTlHa266Skf2xj23kuKm9Sf9AHEHggW4mCZaYL9VqGbD2aG2vuY2D92BZh4gcaca w1Le5DmwICeV/EyjIynC6WJoQ+AbU1T7lF6HS6XP1j5tNOWTzIphufBBe0N1S49p 5v2psAE8kJNGD4Bm8b13Fi7YQCfQJ2Z1GFLU6kWrbvuJCJA4TlGFDM1X9xtDLJYD NS0n4gGI063RUvyve/BPNPtHcNsLmAiYdYT5ZbN3Y18jO+QJnlbqOmBJCIlcZK4W QlRnc+83S7l8WfAy72EeqcuZ3q/J9UGNEhWDaEyUckZ/FFVjEM1tvZZLHxwkc7DM wOroMvaoDfSYbuV1+OoQxZNkLlvo6Hdrl3/EtVtKRABTtdjON2NJjMi4OTyUWFn4 puYMhYPxE56t9C/EsQOn =fhoJ -----END PGP SIGNATURE-----
Kabindra Shrestha
2016-Apr-08 06:08 UTC
[nsd-users] NSD4 goes unresponsive with lots of TCP connection!
Hi Wouter,> On Apr 6, 2016, at 2:49 PM, W.C.A. Wijngaards <wouter at nlnetlabs.nl> wrote: > > Signed PGP part > Hi Kabindra, > > I have not heard of this before, how is TCP affecting NSD?After couple thousand of TCP queries, NSD goes unresponsive for both TCP and UDP. [kabindra at 1 ~]$ dig @`hostname` -p 5350 ch txt hostname.bind ; <<>> DiG 9.8.1 <<>> @<replaced> -p 5350 ch txt hostname.bind ; (2 servers found) ;; global options: +cmd ;; connection timed out; no servers could be reached [kabindra at 1 ~]$ dig @`hostname` -p 5350 ch txt hostname.bind +tcp ; <<>> DiG 9.8.1 <<>> @ <replaced> -p 5350 ch txt hostname.bind +tcp ; (2 servers found) ;; global options: +cmd ;; connection timed out; no servers could be reached One thing we noticed, we have set the server-count to 4, so it should have 4 child process forked, right? when NSD goes unresponsive, we see couple of <defunct> process and more than 4 child processes. also, these NSD processes are using lots of CPU. I have left this box out of service for almost 2 days now after going unresponsive but you can see the cpu usage on the below image, it's not coming down.>> NSD has a > fixed number of tcp connections, configured in tcp-count: 100 from the > nsd.conf file. That should be what is services. You should increase > that count to increase responsiveness to TCP.Yes, that's what we changed earlier to increase responsiveness to TCP.> > UDP should be unaffected.That is not the case we are seeing.> > The backlog is for tcp connections waiting to be accepted. 256 is > reasonably portable, reasonably large. I don't see how that value is > your problem.It has been so far and should be true for most of the users but recently with the increase in TCP traffic, I doubt that's still the case. With the RRL implemented I believe it's going to increase some amount of TCP traffic than what it used to be, right? So say if I increase the number of tcp-counts to 1024 but my backlog is set to 256, will I still be able to get 1024 connections at a time or will I be limited to 256 connections concurrently?> Is your kernel and networking subsystem failing?I don't think so, if it was the problem I would see problem for other services on that server as well, right?> > The OS can return EMFILE or ENFILE to accept(), nsd starts to stop > accepting TCP connections to relieve buffer stress on the OS. But > again, UDP should not have been impacted?Again, that's not the case we are seeing.> > Are you using so-reuseport: yes?Nope.> I have had reports that it disrupts > connectivity (depending on OS, particular version of the OS, and more > recent versions of NSD do not use reuseport on TCP anymore).Sorry, forgot to mention earlier, we are on CentOS 6 and NSD 4.1.8. Thanks.> > Best regards, Wouter > > On 05/04/16 18:28, Kabindra Shrestha wrote: > > Hi, > > > > We are seeing some large number of TCP connections to our DNS > > servers (in thousands) and NSD goes unresponsive after certain time > > and doesn't recover, it stops responding to UDP as well. We tried > > increasing the number of tcp-counts but it doesn't help. I noticed > > the TCP backlog is hardcoded to 256 in NSD config, so even with > > customised TCP backlogs on the system its still being throttled at > > around 256. Is there anyway we can change this value without > > recompiling the NSD. > > > > > > [kabindra at 05 nsd-4.1.8]$ grep BACKLOG * config.h.in:#undef > > TCP_BACKLOG configure:#define TCP_BACKLOG 256 > > configure.ac:AC_DEFINE_UNQUOTED([TCP_BACKLOG], [256], [Define to > > the backlog to be used with listen.]) > > > > > > We are using NSD4.1.8. > > > > ( From one of the servers that went unresponsive, we have seen that > > TCP number closing to 10k. ) > > > > #ss -s Total: 5591 (kernel 5640) TCP: 5067 (estab 4968, closed 4, > > orphaned 0, synrecv 0, timewait 3/0), ports 28 > > > > Transport Total IP IPv6 * 5640 - - RAW > > 0 0 0 UDP 122 63 59 TCP 5063 > > 5017 46 INET 5185 5080 105 FRAG 0 0 > > 0 > > > > > > Thanks. > > > > Regards, Kabindra Shrestha > > > > > > > > _______________________________________________ nsd-users mailing > > list nsd-users at NLnetLabs.nl > > https://open.nlnetlabs.nl/mailman/listinfo/nsd-users > > > > _______________________________________________ > nsd-users mailing list > nsd-users at NLnetLabs.nl > https://open.nlnetlabs.nl/mailman/listinfo/nsd-usersRegards, Kabindra Shrestha -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.nlnetlabs.nl/pipermail/nsd-users/attachments/20160408/38cdc0dc/attachment.htm> -------------- next part -------------- A non-text attachment was scrubbed... Name: PastedGraphic-1.png Type: image/png Size: 114187 bytes Desc: not available URL: <http://lists.nlnetlabs.nl/pipermail/nsd-users/attachments/20160408/38cdc0dc/attachment.png> -------------- next part -------------- A non-text attachment was scrubbed... Name: PastedGraphic-1.png Type: image/png Size: 119257 bytes Desc: not available URL: <http://lists.nlnetlabs.nl/pipermail/nsd-users/attachments/20160408/38cdc0dc/attachment-0001.png> -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 495 bytes Desc: Message signed with OpenPGP using GPGMail URL: <http://lists.nlnetlabs.nl/pipermail/nsd-users/attachments/20160408/38cdc0dc/attachment.bin>