NSD currently only processes one UDP packet per socket per select(). Since
select() is kind of expensive, under load this means it burns a lot of CPU
unnecessarily.

There's a simple trick to avoid this. Make the UDP socket non-blocking, and
loop on recvfrom() until it returns -1, ignoring any EAGAIN errors. Under
light load, this results in an extra recvfrom() every packet. But under
heavy load, this avoids select() until the input buffer is drained.

Attached is an example patch against NSD 2.3.4 that implements this.

According to the queryperf tool that comes with BIND, on a simple query
against localhost on an old Linux box, this takes NSD's peak throughput from
39kpps to 48kpps, a 23% improvement. These are obviously ideal conditions,
but please feel free to test for yourself.

If the patch gets munged in transit, it is also available from:

  https://www.die.net/tmp/1c50b61e244661c1/nsd-2.3.4-fewerselects.patch

-- Aaron

---
diff -ur nsd-2.3.4/server.c nsd-2.3.4.faster/server.c
--- nsd-2.3.4/server.c	2006-04-06 07:26:35.000000000 -0700
+++ nsd-2.3.4.faster/server.c	2006-05-12 03:03:42.000000000 -0700
@@ -276,6 +276,11 @@
 		}
 #endif
 
+		if (fcntl(nsd->udp[i].s, F_SETFL, O_NONBLOCK) == -1) {
+			log_msg(LOG_ERR, "fcntl failed: %s", strerror(errno));
+			return -1;
+		}
+
 		/* Bind it... */
 		if (bind(nsd->udp[i].s, (struct sockaddr *) nsd->udp[i].addr->ai_addr, nsd->udp[i].addr->ai_addrlen) != 0) {
 			log_msg(LOG_ERR, "can't bind the socket: %s", strerror(errno));
@@ -707,28 +712,31 @@
 		return;
 	}
 
-	/* Account... */
-	if (data->socket->addr->ai_family == AF_INET) {
-		STATUP(data->nsd, qudp);
-	} else if (data->socket->addr->ai_family == AF_INET6) {
-		STATUP(data->nsd, qudp6);
-	}
+	while (1) {
+		/* Initialize the query... */
+		query_reset(q, UDP_MAX_MESSAGE_LEN, 0);
+
+		received = recvfrom(handler->fd,
+				    buffer_begin(q->packet),
+				    buffer_remaining(q->packet),
+				    0,
+				    (struct sockaddr *)&q->addr,
+				    &q->addrlen);
+		if (received == -1) {
+			if (errno != EAGAIN && errno != EINTR) {
+				log_msg(LOG_ERR, "recvfrom failed: %s", strerror(errno));
+				STATUP(data->nsd, rxerr);
+			}
+			return;
+		}
 
-	/* Initialize the query... */
-	query_reset(q, UDP_MAX_MESSAGE_LEN, 0);
-
-	received = recvfrom(handler->fd,
-			    buffer_begin(q->packet),
-			    buffer_remaining(q->packet),
-			    0,
-			    (struct sockaddr *)&q->addr,
-			    &q->addrlen);
-	if (received == -1) {
-		if (errno != EAGAIN && errno != EINTR) {
-			log_msg(LOG_ERR, "recvfrom failed: %s", strerror(errno));
-			STATUP(data->nsd, rxerr);
+		/* Account... */
+		if (data->socket->addr->ai_family == AF_INET) {
+			STATUP(data->nsd, qudp);
+		} else if (data->socket->addr->ai_family == AF_INET6) {
+			STATUP(data->nsd, qudp6);
 		}
-	} else {
+
 		buffer_skip(q->packet, received);
 		buffer_flip(q->packet);
[On 12 May, @12:41, Aaron Hopkins wrote in "Reducing select() usage under ..."]
> against localhost on an old Linux box, this takes NSD's peak throughput from
> 39kpps to 48kpps, a 23% improvement. These are obviously ideal conditions,
> but please feel free to test for yourself.

whoa! :-) That's very nice. We'll look at this asap, but next week we're all
at SANE 2006. So at the earliest this will be in the week after SANE.

-- grtz,
  - Miek

  http://www.miek.nl              http://www.nlnetlabs.nl
  PGP: 6A3C F450 6D4E 7C6B C23C  F982 258B 85CF 3880 D0F6
Aaron Hopkins wrote:
> NSD currently only processes one UDP packet per socket per select(). Since
> select() is kind of expensive, under load this means it burns a lot of CPU
> unnecessarily.
>
> There's a simple trick to avoid this. Make the UDP socket non-blocking, and
> loop on recvfrom() until it returns -1, ignoring any EAGAIN errors. Under
> light load, this results in an extra recvfrom() every packet. But under
> heavy load, this avoids select() until the input buffer is drained.

But you will have to be careful not to starve other sockets that may have
incoming requests waiting. Since NSD will usually run with multiple sockets
(UDP, TCP, IPv4, IPv6, multiple interfaces) this can become quite hard and/or
expensive. That's why NSD currently uses a select and processes all readable
sockets (not just the first!) every iteration.

Regards,
Erik
On Fri, 12 May 2006, Erik Rozendaal wrote:
> But you will have to be careful not to starve other sockets that may have
> incoming requests waiting. Since NSD will usually run with multiple sockets
> (UDP, TCP, IPv4, IPv6, multiple interfaces) this can become quite hard and/or
> expensive. That's why NSD currently uses a select and processes all readable
> sockets (not just the first!) every iteration.

It is hard and expensive if you want to ensure perfect fairness and
interleave responses from every socket. But I think there is a compromise
available between perfect fairness and only answering requests from one
socket when it is flooded.

Changing that while(1) I added to something that would only loop up to a
fixed number of times (e.g. 100) would be trivial. You'd still amortize the
cost of the select() over many UDP packets, without being able to starve
other sockets for more than a few milliseconds. You'd concentrate on work
from one socket, then switch to the next one and do everything pending up to
the same limit. And the performance gains would be approximately the same.

As for TCP fairness in this scheme, you'd probably also want to loop
accepting TCP connections up until current_tcp_count >= maximum_tcp_count.
But since each TCP connection gets its own socket, each one will get some
attention every select(), and select()s will still be happening hundreds of
times per second.

-- Aaron
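
The bounded loop described in the message above could look roughly like the following. This is a minimal standalone sketch, not code from NSD or from the posted patch: the names set_nonblocking, drain_udp, packet_fn, count_packet and MAX_PER_WAKEUP are hypothetical, the 512-byte buffer is an arbitrary choice, and the callback merely stands in for real query processing. Unlike the patch, which returns on EINTR, this sketch retries.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical per-wakeup cap; the message above suggests "e.g. 100". */
enum { MAX_PER_WAKEUP = 100 };

/* Hypothetical callback type standing in for the server's query handling. */
typedef void (*packet_fn)(const unsigned char *pkt, size_t len, void *ctx);

/* Switch fd to non-blocking mode while keeping its other status flags. */
int set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL, 0);
	if (flags == -1)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Read at most `budget` datagrams from a non-blocking socket.
 * Stops early once the receive queue is empty (EAGAIN/EWOULDBLOCK).
 * Returns the number of datagrams handled, or -1 on any other
 * recvfrom() error (datagrams handled before the error are not reported).
 */
int drain_udp(int fd, int budget, packet_fn handle, void *ctx)
{
	unsigned char buf[512];	/* arbitrary size for this sketch */
	int handled = 0;

	while (handled < budget) {
		struct sockaddr_storage from;
		socklen_t fromlen = sizeof(from);
		ssize_t n = recvfrom(fd, buf, sizeof(buf), 0,
				     (struct sockaddr *)&from, &fromlen);
		if (n == -1) {
			if (errno == EINTR)
				continue;	/* interrupted, try again */
			if (errno == EAGAIN || errno == EWOULDBLOCK)
				break;		/* nothing left to read */
			return -1;
		}
		handle(buf, (size_t)n, ctx);
		handled++;
	}
	return handled;
}

/* Example handler: counts packets through an int passed as ctx. */
void count_packet(const unsigned char *pkt, size_t len, void *ctx)
{
	(void)pkt;
	(void)len;
	++*(int *)ctx;
}
```

In a select() loop, each readable UDP socket would get one drain_udp(fd, MAX_PER_WAKEUP, ...) call per iteration. A flooded socket then yields to the others after its budget is spent, and whatever remains queued is picked up on the next select().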
[On 12 May, @12:41, Aaron Hopkins wrote in "Reducing select() usage under ..."]
> NSD currently only processes one UDP packet per socket per select(). Since
> select() is kind of expensive, under load this means it burns a lot of CPU
> unnecessarily.

Hello,

Thanks for your patches. The speed improvements you see are impressive. We
are, however, rather reluctant to apply the select() patch to NSD at this
moment:

We've tested the speed improvements in our DISTEL testlab and we did see
some gain, but not the amount of improvement you noticed. However, this
could be because we are filling the 100 Mb/s interfaces. We will therefore
upgrade the testlab's hardware to a 1 Gb/s network. Once we have finalized
the measurements we'll post the results to this list.

If the differences are not very significant, we are hesitant to code around
the use of select(), for reasons of code simplicity and portability.

-- grtz,
  - Miek
On Fri, 12 May 2006, Aaron Hopkins wrote:
> According to the queryperf tool that comes with BIND, on a simple query
> against localhost on an old Linux box, this takes NSD's peak throughput from
> 39kpps to 48kpps, a 23% improvement. These are obviously ideal conditions,
> but please feel free to test for yourself.

I had a chance to test this patch in a different environment. I had three
Linux 2.6 boxes with one 3.4 GHz hyperthreaded P4 each on a gigabit network.
Two machines acted as clients running BIND's queryperf tool, one was running
NSD 2.3.4.

With stock NSD 2.3.4 with no -N specified, I got 43000 qps total. With -N 2
specified, I got 50000 qps total.

Adding my select()-reduction patch to stock NSD 2.3.4 with no -N specified,
I got 49000 qps total. With -N 2 specified, I got 55000 qps total.

In this environment, it seems that reducing select()s offers a 10-14%
performance improvement.

-- Aaron
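
For readers checking the percentages quoted in this thread, the relative gain is (after - before) / before. The helper below is only an illustration; percent_gain is a hypothetical name and is not part of NSD or queryperf.

```c
/* Relative throughput gain of `after` over `before`, in percent. */
double percent_gain(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

With the figures from the messages above, 43000 to 49000 qps is a gain of just under 14%, 50000 to 55000 qps is exactly 10%, and the original localhost result of 39 to 48 kpps is about 23%.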
On Thu, Jun 22, 2006 at 07:13:35PM -0700,
 Aaron Hopkins <lists at die.net> wrote
 a message of 26 lines which said:

> With stock NSD 2.3.4 with no -N specified, I got 43000 qps total. With -N 2
> specified, I got 50000 qps total.

And 15 days for this message to get out of NLnetlabs. That's impressive :-)