Hello again,

I think that the xfrd daemon suffers from a scalability problem with respect to the number of zones. For every zone, xfrd adds a netio_handler to the linked list of handlers. Then, every netio_dispatch call sequentially scans the entire list for "valid" file descriptors and timeouts. With a large number of zones, this scan is pretty expensive and superfluous, because almost all zone file descriptors/timeouts are usually not assigned. The problem is most obvious during "nsdc reload". Because the server_reload function sends the SOA info of all zones to xfrd, xfrd performs a full scan of the linked list for every zone, so the resulting complexity of reload is O(n^2). Just try "nsdc reload" with 65000 zones and you'll see that the xfrd daemon consumes 100% CPU for several _minutes_! However, I guess the scalability problem is not limited to reload, because _every_ socket communication with xfrd goes through the same netio_dispatch.

Here is the "perf record" result of the xfrd process during reload:

# Overhead  Command  Shared Object    Symbol
# ........  .......  ...............  ......
#
    98.69%  nsd      /usr/sbin/nsd    [.] netio_dispatch
     0.06%  nsd      [kernel]         [k] unix_stream_recvmsg
     0.05%  nsd      /usr/sbin/nsd    [.] rbtree_find_less_equal
     0.04%  nsd      [kernel]         [k] kfree
     0.04%  nsd      [kernel]         [k] copy_to_user

Then, "perf annotate netio_dispatch" shows that the heart of the problem is indeed in the loop scanning the linked list (because of gcc optimizations, the line numbers are only approximate):

    48.24%  /work/nsd-3.2.4/netio.c:158
    45.41%  /work/nsd-3.2.4/netio.c:158
     2.14%  /work/nsd-3.2.4/netio.c:172
     2.14%  /work/nsd-3.2.4/netio.c:156
     1.81%  /work/nsd-3.2.4/netio.c:172

I wonder why the linked list in xfrd contains the netio_handlers of _all_ zones. Wouldn't it be better to dynamically add/remove zone handlers only when their file descriptors/timeouts are assigned/cleared? And perhaps replace the linked list with a more scalable data structure? (Or is NSD intentionally designed to serve only a small number of zones? ;-))

Best regards
Martin Svec
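To make the hot loop concrete, here is a minimal sketch of the dispatch pattern described above. It is illustrative only and is not taken from the NSD sources; the type and field names (struct handler, dispatch_once, and so on) are hypothetical:

/* Illustrative sketch only; not taken from the NSD sources. */
#include <stddef.h>       /* NULL */
#include <time.h>         /* struct timespec */
#include <sys/select.h>   /* select, fd_set */

struct handler {
    int              fd;        /* -1 when no socket is assigned */
    struct timespec *timeout;   /* NULL when no timeout is pending */
    void           (*callback)(struct handler *, int event_types);
    struct handler  *next;
};

struct netio {
    struct handler *handlers;   /* one entry per zone: list length == zone count */
};

/* One dispatch pass: walks every handler, even though for most zones
 * neither an fd nor a timeout is set. Cost is O(number of zones). */
static void dispatch_once(struct netio *netio)
{
    fd_set readfds;
    int maxfd = -1;
    struct handler *h;

    FD_ZERO(&readfds);
    for (h = netio->handlers; h != NULL; h = h->next) {   /* the hot loop */
        if (h->fd != -1) {
            FD_SET(h->fd, &readfds);
            if (h->fd > maxfd)
                maxfd = h->fd;
        }
        /* pending timeouts would be collected in the same pass */
    }

    if (select(maxfd + 1, &readfds, NULL, NULL, NULL) > 0) {
        for (h = netio->handlers; h != NULL; h = h->next)  /* second full scan */
            if (h->fd != -1 && FD_ISSET(h->fd, &readfds))
                h->callback(h, 1 /* read event */);
    }
}

Since server_reload sends one SOA message per zone, each of the n incoming messages triggers a dispatch pass that walks all n handlers, which is where the O(n^2) reload behaviour comes from.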
Hi Martin,

Thanks for the perf measurements. I did not know that. I wrote that code some time ago, and decided against optimizing xfrd like this, because the netio handler is also used by the server processes. Those processes listen on only a limited number of sockets, and thus this is more efficient for them.

If this is the only bottleneck for a larger number of zones, then it may be relatively easy to fix.

Best regards,
Wouter

On 02/28/2010 08:30 PM, Martin Švec wrote:
> I think that xfrd daemon suffers a scalability problem with respect to
> the number of zones. [...]
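A rough sketch of the direction Martin suggests and Wouter considers relatively easy to fix, assuming hypothetical names (netio_activate, netio_deactivate); this is not an actual NSD patch. The idea is that a zone's handler is linked into the dispatch list only while it has a file descriptor or timeout assigned, so a dispatch pass scales with the number of active zones rather than all configured zones:

/* Illustrative sketch only; not an actual NSD patch. */
#include <stddef.h>   /* NULL */

/* Mirrors the handler in the previous sketch; only the fields needed
 * for list maintenance are shown. */
struct handler {
    int             fd;       /* -1 when no socket is assigned */
    struct handler *next;
};

struct netio {
    struct handler *active;   /* only handlers that currently need dispatching */
};

/* Called when a zone opens a transfer socket or arms a timeout. */
static void netio_activate(struct netio *netio, struct handler *h)
{
    h->next = netio->active;  /* O(1) insertion at the head */
    netio->active = h;
}

/* Called when the zone's socket is closed and its timeout cleared. */
static void netio_deactivate(struct netio *netio, struct handler *h)
{
    struct handler **pp;

    for (pp = &netio->active; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == h) {       /* unlink: O(active zones), not O(all zones) */
            *pp = h->next;
            h->next = NULL;
            break;
        }
    }
}

With a doubly linked list, or by remembering the handler's position in the list, deactivation could also be made O(1) instead of scanning the active list.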
I have also been trying to run some tests using 60k+ zones. I grabbed a very recent snapshot of these zones from BIND, so there shouldn't be too many zones that need updating. But it's been 30 minutes or more and all zones seem to be returning SERVFAIL. I see some zone transfer traffic in the logs. CPU on the nsd process shows 99.9%, with 3.3% memory usage. CentOS, 8 GB RAM, quad-core 5500. I also applied the memory patch posted earlier this month on 3.2.4.

In BIND we use the serial-query-rate option; the default value is too low for how often our zones change. Does an option like that exist in NSD?

Any help would be appreciated. The performance of NSD on a single zone is phenomenal: 112k qps on this hardware.

Dan

On Feb 28, 2010, at 11:30 AM, Martin Švec wrote:
> I think that xfrd daemon suffers a scalability problem with respect to the number of zones. [...]