On 14/05/2020 13:29, Wouter Wijngaards via nsd-users wrote:
Hi Wouter,
> Yes this applies to incoming queries and to outgoing queries. 120
> seconds by default.
Thanks for the clarification. I think the default of 120s should be
documented in the man page.
I'm still not clear on what the timeout applies to though. Is it to the
time between individual DNS messages in a TCP connection? Or does it
apply to any period of inactivity in the connection?
> A much smaller value, of 200 msec, is used when the server is nearly
> full on capacity, for incoming connections that are over the limit.
> Also when the server has updated the existing connections get a smaller
> 100 msec timeout to wait for them to complete their tcp query to NSD.
>
> That last feature since 4.2.1. The tcp full shorter timeout is since
> 4.1.11.
Now that you've explained it here, I recall that there was something
about this in the release notes. However, the value of 200ms isn't
documented. The release notes have:
"When tcp is more than half full, use short timeout for tcp session."
So I'm guessing that "short timeout" here is 200ms. Also, it's not clear
whether the timeout is dynamic. What I mean is: is it applied to all
sessions (existing and new), or only to new ones? When the number of TCP
connections drops to less than half, is the timeout reset to 120s? And
is it reset for all sessions, or just new ones?
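To make the question concrete, here is a minimal Python sketch of how I
currently imagine the timeout being chosen. It is only my guess, pieced
together from your explanation and the release notes, and not NSD's
actual code: the function name, its parameters, and the "more than half
full" trigger are all assumptions on my part.

```python
# Guesswork model of NSD's TCP timeout choice; NOT taken from the NSD source.
DEFAULT_TIMEOUT_S = 120.0   # default timeout, per Wouter's reply
BUSY_TIMEOUT_S = 0.2        # 200 msec when the server is filling up
RELOAD_TIMEOUT_S = 0.1      # 100 msec for connections left over after a reload


def guessed_timeout(open_conns, max_conns, configured=DEFAULT_TIMEOUT_S,
                    old_process_after_reload=False):
    """Return the timeout (in seconds) I would expect for a new connection."""
    # Assumption: the lingering old process gives its connections 100 msec.
    if old_process_after_reload:
        return RELOAD_TIMEOUT_S
    # Assumption: "more than half full" (release notes) is the trigger.
    if open_conns > max_conns / 2:
        return BUSY_TIMEOUT_S
    # Otherwise the configured (or default) timeout applies.
    return configured
```

If the real logic differs from this (for example, if existing sessions
are also switched to the short timeout), that is exactly the kind of
detail I'd like to see documented.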
Dropping from the default 120s to a mere 200ms when the number of TCP
connections goes up, is quite dramatic. And I happen to think that 200ms
is too low. A client that's getting an AXFR from such an NSD server is
quite likely to suffer disconnects. In fact, I have been observing
exactly this behaviour on the servers we run. We have a use case where a
user is doing AXFR of some largish zones, and when the client is a bit
slow, NSD drops the connection. This causes the client to retry. This,
IMHO, is rather wasteful.
The other feature of shortening the timeout to 100ms is also not so
obvious. The release notes have:
"Fix #14, tcp connections have 1/10 to be active and have to work
every second, and then they get time to complete during a reload,
this is a process that lingers with the old version during a version
update."
The 1/10 there is not very readable. I think that 100ms would be much
clearer. And I also don't understand what you mean by "and have to work
every second". Could you please explain that?
In my opinion, such details should not be buried in the release notes
document. The release notes are useful when comparing one version to
another. All these features of how the server dynamically adjusts its
behaviour should be in the operations manual or at least the nsd.conf
man page.
Imagine a new user of NSD, who is trying to configure and tune the
server, and sets "tcp-timeout" to some value, and still observes
different behaviour when running the server. This leads to confusion.
And it's not reasonable to expect the user to read the entire set of
release notes trying to find such undocumented features.
Regards,
Anand Buddhdev
RIPE NCC