Jan Srzednicki
2007-Nov-27 05:53 UTC
connect() returns EADDRINUSE during massive host->host conn rate
Hello,

I have a pair of hosts. One of them opens a massive number of TCP connections to the other, all to the same port. This setup mostly works fine, but from time to time (anywhere from once a minute to once every half hour) the connect(2) syscall fails with EADDRINUSE. The connection rate peaks at 50 connection initiations per second.

The socket is non-blocking. The application does the standard job of creating the socket, setting up the relevant fields, setting SO_REUSEADDR and SO_KEEPALIVE, and setting O_NONBLOCK on the descriptor. No bind(2) is performed. The connection is initiated from inside a jail (I'm not sure whether that implies an internal bind(2) to the jail's address). There are no connections from the other host to the first one.

I've tried tuning the net.inet.ip.portrange variables: I've increased the available port range to over 45000 ports (which should be more than enough for just about anything) and I've toggled net.inet.ip.portrange.randomized off, but that didn't change anything.

The workaround on the application side, retrying on EADDRINUSE, works pretty well. But from what I know from the Stevens book, this shouldn't be happening, though Google says all BSDs have a bad habit of throwing EADDRINUSE from time to time.

This all happens on a 6.2-RELEASE system. The symptoms are easily reproducible in my environment.

Is there any known fix for this? If there ain't, can it be fixed? :)

-- 
Jan Srzednicki :: http://wrzask.pl/
"Remember, remember, the fifth of November" -- V for Vendetta
Jan Srzednicki
2007-Nov-28 10:17 UTC
connect() returns EADDRINUSE during massive host->host conn rate
On Tue, Nov 27, 2007 at 02:53:20PM +0100, Jan Srzednicki wrote:
> Hello,
>
> setting up the relevant fields, setting SO_REUSEADDR and SO_KEEPALIVE,
> setting O_NONBLOCK on the descriptor. No bind(2) is performed. The
> connection is initiated from inside a jail (not sure if that implies an
> internal bind(2) to the jail's address). There are no connections from
> the other host to the first one.

And some additional info: subsequent connect()s on the same socket keep returning EADDRINUSE as well. In order to establish a connection, the application must create a brand new socket and then retry connect().

-- 
Jan Srzednicki :: http://wrzask.pl/
"Remember, remember, the fifth of November" -- V for Vendetta
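The application-side workaround described above (retry on EADDRINUSE, but with a brand new socket each time) could be sketched roughly as follows. This is only an illustration under assumptions, not the poster's actual code: the names make_socket and connect_with_retry and the max_attempts limit are hypothetical, and the sketch assumes IPv4.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a non-blocking TCP socket with the options
 * mentioned in the thread (SO_REUSEADDR, SO_KEEPALIVE, O_NONBLOCK).
 * Returns the descriptor, or -1 with errno set. */
static int make_socket(void)
{
    int on = 1;
    int flags;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0 ||
        setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0 ||
        (flags = fcntl(fd, F_GETFL, 0)) < 0 ||
        fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Hypothetical helper: start a connection to dst, retrying with a fresh
 * socket whenever connect(2) fails with EADDRINUSE. No bind(2) is done,
 * so the kernel picks the local port. Returns a descriptor whose connect
 * either completed or is in progress (EINPROGRESS), or -1 with errno set. */
int connect_with_retry(const struct sockaddr_in *dst, int max_attempts)
{
    assert(dst != NULL && max_attempts > 0);

    for (int attempt = 0; attempt < max_attempts; attempt++) {
        int fd = make_socket();
        if (fd < 0)
            return -1;

        if (connect(fd, (const struct sockaddr *)dst, sizeof(*dst)) == 0 ||
            errno == EINPROGRESS)
            return fd;

        /* The failed socket is discarded rather than reused, since
         * reusing it reportedly keeps returning EADDRINUSE. */
        int saved = errno;
        close(fd);
        errno = saved;
        if (saved != EADDRINUSE)
            return -1;
    }
    errno = EADDRINUSE;
    return -1;
}
```

A caller would still have to wait for writability and check SO_ERROR to learn whether a connection reported as in progress actually succeeded; that part is left out here.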
Jan Srzednicki
2007-Nov-28 10:30 UTC
connect() returns EADDRINUSE during massive host->host conn rate
On Wed, Nov 28, 2007 at 10:22:08AM -0800, Julian Elischer wrote:
> Jan Srzednicki wrote:
>> Hello,
>> I have a pair of hosts. One of them performs a massive amount of
>> TCP connections to the other one, all to the same port. This setup
>> mostly works fine, but from time to time (that varies, from once a
>> minute to once every half hour), the connect(2) syscall fails with
>> EADDRINUSE. The connection rate tops to 50 connection
>
> so, what does netstat -aAn show?

How can I get any usable information from netstat? It shows a bunch of connections, of course, but since connect(2) failed, I have no idea what local port I was trying to use.

And, what I forgot to mention: it's an SMP box, which could matter in case of some race condition.

-- 
Jan Srzednicki :: http://wrzask.pl/
"Remember, remember, the fifth of November" -- V for Vendetta
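One possible way to find out which local port was involved is to ask the socket itself with getsockname(2) right after connect(2) fails, then look for that port in the netstat output. The sketch below is an assumption-laden illustration: the helper name local_port_of is hypothetical, it assumes an IPv4 socket, and nothing in this thread establishes whether the kernel leaves a local port assigned after failing with EADDRINUSE, so the result may well be 0.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: return the local port of an IPv4 socket in host
 * byte order, 0 if no local port is assigned, or -1 if getsockname(2)
 * fails (errno is set by getsockname). */
int local_port_of(int fd)
{
    struct sockaddr_in local = {0};
    socklen_t len = sizeof(local);

    if (getsockname(fd, (struct sockaddr *)&local, &len) < 0)
        return -1;
    return ntohs(local.sin_port);
}
```

Logging this value together with errno at the point of failure would give something concrete to compare against netstat -aAn.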
Julian Elischer
2007-Nov-28 10:34 UTC
connect() returns EADDRINUSE during massive host->host conn rate
Jan Srzednicki wrote:
> Hello,
>
> I have a pair of hosts. One of them performs a massive amount of
> TCP connections to the other one, all to the same port. This setup
> mostly works fine, but from time to time (that varies, from once a
> minute to once every half hour), the connect(2) syscall fails with
> EADDRINUSE. The connection rate tops to 50 connection

So, what does netstat -aAn show?

> initiations/second.
>
> The socket is non-blocking. It does standard job of creating the socket,
> setting up the relevant fields, setting SO_REUSEADDR and SO_KEEPALIVE,
> setting O_NONBLOCK on the descriptor. No bind(2) is performed. The
> connection is initiated from inside a jail (not sure if that implies an
> internal bind(2) to the jail's address). There are no connections from
> the other host to the first one.
>
> I've tried tuning the net.inet.ip.portrange variables: I've increased
> the available portrange to over 45000 ports (quite a lot, should be more
> than enough for just anything) and I've toggled
> net.inet.ip.portrange.randomized off, but that didn't change anything.
>
> The workaround on the application side - retrying on EADDRINUSE - works
> pretty well, but hey, from what I know from the Stevens book, that
> shouldn't be happening, though Google said all BSD had a bad habit of
> throwing out EADDRINUSE from time to time.
>
> This all happens on a 6.2-RELEASE system. The symptoms are easily
> reproducible in my environment.
>
> Is there any known fix for that? If there ain't, can it be fixed? :)
Ivan Voras
2007-Nov-28 10:39 UTC
connect() returns EADDRINUSE during massive host->host conn rate
Jan Srzednicki wrote:
> Hello,
>
> I have a pair of hosts. One of them performs a massive amount of
> TCP connections to the other one, all to the same port. This setup
> mostly works fine, but from time to time (that varies, from once a
> minute to once every half hour), the connect(2) syscall fails with
> EADDRINUSE. The connection rate tops to 50 connection
> initiations/second.

This looks like the old (and probably well-known) problem that "ab" has. ("ab" is "apache benchmark", a utility bundled with Apache that makes repeated connections to the specified address, performs transactions, and computes some statistics.) AFAIK this behaviour has been present since at least 5.2, maybe earlier. No known fixes.
Daniel Eischen
2007-Nov-28 11:16 UTC
connect() returns EADDRINUSE during massive host->host conn rate
On Wed, 28 Nov 2007, Ivan Voras wrote:
> Jan Srzednicki wrote:
>> Hello,
>>
>> I have a pair of hosts. One of them performs a massive amount of
>> TCP connections to the other one, all to the same port. This setup
>> mostly works fine, but from time to time (that varies, from once a
>> minute to once every half hour), the connect(2) syscall fails with
>> EADDRINUSE. The connection rate tops to 50 connection
>> initiations/second.
>
> This looks like the old (and probably well known) problem "ab" has.
> ("ab" is "apache benchmark", a utility which is bundled with apache and
> which does repeated connections to the specified address, does
> transactions and computes some statistics). AFAIK this behaviour was
> present since at least 5.2, maybe earlier. No known fixes.

Could it have anything to do with the listen backlog on the server end?

-- 
DE
Ivan Voras
2007-Nov-28 12:06 UTC
connect() returns EADDRINUSE during massive host->host conn rate
Daniel Eischen wrote:
> On Wed, 28 Nov 2007, Ivan Voras wrote:
>> This looks like the old (and probably well known) problem "ab" has.
>> ("ab" is "apache benchmark", a utility which is bundled with apache and
>> which does repeated connections to the specified address, does
>> transactions and computes some statistics). AFAIK this behaviour was
>> present since at least 5.2, maybe earlier. No known fixes.
>
> Could it have anything to do with the listen backlog on the
> server end?

No, since the same utility used from Linux (not the same binary...) works as it should.