thr3ads.net - freebsd stable - Socket leak (Was: Re: What triggers "No Buffer Space Available"?) [May 2007]

Marc G. Fournier

2007-May-04 01:09 UTC

Socket leak (Was: Re: What triggers "No Buffer Space Available"?)

I'm trying to probe this as well as I can, but network stacks and sockets
have never been my strong suit ...

Robert had mentioned in one of his emails about a "Sockets can also exist
without any referencing process (if the application closes, but there is still
data draining on an open socket)."

Now, that makes sense to me, I can understand that ... but, how would that look
as far as netstat -nA shows?  Or, would it?  For example, I have:

mars# netstat -nA | grep c9655a20
c9655a20 stream      0      0        0 c95d63f0        0        0
c95d63f0 stream      0      0        0 c9655a20        0        0
mars# netstat -nA | grep c95d63f0
c9655a20 stream      0      0        0 c95d63f0        0        0
c95d63f0 stream      0      0        0 c9655a20        0        0

They are attached to each other, but there appears to be no 'referencing
process' ... it is now 10pm at night ... I saved a 'snapshot' of netstat -nA
output at 6:45pm, over 3 hours ago, and it has the same entries as above:

c9655a20 stream      0      0        0 c95d63f0        0        0
c95d63f0 stream      0      0        0 c9655a20        0        0

again, if I'm reading this right, there is no 'referencing process' ... first,
of course, am I reading this right?

second ... if I am reading this right, and, if I am understanding what Robert
was saying about 'draining' (a lot of ifs, I know) ... isn't it odd for it to
take >3 hours to drain?

Again, if I'm reading / understanding things right, without the 'referencing
process', it won't show up in sockstat -u, which is why my netstat -nA numbers
keep growing, but sockstat -u numbers don't ... which also means that there is
no way to figure out what process / program is leaving 'dangling sockets'? :(


----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org                              MSN . scrappy@hub.org
Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664

Matthew Dillon

2007-May-04 01:26 UTC

:I'm trying to probe this as well as I can, but network stacks and sockets
:have never been my strong suit ...
:
:Robert had mentioned in one of his emails about a "Sockets can also exist 
:without any referencing process (if the application closes, but there is still 
:data draining on an open socket)."
:
:Now, that makes sense to me, I can understand that ... but, how would that look
:as far as netstat -nA shows?  Or, would it?  For example, I have:
:
:...

    Netstat should show any sockets, whether they are attached to processes
    or not.  Usually you can match up the address from netstat -nA with
    the addresses from sockets shown by fstat to figure out what processes
    the sockets are attached to.

    There are three situations that you have to watch out for:

    (1) The socket was close()'d and is still draining.  The socket
	will time out and terminate within ~1-5 minutes.  It will not
	be referenced by a descriptor or process.

    (2) The socket descriptor itself has been sent over a unix domain socket
	from one process to another and is currently in transit.  The 
	file pointer representing the descriptor is what is actually in
	transit, and will not be referenced by any processes while it is
	in transit.

	There is a garbage collector that figures out unreferenceable loops.
	I think it's called unp_gc or something like that.

    (3) The socket is not closed, but is idle (like having a remote shell
	open and never typing in it).  Service processes can get stuck
	waiting for data on such sockets.  The socket WILL be referenced
	by some process.

	These are controlled by net.inet.tcp.keep* and
	net.inet.tcp.always_keepalive.  I almost universally turn on
	net.inet.tcp.always_keepalive to ensure that dead idle connections
	get cleaned out.

	Note that keepalive only applies to idle connections.  A socket
	that has been closed and needs to drain (either data or the FIN
	state) will time out and clean itself up whether keepalive is
	turned on or off.
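
The sysctls above are system-wide. As a minimal sketch of the per-socket side of case (3), an application can also ask for keepalive probes on an individual TCP socket with SO_KEEPALIVE. Python is used purely for illustration, and the helper name enable_keepalive is hypothetical, not something from this thread.

```python
import socket


def enable_keepalive(sock):
    """Ask the kernel to send keepalive probes on this one socket.

    Returns the option value read back from the kernel (non-zero means on).
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)


# Demo on an unconnected TCP socket; no network traffic is generated.
demo = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    keepalive_on = enable_keepalive(demo) != 0
finally:
    demo.close()
print("keepalive enabled:", keepalive_on)
```

The value read back is compared against zero rather than 1 because some systems return the option's flag value instead of a plain boolean.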

    netstat -nA will give you the status of all your sockets.  You can
    observe the state of any TCP sockets.

    Unix domain sockets have no state and closure is governed simply by
    them being dereferenced, just like a pipe.  In this case there are really
    only two situations:  (1) One end of the unix domain socket is still
    referenced by a process or (2) The socket has been sent over another
    unix domain socket and is 'in transit'.  The socket will remain intact
    until it is either no longer in transit (read out from the other unix
    domain socket), or the garbage collector determines that the socket the
    descriptor is transiting over is not externally referenceable, and
    destroys it and any in-transit sockets contained within.
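
The 'in transit' state can be reproduced in user space. The following is a rough sketch, assuming Python 3.9 or later on a Unix system (socket.send_fds and socket.recv_fds wrap SCM_RIGHTS descriptor passing). The names carrier_* and payload_* are made up for the example.

```python
import socket

# "carrier" is the unix domain socket pair the descriptor travels over;
# "payload" is a second pair, one end of which gets sent.
carrier_tx, carrier_rx = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
payload_a, payload_b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# Queue payload_a's descriptor on the carrier.
socket.send_fds(carrier_tx, [b"x"], [payload_a.fileno()])

# Drop the sender's own reference.  Until someone reads the carrier, the
# kernel socket behind payload_a is held only by the queued message: no
# process has a descriptor for it, yet it still exists.
payload_a.close()

# Reading the carrier takes the descriptor out of transit again.
msg, fds, _flags, _addr = socket.recv_fds(carrier_rx, 1024, 1)
received = socket.socket(fileno=fds[0])

# The received descriptor is the same socket: its peer is still payload_b.
received.sendall(b"ping")
echo = payload_b.recv(4)
print(msg, echo)

for s in (received, payload_b, carrier_tx, carrier_rx):
    s.close()
```

If the receiving side never reads the carrier, the sent socket stays allocated for as long as the carrier itself is referenced, which is the indirect hold described above.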

    Any sockets that don't fall into these categories are in trouble...
    either a timer has failed somewhere or (if unix domain) the garbage
    collector has failed to detect that it is in an unreferenceable loop.

    -

    One thing you can do is drop into single user mode... kill all the 
    processes on the system, and see if the sockets are recovered.  That
    will give you a good idea as to whether it is a real leak or whether
    some process is directly or indirectly (by not draining a unix domain
    socket on which other sockets are being transferred) holding onto the
    socket.
    
						-Matt

Ian Smith

2007-May-04 06:37 UTC

On Thu, 3 May 2007, Marc G. Fournier wrote:

 > Robert had mentioned in one of his emails about a "Sockets can also exist
 > without any referencing process (if the application closes, but there is
 > still data draining on an open socket)."

[..]

 > Again, if I'm reading / understanding things right, without the
 > 'referencing process', it won't show up in sockstat -u, which is why my
 > netstat -nA numbers keep growing, but sockstat -u numbers don't ... which
 > also means that there is no way to figure out what process / program is
 > leaving 'dangling sockets'? :(

Marc, I don't know if it may provide any more clues in this instance,
but lsof -U also shows unix domain sockets with pid, command and fd. 

Cheers, Ian

Robert Watson

2007-May-04 11:05 UTC

On Thu, 3 May 2007, Marc G. Fournier wrote:
> I'm trying to probe this as well as I can, but network stacks and sockets
> have never been my strong suit ...
>
> Robert had mentioned in one of his emails about a "Sockets can also exist
> without any referencing process (if the application closes, but there is
> still data draining on an open socket)."
>
> Now, that makes sense to me, I can understand that ... but, how would that
> look as far as netstat -nA shows?  Or, would it?  For example, I have:
>
> mars# netstat -nA | grep c9655a20
> c9655a20 stream      0      0        0 c95d63f0        0        0
> c95d63f0 stream      0      0        0 c9655a20        0        0
> mars# netstat -nA | grep c95d63f0
> c9655a20 stream      0      0        0 c95d63f0        0        0
> c95d63f0 stream      0      0        0 c9655a20        0        0
>
> They are attached to each other, but there appears to be no 'referencing
> process' ... it is now 10pm at night ... I saved a 'snapshot' of netstat -nA
> output at 6:45pm, over 3 hours ago, and it has the same entries as above:
>
> c9655a20 stream      0      0        0 c95d63f0        0        0
> c95d63f0 stream      0      0        0 c9655a20        0        0
>
> again, if I'm reading this right, there is no 'referencing process' ...
> first, of course, am I reading this right?
>
> second ... if I am reading this right, and, if I am understanding what
> Robert was saying about 'draining' (a lot of ifs, I know) ... isn't it odd
> for it to take >3 hours to drain?
>
> Again, if I'm reading / understanding things right, without the 'referencing
> process', it won't show up in sockstat -u, which is why my netstat -nA
> numbers keep growing, but sockstat -u numbers don't ... which also means
> that there is no way to figure out what process / program is leaving
> 'dangling sockets'? :(

I think we should be careful to avoid prematurely drawing conclusions about 
the source of the problem.  First question: have you confirmed that the 
resource limit on sockets is definitely what is causing the error you're 
seeing?  I.e., does the number of sockets in use actually reach the maximum?

Second point: there are two kinds of resource leaks that seem likely 
candidates for a socket resource exhaustion problem. First, kernel bugs, in 
which the kernel maintains objects despite there being no application 
references, and second, application reference leaks, in which applications 
keep references to kernel objects despite no longer needing them.  Our 
immediate goal is to determine which of these is the case: is it a kernel bug, 
or an application bug?  Using tools like netstat and sockstat, we can try and 
determine if all kernel sockets are properly referenced.  Experience suggests 
that it is an application bug, but we shouldn't rule out a kernel bug; the 
good news is that the tools to use in the debugging process are identical at 
this stage.

Robert N M Watson
Computer Laboratory
University of Cambridge

Oliver Fromme

2007-May-07 17:01 UTC

Marc G. Fournier wrote:
 > Now, that makes sense to me, I can understand that ... but, how would
 > that look as far as netstat -nA shows?  Or, would it?  For example, I
 > have:

You should use "-na" to list all sockets, not "-nA".

 > mars# netstat -nA | grep c9655a20
 > c9655a20 stream      0      0        0 c95d63f0        0        0
 > c95d63f0 stream      0      0        0 c9655a20        0        0
 > mars# netstat -nA | grep c95d63f0
 > c9655a20 stream      0      0        0 c95d63f0        0        0
 > c95d63f0 stream      0      0        0 c9655a20        0        0
 > 
 > They are attached to each other, but there appears to be no
 > 'referencing process'

netstat doesn't show processes at all (sockstat, fstat
and lsof list sockets by processes).  The sockets above
are probably from a socketpair(2) or a pipe (which is
implemented with socketpair(2), AFAIK).  That's perfectly
normal.
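
To check mechanically that two such entries really point at each other, the two lines Marc quoted can be cross-referenced. This is only a sketch: it assumes the first column is the socket's own address and the sixth is the address of the connected peer, which is how the quoted output reads. The function name connected_pairs is invented for the example.

```python
# The two lines quoted earlier in the thread.
SAMPLE = """\
c9655a20 stream      0      0        0 c95d63f0        0        0
c95d63f0 stream      0      0        0 c9655a20        0        0
"""


def connected_pairs(netstat_text):
    """Return the set of (addr, addr) pairs whose peer fields name each other."""
    peer_of = {}
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a unix domain socket row.
        if len(fields) < 6 or fields[1] not in ("stream", "dgram"):
            continue
        peer_of[fields[0]] = fields[5]  # assumed: column 6 is the peer address

    pairs = set()
    for addr, peer in peer_of.items():
        if peer_of.get(peer) == addr:
            pairs.add(tuple(sorted((addr, peer))))
    return pairs


print(connected_pairs(SAMPLE))
```

A pair that references itself this way is what a socketpair(2) call would produce; it says nothing by itself about whether a process still holds either end.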

If I remember correctly, you wrote that 11k sockets are
in use with 90 jails.  That's about 120 sockets per jail,
which isn't out of the ordinary.  Of course it depends on
what is running in those jails, but my guess is that you
just need to increase the limit on the number of sockets
(i.e. kern.ipc.maxsockets).
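
The arithmetic behind that estimate, as a small sketch. The 11,000 and 90 figures are the ones recalled above; the limit of 16384 is a made-up placeholder, not the value configured on Marc's machine.

```python
def sockets_per_jail(sockets_in_use, jails):
    """Average number of sockets per jail."""
    return sockets_in_use / jails


def utilisation(sockets_in_use, max_sockets):
    """Fraction of the socket limit currently in use."""
    return sockets_in_use / max_sockets


average = sockets_per_jail(11_000, 90)      # figures recalled in the thread
HYPOTHETICAL_LIMIT = 16_384                 # placeholder for kern.ipc.maxsockets
used = utilisation(11_000, HYPOTHETICAL_LIMIT)

print(f"{average:.0f} sockets per jail, {used:.0%} of a {HYPOTHETICAL_LIMIT} limit")
```

Substituting the real limit and the real count shows how much headroom is left before new sockets start to fail.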

 > Again, if I'm reading / understanding things right, without the
 > 'referencing process', it won't show up in sockstat -u, which is why my
 > netstat -nA numbers keep growing, but sockstat -u numbers don't ... which
 > also means that there is no way to figure out what process / program is
 > leaving 'dangling sockets'? :(

Be careful here, sockstat's output is process-based and
lists sockets multiple times.  For example, the server
sockets that httpd children inherit from their parent
are listed for every single child, while you see it only
once in the netstat output.  On the other hand, sockstat
doesn't show sockets that have been closed and are in
TIME_WAIT state or similar.
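
A toy illustration of why the two counts differ. The records below are invented (pid, socket address) pairs, not real sockstat output: one listening socket inherited by three httpd children plus one private socket.

```python
from collections import Counter

# Hypothetical data: (pid, socket address) as a per-process listing would
# report it.  Socket c9000001 is shared by three children.
records = [
    (101, "c9000001"),
    (102, "c9000001"),
    (103, "c9000001"),
    (103, "c9000002"),
]

# A per-process listing shows one row per (process, socket) combination.
rows_listed = len(records)

# A per-socket listing shows each kernel socket exactly once.
unique_sockets = len({addr for _pid, addr in records})

# How many processes hold each socket.
holders = Counter(addr for _pid, addr in records)

print(rows_listed, unique_sockets, dict(holders))
```

Comparing raw line counts from the two tools is therefore misleading in both directions: shared sockets inflate the per-process count, and sockets without a process are missing from it.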

Are you sure that UNIX domain sockets are causing the
problem?  Can you rule out other sockets (e.g. tcp)?
In that case you should run "netstat -funix" to list
only UNIX domain sockets (basically the same as the
-u option to sockstat).

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Commercial register: Registergericht München, HRA 74606.  Management:
secnetix Verwaltungsgesellsch. mbH, commercial register: Registergericht
München, HRB 125758.  Managing directors: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD services, products and more:  http://www.secnetix.de/bsd

$ dd if=/dev/urandom of=test.pl count=1
$ file test.pl
test.pl: perl script text executable
