I'm still being hit by this one ... more frequently right now, as I had to
move a bit more stuff *onto* that server ... I'm trying to figure out what
I can monitor for a 'leak' somewhere, but the only thing I'm able to find
is the whole nmbclusters stuff:

mars# netstat -m | grep "mbuf clusters"
130/542/672/25600 mbuf clusters in use (current/cache/total/max)

The above is after 26 hrs uptime ...

Is there something else that will trigger/generate the above error message?

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org                              MSN . scrappy@hub.org
Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664
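One minimal way to watch for a slow leak is to log the utilization from that `netstat -m` line periodically (e.g. from cron) and see whether "current" climbs over time. A sketch, with the sample line from above hard-coded so it runs anywhere; on the server you would substitute the live command:

```shell
# Parse the current/cache/total/max fields out of the netstat -m line.
# In practice: line=$(netstat -m | grep "mbuf clusters")
line='130/542/672/25600 mbuf clusters in use (current/cache/total/max)'
current=$(echo "$line" | awk -F'[/ ]' '{print $1}')
max=$(echo "$line" | awk -F'[/ ]' '{print $4}')
pct=$((100 * current / max))
echo "mbuf clusters: $current/$max (${pct}% of max)"
```

Append that to a file with a timestamp every few minutes and a leak shows up as a steadily rising "current" even when load is flat.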
On 01/05/07, Marc G. Fournier <scrappy@freebsd.org> wrote:

> I'm still being hit by this one ... more frequently right now as I had to
> move a bit more stuff *onto* that server ... I'm trying to figure out what
> I can monitor for a 'leak' somewhere, but the only thing I'm able to find
> is the whole nmbclusters stuff:
>
> mars# netstat -m | grep "mbuf clusters"
> 130/542/672/25600 mbuf clusters in use (current/cache/total/max)
>
> the above is after 26hrs uptime ...
>
> Is there something else that will trigger/generate the above error message?

It doesn't panic when it happens, no?

I'd check the number of sockets you've currently got open at that point.
Some applications might be holding open a whole load of sockets, and their
buffers stay allocated until they're closed. If they don't handle, or don't
get told about, the error, then they'll just hold open the mbufs.

(I came across this when banging TCP connections through a simple TCP socket
proxy and wondered why networking would lock up. It turns out FreeBSD-6
isn't logging the "please consider raising NMBCLUSTERS" kernel message
anymore, and I needed to do exactly that. Killing the proxy process actually
restored network connectivity.)

Adrian

--
Adrian Chadd - adrian@freebsd.org
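A rough way to act on that suggestion is to tally sockets per command name, so a process holding an unusual number of descriptors stands out. A sketch over invented `sockstat`-style sample lines (columns assumed: USER COMMAND PID FD ...); on the server you would pipe real `sockstat` output into the awk instead:

```shell
# Count sockets per command (field 2 in sockstat-style output).
# Sample lines are made up for illustration only.
sample='root    master   612  11 stream /var/spool/postfix/private/proxymap
postfix smtpd   7013   6 stream -> ??
postfix smtpd   7013   7 stream -> ??
www     httpd    802   3 tcp4   *:80'
counts=$(echo "$sample" | awk '{tally[$2]++} END {for (c in tally) print tally[c], c}' | sort)
echo "$counts"
```

Run the same tally an hour apart and diff the two lists: the command whose count keeps growing is the one to look at first.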
--On Wednesday, May 02, 2007 11:17:02 -0700 John-Mark Gurney
<gurney_j@resnet.uoregon.edu> wrote:

> netstat -A will list the socket address, fstat will list the fd, and what
> socket is connected to that fd.

Oh wow ... according to this, I have:

mars# wc -l /tmp/output
   11238 /tmp/output

(minus some header lines) sockets running right now ...

Okay, next question ... under 'Active UNIX domain sockets', I see a lot
that have no Addr:

Active UNIX domain sockets
Address  Type   Recv-Q Send-Q    Inode     Conn     Refs  Nextref Addr
d06b7480 stream      0      0        0 c969b240        0        0 private/proxymap
c969b240 stream      0      0        0 d06b7480        0        0
ce6fc870 stream      0      0        0 cf744870        0        0 private/rewrite
cf744870 stream      0      0        0 ce6fc870        0        0
ce4b2630 stream      0      0        0 d0cee900        0        0 private/proxymap
d0cee900 stream      0      0        0 ce4b2630        0        0
d0437240 stream      0      0        0 cf716000        0        0 private/proxymap
cf716000 stream      0      0        0 d0437240        0        0
c94f4990 stream      0      0        0 cee6ed80        0        0 private/rewrite
cee6ed80 stream      0      0        0 c94f4990        0        0
d0cefcf0 stream      0      0        0 cb281a20        0        0 private/rewrite
cb281a20 stream      0      0        0 d0cefcf0        0        0
ce0d5240 stream      0      0        0 cb251480        0        0 private/anvil

Now, the 'Conn' field from the previous line matches the 'Address' of the
'blank Addr' line ... so there are two sockets for each Addr? In vs out?

To give a reference point ... mars above has 91 jail'd environments running
on it, it's been up 2 days, 9 hrs now, and has 11k sockets in use ...

Hrmmm ... just checked jupiter, and she has 32 jails with 1080 sockets ...
venus has 62 jails with 2819 sockets ... and pluto has 35 jails with 1818
sockets ... mars is running, on average, 2x the number of sockets per jail
compared to the other servers ... Is this normal?

mars# grep d067f900 /tmp/output
d067f900 stream      0      0        0 cafd4c60        0        0
cafd4c60 stream      0      0        0 d067f900        0        0

There is no 'Addr' related to either of them?
I can scroll down pages and pages of those types of entries that don't have
any Addr field associated with them ...

> --
> John-Mark Gurney                              Voice: +1 415 225 5579
>
> "All that I will do, has been done, All that I have, has not."
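To see the pairing mechanically: each UNIX-domain stream connection appears as two rows whose Address and Conn fields point at each other, and the listener's path shows in Addr on only one side, so the blank-Addr rows are just the peer ends (and fully anonymous pairs, e.g. from socketpair(2), have no path on either side). A sketch over rows copied from the netstat output quoted above, with field positions assumed from that layout:

```shell
# Pair up UNIX-domain stream endpoints: $1 = Address, $6 = Conn, $9 = Addr.
# Rows are taken from the netstat output quoted earlier in the thread.
rows='d06b7480 stream 0 0 0 c969b240 0 0 private/proxymap
c969b240 stream 0 0 0 d06b7480 0 0
d067f900 stream 0 0 0 cafd4c60 0 0
cafd4c60 stream 0 0 0 d067f900 0 0'
pairs=$(echo "$rows" | awk '$2 == "stream" { conn[$1] = $6; addr[$1] = $9 }
  END {
    for (a in conn)
      if (conn[conn[a]] == a && a < conn[a]) {
        path = addr[a] addr[conn[a]]
        print a, "<->", conn[a], (path == "" ? "(anonymous pair)" : path)
      }
  }' | LC_ALL=C sort)
echo "$pairs"
```

On this sample it matches d067f900/cafd4c60 as an anonymous pair, which is exactly the kind of entry that has no Addr on either end.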
On Tue, 1 May 2007, Marc G. Fournier wrote:

> I'm still being hit by this one ... more frequently right now as I had to
> move a bit more stuff *onto* that server ... I'm trying to figure out what
> I can monitor for a 'leak' somewhere, but the only thing I'm able to find
> is the whole nmbclusters stuff:
>
> mars# netstat -m | grep "mbuf clusters"
> 130/542/672/25600 mbuf clusters in use (current/cache/total/max)
>
> the above is after 26hrs uptime ...
>
> Is there something else that will trigger/generate the above error message?

ENOBUFS is a common error in the network stack, reflecting a lack of free
memory or exceeding a system, user, or process resource limit. While the
classic source of ENOBUFS is mbuf or mbuf cluster exhaustion, there are
several other sources of the error. For example, you will get ENOBUFS back
if you run out of sockets, or if a process tries to increase the size of
socket buffers beyond the user resource limit.

I'd look at all the output of netstat -m, not just clusters. I'd also look
at kern.ipc.numopensockets and compare it to kern.ipc.maxsockets.

Robert N M Watson
Computer Laboratory
University of Cambridge
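The socket-limit comparison Robert suggests can be scripted with the two sysctls he names. A sketch with the values hard-coded for illustration (the 11238 is the count from earlier in the thread; the maxsockets figure is invented, not a claimed default), since this sample can't query a live FreeBSD kernel:

```shell
# Compare open sockets against the limit, per the sysctls named above.
# Live versions of these assignments would be:
#   numopen=$(sysctl -n kern.ipc.numopensockets)
#   maxsock=$(sysctl -n kern.ipc.maxsockets)
numopen=11238
maxsock=12328   # illustrative value only
pct=$((100 * numopen / maxsock))
echo "sockets: $numopen of $maxsock (${pct}% of limit)"
```

If that percentage sits near 100 when the ENOBUFS errors appear, socket exhaustion rather than cluster exhaustion is the likely trigger.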
--On Thursday, May 03, 2007 19:28:56 +0100 Robert Watson
<rwatson@FreeBSD.org> wrote:

> I generally recommend using a combination of netstat and sockstat.
> Sockets represent, loosely, IPC endpoints. There are actually two
> "layers" associated with each socket -- the IPC object (socket) and the
> protocol control block (PCB). Both are resource limited to prevent
> run-away processes from swamping the system, so exhaustion of either can
> lead to ENOBUFS.
>
> The behaviors of netstat and sockstat are quite different, even though
> the output is similar: netstat walks the protocol-layer connection lists
> and prints information about them. sockstat walks the process file
> descriptor table and prints information on reachable sockets. As sockets
> can exist without PCBs, and PCBs can exist without sockets, you need to
> look at both to get a full picture. This can occur if a process exits,
> closes the socket, and the connection remains in, for example, the
> TIME_WAIT state.
>
> There are some other differences -- the same socket can appear more than
> once in sockstat output, as more than one process can reference the same
> socket. Sockets can also exist without any referencing process (if the
> application closes, but there is still data draining on an open socket).
>
> I would suggest starting with sockstat, as that will allow you to link
> socket use to applications, and provide a fairly neat summary. When using
> netstat, use "netstat -na", which will list all sockets and avoid name
> lookups.

'k, all I'm looking at right now is the Unix domain sockets, and the
netstat count has been growing past the sockstat count since I first
started counting both ... This was shortly after reboot:

mars# netstat -A | grep stream | wc -l ; sockstat -u | wc -l
    2705
    2981

From your explanation above, I'm guessing that the higher sockstat numbers
are the case you mention of one socket being referenced by multiple
processes?
But, right now:

mars# netstat -nA | grep stream | wc -l ; sockstat -u | wc -l
    5025
    2905

The sockstat -u numbers are *down*, but the netstat -nA count is almost
double ... Again, based on what you state above: "Sockets can also exist
without any referencing process (if the application closes, but there is
still data draining on an open socket)."

Now, looking at another 6-STABLE server, one that has been running for
2 months now, I'm seeing numbers more consistent with what mars looks like
shortly after all the jails start up:

venus# netstat -nA | grep stream | wc -l ; sockstat -u | wc -l
    2126
    2209

So, if those sockets on mars are 'still draining on an open socket', is
there some way of finding out where? If I'm understanding what you've said
above, these 'draining sockets' don't have any processes associated with
them anymore, so it's not like I can just kill off a process, correct?
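The gap being measured here can be put in one script: count stream PCBs as netstat sees them, count fd-reachable sockets as sockstat sees them, and track the difference over time. A sketch with the numbers quoted above hard-coded, since the live commands are FreeBSD-only:

```shell
# The two counts compared above. Live versions would be:
#   pcbs=$(netstat -nA | grep stream | wc -l)
#   fds=$(sockstat -u | wc -l)
pcbs=5025   # protocol control blocks (netstat view)
fds=2905    # sockets reachable from file descriptors (sockstat view)
gap=$((pcbs - fds))
echo "stream PCBs with no referencing fd (roughly): $gap"
```

A gap that only ever grows, as on mars, points at sockets whose owning process is gone; a gap that oscillates is just the normal churn of connections draining and closing.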