MichelsonMorley
2007-Jul-19 08:13 UTC
one mongrel with *lots* of close_wait tcp connections
*Cross-posted to the mongrel mailing list*

Hi,

I'm running into a strange issue where one mongrel sometimes develops hundreds of CLOSE_WAIT TCP connections, mostly to apache (I think; see the sample lsof output below). I haven't had a chance to get the affected mongrel into USR1 debug mode yet, but I wrote a little loop (below) that should catch it next time.

The issue occurs a couple of times a day on average, at seemingly random times. It goes away within a minute or two, probably when the mongrel is restarted.

I'm probably doing something crazy to cause this, but I'm having trouble pinning down exactly what. It likely has to do with the fact that my mongrels fetch files from Amazon S3 for some requests: we call HTTPClient.get(url) for some S3 URLs. I'm also setting up dnsmasq, by the way, but it's not running yet.

My next steps are to get the mongrel into USR1 debug mode to see which actions cause the problem, and to install dnsmasq and cacti. I have a good guess about which action is responsible (probably the one that fetches files from S3), but I'll confirm it.

If you have any thoughts or other ideas, please let me know. Thanks a ton for your help!
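For background: a socket sits in CLOSE_WAIT when the remote end has closed its side of the connection (sent a FIN) but the local process has not yet closed its own. So if the S3 fetches are the culprit, one thing worth checking is that every connection opened for an S3 request is explicitly closed rather than left for the garbage collector. Below is a minimal sketch that swaps in Ruby's standard Net::HTTP for the HTTPClient gem used above: the block form of Net::HTTP.start finishes the session when the block exits, even if an exception is raised. The helper name fetch_and_close and the timeout values are hypothetical.

```ruby
require "net/http"
require "uri"

# Hypothetical helper (not from the original post): GET a URL and make sure the
# TCP connection is closed afterwards. Net::HTTP.start with a block calls
# #finish when the block exits, normally or via an exception, so the local
# socket is closed instead of lingering in CLOSE_WAIT after the server hangs up.
def fetch_and_close(url, open_timeout: 5, read_timeout: 30)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port,
                  use_ssl: uri.scheme == "https",
                  open_timeout: open_timeout,
                  read_timeout: read_timeout) do |http|
    response = http.request(Net::HTTP::Get.new(uri.request_uri))
    # Treat anything other than a 2xx response as a failed fetch.
    raise "fetch failed: HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)
    response.body
  end
end
```

Usage would be body = fetch_and_close(url) in place of HTTPClient.get(url). Whether this actually clears the CLOSE_WAITs depends on the S3 fetch really being the cause.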
Some sample output from lsof (CLOSE_WAIT sockets on mysite):

  $ lsof -i -P | grep CLOSE_ | grep mongrel
  mongrel_r 831 root 6u IPv4 95162945 TCP localhost.localdomain:8011->localhost.localdomain:59311 (CLOSE_WAIT)
  mongrel_r 831 root 9u IPv4 95161753 TCP mysite.com:49269->xxx-xxx-xxx-xxx.amazon.com:80 (CLOSE_WAIT)
  mongrel_r 831 root 11u IPv4 95162093 TCP mysite.com:49339->xxx-xxx-xxx-xxx.amazon.com:80 (CLOSE_WAIT)
  mongrel_r 831 root 14u IPv4 95162202 TCP mysite.com:49373->xxx-xxx-xxx-xxx.amazon.com:80 (CLOSE_WAIT)
  mongrel_r 831 root 15u IPv4 95162229 TCP mysite.com:49380->xxx-xxx-xxx-xxx.amazon.com:80 (CLOSE_WAIT)
  mongrel_r 831 root 16u IPv4 95162319 TCP mysite.com:49399->xxx-xxx-xxx-xxx.amazon.com:80 (CLOSE_WAIT)
  mongrel_r 831 root 17u IPv4 95162477 TCP mysite.com:49436->xxx-xxx-xxx-xxx.amazon.com:80 (CLOSE_WAIT)
  mongrel_r 831 root 19u IPv4 95163082 TCP localhost.localdomain:8011->localhost.localdomain:59348 (CLOSE_WAIT)
  mongrel_r 831 root 20u IPv4 95163221 TCP localhost.localdomain:8011->localhost.localdomain:59387 (CLOSE_WAIT)
  mongrel_r 831 root 21u IPv4 95163360 TCP localhost.localdomain:8011->localhost.localdomain:59426 (CLOSE_WAIT)
  mongrel_r 831 root 22u IPv4 95161592 TCP mysite.com:49227->xxx-xxx-xxx-xxx.amazon.com:80 (CLOSE_WAIT)
  mongrel_r 831 root 23u IPv4 95163507 TCP localhost.localdomain:8011->localhost.localdomain:59463 (CLOSE_WAIT)
  mongrel_r 831 root 24u IPv4 95163675 TCP localhost.localdomain:8011->localhost.localdomain:59495 (CLOSE_WAIT)
  mongrel_r 831 root 25u IPv4 95164041 TCP localhost.localdomain:8011->localhost.localdomain:59586 (CLOSE_WAIT)
  mongrel_r 831 root 26u IPv4 95164181 TCP localhost.localdomain:8011->localhost.localdomain:59618 (CLOSE_WAIT)
  mongrel_r 831 root 27u IPv4 95164293 TCP localhost.localdomain:8011->localhost.localdomain:59641 (CLOSE_WAIT)
  mongrel_r 831 root 28u IPv4 95164441 TCP localhost.localdomain:8011->localhost.localdomain:59670 (CLOSE_WAIT)
  mongrel_r 831 root 29u IPv4 95164607 TCP localhost.localdomain:8011->localhost.localdomain:59705 (CLOSE_WAIT)
  mongrel_r 831 root 30u IPv4 95164748 TCP localhost.localdomain:8011->localhost.localdomain:59746 (CLOSE_WAIT)
  mongrel_r 831 root 31u IPv4 95164895 TCP localhost.localdomain:8011->localhost.localdomain:59786 (CLOSE_WAIT)
  mongrel_r 831 root 32u IPv4 95165064 TCP localhost.localdomain:8011->localhost.localdomain:59830 (CLOSE_WAIT)
  etc.

This goes on for about 700 lines: the mongrel on port 8011 has roughly 700 CLOSE_WAIT TCP connections to ports in the 30-60k range (to apache, I believe). In this case, all of these CLOSE_WAITs belong to the mongrel on port 8011.

Also, any ideas what's going on with the CLOSE_WAIT connections to Amazon S3?

  $ lsof -i -P | grep CLOSE_ | grep mongrel | wc -l
  703

  $ netstat | grep 56586    # an example port
  tcp 1 0 localhost.localdomain:8011 localhost.localdomain:56586 CLOSE_WAIT
  tcp 0 0 localhost.localdomain:56586 localhost.localdomain:8011 FIN_WAIT2
  getnameinfo failed
  getnameinfo failed

Background loop to put the bad mongrel into debug mode during the CLOSE_WAIT period:

  # Wait until the mongrels hold more than 100 CLOSE_WAIT sockets,
  # then turn on USR1 debug mode for two minutes.
  def debug_mongrel_loop
    # Poll once a minute until the CLOSE_WAIT count crosses the threshold.
    sleep(60) until `lsof -i -P | grep CLOSE_WAIT | grep mongrel | wc -l`.to_i > 100

    # Note: killall signals every mongrel_rails process, not only the bad one.
    `killall -USR1 mongrel_rails`
    AdminMailer.deliver_mongrel_debug_mode_turned_on # optional email alert

    # Sleep 2 minutes, then send USR1 again to undo the debug mode.
    sleep(120)
    `killall -USR1 mongrel_rails`
  end

You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
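The loop above uses killall -USR1 mongrel_rails, which signals every mongrel on the box, not only the one piling up CLOSE_WAIT sockets. Here is a minimal sketch of a more targeted variant. It assumes lsof's default -i -P layout as in the sample output (PID in the second column, TCP state in parentheses at the end of the line), counts CLOSE_WAIT sockets per mongrel PID, and signals only the PIDs over the threshold with Process.kill. The method names are hypothetical.

```ruby
# Hypothetical helpers (not from the original post). Assumes the default
# `lsof -i -P` layout: column 2 is the PID and the TCP state is printed in
# parentheses at the end of the line, e.g. "(CLOSE_WAIT)".

# Returns a Hash of pid => number of CLOSE_WAIT sockets held by mongrel processes.
def close_wait_counts(lsof_output)
  counts = Hash.new(0)
  lsof_output.each_line do |line|
    next unless line.start_with?("mongrel") && line.include?("(CLOSE_WAIT)")
    counts[line.split[1].to_i] += 1
  end
  counts
end

# Returns the PIDs holding more than `threshold` CLOSE_WAIT sockets.
def noisy_mongrels(lsof_output, threshold = 100)
  close_wait_counts(lsof_output).select { |_pid, count| count > threshold }.keys
end

# Sends USR1 only to the offending mongrels instead of `killall`-ing all of
# them, and returns their PIDs so a second USR1 can later undo the first.
def toggle_debug_on_noisy_mongrels(threshold = 100)
  pids = noisy_mongrels(`lsof -i -P`, threshold)
  pids.each { |pid| Process.kill("USR1", pid) }
  pids
end
```

In the loop, pids = toggle_debug_on_noisy_mongrels could stand in for the first killall. After the two-minute sleep, pids.each { |pid| Process.kill("USR1", pid) } would undo it for the same processes.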