thr3ads.net - Mongrel users - [Mongrel] Mongrel hangs with 100% CPU / EBADF (Bad file descriptor) [Aug 2008]

Front Line

2008-Aug-27 11:40 UTC

[Mongrel] Mongrel hangs with 100% CPU / EBADF (Bad file descriptor)

We have a server running 10 mongrel_cluster instances with Apache
in front of them, and every now and then one or more of them hang.
No activity is seen in the database (we're using ActiveRecord
sessions).
MySQL with InnoDB tables. "show innodb status" shows no locks. "show
processlist" shows nothing.
The server runs Debian Linux 4.0.
Ruby is: ruby 1.8.6 (2008-03-03 patchlevel 114) [i486-linux]
Rails is: Rails 1.1.2 (yes, quite old)
We're using the native MySQL connector (gem install mysql)

"strace -p PID" gives the following in a loop for the hung mongrel
process:

gettimeofday({1219834026, 235289}, NULL) = 0
select(4, [3], [0], [], {0, 905241})    = -1 EBADF (Bad file descriptor)
gettimeofday({1219834026, 235477}, NULL) = 0
select(4, [3], [0], [], {0, 905053})    = -1 EBADF (Bad file descriptor)
gettimeofday({1219834026, 235654}, NULL) = 0
select(4, [3], [0], [], {0, 904875})    = -1 EBADF (Bad file descriptor)
gettimeofday({1219834026, 235829}, NULL) = 0
select(4, [3], [0], [], {0, 904700})    = -1 EBADF (Bad file descriptor)
gettimeofday({1219834026, 236017}, NULL) = 0
select(4, [3], [0], [], {0, 904513})    = -1 EBADF (Bad file descriptor)
gettimeofday({1219834026, 236192}, NULL) = 0
select(4, [3], [0], [], {0, 904338})    = -1 EBADF (Bad file descriptor)
gettimeofday({1219834026, 236367}, NULL) = 0
...

I used lsof and found that the process used 67 file descriptors (lsof -p
PID |wc -l)

Is there any other way I can debug this, for example to
determine which file descriptor is "bad"?
Any other info or suggestions? Anybody else seen this?
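
One way to narrow this down, as a sketch rather than a definitive
procedure: each strace line has the form select(nfds, readfds, writefds,
exceptfds, timeout), so the process is watching descriptor 3 for reading
and descriptor 0 for writing, and EBADF most likely means one of those
two is no longer open. The Ruby sketch below compares those numbers with
the entries under /proc/PID/fd. This assumes Linux, and reading another
process's entry generally requires the same user or root. The helper
names open_fds and missing_fds are hypothetical, not part of Mongrel or
Ruby.

```ruby
# Hypothetical helper: list the descriptor numbers a process has open,
# using the Linux-specific /proc/<pid>/fd directory.
def open_fds(pid)
  Dir.entries("/proc/#{pid}/fd").grep(/\A\d+\z/).map { |name| name.to_i }.sort
end

# Return the watched descriptors that do not appear in the open list.
def missing_fds(watched, open)
  watched - open
end

# Descriptors taken from the strace line: select(4, [3], [0], [], ...)
watched = [3, 0]

if File.directory?("/proc/#{Process.pid}/fd")
  # Shown on the current process; substitute the hung Mongrel's PID.
  puts missing_fds(watched, open_fds(Process.pid)).inspect
end
```

Any number this prints for the hung process is a descriptor that select
is being asked to watch but that the kernel no longer considers open.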

The site sees moderate traffic, with load averages usually around
0.3.
-- 
Posted via http://www.ruby-forum.com/.

Roger Pack

2008-Sep-27 15:20 UTC

> gettimeofday({1219834026, 235289}, NULL) = 0
> select(4, [3], [0], [], {0, 905241})    = -1 EBADF (Bad file descriptor)
> gettimeofday({1219834026, 235477}, NULL) = 0
> select(4, [3], [0], [], {0, 905053})    = -1 EBADF (Bad file descriptor)
> gettimeofday({1219834026, 235654}, NULL) = 0
> select(4, [3], [0], [], {0, 904875})    = -1 EBADF (Bad file descriptor)
> gettimeofday({1219834026, 235829}, NULL) = 0
> select(4, [3], [0], [], {0, 904700})    = -1 EBADF (Bad file descriptor)
> gettimeofday({1219834026, 236017}, NULL) = 0
> select(4, [3], [0], [], {0, 904513})    = -1 EBADF (Bad file descriptor)
> gettimeofday({1219834026, 236192}, NULL) = 0
> select(4, [3], [0], [], {0, 904338})    = -1 EBADF (Bad file descriptor)
> gettimeofday({1219834026, 236367}, NULL) = 0
> ...
> 
> I used lsof and found that the process used 67 file descriptors (lsof -p
> PID |wc -l)
> 
You could try evented mongrel.

I think the real problem is that internally Ruby's select mechanism
isn't designed to handle a -1 return from select. I'd call that a
Ruby bug, should that be the case.

In Python when this happens it raises an exception and relies on the 
caller to loop through each socket and discover the offending one.  I 
can only hope that 1.9 does better at this situation.
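
The per-socket loop described above might look like this in Ruby. It is
a minimal sketch, not Mongrel's or Ruby's own code: find_bad_ios is a
hypothetical name, and the rescue list assumes current Ruby behavior,
where an IO object already known to be closed raises IOError and a
descriptor closed underneath the object raises Errno::EBADF. It has not
been checked against 1.8.6.

```ruby
# Probe each IO on its own with a zero-timeout select and
# collect the ones that raise instead of returning.
def find_bad_ios(ios)
  ios.select do |io|
    begin
      IO.select([io], nil, nil, 0)
      false
    rescue IOError, Errno::EBADF
      true
    end
  end
end

# Example: two pipes, with the read end of the second one closed.
r1, w1 = IO.pipe
r2, w2 = IO.pipe
r2.close

bad = find_bad_ios([r1, r2, w1])
puts bad.length
```

Probing one object at a time costs one extra select call per socket, but
it only needs to run after a multi-socket select has already failed.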

I believe Ruby's select also doesn't handle more than 1024 socket
descriptors (it ignores those above 1024), so I'd call it less than
perfect. [1]

-=R
[1] http://rubyforge.org/tracker/index.php?func=detail&aid=20088&group_id=426&atid=1698
-- 
Posted via http://www.ruby-forum.com/.
