I don't know if this is a problem with EM per se, but I was hoping for some insights. I have an EM server running on localhost, and a lot of clients running inside another EM process. Sometimes when the clients connect to the server, the following is what occurs (ignore the [TCP CHECKSUM INCORRECT]s; I think those warnings themselves are wrong):

No.  Time       Source     Destination  Protocol  Info
51   9.067843   127.0.0.1  127.0.0.1    TCP  51815 > 7779 [SYN] Seq=0 Win=65535 [TCP CHECKSUM INCORRECT] Len=0 MSS=16344 WS=1 TSV=895386257 TSER=0
52   9.067964   127.0.0.1  127.0.0.1    TCP  7779 > 51815 [SYN, ACK] Seq=0 Ack=1 Win=65535 [TCP CHECKSUM INCORRECT] Len=0 MSS=16344 WS=1 TSV=895386257 TSER=895386257
53   9.067990   127.0.0.1  127.0.0.1    TCP  51815 > 7779 [ACK] Seq=1 Ack=1 Win=81660 [TCP CHECKSUM INCORRECT] Len=0 TSV=895386257 TSER=895386257
85   14.090994  127.0.0.1  127.0.0.1    TCP  51815 > 7779 [FIN, ACK] Seq=1 Ack=1 Win=81660 [TCP CHECKSUM INCORRECT] Len=0 TSV=895386307 TSER=895386257
86   14.091087  127.0.0.1  127.0.0.1    TCP  7779 > 51815 [ACK] Seq=1 Ack=2 Win=81660 [TCP CHECKSUM INCORRECT] Len=0 TSV=895386307 TSER=895386307
87   14.091566  127.0.0.1  127.0.0.1    TCP  7779 > 51815 [FIN, ACK] Seq=1 Ack=2 Win=81660 [TCP CHECKSUM INCORRECT] Len=0 TSV=895386307 TSER=895386307
88   14.091616  127.0.0.1  127.0.0.1    TCP  51815 > 7779 [ACK] Seq=2 Ack=2 Win=81660 [TCP CHECKSUM INCORRECT] Len=0 TSV=895386307 TSER=895386307

So the client completes the handshake, then arbitrarily sends a FIN about five seconds later, and never sends the data I instructed it to send. The client and the server then both call unbind, as expected, and are left wondering what just happened (note that I never tell it to close). This is in a client with hundreds of connections.

Is this a known problem with load testing on localhost? I've seen it before on win32 (this instance happens to be on OS X 10.5). Am I just generally running out of buffer space, so the kernel drops my packets? Thoughts?

Thanks!
-Roger
It appears that if a connection doesn't 'connect' within X seconds (specified as 4), then EM considers it a 'dead' connection and immediately closes the new connection. It's an application-specific setting (think of a ping timing out after a while--how long is that 'while'?). Setting it higher fixes it. So if your server is more than 4 seconds away... ugh.

Thanks!
-Roger
On Dec 29, 2007 8:49 PM, Roger Pack <rogerpack2005 at gmail.com> wrote:
> It appears that if a connection doesn't 'connect' within X seconds
> (specified as 4), then EM considers it a 'dead' connection and
> immediately closes the new connection. It's an application specific
> setting (think ping timing out after awhile--how long is that
> 'awhile'?)
> Setting it higher fixes it.
> So if your server is more than 4 seconds away... ugh.

It would be pretty easy to make the timeout interval configurable. But are you sure your use case is typical? Or were you just experimenting? If a TCP server takes multiple seconds to handshake, you've got a more basic problem in your application. (Unless you're satellite-linking to Antarctica or something like that.)
The conditions where it occurs, I think, are when you're making a lot of connections to localhost, so you have a lot of full queues and things like that--a lot of traffic in flight--and it takes those first packets a while to get through the queues back and forth between the two hosts. For example, if you have a BitTorrent client that is serving and downloading a lot of files, it might take 4s or more for an incoming ACK packet to get through the queue among all the incoming data.

That said, if I were designing it I might suggest waiting 120s by default. That would be more 'conservative', allowing for TCP's faults, and would reduce the chance of this error cropping up again during somebody else's load testing. It would then behave more like set_connection_timeout, which doesn't have a predefined value, so the user understands why their connections close later--because they had to set it manually in the code.

This also points out, again, the usefulness of a function like get_unbind_reason or get_unbind_status. My $.02 for the day.

Thanks all!
-Roger
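As a sketch of the get_unbind_reason idea above (using only EM's existing close_connection and unbind; the module and method names here are invented for illustration, not part of EM's API), an application can record its own reason before closing so that unbind can tell a deliberate close apart from a surprise one:

require 'eventmachine'

# Hypothetical application-level stand-in for a get_unbind_reason API.
module TracksUnbindReason
  attr_reader :unbind_reason

  # Record why we are closing before asking EM to close.
  def close_with_reason(reason)
    @unbind_reason = reason
    close_connection
  end

  def unbind
    # If no reason was recorded, the close came from the peer, a timeout,
    # or something inside the reactor.
    @unbind_reason ||= :closed_by_peer_or_reactor
    puts "connection unbound: #{@unbind_reason}"
  end
end

A handler would mix this in (class MyHandler < EM::Connection; include TracksUnbindReason; end) and call close_with_reason(:idle) instead of close_connection directly.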
On Dec 31, 2007 9:32 AM, Roger Pack <rogerpack2005 at gmail.com> wrote:
> The conditions where it occurs I think are when you're connecting a
> lot to localhost so you have a lot of full queues and things like
> that--a lot of network bandwidth and so it takes those first packets
> awhile to get through the queue back and forth between the two hosts.
> So for example if you have a bittorrent client that is serving and
> downloading a lot of files, it might take 4s or more for an incoming
> ack packet to get through the queue among all the incoming data.

Yeah, I can see that, actually, since user-written code could be causing the bottleneck. EM interleaves reads and accepts in order to keep from starving either, but it has no control over how long it takes user code to respond to read events.

I'd be in favor of adding a flag to set the default connect-pending timeout. I don't like setting it to 120 seconds by default. (I assume you got that from the default FIN-WAIT time on some kernels?)
> I'd be in favor of adding a flag to set the default connect-pending timeout.
> I don't like setting it to 120 seconds by default. (I assume you got that
> from the default FIN-WAIT time on some kernels?)

That would work--so by default they don't time out, and there's a flag to turn it on? That would be cool.
-Roger
My latest problem seems to be that at times EM will stop reading from sockets, even though those sockets still have queued data in them. I'm actually not sure if this is EM's fault or my own, or if maybe EM is abandoning reading from sockets too early, or something?

       send-q  rec-q
tcp4   0       18460   127.0.0.1.7779    127.0.0.1.54550   FIN_WAIT_1
tcp4   81660   0       127.0.0.1.54550   127.0.0.1.7779    ESTABLISHED

This is a connection between two processes, both running on localhost, both running EM single-threaded, and both not firing any EM events. I have yet to check whether 'unbind' has already been called on one or both sockets, but you can see that the bottom socket is trying to send lots of packets to the top socket. This uses up available kernel buffer space.

Will try to figure it out. Wish me luck.
-Roger
No, there's really no way to avoid timing out connect-attempts that fail. Otherwise, over time you'd end up not freeing the descriptors. I was suggesting to keep the existing timeout and give a method to set its value.

I'm in the middle of some huge and difficult performance-related changes. When that's all checked in, I'll add the new timeout method.

On Jan 1, 2008 11:51 AM, Roger Pack <rogerpack2005 at gmail.com> wrote:
> > I'd be in favor of adding a flag to set the default connect-pending timeout.
> > I don't like setting it to 120 seconds by default. (I assume you got that
> > from the default FIN-WAIT time on some kernels?)
>
> That would work--so by default they don't timeout and there's a flag
> to turn it on? That would be cool.
> -Roger
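A sketch of how the per-connection setter described here might be used, assuming an accessor named pending_connect_timeout= on EM::Connection (that name comes from later EventMachine releases and is an assumption relative to this thread):

require 'eventmachine'

class SlowHandshakeClient < EM::Connection
  def post_init
    # Allow up to 20 seconds for the TCP handshake instead of the small
    # built-in default discussed above. Accessor name assumed.
    self.pending_connect_timeout = 20
  end

  def connection_completed
    send_data "hello\n"
  end

  def unbind
    puts "unbound (handshake may have timed out or the peer closed)"
  end
end

EM.run { EM.connect('127.0.0.1', 7779, SlowHandshakeClient) }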
If the top socket is in FIN_WAIT_1, it has already been closed, either by your program, or by an exception, or something else. What platform are you running this on? And is this a repeatable problem that you could send a test case for?

On Jan 1, 2008 11:57 AM, Roger Pack <rogerpack2005 at gmail.com> wrote:
> My latest problem seems to be that at times EM will stop reading from
> sockets, but those sockets will still have queued data in them.
> [...]
> This is a connection between two processes, both running on localhost,
> both running EM single threaded, and both not firing any EM events.
> [...]
> Will try to figure it out. Wish me luck.
> -Roger
Question on a bug: after a certain load, EM 'naturally' runs out of descriptors. After that point, however, it seems that select (normal select, at least) returns 22 (EINVAL, invalid argument)--always. Perhaps the descriptor list gets corrupted? Since select keeps returning this, EM never checks its 'still good' sockets for read/write status; it just sleeps for a second (I think it assumes the error will clear itself up). A good thing might be to clear up the current bug, and also to have it check for error statuses and act accordingly.

Should this be a real bug, it would also explain why, on Windows, servers quit accepting after a certain load, which I run into often, as Windows only allows 256 file descriptors.

Anyway, something to think about. Now some thoughts on EM speedup:

One way to speed up EM might be to have the user specify which functions they actually override, and only call those. For example, if a user never needs 'connection_completed', then never call it. Of course, I can't imagine many 'real' protocols that don't use receive_data, so my idea is pretty moot except for maybe unbind and connection_completed (post_init?). Oh wait, EM does this already :) Minimal gain, then. It seems EM is about as fast as it can go. Dunno. :)

Another thought would be to let users choose whether rb_select is used or select itself (in single-threaded mode, you can get away with using straight select, it seems). Wonder if that would be worthwhile or not.

Some more oddities for speed gain would be an option to 'only read data in large chunks' to save on message-calling overhead. One might also make heartbeats optional (or keep a list of sockets that have requested them). Also, I think it's in there already, but a 'reusable' select array might help too. Having select time out at 'the next timer firing' might help as well.

Now some random thoughts, for fun.

It would be nice to 'save' errno away somewhere, so that we can tell why certain calls fail.

As mentioned, it seems that socket 'double unbind' currently (only noticed this after the latest SVN, though, so it's probably not too hard to find).

I have received this error before--not sure if the assertion is right or not... it could well be :)

Assertion failed: nbytes > 0, file ed.cpp, line 595  # win32
Assertion failed: (nbytes > 0), function _WriteOutboundData, file ed.cpp, line 596.  (on mac os x, after hitting ctrl-c to interrupt current transfers)  # os x

Anyway... will look into the bug. Viva EM.
-Roger

On Jan 1, 2008 10:48 AM, Francis Cianfrocca <garbagecat10 at gmail.com> wrote:
> If the top socket is in FIN_WAIT_1, it has already been closed, either by
> your program, or by an exception, or something else. What platform are you
> running this on? And is this a repeatable problem that you could send a
> test-case for?
On Jan 2, 2008 8:32 PM, Roger Pack <rogerpack2005 at gmail.com> wrote:
> Question on a bug:
> After a certain load EM 'naturally' runs out of descriptors.
> After this point, however, it seems that select (normal select, at
> least) returns 22 (invalid argument)--always.
> [...]
> Should this be a real bug, it would also explain why, on windows,
> after a certain load, servers quit accepting, which I run into often,
> as windows only allows 256 file descriptors.

If select is returning EINVAL, that's an obvious bug. Do you have a repeatable test case? Does this happen on Windows? OSX is another system where the default number of file descriptors is only 256. I wonder if the problem is the first parameter to the select call (maxsockets), although that parameter is (supposed to be) ignored on Windows.

> One way to speed up EM might be to have the user specify which
> functions they actually override, and only call those.
> [...]

I've generally found that eliminating rb_funcalls is a very helpful way to improve performance, *except* when there is a lot of network I/O going on, which tends to dominate the profile. Not sure there's much bang for the buck here.

> Another thought would be to allow users to choose if rb_select is used
> or select itself (in single threaded mode, you can get away with using
> straight select, it seems). Wonder if that would be worthwhile or
> not.

In Ruby 1.8.x, if you call rb_thread_select in a program with no additional threads besides the main one, the performance impact is almost unmeasurable. If you have even a single additional Ruby thread, even if it's only sleeping, the performance impact is huge. For Ruby 1.9, EM uses the newer thread_nonblocking_region. I haven't profiled performance under Ruby 1.9, which is still very buggy anyway.

> Some more oddities for speed gain would be an option to 'only read
> data in large chunks' to save on message calling overhead.
> [...]

There's an endless amount of lore on how to make select faster with large sets, going back to the early Web days, when large sets first started happening. None of it is really as good as using things like epoll and kqueue.

> Now some random thoughts, for fun.
>
> It would be nice to 'save' errno away somewhere, so that we can tell
> why certain calls fail.
>
> As mentioned, it seems that socket 'double unbind' currently (only
> noticed this after the latest SVN, though, so it's probably not too
> hard to find).

Since there's no verb in your dependent clause, I don't know what you're saying here :-), but I do want to know.
The recursive unbind problem is something I really want to solve.

> I have received this error before--not sure if the assertion is right
> or not... it could well be :)
>
> Assertion failed: nbytes > 0, file ed.cpp, line 595  # win32
> Assertion failed: (nbytes > 0), function _WriteOutboundData, file
> ed.cpp, line 596.  (on mac os x, after hitting ctrl-c to interrupt
> current transfers)  # os x

This sounds benign. Is it repeatable on the Mac?
> In Ruby 1.8.x, if you call rb_thread_select in a program with no additional
> threads beside the main one, the performance impact is almost unmeasurable.
> If you have even a single additional Ruby thread, even if it's only
> sleeping, the performance impact is huge. For Ruby 1.9, EM uses the newer
> thread_nonblocking_region. I haven't profiled performance under Ruby 1.9,
> which is still very buggy anyway.

The other day I remember thinking 'what is taking up 100% CPU?', and it turned out to be a Ruby process with 16 threads or so, each using network I/O. I was reminded of how poorly Ruby performs in a multi-threaded environment :) Its native threads scare me.

> If select is returning EINVAL, that's an obvious bug. Do you have a
> repeatable test case? Does this happen on Windows?
> OSX is another system where the default number of file descriptors is only
> 256. I wonder if the problem is the first parameter to the select call
> (maxsockets), although that parameter is (supposed to be) ignored on Windows.

On Windows, after a few seconds of high load I begin to get (from a custom printout line):

select failed 10038 An operation was attempted on something that is not a socket

(Same on OS X, just EINVAL instead; and there I can move the file-descriptor limit up, so it's not as common.)

However, when an existing connection closes, select (I think) seems to work, because I can accept (one more) connection; then it keeps erring. My theory is that when you run out of file descriptors, bad things happen to existing sockets. Maybe acceptors 'err' when they are passed to select. I'm not sure, though, as it seems that some 'writable' sockets err and some 'readable' sockets err. Maybe pending connections (typically readable) also err on select if there aren't file descriptors available. I'm not sure.

I haven't totally figured it out yet, but it appears that looping through beforehand and excluding the sockets that err from the select descriptors avoids the error, and EM seems to handle things according to spec. I first thought that if a socket erred on select it was totally toast--however, it might be that those sockets 'become readable' later, so just excluding them is a good option. I'm honestly not sure why this stuff occurs. I am also not sure how sockets respond with epoll.

> > It would be nice to 'save' errno away somewhere, so that we can tell
> > why certain calls fail.

This is a request for a function to access 'errno', so we can tell why connect_server fails (is it out of descriptors?), or for strerror(errno).

> > As mentioned, it seems that socket 'double unbind' currently (only
> > noticed this after the latest SVN, though, so it's probably not too
> > hard to find).
>
> Since there's no verb in your dependent clause, I don't know what you're
> saying here :-), but I do want to know. The recursive unbind problem is
> something I really want to solve.

I just commented out the 'throw' clause for double unbinds (hack job), and it works for now for me. Monkey patches, here we come. I note that it is only a 'recent' problem in the code, I believe.

> > Assertion failed: nbytes > 0, file ed.cpp, line 595  # win32
> > Assertion failed: (nbytes > 0), function _WriteOutboundData, file
> > ed.cpp, line 596.  (on mac os x, after hitting ctrl-c to interrupt
> > current transfers)  # os x
>
> This sounds benign. Is it repeatable on the mac?

I've seen it once on Mac OS X, under heavy load. It is repeatable under heavy load, at least on win32.
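The exclude-the-erring-socket workaround described above can be prototyped from Ruby with IO.select; this is a sketch of the idea rather than EM's C++ code, and it assumes that the bad descriptors show up as IOError/EBADF/EINVAL when probed individually:

# Probe each candidate IO with a zero-timeout select; anything that makes
# select itself raise is excluded from the next real select call.
def selectable_ios(ios)
  ios.select do |io|
    begin
      IO.select([io], nil, nil, 0)   # zero timeout: ask without waiting
      true
    rescue IOError, Errno::EBADF, Errno::EINVAL
      false                          # leave it out rather than poison select
    end
  end
end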
I just commented it out and things seem to be fine. I think I can reproduce it consistently.

Well, that's about it! Thanks all!
-Roger
Question: on win32, do you need to propagate the fderr group (select's fourth param?) in order to ascertain whether a socket does NOT connect well?

Also, http://itamarst.org/writings/win32sockets.html mentions that FD_SETSIZE should be resized--anybody know if that is still the case? I guess it was in 2001, but hey :)

Now, some more random bugs that have happened:

It appears that sometimes, on win32, my program will reach a state where select always returns 'immediately' with a value of 1, and does read on a socket (not the loopbreak reader), but doesn't close that socket and doesn't do any callbacks into code. I would look into it, but I think I'm gonna bail on win32 and go back to OS X. Any ideas?

./eventmachine/svn/version_0/lib/eventmachine.rb:226: [BUG] Segmentation fault
ruby 1.8.6 (2007-10-21) [i386-mingw32]

Happens infrequently, but does happen.

Anyway, if select fails and you then check each socket, one by one, and only select on the valid sockets, it seems to work still.

Wish me luck.
-Roger
On Jan 3, 2008 8:57 PM, Roger Pack <rogerpack2005 at gmail.com> wrote:
> Question: on win32 do you need to propagate the fderr group (select's
> 4th param?) in order to ascertain whether a socket does NOT connect
> well?

We don't do anything (for now, at any rate) with fderr. On Windows, failures-to-connect are caught by the heartbeat mechanism. This is obviously not ideal, but I don't know how Windows signals connect errors. Being Windows, there's no definitive documentation. And being Windows, the behavior is likely to be different for every version of the OS that's out there. That's why I punted on this one.

> Also, from http://itamarst.org/writings/win32sockets.html it mentions
> that FD_SETSIZE should be resized--anybody know if that is still the
> case? I guess it was in 2001, but hey :)

FD_SETSIZE is always sized to 1024. That's done in extconf.rb (it generates a compiler flag). Maybe you could try setting it down to 256 for Windows and Mac and see if that changes anything?

> [...]
> Anyway if select fails and you then check each socket, one by one, and
> only select on the valid sockets, it seems to work still.

What do you mean by "valid" sockets? Are you calling an ioctl or something to see if they're valid?
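For anyone wanting to experiment with the value Francis mentions, the generic mkmf idiom for emitting such a compiler flag looks like this; the exact line in EM's extconf.rb may differ, and 'rubyeventmachine' is assumed as the extension name:

# extconf.rb fragment (sketch): pick a smaller FD_SETSIZE on Windows/OS X,
# as suggested above, and pass it to the compiler as a -D flag.
require 'mkmf'

setsize = (RUBY_PLATFORM =~ /mswin|mingw|darwin/) ? 256 : 1024
$defs.push("-DFD_SETSIZE=#{setsize}")

create_makefile('rubyeventmachine')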
> We don't do anything (for now at any rate) with fderr. On Windows,
> failures-to-connect are caught by the heartbeat mechanism. This is obviously
> not ideal but I don't know how Windows signals connect errors. Being
> Windows, there's no definitive documentation. And being Windows, the
> behavior is likely to be different for every version of the OS that's out
> there. That's why I punted on this one.

Searching seemed to yield http://msdn2.microsoft.com/en-us/library/ms740141.aspx, which describes one way. Hard to find, though. Unfortunately this is somewhat of a limitation on Windows, with its few file descriptors per process :)

> FD_SETSIZE is always sized to 1024. That's done in extconf.rb (it generates
> a compiler flag). Maybe you could try setting it down to 256 for Windows and
> Mac and see if that changes anything?

Sweet. Thanks for doing that. It seems that, at least on Mac OS, http://www.delorie.com/gnu/docs/glibc/libc_248.html implies that, of all file descriptors created (say you run ulimit -n 2000, so you can create descriptors numbered up to 2000), if any of those descriptor numbers are > FD_SETSIZE, then you can't pass them to select. I think this may be why select returns EINVAL after a while.

There may also be a small conflict in how many sockets EM allows--if it has a total of > 1024 created at any one time, then the fd_sets can't fit them all for a single select, and so that might also be a reason it returns EINVAL.

In other random news... there are sometimes random pauses, but I think they may be caused by attempting to do name resolution when you don't have any buffer space available (but who knows). At least they're only pauses, and pretty rare. Probably my code, but hey, thought I'd throw it out there in case anybody ran into it.

Any thoughts? Thanks!
-Roger
On Jan 3, 2008 10:03 PM, Roger Pack <rogerpack2005 at gmail.com> wrote:
> It seems that at least on mac os,
> http://www.delorie.com/gnu/docs/glibc/libc_248.html seems to imply
> that, of all file descriptors created (say you run ulimit -n 2000 --
> you could create descriptors up to 2000), if any of those descriptor
> numbers are > FD_SETSIZE, then you can't pass them to select. I think
> this may be why select is returning EINVAL after awhile.
> There may also be a small conflict in how many sockets EM allows--if
> it has a total of > 1024 created at any one time, then the fd_set's
> can't fit them all for a single select, and so that might also be a
> reason it returns EINVAL.

It's definitely true that FD_SETSIZE controls how many descriptors you can pass to select. It's also true that EM is a library linked into a Ruby process, so Ruby's FD_SETSIZE is what controls the outcome. And in Ruby, that's never larger than 1024 descriptors. The only way to solve this is by using epoll on Linux 2.6 and kqueue on OS X or BSD. Kqueue support was added to EM after the last release, so try syncing to the head revision. Read the document titled EPOLL and mentally substitute "kqueue" for "epoll" wherever it appears.

> In other random news... there are sometimes random pauses, but I think
> that they may be caused by attempting to do name resolution when you
> don't have any buffer space available (but who knows). At least
> they're only pauses, and pretty rare. Probably my code, but hey,
> thought I'd throw it out there in case anybody ran into it.

Yes. Definitely try using IP addresses instead of hostnames to see if the problem goes away. DNS resolution in the standard Ruby library actually spins threads and is horrendously slow. I wrote an evented DNS resolver/cache a few months back, which is far faster than the standard one. I haven't added it to the distro yet, but I should.
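A minimal way to act on the "use IP addresses instead of hostnames" advice, assuming the target hosts are known before the reactor starts; resolution here uses Ruby's standard Socket library, outside the event loop:

require 'socket'
require 'eventmachine'

targets = %w[example.com example.org]   # placeholder hostnames

# Resolve everything to numeric addresses before EM.run, so the slow,
# thread-spinning resolver never blocks the reactor.
addresses = targets.map { |host| Socket.getaddrinfo(host, nil).first[3] }

EM.run do
  addresses.each { |ip| EM.connect(ip, 80) }
end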
> It's definitely true that FD_SETSIZE controls how many descriptors you can
> pass to select. It's also true that EM is a library linked into a ruby
> process, so Ruby's FD_SETSIZE is what controls the outcome. And in Ruby,
> that's never larger than 1024 descriptors.

Sweetness. It turns out that if you arbitrarily close sockets whose descriptor number is > FD_SETSIZE, then EM works on OS X! Yea! Select no longer returns EINVAL. I can submit the patch if you'd like.

The reason this is necessary is that OS X by default allows 256 descriptors. When you hit that limit, you naturally want to raise the limit to something larger. The gotcha is that you can raise it to 10000--but if you go past 1024, you can create 'valid' descriptors that fail when passed to select. The fix is the patch (just check descriptors on creation to see if they'll fit within an fd_set, if you're using select), or to use kqueue. Dunno if this helps the couple of problems on win32. Also dunno if such a thing would be good for epoll/kqueue as well.

> Yes. Definitely try using IP addresses instead of hostnames to see if the
> problem goes away. DNS resolution in the standard Ruby library actually
> spins threads and is horrendously slow. I wrote an evented DNS
> resolver/cache a few months back, which is far faster than the standard one.
> I haven't added it to the distro yet but I should.

Grin. Yep, you got me--I thought it was something else, but it was DNS again. Silly me.

Another optimization thought: after select, we run through every socket, then check the loop breaker. An optimization would be to check whether s == 1 and the loop breaker is in the set; then you don't have to run through the loop to check each socket. Also, theoretically you only need to run through the fd_sets until you've found 's' worth of readable/writable sockets, so you can break the loop early. Just some thoughts.

Thanks all.
--
-Roger Pack
"For God hath not given us the spirit of fear; but of power, and of love, and of a sound mind" -- 2 Timothy 1:7
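The "check them on creation" fix can be illustrated at the Ruby level; FD_SETSIZE below is assumed to be 1024 (the value extconf.rb compiles in), and the guard is a sketch of the patch's idea rather than the patch itself:

require 'socket'

FD_SETSIZE = 1024   # assumed; matches the compiled-in value discussed above

# A select()-based reactor cannot place a descriptor numbered >= FD_SETSIZE
# into an fd_set, so refuse it at creation time instead of letting a later
# select call fail with EINVAL.
def accept_if_selectable(server)
  sock = server.accept
  if sock.fileno >= FD_SETSIZE
    sock.close        # dropping one connection beats poisoning every select
    nil
  else
    sock
  end
end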
On Jan 4, 2008 12:55 PM, Roger Pack <rogerpack2005 at gmail.com> wrote:
> Sweetness. Turns out if you arbitrarily close sockets (descriptor
> number) > FD_SETSIZE then EM works on os x! Yea! Select no longer
> returns EINVAL. I can submit the patch if you'd like.
> [...]

I want the patch :-). Do try the kqueue implementation. I'd like to know if it works for anyone besides me.

> Another optimization thought: after select we run through every
> socket, then check the loop breaker. An optimization would be to
> check if s==1 and the loop breaker is in the set, then you don't have
> to run through the loop to check each socket. Also theoretically you
> only need to run through the fd_sets until you've found 's' worth of
> readable/writable sockets, so you can break the loop early. Just some
> thoughts.

Think about it for a moment. If s==1, then the process by definition isn't heavily loaded, so it doesn't need optimizing. This might make it go faster in a benchmark, but there's no benefit in the real world. :-)
> I want the patch :-).

I haven't tested it on win32--I assume you'd like me to do that and get it 'perfect' first, or do you just want it now? I assume the polished one?

I may try to create a test case that shows how this is broken: one that fails on the old implementation and doesn't with the new. I think it's basically 'use up all your file descriptors', then close one and open one--it should connect. I'll see if I can create one, too.

Another question--are you more concerned with raw speed or with guaranteed functionality for patches? Like extra assertions--I assume leave them in?

> Do try the kqueue implementation. I'd like to know if it works for anyone
> besides me.

I should do that. I'm interested in what happens when you hit the upper boundary--if you try to allocate too many ports or whatnot. Does it still need boundary checking? That type of thing. Theoretically the problem should just go away.

Thanks for all the work and time.
-Roger
On Jan 4, 2008 10:17 PM, Roger Pack <rogerpack2005 at gmail.com> wrote:
> I may try to create a test case that shows how this is broken that
> fails on the old implementation and doesn't with the new. I think
> it's basically 'use up all your file descriptors' then close one and
> open one--it should connect. I'll see if I can create one, too.

All the testing you can do is most welcome. And if you can make a test case that can go into the distro, that's really superb. I was just thinking we need more unit tests that are stress tests rather than just correctness tests.

> Another question--are you more concerned with raw speed or with
> guaranteed functionality for patches? Like extra assertions--I assume
> leave them in?

Leave assertions and parameter-checks in. Raw speed is of no value if the code isn't 100% reliable.
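A rough shape for the descriptor-exhaustion stress test being discussed, assuming a server listening on 127.0.0.1:7779 as in the earlier traces; the exception EM raises when it cannot create another descriptor is an assumption here, and the half-second delay is just to give the reactor time to actually release the closed descriptor:

require 'eventmachine'

HOST, PORT = '127.0.0.1', 7779

class Probe < EM::Connection
  def connection_completed
    puts 'fresh connect succeeded after freeing a descriptor'
    EM.stop
  end
end

EM.run do
  conns = []
  begin
    # Open connections until descriptor creation fails
    # (exact exception class is an assumption).
    loop { conns << EM.connect(HOST, PORT) }
  rescue StandardError
    conns.pop.close_connection
    EM.add_timer(0.5) { EM.connect(HOST, PORT, Probe) }
  end
end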
> > Another optimization thought: after select we run through every
> > socket, then check the loop breaker. An optimization would be to
> > check if s==1 and the loop breaker is in the set, then you don't have
> > to run through the loop to check each socket. [...]
>
> Think about it for a moment. If s==1, then the process by definition isn't
> heavily loaded so it doesn't need optimizing. This might make it go faster
> in a benchmark but there's no benefit in the real world. :-)

Another thought is that it allows the code that broke the loop to be executed more quickly. Also, for those that use next_tick constantly, their loop would be executed more quickly. And it's easy--that's why I suggested it :)

But, as you suggested, the only real way to improve speed is to profile and kill the inefficiencies in the bottleneck--not in the typical use cases (though this one might possibly do something).

Thanks.
-Roger
It looks like we might be able to work in _DARWIN_UNLIMITED_SELECT (a compile-time define) to lift select's descriptor limit on Mac OS X and help us here too, though kqueue works well.
As a note, I also noticed that, inexplicably, get_sockname will sometimes fail on Windows. I don't know how this is really possible, but it does. Go figure :)
-Roger

> On windows (at least mine--mingw), it appears that setting FD_SETSIZE
> to more than 64 makes it so that some sockets are ignored (see the
> attached test). Not sure why. Wonder if using winsock2 would work
> better. The current extconf.rb leaves it at 64, so we should be good
> (should be fully functional as it is for win32).
>
> That being said, the patch still doesn't fix 'all' the problems on
> win32. It fixes several, but not all.
>
> Sometimes select with the current code base fails on Windows because,
> of all the sockets, a few of them are 'bad'. Running through each socket
> one at a time after a failed select, and checking whether each 'is the
> failing socket' by selecting with a timeval of {0,0}, catches 'the
> problem socket' 50% of the time. Sometimes, however, even that is not
> enough. Each socket by itself passes, but then select fails again.
> So I made the assumption that checking each socket with a timeval of
> {0,1} would find those. It might not.
> The problem of 'weirdness even given select {0,0}' seems to happen
> after I open some sockets, then some file descriptors; then selects
> fail. Perhaps they're getting ploughed.
>
> It also appears, separately, that sometimes select returns immediately
> with the value 3, though no sockets are in select except the loop
> breaker. That was odd.
> So overall I'd say there are still some problems on win32 that cause
> select to return immediately, and in error. I probably won't take a
> look at them unless I get really bored, as 50% of the problems seem to
> be fixed.
> That being said, the patch still doesn't fix 'all' the problems on
> win32. It fixes several, but not all.

I did have a question when I was making it, though: what should we do in the case where you 'accept' an incoming socket but it's too high-numbered (Linux), or you already have too many sockets (Windows), to actually use it? I assumed we should just close it, but wasn't sure. Thoughts?
-Roger
On Jan 11, 2008 11:32 AM, Roger Pack <rogerpack2005 at gmail.com> wrote:
> I did have a question when I was making it, though, of what to do in
> the case that you 'accept' an incoming socket but it's too high
> numbered (linux) or you already have too many sockets (windows) in
> order to use it? I assumed to just close it, but wasn't sure.

It's possible to accept a socket that's too high-numbered to work in a select set, if the process is permitted to create a larger number of descriptors than FD_SETSIZE. You're really not likely to see this on Windows, however. (On some versions of Windows, file descriptors are pointers rather than index numbers anyway, so there's no meaningful numeric comparison.) On Unix you can make it happen if you go out of your way. So you're making a good point, and it's worth validating that a descriptor is selectable.

I think that our future direction really should be away from select and toward the native high-performance polling mechanism on each platform. We have epoll and kqueue now. We should add /dev/poll for Solaris and IOCP for Windows. Then the only platforms left on select will be back versions of Linux and less common Unixes.
On Jan 11, 2008 12:58 PM, Francis Cianfrocca <garbagecat10 at gmail.com> wrote:
> I think that our future direction really should be away from select and
> toward the native high-performance polling mechanism on all platforms. We
> have epoll and kqueue now. We should add /dev/poll for Solaris and IOCP for
> Windows. Then the only platforms left on select will be back versions of
> Linux and less-common Unixes.

Absolutely. And that brings up a question.

EM.kqueue
EM.epoll

First, IMHO, if those are called, they should return a useful true/false indicating whether that mechanism is enabled or not.

Second, I think that calling those should be unnecessary; EM should use the correct mechanism for the platform, and should instead permit someone to call EM.select in order to force select's use.

Thoughts?

Kirk Haines
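For context, this is the opt-in as it stands at this point in the thread: the application asks for the mechanism before entering the reactor, and where the call is unsupported it appears to be silently ignored, which is exactly the behavior being questioned above. The true/false return and automatic selection are proposals, not current behavior:

require 'eventmachine'

# Ask for the native polling mechanism before the reactor starts.
EM.epoll    # honored on Linux 2.6 builds with epoll support
EM.kqueue   # honored on OS X / BSD with the kqueue support in the head revision

EM.run do
  EM.start_server '0.0.0.0', 7779
end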
On Jan 11, 2008 12:48 PM, Kirk Haines <wyhaines at gmail.com> wrote:
> First, IMHO, if those are called, they should return a useful
> true/false indicating whether that method is enabled or not.
>
> Second, I think that calling those should be unnecessary; that EM
> should use the correct version for the platform, and should instead
> permit someone to call EM.select in order to force select's use.

+1 :-)

--Michael
What can I do to help my patch be accepted? It does have a test as well.
-R

On Fri, Jan 4, 2008 at 9:35 PM, Francis Cianfrocca <garbagecat10 at gmail.com> wrote:
> All the testing you can do is most welcome. And if you can make a test case
> that can go into the distro that's really superb. I was just thinking we
> need more unit tests that are stress tests rather than just correctness
> tests.
>
> Leave assertions and parameter-checks in. Raw speed is of no value if the
> code isn't 100% reliable.