Not quite sure what to make of this. As a way to handle hung connections I added a timer in post_init as follows. The client connections are fairly fast: they just connect, the server encrypts the data, creates a hash out of a string, and prints the hash to the client, which closes the connection. After about 10-15 connections the server dies saying it can't add the timer, as if the connection was completed before it added the timer to the list. It doesn't die after a set number of connections, but the range is usually the same.

    def post_init
      EventMachine::add_timer( 300 ) { close_connection } # Close connection after 5 minutes
      start_tls
    end

    /usr/local/lib/ruby/site_ruby/1.8/eventmachine.rb:256:in `add_oneshot_timer': no timer (RuntimeError)
        from /usr/local/lib/ruby/site_ruby/1.8/eventmachine.rb:256:in `add_timer'
        from /usr/local/processors/tserver/lib/protocol.rb:19:in `post_init'
        from /usr/local/lib/ruby/site_ruby/1.8/eventmachine.rb:756:in `initialize'
        from /usr/local/processors/tserver/lib/protocol.rb:14:in `initialize'
        from /usr/local/lib/ruby/site_ruby/1.8/eventmachine.rb:711:in `event_callback'
        from /usr/local/lib/ruby/site_ruby/1.8/eventmachine.rb:200:in `run'
        from /usr/local/processors/tserver/lib/tserver.rb:162:in `run'
        from bin/start.rb:24
On 9/13/06, snacktime <snacktime at gmail.com> wrote:
> After about 10-15 connections the server dies saying it can't add
> the timer, as if the connection was completed before it added the
> timer to the list. It doesn't die after a set number of connections,
> but the range is usually the same.

The extension currently has a limit of 40 timers active at any given time, which is really low. Is that possibly what you're hitting? I've tested it with up to 100,000 timers in my own versions of the code and it's fine, so we should increase this limit. You can do it yourself if you want to touch the C code (MaxTimersOutstanding in ext/em.h), or I can do it if you prefer.
On 9/13/06, Francis Cianfrocca <garbagecat10 at gmail.com> wrote:
> The extension has a limit of 40 timers currently active at any given
> time, which is really low. Is that possibly what you're hitting?

No, that can't be it. It happens way before 40 connections, and the number of connections it takes to cause the error is different every time.
On 9/13/06, snacktime <snacktime at gmail.com> wrote:
> No that can't be it. It happens way before 40 connections, and the
> number of connections it takes to cause the error is different every
> time.

Never mind, that is it. I should know better than to be testing stuff like this at 2 am.
OK, so in the server it's instantiating an object for each connection, and in that object I'm setting the timer to call close_connection. What happens to the timer when the connection closes before it's fired? I would think it would try to call a method on an object that no longer exists and throw an exception, but it doesn't.
That's a verrrry interesting question. Timers are a property of the EventMachine, not of a particular connection. Sounds like you need the ability to cancel outstanding timers when you close a connection that is expecting one.
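A minimal sketch of what "cancel outstanding timers when you close a connection" could look like. It is a pure-Ruby toy model, not EventMachine code: ToyReactor, CancellingConnection, cancel_timer and the returned timer handle are all hypothetical names made up for illustration, and nothing here is claimed about the extension's API at the time of this thread.

```ruby
# Toy stand-in for the reactor's timer list. ToyReactor is hypothetical,
# not an EventMachine class.
class ToyReactor
  def initialize
    @timers = {}   # handle => [delay, block]
    @next_sig = 0
  end

  # Register a one-shot timer and hand back a handle for cancelling it.
  def add_timer(delay, &block)
    @next_sig += 1
    @timers[@next_sig] = [delay, block]
    @next_sig
  end

  # Forget a timer that has not fired yet.
  def cancel_timer(sig)
    @timers.delete(sig)
  end

  def outstanding
    @timers.size
  end
end

# Hypothetical connection that remembers its timer and cancels it on unbind.
class CancellingConnection
  def initialize(reactor)
    @reactor = reactor
    @timeout_sig = @reactor.add_timer(300) { close_connection }
  end

  def close_connection
    unbind
  end

  def unbind
    @reactor.cancel_timer(@timeout_sig) if @timeout_sig
    @timeout_sig = nil
  end
end

reactor = ToyReactor.new
conns = Array.new(100) { CancellingConnection.new(reactor) }
before = reactor.outstanding
conns.each(&:close_connection)
after = reactor.outstanding

puts "outstanding before close: #{before}, after close: #{after}"
```

The point of the design is that the connection owns the handle, so the timer's lifetime follows the connection's instead of the other way around.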
On 9/13/06, Francis Cianfrocca <garbagecat10 at gmail.com> wrote:
> That's a verrrry interesting question. Timers are a property of the
> EventMachine, not of a particular connection. Sounds like you need an
> ability to cancel outstanding timers when you close a connection that
> is expecting one.

I tried calling close_connection twice in a row in unbind, and got no errors. Seems like the return value of close_connection is just discarded?
On 9/13/06, snacktime <snacktime at gmail.com> wrote:
> I tried calling close_connection twice in a row in unbind, and no
> errors. Seems like the return value of close_connection is just
> discarded?

Unbind is called by the system in response to a connection being closed (either by your code or by the remote peer), so the connection is already gone by then.

Chris, I looked in the code and I think the only possible way for you to get the "no timer" message is if you have more than 40 timers outstanding. Is it your expectation that when a connection closes, all timers opened by event handlers on that connection get discarded automatically? Because they don't. Maybe they should be.
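To illustrate why the limit is hit even though every connection closes quickly, here is a toy model of a fixed-size timer table. Only the limit of 40 and the "no timer" message come from this thread; ToyTimerTable, its methods, and the one-connection-per-second arrival rate are assumptions made for the sketch, and it does not reproduce the extension's actual C code.

```ruby
# Toy model of a fixed-size one-shot timer table (hypothetical class).
class ToyTimerTable
  MAX_TIMERS_OUTSTANDING = 40   # the limit mentioned in the thread

  def initialize
    @timers = {}     # signature => [fire_at, block]
    @next_sig = 0
  end

  def outstanding
    @timers.size
  end

  # Register a one-shot timer; fails once the table is full.
  def add_timer(now, delay, &block)
    raise "no timer" if @timers.size >= MAX_TIMERS_OUTSTANDING
    @next_sig += 1
    @timers[@next_sig] = [now + delay, block]
    @next_sig
  end

  # Fire and remove every timer whose deadline has passed.
  def run_due(now)
    due = @timers.select { |_sig, (fire_at, _blk)| fire_at <= now }
    due.each do |sig, (_fire_at, blk)|
      @timers.delete(sig)
      blk.call
    end
  end
end

table = ToyTimerTable.new
failed_at = nil

# One short-lived connection per simulated second, each adding a 300 s timer.
# Closing the connection does nothing to its timer, so slots are only freed
# when a timer finally fires.
1.upto(100) do |n|
  now = n
  table.run_due(now)
  begin
    table.add_timer(now, 300) { }
  rescue RuntimeError
    failed_at = n
    break
  end
end

puts "add_timer failed on connection #{failed_at} with #{table.outstanding} timers outstanding"
```

In this model the 41st connection fails because none of the earlier 300-second timers has fired yet, regardless of how long ago their connections closed.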
On 9/13/06, Francis Cianfrocca <garbagecat10 at gmail.com> wrote:
> Unbind is called by the system in response to a connection being closed
> (either by your code or by the remote peer), so the connection is already
> gone by then.

I know, I was just doing that to confirm that calling close_connection after the connection was in fact already closed wouldn't raise an error, which it doesn't. Also, I believe the instance of Netstring is kept alive by the timer, or more accurately the instance method of Netstring that's passed to the timer. In the following code the connections are all under a second, and timeout_connection fires off roughly 4 seconds after the connection has already been closed. So I'll definitely need to find another way to time out connections or cancel the timer, or I'll end up with hundreds of Netstring instances just sitting there (the real timeout should be around 5 minutes, not 5 seconds).

    class Netstring < EventMachine::Connection
      def initialize *args
        super
        @linebuffer = ""
      end

      def post_init
        EventMachine::add_timer( 5 ) { timeout_connection }
        start_tls
      end

      def timeout_connection
        close_connection
      end
    end

> Chris, I looked in the code and I think the only possible way for you to get
> the "no timer" message is if you have more than 40 timers outstanding. Is it
> your expectation that when a connection closes, that all timers opened by
> event handlers on that connection get discarded automatically? Because they
> don't. Maybe they should be.

Yes, it is the 40 timer limit that was causing that.
On 9/14/06, snacktime <snacktime at gmail.com> wrote:
> Also, I believe the instance of Netstring is kept alive by
> the timer, or more accurately the instance method of Netstring that's
> passed to the timer. In the following code the connections are all
> under a second, and timeout_connection fires off roughly 4 seconds
> after the connection has already been closed. So I'll definitely need
> to find another way to timeout connections or cancel the timer, or
> I'll end up with hundreds of Netstring instances just sitting there
> (the real timeout should be around 5 minutes not 5 seconds).

Yes, exactly. If you pass a block to a timer, it's a closure like any other Ruby block, so it's going to hold object references open long after they should have been released. Looking at what you're trying to do, I'm thinking what you really want is an inactivity timeout for connections. Something like: "If this connection has no read or write activity for 300 seconds, close it." Correct?
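The closure point can be checked in plain Ruby without EventMachine. The sketch below stores the block instead of handing it to a timer; ClosureDemoConnection and timer_block are hypothetical names used only for this demonstration, and Binding#receiver needs Ruby 2.2 or later.

```ruby
# Hypothetical class mirroring the shape of the Netstring example.
class ClosureDemoConnection
  attr_reader :timer_block

  def post_init
    # Same shape as the add_timer call in the thread, but the block is
    # simply kept in an instance variable so it can be inspected.
    @timer_block = proc { timeout_connection }
  end

  def timeout_connection
    :closed
  end
end

conn = ClosureDemoConnection.new
conn.post_init
blk = conn.timer_block

# The block's binding carries a reference to the object it was created in.
holds_connection = blk.binding.receiver.equal?(conn)
result = blk.call

puts "block holds the connection: #{holds_connection}"
puts "calling the block returns: #{result.inspect}"
```

As long as something (such as a pending timer) holds the block, the block in turn holds the connection object, which is why the instances outlive their sockets.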
On 9/13/06, Francis Cianfrocca <garbagecat10 at gmail.com> wrote:
> Looking at what you're trying to do, I'm thinking what you really
> want is an inactivity timeout for connections. Something like: "If this
> connection has no read or write activity for 300 seconds, close it."
> Correct?

Yes, that would be perfect.
On 9/14/06, snacktime <snacktime at gmail.com> wrote:
> Yes that would be perfect.

Now you've got me thinking (which is always dangerous). Maybe every connection should have such a timeout by default, and if you wanted to suppress it or lengthen it, there would be an API to do so. By the way, there already is a "heartbeat" mechanism in the EM core which was intended to eventually support this requirement, so it won't be hard to do. I'll let you know when it's done.
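A rough sketch of the inactivity-timeout idea, assuming a periodic sweep that checks each connection's last read or write time. This is a pure-Ruby toy, not the EM core's heartbeat: IdleTrackedConnection, touch, idle_too_long? and the heartbeat method are hypothetical names, and time is passed in as a plain number to keep the example deterministic.

```ruby
# Hypothetical connection that records when it last saw any activity.
class IdleTrackedConnection
  attr_reader :closed

  def initialize(now, timeout)
    @timeout = timeout
    @last_activity = now
    @closed = false
  end

  # Call on every read or write.
  def touch(now)
    @last_activity = now
  end

  def idle_too_long?(now)
    now - @last_activity >= @timeout
  end

  def close_connection
    @closed = true
  end
end

# One periodic sweep over all connections stands in for one timer per
# connection. Closed connections are dropped from the list.
def heartbeat(connections, now)
  connections.reject! do |conn|
    conn.close_connection if conn.idle_too_long?(now)
    conn.closed
  end
end

busy = IdleTrackedConnection.new(0, 300)
idle = IdleTrackedConnection.new(0, 300)
conns = [busy, idle]

busy.touch(200)          # activity at t=200 resets the idle clock
heartbeat(conns, 300)    # sweep at t=300

puts "connections still open: #{conns.size}"
puts "idle closed: #{idle.closed}, busy closed: #{busy.closed}"
```

With this shape no per-connection block is ever registered, so neither the timer limit nor the closure-retention problem from earlier in the thread comes into play.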