It seems that if you run multiple threads with sockets in Ruby (some of them using select, etc.), descriptors sometimes get corrupted--especially on win32, but also to a lesser extent on Linux. This is without EM.
Now, I was wondering if anybody could help with this: if you mix the same with EM connections, then I think after a while EM servers no longer get called correctly to 'accept' incoming connections. Also, if you call stop_server, the call sometimes doesn't work, and future connections to that port are just 'eaten' and not responded to by anything. At times, descriptors seem to get 'switched', too -- one sends the input of the other's data or what not (even in EM).
The good news is it seems to work much better on Linux, but I'm not sure if it works perfectly or not. I'm not sure if this occurs when EM is used in a single-threaded environment (but am increasingly seeing this as the way to go). I hope it doesn't.
One possibility is that Ruby's current select might not be multi-thread safe. So the only for-sure thing is that some combination of win32+threads+sockets yields problems. Any thoughts? Thanks for any help.
--
-Roger
"For God hath not given us the spirit of fear; but of power, and of love, and of a sound mind" -- 2 Timothy 1:7
From: "Roger Pack" <rogerpack2005 at gmail.com>
> It seems that if you run multiple threads with sockets in Ruby (some
> of them using select, etc.) that sometimes descriptors get
> corrupted--especially in win32, but also to a lesser extent in Linux.
> This without EM.
Strange. I have dozens of multithreaded ruby servers that run for months and months without problem on linux. (I wrote them before I knew about EM...) The servers use both TCP and UDP, and are crazily multithreaded. I've run these servers continually since 2003, with the occasional restart to add new features.
I've run these servers under win32 as well, although just temporarily - so only for days instead of months - but still, I don't recall any socket-related issues.
. . . Note, I'm not saying you're wrong, I'm just saying I'm surprised. :)
Regards,
Bill
I don't have anything novel to contribute to the discussion as it relates to sockets and Ruby threads, but I will say this: One of EM's design points is to enable concurrent applications *without* threads. You won't have the kind of socket-corruption problems you've been seeing if you use EM single-threaded. But even more, you'll avoid all the difficulties of threaded programming itself.
I'm one of the people who believe that, in the general case, threading adds difficulties that are not fully offset by its benefits. That's not a prejudice but rather comes from a dozen years of experience.
On Nov 24, 2007 7:17 PM, Roger Pack <rogerpack2005 at gmail.com> wrote:
> It seems that if you run multiple threads with sockets in Ruby (some
> of them using select, etc.) that sometimes descriptors get
> corrupted--especially in win32, but also to a lesser extent in Linux.
> This without EM.
> Now, I was wondering if anybody could help with this: if you mix the
> same with EM connections, then I think after awhile EM servers no
> longer get called correctly to 'accept' incoming connections.
> Also if you call stop_server, the call sometimes doesn't work, and
> future connections to that port are just 'eaten' and not responded to
> by anything.
> At times, descriptors seem to get 'switched', too -- one sends the
> input of the others data or what not (even in EM).
>
> The good news is it seems to work much better in Linux, but I'm not
> sure if it works perfectly or not.
>
> I'm not sure if this occurs when EM is used in a single threaded
> environment (but am increasingly seeing this as the way to go). I
> hope it doesn't.
>
> One possibility is that Ruby's current select might not be multi-thread safe.
> So the only for-sure thing is that some combination of
> win32+threads+sockets yields problems.
> Any thoughts?
> Thanks for any help.
>
> --
> -Roger
> For God hath not given us the spirit of fear; but of power, and of
> love, and of a sound mind" -- 2 Timothy 1:7
> _______________________________________________
> Eventmachine-talk mailing list
> Eventmachine-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/eventmachine-talk
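The point above about concurrency *without* threads can be illustrated in plain Ruby. The following is only a sketch: `run_echo_reactor` is a hypothetical helper written for this illustration, it uses `IO.select` from the standard library, and it is not how EventMachine is implemented internally. It shows one thread accepting and serving several connections.

```ruby
require 'socket'

# Hypothetical helper (not part of EventMachine): a tiny echo "reactor" that
# serves every connection from a single thread by multiplexing with IO.select.
# It returns once it has echoed `expected_messages` chunks.
def run_echo_reactor(server, expected_messages)
  clients = []
  echoed = 0
  while echoed < expected_messages
    readable, = IO.select([server] + clients, nil, nil, 1)
    next unless readable # select timed out; poll again

    readable.each do |io|
      if io == server
        clients << server.accept          # new connection, no thread spawned
      else
        data = io.read_nonblock(4096, exception: false)
        if data.nil?                      # peer closed its side
          clients.delete(io)
          io.close
        elsif data != :wait_readable
          io.write(data)                  # echo the chunk back
          echoed += 1
        end
      end
    end
  end
ensure
  clients.each(&:close)
end

# Everything below also runs in the same (main) thread.
server = TCPServer.new('127.0.0.1', 0)    # port 0: let the OS pick a free port
port = server.addr[1]
conns = Array.new(3) { TCPSocket.new('127.0.0.1', port) }
conns.each_with_index { |c, i| c.write("hello #{i}") }

run_echo_reactor(server, conns.size)

replies = conns.map(&:read)               # read until the reactor closes its side
conns.each(&:close)
server.close
puts replies.inspect
```

Because there is only one thread, no two pieces of code ever touch a descriptor at the same time, which is the property being argued for in this message.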
Yeah, I have no idea where the corruption comes from. It seems far more common on Windows than on Linux. To recreate it, basically create (all within the same process) lots of clients and servers all talking to each other (lots as in hammer the system), and sometimes the problem occurs. It is indeed odd. Thanks for your courteous reply.
-Roger
On Nov 24, 2007 7:25 PM, Bill Kelly <billk at cts.com> wrote:
> From: "Roger Pack" <rogerpack2005 at gmail.com>
> >
> > It seems that if you run multiple threads with sockets in Ruby (some
> > of them using select, etc.) that sometimes descriptors get
> > corrupted--especially in win32, but also to a lesser extent in Linux.
> > This without EM.
>
> Strange. I have dozens of multithreaded ruby servers that run for
> months and months without problem on linux. (I wrote them before I
> knew about EM...) The servers use both TCP and UDP, and are crazily
> multithreaded. I've run these servers continually since 2003, with
> the occasional restart to add new features.
>
> I've run these servers under win32 as well, although just temporarily
> - so only for days instead of months - but still, I don't recall any
> socket-related issues.
>
> . . . Note, I'm not saying you're wrong, I'm just saying I'm surprised.
>
> :)
>
> Regards,
>
> Bill
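A reproduction along the lines described above might look roughly like the sketch below. It is an assumption about what such a harness could look like, not the code that was actually run: `hammer`, its `pairs` and `rounds` parameters, and the tagged message format are all hypothetical. Each client tags its payloads so that an echo arriving on the wrong socket would be detected as a mismatch.

```ruby
require 'socket'

# Hypothetical stress harness: `pairs` echo servers and `pairs` clients, all in
# one process, each on its own thread. Returns the number of echoes that did
# not match what the client sent (0 means no crossed or corrupted data seen).
def hammer(pairs: 8, rounds: 50)
  mismatches = Queue.new
  servers = Array.new(pairs) { TCPServer.new('127.0.0.1', 0) }

  server_threads = servers.map do |srv|
    Thread.new do
      conn = srv.accept
      while (line = conn.gets)   # echo line by line until the client closes
        conn.write(line)
      end
      conn.close
    end
  end

  client_threads = servers.each_with_index.map do |srv, id|
    Thread.new do
      sock = TCPSocket.new('127.0.0.1', srv.addr[1])
      rounds.times do |n|
        msg = "client=#{id} round=#{n}\n"   # unique tag per client and round
        sock.write(msg)
        reply = sock.gets
        mismatches << [msg, reply] unless reply == msg
      end
      sock.close
    end
  end

  (client_threads + server_threads).each(&:join)
  servers.each(&:close)
  mismatches.size
end

mismatch_count = hammer(pairs: 8, rounds: 50)
puts "mismatched echoes: #{mismatch_count}"
```

Raising `pairs` and `rounds` increases the load; a non-zero count would be the kind of 'switched' descriptor symptom reported earlier in the thread.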
Some questions...
It would appear that get_peername at times returns an empty string (even on Linux). Only under heavy load, I believe--any thoughts?
Also, I noticed that calling stop_server twice (with the same sig) will crash EM.
Also, in mingw the causes of internal errors are unfortunately not output to the screen nicely like they are in Linux. I suppose a fix might be to do a printf+raise if you are on Windows? (Just wondering what to do to make a patch.)
It would also appear that unbind is called when the peer closes the port (correct me if I'm wrong), as sometimes there is data in transit, so EM will call receive_data till the pipes are empty, then call unbind.
The current status of my incantation mentioned in the previous posts is that on win32, when I run 61 client/server pairs, it always 'stalls' at the same exact point some ~8 minutes into the program (it's single-threaded, and runs the same code well for 8 minutes, then freezes). After that point, some sockets still accept (so I know EM isn't frozen), but no data is ever received back and forth between sockets. I'm looking into it. Thoughts? In Linux it seems to work just fine (same code). It might be a mingw thing, or perhaps a 'Windows limitation' of some sort. I know it's not running out of file descriptors (it starts with 512 available and has about 480 remaining), and it doesn't seem like there are too many sockets in TIME_WAIT or what not, so it's a mystery still. At least it works in Linux :) In reality, though, the problems may be there in Linux--it just runs a lot faster, so it could be obscuring them.
The funny part is that this code started as heavily multi-threaded. As such, it would get to 11 peers then 'stall'. Then I removed a lot of the extraneous threads. It would get to 29 peers then 'stall'. Now it gets to 64 (being single-threaded). Hopefully getting toward the goal :)
Thanks all.
-Roger
PS I also added wrappers for get_sockname if anybody wants 'em.
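The ordering described above (remaining data is delivered first, then the close is reported) matches ordinary TCP socket behaviour, which can be observed with plain Ruby sockets. This sketch does not use EM; the `:receive_data` and `:unbind` symbols are only labels borrowed from EM's callback names to make the analogy visible.

```ruby
require 'socket'

server = TCPServer.new('127.0.0.1', 0)   # port 0: let the OS pick a free port
client = TCPSocket.new('127.0.0.1', server.addr[1])
peer = server.accept

client.write('last words')
client.close                              # the sender closes right after writing

events = []
loop do
  begin
    chunk = peer.readpartial(4096)        # returns whatever is still buffered
    events << [:receive_data, chunk]
  rescue EOFError                         # raised only once the buffer is drained
    events << [:unbind]
    break
  end
end

peer.close
server.close
puts events.inspect
```

The data written before the close is read in full before end-of-file is signalled, which is consistent with receive_data being called until the pipe is empty and unbind being called afterwards.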
Turns out that with IO on Mac OS X, if you have two threads, one of them running a backtick command (`uptime` or what not) and the other running EM, it sometimes freezes your program. People used to blame IO problems under win32 on Windows. I'm not so sure anymore. Take away the concurrency and voilà--Ruby works again. Thanks for the tip.
>It would appear that get_peername at times returns an empty string
>(even on Linux). Only under heavy load, I believe--any thoughts?
I think this one happens when you run get_peername while the connection is still 'pending' -- unfortunately, at least on Linux, it pauses 1s (exactly) THEN returns an empty string, so it might be worth arbitrarily returning 'not_yet_connected' as the host name, without doing the lookup. I could do the patch. Thoughts?
-Roger
On Nov 24, 2007 10:01 PM, Francis Cianfrocca <garbagecat10 at gmail.com> wrote:
> I don't have anything novel to contribute to the discussion as it
> relates to sockets and Ruby threads, but I will say this:
>
> One of EM's design points is to enable concurrent applications
> *without* threads. You won't have the kind of socket-corruption
> problems you've been seeing if you use EM single-threaded. But even
> more, you'll avoid all the difficulties of threaded programming
> itself.
>
> I'm one of the people who believe that, in the general case, threading
> adds difficulties that are not fully offset by its benefits. That's
> not a prejudice but rather comes from a dozen years of experience.
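The 'not_yet_connected' idea above can be sketched in plain Ruby. This is not the proposed EM patch: `peer_host_or_placeholder` is a hypothetical helper, it works on standard-library sockets rather than EM connections, and a socket that was never connected stands in here for a connection that is still pending.

```ruby
require 'socket'

# Placeholder string suggested in the message above.
NOT_YET_CONNECTED = 'not_yet_connected'

# Hypothetical helper: return the peer's IP address, or the placeholder when
# the socket has no peer yet (getpeername raises Errno::ENOTCONN in that case).
def peer_host_or_placeholder(sock)
  _port, host = Socket.unpack_sockaddr_in(sock.getpeername)
  host
rescue Errno::ENOTCONN
  NOT_YET_CONNECTED
end

server = TCPServer.new('127.0.0.1', 0)            # port 0: OS picks a free port
connected = TCPSocket.new('127.0.0.1', server.addr[1])
unconnected = Socket.new(:INET, :STREAM)          # never connected to anything

connected_name = peer_host_or_placeholder(connected)
pending_name = peer_host_or_placeholder(unconnected)
puts connected_name
puts pending_name

[connected, unconnected, server].each(&:close)
```

Returning a fixed placeholder avoids both the empty string and any lookup delay, at the cost of callers having to recognise the placeholder value.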
On Dec 31, 2007 9:37 AM, Roger Pack <rogerpack2005 at gmail.com> wrote:
> Turns out on mac OS X IO if you have two threads, and one of them is
> doing a backtick command `uptime` or what not, and the other running
> EM, it sometimes freezes your program. People used to blame IO
> problems on win32 on windows. I'm not so sure anymore.
> Take away the concurrency and voilà--Ruby works again.
> Thanks for the tip.
>
> >It would appear that get_peername at times returns an empty string
> >(even on Linux). Only under heavy load, I believe--any thoughts?
> I think this one is when you run get_peername while the connection is
> still 'pending.' -- unfortunately at least on Linux it pauses 1s
> (exactly) THEN returns an empty string, so it might be worth
> arbitrarily returning 'not_yet_connected' as the host name, without
> doing the lookup. I could do the patch.
> Thoughts?
I'd like to see that patch if you're willing to do it.
Another important thing is 1.9. Have you tested with that? I'm working on a case now where TCPSocket freezes on 1.9 when it runs in a separate thread (and of course in 1.9, threads are native threads).