Mark Van De Vyver
2007-May-03 16:55 UTC
[Eventmachine-talk] Some newbie clarification questions.
Hi,

Thank you for all the effort that has gone into making EventMachine (EM) available. I'm relatively new to Ruby and TCP communication, so I thought I'd check whether my understanding of EM, and my intended use, is correct.

I've read the wiki example. Is there other online documentation I've missed?

I have a 3rd party application and some Ruby code (from another 3rd party) that sends/receives data to/from this application. The application has a handshake sequence where data is sent, received (processed), and then more data is sent depending on what was received.

In the Ruby code, data is currently sent using:

    TCPSocket#syswrite( data_elem.to_s + "\0" )

where data is stored in a Queue. Sometimes data is sent using:

    TCPSocket#send( integer )

Data is read by repeated use of:

    TCPSocket#gets("\0").chop

Currently the Ruby code has the 'message' construction/parsing and sending/receiving tightly coupled. My thought was that I could use EM to simplify matters by:

- separating message construction from message sending; an outgoing 'message' would be a single string of values, each 'field' separated by "\0" (as above).
- making the sending/receiving non-blocking.
- generally benefiting from EM's infrastructure/robustness.

Some questions I have are:

1) Should I place the 'handshake' code in the connection_completed or post_init methods? I assume I define these methods in a module/class that plays the same role as the EchoServer module in the wiki example?

2) Is there a 'time line' setting out which EM methods are called when in the life of a connection?

3) Is there likely to be much of a speed improvement by writing/reading a single string (5-100 characters of null-separated values) instead of 5-30 separate writes/reads?

4) Currently data is read one field at a time using gets("\0"). Is it possible to use EM to 'read' several "\0"-separated fields at once? Unfortunately the 3rd party application has no distinct _message_ delimiter for incoming messages.

5) In the wiki example 'data' is passed to the read_data method. How does EM determine the end of the 'data'? Hopefully that makes sense.

I appreciate any light you can shed on these questions.

Regards
Mark
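Mark's first idea, building the whole outgoing message before handing it to the transport, can be sketched in plain Ruby. `encode_message` is a hypothetical helper name; it assumes, as Mark describes, that every field (including the last) is terminated by "\0", mirroring the per-field `data_elem.to_s + "\0"` writes in the existing code.

```ruby
# Hypothetical helper: turn a list of field values into one wire string,
# terminating every field with "\0", exactly as the per-field
# syswrite(data_elem.to_s + "\0") calls would have done.
def encode_message(fields)
  fields.map { |f| "#{f}\0" }.join
end

msg = encode_message([1, "abc", 2.5])
# msg can now go out in a single write (or, under EM, a single send_data).
p msg # prints "1\u00002.5" style escapes vary by Ruby version, so compare bytes instead
```

Keeping construction separate like this lets the message format be tested without any socket at all.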
Francis Cianfrocca
2007-May-03 17:38 UTC
[Eventmachine-talk] Some newbie clarification questions.
On 5/3/07, Mark Van De Vyver <mvyver at gmail.com> wrote:
>
> I've read the wiki example. Is there other online documentation I've
> missed?

There is an extensive rdoc with explanations and sample code.

Based on your description, I'm assuming that your code acts as the TCP client and the 3rd party application is the TCP server. (Meaning, your code initiates the TCP connection and the other app accepts it.) If I'm wrong, please correct me.

> 1) Should I place the 'handshake' code in the connection_completed or
> post_init methods.
> I assume I define these methods in a module/class that plays the
> same role as the EchoServer module in the wiki example?

Per my assumptions as stated above, your code is the TCP client, and you've called EventMachine#connect. If this is true, then your handshake code belongs in connection_completed. This is because EventMachine#connect issues a nonblocking connect. post_init is called after initialize (which you may also override) completes, but in general the connection to the remote server has not completed by that time.

(If you were writing a TCP server, then your handshake would go in post_init, because a server connection doesn't receive connection_completed.)

Your assumption is correct.

> 2) Is there a 'time line' setting out what EM methods are called when
> in the life of a connection?

Yes, it's deterministic and guaranteed, and it's stated in the documentation of the EventMachine::Connection methods. In short:

Any class which you pass in the handler argument of #connect, #start_server or their siblings must be a subclass of EventMachine::Connection. If you pass a Module (which is generally easier for simple things), then an instance of an anonymous subclass of EventMachine::Connection is created, and your Module is included into it.

A new connection first calls #initialize, which you may override (don't forget to call super).
Next, the new object is yielded to the block passed to #connect or #start_server, if any.
Next, #post_init is called.
Next, if the connection is a client, AND the connection completes, #connection_completed is called. connection_completed is NOT called for server sockets or for client sockets that do not complete due to error or timeout.
Next, receive_data is called zero or more times, as the connection receives data.
Finally, #unbind is called. This ALWAYS happens, regardless of whether the connection closes because you closed it, the remote peer closed it, or there was an error. #unbind is even called if a client connection fails to complete.

> 3) Is there likely to be much of a speed improvement by
> writing/reading a single string (5-100 characters of null separated
> values) instead of 5-30 separate writes/reads?

If you're writing raw I/O, the answer is definitely yes. With EM, the answer is probably no, because EM buffers and coalesces outbound data to minimize the number of syscalls it has to make.

> 4) Currently data is read one-at-a-time using gets("\0"), is it
> possible to use EM to 'read' several "\0" separated fields at once?
> Unfortunately the 3rd party application has no distinct _message_
> delimiter for incoming messages.

This confuses me because I understood "\0" to be the message delimiter, unless you mean it's the field delimiter. At any rate, this is the area that will give you the most confusion if you're like most people.

EM will call receive_data with whatever it gets from the network. It may coalesce or fragment the data in a handful of ways. You can only assume that you'll get all the data in the correct order, but you can make NO assumptions about how many receive_data calls will be made.

A naive way to do what you're proposing might be:

    def post_init
      @data = ""
    end
    def receive_data data
      @data << data
      loop {
        head,tail = @data.split("\0")
        if tail
          # head is the first field
          @data = tail
        else
          break # here, the current data has no \0 character
        end
      end

Other people may suggest more efficient ways to write this loop, but the key point is that you have to keep unconsumed data around inside your object between calls to receive_data. (The @data instance variable does that in this example.) Once you grasp this, everything else about EM is easy. This is the fundamental difference between the event-handling style and the threaded style.

> 5) In the wiki example 'data' is passed to the read_data method. How
> does EM determine the end of the 'data' - hopefully that makes sense?

EM will generally send you as much data as it can pull from the kernel read buffer in one go, up to a predetermined limit (currently about 160K if I recall). If you have relatively few connections, your reads will often be about the size of an ethernet packet. Under heavy loads, they will often be fewer and bigger. You can't depend on any of this, however, because it can and will change under different load conditions. EM always tries to provide an optimal mix of speed and "fairness."

Now observe, my whole discussion assumes TCP. If you're using UDP, then it's all different. EM will ALWAYS respect UDP message boundaries as you would expect, and receive_data will get exactly the number of bytes that were in the packet received from the network. If you get ten UDP packets, then you'll get ten #receive_data calls. (In UDP, a receive_data can send you a zero-length string, since this is valid in UDP. With TCP, receive_data will NEVER send you zero bytes.)

> I appreciate any light you can shed on these questions.

Good luck and write back if you have trouble.
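To see where a handshake fits in the order Francis lists, here is a toy harness that runs without EventMachine. `HandshakeClient`, `FakeConnection`, and the "HELLO" message are all hypothetical; the harness only mimics the documented call order (initialize, the optional block, post_init, connection_completed for a client that completes, receive_data zero or more times, unbind always) and is not how EM itself is implemented.

```ruby
# Hypothetical handler module; the handshake starts in connection_completed,
# as suggested for a client created with EventMachine#connect.
module HandshakeClient
  def post_init
    @buffer = "".dup          # per-connection state lives in instance variables
    log(:post_init)
  end

  def connection_completed
    log(:connection_completed)
    send_data("HELLO\0")      # first handshake message of a made-up protocol
  end

  def receive_data(data)
    log(:receive_data)
    @buffer << data           # a real client would parse fields here
  end

  def unbind
    log(:unbind)
  end
end

# Toy stand-in for a connection object; NOT how EventMachine is implemented.
class FakeConnection
  include HandshakeClient
  attr_reader :events, :sent

  def initialize
    @events = [:initialize]
    @sent = []
  end

  def log(name)
    @events << name
  end

  def send_data(data)
    @sent << data
  end

  # Call the hooks in the order listed in the thread.
  def self.run(completes:, chunks:)
    conn = new
    yield conn if block_given?   # the optional block passed to connect
    conn.post_init
    if completes
      conn.connection_completed
      chunks.each { |chunk| conn.receive_data(chunk) }
    end
    conn.unbind                  # always called, even if the connect failed
    conn
  end
end

ok = FakeConnection.run(completes: true, chunks: ["AC", "K\0"])
p ok.events # prints [:initialize, :post_init, :connection_completed, :receive_data, :receive_data, :unbind]
```

Running it with `completes: false` shows the other documented path: post_init and unbind fire, connection_completed never does, so a handshake placed there is never sent.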
Francis Cianfrocca
2007-May-03 17:41 UTC
[Eventmachine-talk] Some newbie clarification questions.
On 5/3/07, Mark Van De Vyver <mvyver at gmail.com> wrote:
>
> 1) Should I place the 'handshake' code in the connection_completed or
> post_init methods.

I need to clarify something I said. I suggested that for client connections, you place any initial data that needs to be sent in #connection_completed. This isn't necessary. You can call #send_data in post_init or even in your constructor. EM will buffer the outbound data inside your process, however, and it won't get copied to the kernel write buffer until after the connection completes. There's nothing wrong with this.
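A toy model of the behaviour described above: data passed to send_data before the connection completes is held in a process-side buffer, and everything pending goes out together once the connection is up. `ToyOutbound` and its methods are hypothetical and only model the idea; EM's real buffering is internal, and its write sizes are not something to rely on. In this simplified model, sends made after the connection completes go out immediately, whereas the real reactor can coalesce those too (Francis's answer to question 3).

```ruby
# Toy model (hypothetical, not EM's code): outbound data is queued in-process
# and only "written to the kernel" once the connection has completed.
class ToyOutbound
  attr_reader :kernel_writes

  def initialize
    @pending = []          # data accepted by send_data but not yet written
    @connected = false
    @kernel_writes = []    # each entry stands for one write syscall
  end

  def send_data(data)
    @pending << data
    flush if @connected
  end

  def connection_completed
    @connected = true
    flush
  end

  private

  # Coalesce everything pending into a single simulated write.
  def flush
    return if @pending.empty?
    @kernel_writes << @pending.join
    @pending.clear
  end
end

conn = ToyOutbound.new
conn.send_data("a\0")      # e.g. called from post_init
conn.send_data("b\0")
conn.connection_completed
p conn.kernel_writes       # prints ["a\u0000b\u0000"]
```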
Mark Van De Vyver
2007-May-03 18:37 UTC
[Eventmachine-talk] Some newbie clarification questions.
Hi Francis,

Thank you for the prompt and excellent response. You are correct: my Ruby code plays the role of a TCP client, with "\0" delimiting fields. I now understand that what the server application regards/sends as a single message might arrive at the client via several receive_data calls, so a little 'business logic' will need to be employed in the receive_data method. Of course it is possible that the server sends each field as a separate 'message' - I'll need to look into this.

I'll get the EM code and go through the documentation. Apologies: I'd been assuming RubyForge is 'automagic' and that if it's not on RubyForge then it is not in the RubyForge gem (blush).

Thanks for the helpful tips on potential gotchas, and for the receive_data example. That was new to me, and I think I'll be spending most of my time working on this method. I'll also look into whether a UDP connection is possible.

I think EM can do what I need (and much more...), so I'll jump in. I don't have more questions at the moment. Thanks again for a great response and a great gem!

Regards
Mark
Hi,

From: "Mark Van De Vyver" <mvyver at gmail.com>
>
> I now understand that what the server application
> regards/sends as a single message might arrive at the client via
> several receive_data calls, so a little 'business logic' will need to
> be employed in the receive_data method.
> Of course it is possible that the server sends each field as a
> separate 'message' - I'll need to look into this

Just wanted to clarify, in case you might not already know: even if EventMachine weren't in the picture, even if you were doing a recv() directly on the TCP socket, you still would not be able to depend on receiving an entire 'message' in one chunk, regardless of whether or not the remote end may have been sending each field with a separate send() or write() call.

. . . I'm not sure if what I've said above is clearer or more confusing. :) I'm just trying to clarify that TCP is a streaming protocol, so you will just receive a stream of bytes, regardless of whether the remote end may have transmitted the data in field-sized or message-sized bursts. If you want to detect fields or messages in the result, your detection will have to be based on some physical delimiter or structure in the data.

> I'll also look into whether a UDP connection is possible.

BTW, UDP is very low-level, and has its own set of pitfalls as a result: if the remote end sends a UDP packet to you, you may not receive it at all. Or you may receive it multiple times. If the remote end sends multiple UDP packets to you, you may receive none of them or some of them, and the ones you receive may come out of order, multiple times, or any combination thereof.

UDP does have its uses; but you can think of TCP as (essentially) a streaming protocol built on top of UDP that deals with all the packet loss and duplicate packets for you, so that you just receive a nice contiguous stream of bytes in the proper order. (TCP is not literally built on top of UDP, but the packet loss and duplicate packets it deals with on your behalf are the same as what you would have to deal with manually if you used UDP.)

Hope this helps,
Bill
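Bill's point can be checked with nothing but Ruby's standard IO: bytes written in arbitrary pieces come back as one stream, and it is the reader's delimiter handling (here `gets("\0")`, as in Mark's existing code) that recovers the fields. An in-process `IO.pipe` stands in for the TCP socket, an assumption made only to keep the example self-contained; `chomp("\0")` is used instead of `chop` so that a final field without a terminator would not lose a character. `read_fields_after_fragmented_writes` is a hypothetical name.

```ruby
# Write field bytes using "bursts" whose boundaries do not line up with the
# field boundaries, then read the fields back with gets("\0").
def read_fields_after_fragmented_writes(pieces)
  reader, writer = IO.pipe
  pieces.each { |piece| writer.write(piece) }  # sender-side writes
  writer.close                                 # EOF lets the reader stop

  fields = []
  while (field = reader.gets("\0"))            # blocks until "\0" or EOF
    fields << field.chomp("\0")
  end
  reader.close
  fields
end

p read_fields_after_fragmented_writes(["ab", "c\0d", "\0"]) # prints ["abc", "d"]
```

The same field list comes back however the writes were split; gets simply hides the reassembly that an EM receive_data handler has to do itself.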
Mark Van De Vyver
2007-May-03 19:59 UTC
[Eventmachine-talk] Some newbie clarification questions.
Hi Bill,

> Just wanted to clarify, in case you might not already know;
> but even if EventMachine weren't in the picture, even if you
> were doing a recv() directly on the TCP socket, you still would
> not be able to depend on receiving an entire 'message' in one
> chunk [...]
> I'm just trying to clarify that TCP is a
> streaming protocol, so you will just receive a stream of bytes, [...]

Thanks, that does help.

> > I'll also look into whether a UDP connection is possible.
>
> BTW, UDP is very low-level, and has its own set of pitfalls as
> a result: [...]

Thanks, good to know too. It turns out it is not possible to use UDP with the server application.

> UDP does have its uses; but you can think of TCP as (essentially)
> a streaming protocol built on top of UDP that deals with all the
> packetloss and duplicate packets for you, [...]

Great info, I wasn't aware of this.

> Hope this helps,

It certainly does. Thank you
Mark
Francis Cianfrocca
2007-May-03 23:57 UTC
[Eventmachine-talk] Some newbie clarification questions.
Yecccch. There was a syntax error in the code sample in my original reply. Here's a more correct version:

    def post_init
      @data = ""
    end

    def receive_data data
      @data << data
      loop {
        head,tail = @data.split("\0", 2) # limit 2: tail keeps the whole remainder
        if tail
          # head is the first field
          @data = tail
        else
          break # here, the current data has no \0 character
        end
      }
    end
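For experimenting outside EventMachine, the same accumulate-and-split technique can be wrapped in a small class and fed deliberately fragmented chunks. `FieldBuffer` and its block callback are hypothetical names. The `split("\0", 2)` limit matters: without a limit, `split` returns every field and drops trailing empty strings, so `tail` would hold only the second field (losing the rest of the buffer), and a buffer ending in "\0" would never yield its last field.

```ruby
# Hypothetical stand-alone version of the receive_data loop from the thread.
class FieldBuffer
  def initialize(&on_field)
    @data = "".dup          # unconsumed bytes carried between calls
    @on_field = on_field    # called once per complete "\0"-terminated field
  end

  # Mirrors receive_data: chunk sizes and boundaries are arbitrary.
  def receive_data(data)
    @data << data
    loop do
      head, tail = @data.split("\0", 2)  # limit 2: tail is the whole remainder
      break unless tail                   # no "\0" yet: wait for more data
      @on_field.call(head)
      @data = tail
    end
  end

  def leftover
    @data
  end
end

fields = []
buf = FieldBuffer.new { |f| fields << f }
["ab", "c\0d", "\0e"].each { |chunk| buf.receive_data(chunk) }
p fields        # prints ["abc", "d"]
p buf.leftover  # prints "e"
```

Feeding the same bytes one character at a time produces the same fields, which is exactly the property receive_data handling needs.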
Francis Cianfrocca
2007-May-04 00:20 UTC
[Eventmachine-talk] Some newbie clarification questions.
On 5/3/07, Bill Kelly <billk at cts.com> wrote:
>
> Just wanted to clarify, in case you might not already know;
> but even if EventMachine weren't in the picture, [...]
> ...
> BTW, UDP is very low-level, and has its own set of pitfalls as
> a result: [...]

All of these clarifications are excellent, Bill. I wonder if we should sometime write up a primer on basic networking for Ruby users.
Designing a really effective network protocol is one of those things that is simple on the surface but subtle and difficult when you get deep into it. I have a personal bias that most users of EM will eventually choose to use an in-the-package implementation of a standard protocol that is appropriate for their application (HTTP, XMPP, Stomp, whatever), rather than develop their own protocol. But I could be totally wrong about that, and of course there probably is a huge pile of legacy homegrown protocols out there that people will need to support, as the OP of this thread does.

We should be looking to provide quality implementations of the standard biggies, as well as adding more support classes (like EventMachine::Protocols::LineAndTextProtocol) for handrolled protocols. It's annoying because a lot of these already have competent Ruby implementations that are not event-friendly, and you always hate to reinvent the wheel.

Tobias Gustaffson is working on an eventable implementation of SIP, which is a large, challenging protocol packed with features, and supported over both TCP and UDP. I'm really hopeful about his work because I think SIP is a protocol that has a huge future, perhaps as important as HTTP.

And finally, the student who is doing his Summer of Code project on EM will be specifically working in an area that will benefit people who have to support legacy protocols, so keep your eyes on that effort.
Mark Van De Vyver
2007-May-04 00:54 UTC
[Eventmachine-talk] Some newbie clarification questions.
Hi,

Thanks for the correction, Francis. I'm so new to Ruby that I was wondering if Ruby's relaxed syntax/notation allowed that behavior :)

I think I now understand the TCP behavior a little better and, surprise-surprise, the Ruby code now makes much more sense. At the moment, when an instance of the 'IncomingMessage' class is created, it immediately does a blocking read on the TCP connection. When a 'complete message' has been read (i.e. some message-specific number of "\0"-separated values), the TCP reading loop ends and the read data (the complete message) is returned.

It then seems to me that in return for non-blocking reads and all the other goodness in EM, I only have to accommodate the possibility that the 'data' passed to read_data by EM might consist of the last elements of one message and the initial elements of another message, or only part of one message, or more than one message, etc.

My newbie thought is to handle this by:

- splitting the 'data' as Francis suggests
- adding the split 'data' to a Queue (similar scope to Francis' @data)
- calling my message parsing method(s) from within read_data, and making them remove elements from the queue, _but_ only when a whole message has been parsed; otherwise the queue is left untouched.
- writing an unbind that checks whether the queue is non-empty and logs it if so.

I think this saves me from having to record the state of the message queue or handle any partially processed messages, and it is the simplest approach I can think of. I'd appreciate any comment if there is some massive design flaw in this approach. I think it also accommodates the case where several read_data calls are required before all the data that constitutes one message is assembled.

If the above approach works, then it is a remarkably small change to make to get all that EM offers!

Thanks a lot for all the help and input.

Regards
Mark
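Mark's plan (collect split fields in a queue, consume them only when a whole message is present, and report leftovers in unbind) might look like the sketch below. The message layout is an assumption for illustration only: it pretends the first field is a message-type id and that each type has a fixed field count (the hypothetical `FIELD_COUNTS` table). `MessageAssembler` is likewise a hypothetical name. Because EM dispatches a connection's callbacks from its reactor loop, the sketch uses a plain Array rather than the thread-safe Queue class; that is a design choice of this sketch, not a requirement.

```ruby
# Hypothetical layout: first field = type id; FIELD_COUNTS gives the total
# number of fields (including the id) for each message type.
FIELD_COUNTS = { "1" => 3, "2" => 2 }.freeze

class MessageAssembler
  def initialize
    @fields = []           # complete fields not yet consumed as a message
  end

  # Add fields produced by the "\0" splitter; return every complete message.
  def push(new_fields)
    @fields.concat(new_fields)
    messages = []
    loop do
      type = @fields.first
      break if type.nil?
      count = FIELD_COUNTS.fetch(type) { raise ArgumentError, "unknown type #{type.inspect}" }
      break if @fields.size < count     # incomplete: leave the queue untouched
      messages << @fields.shift(count)  # consume exactly one message
    end
    messages
  end

  # Something an unbind handler could check and log.
  def leftover
    @fields.dup
  end
end

asm = MessageAssembler.new
p asm.push(["1", "a"])            # prints []
p asm.push(["b", "2", "x", "1"])  # prints [["1", "a", "b"], ["2", "x"]]
p asm.leftover                    # prints ["1"]
```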
Francis Cianfrocca
2007-May-04 01:16 UTC
[Eventmachine-talk] Some newbie clarification questions.
On 5/4/07, Mark Van De Vyver <mvyver at gmail.com> wrote:
>
> It then seems to me that in return for non-blocking reads and all the
> other goodness in EM, I only have to accommodate the possibility that
> the 'data' passed to read_data by the EM might consist of the last
> elements of one message and the initial elements of another message,
> or only part of one message, more than one message etc, etc..

Yup. You have to manage state that a threaded programmer just leaves on the call stack. That's the tradeoff with the event-driven programming model.

> I think this gets around me having to record the state of the message
> queue or handle any partially processed messages, and is simplest I
> can think of....

The call is receive_data rather than read_data. That will bite you, which is why I'm risking a breach of taste to point it out. :-)

You've got the right idea. The only flaw I see with this is if your incoming messages are huge. (Megabytes, maybe?) In this case you'll almost certainly be calling your processing loop many times (once for each incoming chunk of data off the network) before fulfilling a complete message. So if you're not careful, you'll re-compute a lot of state on every pass through the loop only to throw it away and start over on the next receive_data call.
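One way to avoid the re-computation warned about above, sketched under the assumption that fields are "\0"-terminated: remember how far the buffer has already been searched, so each incoming byte is scanned once no matter how many receive_data calls a large message takes, and drop consumed bytes only occasionally. `IncrementalSplitter` is a hypothetical name, and the 4096-byte compaction threshold is an arbitrary choice; the code uses only core `String#index` and `String#byteslice`.

```ruby
# Hypothetical splitter that never rescans bytes it has already searched.
class IncrementalSplitter
  COMPACT_AT = 4096          # arbitrary: drop consumed bytes past this offset

  def initialize
    @buf = String.new        # binary (ASCII-8BIT) buffer, so indexes are bytes
    @start = 0               # offset of the first unconsumed field
    @scanned = 0             # no "\0" exists between @start and this offset
  end

  # Returns the complete fields found after adding this chunk (possibly none).
  def receive_data(data)
    @buf << data.b
    fields = []
    while (nul = @buf.index("\0", @scanned))
      fields << @buf[@start...nul]
      @start = @scanned = nul + 1
    end
    @scanned = @buf.bytesize # everything up to here has been searched
    compact if @start > COMPACT_AT
    fields
  end

  private

  # Discard consumed bytes and shift the offsets to match.
  def compact
    @buf = @buf.byteslice(@start, @buf.bytesize - @start)
    @scanned -= @start
    @start = 0
  end
end

splitter = IncrementalSplitter.new
p splitter.receive_data("ab")    # prints []
p splitter.receive_data("c\0d")  # prints ["abc"]
```

The same bookkeeping idea applies to message-level state: keep whatever has been worked out so far in instance variables instead of recomputing it on every call.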
Mark Van De Vyver
2007-May-04 18:26 UTC
[Eventmachine-talk] Some newbie clarification questions.
> You've got the right idea. The only flaw I see with this is if your incoming
> messages are huge. (Megabytes, maybe?) In this case you'll almost certainly
> be calling your processing loop many times (for each incoming chunk of data
> off the network) before fulfilling a complete message. So if you're not
> careful, you'll re-compute a lot of state on every pass through the loop
> only to throw it away and start over on the next receive_data call.

Great; my messages in this case are approx 1-10K max. It is a good point you make, which I'll keep in mind by keeping a 'received bytes' or 'received fields' counter, and only attempting to parse a message after some minimum received data count is reached.

Thanks again
Mark
Hi Francis,

From: Francis Cianfrocca
> I wonder if we should sometime write up a primer on basic networking
> for Ruby users.

Sounds nice. Maybe accompanied by some cookbook-style small runnable examples?

> Designing a really effective network protocol is one of those things
> that is simple on the surface but subtle and difficult when you get deep
> into it. I have a personal bias that most users of EM will eventually
> choose to use an in-the-package implementation of a standard protocol
> that is appropriate for their application (HTTP, XMPP, Stomp, whatever),
> rather than develop their own protocol. But I could be totally wrong
> about that, and of course there probably is a huge pile of legacy
> homegrown protocols out there that people will need to support, as the
> OP of this thread does.

Indeed: if the EM protocol implementation exists, and it happens to match my needs, I'm not sure why I wouldn't use it. For example, I'm using LineAndText for a couple of utilities right now, and it was nice to have it available. (One issue I had was that I need to be able to accept lines ending in both LF and CRLF. So I just went with LF as the delimiter, and call chomp on the lines passed to me by recv_line(). Simple enough.)

For a different project, I'm developing my own protocol. I would be surprised if anything pre-existing would fit (but I certainly don't have an encyclopedic knowledge of protocols). This application is kind of game-like, and mostly uses TCP, but also uses UDP for sending live telemetry (the sort where there's no point in retransmitting old packets that were lost; all that matters is the most current information).

Incidentally, the TCP part of the protocol is similar to DRb in some ways, except that it's not Ruby-specific. I was asking about DRb last year, but I ended up dropping that idea because some of the nodes that will be speaking this protocol are pure C++. Yet, from a Ruby point of view, it still maps well to taking the *args from a method_missing and sending them across the wire (not accidentally :)

> We should be looking to provide quality implementations of the standard
> biggies, as well as adding more support classes (like
> EventMachine::Protocols::LineAndTextProtocol) for handrolled protocols.
> It's annoying because a lot of these already have competent Ruby
> implementations that are not event-friendly, and you always hate to
> reinvent the wheel.
> Tobias Gustaffson is working on an eventable implementation of SIP,
> which is a large, challenging protocol packed with features, and
> supported in both TCP and UDP. I'm really hopeful about his work because
> I think SIP is a protocol that has a huge future, perhaps as important
> as HTTP.

Cool. I wasn't aware of SIP. From what I've read so far, it sounds like something I should learn more about. :)

One thing that concerns me, overall, is Ruby's performance when it comes down to heavy byte-by-byte parsing. For example, I was kind of gritting my teeth when I implemented ANSI/VT100 terminal protocols in Ruby, then implemented a line-editing and text-windowing system on top of that. The teeth-gritting was not because I was writing the code in Ruby, which was easy and fun, but because I knew how much work the CPU was doing behind the scenes. Even now, with just one client, if the client pastes, say, 5 or 10 KB of text into the line-edit prompt in the terminal, the CPU on the server pegs at 100% for a full second or two processing the input.

(I should note that I haven't profiled this, and I haven't *tried* to optimize it. But I started out in 8088 assembler and FORTH, and later spent a decade writing video games in assembler and C, and even though I'm a staunch premature-optimization-is-the-root-of-evil kind of guy (measure! measure! measure!, etc.), I feel like I'm taking a couple orders of magnitude hit on this stuff performance-wise.)
Where I really start to ponder the performance is when I consider scaling to, say, 1000 or more concurrent clients. (Say, the Next Generation IRC-like server... you want the maximum number of clients possible per box.)

I'm not sure where I'm going with this; it's not meant to be a rant, exactly. But I found myself wondering about the possibility of a 'hook' of sorts between the raw 'C' EM event_callback and the Ruby dispatch:

    static void event_callback (const char *a1, int a2, const char *a3, int a4)
    {
      // <-- some kind of hook here?
      rb_funcall (EmModule, rb_intern ("event_callback"), 3,
                  rb_str_new2(a1), (a2 << 1) | 1, rb_str_new(a3,a4));
    }

I dunno... maybe that's too low-level to be convenient, but what I'd like is to be able to factor out Ruby as needed, while still being able to inherit from the 'C' implementation of the protocol in Ruby.

Thinking further, I guess there's nothing stopping anyone from implementing, say, LineAndTextProtocol in C (as a Ruby extension class with C methods). We'd still incur the minimal overhead of the rb_funcall and the EventMachine::event_callback case statement in eventmachine.rb, but I'd be surprised if that couldn't execute 1,000,000 times a second... Haha, indeed:

    t1 = Time.now; 1_000_000.times { case 123; when 1 then "foo"; when 2 then "bar"; end; } ; Time.now - t1

...executes in almost exactly 1 second on my 2GHz Athlon 64 system. Hmm...

Well, thanks for reading this far. I guess I've convinced myself it's OK to use EM in Ruby, knowing I can implement the 'protocol' classes in C as I need them. If I implement EM-compatible versions of my ANSI/VT100 parsing classes in C, I can probably expect good performance scaling to many concurrent users. As long as I can avoid Ruby memory leaks in the rest of my Ruby code (Array push/shift, anyone? ;-o)

Regards,
Bill
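Bill's Time.now one-liner can also be written with Ruby's standard Benchmark module. This is an illustrative sketch only: the dispatch method is a hypothetical stand-in for a per-event case statement (it is not EventMachine's event_callback), wrapping it in a method is an assumption meant to include method-call overhead, and the elapsed time depends entirely on the machine and Ruby version, so no particular result is implied.

```ruby
require 'benchmark'

# Hypothetical stand-in for a per-event dispatch: a small case/when similar
# in shape to the one in the one-liner above.
def dispatch(code)
  case code
  when 1 then "foo"
  when 2 then "bar"
  end
end

# Benchmark.realtime returns the wall-clock seconds taken by the block.
elapsed = Benchmark.realtime do
  1_000_000.times { dispatch(123) }
end

puts format("1,000,000 dispatches: %.3f s", elapsed)
```

Timing the same loop before and after moving a hot path into a C extension is one way to follow Bill's own "measure! measure! measure!" advice.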
Francis Cianfrocca
2007-May-07 05:38 UTC
On 5/7/07, Bill Kelly <billk at cts.com> wrote:
>
> Cool. I wasn't aware of SIP. From what I've read so far, it sounds like
> something I should learn more about. :)
>

Vint Cerf, of all people, says that SIP may become the most important protocol of all as the internet progresses. It's already the standard for business IP telephony (companies generally won't use Skype).

> I'm not sure where I'm going with this; as it's not meant to be a rant,
> exactly. But I found myself wondering about the possibility of a 'hook' of
> sorts, between the raw 'C' EM event_callback, and the ruby dispatch...
>
> static void event_callback (const char *a1, int a2, const char *a3, int a4)
> {
>   // <-- some kind of hook here?
>   rb_funcall (EmModule, rb_intern ("event_callback"), 3,
>               rb_str_new2(a1), (a2 << 1) | 1, rb_str_new(a3,a4));
> }
>
> I dunno... maybe that's too low-level to be convenient, but what I'd like
> is to be able to factor-out ruby as needed, while still being able to
> inherit from the 'C' implementation of the protocol in ruby.

Keep your eyes on the Summer of Code project that seeks to integrate Ragel grammars with EM. This just might be a huge win for people who need to roll their own protocols, especially for legacy support. I can easily imagine a standard pattern emerging in which people use EM to wrap up a legacy protocol and then proxy it into something standard (REST, perhaps?). This would make EM the tool of choice for working with pre-existing network-aware applications.

I think EM needs a competent telnet implementation. What can I do to help you restart that effort?

I'm not really scared of implementing standard protocols as C extensions. I've developed a kind of pattern for that, having done it for HTTP, SAX2, and (partially complete) SMTP. (I did an LDAP server in EM in pure Ruby.)
The nice thing about this is that you can define as many Ruby call-outs as you want, implement them as fast stubs in C, and then users can redefine in Ruby only the ones they need. As I said, I'm very keen to see SIP, Stomp, telnet, active-FTP, and XMPP in the box. When I get a few minutes, I'll publish these additional protocol handlers that I've done, to get the pattern out there.

The packaging is a bit of a question mark. If you look in the Twisted distro, they have the main distro and a batch of offshoots as separate downloads with a standard naming convention. That's worth doing if some of the additional pieces have library dependencies (my eventable SAX2 processor calls out to libxml2), but it would be more convenient to put everything in the box, which really isn't that big. Any thoughts?
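The call-out pattern Francis describes can be sketched in pure Ruby. In his version the parser and its default stubs would live in a C extension; here, as an assumption for illustration, both are plain Ruby, and the names HeaderParser, HeaderCollector, on_header, and on_headers_complete are hypothetical. The parser invokes hook methods whose defaults do nothing, and a user subclass redefines only the hook it cares about.

```ruby
# Hypothetical illustration of the call-out pattern (a pure-Ruby stand-in for
# what would be a C extension): the parser calls hook methods whose default
# implementations are empty, and subclasses redefine only what they need.
class HeaderParser
  # Parse "Name: value" lines up to the first blank line.
  def parse(text)
    text.each_line do |raw|
      line = raw.chomp
      if line.empty?
        on_headers_complete
        break
      end
      name, value = line.split(":", 2)
      on_header(name.strip, value.to_s.strip)
    end
  end

  # Default call-outs: do-nothing stubs.
  def on_header(name, value); end
  def on_headers_complete; end
end

# A user subclass that overrides just one call-out.
class HeaderCollector < HeaderParser
  attr_reader :headers

  def initialize
    super
    @headers = {}
  end

  def on_header(name, value)
    @headers[name] = value
  end
end

collector = HeaderCollector.new
collector.parse("Host: example.com\r\nX-Id: 42\r\n\r\nbody")
p collector.headers
```

The parser never needs to know which hooks a user cares about; a hook nobody overrides just costs a call to an empty method, which is the property that makes fast C stubs attractive.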
Mark Van De Vyver
2007-May-09 16:57 UTC
In case any other Ruby freshers come along this way... look out for the 'splat' (*) in the line I've uncommented below.

HTH
Mark

> def post_init
>   @data = ""
> end
>
> def receive_data data
>   @data << data
>   loop {
>     # -1 keeps trailing empty fields, so a final "\0" is not lost
>     head, *tail = @data.split("\0", -1)
>     if tail.empty?
>       break # here, the current data has no complete \0-terminated field
>     end
>     # head is the first field; process it, then keep the rest buffered
>     @data = tail.join("\0")
>   }
> end
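For the original question (several "\0"-separated fields arriving in one read), a variant that avoids splitting the whole buffer into an array and joining it back together on every pass is to find each "\0" with String#index and cut the field off the front with String#slice!. This is a minimal sketch; the class name NullFieldBuffer and its extract and pending methods are hypothetical, and it is written as plain Ruby so it can be tried without EventMachine.

```ruby
# Hypothetical helper, not part of EventMachine: accumulates incoming data and
# returns each complete "\0"-terminated field, keeping any partial field.
class NullFieldBuffer
  def initialize
    @data = "".dup
  end

  # Append a chunk and return the complete fields it finished, in order.
  def extract(chunk)
    @data << chunk
    fields = []
    # Locate each "\0" and remove field + terminator from the buffer's front.
    while (i = @data.index("\0"))
      fields << @data.slice!(0, i + 1).chop # chop drops the trailing "\0"
    end
    fields
  end

  # Data received after the last "\0", i.e. an incomplete field.
  def pending
    @data.dup
  end
end

buffer = NullFieldBuffer.new
p buffer.extract("ab\0c")  # prints ["ab"]
p buffer.extract("d\0\0e") # prints ["cd", ""]
p buffer.pending           # prints "e"
```

Inside an EM connection one would presumably create the buffer in post_init and call extract from receive_data, handing each returned field to the message-parsing code; that wiring is an assumption. Because the third-party application has no message delimiter, grouping fields into messages still has to come from the protocol's own rules (e.g. a field count implied by a leading message-type field).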