thr3ads.net - Eventmachine talk - [Eventmachine-talk] Some newbie clarification questions. [May 2007]

Mark Van De Vyver

2007-May-03 16:55 UTC

[Eventmachine-talk] Some newbie clarification questions.

Hi,
Thank you for all the effort that has gone into making EventMachine
(EM) available.

I'm relatively new to Ruby and TCP communication, so I thought to
check if my understanding of EM, and my intended use, is correct.

I've read the wiki example.  Is there other online documentation
I've missed?

I have a 3rd party application and some ruby code (from another 3rd
party) that sends/receives data to/from this application.

The application has a handshake sequence where data is sent, a reply is
received and processed, and then more data is sent depending on what was
received.
In the ruby code, data is currently sent using:

    TCPSocket#syswrite( data_elem.to_s + "\0" )
where data is stored in a Queue.
Sometimes data is sent using
    TCPSocket#send( integer )

Data is read using repeated use of:
    TCPSocket#gets("\0").chop

Currently the ruby code has the 'message' construction/parsing and
sending/receiving tightly coupled.

My thought was that I could use EM to simplify matters by:
 - separating message construction from message sending; an outgoing
   'message' would be a single string of values, each 'field' separated
   by "\0" (as above).
 - making the sending/receiving non-blocking
 - generally benefiting from EM's infrastructure/robustness.

Some questions I have are:

1) Should I place the 'handshake' code in the connection_completed or
post_init methods.
    I assume I define these methods in a module/class that plays the
same role as the EchoServer module in the wiki example?
2) Is there a 'time line' setting out what EM methods are called when
in the life of a connection?
3) Is there likely to be much of a speed improvement by
writing/reading a single string (5-100 characters of null separated
values) instead of 5-30 separate writes/reads?
4) Currently data is read  one-at-a-time using gets("\0"), is it
possible to use EM to 'read' several "\0" separated fields at once?
Unfortunately the 3rd party application has no distinct _message_
delimiter for incoming messages.
5) In the wiki example 'data' is passed to the read_data method.  How
does EM determine the end of the 'data' - hopefully that makes sense?

I appreciate any light you can shed on these questions.

Regards
Mark

Francis Cianfrocca

2007-May-03 17:38 UTC

On 5/3/07, Mark Van De Vyver <mvyver at gmail.com> wrote:
> I've read the wiki example.  Is there other online documentation I've
> missed?

There is an extensive rdoc with explanations and sample code.

Based on your description, I'm assuming that your code acts as the TCP
client and the 3rd party application is the TCP server. (Meaning, your code
initiates the TCP connection and the other app accepts it.) If I'm wrong,
please correct me.

> 1) Should I place the 'handshake' code in the connection_completed or
> post_init methods.
>     I assume I define these methods in a module/class that plays the
> same role as the EchoServer module in the wiki example?

Per my assumptions as stated above, your code is the TCP client, and you've
called EventMachine#connect. If this is true, then your handshake code
belongs in connection_completed. This is because EventMachine#connect issues
a nonblocking connect. post_init is called after initialize (which you also
may override) completes, but in general the connection to the remote server
has not completed by that time.

(If you were writing a TCP server, then your handshake would go in
post_init, because a server connection doesn't receive
connection_completed.)

Your assumption about defining these methods in a module is correct.

> 2) Is there a 'time line' setting out what EM methods are called when
> in the life of a connection?

Yes, it's deterministic and guaranteed and stated in the documents of the
EventMachine::Connection methods. In short:

Any class which you pass in the handler argument of #connect, #start_server
or their siblings must be a subclass of EventMachine::Connection. If you
pass a Module (which is generally easier for simple things), then an
instance of an anonymous subclass of EventMachine::Connection is created,
and your Module is included into it.

 - A new connection first calls #initialize, which you may override (don't
   forget to call super).
 - Next, the new object is yielded to the block passed to #connect or
   #start_server, if any.
 - Next, #post_init is called.
 - Next, if the connection is a client, AND the connection completes,
   #connection_completed is called. connection_completed is NOT called for
   server sockets or for client sockets that do not complete due to error or
   timeout.
 - Next, receive_data is called zero or more times, as the connection
   receives data.
 - Finally, #unbind is called. This ALWAYS happens, regardless of whether the
   connection closes because you closed it, the remote peer closed it, or
   there was an error. #unbind is even called if a client connection fails to
   complete.
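
As a rough sketch of where a client handshake fits in the callback sequence
above: the module below uses EventMachine's callback names, but
HANDSHAKE_FIELDS, the @events log and the FakeConnection stand-in are
hypothetical and exist only so the example runs without the gem. The calls at
the bottom are made by hand; in a real program EM makes them.

```ruby
# Client-side handler sketch. Callback names are EventMachine's; everything
# else (HANDSHAKE_FIELDS, @events, FakeConnection) is made up for illustration.
module HandshakeClient
  HANDSHAKE_FIELDS = ["HELLO", 1].freeze # hypothetical handshake payload

  def post_init
    # The object exists, but the TCP connect may still be pending.
    @events = [:post_init]
    @buffer = String.new
  end

  def connection_completed
    # Client connections only, and only if the connect succeeded.
    @events << :connection_completed
    send_data(HANDSHAKE_FIELDS.map { |field| "#{field}\0" }.join)
  end

  def receive_data(data)
    # Zero or more calls; chunk boundaries are arbitrary.
    @events << :receive_data
    @buffer << data
  end

  def unbind
    # Always the last callback for a connection.
    @events << :unbind
  end
end

# Stand-in for EventMachine::Connection so the sketch runs without the gem.
class FakeConnection
  include HandshakeClient
  attr_reader :events, :sent

  def send_data(data)
    (@sent ||= String.new) << data
  end
end

conn = FakeConnection.new
conn.post_init
conn.connection_completed
conn.receive_data("OK\0")
conn.unbind
p conn.events
p conn.sent
```

With the gem installed, the same module could be passed as the handler, for
example EventMachine.run { EventMachine.connect "127.0.0.1", 4001, HandshakeClient },
where the host and port are placeholders.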

> 3) Is there likely to be much of a speed improvement by
> writing/reading a single string (5-100 characters of null separated
> values) instead of 5-30 separate writes/reads?

If you're writing raw I/O, the answer is definitely yes. With EM, the answer
is probably no, because EM buffers and coalesces outbound data to minimize
the number of syscalls it has to make.
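
Building the whole message as one string is still convenient for separating
message construction from sending. A minimal sketch, assuming every field is
converted with to_s and terminated by "\0" as in the original syswrite calls
(build_message is a hypothetical helper, not an EM method):

```ruby
# Join field values into one outbound string, each field terminated by "\0".
def build_message(fields)
  fields.map { |field| "#{field}\0" }.join
end

message = build_message(["START", 42, 3.5])
p message              # one string, suitable for a single send_data call
p message.count("\0")  # one terminator per field
```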

> 4) Currently data is read  one-at-a-time using gets("\0"), is it
> possible to use EM to 'read' several "\0" separated fields at once?
> Unfortunately the 3rd party application has no distinct _message_
> delimiter for incoming messages.

This confuses me because I understood "\0" to be the message delimiter.
Unless you mean it's the field delimiter. At any rate, this is the area that
will give you the most confusion if you're like most people.

EM will call receive_data with whatever it gets from the network. It may
coalesce or fragment the data in a handful of ways. You can only assume that
you'll get all the data in the correct order, but you can make NO
assumptions about how many receive_data calls will be made.

A naive way to do what you're proposing might be:

def post_init
  @data = ""
end
def receive_data data
  @data << data
  loop {
    head,tail = @data.split("\0")
    if tail
      # head is the first field
      @data = tail
    else
      break # here, the current data has no \0 character
    end
end

Other people may suggest more efficient ways to write this loop, but the key
point is that you have to keep unconsumed data around inside your object
between calls to receive_data. (The @data instance variable does that in
this example.) Once you grasp this, everything else about EM is easy. This
is the fundamental difference between the event-handling style and the
threaded style.
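
One possible way to write that buffering, as a sketch rather than the
author's method: it swaps the split for String#index and String#slice!, so a
field is only removed from the buffer once its "\0" has arrived.
NullFieldReader, receive_field and FieldCollector are hypothetical names.

```ruby
module NullFieldReader
  def post_init
    @data = String.new # unconsumed bytes kept between receive_data calls
  end

  def receive_data(data)
    @data << data
    # Take one complete field at a time; stop when no "\0" is buffered.
    while (i = @data.index("\0"))
      field = @data.slice!(0, i + 1).chop # chop drops the trailing "\0"
      receive_field(field)
    end
  end
end

# Stand-in for a connection object so the sketch runs without the gem.
class FieldCollector
  include NullFieldReader
  attr_reader :fields

  def initialize
    @fields = []
    post_init
  end

  def receive_field(field)
    @fields << field
  end
end

collector = FieldCollector.new
collector.receive_data("AB")
collector.receive_data("C\0DE\0F")
p collector.fields # the trailing "F" stays buffered until its "\0" arrives
```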

> 5) In the wiki example 'data' is passed to the read_data method.  How
> does EM determine the end of the 'data' - hopefully that makes sense?

EM will generally send you as much data as it can pull from the kernel read
buffer in one go, up to a predetermined limit (currently about 160K if I
recall). If you have relatively few connections, your reads will often be
about the size of an ethernet packet. Under heavy loads, they will often be
fewer and bigger. You can't depend on any of this, however, because it can
and will change under different load conditions. EM always tries to provide
an optimal mix of speed and "fairness."

Now observe, my whole discussion assumes TCP. If you're using UDP, then it's
all different. EM will ALWAYS respect UDP message boundaries as you would
expect, and receive_data will get exactly the number of bytes that were in
the packet received from the network. If you get ten UDP packets, then
you'll get ten #receive_data calls. (In UDP, a receive_data can send you a
zero-length string since this is valid in UDP. With TCP, receive_data will
NEVER send you zero bytes.)
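
The datagram behaviour can be seen without EM using Ruby's standard socket
library. This is only a loopback sketch (udp_round_trip is a hypothetical
helper): each send becomes one recvfrom result, including the empty one.
Delivery over loopback is reliable in practice, but UDP itself makes no such
promise.

```ruby
require "socket"

# Send each payload as its own datagram over loopback and return what the
# receiver reads, one recvfrom call per datagram.
def udp_round_trip(payloads)
  receiver = UDPSocket.new
  receiver.bind("127.0.0.1", 0) # port 0 asks the OS for any free port
  port = receiver.addr[1]
  sender = UDPSocket.new

  payloads.each { |payload| sender.send(payload, 0, "127.0.0.1", port) }

  received = []
  payloads.size.times do
    break unless IO.select([receiver], nil, nil, 1) # stop waiting after 1s
    data, _sender_addr = receiver.recvfrom(65_535)
    received << data
  end
  received
ensure
  sender&.close
  receiver&.close
end

p udp_round_trip(["first", "", "third\0field"])
```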

> I appreciate any light you can shed on these questions.

Good luck and write back if you have trouble.

Francis Cianfrocca

2007-May-03 17:41 UTC

On 5/3/07, Mark Van De Vyver <mvyver at gmail.com> wrote:
> 1) Should I place the 'handshake' code in the connection_completed or
> post_init methods.

I need to clarify something I said. I suggested that for client connections,
you place any initial data that needs to be sent in #connection_completed.
This isn't necessary. You can call #send_data in post_init or even in your
constructor. EM will buffer the outbound data inside your process, however,
and it won't get copied to the kernel write buffer until after the
connection completes. There's nothing wrong with this.

Mark Van De Vyver

2007-May-03 18:37 UTC

Hi Francis,
Thank you for the prompt and excellent response.
You are correct: my ruby code plays the role of a TCP client with "\0"
delimiting fields. I now understand that what the server application
regards/sends as a single message might arrive at the client via
several receive_data calls, so a little 'business logic' will need to
be employed in the receive_data method.
Of course it is possible that the server sends each field as a
separate 'message' - I'll need to look into this.

I'll get the EM code and go through the documentation - apologies -
I've been assuming ruby forge is 'automagic' and that if it's not on
ruby forge then it is not in the ruby forge gem (blush).

Thanks for the helpful tips on potential gotchas, and the
receive_data example - that was new to me, and I think I'll be
spending most of my time working on this method.
I'll also look into whether a UDP connection is possible.

I think EM can do what I need (and much more...), so I'll jump in.
I don't have more questions at the moment.

Thanks again for a great response and a great gem!

Regards
Mark



Bill Kelly

2007-May-03 19:47 UTC

Hi,

From: "Mark Van De Vyver" <mvyver at gmail.com>
> I now understand that what the server application
> regards/sends as a single message might arrive at the client via
> several receive_data calls, so a little 'business logic' will need to
> be employed in the receive_data method.
> Of course it is possible that the server sends each field as a
> separate 'message' - I'll need to look into this

Just wanted to clarify, in case you might not already know;
but even if EventMachine weren't in the picture, even if you
were doing a recv() directly on the TCP socket, you still would
not be able to depend on receiving an entire 'message' in one
chunk regardless of whether or not the remote end may have been
sending each field with a separate send() or write() call.

. . . I'm not sure if what I've said above is clearer or more
confusing.  :)  I'm just trying to clarify that TCP is a
streaming protocol, so you will just receive a stream of bytes,
regardless of whether the remote end may have transmitted the
data in field-sized or message-sized bursts.  If you want to
detect fields or messages in the result, your detection will
have to be based on some physical delimiter or structure in
the data.
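
To illustrate the point, here is a small sketch (FieldParser is a
hypothetical helper, not part of EM or the socket library) that feeds the
same byte stream to a delimiter-based parser in three different chunkings.
Because detection depends only on the "\0" bytes in the data, the parsed
fields come out the same each time.

```ruby
# Splits a byte stream into "\0"-terminated fields, keeping any unfinished
# tail between calls.
class FieldParser
  attr_reader :fields

  def initialize
    @buffer = String.new
    @fields = []
  end

  def feed(chunk)
    @buffer << chunk
    while (i = @buffer.index("\0"))
      @fields << @buffer.slice!(0, i + 1).chop
    end
  end
end

stream = "LOGIN\0mark\0secret\0"

# Deliver the same bytes whole, one byte at a time, and in uneven pieces.
deliveries = [
  [stream],
  stream.chars,
  [stream[0, 3], stream[3, 9], stream[12..-1]]
]

results = deliveries.map do |chunks|
  parser = FieldParser.new
  chunks.each { |chunk| parser.feed(chunk) }
  parser.fields
end

p results.uniq # a single entry means every chunking parsed identically
```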

> I'll also look into whether a UDP connection is possible.

BTW, UDP is very low-level, and has its own set of pitfalls as
a result: If the remote end sends a UDP packet to you, you may
not receive it at all. Or you may receive it multiple times.
If the remote end sends multiple UDP packets to you, you may
receive none of them, some of them, and the ones you receive
may come out of order, multiple times, and any combination
thereof.

UDP does have its uses; but you can think of TCP as (essentially)
a streaming protocol built on top of UDP that deals with all the
packetloss and duplicate packets for you, so that you just
receive a nice contiguous stream of bytes in the proper order.
(TCP is not literally built on top of UDP, but the packetloss and
duplicate packets it deals with on your behalf are the same as
what you would have to deal with manually if you used UDP.)


Hope this helps,

Bill

Mark Van De Vyver

2007-May-03 19:59 UTC

Hi Bill,
> Just wanted to clarify, in case you might not already know;
> but even if EventMachine weren't in the picture, even if you
> were doing a recv() directly on the TCP socket, you still would
> not be able to depend on receiving an entire 'message' in one
> chunk regardless of whether or not the remote end may have been
> sending each field with a separate send() or write() call.
>
> . . . I'm not sure if what I've said above is clearer or more
> confusing.  :)  I'm just trying to clarify that TCP is a
> streaming protocol, so you will just receive a stream of bytes,
> regardless of whether the remote end may have transmitted the
> data in field-sized or message-sized bursts.  If you want to
> detect fields or messages in the result, your detection will
> have to be based on some physical delimiter or structure in
> the data.

Thanks, that does help.

> > I'll also look into whether a UDP connection is possible.
>
> BTW, UDP is very low-level, and has its own set of pitfalls as
> a result: If the remote end sends a UDP packet to you, you may
> not receive it at all. Or you may receive it multiple times.
> If the remote end sends multiple UDP packets to you, you may
> receive none of them, some of them, and the ones you receive
> may come out of order, multiple times, and any combination
> thereof.

Thanks, good to know too - turns out it is not possible to use UDP
with the server application.

> UDP does have its uses; but you can think of TCP as (essentially)
> a streaming protocol built on top of UDP that deals with all the
> packetloss and duplicate packets for you, so that you just
> receive a nice contiguous stream of bytes in the proper order.
> (TCP is not literally built on top of UDP, but the packetloss and
> duplicate packets it deals with on your behalf are the same as
> what you would have to deal with manually if you used UDP.)

Great info, I wasn't aware of this.

> Hope this helps,

It certainly does.

Thank you
Mark

Francis Cianfrocca

2007-May-03 23:57 UTC

Yecccch. There was a syntax error in the code sample in my original reply.
Here's a more correct version:

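def post_init
  @data = ""
end

def receive_data data
  @data << data
  loop {
    # Limit the split to two parts so everything after the first "\0"
    # stays in tail instead of being discarded.
    head,tail = @data.split("\0", 2)
    if tail
      # head is the first field
      @data = tail
    else
      break # here, the current data has no \0 character
    end
  }
end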

Francis Cianfrocca

2007-May-04 00:20 UTC

On 5/3/07, Bill Kelly <billk at cts.com> wrote:
> Just wanted to clarify, in case you might not already know;
> but even if EventMachine weren't in the picture, even if you
> were doing a recv() directly on the TCP socket, you still would
> not be able to depend on receiving an entire 'message' in one
> chunk regardless of whether or not the remote end may have been
> sending each field with a separate send() or write() call.
>
> . . . I'm not sure if what I've said above is clearer or more
> confusing.  :)  I'm just trying to clarify that TCP is a
> streaming protocol, so you will just receive a stream of bytes,
> regardless of whether the remote end may have transmitted the
> data in field-sized or message-sized bursts.  If you want to
> detect fields or messages in the result, your detection will
> have to be based on some physical delimiter or structure in
> the data.

...

> BTW, UDP is very low-level, and has its own set of pitfalls as
> a result: If the remote end sends a UDP packet to you, you may
> not receive it at all. Or you may receive it multiple times.
> If the remote end sends multiple UDP packets to you, you may
> receive none of them, some of them, and the ones you receive
> may come out of order, multiple times, and any combination
> thereof.
>
> UDP does have its uses; but you can think of TCP as (essentially)
> a streaming protocol built on top of UDP that deals with all the
> packetloss and duplicate packets for you, so that you just
> receive a nice contiguous stream of bytes in the proper order.
> (TCP is not literally built on top of UDP, but the packetloss and
> duplicate packets it deals with on your behalf are the same as
> what you would have to deal with manually if you used UDP.)

All of these clarifications are excellent, Bill. I wonder if we should
sometime write up a primer on basic networking for Ruby users.

Designing a really effective network protocol is one of those things that is
simple on the surface but subtle and difficult when you get deep into it. I
have a personal bias that most users of EM will eventually choose to use an
in-the-package implementation of a standard protocol that is appropriate for
their application (HTTP, XMPP, Stomp, whatever), rather than develop their
own protocol. But I could be totally wrong about that, and of course there
probably is a huge pile of legacy homegrown protocols out there that people
will need to support, as the OP of this thread does.

We should be looking to provide quality implementations of the standard
biggies, as well as adding more support classes (like
EventMachine::Protocols::LineAndTextProtocol) for handrolled protocols. It's
annoying because a lot of these already have competent Ruby implementations
that are not event-friendly, and you always hate to reinvent the wheel.

Tobias Gustaffson is working on an eventable implementation of SIP, which is
a large, challenging protocol packed with features, and supported in both
TCP and UDP. I'm really hopeful about his work because I think SIP is a
protocol that has a huge future, perhaps as important as HTTP.

And finally, the student who is doing his Summer of Code project on EM will
be specifically working in an area that will benefit people who have to
support legacy protocols, so keep your eyes on that effort.

Mark Van De Vyver

2007-May-04 00:54 UTC

Hi,
Thanks for the correction Francis - I'm so new to Ruby that I was
wondering if Ruby's relaxed syntax/notation allowed that behavior :)

I think I now understand the TCP behavior a little better and,
surprise-surprise, the ruby code now makes much more sense. At the
moment, when an instance of the 'IncomingMessage' class is
created/instantiated it immediately goes and does a blocking read on
the TCP connection. When a 'complete message' has been read, i.e. when
some message-specific number of "\0" separated values have been read,
the TCP reading loop/sequence ends and the read data (complete message)
is returned.

It then seems to me that in return for non-blocking reads and all the
other goodness in EM, I only have to accommodate the possibility that
the 'data' passed to read_data by the EM might consist of the last
elements of one message and the initial elements of another message,
or only part of one message, more than one message etc, etc..

My newbie thought is to handle this by:
 -  splitting the 'data' as Francis suggests
 -  adding the split 'data' to a Queue (similar scope to Francis' @data)
 -  calling my message parsing method(s) from within read_data, and
making them remove elements from the queue, _but_ only remove elements
when a whole message has been parsed, otherwise the queue is left
untouched.
 -  writing an unbind that checks to see if there is a non-empty queue
and logs it if there is.

I think this gets around me having to record the state of the message
queue or handle any partially processed messages, and is the simplest I
can think of....

I'd appreciate any comment if there is some massive design flaw in
this approach.
I think it also accommodates a case where X read_data calls are
required before all the data that constitutes one message is
assembled.

If the above approach should work then it is a remarkably small change
to make to get all that EM offers!
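
A minimal sketch of the approach listed above, under stated assumptions: the
real protocol's message layouts are not given in this thread, so the
FIELD_COUNTS table, the message names in it and the receive_message hook are
all hypothetical. The first field is assumed to name the message type, which
determines how many fields the message has. The callback is named
receive_data, as EM expects.

```ruby
module QueuedMessageReader
  # Hypothetical protocol table: message type => total number of fields,
  # including the type field itself.
  FIELD_COUNTS = { "TICK" => 3, "ACK" => 1 }.freeze

  def post_init
    @data = String.new # bytes of a field whose "\0" has not arrived yet
    @fields = []       # complete fields not yet consumed as a message
  end

  def receive_data(data)
    @data << data
    while (i = @data.index("\0"))
      @fields << @data.slice!(0, i + 1).chop
    end
    parse_messages
  end

  def unbind
    leftover = @fields.size
    warn "connection closed with #{leftover} unparsed field(s)" if leftover > 0
  end

  private

  def parse_messages
    until @fields.empty?
      needed = FIELD_COUNTS.fetch(@fields.first) do
        raise ArgumentError, "unknown message type #{@fields.first.inspect}"
      end
      break if @fields.size < needed # incomplete: leave the queue untouched
      receive_message(@fields.shift(needed))
    end
  end
end

# Stand-in for a connection object so the sketch runs without the gem.
class TickClient
  include QueuedMessageReader
  attr_reader :messages

  def initialize
    @messages = []
    post_init
  end

  def receive_message(fields)
    @messages << fields
  end
end

client = TickClient.new
client.receive_data("TICK\0IBM\0")        # two of three fields: nothing parsed
client.receive_data("101.5\0ACK\0TICK\0") # completes TICK, a whole ACK, a partial TICK
p client.messages
client.unbind # warns about the one field still queued
```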

Thanks a lot for all the help and input
Regards
Mark

Francis Cianfrocca

2007-May-04 01:16 UTC

On 5/4/07, Mark Van De Vyver <mvyver at gmail.com> wrote:
> It then seems to me that in return for non-blocking reads and all the
> other goodness in EM, I only have to accommodate the possibility that
> the 'data' passed to read_data by the EM might consist of the last
> elements of one message and the initial elements of another message,
> or only part of one message, more than one message etc, etc..

Yup. You have to manage state that a threaded programmer just leaves on the
call stack. That's the tradeoff with the event-driven programming model.

> I think this gets around me having to record the state of the message
> queue or handle any partially processed messages, and is simplest I
> can think of....

The call is receive_data rather than read_data. That will bite you, which is
why I'm risking a breach of taste to point it out. :-)

You've got the right idea. The only flaw I see with this is if your incoming
messages are huge. (Megabytes, maybe?) In this case you'll almost certainly
be calling your processing loop many times (for each incoming chunk of data
off the network) before fulfilling a complete message. So if you're not
careful, you'll re-compute a lot of state on every pass through the loop
only to throw it away and start over on the next receive_data call.
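
One way to avoid part of that repeated work, sketched here for the
delimiter search only: remember how much of the buffer has already been
searched, so each new chunk is scanned once. ResumableScanner is a
hypothetical helper, and any per-message parsing state would need similar
treatment.

```ruby
class ResumableScanner
  attr_reader :fields

  def initialize
    @buffer = String.new
    @scanned = 0 # leading characters of @buffer known to contain no "\0"
    @fields = []
  end

  def feed(chunk)
    @buffer << chunk
    # Resume the search where the previous call stopped.
    while (i = @buffer.index("\0", @scanned))
      @fields << @buffer.slice!(0, i + 1).chop
      @scanned = 0
    end
    @scanned = @buffer.size # everything left has now been searched
  end
end

scanner = ResumableScanner.new
big_field = "x" * 100_000
big_field.scan(/.{1,1500}/).each { |chunk| scanner.feed(chunk) }
scanner.feed("\0next")
p scanner.fields.size
p scanner.fields.first.size
```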

Mark Van De Vyver

2007-May-04 18:26 UTC

> You've got the right idea. The only flaw I see with this is if your incoming
> messages are huge. (Megabytes, maybe?) In this case you'll almost certainly
> be calling your processing loop many times (for each incoming chunk of data
> off the network) before fulfilling a complete message. So if you're not
> careful, you'll re-compute a lot of state on every pass through the loop
> only to throw it away and start over on the next receive_data call.

Great, my messages in this case are approx 1-10K max.
It is a good point you make, which I'll force to mind by keeping some
'received bytes' or 'received fields' counter, and only attempting to
parse a message after some minimum received data count is reached.

Thanks again
Mark

Bill Kelly

2007-May-06 22:38 UTC

Hi Francis,
From: Francis Cianfrocca
> I wonder if we should sometime write up a primer on basic networking for
> Ruby users.

Sounds nice.  Maybe accompanied by some cookbook-style small runnable examples?

> Designing a really effective network protocol is one of those things that is
> simple on the surface but subtle and difficult when you get deep into it. I
> have a personal bias that most users of EM will eventually choose to use an
> in-the-package implementation of a standard protocol that is appropriate for
> their application (HTTP, XMPP, Stomp, whatever), rather than develop their
> own protocol. But I could be totally wrong about that, and of course there
> probably is a huge pile of legacy homegrown protocols out there that people
> will need to support, as the OP of this thread does.

Indeed: if the EM protocol implementation exists, and it happens to match my
needs, I'm not sure why I wouldn't use it.  For example, I'm using
LineAndText for a couple utilities right now, and it was nice to have it
available.  (One issue I had was that I need to be able to accept lines
ending in both LF and CRLF.  So I just went with LF as the delimiter, and call
chomp on the lines passed to me by recv_line().  Simple enough.)
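
The LF-plus-chomp idea can be sketched with plain Ruby strings. This is not
the LineAndText implementation, and LineSplitter is a hypothetical helper;
it only shows that splitting on "\n" and calling chomp gives the same line
for both endings, even when a CRLF is split across chunks.

```ruby
class LineSplitter
  attr_reader :lines

  def initialize
    @buffer = String.new
    @lines = []
  end

  def feed(chunk)
    @buffer << chunk
    while (i = @buffer.index("\n"))
      # chomp strips a trailing "\n" or "\r\n", so both endings yield
      # the same line text.
      @lines << @buffer.slice!(0, i + 1).chomp
    end
  end
end

splitter = LineSplitter.new
splitter.feed("unix line\nwindows line\r\npart")
splitter.feed("ial\r") # the CR arrives before its LF
splitter.feed("\n")
p splitter.lines
```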

For a different project, I'm developing my own protocol.  I would be
surprised if anything pre-existing would fit (but I certainly don't
have an encyclopedic knowledge of protocols.)

This application is kind of game-like, and mostly uses TCP, but also uses UDP
for sending live telemetry (the sort where there's no point in
retransmitting old packets that were lost; all that matters is the most current
information.)

Incidentally, the TCP part of the protocol is similar to DRb in some ways,
except that it's not ruby-specific.  I was asking about DRb last year,
but I ended up dropping that idea because some of the nodes that will be
speaking this protocol are pure C++.  Yet, from a ruby point of view, it still
maps well to taking the *args from a method_missing, and sending them across the
wire (not accidentally :)
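
To illustrate the method_missing mapping, here is a small sketch. RemoteProxy is a hypothetical class, and JSON is used only as a stand-in for some language-neutral wire format; the real protocol's encoding is not shown in this thread:

```ruby
require "json"

# Hypothetical client-side proxy: turns any unknown method call into a
# language-neutral message (JSON here, purely as an assumption).
class RemoteProxy
  def initialize(&sender)
    @sender = sender   # block that would write the bytes to the socket
  end

  def method_missing(name, *args)
    @sender.call(JSON.generate({ "method" => name.to_s, "args" => args }))
  end

  def respond_to_missing?(name, include_private = false)
    true
  end
end

sent = []
proxy = RemoteProxy.new { |msg| sent << msg }
proxy.move_to(10, 20)
decoded = JSON.parse(sent.first)
# decoded carries the method name "move_to" and the args 10 and 20
```

A C++ node only has to parse the same message format; it never needs to know that the Ruby side generated it from a method call.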
> We should be looking to provide quality implementations of the standard
> biggies, as well as adding more support classes (like
> EventMachine::Protocols::LineAndTextProtocol) for handrolled protocols.
> It's annoying because a lot of these already have competent Ruby
> implementations that are not event-friendly, and you always hate to reinvent
> the wheel.
>
> Tobias Gustaffson is working on an eventable implementation of SIP, which is a
> large, challenging protocol packed with features, and supported in both TCP and
> UDP. I'm really hopeful about his work because I think SIP is a
> protocol that has a huge future, perhaps as important as HTTP.

Cool. I wasn't aware of SIP.  From what I've read so far, it
sounds like something I should learn more about. :)


One thing that concerns me--overall--is ruby's performance when it
comes down to heavy byte-by-byte parsing.

For example, I was kind of gritting my teeth when I implemented ANSI/VT100
terminal protocols in ruby, then implemented a line-editing and text-windowing
system on top of that.  The teeth-gritting was not because I was writing the
code in ruby, which was easy and fun; but because I knew how much work the CPU
was doing behind the scenes.  Even now, with just one client, if the client
pastes, say, 5 or 10 KB of text into the line-edit prompt in the terminal, the
CPU on the server pegs at 100% for a full second or two processing the input. (I
should note that I haven't profiled this; and I haven't
*tried* to optimize it; but I started out in 8088 assembler and FORTH, and later
on spent a decade writing video games in assembler and C, and even though
I'm a staunch premature-optimization-is-the-root-of-evil kind of guy;
measure! measure! measure!, etc.; ... I feel like I'm taking a couple
orders of magnitude hit on this stuff performance-wise.)  Where I start to
really ponder the performance, is when I consider scaling to, say, 1000 or more
concurrent clients or so.  (Say, the Next Generation IRC-like server... You want
maximum clients possible per box.)

I'm not sure where I'm going with this; as it's not
meant to be a rant, exactly.  But I found myself wondering about the possibility
of a 'hook' of sorts, between the raw 'C' EM
event_callback, and the ruby dispatch...

static void event_callback (const char *a1, int a2, const char *a3, int a4)
{
    //  <-- some kind of hook here?
    rb_funcall (EmModule, rb_intern ("event_callback"), 3,
                rb_str_new2(a1), (a2 << 1) | 1, rb_str_new(a3,a4));
}

I dunno... maybe that's too low-level to be convenient, but what
I'd like is to be able to factor-out ruby as needed, while still being
able to inherit from the 'C' implementation of the protocol in
ruby.

Thinking further, I guess there's nothing stopping anyone from
implementing, say, LineAndTextProtocol in C (as a ruby extension class with C
methods.)  We'd still include the minimal overhead of the rb_funcall
and the EventMachine::event_callback case statement in eventmachine.rb, but
I'd be surprised if that couldn't execute 1,000,000 times a
second...

Haha, indeed:

t1 = Time.now
1_000_000.times { case 123; when 1 then "foo"; when 2 then "bar"; end }
Time.now - t1

...executes almost exactly in 1 second on my 2GHz athlon 64 system.
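
For anyone who wants to repeat the measurement on their own box, the same loop can be timed with the standard Benchmark module. This is only a sketch of the timing harness; the numbers will differ by machine and Ruby version:

```ruby
require "benchmark"

# Same case-statement loop as the one-liner above, timed with Benchmark.
elapsed = Benchmark.realtime do
  1_000_000.times do
    case 123
    when 1 then "foo"
    when 2 then "bar"
    end
  end
end

puts format("1,000,000 case dispatches took %.3f s", elapsed)
```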

Hmm...

Well thanks for reading this far; I guess I've convinced myself
it's OK to use EM in ruby, knowing I can implement the
'protocol' classes in C as I need them.

If I implemented EM-compatible versions of my ANSI/VT100 parsing classes in
'C', I can probably expect good performance scaling to many
concurrent users.  As long as I can avoid ruby memory leaks in the rest of my
ruby code (Array push/shift, anyone? ;-o)


Regards,

Bill


Francis Cianfrocca

2007-May-07 05:38 UTC

[Eventmachine-talk] Some newbie clarification questions.

On 5/7/07, Bill Kelly <billk at cts.com> wrote:
>
> Cool. I wasn't aware of SIP.  From what I've read so far, it sounds like
> something I should learn more about. :)
>
>

Vint Cerf, of all people, says that SIP may become the most important
protocol of all as the internet progresses. It's already the standard for
business IP telephony (companies generally won't use Skype).

> I'm not sure where I'm going with this; as it's not meant to be a rant,
> exactly.  But I found myself wondering about the possibility of a 'hook' of
> sorts, between the raw 'C' EM event_callback, and the ruby dispatch...
>
> static void event_callback (const char *a1, int a2, const char *a3, int a4)
> {
>     //  <-- some kind of hook here?
>     rb_funcall (EmModule, rb_intern ("event_callback"), 3,
>                 rb_str_new2(a1), (a2 << 1) | 1, rb_str_new(a3,a4));
> }
>
> I dunno... maybe that's too low-level to be convenient, but what I'd like
> is to be able to factor-out ruby as needed, while still being able to
> inherit from the 'C' implementation of the protocol in ruby.
>

Keep your eyes on the Summer of Code project that seeks to integrate Ragel
grammars with EM. This just might be a huge win for people who need to roll
their own protocols, especially for legacy support. I can easily imagine a
standard pattern emerging in which people use EM to wrap up a legacy
protocol and then proxy it into something standard (REST, perhaps?). This
would make EM the tool of choice for working with pre-existing network-aware
applications.

I think EM needs a competent telnet implementation. What can I do to help
you restart that effort?

I'm not really scared of implementing standard protocols as C extensions.
I've developed kind of a pattern for that, having done it for HTTP, SAX2,
and (partially-complete) SMTP. (I did an LDAP server in EM in pure-Ruby.)
The nice thing about this is that you can define as many Ruby call-outs as
you want, implement them as fast stubs in C, and then users can redefine in
Ruby only the ones they need. As I said, I'm very keen to see SIP, Stomp,
telnet, active-FTP, and XMPP in the box. When I get a few minutes, I'll
publish these additional protocol handlers that I've done, to get the
pattern out there.
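
Until those handlers are published, the shape of the idea can be sketched in pure Ruby. Everything below is a stand-in: LineProtocolBase, its call-out names and the run method are hypothetical, and in the real pattern the no-op stubs and the parser would live in a C extension:

```ruby
# Pure-Ruby stand-in for the pattern: the base class supplies cheap no-op
# call-outs, and a user subclass redefines only the ones it cares about.
class LineProtocolBase
  # Default call-outs do nothing.
  def on_connect; end
  def on_line(line); end
  def on_close; end

  # Stand-in for the parser that drives the call-outs.
  def run(input)
    on_connect
    input.each_line { |l| on_line(l.chomp) }
    on_close
  end
end

# The user overrides on_line only; on_connect and on_close stay as stubs.
class LineCollector < LineProtocolBase
  attr_reader :lines

  def initialize
    @lines = []
  end

  def on_line(line)
    @lines << line
  end
end

collector = LineCollector.new
collector.run("HELO example\r\nQUIT\r\n")
```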

The packaging is a bit of a question mark. If you look in the Twisted
distro, they have the main distro and a batch of offshoots as separate
downloads with a standard naming convention. That's worth doing if some of
the additional pieces have library dependencies (my eventable SAX2 processor
calls out to libxml2), but it would be more convenient to put everything in
the box, which really isn't that big. Any thoughts?

Mark Van De Vyver

2007-May-09 16:57 UTC

[Eventmachine-talk] Some newbie clarification questions.

In case some other Ruby newcomer comes along this way... look out for
the 'splat' (*) in the line I've left unquoted below.

HTH
Mark
> def post_init
>   @data = ""
> end
>
> def receive_data data
>   @data << data
>   loop {
       head, *tail = @data.split("\0", -1)
>      break if tail.empty?  # no complete \0-terminated field buffered yet
>      # head is the first field
>      @data = tail.join("\0")  # tail is an Array, so rebuild the string
>   }
> end
>
>
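
A stand-alone sketch of the same null-delimited framing, without EventMachine, so the buffering can be tried in irb. NullFramer and its fields accessor are made-up names for this example. Note that with the splat, tail is always an Array (possibly empty), so the "no complete field yet" test has to be tail.empty? rather than a plain truthiness check:

```ruby
# Hypothetical stand-alone framer: collects "\0"-terminated fields and
# keeps any trailing partial field buffered until more data arrives.
class NullFramer
  attr_reader :fields

  def initialize
    @data = +""   # unfrozen empty buffer
    @fields = []
  end

  def receive_data(data)
    @data << data
    loop do
      # limit -1 keeps trailing empty strings, so "abc\0" splits to ["abc", ""]
      head, *tail = @data.split("\0", -1)
      break if tail.empty?      # no "\0" in the buffer yet
      @fields << head           # head is one complete field
      @data = tail.join("\0")   # put the remainder back as a String
    end
  end
end

framer = NullFramer.new
framer.receive_data("abc\0de")   # "abc" is complete, "de" is partial
framer.receive_data("f\0gh")     # completes "def", leaves "gh" buffered
```

Re-splitting the whole buffer on every pass is wasteful for large inputs; scanning for the next "\0" with String#index would avoid that, at the cost of losing the splat.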
