All, I've been thinking about writing an MQ server on top of EventMachine. I was noticing that every few days a new thread pops up on the Ruby and the Rails boards about how to do asynchronous processing. Of course there's DRb, and Bill Kelly is working on implementing DRb on top of EM. But I was thinking maybe we could benefit from something more powerful, scalable, concurrent, and fault-tolerant. And maybe it could even interoperate with other (non-Ruby) systems.

Any reaction to the idea? If enough people are interested, I'll throw together a straw-man API so we can start a discussion of usage patterns and practices.
From: Francis Cianfrocca
> All, I've been thinking about writing an MQ server on top of
> EventMachine. I was noticing that every few days a new thread
> pops up on the Ruby and the Rails boards about how to do
> asynchronous processing. Of course there's DRb, and Bill Kelly
> is working on implementing DRb on top of EM. But I was thinking
> maybe we could benefit from something more powerful, scalable,
> concurrent, and fault-tolerant. And even maybe it could
> interoperate with other (non-Ruby) systems.
>
> Any reaction to the idea? If enough people are interested, I'll
> throw together a straw-man API so we can start a discussion of
> usage patterns and practices.

It sounds excellent to me. I should note that I'm not wedded to DRb; I'm just trying to arrive at some system that meets the inter-process communication needs of my current application. DRb caught my interest because a) it already exists; and b) it seems 'neat' to be able to pass Ruby objects between processes and just make standard method calls on them, even if the work is done on the remote end. (That is, I like programming in Ruby, and DRb seems designed to allow one to just 'continue programming in Ruby' even though it's distributed.)

On the other hand, something "more powerful, scalable, concurrent, and fault-tolerant" sounds excellent. That's actually where I'd want to be headed in the long run.

Regards,

Bill
On 8/27/06, Bill Kelly <billk at cts.com> wrote:
> From: Francis Cianfrocca
>
> It sounds excellent to me. I should note that I'm not wedded to
> DRb; I'm just trying to arrive at some system that meets the
> inter-process communication needs of my current application.

If I'm not mistaken, DRb allows you to instantiate and host a Ruby object in the server, and the client can invoke methods repeatedly on the server-side object, which then retains its state between the calls. Is that right, Bill? If so, then I think there will always be a role for DRb, because a really powerful and scalable distributed-Ruby framework should be idempotent.
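[For readers following along, a minimal sketch of the stateful-object pattern being discussed here; the `Counter` class and the port number are made up for illustration and are not from the thread:]

```ruby
require 'drb/drb'

# A stateful object hosted by the server; each remote call mutates it.
class Counter
  def initialize
    @count = 0
  end

  def increment
    @count += 1
  end
end

# Server side: expose the object on a dRuby URI.
DRb.start_service("druby://localhost:8787", Counter.new)

# Client side (normally a separate process): obtain a proxy and call
# methods on it; the object's state is retained on the server between calls.
counter = DRbObject.new_with_uri("druby://localhost:8787")
counter.increment   # => 1
counter.increment   # => 2

DRb.stop_service
```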
I actually ran across a use for an MQ server just the other day. For a while we have been using Big Brother to monitor our network, but more and more we really need a smaller application customized to our own needs. DRb would work, but I think a good MQ system would be better.

I've also toyed around with the idea of using an MQ system for designing some fault-tolerant distributed transaction processing applications, but just haven't had the time to really explore it. I played around with Spread last year a bit; that's as far as I got.
From: Francis Cianfrocca
> If I'm not mistaken, DRb allows you to instantiate and host
> a Ruby object in the server, and the client can invoke methods
> repeatedly on the server-side object, which then retains its
> state between the calls. Is that right, Bill?

Sorry for the delay, I needed to learn a little more about DRb. Indeed, DRb is pretty slick.

I wrote a simple test of the 'hub' server idea, loading plugins, and having a client connect (also using DRb, for now) and query the hub, which in turn queries the plugins. Actually, it goes a little further, in that it demonstrates that proxied objects can be passed around promiscuously between all connected nodes, and method calls will be seamlessly routed (bridged?) to the object's true owner. That is, I set up the test so that instead of having the hub query both plugins, it queries one plugin, which in turn requests the API object of the second plugin from the hub, queries that plugin, and returns an aggregate result.

http://tastyspleen.net/~billk/drbtest/

(see comments at top of hub.rb for additional explanation)

I really wanted to dump the packets being exchanged, but I couldn't figure out how to get windump (tcpdump) to capture packets over loopback. (Maybe I'll settle for putting printouts in drb.rb...)

((I am not sure what would happen if plugin1 called plugin2, which turned around and tried to call back into plugin1!!! Maybe deadlock??? I'll try that next...))

Anyway, I have not yet progressed to attempting to implement an EventMachine-compatible version of DRbTCPSocket... But I do have a question:

Given that the event handlers in EM should be non-blocking, it seems I'll need to keep some sort of transaction state, and when events arrive, be able to associate them with the ongoing transaction to which they belong. I guess what I'm wondering is whether there may be existing best practices for how such state is usually managed within an EM architecture.

Here's a simple scenario:

  Client A invokes method on Hub
  Hub invokes method on Plugin 1
  Plugin 1 returns result to Hub
  Hub returns result to Client A

From the Hub's point of view, when it receives data from Client A, it will have to determine how to route the request. For example, it's possible the request is a purely local method invocation that could be answered immediately. Well, more to the point, the Hub could conceivably call a local method which, through some call chain, eventually makes one or more remote calls.

I'm having trouble imagining how to turn this transparently into a nonblocking event-based architecture without either using threads or, maybe, continuations (which have thread-like overhead). ...Well... it seems Ruby can create around 10,000 threads per second on my system, so maybe I shouldn't be worried about creating a thread to handle each request.

Well, I guess this turned into a sort of brain dump on where I currently am in thinking about possible implementations. But any thoughts are welcome... ! :)

Regards,

Bill
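[A toy sketch of the hub/plugin shape Bill describes; this is not his actual code (that is at the URL above), and the class names, port, and `search` API are invented for illustration:]

```ruby
require 'drb/drb'

# Hypothetical plugin API object. Including DRbUndumped means the hub
# hands out *references* (DRb proxies) rather than marshalled copies,
# so calls made anywhere are routed back to the plugin's home process.
class EchoPlugin
  include DRb::DRbUndumped

  def name
    "echo"
  end

  def search(query)
    ["echo: #{query}"]
  end
end

# Hypothetical hub front object: plugins register their API objects,
# and clients (or other plugins) can look them up and query them.
class HubFront
  include DRb::DRbUndumped

  def initialize
    @plugins = {}
  end

  def register(plugin)
    @plugins[plugin.name] = plugin
  end

  def plugin(name)
    @plugins[name]      # returned as a DRb proxy to the owning process
  end

  def search_all(query)
    @plugins.values.map { |p| p.search(query) }.flatten
  end
end

# front = HubFront.new
# front.register(EchoPlugin.new)
# DRb.start_service("druby://localhost:9000", front)
# DRb.thread.join
```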
On 9/1/06, Bill Kelly <billk at cts.com> wrote:
> Anyway, I have not yet progressed to attempting to implement
> an EventMachine-compatible version of DRbTCPSocket... But I
> do have a question:
>
> Given that the event handlers in EM should be non-blocking,
> it seems I'll need to keep some sort of transaction state,
> and when events arrive, be able to associate them with the
> ongoing transaction to which they belong.
>
> I guess what I'm wondering is whether there may be existing
> best practices for how such state is usually managed
> within an EM architecture.

The basic idea when handling a protocol in an event-driven way is that you have to make your state handling restartable. That means keeping track of where you are in a protocol conversation and also buffering up incomplete data. This is the same work you would do on a thread, except that you don't have to think about any particular protocol state until you've seen enough data to get into it. (This is hard to explain.) What I usually do is make a map of the different "states" a protocol can be in, and whenever new data comes in, apply it to the current state. (Bearing in mind that the data may trigger one or more state transitions, and also that you have to consume all of the data you get in any one event.) For a simple example, look at the preliminary HTTP client implementation in version_0/lib/protocols. You get good at it after a while, but it does seem backwards at first.

> Here's a simple scenario:
>
>   Client A invokes method on Hub
>   Hub invokes method on Plugin 1
>   Plugin 1 returns result to Hub
>   Hub returns result to client
>
> From the Hub's point of view, when it receives data from
> Client A, it will have to determine how to route the
> request. For ex., it's possible the request is a purely
> local method invocation that could be answered immediately.
>
> Well, more to the point, the Hub could conceivably call
> a local method, which through some call chain, eventually
> makes one or more remote calls.
>
> I'm having trouble imagining how to turn this transparently
> into a nonblocking event-based architecture, without
> either using threads or, maybe, continuations (which have
> thread-like overhead).

The EventMachine#defer method was intended for this kind of thing. The whole point of event-driven programming is to keep the CPU as busy as possible at all times with real work (as opposed to context switching, swapping, or page-faulting). With your DRb hub, you're talking about an architecture in which a user can plug in arbitrary code that you may need to execute in response to a network event. It's straightforward, naive, and potentially useful (depending on the application) to simply execute the plugin code synchronously and return the result from inside an event handler! I've found that MOST protocols can be handled this way, but there is a balance you have to strike: if you have to run a long-running computation on an event handler, you'll burn the CPU time anyway, so the only real reason to context-switch away from it is if other requests will suffer from the waiting. That's what you have to balance: some applications (like client-facing ones) need really fast responsiveness, and others don't. In the former case, you pay a price in lower total throughput for the greater responsiveness, since you have to add in all the thread-switch overhead.

There is a third case: the plugin code may do something that involves system or network latency, like a database call.
In this case, you're doing nothing with your CPU time but waiting for some other CPU, so this case almost always calls for #defer. Local disk I/O may be a different story. I'm in the middle of adding some code to EM so you can do nonblocking local file I/O. I'm not sure this will add any benefit (because it depends on how the CPU interacts with the disk drivers, caches on the actual disk controllers, etc.), but if it does prove beneficial, it will be neat.

There's a lot of lore in regard to whether and how you prioritize incoming requests that you have to squeeze through a single pipe (like a CPU). Software like transaction monitors and message-queueing systems often either infer or configure a priority discipline and then make sure even low-priority (typically large) requests get a few CPU slices every so often. Other systems, like IP routers, apply stricter rules. For example: the shortest packet (or the smallest request) ALWAYS goes first, no exceptions. As EM matures, we'll probably add some of this stuff to it.

It might be good for you to give plugin code a chance to give a "hint" about whether it expects to be long-running or not. If so, you would call it with EventMachine#defer (activating the internal thread pool), and if not, you just call it synchronously.

I think that one benefit of this effort to run DRb over EM is that it will give people the chance to issue a batch of remote requests all in a bunch and not return till they are done, which can potentially give a big speedup, because you don't have to serialize all the network I/O. Libcurl has a feature like this.
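[The "map of states" idea Francis describes above might look roughly like this in EM terms. The framing (a length-header line followed by a body), the module name, and the handler methods other than EM's own `post_init`/`receive_data`/`send_data` are all invented for illustration:]

```ruby
require 'eventmachine'

# Rough sketch of restartable protocol handling: one handler instance per
# connection, a current state, and a buffer for incomplete data.
module LineProtocol
  def post_init
    @buffer = ""                    # incomplete data carried between events
    @state  = :waiting_for_header
    @body_bytes_left = 0
  end

  def receive_data(data)
    @buffer << data
    # One network event may carry part of a message, a whole message, or
    # several messages; keep applying the buffer to the current state
    # until no more progress can be made.
    loop do
      case @state
      when :waiting_for_header
        line = @buffer.slice!(/.*\n/) or break
        @body_bytes_left = line.to_i
        @state = :reading_body
      when :reading_body
        break if @buffer.size < @body_bytes_left
        body = @buffer.slice!(0, @body_bytes_left)
        handle_message(body)
        @state = :waiting_for_header
      end
    end
  end

  def handle_message(body)
    send_data("ok #{body.size}\n")  # placeholder response
  end
end

# EventMachine.run do
#   EventMachine.start_server("127.0.0.1", 8081, LineProtocol)
# end
```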
From: "Francis Cianfrocca" <garbagecat10 at gmail.com>> > The basic idea when handling a protocol in an event-driven way is that > you have to make your state handling restartable. That means keeping > track of where you are in a protocol-conversation and also buffering > up incomplete data. This is the same work you would do on a thread, > except that you don''t have to think about any particular protocol > states until you''ve seen enough data to get into it. (This is hard to > explain.) What I usually do is make a map of the different "states" a > protocol can be in, and whenever new data comes in, apply it to the > current state. (Bearing in mind that the data may trigger one or more > state transitions, and also that you have to consume all of the data > you get in any one event.) For a simple example, look at the > preliminary http client implementation in version_0/lib/protocols. You > get good at it after a while but it does seem backwards at first.Ah, OK thanks. I''m familiar with this, actually. It''s how my ANSI/VT100 emulator works.> The EventMachine#defer method was intended for this kind of thing. The > whole point of event-driven programming is to keep the CPU as busy as > possible at all times with real work (as opposed to context switching, > swapping or page-faulting). With your DRb hub, you''re talking about an > architecture in which a user can plug in arbitrary code that you may > need to execute in response to a network event. It''s straightforward, > naive, and potentially useful (depending on the application) to simply > execute the plugin code synchronously and return the result from the > inside of an event handler!Ahh... so, even though the plugin is in a different process, the idea is, in terms of what the CPU is doing, we''re betting that most often we can synchronously message the other process and wait for a result, because the OS scheduler will switch to that process immediately (or, at least, _efficiently_, with regard to CPU cycles), do the work, and get right back to us.> It might be good for you to give plugin code a chance to give a "hint" > about whether it expects to be long-running or not. If so, you would > call it with EventMachine#defer (activating the internal thread pool), > and if not, you just call it synchronously.Hmm. Indeed, the first example I''d had in mind was calling a #search method on all my plugins. Plugins would, in turn, be accessing either local databases or making internet queries. But I can see how one might anticipate that many DRb method calls would be able to return an immediate result, and how the synchronous event IPC model makes sense in such cases. . . . Hmm. . . I''m a little uneasy about having to add hinting, for some reason, . . . but I don''t have an alternative in mind. One thing I do know, is I''ll need very low-latency responses to some client requests. (For ex. if a client is viewing a user interface modeled on the server, and the client is manipulating some sort of slider widget... I''ll want very low- latency responses... (I might be better off using UDP for some of the client/server communcation.)) Regards, Bill