Short question: is there a way to tell EM to actually send data after a send_data call?

I'm building a file-transfer app. I send Marshal.dump'ed metadata first, and then the file contents (chunked). I found a silly bug: receive_data() gets the marshalled metadata and the first chunk of the file in a single variable.

Like this:

c1.send_data("meta")
c1.send_data("chunk1")
c1.send_data("chunk2")

receiver.receive_data(data): data == "metachunk1chunk2"

I have two possible solutions:
1) Some kind of flush between some of the #send_data calls
2) Explicitly split incoming data

The first one looks better, but I don't know whether it is the right design decision.

Thanks in advance.
Oleg Andreev (oleganza)
The short answer is no: messages are buffered up and sent after #send_data finishes. In your case, it would be pretty simple to add some stateful checking in your #receive_data callback and then start the file transfer.

Longer answer: the reason you can't send and receive in the same block is that doing so would cause blocking, and the reactor is built around the idea of never blocking on network I/O.

Hope that helps.
Jason

On Feb 7, 2008 6:01 AM, Oleg Andreev <oleganza at gmail.com> wrote:
> Short question: is there a way to tell EM to actually send data after a
> send_data call?
> [...]
On Feb 7, 2008 6:01 AM, Oleg Andreev <oleganza at gmail.com> wrote:
> [...]
> I have two possible solutions:
> 1) Some kind of flush between some of the #send_data calls
> 2) Explicitly split incoming data
>
> The first one looks better, but I don't know whether it is the right design
> decision.

There are some important issues with the way you're approaching this. I'm assuming you're using TCP for this.

EM doesn't give you a way to insert boundaries into the data stream, because TCP itself doesn't. EM's strategy is to keep the kernel's outbound buffers as clear as possible. If you have a large number of active data streams, it will actively balance them to make sure none of them starve. And when scheduling I/O, it prefers writing to reading. When it does read, it tries to take as much off the wire as possible (again, subject to a fairness discipline), and it will coalesce incoming TCP data into as few event calls as possible.

Even if it were possible to "flush" outbound buffers between each of your three calls to #send_data, that still doesn't mean that the receiving process will receive the data in three distinct #receive_data calls. And even if your sending process emitted each chunk of data in a different network packet, the network itself might fragment or coalesce them.

The bottom line is that you need to recognize and handle the boundaries in your data in your #receive_data handler.
If you're sending marshalled data, you might try this: before each of the three chunks, send four hexadecimal digits containing the length of that chunk. I think someone wrote a protocol handler that automatically does something like this, and it may already be in the distro.

Hope that helps.
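The four-hex-digit length prefix suggested above could be sketched in plain Ruby roughly as follows. This is only an illustration: `HexFramer`, `frame`, and `feed` are hypothetical names, not part of EventMachine, and the sketch assumes no message exceeds 0xFFFF bytes so that its length fits in four hex digits. The receiver buffers whatever arrives and emits a message only once all of its bytes are present, so it should not matter how TCP splits or merges the data.

```ruby
# Hypothetical helper (not part of EventMachine): frames each message with a
# four-hex-digit length prefix and reassembles messages from arbitrary chunks.
class HexFramer
  HEADER_SIZE = 4        # four hexadecimal digits
  MAX_LENGTH  = 0xFFFF   # largest length four hex digits can express

  def initialize
    @buffer = String.new(encoding: Encoding::BINARY)
  end

  # Sender side: prefix the payload with its byte length as four hex digits.
  def self.frame(payload)
    raise ArgumentError, "payload too large" if payload.bytesize > MAX_LENGTH
    format("%04x", payload.bytesize) + payload.b
  end

  # Receiver side: pass in whatever bytes just arrived.
  # Returns an array of the complete messages now available (possibly empty).
  def feed(data)
    @buffer << data.b
    messages = []
    while @buffer.bytesize >= HEADER_SIZE
      length = @buffer.byteslice(0, HEADER_SIZE).hex
      # Stop and wait for more data if the message body is still incomplete.
      break if @buffer.bytesize < HEADER_SIZE + length
      messages << @buffer.byteslice(HEADER_SIZE, length)
      @buffer = @buffer.byteslice(HEADER_SIZE + length, @buffer.bytesize)
    end
    messages
  end
end
```

In an EM connection, one would presumably pass `HexFramer.frame(payload)` to #send_data and call `feed(data)` from #receive_data, handling each returned message in turn.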
The LengthPrefixProtocol class in the DistribuStream distribution does this. It frames each "packet" with a 16-bit or 32-bit unsigned big-endian integer, and provides send_packet and receive_packet methods to send data in discrete chunks.

On Feb 7, 2008 5:56 AM, Francis Cianfrocca <garbagecat10 at gmail.com> wrote:
> [...]
> If you're sending marshalled data, you might try this: send four
> hexadecimal digits before each of the three chunks containing the length of
> the chunk. I think someone wrote a protocol handler that automatically does
> something like this, and it may already be in the distro.
>
> Hope that helps.

--
Tony Arcieri
ClickCaster, Inc.
tony at clickcaster.com
On 07.02.2008, at 21:02, Tony Arcieri wrote:
> The LengthPrefixProtocol class in the DistribuStream distribution
> does this. It frames each "packet" with a 16-bit or 32-bit unsigned
> big-endian integer, and provides send_packet and receive_packet
> methods to send data in discrete chunks.

Wow. That's exactly what I was looking for. Actually, I've already made a simple state machine (after the metadata is received, the remote node responds with "Let's start transfer"), but for complex situations LengthPrefixProtocol is just what the doctor ordered. I'll give it a try next time.

Francis, Tony: thanks for the info!

Oleg.
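For readers without DistribuStream at hand, the framing Tony describes (a 16-bit or 32-bit unsigned big-endian length in front of each packet) might look something like the sketch below. It is not the LengthPrefixProtocol source: `LengthPrefixFramer`, `encode`, and `receive` are hypothetical names chosen for illustration, and only the header format is taken from the description above.

```ruby
# Hypothetical sketch of length-prefixed framing; this is NOT the
# DistribuStream LengthPrefixProtocol source, only the same general idea.
class LengthPrefixFramer
  # prefix_size: 2 for a 16-bit header, 4 for a 32-bit header (big-endian).
  def initialize(prefix_size = 2)
    @template = { 2 => "n", 4 => "N" }.fetch(prefix_size) do
      raise ArgumentError, "prefix_size must be 2 or 4"
    end
    @prefix_size = prefix_size
    @max_length  = (1 << (8 * prefix_size)) - 1
    @buffer      = String.new(encoding: Encoding::BINARY)
  end

  # Build the bytes to hand to #send_data for one packet.
  def encode(payload)
    raise ArgumentError, "packet too large" if payload.bytesize > @max_length
    [payload.bytesize].pack(@template) + payload.b
  end

  # Append received bytes and yield each complete packet to the block.
  def receive(data)
    @buffer << data.b
    while @buffer.bytesize >= @prefix_size
      length = @buffer.unpack1(@template)
      # Wait for more data if the packet body has not fully arrived yet.
      break if @buffer.bytesize < @prefix_size + length
      yield @buffer.byteslice(@prefix_size, length)
      @buffer = @buffer.byteslice(@prefix_size + length, @buffer.bytesize)
    end
  end
end
```

With framing like this, the marshalled metadata and each file chunk arrive as separate packets regardless of how many #receive_data calls deliver them, so the first packet can go to Marshal.load and the rest can be written to the file.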
James Tucker
2008-Feb-07 11:56 UTC
[Eventmachine-talk] Buffer flushing (on marshalling protocols)
On 7 Feb 2008, at 12:56, Francis Cianfrocca wrote:
> If you're sending marshalled data, you might try this: send four
> hexadecimal digits before each of the three chunks containing the
> length of the chunk. I think someone wrote a protocol handler that
> automatically does something like this, and it may already be in the
> distro.

I have one of these which packs the same way as drb does (which amused me, as I wrote mine prior to reading any drb source code): a single unsigned integer size header. Obviously this too has a limit, and the reason I have not yet released the protocol implementation is that, high as the limit may be, it is a hard limit on packet size against which I currently do not protect. Moreover, this protocol design cannot recover from the corruption that it causes on failure. This makes TLS a great option (it is my current default), and UDP a very, very bad one!

I also wanted to make it possible to drop this into a monkeypatch on drb in order to make drb evented using the protocol already implemented (I believe they are similar). Having looked (briefly) at the drb source, I don't think drb solves this issue either, so some patches may need distributing to clean up this rare but possible use case. I am also not sure whether there are limits in the marshaller on this, although that's somewhat out of scope here.

I remember reading, not so long ago, talk of trying to ensure that all of the marshalling protocol implementations can talk the same internal wire protocol (security layer excepted). If I remember correctly, the discussion included Francis and Tony, so I also wanted to find out the status of this, and (when I come to it) to discuss this very-large-object framing issue. Maybe now is the time to open discussion?

Something which I know raised Francis's eyebrows when he saw it in the code comments of my implementation was that I had also been planning to add zlib support to the protocol.
(Great for some objects, like heavy coalesced geo data ;) )

A final outstanding area for discussion, as these are just falling out of my memory right now, is the SSL implementations. I am not sure drb and EM can talk to each other securely right now; however, I would need to test this for real to verify it, unless someone here can comment?

Kind regards,
James.
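To make the two concerns above concrete (a hard size limit that should be checked, and optional zlib compression), here is one possible sketch. It is not James's implementation, which was not posted: the `GuardedFrame` module, the flag byte, and the `max_size` parameter are all assumptions made for illustration. The only details taken from the message are the single unsigned integer size header and the idea of compressing payloads with zlib.

```ruby
require "zlib"

# Hypothetical helpers illustrating a 32-bit size header with an explicit
# limit check and an optional zlib flag byte; names and layout are assumptions.
module GuardedFrame
  MAX_SIZE = 0xFFFF_FFFF   # largest value a 32-bit unsigned header can hold

  # Layout: 4-byte big-endian body length, 1 flag byte (1 = deflated), body.
  def self.encode(payload, compress: false, max_size: MAX_SIZE)
    body = compress ? Zlib::Deflate.deflate(payload) : payload.b
    # Refuse oversized frames instead of silently writing a wrapped length.
    if body.bytesize > max_size
      raise ArgumentError, "frame of #{body.bytesize} bytes exceeds limit #{max_size}"
    end
    [body.bytesize, compress ? 1 : 0].pack("NC") + body
  end

  # Decode exactly one complete frame (receive-side buffering is omitted).
  def self.decode(frame)
    length, flag = frame.unpack("NC")
    body = frame.byteslice(5, length)
    flag == 1 ? Zlib::Inflate.inflate(body) : body
  end
end
```

Rejecting an oversized payload on the sending side keeps the stream well-formed, which matters because a receiver that misreads one length header has no obvious way to find the next frame boundary.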
Francis Cianfrocca
2008-Feb-07 13:05 UTC
[Eventmachine-talk] Buffer flushing (on marshalling protocols)
On Feb 7, 2008 2:56 PM, James Tucker <jftucker at gmail.com> wrote:
> A final outstanding area for discussion, as they're just falling out
> of my memory right now, is the SSL implementations, I am not sure drb
> and EM can talk to each other securely right now - however I would
> need to test this for real to verify it, unless someone here can
> comment?

I have no clue how drb handles SSL, but it's easy and graceful in EM. (Except that integrating third-party cert chains should be automated somehow.)