We have been trying to send large files with EventMachine and have run into a few issues.

If we just call send_data with the contents of a file, it is slow, and the server eats about 98% of the CPU. The send_file call only supports files up to 32K, and we are sending files as large as 5 MB. Lastly, we have been unable to use stream_file_data, because it depends on evma_fastfilereader, which I couldn't find anywhere to install anymore.

Some of these issues have been discussed in this thread:
http://groups.google.com/group/eventmachine/browse_thread/thread/3cc6b0ee1a8419?pli=1

Has anyone been sending large files with EventMachine who could share some tips? In our case we are using EM for both the client and the server. We are trying to sync over a directory of many files; is this just not a recommended usage of EM? Besides solutions to make this work better on EM, are there other recommendations for better ways to send and receive large amounts of file data with Ruby?

Thanks,
Dan

--
Dan Mayer
Co-founder, Devver (http://devver.net)
follow us on twitter: http://twitter.com/devver
My Blog (http://mayerdan.com)
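One way to picture the first problem in the message above: handing a whole file's contents to a single send call means building one multi-megabyte string up front. The following is only a minimal sketch of reading a file in fixed-size pieces instead. The `each_chunk` helper and the 16 KB chunk size are assumptions made for illustration and are not part of EventMachine; the block stands in for whatever actually writes to the connection, and pacing writes against the connection's outbound buffer is a separate concern this sketch does not address.

```ruby
require "tempfile"

# Hypothetical helper (not part of EventMachine): yields a file's bytes in
# pieces of at most chunk_size bytes, so the whole file never has to be held
# in memory as a single string.
def each_chunk(path, chunk_size = 16 * 1024)
  File.open(path, "rb") do |io|
    # IO#read with a positive length returns nil at end of file.
    while (chunk = io.read(chunk_size))
      yield chunk
    end
  end
end

# Demo: write 100,000 bytes to a temp file, then read it back in 16 KB pieces.
file = Tempfile.new("payload")
file.binmode
file.write("x" * 100_000)
file.flush

sizes = []
each_chunk(file.path) { |chunk| sizes << chunk.bytesize }
file.close! # close and delete the temp file

puts "#{sizes.length} chunks, #{sizes.inject(0, :+)} bytes"
```

In a real client the block body would be the place to hand each chunk to the connection; the demo only records chunk sizes so that it can run on its own.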
Kirk Haines
2008-Sep-28 19:46 UTC
[Eventmachine-talk] EM sending and receiving large files
On Sun, Sep 28, 2008 at 8:18 PM, Dan Mayer <dan at devver.net> wrote:

> Lastly we have been unable to use stream_file_data, because it has a
> dependency on evma_fastfilereader, which I couldn't seem to find anywhere
> to install anymore.

Hmmm. I think that was a confused oversight on Francis/my part. evma_fastfilereader should be part of EM. Until it is, you can get it by installing Swiftiply.

> Besides looking for solutions to make this work better on EM, are there
> other recommendations of better ways to send and receive large amounts of
> file data with Ruby?

Using stream_file_data I regularly transfer very large files with Swiftiply.

Kirk Haines
James Tucker
2008-Sep-29 04:56 UTC
[Eventmachine-talk] EM sending and receiving large files
On 29 Sep 2008, at 03:46, Kirk Haines wrote:

> Hmmm. I think that was a confused oversight on Francis/my part.
> evma_fastfilereader should be part of EM. Until it is, you can get
> it by installing Swiftiply.

I've been meaning to come and grab it and commit it to EM, as it's also the last failing test in the suite run from trunk after the last month's work. Assuming there are no other issues raised, I will get this committed to the EM code base.
Thanks for the tip on installing Swiftiply; that made stream_file_data work perfectly.

Unfortunately, it didn't solve our problem. Large files were still taking a long time to transfer, so I looked deeper into the issue. I had always assumed the delay was simply slow transfer time. Running a profiler against our code was enlightening as always: it appears our message buffer accounts for a significant amount of the time. If I completely get rid of the message buffer the server uses to split up multiple messages, either send_data or stream_file_data (with larger files) drops to less than 1 second. After searching around a bit I found BufferedTokenizer, which is one of the protocols for EM. Switching from our apparently bad buffer to the one included with EM brought us from 10 seconds to 1.2 seconds.

Thanks for the help; it looks like everything is back on track for our EM performance.

thanks,
Dan Mayer
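The message buffer discussed above has one job: accumulate whatever bytes arrive and hand back only complete, delimiter-terminated messages. A minimal sketch of that idea follows. The `SimpleTokenizer` class, its `extract` method, and the `|END|` delimiter are assumptions chosen for illustration; this is not the code of EM's BufferedTokenizer and makes no claim about how that class is implemented.

```ruby
# Illustrative delimiter tokenizer (hypothetical; not EM's BufferedTokenizer).
# It keeps one growing input string and cuts complete messages off the front.
class SimpleTokenizer
  def initialize(delimiter = "\n")
    @delimiter = delimiter
    @input = String.new # mutable accumulator for bytes not yet consumed
  end

  # Append a chunk and return every message completed so far, in order.
  # Anything after the last delimiter stays buffered for the next call.
  def extract(data)
    @input << data
    messages = []
    while (cut = @input.index(@delimiter))
      # Remove "message + delimiter" from the front, then drop the delimiter.
      messages << @input.slice!(0, cut + @delimiter.length).chomp(@delimiter)
    end
    messages
  end
end

tokenizer = SimpleTokenizer.new("|END|")
first  = tokenizer.extract("hello|END|wor") # one complete message, one partial
second = tokenizer.extract("ld|END|")       # the partial message is completed
p first
p second
```

The point of the sketch is the contract rather than speed: each call returns zero or more whole messages, and a message split across two network reads is reassembled before it is returned.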
Do you know what specifically about your buffer was causing issues? Were you using String#<<?

Aman
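The question about String#<< comes down to how the two ways of appending behave: `a << b` mutates `a` in place, while `a = a + b` builds a new string on every call. The sketch below compares them with Ruby's standard Benchmark module. The chunk size and round count are arbitrary assumptions, and the timings depend on the interpreter and machine, so treat it as a way to measure rather than as a verdict.

```ruby
require "benchmark"

CHUNK  = "x" * 1024 # 1 KB per append (arbitrary)
ROUNDS = 1_000      # about 1 MB in total (arbitrary)

in_place = String.new
copying  = String.new

Benchmark.bm(12) do |bm|
  # String#<< appends to the receiver in place.
  bm.report("String#<<") { ROUNDS.times { in_place << CHUNK } }
  # String#+ returns a brand-new string, which is then reassigned.
  bm.report("String#+")  { ROUNDS.times { copying = copying + CHUNK } }
end
```

Both loops end with the same bytes in the buffer; only the allocation pattern differs.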
Aman (and hopefully others interested on the list),

Here is a profiler dump after I optimized a bit. I got ours from 26ish seconds down to 10 by getting rid of things like String#<<.

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 14.44     3.49      0.66      668     0.99     0.99  String#split
 13.13     4.09      0.60      665     0.90     0.90  String#index
  4.16     4.28      0.19      668     0.28     3.29  DataBuffer#grab
  3.06     4.42      0.14      661     0.21     6.87  EmServerExample#receive_data
  0.88     4.46      0.04     2007     0.02     0.02  Array#length
  0.66     4.49      0.03     2007     0.01     0.01  Fixnum#>
  0.66     4.52      0.03      662     0.05     3.31  DataBuffer#append

What is the fastest way to append to strings?

This is really messy, since I was messing around trying a bunch of optimizations and other things before finding and switching to the EM buffer.

    class DataBuffer
      FRONT_DELIMITER = "0x5b".hex.chr # '['
      # ']'[0].to_s(16).hex.chr
      BACK_DELIMITER = "0x5d".hex.chr # ']'
      # crazy delimiter because normal ones kept showing up in binary files
      DELIMITER = "|#{FRONT_DELIMITER}#{FRONT_DELIMITER}#{FRONT_DELIMITER}GT_DELIM#{BACK_DELIMITER}#{BACK_DELIMITER}#{BACK_DELIMITER}#{BACK_DELIMITER}|"
      # added to replace dynamically making these
      DELIM_ESCAPE = /#{Regexp.escape(DELIMITER)}/
      DELIM_ESCAPE_END = /#{Regexp.escape(DELIMITER)}\Z/

      def initialize
        @unprocessed = ""
        @commands = []
      end

      def grab
        # re-splits the entire unprocessed buffer on every call
        new_messages = @unprocessed.split(DELIM_ESCAPE)
        while new_messages.length > 1
          @commands << new_messages.shift
        end
        msg_length = new_messages.length
        if msg_length > 0
          if msg_length == 1 && (@unprocessed =~ DELIM_ESCAPE_END)
            # @commands << new_messages.shift
            @commands.push(new_messages.shift)
            @unprocessed = ""
          else
            # put the rest of the last statement back into the buffer
            while (cut = @unprocessed.index(DELIM_ESCAPE))
              @unprocessed = (@unprocessed[cut..@unprocessed.length]).sub(DELIMITER, "")
            end
          end
        end
        if @commands.length > 0
          return @commands.shift
        else
          return nil # if @commands.length == 0
        end
      end

      def prepare(str)
        str.to_s + DELIMITER
      end

      def append(data)
        # @unprocessed << data
        @unprocessed = @unprocessed + data
      end
    end

    ... client / server code usage ...

    send_data(@buffer.prepare("some_msg"))

    def receive_data(data)
      @buffer.append(data)
      while (command = @buffer.grab)
        process(command)
      end
    end

    def process(data)
      puts "got data: #{data}"
    end

    ...

I am probably going to look closer at the EM buffer and our code, and I am sure I will realize we did something pretty dumb.

Thanks,
Dan
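The comment in the code above about ordinary delimiters showing up in binary files points at a limit of delimiter framing in general. A different technique, not used anywhere in this thread, is to prefix each message with its byte length so that the payload never has to be searched. The sketch below is only an illustration under that assumption: the `LengthFramer` class, its method names, and the 4-byte big-endian header are all hypothetical choices, not an EventMachine API.

```ruby
# Hypothetical length-prefixed framing (a swapped-in alternative to
# delimiters; neither the thread's code nor EventMachine provides this class).
class LengthFramer
  HEADER_SIZE = 4 # 32-bit big-endian payload length

  def initialize
    @buffer = String.new # binary accumulator for bytes not yet consumed
  end

  # Wrap a payload for sending: 4-byte length, then the raw bytes.
  def self.frame(payload)
    [payload.bytesize].pack("N") + payload.b
  end

  # Append received bytes and return every complete payload, in order.
  def extract(data)
    @buffer << data.b
    payloads = []
    while @buffer.bytesize >= HEADER_SIZE
      length = @buffer.unpack1("N") # read the length without consuming it
      break if @buffer.bytesize < HEADER_SIZE + length # payload incomplete
      @buffer.slice!(0, HEADER_SIZE)
      payloads << @buffer.slice!(0, length)
    end
    payloads
  end
end

# Two messages on the wire; the first contains typical delimiter characters.
wire = LengthFramer.frame("a|b\n") + LengthFramer.frame("second")

framer = LengthFramer.new
first  = framer.extract(wire[0, 6])  # header plus part of the first payload
second = framer.extract(wire[6..-1]) # the remaining bytes arrive later
p first
p second
```

The trade-off is that the receiver never scans payload bytes, at the cost of a fixed header on every message and a maximum message size set by the header width.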
James Tucker
2008-Sep-30 04:25 UTC
[Eventmachine-talk] EM sending and receiving large files
Dan,

If you have some time, would you be able to run your data sets against this other BufferedTokenizer implementation:

http://pastie.textmate.org/private/ykjtuipjedrwgzwgggu5w

Performance varies depending on the specific data sets and the chunk size being added to the buffer. Ruby's GC certainly starts to cause performance issues with too many objects, so I'm trying to strike a balance.

Any input would be welcome.

Kind regards,

J.
Tony Arcieri
2008-Sep-30 09:46 UTC
[Eventmachine-talk] EM sending and receiving large files
On Tue, Sep 30, 2008 at 5:25 AM, James Tucker <jftucker at gmail.com> wrote:

> If you have some time, would you be able to use your data sets against this
> other BufferedTokenizer implementation:
>
> http://pastie.textmate.org/private/ykjtuipjedrwgzwgggu5w

A string-based one should generally be faster on Ruby 1.8.

--
Tony Arcieri
medioh.com
James Tucker
2008-Sep-30 10:55 UTC
[Eventmachine-talk] EM sending and receiving large files
On 30 Sep 2008, at 17:46, Tony Arcieri wrote:

> A string-based one should generally be faster on Ruby 1.8

In a few tests I did here, the differences were mostly related to the size of the incoming chunk and the number of chunks per token. Speed differences between 1.8 and 1.9 vary; each has its own advantages at certain tasks, but the two implementations were overall quite comparable on both interpreters. What I'm hoping to get an idea of is where and why the differences really come up.
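A rough way to explore the effect described above (incoming chunk size and number of chunks per token) is to feed the same payload to a tokenizer in different chunk sizes and time each run. The sketch below is an assumption-laden harness: `extract_all` is a hypothetical stand-in tokenizer, not either of the implementations being compared in this thread, and the token size, token count, and chunk sizes are arbitrary.

```ruby
require "benchmark"

DELIM = "\n"

# Hypothetical stand-in tokenizer: feeds `payload` through a buffer in pieces
# of `chunk_size` bytes and returns the complete tokens found.
def extract_all(payload, chunk_size)
  buffer = String.new
  tokens = []
  offset = 0
  while offset < payload.bytesize
    buffer << payload.byteslice(offset, chunk_size)
    offset += chunk_size
    # Cut every complete token off the front of the buffer.
    while (cut = buffer.index(DELIM))
      tokens << buffer.slice!(0, cut + DELIM.length).chomp(DELIM)
    end
  end
  tokens
end

# 200 tokens of 5,000 bytes each, about 1 MB in total (arbitrary sizes).
payload = ("x" * 5_000 + DELIM) * 200

results = {}
[256, 4_096, 65_536].each do |chunk_size|
  seconds = Benchmark.realtime do
    results[chunk_size] = extract_all(payload, chunk_size)
  end
  puts format("chunk size %6d bytes: %.4f s", chunk_size, seconds)
end
```

With small chunks each token spans many reads, and with large chunks each read contains several tokens, so the same harness exercises both regimes. Swapping a real tokenizer into `extract_all` would be the next step for an actual comparison.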
Sure no problem. Sorry it took me so long to get back to this, I got slammed with some items that I had to take care of today. I ran it on a small test set of data, and the results were very similar... The current tokenizer in EM seemed to outperform your pastie by very small amounts. Tomorrow I can run it against a much large and real project, and I will let you know if I notice any significant differences. I am cleaning up some of the code I have been using, and will likely make a post about various methods of sending files through EM in the next couple days. I noticed it wasn''t the easiest to find examples of the various options just out on the web, so it might help a few people running into similar problems. peace, Dan Mayer On Tue, Sep 30, 2008 at 5:25 AM, James Tucker <jftucker at gmail.com> wrote:> Dan, > If you have some time, would you be able to use your data sets against this > other BufferedTokenizer implementation: > > http://pastie.textmate.org/private/ykjtuipjedrwgzwgggu5w > > There are varying cases for performance depending on the specific data sets > and chunk size being added to the buffer. Ruby''s GC certainly starts to > cause performance issues with too many objects, so I''m trying to strike a > balance. > > Any input would be welcome, > > Kind regards, > > J. > > On 30 Sep 2008, at 03:07, Dan Mayer wrote: > > Aman (and hopefully others interested on the list), > > Here is a profiler dump after I optimized a bit, I got ours from 26ish > seconds down to 10 by getting rid of things like String#<< > 14.44 3.49 0.66 668 0.99 0.99 String#split > 13.13 4.09 0.60 665 0.90 0.90 String#index > 4.16 4.28 0.19 668 0.28 3.29 DataBuffer#grab > 3.06 4.42 0.14 661 0.21 6.87 > EmServerExample#receive_data > 0.88 4.46 0.04 2007 0.02 0.02 Array#length > 0.66 4.49 0.03 2007 0.01 0.01 Fixnum#> > 0.66 4.52 0.03 662 0.05 3.31 DataBuffer#append > > What is the fastest way to do appending to strings? 
> > This is a really messy since I was messing around trying a bunch > optimizations and other things, before finding and switching to the EM > buffer. > > class DataBuffer > FRONT_DELIMITER = "0x5b".hex.chr # ''['' > #'']''[0].to_s(16).hex.chr > BACK_DELIMITER = "0x5d".hex.chr # '']'' > #crazy delimiter because normal ones kept showing up in binary files > DELIMITER > "|#{FRONT_DELIMITER}#{FRONT_DELIMITER}#{FRONT_DELIMITER}GT_DELIM#{BACK_DELIMITER}#{BACK_DELIMITER}#{BACK_DELIMITER}#{BACK_DELIMITER}|" > #added to replace, dynamically making these > DELIM_ESCAPE = /#{Regexp.escape(DELIMITER)}/ > DELIM_ESCAPE_END = /#{Regexp.escape(DELIMITER)}\Z/ > > def initialize > @unprocessed = "" > @commands = [] > end > > def grab > new_messages = @unprocessed.split(DELIM_ESCAPE) > while new_messages.length > 1 > @commands << new_messages.shift > end > msg_length = new_messages.length > if msg_length > 0 > if msg_length == 1 && (@unprocessed=~DELIM_ESCAPE_END) > # @commands << new_messages.shift > @commands.push(new_messages.shift) > @unprocessed = "" > else > #put the rest of the last statement back into the buffer > while(cut=@unprocessed.index(DELIM_ESCAPE)) > @unprocessed = (@unprocessed[cut.. at unprocessed.length > ]).sub(DELIMITER,"") > end > end > end > if @commands.length > 0 > return @commands.shift > else > return nil #if @commands.length==0 > end > end > > def prepare(str) > str.to_s+DELIMITER > end > > def append(data) > # @unprocessed << data > @unprocessed = @unprocessed + data > end > > end > > ... client / server code usage... > send_data(@buffer.prepare("some_msg")) > > def receive_data(data) > @buffer.append(data) > while(command = @buffer.grab) > process(command) > end > end > > def process(data) > puts "got data: #{data}" > end > ... > > I am probably going to look closer at the EM buffer and our code and I am > sure I will realize something pretty dumb that we did. 
>
> Thanks,
> Dan
>
> On Mon, Sep 29, 2008 at 7:49 PM, Aman Gupta <themastermind1 at gmail.com> wrote:
>
>> Do you know what specifically about your buffer was causing issues?
>> Were you using String#<<
>>
>> Aman
>>
>> On Mon, Sep 29, 2008 at 5:45 PM, Dan Mayer <dan at devver.net> wrote:
>> > Thanks for the tip on installing Swiftiply, that made stream_file_data
>> > work perfectly.
>> >
>> > Unfortunately, it didn't solve our problem. Large files were still
>> > taking a long time to transfer. So I looked deeper into the issue, I had
>> > always been assuming the delay was actually the slow transfer time.
>> > Running a profiler against our code was enlightening as always, it
>> > appears our message buffer is adding a significant amount of the time.
>> > If I completely get rid of any message buffer on the server used to
>> > split up multiple messages, either send_data or stream_file_data (with
>> > larger files) drops to less than 1 second. After searching around a bit
>> > I found BufferedTokenizer, which is one of the protocols for EM.
>> > Switching from our apparently bad buffer to the one included with EM
>> > brought us from 10 seconds to 1.2 seconds.
>> >
>> > Thanks for the help, looks like everything is back on track for our EM
>> > performance.
>> >
>> > thanks,
>> > Dan Mayer
>> >
>> > On Mon, Sep 29, 2008 at 5:56 AM, James Tucker <jftucker at gmail.com> wrote:
>> >>
>> >> On 29 Sep 2008, at 03:46, Kirk Haines wrote:
>> >>
>> >> Hmmm. I think that was confused oversight on Francis/my part.
>> >> evma_fastfilereader should be part of EM. Until it is, you can get it
>> >> by installing Swiftiply.
>> >>
>> >> I've been meaning to come and grab it and commit it to EM, as it's also
>> >> the last failing test in the suite run from trunk after the last month's
>> >> work. Assuming there are no other issues raised, I will get this
>> >> committed to the EM code base.
>> >>
>> >> Using stream_file_data I regularly transfer very large files with
>> >> Swiftiply.
>> >>
>> >> Kirk Haines
One final follow up. I posted some quick benchmarks comparing sending files with our buffer, EM's buffer, the buffer James Tucker suggested, and stream_file_data. I also included some benchmarks with compression. I included the code I used for testing. I thought since I hadn't easily found a good way to send files it might help out some people in the future. It was nice to be able to just switch buffers and get a 10X improvement on speed.

http://devver.net/blog/2008/10/sending-files-with-eventmachine/

If anyone has any thoughts, tips, or alternative buffers let me know.

thanks,
Dan
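For readers comparing buffers, here is a minimal sketch of a delimiter tokenizer in plain Ruby. It is not EventMachine's BufferedTokenizer and not the code from the blog post; the class name SketchTokenizer and its extract method are made up for illustration. It appends in place with String#<<, cuts finished messages off the front with String#slice!, and only searches the newly arrived bytes for the delimiter, rather than re-splitting the whole unprocessed string on every call the way the DataBuffer quoted in this thread does. Whether that accounts for the speed difference reported above is an assumption, not something measured here.

```ruby
# Hypothetical sketch; NOT EventMachine's BufferedTokenizer implementation.
class SketchTokenizer
  def initialize(delimiter = "\n")
    @delimiter = delimiter.b # binary copy, so raw file bytes never clash on encoding
    @buffer = String.new     # empty, mutable, binary (ASCII-8BIT) string
  end

  # Append a received chunk and return every complete message now available.
  def extract(chunk)
    # Resume scanning just before the new data, so a delimiter split across
    # two chunks is still found without rescanning the older bytes.
    from = [@buffer.length - @delimiter.length + 1, 0].max
    @buffer << chunk.b
    messages = []
    while (cut = @buffer.index(@delimiter, from))
      # Remove message plus delimiter from the buffer, keep only the message.
      messages << @buffer.slice!(0, cut + @delimiter.length)[0, cut]
      from = 0
    end
    messages
  end
end
```

In a connection handler this would be called from receive_data, for example `@tokenizer.extract(data).each { |msg| process(msg) }`, where @tokenizer is an instance created when the connection is set up.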
> If anyone has any thoughts, tips, or alternative buffers let me know.

You might also try Tony's C buffer:

http://github.com/igrigorik/em-http-request/tree/master/ext/buffer/em_buffer.c
http://github.com/tarcieri/rev/tree/master/ext/rev/rev_buffer.c

Aman
Tony Arcieri
2008-Oct-08 22:47 UTC
[Eventmachine-talk] EM sending and receiving large files
Although that buffer may be the source of the problems you were experiencing with Rev... that'd be good to know.

-- 
Tony Arcieri
medioh.com
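One more option for anyone landing on this thread later: the DataBuffer code quoted earlier needed a "crazy delimiter" because ordinary delimiters kept showing up in binary files. Length-prefixed framing sidesteps that, since the receiver never scans the payload at all. Nobody in the thread proposed or benchmarked this, so treat the following as a rough sketch under assumptions: the class name LengthPrefixedBuffer, the 4-byte big-endian header, and the method names are all invented for illustration and are not part of EventMachine.

```ruby
# Hypothetical sketch of length-prefixed framing; not part of EventMachine.
class LengthPrefixedBuffer
  HEADER_SIZE = 4 # bytes in the big-endian length prefix (an assumed choice)

  def initialize
    @buffer = String.new # empty, mutable, binary (ASCII-8BIT) string
  end

  # Frame a payload for sending: 4-byte length, then the raw bytes.
  def self.frame(payload)
    bytes = payload.b
    [bytes.bytesize].pack("N") + bytes
  end

  # Append received data and return every complete payload.
  def extract(chunk)
    @buffer << chunk.b
    messages = []
    while @buffer.bytesize >= HEADER_SIZE
      size = @buffer.unpack1("N") # read the length prefix without consuming it
      break if @buffer.bytesize < HEADER_SIZE + size # payload not complete yet
      messages << @buffer.slice!(0, HEADER_SIZE + size)[HEADER_SIZE, size]
    end
    messages
  end
end
```

The sender would pass `LengthPrefixedBuffer.frame(bytes)` to send_data, and the receiver would call extract from receive_data. The trade-off is that the sender has to know the payload size up front, which is usually fine for files on disk.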