Mike Evans
2008-Jun-10 14:42 UTC
[Backgroundrb-devel] Backgroundrb fixes for transferring large amounts of data
Hemant

We've continued testing our application with backgroundrb and found a couple of other problems when transferring large amounts of data. Both of these problems are still present in the github version of the code.

The first problem is an exception in the Marshal.load call in the receive_data method of the Packet::MetaPimp class. The root cause is in the BinParser module, in the arm of code handling parser state 1 (reading in the data). The issue is that, at the marked line of code, the pack_data string will be at most @numeric_length bytes long because of the format string passed to the unpack call, so the equality test also holds when unconsumed bytes remain in the buffer. This results in the code dropping a chunk of data and then hitting the exception in a subsequent Marshal.load call.

    elsif @parser_state == 1
      pack_data,remaining = new_data.unpack("a#{@numeric_length}a*")
      if pack_data.length < @numeric_length
        @data << pack_data
        @numeric_length = @numeric_length - pack_data.length
      elsif pack_data.length == @numeric_length   <======== this should be "elsif remaining.length == 0"
        @data << pack_data
        extracter_block.call(@data.join)
        @data = []
        @parser_state = 0
        @length_string = ""
        @numeric_length = 0
      else
        @data << pack_data
        extracter_block.call(@data.join)
        @data = []
        @parser_state = 0
        @length_string = ""
        @numeric_length = 0
        extract(remaining,&extracter_block)
      end
    end

The second problem we hit was ask_status repeatedly returning nil. The root cause of this problem is in the read_object method of the BackgrounDRb::WorkerProxy class, when a data record is large enough to cause connection.read_nonblock to raise the Errno::EAGAIN exception multiple times. We changed the code to make sure read_nonblock is called repeatedly until the tokenizer finds a complete record, and this fixed the problem.

    def read_object
      begin
        while (true)
          sock_data = ""
          begin
            while(sock_data << @connection.read_nonblock(1023)); end
          rescue Errno::EAGAIN
            @tokenizer.extract(sock_data) { |b_data| return b_data }
          end
        end
      rescue
        raise BackgrounDRb::BdrbConnError.new("Not able to connect")
      end
    end

Regards, Mike
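To make the first bug concrete, here is a minimal stand-alone illustration (not backgroundrb code) of why the marked condition mis-fires. With an "a5a*" format the first field is capped at five bytes, so its length equals the cap both when the buffer ends exactly at the record boundary and when further bytes follow; only the remainder tells the two cases apart.

    # Stand-alone demonstration of the unpack behaviour behind the bug.
    pack_data, remaining = "hello".unpack("a5a*")
    p pack_data   # => "hello"
    p remaining   # => ""       (buffer ended exactly at the record boundary)

    pack_data, remaining = "hello world".unpack("a5a*")
    p pack_data   # => "hello"  (still exactly five bytes)
    p remaining   # => " world" (the bytes the original middle branch silently dropped)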
Hemant Kumar
2008-Jun-10 15:01 UTC
[Backgroundrb-devel] Backgroundrb fixes for transferring large amounts of data
Mike Evans wrote:
> We've continued testing our application with backgroundrb and found a
> couple of other problems when transferring large amounts of data. [...]

If you update to the latest github version of BackgrounDRb, you will find that the above is already fixed and reads are no longer non-blocking for clients (blocking reads make much more sense for clients).

Also, I have made the BinParser class iterative (for better stability, in case your data is large enough to cause stack-level-too-deep errors), but that's not yet pushed to the master version of packet. I will implement your fix tonight, when I push the changes to the master repository of packet.
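Hemant's iterative rewrite of BinParser is not shown in the thread. As a rough sketch of the idea only: the tail call on the leftover bytes (one stack frame per record in the buffer) can be replaced by a loop that consumes one header or one payload fragment per pass. The two-state framing assumed below (ASCII length digits, a ':' separator, then the payload) is for illustration and is not necessarily packet's actual wire format.

    # Rough sketch, not the actual packet source: an iterative
    # length-prefixed parser. No recursion, so a buffer containing
    # thousands of records costs no extra stack depth.
    class IterativeBinParser
      def initialize
        @state          = :length  # :length => reading the ASCII size header
        @length_string  = ""       # header digits seen so far
        @numeric_length = 0        # payload bytes still expected
        @data           = []       # payload fragments accumulated so far
      end

      # Feed a chunk of raw bytes; yields each complete payload to the block.
      def extract(new_data)
        buffer = new_data
        until buffer.empty?
          if @state == :length
            idx = buffer.index(":")
            if idx.nil?                    # header split across reads: wait
              @length_string << buffer
              break
            end
            @length_string << buffer[0, idx]
            buffer = buffer[(idx + 1)..-1]
            @numeric_length = @length_string.to_i
            @length_string  = ""
            @state = :payload
          else
            chunk, buffer = buffer.unpack("a#{@numeric_length}a*")
            @data << chunk
            @numeric_length -= chunk.length
            next unless @numeric_length.zero?  # partial payload: wait for more
            yield @data.join
            @data  = []
            @state = :length
          end
        end
      end
    end

    parser = IterativeBinParser.new
    parser.extract("5:hello3:fo") { |record| p record }  # => "hello"
    parser.extract("o")           { |record| p record }  # => "foo"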
Mike Evans
2008-Jun-10 15:06 UTC
[Backgroundrb-devel] Backgroundrb fixes for transferring large amounts of data
Hemant

Thanks for the quick response - I'll admit I hadn't resynced for a week or so! Agreed that non-blocking reads make more sense for the client.

We're still testing with a patched version of packet-0.1.5 and the svn release of backgroundrb because of the problems we hit getting RoR model objects to load properly on the worker tasks with the latest code - have you found a fix for this yet?

Mike
hemant
2008-Jun-10 17:40 UTC
[Backgroundrb-devel] Backgroundrb fixes for transferring large amounts of data
On Tue, Jun 10, 2008 at 8:36 PM, Mike Evans <mike at metaswitch.com> wrote:
> Hemant
>
> Thanks for the quick response - I'll admit I hadn't resynced for a week
> or so! Agreed that non-blocking reads make more sense for the client.

You meant blocking!

> We're still testing with a patched version of packet-0.1.5 and the svn
> release of backgroundrb because of the problems we hit getting RoR model
> objects to load properly on the worker tasks with the latest code - have
> you found a fix for this yet?

Yeah, you will find model loading behavior much more reliable in the current git version of backgroundrb. Also, I did get your fix in, so no need to worry.

Basically, there are only two ways:

1. Get that damn regexp correct.
2. Load all models, plugins and everything else explicitly, because just loading environment.rb doesn't load models by default in Rails.

I went for option #1, to keep your bdrb worker process lean. All in all, with the git versions of bdrb and packet, everything should be much smoother.
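The regexp fix Hemant chose (option #1) is not shown in the thread. For comparison, here is a rough sketch of what option #2 could look like; it assumes a Rails 2.x worker where environment.rb has already been loaded by the worker bootstrap (which defines RAILS_ROOT), and that all model files live under app/models.

    # Sketch of option #2 only (NOT what bdrb actually does; Hemant went
    # with option #1). Loading environment.rb boots Rails but does not
    # eagerly load model classes, so the worker requires each model file
    # explicitly. Assumes environment.rb is already loaded.
    Dir[File.join(RAILS_ROOT, "app", "models", "**", "*.rb")].each do |model_file|
      require_dependency model_file  # goes through Rails' loader so reloading still works
    end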