Mike Evans
2008-May-20 19:30 UTC
[Backgroundrb-devel] Problems sending large results with backgroundrb
I'm working on an application that does extensive database searching. These searches can take a long time, so we have been working on moving the searches to a backgroundrb worker task so we can provide a sexy AJAX progress bar and populate the search results as they become available. All of this seems to work fine until the search results get sufficiently large, at which point we start to hit exceptions in backgroundrb (most likely in the packet layer). We are using packet-0.5.1 and backgroundrb from the latest svn mirror.

We have found and fixed one problem in the packet sender. It is triggered when the non-blocking send in NbioHelper::send_once cannot send the entire buffer, resulting in an exception in the line

    write_scheduled[fileno] ||= connections[fileno].instance

in Core::schedule_write, because connections[fileno] is nil. I can't claim to fully understand the code, but I think there are two problems here.

The main issue seems to be that when Core::handle_write_event calls write_and_schedule to schedule the write, it doesn't clear out internal_scheduled_write[fileno]. It looks like the code expects the cancel_write call at the end of write_and_schedule to clear it out, but this doesn't happen if there is enough queued data for the non-blocking write to only partially succeed again. In that case Core::schedule_write is called again, and because internal_scheduled_write[fileno] has not been cleared out, the code drops through to the second if test and hits the above exception. We fixed this by adding the line

    internal_scheduled_write.delete(fileno)

immediately before the call to write_and_schedule in Core::handle_write_event.

The secondary issue is that the connections[fileno] structure is not getting populated for this connection - I'm guessing because it is an internal socket rather than a network socket, but I couldn't be sure. We changed the second if test in Core::schedule_write to

    elsif write_scheduled[fileno].nil? && !connections[fileno].nil?

to firewall against this, but we are not sure whether this is the right fix.

We are now hitting problems in the Packet::MetaPimp module receiving the data, usually an exception in the Marshal.load call in MetaPimp::receive_data. We suspect this is caused by the packet code corrupting the data somewhere, probably because we are sending such large arrays of results (the repro I am working on at the moment marshals over 200k of data). We've been trying to put extra diagnostics into the code so we can see what is happening, but if we edit puts statements into the code we only seem to get output from the end of the connection that hits an exception, and so far our attempts to make logger objects available throughout the code have failed. We therefore thought we would ask for help - either to see whether this is a known problem, or whether there is a recommended way to add diagnostics to the packet code.

I'm also open to ideas as to better ways to solve the problem!

Thanks in advance,

Mike
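[Editorial note: the failure mode described above starts with a partial non-blocking write. A minimal standalone sketch of that behaviour in plain Ruby, independent of the packet library; the payload size is an assumption chosen to exceed default socket buffers:]

```ruby
require 'socket'

# A UNIX socketpair has a bounded kernel buffer, so a single
# write_nonblock of a large payload only partially succeeds.
reader, writer = UNIXSocket.pair
payload = "x" * 10_000_000

written = writer.write_nonblock(payload)

# The sender must keep the unwritten tail and reschedule it for the
# next writable event. Losing or reordering this leftover is exactly
# the class of bug discussed in this thread.
leftover = payload.byteslice(written, payload.bytesize - written)

puts written < payload.bytesize   # true: only part of the buffer was sent
```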
hemant
2008-May-21 04:12 UTC
[Backgroundrb-devel] Problems sending large results with backgroundrb
On Wed, May 21, 2008 at 1:00 AM, Mike Evans <mike at metaswitch.com> wrote:
> The secondary issue is that the connections[fileno] structure is not getting
> populated for this connection - I'm guessing because it is an internal
> socket rather than a network socket, but I couldn't be sure. We changed the
> second if test in Core::schedule_write to
>
>     elsif write_scheduled[fileno].nil? && !connections[fileno].nil?
>
> to firewall against this, but we are not sure if this is the right fix.

That was surely a bug, and I fixed it like this:

    def schedule_write(t_sock,internal_instance = nil)
      fileno = t_sock.fileno
      if UNIXSocket === t_sock && internal_scheduled_write[fileno].nil?
        write_ios << t_sock
        internal_scheduled_write[t_sock.fileno] ||= internal_instance
      elsif write_scheduled[fileno].nil? && !(t_sock.is_a?(UNIXSocket))
        write_ios << t_sock
        write_scheduled[fileno] ||= connections[fileno].instance
      end
    end

Also, I fixed the issue with marshalling larger data across the channel. Thanks for reporting this. I have been terribly busy with things in the office and in my personal life, and hence my work on BackgrounDRb has been on hiatus for a while.

Unfortunately, you can't use the trunk packet code, which is available from:

    git clone git://github.com/gnufied/packet.git

directly with the svn mirror of backgroundrb, since packet now uses fork and exec to run workers, reducing the memory usage of workers. However, in a day or two I will update the git repository of BackgrounDRb to make use of the latest packet version. In the meanwhile, you can try backporting the relevant packet changes to the version you are using and see if that fixes your problem.

-- 
Let them talk of their oriental summer climes of everlasting conservatories; give me the privilege of making my own summer with my own coals.

http://gnufied.org
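[Editorial note: the `UNIXSocket === t_sock` test in the fix above relies on Ruby's Module#===, which returns true when its argument is an instance of the class. A standalone illustration in plain Ruby, not packet code:]

```ruby
require 'socket'

# Module#=== lets the reactor distinguish internal worker pipes from
# network connections: `UNIXSocket === sock` is true only for instances
# of UNIXSocket (or a subclass), equivalent to sock.is_a?(UNIXSocket).
internal, _peer = UNIXSocket.pair
server = TCPServer.new('127.0.0.1', 0)

puts UNIXSocket === internal   # true  -> internal-socket branch
puts UNIXSocket === server     # false -> external-connection branch
```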
hemant
2008-May-21 04:36 UTC
[Backgroundrb-devel] Problems sending large results with backgroundrb
You can test the git version of backgroundrb with the git version of packet (which incorporates the latest changes). The procedure is as follows.

Clone the packet git repo and install the gem:

    git clone git://github.com/gnufied/packet.git
    cd packet; rake gem
    cd pkg; sudo gem install --local packet-0.1.6.gem

Go to the vendor directory of your rails application, and remove or back up the older version of the backgroundrb plugin; back up the related config file as well. Then, from the vendor directory:

    git clone git://gitorious.org/backgroundrb/mainline.git backgroundrb
    cd RAILS_ROOT            # assuming the older script and config file have been backed up
    rake backgroundrb:setup
    # modify config/backgroundrb.yml according to your needs
    ./script/backgroundrb start

Let me know how it goes and whether this fixes your problem.

On Wed, May 21, 2008 at 9:42 AM, hemant <gethemant at gmail.com> wrote:
> That was surely a bug and I fixed it like this: [snip]
Mike Evans
2008-May-21 07:14 UTC
[Backgroundrb-devel] Problems sending large results with backgroundrb
Hemant

I got to the bottom of the other problem last night. The issue was with the NbioHelper::write_and_schedule method deleting entries from the outbound_data array while iterating through it, which can leave data being sent out of order. I fixed it by changing the outbound_data.delete_at(index) statement to outbound_data[index] = nil, and then compacting the array at the end of the iteration.

    # write the data in socket buffer and schedule the thing
    def write_and_schedule sock
      outbound_data.each_with_index do |t_data,index|
        leftover = write_once(t_data,sock)
        if leftover.empty?
          outbound_data[index] = nil
        else
          outbound_data[index] = leftover
          reactor.schedule_write(sock)
          break
        end
      end
      outbound_data.compact!
      reactor.cancel_write(sock) if outbound_data.empty?
    end

Mike

-----Original Message-----
From: hemant [mailto:gethemant at gmail.com]
Sent: 21 May 2008 05:36
To: Mike Evans
Cc: backgroundrb-devel at rubyforge.org
Subject: Re: [Backgroundrb-devel] Problems sending large results with backgroundrb

> You can test git version of backgroundrb with git version of packet
> (which incorporates latest changes). [snip]
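[Editorial note: Mike's diagnosis - that delete_at inside each_with_index skips elements - is easy to reproduce in plain Ruby. A standalone sketch, not packet code:]

```ruby
# Buggy pattern: deleting during each_with_index shifts later elements
# left past the iterator, so the element after every deletion is skipped.
queue = [:a, :b, :c, :d]
buggy_visited = []
queue.each_with_index do |item, index|
  buggy_visited << item
  queue.delete_at(index)     # mutates the array mid-iteration
end
puts buggy_visited.inspect   # [:a, :c] -- :b and :d were never visited

# The fix from the thread: nil out completed slots, compact afterwards,
# so indices stay stable for the whole iteration.
queue = [:a, :b, :c, :d]
fixed_visited = []
queue.each_with_index do |item, index|
  fixed_visited << item
  queue[index] = nil
end
queue.compact!
puts fixed_visited.inspect   # [:a, :b, :c, :d]
```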
hemant
2008-May-21 07:56 UTC
[Backgroundrb-devel] Problems sending large results with backgroundrb
Yeah, that too. But I wonder how you solved the following two problems. Take a look at this code:

    def handle_write_event(p_ready_fds)
      p_ready_fds.each do |sock_fd|
        fileno = sock_fd.fileno
        if UNIXSocket === sock_fd && internal_scheduled_write[fileno]
          # we have a problem here
          write_and_schedule(sock_fd)
        elsif extern_opts = connection_completion_awaited[fileno]
          complete_connection(sock_fd,extern_opts)
        elsif handler_instance = write_scheduled[fileno]
          # I was drunk while writing following line
          handler_instance.write_scheduled(sock_fd)
        end
      end
    end

The problem is, as you say, if in a MetaPimp some data is left unwritten, it won't get written in subsequent writes, because outbound_data belongs to the MetaPimp class, not the main reactor class. Hence it should be:

    def handle_write_event(p_ready_fds)
      p_ready_fds.each do |sock_fd|
        fileno = sock_fd.fileno
        if UNIXSocket === sock_fd && (internal_instance = internal_scheduled_write[fileno])
          internal_instance.write_and_schedule(sock_fd)
        elsif extern_opts = connection_completion_awaited[fileno]
          complete_connection(sock_fd,extern_opts)
        elsif handler_instance = write_scheduled[fileno]
          handler_instance.write_and_schedule(sock_fd)
        end
      end
    end

Also, I have included your changes in packet git. So if you can give the backgroundrb git a shot, I will appreciate that. (Please back up your older plugin and config files.)

On Wed, May 21, 2008 at 12:44 PM, Mike Evans <mike at metaswitch.com> wrote:
> I got to the bottom of the other problem last night. The issue was with
> the NbioHelper::write_and_schedule method deleting entries from the
> outbound_data array while iterating through it. [snip]
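[Editorial note: the corrected branch above relies on assigning inside the condition - the hash lookup's result is both tested for truthiness and captured for the method call. A minimal standalone illustration of the idiom; the names here are hypothetical, not packet code:]

```ruby
# Handlers registered per file descriptor, in the style of the
# reactor's internal_scheduled_write hash (illustrative names only).
handlers = { 7 => :worker_pipe_handler }

def dispatch(handlers, fileno)
  # `h = handlers[fileno]` evaluates to the looked-up value, so the
  # `if` both guards against a missing entry and binds the handler.
  if (h = handlers[fileno])
    [:dispatched, h]
  else
    :no_handler
  end
end

puts dispatch(handlers, 7).inspect   # [:dispatched, :worker_pipe_handler]
puts dispatch(handlers, 9).inspect   # :no_handler
```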
Mike Evans
2008-May-24 13:51 UTC
[Backgroundrb-devel] Problems sending large results with backgroundrb
Hemant I''m not sure why we didn''t hit that problem in original testing, but we have hit it in later testing. I''ve tried upgrading to the latest packet and backgroundrb from git, but I''m now having problems with the initial start_worker. I''m trying to start the worker passing it a Ruby object of type SearchDn (which is declared in app/model/search_dn.rb), but I''m hitting the exception below. Previously I was running with :lazy_load set to false, but this doesn''t seem to make any difference - has this feature been retired in this version of code? Any thoughts? Mike /usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `gem_original_require'': no such file to load -- dn (MissingSourceFile) from /usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `require'' from /usr/local/lib/ruby/gems/1.8/gems/activesupport-1.4.4/lib/active_support /dependencies.rb:495:in `require'' from /usr/local/lib/ruby/gems/1.8/gems/activesupport-1.4.4/lib/active_support /dependencies.rb:342:in `new_constants_in'' from /usr/local/lib/ruby/gems/1.8/gems/activesupport-1.4.4/lib/active_support /dependencies.rb:495:in `require'' from /disk0.7/var/opt/MetaViewSAS/tview/vendor/plugins/backgroundrb/server/li b/master_worker.rb:60:in `load_data'' from /disk0.7/var/opt/MetaViewSAS/tview/vendor/plugins/backgroundrb/server/li b/master_worker.rb:32:in `receive_data'' from /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/lib/packet/packet_parser. rb:30:in `call'' from /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/lib/packet/packet_parser. rb:30:in `extract'' ... 9 levels... from /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/lib/packet/packet_master. 
rb:21:in `run'' from /disk0.7/var/opt/MetaViewSAS/tview/vendor/plugins/backgroundrb/server/li b/master_worker.rb:188:in `initialize'' from ../script/backgroundrb:42:in `new'' from ../script/backgroundrb:42 /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/bin/../lib/packet/packet_ nbio.rb:25:in `read_data'': Packet::DisconnectError (Packet::DisconnectError) from /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/bin/../lib/packet/packet_ worker.rb:49:in `handle_internal_messages'' from /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/bin/../lib/packet/packet_ core.rb:179:in `handle_read_event'' from /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/bin/../lib/packet/packet_ core.rb:177:in `each'' from /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/bin/../lib/packet/packet_ core.rb:177:in `handle_read_event'' from /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/bin/../lib/packet/packet_ core.rb:133:in `start_reactor'' from /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/bin/../lib/packet/packet_ core.rb:126:in `loop'' from /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/bin/../lib/packet/packet_ core.rb:126:in `start_reactor'' from /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/bin/../lib/packet/packet_ worker.rb:21:in `start_worker'' from /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/bin/packet_worker_runner: 38:in `load_worker'' from /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/bin/packet_worker_runner: 26:in `initialize'' from /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/bin/packet_worker_runner: 47:in `new'' from /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/bin/packet_worker_runner: 47 from /usr/local/bin/packet_worker_runner:16:in `load'' from /usr/local/bin/packet_worker_runner:16 -----Original Message----- From: hemant [mailto:gethemant at gmail.com] Sent: 21 May 2008 08:56 To: Mike Evans Cc: backgroundrb-devel at rubyforge.org Subject: Re: [Backgroundrb-devel] Problems sending large results with backgroundrb Yeah that too. 
But I wonder how you solved the following two problems. Take a look at this code:

def handle_write_event(p_ready_fds)
  p_ready_fds.each do |sock_fd|
    fileno = sock_fd.fileno
    if UNIXSocket === sock_fd && internal_scheduled_write[fileno]
      # we have a problem here
      write_and_schedule(sock_fd)
    elsif extern_opts = connection_completion_awaited[fileno]
      complete_connection(sock_fd, extern_opts)
    elsif handler_instance = write_scheduled[fileno]
      # I was drunk while writing following line
      handler_instance.write_scheduled(sock_fd)
    end
  end
end

The problem is, as you said: if, say, in MetaPimp some data is left unwritten, it won't get written in subsequent writes, because outbound_data belongs to the MetaPimp class, not the main reactor class. Hence it should be:

def handle_write_event(p_ready_fds)
  p_ready_fds.each do |sock_fd|
    fileno = sock_fd.fileno
    if UNIXSocket === sock_fd && (internal_instance = internal_scheduled_write[fileno])
      internal_instance.write_and_schedule(sock_fd)
    elsif extern_opts = connection_completion_awaited[fileno]
      complete_connection(sock_fd, extern_opts)
    elsif handler_instance = write_scheduled[fileno]
      handler_instance.write_and_schedule(sock_fd)
    end
  end
end

Also, I have included your changes in packet git. So if you can give backgroundrb git a shot, I will appreciate that (please back up your older plugin and config files).

On Wed, May 21, 2008 at 12:44 PM, Mike Evans <mike at metaswitch.com> wrote:
> Hemant
>
> I got to the bottom of the other problem last night. The issue was
> with the NbioHelper::write_and_schedule method deleting entries from
> the outbound_data array while iterating through it. This can end up
> with data getting out of order. I fixed it by changing the
> outbound_data.delete_at(index) statement to outbound_data[index] =
> nil, and then compacting the array at the end of the iteration.
>
>   # write the data in socket buffer and schedule the thing
>   def write_and_schedule sock
>     outbound_data.each_with_index do |t_data, index|
>       leftover = write_once(t_data, sock)
>       if leftover.empty?
>         outbound_data[index] = nil
>       else
>         outbound_data[index] = leftover
>         reactor.schedule_write(sock)
>         break
>       end
>     end
>     outbound_data.compact!
>     reactor.cancel_write(sock) if outbound_data.empty?
>   end
>
> Mike
>
> -----Original Message-----
> From: hemant [mailto:gethemant at gmail.com]
> Sent: 21 May 2008 05:36
> To: Mike Evans
> Cc: backgroundrb-devel at rubyforge.org
> Subject: Re: [Backgroundrb-devel] Problems sending large results with
> backgroundrb
>
> You can test the git version of backgroundrb with the git version of
> packet (which incorporates the latest changes). The procedure is as
> follows:
>
> Clone the packet git repo:
>
> git clone git://github.com/gnufied/packet.git
> cd packet; rake gem
> cd pkg; sudo gem install --local packet-0.1.6.gem
>
> Go to the vendor directory of your rails directory and remove or back
> up the older version of the backgroundrb plugin, and back up the
> related config file as well.
>
> From the vendor directory:
>
> git clone git://gitorious.org/backgroundrb/mainline.git backgroundrb
> cd RAILS_ROOT  <<assuming the older script and config file have been backed up>>
> rake backgroundrb:setup  <<modify config/backgroundrb.yml according to your needs>>
> ./script/backgroundrb start  <<let me know how it goes and if this fixes your problem>>
>
> On Wed, May 21, 2008 at 9:42 AM, hemant <gethemant at gmail.com> wrote:
>> On Wed, May 21, 2008 at 1:00 AM, Mike Evans <mike at metaswitch.com> wrote:
>>> I'm working on an application that does extensive database searching.
>>> These searches can take a long time, so we have been working on
>>> moving the searches to a backgroundrb worker task so we can provide
>>> a sexy AJAX progress bar, and populate the search results as they
>>> are available.
>>> All of this seems to work fine until the size of the search results
>>> gets sufficiently large, when we start to hit exceptions in
>>> backgroundrb (most likely in the packet layer). We are using
>>> packet-0.5.1 and backgroundrb from the latest svn mirror.
>>>
>>> We have found and fixed one problem in the packet sender. This is
>>> triggered when the non-blocking send in NbioHelper::send_once cannot
>>> send the entire buffer, resulting in an exception in the line
>>>
>>>   write_scheduled[fileno] ||= connections[fileno].instance
>>>
>>> in Core::schedule_write because connections[fileno] is nil. I can't
>>> claim to fully understand the code, but I think there are two
>>> problems here.
>>>
>>> The main issue seems to be that when Core::handle_write_event calls
>>> write_and_schedule to schedule the write, it doesn't clear out
>>> internal_scheduled_write[fileno]. It looks like the code is
>>> expecting the cancel_write call at the end of write_and_schedule to
>>> clear it out, but this doesn't happen if there is enough queued data
>>> to cause the non-blocking write to only partially succeed again. In
>>> this case, Core::schedule_write is called again, and because
>>> internal_scheduled_write[fileno] has not been cleared out, the code
>>> drops through to the second if test, then hits the above exception.
>>> We fixed this by adding the line
>>>
>>>   internal_scheduled_write.delete(fileno)
>>>
>>> immediately before the call to write_and_schedule in
>>> Core::handle_write_event.
>>>
>>> The secondary issue is that the connections[fileno] structure is not
>>> getting populated for this connection - I'm guessing because it is
>>> an internal socket rather than a network socket, but I couldn't be
>>> sure. We changed the second if test in Core::schedule_write to
>>>
>>>   elsif write_scheduled[fileno].nil? && !connections[fileno].nil?
>>>
>>> to firewall against this, but we are not sure if this is the right
>>> fix.
>>
>> That was surely a bug, and I fixed it like this:
>>
>>   def schedule_write(t_sock, internal_instance = nil)
>>     fileno = t_sock.fileno
>>     if UNIXSocket === t_sock && internal_scheduled_write[fileno].nil?
>>       write_ios << t_sock
>>       internal_scheduled_write[t_sock.fileno] ||= internal_instance
>>     elsif write_scheduled[fileno].nil? && !(t_sock.is_a?(UNIXSocket))
>>       write_ios << t_sock
>>       write_scheduled[fileno] ||= connections[fileno].instance
>>     end
>>   end
>>
>> Also, I fixed an issue with marshalling larger data across the
>> channel. Thanks for reporting this. I have been terribly busy with
>> things in the office and in personal life, and hence my work on
>> BackgrounDRb has been on hiatus for a while. Unfortunately, you can't
>> use the trunk packet code, which is available from:
>>
>> git clone git://github.com/gnufied/packet.git
>>
>> directly with the svn mirror of backgroundrb, since packet now uses
>> fork and exec to run workers, reducing the memory usage of workers.
>> However, in a day or two I will update the git repository of
>> BackgrounDRb to make use of the latest packet version. In the
>> meanwhile, you can try backporting the relevant packet changes to the
>> version you are using and see if that fixes your problem.
>>
>>> We are now hitting problems in the Packet::MetaPimp module receiving
>>> the data, usually an exception in the Marshal.load call in
>>> MetaPimp::receive_data. We suspect this is caused by the packet code
>>> corrupting the data somewhere, probably because we are sending such
>>> large arrays of results (the repro I am working on at the moment is
>>> trying to marshal over 200k of data). We've been trying to put extra
>>> diagnostics in the code so we can see what is happening, but if we
>>> add puts statements to the code we only seem to get output from the
>>> end of the connection that hits an exception, and so far our
>>> attempts to make logger objects available throughout the code have
>>> failed.
>>> We therefore thought we would ask for help - either to see whether
>>> this is a known problem, or whether there is a recommended way to
>>> add diagnostics to the packet code.
>>>
>>> I'm also open to ideas as to better ways to solve the problem!
>>>
>>> Thanks in advance,
>>>
>>> Mike
>>>
>>> _______________________________________________
>>> Backgroundrb-devel mailing list
>>> Backgroundrb-devel at rubyforge.org
>>> http://rubyforge.org/mailman/listinfo/backgroundrb-devel
>>
>> --
>> Let them talk of their oriental summer climes of everlasting
>> conservatories; give me the privilege of making my own summer with my
>> own coals.
>>
>> http://gnufied.org
>
> --
> Let them talk of their oriental summer climes of everlasting
> conservatories; give me the privilege of making my own summer with my
> own coals.
>
> http://gnufied.org

--
Let them talk of their oriental summer climes of everlasting
conservatories; give me the privilege of making my own summer with my
own coals.

http://gnufied.org
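The delete_at bug Mike describes above can be demonstrated in isolation. This is a standalone sketch, not packet code: deleting from an array while iterating over it with each_with_index skips the elements that slide into the deleted slots, which is exactly how queued outbound data gets dropped or reordered.

```ruby
# Buggy pattern: mutate the array while iterating over it.
queue = ["a", "b", "c", "d"]
sent = []
queue.each_with_index do |data, index|
  sent << data
  queue.delete_at(index)   # shrinks the array mid-iteration
end
# Only "a" and "c" are visited; "b" and "d" slid into the deleted
# slots and were skipped entirely.

# Mike's fix: nil out the slot, compact once the iteration is done.
queue2 = ["a", "b", "c", "d"]
sent2 = []
queue2.each_with_index do |data, index|
  sent2 << data
  queue2[index] = nil
end
queue2.compact!
# All four entries are visited, in order, and the queue drains cleanly.
```

The nil-then-compact approach keeps the array indices stable for the whole iteration, so no element is ever skipped.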
Mike Evans
2008-May-24 14:49 UTC
[Backgroundrb-devel] Problems sending large results with backgroundrb
Hemant

I fixed a minor bug that means the code is now getting the right file name, but the object file is still failing to load.

The fix is to change the regular expression used to process the Marshal.load exception in MasterWorker::load_data from

  if error_msg =~ /^undefined.+([A-Z]\w+)/

to

  if error_msg =~ /^undefined.+ ([A-Z]\w+)/

The extra space forces it to take the whole of the last word in the error message, not just the last capital onward.

I suspect the issue I'm now seeing is because the MasterWorker class doesn't load the Rails environment. Any thoughts on how to fix this?

Mike
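Assuming Marshal raises its usual "undefined class/module" message for an unloaded constant, the difference between the two patterns can be checked directly. This is a hypothetical reconstruction of the parsing step, not the actual load_data code:

```ruby
# Error message of the shape Marshal.load raises when a class is not
# yet loaded; load_data parses the constant name out of it.
error_msg = "undefined class/module SearchDn"

# Original pattern: the greedy .+ backtracks only far enough for
# [A-Z]\w+ to match, so the capture is just "Dn" -- which is why the
# stack trace above shows an attempt to require 'dn'.
broken = error_msg.match(/^undefined.+([A-Z]\w+)/)[1]

# Fixed pattern: the literal space anchors the capture to the start of
# the last word, yielding the full constant name "SearchDn".
fixed = error_msg.match(/^undefined.+ ([A-Z]\w+)/)[1]
```

The one-character change works because backtracking now has to give back the whole final word, not just its tail.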
_______________________________________________
Backgroundrb-devel mailing list
Backgroundrb-devel at rubyforge.org
http://rubyforge.org/mailman/listinfo/backgroundrb-devel
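The partial-write condition that both of the fixes in this thread revolve around can be sketched with a stand-in socket. All names here are illustrative, not packet's actual classes: a non-blocking send may accept only part of the buffer, and the leftover must be re-queued on the same handler instance or it is silently lost.

```ruby
# A fake socket that accepts at most `capacity` bytes per call,
# standing in for a non-blocking socket whose kernel buffer is full.
class FakeSock
  attr_reader :received
  def initialize(capacity)
    @capacity = capacity
    @received = ""
  end
  def write_nonblock(data)
    chunk = data[0, @capacity]
    @received << chunk
    chunk.size   # a real socket returns the byte count actually written
  end
end

# Mirrors the shape of packet's send helper: write what the socket
# takes, return the unwritten remainder for the reactor to reschedule.
def write_once(data, sock)
  written = sock.write_nonblock(data)
  data[written..-1] || ""
end

sock = FakeSock.new(4)
leftover = write_once("hello world", sock)
# sock.received is "hell" and leftover is "o world"; if the reactor
# reschedules the write on the wrong object (the bug hemant's
# handle_write_event fix addresses), this leftover never goes out.
```

This is why outbound_data must live on, and be drained by, the same instance that owns the connection.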
hemant kumar
2008-May-24 15:13 UTC
[Backgroundrb-devel] Problems sending large results with backgroundrb
On Sat, 2008-05-24 at 15:49 +0100, Mike Evans wrote:
> Hemant
>
> I fixed a minor bug that means the code is now getting the right file
> name, but the object file is still failing to load.
>
> The fix is to change the regular expression used to process the
> Marshal.load exception in MasterWorker::load_data from
>
>   if error_msg =~ /^undefined.+([A-Z]\w+)/
>
> to
>
>   if error_msg =~ /^undefined.+ ([A-Z]\w+)/
>
> The extra space forces it to take the whole of the last word in the
> error message, not just the last capital onward.
>
> I suspect the issue I'm now seeing is because the MasterWorker class
> doesn't load the Rails environment. Any thoughts on how to fix this?

Yeah, Mike. When I saw your mail I knew there was some problem with that piece of regexp. Now, the Rails environment IS getting loaded in the master worker, but somehow autoloading of models is not working from the master class (although it works from workers alright), hence my own hand-rolled mechanism for autoloading models. I have no idea, off the top of my head, why that's happening, but expect a fix soon (or a patch is more than welcome).
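The model autoloading hemant mentions hinges on mapping a constant name to a file name before requiring it. A simplified sketch of that transformation follows; `underscore` here is a hand-rolled stand-in, not the ActiveSupport implementation, and the mapping shown is an assumption about how such a loader would resolve SearchDn:

```ruby
# Convert a CamelCase constant name to the snake_case file name a
# Rails-style loader would require (e.g. SearchDn -> search_dn, which
# would live in app/models/search_dn.rb).
def underscore(const_name)
  const_name.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
end

underscore("SearchDn")      # => "search_dn"
underscore("MasterWorker")  # => "master_worker"
```

This also shows why the regex capture in load_data matters: with only "Dn" captured, the loader would look for dn.rb instead of search_dn.rb.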