Mike Evans
2008-May-20 19:30 UTC
[Backgroundrb-devel] Problems sending large results with backgroundrb
I'm working on an application that does extensive database searching. These searches can take a long time, so we have been working on moving the searches to a backgroundrb worker task so we can provide a sexy AJAX progress bar and populate the search results as they become available. All of this seems to work fine until the search results get sufficiently large, at which point we start to hit exceptions in backgroundrb (most likely in the packet layer). We are using packet-0.5.1 and backgroundrb from the latest svn mirror.

We have found and fixed one problem in the packet sender. It is triggered when the non-blocking send in NbioHelper::send_once cannot send the entire buffer, resulting in an exception in the line

    write_scheduled[fileno] ||= connections[fileno].instance

in Core::schedule_write, because connections[fileno] is nil. I can't claim to fully understand the code, but I think there are two problems here.

The main issue seems to be that when Core::handle_write_event calls write_and_schedule to schedule the write, it doesn't clear out internal_scheduled_write[fileno]. It looks like the code expects the cancel_write call at the end of write_and_schedule to clear it out, but this doesn't happen if there is enough queued data for the non-blocking write to only partially succeed again. In that case Core::schedule_write is called again, and because internal_scheduled_write[fileno] has not been cleared out, the code drops through to the second if test and hits the above exception. We fixed this by adding the line

    internal_scheduled_write.delete(fileno)

immediately before the call to write_and_schedule in Core::handle_write_event.

The secondary issue is that the connections[fileno] structure is not getting populated for this connection - I'm guessing because it is an internal socket rather than a network socket, but I couldn't be sure. We changed the second if test in Core::schedule_write to

    elsif write_scheduled[fileno].nil? && !connections[fileno].nil?

to firewall against this, but we are not sure whether this is the right fix.

We are now hitting problems in the Packet::MetaPimp module receiving the data, usually an exception in the Marshal.load call in MetaPimp::receive_data. We suspect this is caused by the packet code corrupting the data somewhere, probably because we are sending such large arrays of results (the repro I am working on at the moment marshals over 200k of data). We've been trying to put extra diagnostics into the code so we can see what is happening, but if we edit puts statements into the code we only seem to get output from the end of the connection that hits an exception, and so far our attempts to make logger objects available throughout the code have failed. We therefore thought we would ask for help - either to see whether this is a known problem, or whether there is a recommended way to add diagnostics to the packet code.

I'm also open to ideas as to better ways to solve the problem!

Thanks in advance,

Mike
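[Editorial note: the failure mode described above starts with a partial non-blocking write. A minimal standalone sketch of that behaviour in plain Ruby, independent of the packet library; the payload size is an assumption chosen to exceed default socket buffers:]

```ruby
require 'socket'

# A UNIX socketpair has a bounded kernel buffer, so a single
# write_nonblock of a large payload only partially succeeds.
reader, writer = UNIXSocket.pair
payload = "x" * 10_000_000

written = writer.write_nonblock(payload)

# The sender must keep the unwritten tail and reschedule it for the
# next writable event. Losing or reordering this leftover is exactly
# the class of bug discussed in this thread.
leftover = payload.byteslice(written, payload.bytesize - written)

puts written < payload.bytesize   # true: only part of the buffer was sent
```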
hemant
2008-May-21 04:12 UTC
[Backgroundrb-devel] Problems sending large results with backgroundrb
On Wed, May 21, 2008 at 1:00 AM, Mike Evans <mike at metaswitch.com> wrote:
> The secondary issue is that the connections[fileno] structure is not getting
> populated for this connection - I'm guessing because it is an internal
> socket rather than a network socket, but I couldn't be sure. We changed the
> second if test in Core::schedule_write to
>
>     elsif write_scheduled[fileno].nil? && !connections[fileno].nil?
>
> to firewall against this, but we are not sure if this is the right fix.

That was surely a bug, and I fixed it like this:

    def schedule_write(t_sock,internal_instance = nil)
      fileno = t_sock.fileno
      if UNIXSocket === t_sock && internal_scheduled_write[fileno].nil?
        write_ios << t_sock
        internal_scheduled_write[t_sock.fileno] ||= internal_instance
      elsif write_scheduled[fileno].nil? && !(t_sock.is_a?(UNIXSocket))
        write_ios << t_sock
        write_scheduled[fileno] ||= connections[fileno].instance
      end
    end

Also, I fixed the issue with marshalling larger data across the channel. Thanks for reporting this. I have been terribly busy with things in the office and in my personal life, and hence my work on BackgrounDRb has been on hiatus for a while.

Unfortunately, you can't use the trunk packet code, which is available from:

    git clone git://github.com/gnufied/packet.git

directly with the svn mirror of backgroundrb, since packet now uses fork and exec to run workers, reducing the memory usage of workers. However, in a day or two I will update the git repository of BackgrounDRb to make use of the latest packet version. In the meanwhile, you can try backporting the relevant packet changes to the version you are using and see if that fixes your problem.

-- 
Let them talk of their oriental summer climes of everlasting conservatories; give me the privilege of making my own summer with my own coals.

http://gnufied.org
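[Editorial note: the `UNIXSocket === t_sock` test in the fix above relies on Ruby's Module#===, which returns true when its argument is an instance of the class. A standalone illustration in plain Ruby, not packet code:]

```ruby
require 'socket'

# Module#=== lets the reactor distinguish internal worker pipes from
# network connections: `UNIXSocket === sock` is true only for instances
# of UNIXSocket (or a subclass), equivalent to sock.is_a?(UNIXSocket).
internal, _peer = UNIXSocket.pair
server = TCPServer.new('127.0.0.1', 0)

puts UNIXSocket === internal   # true  -> internal-socket branch
puts UNIXSocket === server     # false -> external-connection branch
```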
hemant
2008-May-21 04:36 UTC
[Backgroundrb-devel] Problems sending large results with backgroundrb
You can test the git version of backgroundrb with the git version of packet (which incorporates the latest changes). The procedure is as follows.

Clone the packet git repo and install the gem:

    git clone git://github.com/gnufied/packet.git
    cd packet; rake gem
    cd pkg; sudo gem install --local packet-0.1.6.gem

Go to the vendor directory of your rails application, and remove or back up the older version of the backgroundrb plugin; back up the related config file as well. Then, from the vendor directory:

    git clone git://gitorious.org/backgroundrb/mainline.git backgroundrb
    cd RAILS_ROOT            # assuming the older script and config file have been backed up
    rake backgroundrb:setup
    # modify config/backgroundrb.yml according to your needs
    ./script/backgroundrb start

Let me know how it goes and whether this fixes your problem.

On Wed, May 21, 2008 at 9:42 AM, hemant <gethemant at gmail.com> wrote:
> That was surely a bug and I fixed it like this: [snip]
Mike Evans
2008-May-21 07:14 UTC
[Backgroundrb-devel] Problems sending large results with backgroundrb
Hemant

I got to the bottom of the other problem last night. The issue was with the NbioHelper::write_and_schedule method deleting entries from the outbound_data array while iterating through it, which can leave data being sent out of order. I fixed it by changing the outbound_data.delete_at(index) statement to outbound_data[index] = nil, and then compacting the array at the end of the iteration.

    # write the data in socket buffer and schedule the thing
    def write_and_schedule sock
      outbound_data.each_with_index do |t_data,index|
        leftover = write_once(t_data,sock)
        if leftover.empty?
          outbound_data[index] = nil
        else
          outbound_data[index] = leftover
          reactor.schedule_write(sock)
          break
        end
      end
      outbound_data.compact!
      reactor.cancel_write(sock) if outbound_data.empty?
    end

Mike

-----Original Message-----
From: hemant [mailto:gethemant at gmail.com]
Sent: 21 May 2008 05:36
To: Mike Evans
Cc: backgroundrb-devel at rubyforge.org
Subject: Re: [Backgroundrb-devel] Problems sending large results with backgroundrb

> You can test git version of backgroundrb with git version of packet
> (which incorporates latest changes). [snip]
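[Editorial note: Mike's diagnosis - that delete_at inside each_with_index skips elements - is easy to reproduce in plain Ruby. A standalone sketch, not packet code:]

```ruby
# Buggy pattern: deleting during each_with_index shifts later elements
# left past the iterator, so the element after every deletion is skipped.
queue = [:a, :b, :c, :d]
buggy_visited = []
queue.each_with_index do |item, index|
  buggy_visited << item
  queue.delete_at(index)     # mutates the array mid-iteration
end
puts buggy_visited.inspect   # [:a, :c] -- :b and :d were never visited

# The fix from the thread: nil out completed slots, compact afterwards,
# so indices stay stable for the whole iteration.
queue = [:a, :b, :c, :d]
fixed_visited = []
queue.each_with_index do |item, index|
  fixed_visited << item
  queue[index] = nil
end
queue.compact!
puts fixed_visited.inspect   # [:a, :b, :c, :d]
```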
hemant
2008-May-21 07:56 UTC
[Backgroundrb-devel] Problems sending large results with backgroundrb
Yeah, that too. But I wonder how you solved the following two problems. Take a look at this code:

    def handle_write_event(p_ready_fds)
      p_ready_fds.each do |sock_fd|
        fileno = sock_fd.fileno
        if UNIXSocket === sock_fd && internal_scheduled_write[fileno]
          # we have a problem here
          write_and_schedule(sock_fd)
        elsif extern_opts = connection_completion_awaited[fileno]
          complete_connection(sock_fd,extern_opts)
        elsif handler_instance = write_scheduled[fileno]
          # I was drunk while writing following line
          handler_instance.write_scheduled(sock_fd)
        end
      end
    end

The problem is, as you say, if in a MetaPimp some data is left unwritten, it won't get written in subsequent writes, because outbound_data belongs to the MetaPimp class, not the main reactor class. Hence it should be:

    def handle_write_event(p_ready_fds)
      p_ready_fds.each do |sock_fd|
        fileno = sock_fd.fileno
        if UNIXSocket === sock_fd && (internal_instance = internal_scheduled_write[fileno])
          internal_instance.write_and_schedule(sock_fd)
        elsif extern_opts = connection_completion_awaited[fileno]
          complete_connection(sock_fd,extern_opts)
        elsif handler_instance = write_scheduled[fileno]
          handler_instance.write_and_schedule(sock_fd)
        end
      end
    end

Also, I have included your changes in packet git. So if you can give the backgroundrb git a shot, I will appreciate that. (Please back up your older plugin and config files.)

On Wed, May 21, 2008 at 12:44 PM, Mike Evans <mike at metaswitch.com> wrote:
> I got to the bottom of the other problem last night. The issue was with
> the NbioHelper::write_and_schedule method deleting entries from the
> outbound_data array while iterating through it. [snip]
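[Editorial note: the corrected branch above relies on assigning inside the condition - the hash lookup's result is both tested for truthiness and captured for the method call. A minimal standalone illustration of the idiom; the names here are hypothetical, not packet code:]

```ruby
# Handlers registered per file descriptor, in the style of the
# reactor's internal_scheduled_write hash (illustrative names only).
handlers = { 7 => :worker_pipe_handler }

def dispatch(handlers, fileno)
  # `h = handlers[fileno]` evaluates to the looked-up value, so the
  # `if` both guards against a missing entry and binds the handler.
  if (h = handlers[fileno])
    [:dispatched, h]
  else
    :no_handler
  end
end

puts dispatch(handlers, 7).inspect   # [:dispatched, :worker_pipe_handler]
puts dispatch(handlers, 9).inspect   # :no_handler
```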
Mike Evans
2008-May-24 13:51 UTC
[Backgroundrb-devel] Problems sending large results with backgroundrb
Hemant I''m not sure why we didn''t hit that problem in original testing, but we have hit it in later testing. I''ve tried upgrading to the latest packet and backgroundrb from git, but I''m now having problems with the initial start_worker. I''m trying to start the worker passing it a Ruby object of type SearchDn (which is declared in app/model/search_dn.rb), but I''m hitting the exception below. Previously I was running with :lazy_load set to false, but this doesn''t seem to make any difference - has this feature been retired in this version of code? Any thoughts? Mike /usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `gem_original_require'': no such file to load -- dn (MissingSourceFile) from /usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `require'' from /usr/local/lib/ruby/gems/1.8/gems/activesupport-1.4.4/lib/active_support /dependencies.rb:495:in `require'' from /usr/local/lib/ruby/gems/1.8/gems/activesupport-1.4.4/lib/active_support /dependencies.rb:342:in `new_constants_in'' from /usr/local/lib/ruby/gems/1.8/gems/activesupport-1.4.4/lib/active_support /dependencies.rb:495:in `require'' from /disk0.7/var/opt/MetaViewSAS/tview/vendor/plugins/backgroundrb/server/li b/master_worker.rb:60:in `load_data'' from /disk0.7/var/opt/MetaViewSAS/tview/vendor/plugins/backgroundrb/server/li b/master_worker.rb:32:in `receive_data'' from /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/lib/packet/packet_parser. rb:30:in `call'' from /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/lib/packet/packet_parser. rb:30:in `extract'' ... 9 levels... from /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/lib/packet/packet_master. 
rb:21:in `run'' from /disk0.7/var/opt/MetaViewSAS/tview/vendor/plugins/backgroundrb/server/li b/master_worker.rb:188:in `initialize'' from ../script/backgroundrb:42:in `new'' from ../script/backgroundrb:42 /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/bin/../lib/packet/packet_ nbio.rb:25:in `read_data'': Packet::DisconnectError (Packet::DisconnectError) from /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/bin/../lib/packet/packet_ worker.rb:49:in `handle_internal_messages'' from /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/bin/../lib/packet/packet_ core.rb:179:in `handle_read_event'' from /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/bin/../lib/packet/packet_ core.rb:177:in `each'' from /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/bin/../lib/packet/packet_ core.rb:177:in `handle_read_event'' from /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/bin/../lib/packet/packet_ core.rb:133:in `start_reactor'' from /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/bin/../lib/packet/packet_ core.rb:126:in `loop'' from /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/bin/../lib/packet/packet_ core.rb:126:in `start_reactor'' from /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/bin/../lib/packet/packet_ worker.rb:21:in `start_worker'' from /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/bin/packet_worker_runner: 38:in `load_worker'' from /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/bin/packet_worker_runner: 26:in `initialize'' from /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/bin/packet_worker_runner: 47:in `new'' from /usr/local/lib/ruby/gems/1.8/gems/packet-0.1.6/bin/packet_worker_runner: 47 from /usr/local/bin/packet_worker_runner:16:in `load'' from /usr/local/bin/packet_worker_runner:16 -----Original Message----- From: hemant [mailto:gethemant at gmail.com] Sent: 21 May 2008 08:56 To: Mike Evans Cc: backgroundrb-devel at rubyforge.org Subject: Re: [Backgroundrb-devel] Problems sending large results with backgroundrb Yeah that too. 
But I wonder how you solved the following two problems. Take a look at this code:

def handle_write_event(p_ready_fds)
  p_ready_fds.each do |sock_fd|
    fileno = sock_fd.fileno
    if UNIXSocket === sock_fd && internal_scheduled_write[fileno]
      # we have a problem here
      write_and_schedule(sock_fd)
    elsif extern_opts = connection_completion_awaited[fileno]
      complete_connection(sock_fd, extern_opts)
    elsif handler_instance = write_scheduled[fileno]
      # I was drunk while writing following line
      handler_instance.write_scheduled(sock_fd)
    end
  end
end

The problem is, as you said: if, say, in MetaPimp some data is left unwritten, it won't get written in subsequent writes, because outbound_data belongs to the MetaPimp class, not the main reactor class. Hence it should be:

def handle_write_event(p_ready_fds)
  p_ready_fds.each do |sock_fd|
    fileno = sock_fd.fileno
    if UNIXSocket === sock_fd && (internal_instance = internal_scheduled_write[fileno])
      internal_instance.write_and_schedule(sock_fd)
    elsif extern_opts = connection_completion_awaited[fileno]
      complete_connection(sock_fd, extern_opts)
    elsif handler_instance = write_scheduled[fileno]
      handler_instance.write_and_schedule(sock_fd)
    end
  end
end

Also, I have included your changes in packet git. So if you can give backgroundrb git a shot, I will appreciate that (please back up your older plugin and config files).

On Wed, May 21, 2008 at 12:44 PM, Mike Evans <mike at metaswitch.com> wrote:
> Hemant
>
> I got to the bottom of the other problem last night. The issue was
> with the NbioHelper::write_and_schedule method deleting entries from
> the outbound_data array while iterating through it. This can end up
> with data getting out of order. I fixed it by changing the
> outbound_data.delete_at(index) statement to outbound_data[index] =
> nil, and then compacting the array at the end of the iteration.
>
>   # write the data in socket buffer and schedule the thing
>   def write_and_schedule sock
>     outbound_data.each_with_index do |t_data, index|
>       leftover = write_once(t_data, sock)
>       if leftover.empty?
>         outbound_data[index] = nil
>       else
>         outbound_data[index] = leftover
>         reactor.schedule_write(sock)
>         break
>       end
>     end
>     outbound_data.compact!
>     reactor.cancel_write(sock) if outbound_data.empty?
>   end
>
> Mike
>
> -----Original Message-----
> From: hemant [mailto:gethemant at gmail.com]
> Sent: 21 May 2008 05:36
> To: Mike Evans
> Cc: backgroundrb-devel at rubyforge.org
> Subject: Re: [Backgroundrb-devel] Problems sending large results with
> backgroundrb
>
> You can test the git version of backgroundrb with the git version of
> packet (which incorporates the latest changes). The procedure is as
> follows:
>
> Clone the packet git repo:
>
> git clone git://github.com/gnufied/packet.git
> cd packet; rake gem
> cd pkg; sudo gem install --local packet-0.1.6.gem
>
> Go to the vendor directory of your rails directory and remove or back
> up the older version of the backgroundrb plugin, and back up the
> related config file as well.
>
> From the vendor directory:
>
> git clone git://gitorious.org/backgroundrb/mainline.git backgroundrb
> cd RAILS_ROOT  <<assuming the older script and config file have been backed up>>
> rake backgroundrb:setup  <<modify config/backgroundrb.yml according to your needs>>
> ./script/backgroundrb start  <<let me know how it goes and if this fixes your problem>>
>
> On Wed, May 21, 2008 at 9:42 AM, hemant <gethemant at gmail.com> wrote:
>> On Wed, May 21, 2008 at 1:00 AM, Mike Evans <mike at metaswitch.com> wrote:
>>> I'm working on an application that does extensive database searching.
>>> These searches can take a long time, so we have been working on
>>> moving the searches to a backgroundrb worker task so we can provide
>>> a sexy AJAX progress bar, and populate the search results as they
>>> are available.
>>> All of this seems to work fine until the size of the search results
>>> gets sufficiently large, when we start to hit exceptions in
>>> backgroundrb (most likely in the packet layer). We are using
>>> packet-0.5.1 and backgroundrb from the latest svn mirror.
>>>
>>> We have found and fixed one problem in the packet sender. This is
>>> triggered when the non-blocking send in NbioHelper::send_once cannot
>>> send the entire buffer, resulting in an exception in the line
>>>
>>>   write_scheduled[fileno] ||= connections[fileno].instance
>>>
>>> in Core::schedule_write because connections[fileno] is nil. I can't
>>> claim to fully understand the code, but I think there are two
>>> problems here.
>>>
>>> The main issue seems to be that when Core::handle_write_event calls
>>> write_and_schedule to schedule the write, it doesn't clear out
>>> internal_scheduled_write[fileno]. It looks like the code is
>>> expecting the cancel_write call at the end of write_and_schedule to
>>> clear it out, but this doesn't happen if there is enough queued data
>>> to cause the non-blocking write to only partially succeed again. In
>>> this case, Core::schedule_write is called again, and because
>>> internal_scheduled_write[fileno] has not been cleared out, the code
>>> drops through to the second if test, then hits the above exception.
>>> We fixed this by adding the line
>>>
>>>   internal_scheduled_write.delete(fileno)
>>>
>>> immediately before the call to write_and_schedule in
>>> Core::handle_write_event.
>>>
>>> The secondary issue is that the connections[fileno] structure is not
>>> getting populated for this connection - I'm guessing because it is
>>> an internal socket rather than a network socket, but I couldn't be
>>> sure. We changed the second if test in Core::schedule_write to
>>>
>>>   elsif write_scheduled[fileno].nil? && !connections[fileno].nil?
>>>
>>> to firewall against this, but we are not sure if this is the right
>>> fix.
>>
>> That was surely a bug, and I fixed it like this:
>>
>>   def schedule_write(t_sock, internal_instance = nil)
>>     fileno = t_sock.fileno
>>     if UNIXSocket === t_sock && internal_scheduled_write[fileno].nil?
>>       write_ios << t_sock
>>       internal_scheduled_write[t_sock.fileno] ||= internal_instance
>>     elsif write_scheduled[fileno].nil? && !(t_sock.is_a?(UNIXSocket))
>>       write_ios << t_sock
>>       write_scheduled[fileno] ||= connections[fileno].instance
>>     end
>>   end
>>
>> Also, I fixed an issue with marshalling larger data across the
>> channel. Thanks for reporting this. I have been terribly busy with
>> things in the office and in personal life, and hence my work on
>> BackgrounDRb has been on hiatus for a while. Unfortunately, you can't
>> use the trunk packet code, which is available from:
>>
>> git clone git://github.com/gnufied/packet.git
>>
>> directly with the svn mirror of backgroundrb, since packet now uses
>> fork and exec to run workers, reducing the memory usage of workers.
>> However, in a day or two I will update the git repository of
>> BackgrounDRb to make use of the latest packet version. In the
>> meanwhile, you can try backporting the relevant packet changes to the
>> version you are using and see if that fixes your problem.
>>
>>> We are now hitting problems in the Packet::MetaPimp module receiving
>>> the data, usually an exception in the Marshal.load call in
>>> MetaPimp::receive_data. We suspect this is caused by the packet code
>>> corrupting the data somewhere, probably because we are sending such
>>> large arrays of results (the repro I am working on at the moment is
>>> trying to marshal over 200k of data). We've been trying to put extra
>>> diagnostics in the code so we can see what is happening, but if we
>>> add puts statements to the code we only seem to get output from the
>>> end of the connection that hits an exception, and so far our
>>> attempts to make logger objects available throughout the code have
>>> failed.
>>> We therefore thought we would ask for help - either to see whether
>>> this is a known problem, or whether there is a recommended way to
>>> add diagnostics to the packet code.
>>>
>>> I'm also open to ideas as to better ways to solve the problem!
>>>
>>> Thanks in advance,
>>>
>>> Mike
>>>
>>> _______________________________________________
>>> Backgroundrb-devel mailing list
>>> Backgroundrb-devel at rubyforge.org
>>> http://rubyforge.org/mailman/listinfo/backgroundrb-devel
>>
>> --
>> Let them talk of their oriental summer climes of everlasting
>> conservatories; give me the privilege of making my own summer with my
>> own coals.
>>
>> http://gnufied.org
>
> --
> Let them talk of their oriental summer climes of everlasting
> conservatories; give me the privilege of making my own summer with my
> own coals.
>
> http://gnufied.org

--
Let them talk of their oriental summer climes of everlasting
conservatories; give me the privilege of making my own summer with my
own coals.

http://gnufied.org
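The delete_at bug Mike describes above can be demonstrated in isolation. This is a standalone sketch, not packet code: deleting from an array while iterating over it with each_with_index skips the elements that slide into the deleted slots, which is exactly how queued outbound data gets dropped or reordered.

```ruby
# Buggy pattern: mutate the array while iterating over it.
queue = ["a", "b", "c", "d"]
sent = []
queue.each_with_index do |data, index|
  sent << data
  queue.delete_at(index)   # shrinks the array mid-iteration
end
# Only "a" and "c" are visited; "b" and "d" slid into the deleted
# slots and were skipped entirely.

# Mike's fix: nil out the slot, compact once the iteration is done.
queue2 = ["a", "b", "c", "d"]
sent2 = []
queue2.each_with_index do |data, index|
  sent2 << data
  queue2[index] = nil
end
queue2.compact!
# All four entries are visited, in order, and the queue drains cleanly.
```

The nil-then-compact approach keeps the array indices stable for the whole iteration, so no element is ever skipped.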
Mike Evans
2008-May-24 14:49 UTC
[Backgroundrb-devel] Problems sending large results with backgroundrb
Hemant

I fixed a minor bug that means the code is now getting the right file name, but the object file is still failing to load.

The fix is to change the regular expression used to process the Marshal.load exception in MasterWorker::load_data from

  if error_msg =~ /^undefined.+([A-Z]\w+)/

to

  if error_msg =~ /^undefined.+ ([A-Z]\w+)/

The extra space forces it to take the whole of the last word in the error message, not just the last capital onward.

I suspect the issue I'm now seeing is because the MasterWorker class doesn't load the Rails environment. Any thoughts on how to fix this?

Mike
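Assuming Marshal raises its usual "undefined class/module" message for an unloaded constant, the difference between the two patterns can be checked directly. This is a hypothetical reconstruction of the parsing step, not the actual load_data code:

```ruby
# Error message of the shape Marshal.load raises when a class is not
# yet loaded; load_data parses the constant name out of it.
error_msg = "undefined class/module SearchDn"

# Original pattern: the greedy .+ backtracks only far enough for
# [A-Z]\w+ to match, so the capture is just "Dn" -- which is why the
# stack trace above shows an attempt to require 'dn'.
broken = error_msg.match(/^undefined.+([A-Z]\w+)/)[1]

# Fixed pattern: the literal space anchors the capture to the start of
# the last word, yielding the full constant name "SearchDn".
fixed = error_msg.match(/^undefined.+ ([A-Z]\w+)/)[1]
```

The one-character change works because backtracking now has to give back the whole final word, not just its tail.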
_______________________________________________
Backgroundrb-devel mailing list
Backgroundrb-devel at rubyforge.org
http://rubyforge.org/mailman/listinfo/backgroundrb-devel
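The partial-write condition that both of the fixes in this thread revolve around can be sketched with a stand-in socket. All names here are illustrative, not packet's actual classes: a non-blocking send may accept only part of the buffer, and the leftover must be re-queued on the same handler instance or it is silently lost.

```ruby
# A fake socket that accepts at most `capacity` bytes per call,
# standing in for a non-blocking socket whose kernel buffer is full.
class FakeSock
  attr_reader :received
  def initialize(capacity)
    @capacity = capacity
    @received = ""
  end
  def write_nonblock(data)
    chunk = data[0, @capacity]
    @received << chunk
    chunk.size   # a real socket returns the byte count actually written
  end
end

# Mirrors the shape of packet's send helper: write what the socket
# takes, return the unwritten remainder for the reactor to reschedule.
def write_once(data, sock)
  written = sock.write_nonblock(data)
  data[written..-1] || ""
end

sock = FakeSock.new(4)
leftover = write_once("hello world", sock)
# sock.received is "hell" and leftover is "o world"; if the reactor
# reschedules the write on the wrong object (the bug hemant's
# handle_write_event fix addresses), this leftover never goes out.
```

This is why outbound_data must live on, and be drained by, the same instance that owns the connection.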
hemant kumar
2008-May-24 15:13 UTC
[Backgroundrb-devel] Problems sending large results with backgroundrb
On Sat, 2008-05-24 at 15:49 +0100, Mike Evans wrote:
> Hemant
>
> I fixed a minor bug that means the code is now getting the right file
> name, but the object file is still failing to load.
>
> The fix is to change the regular expression used to process the
> Marshal.load exception in MasterWorker::load_data from
>
>   if error_msg =~ /^undefined.+([A-Z]\w+)/
>
> to
>
>   if error_msg =~ /^undefined.+ ([A-Z]\w+)/
>
> The extra space forces it to take the whole of the last word in the
> error message, not just the last capital onward.
>
> I suspect the issue I'm now seeing is because the MasterWorker class
> doesn't load the Rails environment. Any thoughts on how to fix this?

Yeah, Mike. When I saw your mail I knew there was some problem with that piece of regexp. Now, the Rails environment IS getting loaded in the master worker, but somehow autoloading of models is not working from the master class (although it works from workers alright), hence my own hand-rolled mechanism for autoloading models. I have no idea, off the top of my head, why that's happening, but expect a fix soon (or a patch is more than welcome).
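The model autoloading hemant mentions hinges on mapping a constant name to a file name before requiring it. A simplified sketch of that transformation follows; `underscore` here is a hand-rolled stand-in, not the ActiveSupport implementation, and the mapping shown is an assumption about how such a loader would resolve SearchDn:

```ruby
# Convert a CamelCase constant name to the snake_case file name a
# Rails-style loader would require (e.g. SearchDn -> search_dn, which
# would live in app/models/search_dn.rb).
def underscore(const_name)
  const_name.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
end

underscore("SearchDn")      # => "search_dn"
underscore("MasterWorker")  # => "master_worker"
```

This also shows why the regex capture in load_data matters: with only "Dn" captured, the loader would look for dn.rb instead of search_dn.rb.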