Dave Dupre
2008-Jan-03 20:10 UTC
[Backgroundrb-devel] Memory leak and long process problem
I use backgroundrb for many long tasks in my system, but I'm having issues
with one in particular. Two large tasks for me are importing people and
updating companies.

  def import_contacts(args = nil)
    thread_pool.defer(args) do |job_id|
      begin
        job = ImportJob.find(job_id)
        job.process_job
      rescue => err
        logger.error "MscWorker#import_contacts failed! #{err.class}: #{err}"
      end
    end
  end

  def update_company_from_vendor(args = nil)
    thread_pool.defer(args) do |company_id|
      begin
        company = Company.find(company_id)
        info = company.firm_info_from_vendor # webservice call to vendor
        if info && info.size == 1
          company.update_from_vendor!(Company.find_firm_info_details_from_vendor(info[0])) # webservice call to vendor
        end
      rescue => err
        logger.error "MscWorker#update_company_from_vendor failed! #{err.class}: #{err}"
      end
    end
  end

Part of import_contacts will result in many ask_work calls to
update_company_from_vendor while it is processing. Importing contacts is
heavily DB dependent, but not very code intensive. If I upload two files
with > 1000 contacts each (two ask_work calls to import_contacts), things
will progress along and then pause for 20-40 seconds. There is no DB
activity during the pause, but the backgroundrb process is using most of
the CPU (98-99%). There are no deadlock errors when things start up again,
but it really slows things down. Are you using polling somewhere?

Also, on my Mac, Activity Monitor shows only 1 thread and 1.2 GB(!!) of
memory used. I expected to see many threads due to my use of thread_pool.

Since all of my processing code is in models, it is very easy to switch to
synchronous execution. When I execute job.process_job directly (see
import_contacts), things never pause, and the ruby process never gets over
120 MB in size.

This all leaves me with a few questions:

1. It sure looks like there is a serious memory leak someplace, but I don't
think it is in my code.

2. What is the recommended method for this kind of processing? Currently, I
have a single worker that my web app calls for background tasks -- each
task is implemented with a thread pool. I don't have much need for status
reporting since I can get the status from the database. Should I change
things to dynamically create workers?

3. I should repeat that I never saw multiple threads being created even
though update_company_from_vendor was called 1500 times during one call to
import_contacts. update_company_from_vendor takes several seconds to
execute, so I know calls should have queued up.

Thoughts?
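For context, the fan-out Dave describes would come from the calling side,
roughly as sketched below. This is a minimal sketch, not taken from the
post: the worker name :msc_worker is assumed from the "MscWorker" prefix
in the log messages, and the surrounding ImportJob#process_job context is
Dave's own model code; only the ask_work option names follow backgroundrb
1.x conventions.

  # Somewhere inside ImportJob#process_job (hypothetical), each imported
  # contact's company is refreshed by queueing a second worker method.
  # This is how a single import can fan out into ~1500 ask_work calls.
  MiddleMan.ask_work(:worker        => :msc_worker,
                     :worker_method => :update_company_from_vendor,
                     :data          => company.id)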
Hi Dave,

On Jan 4, 2008 1:40 AM, Dave Dupre <gobigdave at gmail.com> wrote:

> I use backgroundrb for many long tasks in my system, but I'm having
> issues with one in particular. Two large tasks for me are importing
> people and updating companies.

So the method below is invoked from Rails using the ask_work command,
right? Are you, by any chance, passing the uploaded file itself to the
worker? If so, preferably don't do that: save the file somewhere, or in
the db, and pass its location. And why do you need thread_pool there? Do
you want concurrent execution of tasks, or do you just want a worker
queue?

> def import_contacts(args = nil)
>   thread_pool.defer(args) do |job_id|
>     begin
>       job = ImportJob.find(job_id)
>       job.process_job
>     rescue => err
>       logger.error "MscWorker#import_contacts failed! #{err.class}: #{err}"
>     end
>   end
> end

Similar doubts as with the previous worker method.

> def update_company_from_vendor(args = nil)
>   thread_pool.defer(args) do |company_id|
>     begin
>       company = Company.find(company_id)
>       info = company.firm_info_from_vendor # webservice call to vendor
>       if info && info.size == 1
>         company.update_from_vendor!(Company.find_firm_info_details_from_vendor(info[0])) # webservice call to vendor
>       end
>     rescue => err
>       logger.error "MscWorker#update_company_from_vendor failed! #{err.class}: #{err}"
>     end
>   end
> end
>
> Part of import_contacts will result in many ask_work calls to
> update_company_from_vendor while it is processing. Importing contacts is
> heavily DB dependent, but not very code intensive. If I upload two files
> with > 1000 contacts each (two ask_work calls to import_contacts), things
> will progress along and then pause for 20-40 seconds. There is no DB
> activity during the pause, but the backgroundrb process is using most of
> the CPU (98-99%). There are no deadlock errors when things start up
> again, but it really slows things down. Are you using polling somewhere?
>
> Also, on my Mac, Activity Monitor shows only 1 thread and 1.2 GB(!!) of
> memory used. I expected to see many threads due to my use of thread_pool.
>
> Since all of my processing code is in models, it is very easy to switch
> to synchronous execution. When I execute job.process_job directly (see
> import_contacts), things never pause, and the ruby process never gets
> over 120 MB in size.
>
> This all leaves me with a few questions:
>
> 1. It sure looks like there is a serious memory leak someplace, but I
> don't think it is in my code.
>
> 2. What is the recommended method for this kind of processing?
> Currently, I have a single worker that my web app calls for background
> tasks -- each task is implemented with a thread pool. I don't have much
> need for status reporting since I can get the status from the database.
> Should I change things to dynamically create workers?
>
> 3. I should repeat that I never saw multiple threads being created even
> though update_company_from_vendor was called 1500 times during one call
> to import_contacts. update_company_from_vendor takes several seconds to
> execute, so I know calls should have queued up.

Ruby uses green threads, so I don't think Activity Monitor will show
multiple created threads. Also, the thread pool only grows to its full
pool size as tasks pile up in its queue; if the queue is empty, only one
thread is actually created initially.

--
Let them talk of their oriental summer climes of everlasting
conservatories; give me the privilege of making my own summer with my own
coals.

http://gnufied.org
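Hemant's advice above -- save the upload and pass its location rather than
the file itself -- might look like the following on the Rails side. This is
a sketch under stated assumptions: the :upload param name, the tmp/imports
directory, and an ImportJob#file_path column are all hypothetical; only the
MiddleMan.ask_work option names come from backgroundrb itself.

  # Rails action: persist the upload to disk first, then send the worker
  # only a small record id. The worker process re-reads the file from
  # file_path, so no large payload travels over the backgroundrb socket.
  path = File.join(RAILS_ROOT, 'tmp', 'imports', "#{Time.now.to_i}.csv")
  FileUtils.mkdir_p(File.dirname(path))  # FileUtils is loaded by Rails
  File.open(path, 'wb') { |f| f.write(params[:upload].read) }
  job = ImportJob.create!(:file_path => path)
  MiddleMan.ask_work(:worker        => :msc_worker,
                     :worker_method => :import_contacts,
                     :data          => job.id)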
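The green-threads point is easy to demonstrate with plain Ruby, independent
of backgroundrb. Under MRI 1.8, the interpreter backgroundrb runs on, every
Ruby thread is a green thread scheduled inside the interpreter:

  # Spawn ten Ruby threads. MRI 1.8 multiplexes them all onto a single
  # native thread, so Activity Monitor still reports 1 thread for the
  # process no matter how busy thread_pool is.
  threads = (1..10).map do |n|
    Thread.new(n) do |i|
      sleep 2                  # stands in for a slow vendor webservice call
      puts "task #{i} finished"
    end
  end
  threads.each { |t| t.join }  # all ten run concurrently on one OS thread

This also means all green threads share one heap and one garbage collector.
One plausible (though unconfirmed) explanation for the 20-40 second pauses
is MRI's stop-the-world mark-and-sweep GC walking a 1.2 GB heap, which
would peg one CPU while showing no DB activity.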