Russell Branca
2010-May-25 18:53 UTC
Forking off the unicorn master process to create a background worker
Hello,

I'm trying to find an efficient way to create a new instance of a Rails
application to perform some background tasks without having to load up
the entire Rails stack every time, so I figured forking off the master
process would be a good way to go. Now I can easily just increment the
worker count and then send a web request in, but the new worker would be
part of the main worker pool, so in the time between spawning a new
worker and sending the request, another request could have come in and
snagged that worker. Is it possible to create a new worker and not have
it enter the main worker pool, so I could access it directly?

I know this is not your typical use case for Unicorn, and you're
probably thinking there are far better ways to do this. However, I
currently have a Rails framework that powers a handful of standalone
applications on a server with limited resources, and I'm trying to make
a centralized queue that all the applications use. The queue needs to be
able to spawn a new worker for each of the applications efficiently, and
incrementing/decrementing worker counts in Unicorn is the most efficient
way I've found to spawn a new Rails instance.

Any help, suggestions or insight into this would be greatly appreciated.

-Russell
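For context, the worker-count juggling Russell describes is normally done
by sending Unicorn's master process the documented TTIN and TTOU signals.
A minimal, untested sketch of that, assuming the master writes its pid to
a pidfile at a hypothetical path:

------------ bump_workers.rb (illustrative sketch) -------------
# Untested sketch: grow and shrink the Unicorn worker pool from outside
# using the documented TTIN/TTOU signals. The pidfile path below is an
# assumption; use whatever your unicorn config actually writes.
master_pid = File.read("/tmp/unicorn.pid").to_i

Process.kill(:TTIN, master_pid)  # ask the master to fork one extra worker
# ... send the request intended for the new worker ...
Process.kill(:TTOU, master_pid)  # ask the master to retire one worker

Note that nothing here controls which worker picks up the next request,
which is exactly the race Russell is asking about.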
Eric Wong
2010-May-26 21:05 UTC
Forking off the unicorn master process to create a background worker
Russell Branca <chewbranca at gmail.com> wrote:
> Hello,
>
> I'm trying to find an efficient way to create a new instance of a Rails
> application to perform some background tasks without having to load up
> the entire Rails stack every time, so I figured forking off the master
> process would be a good way to go. Now I can easily just increment the
> worker count and then send a web request in, but the new worker would be
> part of the main worker pool, so in the time between spawning a new
> worker and sending the request, another request could have come in and
> snagged that worker. Is it possible to create a new worker and not have
> it enter the main worker pool, so I could access it directly?

Hi Russell,

You could try having an endpoint in your webapp (with authentication, or
have it reject env['REMOTE_ADDR'] != '127.0.0.1') that runs the background
task for you.  Since it's a background app, you should probably
fork + Process.setsid + fork (or Process.daemon in 1.9), and return an
HTTP response immediately so your app can serve other requests.

The following example should be enough to get you started (totally
untested):

------------ config.ru -------------
require 'rack/lobster'

map "/.seekrit_endpoint" do
  use Rack::ContentLength
  use Rack::ContentType, 'text/plain'
  run(lambda { |env|
    return [ 403, {}, [] ] if env['REMOTE_ADDR'] != '127.0.0.1'
    pid = fork
    if pid
      Process.waitpid(pid)

      # Cheap way to avoid unintentional fd sharing with our children:
      # this causes the current Unicorn worker to exit after sending
      # the response.  Otherwise you'd have to be careful to
      # disconnect+reconnect databases/memcached/redis/whatever (in both
      # the parent and child) to avoid unintentional sharing that'll
      # lead to headaches.
      Process.kill(:QUIT, $$)

      [ 200, {}, [ "started background process\n" ] ]
    else
      # child: daemonize it so the unicorn master won't need to
      # reap it (that's the job of init)
      Process.setsid
      exit if fork

      begin
        # run your background code here instead of sleeping
        sleep 5
        env["rack.logger"].info "done sleeping"
      rescue => e
        env["rack.logger"].error(e.inspect)
      end
      # make sure we don't enter the normal response cycle back in the
      # worker...
      exit!(0)
    end
  })
end

map "/" do
  run Rack::Lobster.new
end

> I know this is not your typical use case for Unicorn, and you're
> probably thinking there are far better ways to do this. However, I
> currently have a Rails framework that powers a handful of standalone
> applications on a server with limited resources, and I'm trying to make
> a centralized queue that all the applications use. The queue needs to be
> able to spawn a new worker for each of the applications efficiently, and
> incrementing/decrementing worker counts in Unicorn is the most efficient
> way I've found to spawn a new Rails instance.

Yeah, it's definitely an odd case and there are ways to shoot yourself in
the foot with it (especially with unintentional fd sharing), but Ruby
exposes all the Unix process management goodies better than most languages
(probably better than anything else I've used).

> Any help, suggestions or insight into this would be greatly appreciated.

Let us know how it goes :)

--
Eric Wong
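As a rough illustration of how the queue side might trigger that endpoint,
here is an untested sketch that connects straight to the Unicorn listener
so the REMOTE_ADDR check above passes; the 127.0.0.1:8080 address is an
assumption, the path matches the config.ru example above.

------------ trigger_job.rb (illustrative sketch) -------------
# Untested sketch: hit the local-only endpoint from the same host. Host
# and port are assumptions; adjust them to your unicorn listener.
require 'net/http'

res = Net::HTTP.start('127.0.0.1', 8080) do |http|
  http.get('/.seekrit_endpoint')
end
puts res.code  # "200" once the background process has been forked off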
Russell Branca
2010-Jun-15 17:55 UTC
Forking off the unicorn master process to create a background worker
Hello Eric,

Sorry for the delayed response; with the combination of being sick and
heading out of town for a while, this project got put on the back burner.
I really appreciate your response and think it's a clean solution for what
I'm trying to do. I've started back in on getting the job queue working
this week, and will hopefully have a working solution in the next day or
two.

A little more information about what I'm doing: I'm trying to create a
centralized Resque job queue server that each of the different
applications can queue work into, so I'll be using Redis behind Resque for
storing jobs and whatnot, which brings me to an area I'm not sure of the
best approach on. When we hit the job queue endpoint in the Rack app, it
spawns the new worker and then immediately returns the "200 OK, started
background job" message, which cuts off communication back to the job
queue. My plan is to save a status message with the result of the
background task into Redis, and have Resque check that to verify the task
was successful. Is there a better approach for returning the resulting
status code with Unicorn, or is this a reasonable approach?

Thanks again for your help.

-Russell

On Wed, May 26, 2010 at 2:05 PM, Eric Wong <normalperson at yhbt.net> wrote:
> Russell Branca <chewbranca at gmail.com> wrote:
>> Hello,
>>
>> I'm trying to find an efficient way to create a new instance of a Rails
>> application to perform some background tasks without having to load up
>> the entire Rails stack every time, so I figured forking off the master
>> process would be a good way to go. Now I can easily just increment the
>> worker count and then send a web request in, but the new worker would be
>> part of the main worker pool, so in the time between spawning a new
>> worker and sending the request, another request could have come in and
>> snagged that worker. Is it possible to create a new worker and not have
>> it enter the main worker pool, so I could access it directly?
>
> Hi Russell,
>
> You could try having an endpoint in your webapp (with authentication, or
> have it reject env['REMOTE_ADDR'] != '127.0.0.1') that runs the
> background task for you.  Since it's a background app, you should
> probably fork + Process.setsid + fork (or Process.daemon in 1.9), and
> return an HTTP response immediately so your app can serve other
> requests.
>
> The following example should be enough to get you started (totally
> untested):
>
> ------------ config.ru -------------
> require 'rack/lobster'
>
> map "/.seekrit_endpoint" do
>   use Rack::ContentLength
>   use Rack::ContentType, 'text/plain'
>   run(lambda { |env|
>     return [ 403, {}, [] ] if env['REMOTE_ADDR'] != '127.0.0.1'
>     pid = fork
>     if pid
>       Process.waitpid(pid)
>
>       # Cheap way to avoid unintentional fd sharing with our children:
>       # this causes the current Unicorn worker to exit after sending
>       # the response.  Otherwise you'd have to be careful to
>       # disconnect+reconnect databases/memcached/redis/whatever (in both
>       # the parent and child) to avoid unintentional sharing that'll
>       # lead to headaches.
>       Process.kill(:QUIT, $$)
>
>       [ 200, {}, [ "started background process\n" ] ]
>     else
>       # child: daemonize it so the unicorn master won't need to
>       # reap it (that's the job of init)
>       Process.setsid
>       exit if fork
>
>       begin
>         # run your background code here instead of sleeping
>         sleep 5
>         env["rack.logger"].info "done sleeping"
>       rescue => e
>         env["rack.logger"].error(e.inspect)
>       end
>       # make sure we don't enter the normal response cycle back in the
>       # worker...
>       exit!(0)
>     end
>   })
> end
>
> map "/" do
>   run Rack::Lobster.new
> end
>
>> I know this is not your typical use case for Unicorn, and you're
>> probably thinking there are far better ways to do this. However, I
>> currently have a Rails framework that powers a handful of standalone
>> applications on a server with limited resources, and I'm trying to make
>> a centralized queue that all the applications use. The queue needs to be
>> able to spawn a new worker for each of the applications efficiently, and
>> incrementing/decrementing worker counts in Unicorn is the most efficient
>> way I've found to spawn a new Rails instance.
>
> Yeah, it's definitely an odd case and there are ways to shoot yourself
> in the foot with it (especially with unintentional fd sharing), but Ruby
> exposes all the Unix process management goodies better than most
> languages (probably better than anything else I've used).
>
>> Any help, suggestions or insight into this would be greatly appreciated.
>
> Let us know how it goes :)
>
> --
> Eric Wong
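A possible shape for the status bookkeeping Russell describes, as an
untested sketch using redis-rb; the key naming and job id are assumptions
for illustration, not something from the thread.

------------ job_status.rb (illustrative sketch) -------------
# Untested sketch: the forked worker records the outcome of the background
# task in Redis, and the queue side reads it back later.
require 'redis'

redis = Redis.new  # defaults to 127.0.0.1:6379

# in the forked worker, after the task finishes (or fails):
redis.set("background_job:42:status", "completed")

# on the queue side, to verify the task succeeded:
redis.get("background_job:42:status")  # => "completed"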
Eric Wong
2010-Jun-15 22:14 UTC
Forking off the unicorn master process to create a background worker
Russell Branca <chewbranca at gmail.com> wrote:
> Hello Eric,
>
> Sorry for the delayed response; with the combination of being sick and
> heading out of town for a while, this project got put on the back
> burner. I really appreciate your response and think it's a clean
> solution for what I'm trying to do. I've started back in on getting the
> job queue working this week, and will hopefully have a working solution
> in the next day or two.
>
> A little more information about what I'm doing: I'm trying to create a
> centralized Resque job queue server that each of the different
> applications can queue work into, so I'll be using Redis behind Resque
> for storing jobs and whatnot, which brings me to an area I'm not sure of
> the best approach on. When we hit the job queue endpoint in the Rack
> app, it spawns the new worker and then immediately returns the "200 OK,
> started background job" message, which cuts off communication back to
> the job queue. My plan is to save a status message with the result of
> the background task into Redis, and have Resque check that to verify the
> task was successful. Is there a better approach for returning the
> resulting status code with Unicorn, or is this a reasonable approach?
>
> Thanks again for your help.

Hi Russell, please don't top post, thanks.

If you already have a queue server (and presumably a standalone app
processing the queue), I would probably forgo the background Unicorn
worker entirely.

Based on my ancient (mid-2000s) knowledge of user-facing web applications:
the application should queue the job, return 200, and have an HTML meta
refresh constantly reload the page every few seconds.

Hitting the reload endpoint would check the database (Redis in this case)
for completion, and return a new HTML page to stop the meta refresh loop.

This means you're no longer keeping a single Unicorn worker idle and
wasting it.  Nowadays you could do it with long-polling on
Rainbows!/Thin/Zbatery, too, but long-polling is less reliable for people
switching between WiFi access points.  The meta refresh method can be a
waste of power/bandwidth on the client side if the background job takes a
long time, though.

I'm not familiar at all with Resque or Redis, but I suspect other folks on
this mailing list should be able to help you flesh out the details.

--
Eric Wong
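An untested sketch of the meta-refresh pattern Eric describes, written as
a config.ru fragment in the style of the earlier example; the path, query
parameter, and Redis key are assumptions.

------------ config.ru fragment (illustrative sketch) -------------
# Untested sketch: a status page that keeps reloading itself every few
# seconds until the job's Redis key reports completion.
require 'redis'

map "/job_status" do
  run(lambda { |env|
    params = Rack::Utils.parse_query(env['QUERY_STRING'])
    job_id = params['job_id']
    done = Redis.new.get("background_job:#{job_id}:status") == "completed"

    body = if done
      # completion page: no meta refresh, so the polling loop stops here
      "<html><body>Job #{job_id} finished.</body></html>"
    else
      # keep reloading this URL every 3 seconds until the job completes
      "<html><head><meta http-equiv=\"refresh\" content=\"3\"></head>" +
        "<body>Still working on job #{job_id}...</body></html>"
    end
    [ 200, { 'Content-Type' => 'text/html' }, [ body ] ]
  })
end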
Russell Branca
2010-Jun-15 22:51 UTC
Forking off the unicorn master process to create a background worker
On Tue, Jun 15, 2010 at 3:14 PM, Eric Wong <normalperson at yhbt.net> wrote:
> Russell Branca <chewbranca at gmail.com> wrote:
>> Hello Eric,
>>
>> Sorry for the delayed response; with the combination of being sick and
>> heading out of town for a while, this project got put on the back
>> burner. I really appreciate your response and think it's a clean
>> solution for what I'm trying to do. I've started back in on getting the
>> job queue working this week, and will hopefully have a working solution
>> in the next day or two.
>>
>> A little more information about what I'm doing: I'm trying to create a
>> centralized Resque job queue server that each of the different
>> applications can queue work into, so I'll be using Redis behind Resque
>> for storing jobs and whatnot, which brings me to an area I'm not sure
>> of the best approach on. When we hit the job queue endpoint in the Rack
>> app, it spawns the new worker and then immediately returns the "200 OK,
>> started background job" message, which cuts off communication back to
>> the job queue. My plan is to save a status message with the result of
>> the background task into Redis, and have Resque check that to verify
>> the task was successful. Is there a better approach for returning the
>> resulting status code with Unicorn, or is this a reasonable approach?
>>
>> Thanks again for your help.
>
> Hi Russell, please don't top post, thanks.
>
> If you already have a queue server (and presumably a standalone app
> processing the queue), I would probably forgo the background Unicorn
> worker entirely.
>
> Based on my ancient (mid-2000s) knowledge of user-facing web
> applications: the application should queue the job, return 200, and have
> an HTML meta refresh constantly reload the page every few seconds.
>
> Hitting the reload endpoint would check the database (Redis in this
> case) for completion, and return a new HTML page to stop the meta
> refresh loop.
>
> This means you're no longer keeping a single Unicorn worker idle and
> wasting it.  Nowadays you could do it with long-polling on
> Rainbows!/Thin/Zbatery, too, but long-polling is less reliable for
> people switching between WiFi access points.  The meta refresh method
> can be a waste of power/bandwidth on the client side if the background
> job takes a long time, though.
>
> I'm not familiar at all with Resque or Redis, but I suspect other folks
> on this mailing list should be able to help you flesh out the details.
>
> --
> Eric Wong

Hi Eric,

I have a queue server, but I don't have a standalone app processing the
jobs, because I have a large number of standalone applications on a single
server. Right now I've got 12 separate apps running, so if I wanted a
standalone worker app for each, that would be 12 additional applications
in memory just for handling background jobs. The whole reason I want to go
with the Unicorn worker approach for handling background jobs is so I can
fork off the master process as needed, avoid the spawning time of a normal
Rails instance, and only use workers as needed. This way I can have just a
few workers running at any given time, rather than one worker for each
app. The number of apps is only going to increase, but I want to keep the
worker pool constant.

I'll probably just update the completion status in Redis. These jobs won't
be run by users; this is all background stuff like sending notifications,
data analysis, feed parsing, etc., so I'm planning on having Resque
initiate a request directly and then using Unicorn to process the task in
the background.

I didn't exactly follow what you meant about a Unicorn worker being idle;
from the example config.ru you responded with earlier, it looks like I can
just spawn a new worker outside of the normal worker pool to handle the
job. I'm pretty sure this will work. I was curious about the best approach
for returning completion status, but I think just having the worker record
its status and exit is better than keeping long-polling connections open
between the job queue and the new Unicorn worker.

-Russell
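One way the Resque side of this could look, as an untested sketch; the
class name, queue name, port, and query parameter are assumptions rather
than anything from the thread.

------------ background_task_trigger.rb (illustrative sketch) -------------
# Untested sketch: the Resque job does nothing but poke the application's
# local-only endpoint; the forked Unicorn worker then does the real work
# and records its status in Redis.
require 'net/http'

class BackgroundTaskTrigger
  @queue = :background_tasks

  def self.perform(app_port, job_id)
    Net::HTTP.start('127.0.0.1', app_port) do |http|
      http.get("/.seekrit_endpoint?job_id=#{job_id}")
    end
  end
end

# enqueued from any of the applications, e.g.:
#   Resque.enqueue(BackgroundTaskTrigger, 8080, 42)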
Eric Wong
2010-Jun-16 00:06 UTC
Forking off the unicorn master process to create a background worker
Russell Branca <chewbranca at gmail.com> wrote:
> On Tue, Jun 15, 2010 at 3:14 PM, Eric Wong <normalperson at yhbt.net> wrote:
> > Russell Branca <chewbranca at gmail.com> wrote:
> >> Hello Eric,
> >>
> >> Sorry for the delayed response; with the combination of being sick
> >> and heading out of town for a while, this project got put on the back
> >> burner. I really appreciate your response and think it's a clean
> >> solution for what I'm trying to do. I've started back in on getting
> >> the job queue working this week, and will hopefully have a working
> >> solution in the next day or two.
> >>
> >> A little more information about what I'm doing: I'm trying to create
> >> a centralized Resque job queue server that each of the different
> >> applications can queue work into, so I'll be using Redis behind
> >> Resque for storing jobs and whatnot, which brings me to an area I'm
> >> not sure of the best approach on. When we hit the job queue endpoint
> >> in the Rack app, it spawns the new worker and then immediately
> >> returns the "200 OK, started background job" message, which cuts off
> >> communication back to the job queue. My plan is to save a status
> >> message with the result of the background task into Redis, and have
> >> Resque check that to verify the task was successful. Is there a
> >> better approach for returning the resulting status code with Unicorn,
> >> or is this a reasonable approach?
> >>
> >> Thanks again for your help.
> >
> > Hi Russell, please don't top post, thanks.
> >
> > If you already have a queue server (and presumably a standalone app
> > processing the queue), I would probably forgo the background Unicorn
> > worker entirely.
> >
> > Based on my ancient (mid-2000s) knowledge of user-facing web
> > applications: the application should queue the job, return 200, and
> > have an HTML meta refresh constantly reload the page every few
> > seconds.
> >
> > Hitting the reload endpoint would check the database (Redis in this
> > case) for completion, and return a new HTML page to stop the meta
> > refresh loop.
> >
> > This means you're no longer keeping a single Unicorn worker idle and
> > wasting it.  Nowadays you could do it with long-polling on
> > Rainbows!/Thin/Zbatery, too, but long-polling is less reliable for
> > people switching between WiFi access points.  The meta refresh method
> > can be a waste of power/bandwidth on the client side if the background
> > job takes a long time, though.
> >
> > I'm not familiar at all with Resque or Redis, but I suspect other
> > folks on this mailing list should be able to help you flesh out the
> > details.
>
> Hi Eric,
>
> I have a queue server, but I don't have a standalone app processing the
> jobs, because I have a large number of standalone applications on a
> single server. Right now I've got 12 separate apps running, so if I
> wanted a standalone worker app for each, that would be 12 additional
> applications in memory just for handling background jobs. The whole
> reason I want to go with the Unicorn worker approach for handling
> background jobs is so I can fork off the master process as needed, avoid
> the spawning time of a normal Rails instance, and only use workers as
> needed. This way I can have just a few workers running at any given
> time, rather than one worker for each app. The number of apps is only
> going to increase, but I want to keep the worker pool constant.
>
> I'll probably just update the completion status in Redis. These jobs
> won't be run by users; this is all background stuff like sending
> notifications, data analysis, feed parsing, etc., so I'm planning on
> having Resque initiate a request directly and then using Unicorn to
> process the task in the background.

Ah, so I guess it's a single queue server but multiple queues?  I guess
that's where I got confused by your description.

> I didn't exactly follow what you meant about a Unicorn worker being
> idle; from the example config.ru you responded with earlier, it looks
> like I can just spawn a new worker outside of the normal worker pool to
> handle the job. I'm pretty sure this will work. I was curious about the
> best approach for returning completion status, but I think just having
> the worker record its status and exit is better than keeping
> long-polling connections open between the job queue and the new Unicorn
> worker.

Yes, having the fork as I made in the example should work.  I haven't
tested it, of course :)

My instincts tell me recording the status and exiting ASAP is better
because it uses less memory.  You should test and experiment with it
either way.  You know your apps, requirements, and Redis/Resque far better
than I do :)

Consider software an evolutionary process, so whatever the "best approach"
may be, another one can usurp it eventually or be completely wrong in a
slightly different setting :)

--
Eric Wong