hemant
2008-Jun-18 20:44 UTC
[Backgroundrb-devel] Please comment on upcoming changes of backgroundrb
Folks,

I am getting ready for a new release of BackgrounDRb. It will mostly be a tagging of the git release which is already being used in production. Not to mention, Packet and BackgrounDRb have seen a _lot_ of improvements and fixes since the last release.

There are also a few changes that I want to introduce; please comment on them:

1. As posted earlier - a way of running threads inside backgroundrb while still being able to use result saving and the like easily. Currently the method is named fetch_parallely, which I am planning to rename to run_concurrently. #defer stays as it is.

http://gnufied.org/2008/06/12/unthreaded-threads-of-hobbiton/

2. The ability to cluster and connect to multiple backgroundrb servers. It involves some additions to backgroundrb.yml like:

# the following section is totally optional, and only useful if you are trying to run a
# cluster of backgroundrb servers. If you do not specify this section, backgroundrb will
# assume that from rails you are connecting to the backgroundrb server specified in the
# previous section
:client:
  :drb_servers: "10.0.0.1:11006,10.0.0.2:11007"

So, when you say:

MiddleMan.worker(:hello_worker).fooo(@user)

BackgrounDRb will delegate the task to your backgroundrb servers in a round robin manner. By using:

MiddleMan.worker(:hello_worker)

you can get one specific instance of a worker which is tied to one particular server. Also, #new_worker will work in a round robin manner, but you must call #delete on the returned object.

3. With clustering comes the question of storing worker results. So far, backgroundrb result storage has been a bit of a problem. In the new version, I am planning to rename ask_status to ask_result. Also, register_status will be deprecated, and for storing results you will have to use:

cache[some_key] = result

where cache is the local cache in your worker. Note that here you are defining a key for your result. In-memory, in-process storage of results won't work if backgroundrb servers are clustered, hence you will have to use memcache-based storage if you are going to cluster your workers. The mechanism will be the same: you specify a key, which will be combined with worker_name and worker_key. (A rough sketch of how this is meant to be used follows at the end of this mail.)

Also, job_key, wherever used, will be replaced with worker_key, since I find that name confusing.

That's all folks. Please try the git version. Stress test it and let me know about any problems.

http://gnufied.org/2008/05/21/bleeding-edge-version-of-backgroundrb-for-better-memory-usage/

--
Let them talk of their oriental summer climes of everlasting
conservatories; give me the privilege of making my own summer with my
own coals.

http://gnufied.org
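PS: to make the result storage change in (3) a bit more concrete, here is a rough, untested sketch of how I imagine it being used. The worker, method, and key names below are made up for illustration, and the exact call signatures may still change:

class ReportWorker < BackgrounDRb::MetaWorker
  set_worker_name :report_worker

  def build_report(user_id)
    data = "report for user #{user_id}"  # stand-in for real work
    # store the result under a key of your choosing; internally the key
    # gets combined with worker_name and worker_key
    cache[:latest_report] = data
  end
end

# from rails, after the worker method has run, fetch the stored result:
MiddleMan.worker(:report_worker).ask_result(:latest_report)

With clustering, the same calls would simply go against the memcache-backed storage instead of the in-process cache.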
Stevie Clifton
2008-Jun-20 15:22 UTC
[Backgroundrb-devel] Please comment on upcoming changes of backgroundrb
Hey Hemant,

This is great. Thanks for putting in the effort to get the git version tagged so people feel comfortable using it in production! I'm assuming that when you say you'll tag the git release, you'll also push out new bdrb and packet gems, correct?

I had a quick question about fetch_parallely/run_concurrently. Is it basically doing the same thing as defer(), but just with a thread-safe callback? If so, it might be more intuitive to make the name of the method reflect that, such as "defer_concurrently", so as not to make people think that they do completely different things.

Also, if the above is true (run_concurrently is a thread-safe version of defer), I think it would be less cumbersome if you just extended defer() to allow the same functionality by passing the name of a callback method in an options hash instead of creating an entirely new method. This way people could just use the block passed to defer() as the request_proc, and another method at the worker level for the response_proc, instead of creating a bunch of procs to pass to the method. For example, instead of:

run_concurrently(args, request_proc, response_proc)

you could do:

def example_worker(args)
  defer(args, :callback => :my_callback_method) do |args|
    # same as the body of request_proc
  end
end

def my_callback_method(result)
  # same as body of response_proc (call register_status or whatever)
end

I looked through the code in git for fetch_parallely, and I'm not sure if this would give you the same flexibility, but it would be a more intuitive solution to a user IMHO.

stevie
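p.s. Just to sketch the dispatch I'm imagining (this is not the actual defer() implementation, and the threading is hand-waved; it's only meant to show how an options-hash callback could be routed back to a worker method):

# rough sketch only: run the block off the calling thread and, when it
# finishes, hand its return value to the named worker method
def defer(args, options = {})
  callback = options[:callback]
  Thread.new do
    result = yield(args)                 # body written by the worker author
    send(callback, result) if callback   # e.g. my_callback_method(result)
  end
end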
hemant
2008-Jun-20 17:26 UTC
[Backgroundrb-devel] Please comment on upcoming changes of backgroundrb
Well, thanks for looking it over. Actually, I have made quite a few changes and pushed an update to git on the testcase branch:

http://github.com/gnufied/backgroundrb/commits/testcase

I will take your advice on the run_concurrently method. But I have made result storage completely thread safe, and hence users can call it from threads without any worries.

Actually, I made quite a few API changes today. Here is a sample of the API, if you try the new branch: http://pastie.org/218967

Or:

# Run a task like:
# this is our dear remote process
class HelloWorker
  set_worker_name :hello_worker

  def barbar(t_user)
    # runs method some_task inside the thread pool
    thread_pool.defer(t_user, method(:some_task))
  end

  def some_task(user)
    user_feeds = user.feeds
    loop do
      # user can retrieve/add/edit objects from the result cache without worrying about
      # thread safety. When you call job_key in your threads it automatically resolves to
      # the job_key of the task being executed; there may be another task being executed
      # in another thread, but since job_key is a thread-local variable it will always
      # resolve to the correct one.
      old_counter = cache[job_key]
      cache[job_key] += 10
    end
  end
end

# invoke tasks/methods in the worker asynchronously
MiddleMan.worker(:hello_worker).async_barbar(<some_job_key>, @user)

# invoke a method in the worker synchronously
MiddleMan.worker(:hello_worker).barbar(<some_job_key>, @user)

# ask for the result object stored with job_key "wow"
MiddleMan.worker(:hello_worker).ask_result(<job_key_or_key_with_which_you_stored_result>)

# If you are doing this in production, I strongly advise using memcache for result storage.
# There is an issue if a user invokes multiple tasks in the thread pool directly from one of
# the workers: under the current settings they are going to end up with the same job key.
# Also, new_worker can't follow the same method invocation conventions because it accepts
# more parameters.

Again, MiddleMan now basically wraps a cluster of bdrb servers, and when you say:

MiddleMan.worker(:hello_worker).async_barbar(<some_job_key>, @user)

it will be invoked in a round robin manner across all the bdrb servers specified in the configuration file (see the sketch at the end of this mail for the general idea). The complete configuration file looks like:

# A sample YAML configuration file
---
:backgroundrb:
  :ip: 0.0.0.0 # ip on which the backgroundrb server is running
  :port: 11006 # port on which the backgroundrb server is running
  :environment: production # rails environment loaded, defaults to development
  :debug_log: true # whether to print debug logs to a separate worker, defaults to true
  :log: foreground # will print log messages to STDOUT, defaults to a separate log worker
  :result_storage: memcache # store results in a memcache cluster; you also need to specify the location of your memcache clusters in the next option
  :memcache: "10.0.0.1:11211,10.0.0.2:11211" #=> location of memcache clusters separated by commas

# the following section is totally optional, and only useful if you are trying to run a
# cluster of backgroundrb servers. If you do not specify this section, backgroundrb will
# assume that from rails you are connecting to the backgroundrb server specified in the
# previous section
:client:
  :drb_servers: "10.0.0.1:11006,10.0.0.2:11007"

# You specify your worker schedules here
:schedules:
  :foo_worker: # worker name
    :barbar: # worker method
      :trigger_args: */5 * * * * * * # worker schedule

Please comment on this.
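For the curious, the round robin selection on the client side is conceptually just this (a sketch only, not the actual packet/backgroundrb code; class and method names are made up):

# cycle through the configured :drb_servers entries, one per invocation
class RoundRobinServers
  def initialize(drb_servers)
    # "10.0.0.1:11006,10.0.0.2:11007" => [["10.0.0.1", 11006], ["10.0.0.2", 11007]]
    @servers = drb_servers.split(",").map do |pair|
      host, port = pair.split(":")
      [host, port.to_i]
    end
    @index = 0
  end

  # next server in round robin order
  def next_server
    server = @servers[@index % @servers.size]
    @index += 1
    server
  end
end

servers = RoundRobinServers.new("10.0.0.1:11006,10.0.0.2:11007")
servers.next_server # => ["10.0.0.1", 11006]
servers.next_server # => ["10.0.0.2", 11007]
servers.next_server # => ["10.0.0.1", 11006]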