All, I''m currently developing an application that is very dependent on hierarchical data. In this application I will have 500,000+ trees(roots), with an average range of 1-20+ data members (children + leaf nodes) for each tree. In this application, there will be more reads than writes, but will still have a fair number of writes, mostly just adding additional objects to the end of the hierarchy. Essentially, it would be like a forum where you have a very large number of root threads, and a fairly small number of threads underneath it (1-100 replies). Currently, in my beta I''m using act_as_tree (adjacent list) with out any problems, but it''s just me and some test data. Eventually, there will around 1,000 concurrent users reading, traversing, and changing the hierarchies/trees. After doing some research on the topic I discovered this plug-in, and started to learn about nested sets. During my research I noticed that many authors highlight that read performance in greatly enhanced with nested sets, while write performance was greatly reduced especially as the tree grows. They also point out that a write requires a table lock (http://dev.mysql.com/tech-resources/articles/hierarchical-data.html) which seems like a pretty high cost when used in a situation where the hierarchy will change fairly regularly as opposed to something like a menu system. For my application, which of these methods is the best fit? Is there a better method out there for my application? I have been bouncing back and forth on the strengths of each method for my application. I appreciate any advice you have to offer! A couple of notes: - Currently using Postgresql 8.1 and plan to move to 8.2 - Currently developing with Rails 1.2RC Thanks! --Nick
Hi Nick, I''m glad you''re interested in Better Nested Set. I think a nested set would serve you well because 1) most of your writes will be to the far right of your trees 2) your trees will be small, and 3) reads will greatly outnumber writes (which is almost always the case in web apps). It sounds like you have a good grasp of this, but it''s worth restating exactly what types of update operations nested sets can and can''t do quickly. At one extreme, adding a child node at the far right of the tree is almost as fast as it would be in an adjacency list setup, because only one additional row in the table (the root) needs to be updated. Same goes for swapping two adjacent nodes. At the other extreme lie things like inserting a node on the far left of a large tree, moving a branch from one side of the tree to the other, etc. All of these require updating every record in the tree. The application where I use nested sets stores about 16,000 records in a single tree, and the ''worst case'' operations take a few seconds to complete. I''m happy to put up with that, because the rapid query abilities allow me to do things that would otherwise be not just slow, but impossible. I think you should try better_nested_set, but I don''t think you should take my word for it-- do some benchmarking with acts_as_tree, then try nested sets. Moving to Better Nested Set should be fairly easy, since the API is very similar. (Of course, you will only see performance improvements if you start using the new query methods that Better Nested Set provides, or writing your own that use the left/right values.) Regarding your concerns about high concurrency: I''ve taken pains to ensure that the code is concurrency-safe (all index-altering methods are wrapped in transactions), and I think performance should be fine. But again, I''d benchmark it. I should also mention that there has been talk on the rails list of writing a tree plugin that uses materialized paths for improved write performance, but that hasn''t happened yet. The good news is that the materialized path API should be fully compatible with Better Nested Set, so you can swap one for the other to fit your needs. In any case let us know how it goes! Cheers, Krishna On 12/21/06, Nick Pavlica <linicks at gmail.com> wrote:> All, > I''m currently developing an application that is very dependent on > hierarchical data. In this application I will have 500,000+ > trees(roots), with an average range of 1-20+ data members (children + > leaf nodes) for each tree. In this application, there will be more > reads than writes, but will still have a fair number of writes, mostly > just adding additional objects to the end of the hierarchy. > Essentially, it would be like a forum where you have a very large > number of root threads, and a fairly small number of threads > underneath it (1-100 replies). > > Currently, in my beta I''m using act_as_tree (adjacent list) with out > any problems, but it''s just me and some test data. Eventually, there > will around 1,000 concurrent users reading, traversing, and changing > the hierarchies/trees. After doing some research on the topic I > discovered this plug-in, and started to learn about nested sets. > During my research I noticed that many authors highlight that read > performance in greatly enhanced with nested sets, while write > performance was greatly reduced especially as the tree grows. They > also point out that a write requires a table lock > (http://dev.mysql.com/tech-resources/articles/hierarchical-data.html) > which seems like a pretty high cost when used in a situation where the > hierarchy will change fairly regularly as opposed to something like a > menu system. > > For my application, which of these methods is the best fit? Is there > a better method out there for my application? I have been bouncing > back and forth on the strengths of each method for my application. I > appreciate any advice you have to offer! > > A couple of notes: > - Currently using Postgresql 8.1 and plan to move to 8.2 > - Currently developing with Rails 1.2RC > > Thanks! > --Nick > _______________________________________________ > Betternestedset-talk mailing list > Betternestedset-talk at rubyforge.org > http://rubyforge.org/mailman/listinfo/betternestedset-talk >
Krishna, Thanks for this great reply! I have been trying to move from acts_as_tree to better_nested_sets, but haven''t been able to work through the details. This is mostly due to the fact that I''m fairly new to Rails and Ruby. I was using the following code segments to create sub_projects with acts_as_tree: _____________Add sub project Link_____________ <%= link_to '' > Add A Sub-Project '', :controller => ''project'', :action => "new_subproject", :project_id => @project %> <br /> ______________Model_________________ acts_as_tree :order => "name" ______________Form_______________ <%= hidden_field ''project'', ''parent_id'', :value => @parent.id %> ______________Controller____________________________ def new_subproject @project = Project.new @parent = Project.find(params[:project_id]) end def create_subproject @project = Project.new(params[:project]) if @project.save #@project.move_to_child_of flash[:notice] = ''Project was successfully created.'' redirect_to :action => ''list'' else render :action => ''new'' end end ________________________Helper__________________________ <%= (find_all_subprojects at project) %> def find_all_subprojects(project) if project.children.size > 0 ret = ''<ul>'' project.children.each { |subprj| if subprj.children.size > 0 ret += ''<li>'' ret += link_to h(subprj.name), :action => ''manage'', :id => subprj ret += find_all_subprojects(subprj) ret += ''</li>'' else ret += ''<li>'' ret += link_to h(subprj.name), :action => ''manage'', :id => subprj ret += ''</li>'' end } ret += ''</ul>'' end end After installing the Better Nested Sets plug-in and adding (acts_as_nested_set :order => "name") to my Project model, the new records were being created with the root boundaries being populated in the lft and rgt columns. The parent_id field wasn''t being populated any more, but that was to be expected. The main portion that I can''t seem to get past is when I try to make the subproject a child of what should be it''s parent (move_to_child_of) after the save. @project.move_to_child_of ???? Can you point me in the right direction, it would be much appreciated. Thanks! --Nick
Jean-Christophe Michel
2006-Dec-22 23:41 UTC
[Betternestedset-talk] Application suggestions
Hi Nick, Le 22 d?c. 06, ? 22:50, Nick Pavlica a ?crit :> In this application I will have 500,000+ > trees(roots), with an average range of 1-20+ data members (children + > leaf nodes) for each tree.Are your root trees ordered or not ? If they aren''t, use the :scope to store your 500,000 trees in the same table, without interaction between each of them. This should be very efficient.> The main portion that I can''t > seem to get past is when I try to make the subproject a child of > what should be it''s parent (move_to_child_of) after the save. > > @project.move_to_child_of ???? > > Can you point me in the right direction, it would be much appreciated.Pass an id or an object of the new parent. Jean-Christophe Michel -- symetrie.com Better Nested Set for rails: http://opensource.symetrie.com/trac/better_nested_set
> Are your root trees ordered or not ?I don''t think that they are ordered. Essentially, I have a chain of sub_projects where I can have one or more sub projects at the same level with an unlimited depth, but the aren''t specifically ordered. If they aren''t, use the :scope to> store your 500,000 trees in the same table, without interaction between > each of them. This should be very efficient.Not quite sure what you mean here? But are you saying something like :scope => parent_id ?> > The main portion that I can''t > > seem to get past is when I try to make the subproject a child of > > what should be it''s parent (move_to_child_of) after the save. > > > > @project.move_to_child_of ???? > > > > Can you point me in the right direction, it would be much appreciated. > > Pass an id or an object of the new parent.I''m able to assign a child when I''m at the console, as described on the Wiki. I can also do it manually in my create subproject_method: @project.move_to_child_of 1. But when I try to assign a variable in my create_subproject method from the request parameters, I get a variety of errors. I''m not sure where I''m dropping the ball yet :) Here is my current create method def create_subproject @project = Project.new(params[:project]) @parent = (params[:parent_id]) #@parent = @project.parent_id if @project.save @project.move_to_child_of @parent flash[:notice] = ''Project was successfully created.'' redirect_to :action => ''list'' else render :action => ''new'' end end I get this error : Couldn''t find Project without an ID of course "parent_id"=>"1" is in the request parameters list in the debug message. I tried passing the value with the submit button on the form: <%= submit_tag "Create", :project_id => @parent.id; %> Where parent.id is set with the new method. This was how I was inserting the parent ID when using acts_as_tree. Thanks ! --Nick
Hi Nick, On 12/22/06, Nick Pavlica <linicks at gmail.com> wrote:> If they aren''t, use the :scope to > > store your 500,000 trees in the same table, without interaction between > > each of them. This should be very efficient. > Not quite sure what you mean here? But are you saying something like > :scope => parent_id ?The issue is that you could store all your little trees as one giant tree (with multiple roots) but write performance would be _terrible_, ''cause you''d be updating hundreds of thousands of rows each time. You need to specify a scope if you have more than one tree, so that the code knows which nodes belong in which tree. All members of a tree must have the same scope value (in your case something like root_project_id, it sounds like).> > Here is my current create method > > def create_subproject > @project = Project.new(params[:project]) > @parent = (params[:parent_id]) > #@parent = @project.parent_id > > if @project.save > @project.move_to_child_of @parent > flash[:notice] = ''Project was successfully created.'' > redirect_to :action => ''list'' > else > render :action => ''new'' > end > end >Your code looks good, but if you haven''t changed the form the parent is probably coming in as params[:project[:parent_id]] instead of params[:parent_id] If that doesn''t work, put a ''breakpoint'' before the save and run script/breakpointer to check out your params and see what''s happening. cheers, Krishna
Jean-Christophe Michel
2006-Dec-23 14:04 UTC
[Betternestedset-talk] Application suggestions
Hi Nick, Le 23 d?c. 06, ? 02:09, Nick Pavlica a ?crit :> I get this error : Couldn''t find Project without an ID > > of course "parent_id"=>"1" is in the request parameters list in the > debug message.So here is the error: your param is a string and find wants a Fixnum. Change params[:parent_id] to params[parent_id].to_i Jean-Christophe Michel -- symetrie.com Better Nested Set for rails: http://opensource.symetrie.com/trac/better_nested_set