thr3ads.net - Betternestedset talk - [Betternestedset-talk] Improving the nested_set model to remove redundancy [Feb 2007]

If this information is useful, please help other people find it:
Share via:

Jeff Dean

2007-Feb-09 21:22 UTC

[Betternestedset-talk] Improving the nested_set model to remove redundancy

I''ve been working with the Celko sets a bit lately and looking at the
rails
nested_set and better_nested_set and I think there are a few
redundancies/areas of improvement for performance in both models. I may not
understand the issues well, so please excuse me if these points are naive:

First, I think the parent_id field is completely unnecessary. The point of
the nested set model seems to be that you eliminate the need for the join,
thereby creating a table with a single fact, in a single place. If you
don''t have a parent_id, you identify a root as anything with lft=1.
You can
achieve multiple roots by using composite keys (by adding a set_id column,
for example).

Second, the numbering scheme need not be contiguous. If you use the full
range of floats given in a database you can insert thousands of children
without ever having to recalculate other rows in the database. This will
drastically increase performance on inserts/updates, and unless you are
working with huge sets, you may never need to recalculate parent boundaries.

For more information (if you haven''t already seen it), check out:

http://searchoracle.techtarget.com/tip/1,289483,sid13_gci537290,00.html

Way down the page, Celko talks about

"1) The Algebra Property: For each node (n), (rgt-lft+1)/2 = size of
subtree
rooted at (n). This requires that the union of the rgt and lft numbers be an
unbroken sequence, no gaps.

2) The Between-ness Property that you can find superiors and subordinates
with a BETWEEN predicate"
Further reading on how to perform much more complex calculations, check out:

http://www.dbazine.com/oracle/or-articles/tropashko4

Is there any reason why we would need to keep parent_id or unbroken
sequences?
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
http://rubyforge.org/pipermail/betternestedset-talk/attachments/20070209/fd9a012e/attachment.html

Krishna Dole

2007-Feb-10 00:15 UTC

head link

[Betternestedset-talk] Improving the nested_set model to remove redundancy

Hi Jeff,

Thanks for writing us about this, and welcome to the list.
> First, I think the parent_id field is completely unnecessary.  The point of
> the nested set model seems to be that you eliminate the need for the join,
> thereby creating a table with a single fact, in a single place.  If you
> don''t have a parent_id, you identify a root as anything with
lft=1.  You can
> achieve multiple roots by using composite keys (by adding a set_id column,
> for example).
You''re right-- strictly speaking the parent_id field is unnecessary,
and thus non-normalized. However, we have chosen to keep it thus far
for several reasons:

1) It provides a good backup to the lft/rgt indexes. Nested sets are
inherently both volatile and brittle. I feel good about our code''s
ability to maintain the left/right values, but if they somehow became
corrupted, repairing them without having parent_id would be a devilish
manual process at best and impossible at worst. We provide handy
methods for both checking and re-indexing using the parent_id.

2) It provides significant performance advantages in a number of
situations: the common task of fetching a node''s parent is _much_
faster using parent_id. Fetching immediate children would require a
''level'' column if the parent_id was not present, and
maintaining a
level column would be a bit of work.  (But nice to have-- it''s on the
to-do list)

3) It makes reversion to an adjacency-list model trivial (just nuke
the lft and rgt cols) if you decide nested sets aren''t for you. One of
my goals with the code is to make it easy for people using
acts_as_tree to try a nested set model.

4) The parent_id is required (and used to good effect) in the ez-set branch.

5) It doesn''t really cause any problems, performance or otherwise.
>
> Second, the numbering scheme need not be contiguous.  If you use the full
> range of floats given in a database you can insert thousands of children
> without ever having to recalculate other rows in the database.  This will
> drastically increase performance on inserts/updates, and unless you are
> working with huge sets, you may never need to recalculate parent
boundaries.
I looked into using floats, but I was turned off by the fact that
trees less than 60 levels deep (if I recall correctly) could overrun
the index. That said, if you would like to try implementing something
like this, please do-- I''m happy to provide what advice I can.

Keeping the index sequential has a few benefits: you can find out how
many descendants a node has without hitting the database, and it makes
finding ''leaf'' nodes easy.
>
> For more information (if you haven''t already seen it), check out:
>
> http://searchoracle.techtarget.com/tip/1,289483,sid13_gci537290,00.html
Thanks for linking to that. I''d seen most of it before (Joe has
recycled that material a lot of times), but some of the information at
the bottom was interesting.

Cheers,
Krishna

Jeff Dean

2007-Feb-10 00:49 UTC

head link

[Betternestedset-talk] Improving the nested_set model to remove redundancy

All good points - thanks for the clarification!

On 2/9/07, Krishna Dole <dontfall at gmail.com>
wrote:>
> Hi Jeff,
>
> Thanks for writing us about this, and welcome to the list.
>
> > First, I think the parent_id field is completely unnecessary.  The
point
> of
> > the nested set model seems to be that you eliminate the need for the
> join,
> > thereby creating a table with a single fact, in a single place.  If
you
> > don''t have a parent_id, you identify a root as anything with
lft=1.  You
> can
> > achieve multiple roots by using composite keys (by adding a set_id
> column,
> > for example).
>
> You''re right-- strictly speaking the parent_id field is
unnecessary,
> and thus non-normalized. However, we have chosen to keep it thus far
> for several reasons:
>
> 1) It provides a good backup to the lft/rgt indexes. Nested sets are
> inherently both volatile and brittle. I feel good about our code''s
> ability to maintain the left/right values, but if they somehow became
> corrupted, repairing them without having parent_id would be a devilish
> manual process at best and impossible at worst. We provide handy
> methods for both checking and re-indexing using the parent_id.
>
> 2) It provides significant performance advantages in a number of
> situations: the common task of fetching a node''s parent is _much_
> faster using parent_id. Fetching immediate children would require a
> ''level'' column if the parent_id was not present, and
maintaining a
> level column would be a bit of work.  (But nice to have-- it''s on
the
> to-do list)
>
> 3) It makes reversion to an adjacency-list model trivial (just nuke
> the lft and rgt cols) if you decide nested sets aren''t for you.
One of
> my goals with the code is to make it easy for people using
> acts_as_tree to try a nested set model.
>
> 4) The parent_id is required (and used to good effect) in the ez-set
> branch.
>
> 5) It doesn''t really cause any problems, performance or otherwise.
>
> >
> > Second, the numbering scheme need not be contiguous.  If you use the
> full
> > range of floats given in a database you can insert thousands of
children
> > without ever having to recalculate other rows in the database.  This
> will
> > drastically increase performance on inserts/updates, and unless you
are
> > working with huge sets, you may never need to recalculate parent
> boundaries.
>
> I looked into using floats, but I was turned off by the fact that
> trees less than 60 levels deep (if I recall correctly) could overrun
> the index. That said, if you would like to try implementing something
> like this, please do-- I''m happy to provide what advice I can.
>
> Keeping the index sequential has a few benefits: you can find out how
> many descendants a node has without hitting the database, and it makes
> finding ''leaf'' nodes easy.
>
> >
> > For more information (if you haven''t already seen it), check
out:
> >
> >
http://searchoracle.techtarget.com/tip/1,289483,sid13_gci537290,00.html
>
> Thanks for linking to that. I''d seen most of it before (Joe has
> recycled that material a lot of times), but some of the information at
> the bottom was interesting.
>
> Cheers,
> Krishna
> _______________________________________________
> Betternestedset-talk mailing list
> Betternestedset-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/betternestedset-talk
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
http://rubyforge.org/pipermail/betternestedset-talk/attachments/20070209/feac712d/attachment-0001.html

Betternestedset talk - Feb 2007 - Improving the nested_set model to remove redundancy

[Betternestedset-talk] Improving the nested_set model to remove redundancy

[Betternestedset-talk] Improving the nested_set model to remove redundancy

[Betternestedset-talk] Improving the nested_set model to remove redundancy