On Fri, Jan 04, 2008 at 02:21:56PM -0500, Kapil Thangavelu wrote:
> i'd like to index a set of documents arranged in a hierarchy, and perform
> queries to retrieve subsets of documents based on their position within a
> hierarchy...
>
> ie
>
> /australia
>     /mammals
>         /marsupials
>         /dingos
>     /reptiles
>         /snakes
>
> so i'd like to search against a given sub hierarchy.
>
> in systems like lucene, i'd index the full document path as a field, and use
> a prefix query when searching against a subset of the hierarchy.
You can do the work either at index time or at query time. Index time
is preferred. (Query time works the way Lucene does, but you have to do
a little more work.)
Index time
----------
Generate a term for each level of the hierarchy. You'll want to give
them a prefix, assuming you're following the standard Xapian term
prefix convention. Say you choose the prefix XH (H for hierarchy; X
marks any 'user' prefix), then you might generate:
  XHaustralia
  XHaustralia/mammals
  XHaustralia/mammals/marsupials
for a single document. And perhaps:
  XHaustralia
  XHaustralia/reptiles
  XHaustralia/reptiles/snakes
for another. Then at search time you search for
'XHaustralia/reptiles', or whichever level you actually want. (You can
use QueryParser::add_boolean_prefix("topic", "XH") so that users can
search on topic:australia or topic:australia/reptiles.)
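
As a rough illustration of the index-time approach, here is a minimal
sketch that turns a document path into one term per level. It uses only
the standard library; the function name hierarchy_terms is hypothetical,
and XH is simply the prefix chosen above.

```cpp
#include <string>
#include <vector>

// Build one term per ancestor level of a '/'-separated path.
// hierarchy_terms is a hypothetical helper, not part of Xapian;
// "XH" is the user prefix picked in the text above.
std::vector<std::string> hierarchy_terms(const std::string& path,
                                         const std::string& prefix = "XH") {
    std::vector<std::string> terms;
    std::string current;  // path accumulated so far, without leading '/'
    std::string::size_type start = 0;
    while (start <= path.size()) {
        std::string::size_type end = path.find('/', start);
        if (end == std::string::npos) end = path.size();
        if (end > start) {  // skip empty components (leading or doubled '/')
            if (!current.empty()) current += '/';
            current.append(path, start, end - start);
            terms.push_back(prefix + current);
        }
        start = end + 1;
    }
    return terms;
}
```

For "/australia/mammals/marsupials" this yields the three terms listed
above. Each one would then be added to the document, for instance with
Xapian::Document::add_term().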
Query time
----------
Generate a single term for the document's position in the hierarchy:
  XHaustralia/mammals/marsupials/
Then at search time, you want to OP_FILTER your main query with a query
built from every term that has the wanted prefix, something like:
  Query q(Query::OP_OR,
          db.allterms_begin("XHaustralia/"),
          db.allterms_end("XHaustralia/"));
(The trailing slashes prevent it from matching XHaustralian, if your
hierarchy contains that separately for some reason; there are
obviously other examples which would actually trip you up.)
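
To see why the trailing slash matters, here is a small sketch of a
prefix scan over a sorted term list. It is not Xapian code: a std::set
stands in for the database's term list, and terms_with_prefix is a
hypothetical helper that plays the role of the allterms_begin() /
allterms_end() pair.

```cpp
#include <set>
#include <string>
#include <vector>

// Collect every term in a sorted term list that starts with `prefix`.
// The std::set is only a stand-in for the database's sorted term list.
std::vector<std::string> terms_with_prefix(const std::set<std::string>& all_terms,
                                           const std::string& prefix) {
    std::vector<std::string> matches;
    // Terms sharing a prefix are contiguous in sorted order, so start at
    // the first term >= prefix and stop at the first one that lacks it.
    for (auto it = all_terms.lower_bound(prefix);
         it != all_terms.end() && it->compare(0, prefix.size(), prefix) == 0;
         ++it) {
        matches.push_back(*it);
    }
    return matches;
}
```

With the terms XHaustralia/mammals/marsupials/, XHaustralia/reptiles/snakes/
and XHaustralian/ in the set, the prefix "XHaustralia/" matches only the
first two, while "XHaustralia" without the slash also picks up
XHaustralian/.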
J
--
/--------------------------------------------------------------------------\
James Aylett xapian.org
james@tartarus.org uncertaintydivision.org