On Wed, Oct 14, 2009 at 08:20:23AM +0200, Torsten Bronger wrote:
> I use Xapian to index a lot of PDF files. My current approach is to
> have every PDF page as a Xapian document, so that I can report the
> page number to the user via document.get_value(0). This works well.
You shouldn't really use a document value for this - values are intended
to be used during the match process itself (for sorting, collapsing, value
ranges, MatchDecider, etc), and are stored to make that work well. If you
want something for showing results, the document data is a better option.
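
As a rough sketch of the document data approach (not code from this thread): one option is to serialise the display-only fields into a small record and pass the resulting string to Document.set_data(), then decode whatever get_data() returns when showing a hit. The helper names and the JSON layout below are assumptions made up for illustration, and the xapian calls are only mentioned in comments so the snippet runs without the bindings installed.

```python
import json


def make_doc_data(pdf_path, page):
    """Serialise the fields needed only for display.

    The returned string is what you would hand to
    xapian.Document.set_data() at indexing time.
    """
    return json.dumps({"file": pdf_path, "page": page})


def read_doc_data(data):
    """Inverse of make_doc_data().

    'data' is whatever Document.get_data() gives back; depending on the
    bindings this may be bytes rather than str, so decode defensively.
    """
    if isinstance(data, bytes):
        data = data.decode("utf-8")
    record = json.loads(data)
    return record["file"], record["page"]
```

The file name would still need to go into a value slot as well if you want to collapse on it, since collapsing works on values, not on the document data.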
> However, it's not so nice that the pages of a certain PDF file are
> spread over the whole hits list. I could tell Xapian to report very
> many (maybe even all) hits to me so I could group them by PDF file
> in the main program. But possibly someone here has a more elegant
> solution?
See Enquire::set_collapse_key():
http://xapian.org/docs/apidoc/html/classXapian_1_1Enquire.html#f32055d3a4da31da994d97171f45d699
1.0.x only allows you to leave a single entry with each key, but in 1.1.x
you can collapse to leave up to a specified number for each key.
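
To make the effect concrete, here is a plain-Python imitation of collapsing. It is not Xapian's implementation, just a sketch of the behaviour described above: hits arrive in rank order and at most a given number are kept per key. In real code you would store the PDF file name in a value slot with Document.add_value() and call set_collapse_key() on that slot; the collapse() function, its collapse_max parameter and the sample hits are invented for illustration.

```python
from collections import defaultdict


def collapse(hits, collapse_max=1):
    """Keep at most collapse_max hits per key, preserving rank order.

    hits: iterable of (key, item) pairs, best match first.
    """
    kept = []
    seen = defaultdict(int)  # how many hits we have met so far per key
    for key, item in hits:
        if seen[key] < collapse_max:
            kept.append((key, item))
        seen[key] += 1
    return kept


# (file name, page number) pairs in rank order; hypothetical data.
hits = [("a.pdf", 3), ("b.pdf", 1), ("a.pdf", 7), ("a.pdf", 2), ("b.pdf", 9)]

one_per_file = collapse(hits)                   # the 1.0.x behaviour
two_per_file = collapse(hits, collapse_max=2)   # possible from 1.1.x
```

With one entry per key you get a single best page for each PDF; allowing a few per key lets you show the top pages of each file without fetching the whole hit list.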
Cheers,
Olly