I've got a bunch of indexed documents (newspaper articles).
Each document has 0 or more authors.
I want to show search results grouped by author.
(It's a somewhat similar situation to the one Torsten Bronger posted a
couple of weeks ago.)
Here are the solutions I can think of so far:
1) pick a single author for each article, store that author in a value
slot, then use set_collapse_key() to do the grouping.
cons: doesn't handle articles with more than one credited author very well.
2) slurp the top N results out into the calling code (I'm using PHP in
this case) and do the grouping there. This needs some metric to rank
authors: either take each author's most relevant document (as
set_collapse_key() does), or maybe sum the relevance scores of all their
matching documents, since multiple matching documents probably means an
author is more relevant.
cons: doesn't scale up well to large result sets.
3) maintain a separate Xapian database which has a single uberdocument
for each author (built by concatenating all their articles).
I've got nearly 2 million documents, but only about 20000 authors. Maybe
a second database would be quite small...
cons: _another_ database to maintain and contend for RAM
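For what it's worth, the grouping step in option 2 could be sketched
roughly like this. It's in Python rather than PHP just to keep it short,
and it doesn't touch Xapian at all: the (docid, score, authors) tuples
and the group_by_author() name are made up for illustration, standing in
for whatever gets pulled out of the top N results. It supports both
ranking ideas (best document or summed scores).

```python
from collections import defaultdict


def group_by_author(results, mode="sum"):
    """Group search hits by author and rank the authors.

    results: iterable of (docid, score, authors) tuples. This shape is an
             assumption for the sketch, not anything Xapian returns directly.
    mode:    "sum" ranks an author by the total score of their hits,
             "max" ranks by their single best hit.
    Returns a list of (author, rank_score, docids), best author first.
    """
    if mode not in ("sum", "max"):
        raise ValueError("mode must be 'sum' or 'max'")

    hits_by_author = defaultdict(list)
    for docid, score, authors in results:
        # An article with several authors is counted once for each of them;
        # an article with no authors simply drops out of the grouped view.
        for author in authors:
            hits_by_author[author].append((score, docid))

    ranked = []
    for author, hits in hits_by_author.items():
        hits.sort(key=lambda hit: -hit[0])  # best hit first within an author
        scores = [score for score, _ in hits]
        rank_score = sum(scores) if mode == "sum" else max(scores)
        ranked.append((author, rank_score, [docid for _, docid in hits]))

    # Highest-ranked author first; ties broken alphabetically by name.
    ranked.sort(key=lambda entry: (-entry[1], entry[0]))
    return ranked


# Hypothetical top-N results with percentage-style integer scores.
results = [
    (1, 90, ["Alice"]),
    (2, 50, ["Bob", "Alice"]),
    (3, 80, ["Bob"]),
    (4, 70, []),
]

for author, rank_score, docids in group_by_author(results):
    print(author, rank_score, docids)
```

The cost is linear in N, so the scaling worry is really about how big N
has to be before the author ranking stops changing.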
Any other suggestions or advice?
At the moment, I'm leaning toward option 2, although I might do a quick
test of option 3 and see if the extra database is small enough to be
manageable...
Thanks,
Ben.