On Thu, May 14, 2009 at 01:48:25PM +0200, Daniel Etzold
wrote:> I have a query like "q1 OR q2 OR q3" and the result set should be
ranked
> by relevance. For that case I'd like to know whether the matcher does
> have to read the complete posting lists for q1, q2 and q3 or whether the
> matcher supports some kind of early termination so that only a small
> fraction of each list of a query term has to be read.
Yes, the matcher can terminate early, and it also handles the ORs
"decaying" to AND_MAYBE or AND, which can then skip over chunks of
postings for documents which can't match.
There's an overview here:
http://xapian.org/docs/matcherdesign.html
The gory details are in the matcher subdirectory of the source tree.
> Furthermore, does the matcher first build a M-set for "q1 OR q2"
and
> than incorporates the results of q3 or does the matcher work on the
> posting lists of each query term in parallel?
In parallel - that's much more memory efficient (since q1 OR q2 could
match every document in the database...)
Cheers,
Olly