I am currently using revision 9300 of Xapian core. I
heard rumors of version 1.0.3 having optimized
features to count occurrences of values across a match
set.
My particular problem is to find out, among, say, 10,000
matching documents, the *exact* number of values
containing the string "Location:Utah".
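To make the requirement concrete, here is a minimal sketch in
plain Python of the count I am after. It does not use the
Xapian API at all: the list-of-lists document layout, the
sample values, and the count_values() helper are made up
purely for illustration.

```python
from collections import Counter


def count_values(matching_docs, needle):
    """Tally values containing `needle` across every matching document.

    `matching_docs` is a hypothetical stand-in for the match set: an
    iterable in which each document is a list of value strings.
    Returns (number of documents with at least one hit, Counter of hits).
    """
    docs_with_needle = 0
    per_value = Counter()
    for values in matching_docs:
        # Substring test, mirroring "values containing the string".
        hits = [v for v in values if needle in v]
        if hits:
            docs_with_needle += 1
        per_value.update(hits)
    return docs_with_needle, per_value


# Made-up sample data standing in for the matching documents.
docs = [
    ["Location:Utah", "Type:Hotel"],
    ["Location:Nevada"],
    ["Location:Utah", "Location:Utah County"],
]

n_docs, per_value = count_values(docs, "Location:Utah")
print(n_docs, dict(per_value))
```

The point is that every matching document has to be visited
once for the count to be exact, which is what I am hoping the
engine can do cheaply during the match itself.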
I tried passing my custom spy object as the 6th
argument of get_mset(). However, it appears to take
just as long as when I pass it as the 5th argument
(MatchDecider). I *am* setting checkatleast to 1
million, to make sure I get the exact counts.
My impression was that the new MatchSpy would look at
all matching documents to get the exact counts, and
that checkatleast wouldn't even be needed, except as a
hard cut-off.
Am I missing something?
As a side note, this has been done for a while in
commercial engines like Autonomy or Endeca, with
surprising efficiency. We are currently doing this
with MySQL, but the requirement is demanding and the
MySQL approach does not scale well, even though I spent
weeks optimizing it. My hope is that a "real" search
engine will do this better.