Fabrice Colin
2007-Oct-16 14:50 UTC
[Xapian-discuss] Matches estimate varies with sorting method
Hi all,

I found that the figure returned by MSet::get_matches_estimated() varies
depending on how results are to be sorted.

For instance, in my index, value 4 contains date and time in the format
"yyyymmddhhmmss". For the same query, the number of results will be
estimated to 20000+ when results are first sorted by date and time
with set_sort_by_value_then_relevance(4) and to only 100 if I use
set_sort_by_relevance(). The first figure is the correct one.

Note that the MSet is obtained with Enquire::get_mset(0, 100, 101), so that
probably explains where the 100 comes from.

The estimate will also be correct with set_sort_by_relevance_then_value(4).

If I am not mistaken, a similar problem was reported, and apparently fixed,
back in September:
http://comments.gmane.org/gmane.comp.search.xapian.general/5110

I am using 1.0.3.

Fabrice
Olly Betts
2007-Oct-17 01:07 UTC
[Xapian-discuss] Matches estimate varies with sorting method
On Tue, Oct 16, 2007 at 09:50:29PM +0800, Fabrice Colin wrote:
> I found that the figure returned by MSet::get_matches_estimated() varies
> depending on how results are to be sorted.

This in itself isn't a bug - it is after all an estimate!

> For instance, in my index, value 4 contains date and time in the format
> "yyyymmddhhmmss". For the same query, the number of results will be
> estimated to 20000+ when results are first sorted by date and time
> with set_sort_by_value_then_relevance(4) and to only 100 if I use
> set_sort_by_relevance(). The first figure is the correct one.

You're likely to get a more accurate estimate when sorting since the
matcher generally has to consider more documents when sorting.

> Note that the MSet is obtained with Enquire::get_mset(0, 100, 101), so that
> probably explains where the 100 comes from.

But this sounds wrong. If "checkatleast" is 101, get_matches_estimated()
should only be less if the estimate is exact.

What are the corresponding values of get_matches_min() and
get_matches_max() in the two cases?

Does this also happen with SVN HEAD? There have been some matcher-related
changes, but nothing specifically addressing that I'm aware of.

And can you supply a recipe to reproduce this easily?

> The estimate will also be correct with set_sort_by_relevance_then_value(4).
>
> If I am not mistaken, a similar problem was reported, and apparently fixed,
> back in September :
> http://comments.gmane.org/gmane.comp.search.xapian.general/5110
>
> I am using 1.0.3.

That fix would have made it into 1.0.3, so I don't think it can be the
exact same issue.

Cheers,
Olly
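
The rule Olly describes (an estimate below "checkatleast" is only plausible
if it is exact) can be written down as a small check. The following is a
minimal sketch, not Xapian code: the MatchBounds struct and the three helper
functions are hypothetical names introduced here for illustration, and the
sketch assumes that "exact" means the lower bound, the estimate and the upper
bound all coincide.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical holder for the three figures an MSet reports.
// Not a Xapian type; introduced for this sketch only.
struct MatchBounds {
    std::uint32_t lower;      // guaranteed minimum number of matches
    std::uint32_t estimated;  // the matcher's estimate
    std::uint32_t upper;      // guaranteed maximum number of matches
};

// The estimate must always lie between the two bounds.
inline bool bounds_ordered(const MatchBounds& b) {
    return b.lower <= b.estimated && b.estimated <= b.upper;
}

// Assumption: the figure is exact when all three values coincide.
inline bool is_exact(const MatchBounds& b) {
    return b.lower == b.estimated && b.estimated == b.upper;
}

// If the matcher was asked to check at least `checkatleast` documents,
// an estimate below that figure is only plausible when it is exact,
// i.e. the matcher ran out of matching documents before reaching it.
inline bool consistent_with_checkatleast(const MatchBounds& b,
                                         std::uint32_t checkatleast) {
    if (!bounds_ordered(b)) return false;
    if (b.estimated < checkatleast) return is_exact(b);
    return true;
}
```

With made-up figures: bounds of {100, 100, 100} pass the check for a
checkatleast of 101, whereas {100, 100, 25000} do not, which is the
suspicious pattern an estimate of 100 would suggest if the bounds are not
equally tight. To apply this to a real MSet, the three fields could be
filled from MSet::get_matches_lower_bound(), MSet::get_matches_estimated()
and MSet::get_matches_upper_bound(), which are the names the bounds go by in
the Xapian C++ API.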