Hightman(马明练)
2007-Sep-20 12:57 UTC
[Xapian-discuss] Incorrect get_matches_estimated() of Xapian::Mset
Hello,

As far as I know, get_matches_estimated() returns an estimate of the number of documents that match the query. However, I have found a large disparity between the returned value and the real number of matches. For example, the real number of matches is 58, but the returned value is 458. When users click through to the later pages they get a blank page, so they often complain to me.

I think the main cause is a query that includes a boolean term matching a large share of the documents. For example, there are only two data types across all documents: every document belongs to data-type I or data-type II, and there are far more data-type I documents than data-type II documents. As a test, I queried by some keywords only, and get_matches_estimated() returned 500. When I added the boolean condition, it returned 400 for data-type I and 100 for data-type II, but the real numbers are just the reverse.

So my conclusion is that Xapian computes the estimate from the percentage of all documents that contain the FILTER term. :(

How can I fix this error?
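The arithmetic behind the hypothesis above can be sketched in a few lines. This only illustrates the poster's guess (scaling the unfiltered count by the filter term's share of the collection); it is not taken from Xapian's matcher code. The function name `scaled_estimate` and the collection sizes (8000 and 2000 documents) are made-up values chosen to reproduce the 400/100 split reported in the message.

```python
def scaled_estimate(unfiltered_matches, filter_termfreq, doc_count):
    """Scale an unfiltered match count by the filter term's share of the collection.

    This assumes the filter term and the keywords are statistically
    independent, which is exactly what fails in the case described above.
    """
    return round(unfiltered_matches * filter_termfreq / doc_count)


# Hypothetical collection: 8000 data-type I documents, 2000 data-type II.
DOC_COUNT = 10_000
print(scaled_estimate(500, 8000, DOC_COUNT))  # 400
print(scaled_estimate(500, 2000, DOC_COUNT))  # 100
```

If the keywords happen to occur mostly in the smaller data type, a proportional guess like this would land on the wrong side, which matches the "just the reverse" observation.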
Olly Betts
2007-Sep-20 14:27 UTC
[Xapian-discuss] Incorrect get_matches_estimated() of Xapian::Mset
On Thu, Sep 20, 2007 at 07:56:52PM +0800, Hightman(马明练) wrote:
> So I get an conclusion, XAPIAN count the estimate number by the
> percentage of the FILTER term in all documents .... :( How can I
> fixed this error??

It's not really an error - get_matches_estimated() returns an *estimate*, so it's allowed to be wrong. By default Xapian will favour retrieval speed over getting the estimate more correct.

> But now, I found it get a disparity between the return value and real
> mathced number. For an example: the real matched number is 58, but the
> return value is 458; so when the users click the hinder page, get a
> blank page ... so they often complain to me.

If you want to ensure the estimate is correct below a certain value (e.g. to allow reliable generation of paging buttons in your UI), set the check_at_least parameter to Enquire::get_mset(). See the documentation for the details, or look at the code to Omega to see how it implements this.

Cheers,
    Olly
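For readers who want to see the call shape, here is a minimal sketch using the method names from Xapian's Python bindings, where the third positional argument to get_mset() is the check_at_least value. The helper name `fetch_page` and its parameters are hypothetical, and `enquire` is assumed to be an already configured Enquire object with its query set.

```python
def fetch_page(enquire, page, hits_per_page, check_at_least):
    """Fetch one page of results and the match-count estimate.

    `enquire` is assumed to behave like a Xapian Enquire object:
    get_mset(first, maxitems, checkatleast) returns an MSet, and the
    MSet offers get_matches_estimated().
    """
    first = page * hits_per_page  # zero-based rank of the first hit on this page
    # Ask the matcher to examine at least `check_at_least` documents,
    # even though only `hits_per_page` results are returned.
    mset = enquire.get_mset(first, hits_per_page, check_at_least)
    return mset, mset.get_matches_estimated()
```

Passing the same check_at_least value on every page keeps the estimate consistent as the user pages through the results.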
Hightman(马明练)
2007-Sep-20 15:18 UTC
[Xapian-discuss] Incorrect get_matches_estimated() of Xapian::Mset
Olly Betts, Hello

On 2007-09-20 14:27:00, you wrote:

>On Thu, Sep 20, 2007 at 07:56:52PM +0800, Hightman(马明练) wrote:
>
>If you want to ensure the estimate is correct below a certain value
>(e.g. to allow reliably generation of paging buttons in your UI), set
>the check_at_least parameter to Enquire::get_mset(). See the

Although I used the third argument to set the check_at_least number, the returned value is still much larger than the exact number (about 5 times).

But when the exact number is greater than the estimated number, the new return value is more accurate.

Doesn't the third argument correct estimates that are greater than the exact number?

>documentation for the details, or look at the code to Omega to see how
>it implements this.
>
>Cheers,
> Olly

Hightman
hightman@zuaa.zju.edu.cn
2007-09-20
Hightman(马明练)
2007-Sep-20 16:49 UTC
[Xapian-discuss] Incorrect get_matches_estimated() of Xapian::Mset
Olly Betts,

Thanks for your help. I have now resolved this problem to some degree.

On 2007-09-20 15:36:00, you wrote:

>On Thu, Sep 20, 2007 at 10:18:07PM +0800, Hightman(马明练) wrote:
>> Though I used the third argument to set the check_at_least number, It still return
>> much more number than the exact number(about 5 times);
>>
>> But if the exact number is greater than estimated number, the new return value is more correct.
>>
>> Dosen't the third argument fixed the number greater than exact number?
>
>I don't really understand that question.
>
>What the checkatleast parameter specifies is the minimum number of
>documents which the matcher will look at. By default we try to
>minimise this number, while still returning correct results, as
>that makes searches faster.
>
>If there are fewer matches than this, then get_matches_estimated(),
>get_matches_lower_bound() and get_matches_upper_bound() will all
>return the same answer, which will be the exact number of matches.
>
>So if you want to show 10 page buttons and have 10 hits per page,
>pass 101 as checkatleast (the extra 1 allows you to tell the
>difference between "exactly 100 hits" and "more than 100 hits").
>
>If there are more, then get_matches_estimated() won't necessarily
>be exact, though because the matcher may have looked at more documents,
>it may be a better estimate. You can look at get_matches_lower_bound()
>and get_matches_upper_bound() to see how wrong it could be.
>
>Note that most search engines estimate the number of matches for
>reasons of performance (e.g. Google usually says "Results 1 - 10 of
>about 781,000").
>
>Cheers,
> Olly
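The paging rule quoted above can be turned into a small helper. This is a sketch under stated assumptions, not Omega's implementation: the function names `check_at_least_for` and `pager_state` are hypothetical, and the three count arguments are assumed to come from get_matches_lower_bound(), get_matches_estimated() and get_matches_upper_bound() on an MSet that was fetched with the check_at_least value computed here.

```python
def check_at_least_for(hits_per_page, max_pages):
    """Documents the matcher should examine so the pager can be trusted.

    The extra 1 distinguishes "exactly N hits" from "more than N hits".
    """
    return hits_per_page * max_pages + 1


def pager_state(lower, estimated, upper, hits_per_page=10, max_pages=10):
    """Return (page_buttons, has_more) for a result pager.

    Assumes the search used check_at_least_for(hits_per_page, max_pages).
    Per the explanation above, when there are fewer matches than that,
    all three counts agree and are exact.
    """
    limit = hits_per_page * max_pages
    exact = lower == estimated == upper
    if exact and estimated <= limit:
        # Exact count within the pager's range: ceiling division gives pages.
        return -(-estimated // hits_per_page), False
    # Otherwise there are more hits than the pager can show.
    return max_pages, True


print(check_at_least_for(10, 10))   # 101
print(pager_state(58, 58, 58))      # (6, False)
print(pager_state(101, 458, 900))   # (10, True)
```

With 58 real matches and check_at_least set to 101, the pager shows 6 buttons and never links to a blank page, which is the situation from the first message in this thread.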