Sungsoo Kim
2006-Feb-28 19:11 UTC
[Xapian-discuss] I need a function set_sort_by_relevance_then_value()
I am a newbie with Xapian. I have been studying Xapian for about a week. I am building a commercial database which consists of over 1M images. Each image has its own keywords, as shown below:

1. keywords: China oriental Asia young lady{2} woman{2} people shopping{2} bag{2} pack buy purchase walking street sunshine hold bags satisfied
   provider: A

2. keywords: Bottles{2} wine{2} French Emilion travel shop{2} shopping
   provider: A

3. keywords: canada tourism tourist travel souvenir shop gift japaneses japan ethnic asian oriental woman female shopping{2} sandals{2}
   provider: E

4. keywords: senior seniors couple observer shopping{2} two woman man smile smiling city outdoors outside bottle bottles grandmother grandfather shopping graying reflection arm blond fair
   provider: D

... and so on.

The numbers in braces above are term counts, applied even if the term appears only once in the keywords.

What I actually want is this: if I search for "shopping", the search result should be "1 4 3 2", ordered by relevance and then by provider. Documents 1, 4 and 3 have "shopping{2}", but provider A precedes providers D and E. Document 2 should come last because it has "shopping" without a term count.

I understand Xapian supports sort_by_relevance() then docid (ascending or descending), or sort_by_value_then_relevance(). But I cannot find a sort_by_relevance_then_value() function in the documentation. To get the search result shown above, that function seems necessary to me.

Provider A, B, C, D, E is not the name of the provider but the grade of the provider, and it changes from time to time.

Without set_sort_by_relevance_then_value(), how can I get the same result?

Thanks in advance!

Sungsoo Kim
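The ordering asked for above can be sketched in plain Python, independent of Xapian. This is only an illustration of the desired result, not Xapian code: the `Image` type and `relevance_then_provider` helper are hypothetical names, and the term count of "shopping" is used as a stand-in for relevance, with document 4 counted as 2 as the post describes.

```python
from typing import NamedTuple


class Image(NamedTuple):
    docid: int
    shopping_count: int  # term count of "shopping", as described in the post
    provider: str        # provider grade, "A" is best


# The four example images from the post (stand-in data, not a real index).
IMAGES = [
    Image(1, 2, "A"),
    Image(2, 1, "A"),
    Image(3, 2, "E"),
    Image(4, 2, "D"),
]


def relevance_then_provider(images):
    """Order by term count (descending), then break ties by provider grade."""
    ranked = sorted(images, key=lambda img: (-img.shopping_count, img.provider))
    return [img.docid for img in ranked]


print(relevance_then_provider(IMAGES))  # [1, 4, 3, 2]
```

The provider grade only matters among documents whose relevance is exactly equal, which is why the tie-breaking behaviour depends so heavily on the weighting scheme.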
Olly Betts
2006-Feb-28 20:37 UTC
[Xapian-discuss] I need a function set_sort_by_relevance_then_value()
On Wed, Mar 01, 2006 at 04:04:14AM +0900, Sungsoo Kim wrote:
> Without set_sort_by_relevance_then_value() how can I get the same result?

I don't think you can.

But note that for your plan to work you'd need to have all documents the same length, or use a weighting scheme which ignores document length. That's why sort_by_relevance_then_value isn't implemented - it's rare for two documents to score exactly the same relevance with BM25.

Cheers,
Olly
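A rough sketch of why exact ties are rare: the snippet below uses the textbook BM25 term-frequency factor with length normalisation, which is an assumption made for illustration and not Xapian's exact implementation. The function name `bm25_tf` and the normalised lengths 0.8 and 1.3 are made up.

```python
def bm25_tf(wdf, normlen, k1=1.0, b=0.5):
    """Textbook BM25 term-frequency factor.

    wdf     -- how many times the term occurs in the document
    normlen -- document length divided by the average document length
    b       -- strength of length normalisation (0 disables it)
    """
    return (k1 + 1) * wdf / (wdf + k1 * ((1 - b) + b * normlen))


# Two documents with the same term count but different lengths.
short_doc = bm25_tf(wdf=2, normlen=0.8)
long_doc = bm25_tf(wdf=2, normlen=1.3)
print(short_doc == long_doc)  # False: length normalisation breaks the tie

# With b = 0 the length no longer enters the score, so the tie is exact.
print(bm25_tf(2, 0.8, b=0) == bm25_tf(2, 1.3, b=0))  # True
```

A secondary sort key is only consulted when the primary scores are equal, so it has an effect only in the second case.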
Sungsoo Kim
2006-Mar-06 16:37 UTC
[Xapian-discuss] Re: I need a function set_sort_by_relevance_then_value()
Hello, Olly!

>> I can set weighting scheme to ignore document length by setting a parameter
>> such as BM25Weight(0,0,0,0,0).
>
> Yes, that'll work, though you only need to set k2 and b to 0 to ignore
> document length (and then the last parameter becomes irrelevant so might
> as well be zero too). So BM25Weight(1,0,1,0,0) is fine too, and takes
> into account how many times each term occurs in the query and the
> document, so will probably give better ranking of results.

Yes, you are right! The search results of BM25Weight(0,0,0,0,0) look similar to the BoolWeight() scheme because it ignores "within document frequency". I followed your suggestion and it works very well. Thanks!

>> I hope you add the function set_sort_by_relevance_then_value() in the
>> future version of xapian if you think there will be any possibility that the
>> function can be used by other people.
>
> If you can supply a suitable patch I can apply it. It really needs a
> matching feature test so we can be confident that it works and will
> continue to work. The example in your original message turned into code
> would be fine I think. Take a look at the existing test "sortrel1" (in
> tests/api_db.cc) if you want a model to follow.

Yes, I will try, but it is not as simple as I expected. Maybe I need to add a boolean variable, and if a new function is added, the xapian-bindings modules should also be modified. So I wonder whether it would be better to merge all of the set_sort_by_*() functions into one function, just like the deprecated set_sorting().

For better Xapian!

Sungsoo Kim
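The difference between the two parameter sets discussed above can be illustrated with a small sketch. It assumes the textbook BM25 term-frequency factor with length normalisation already switched off (b = 0), which reduces to (k1 + 1) * wdf / (wdf + k1); this is not Xapian's exact code, and `tf_factor` is a hypothetical helper name.

```python
def tf_factor(wdf, k1):
    """Textbook BM25 term-frequency factor with b = 0 (length ignored)."""
    return (k1 + 1) * wdf / (wdf + k1)


# k1 = 0, as in BM25Weight(0,0,0,0,0): every matching document gets the
# same factor, so the term count no longer influences the ranking.
print(tf_factor(1, k1=0) == tf_factor(2, k1=0))  # True

# k1 = 1, as in BM25Weight(1,0,1,0,0): a higher term count scores higher,
# while documents with equal counts still tie exactly.
print(tf_factor(2, k1=1) > tf_factor(1, k1=1))   # True
print(tf_factor(2, k1=1) == tf_factor(2, k1=1))  # True
```

Under these assumptions, k1 = 0 behaves like a boolean match, while k1 = 1 keeps the term count in play without reintroducing document length.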
Richard Boulton
2007-May-04 14:55 UTC
[Xapian-discuss] I need a function set_sort_by_relevance_then_value()
Sungsoo Kim wrote:
> I understand Xapian supports sort_by_relevance() then docid (ascending or
> descending), or sort_by_value_then_relevance(). But I cannot find
> sort_by_relevance_then_value() function in the document. In order to get
> the search result shown above, the function seems to be necessary to me.
> Provider A, B, C, D, E is not the name of provider, but the grade of
> providers, and it is changed from time to time.

I suspect you're using an old version of Xapian: Enquire::set_sort_by_relevance_then_value was implemented on Apr 4th 2006, and is included in Xapian releases from version 0.9.5 onwards.

--
Richard