Matthew Somerville
2008-Feb-29 09:58 UTC
[Xapian-discuss] How many docs to feed to an RSet?
Hi,

I'm just trying out get_eset() to fetch terms to go under a "possibly relevant terms" or similar heading on my search results pages. When compiling the RSet to feed to get_eset(), how many documents should I add? As I've just fetched results for a search, do I feed in all the result documents (a default of 20 on a page), or less?

The code calls get_mset(0,500) in order to have exact "number of results" for less than 500 results, so could conceivably feed up to 500 in, though I'm guessing that's not that helpful/fast.

ATB,
Matthew
Matthew Somerville wrote:
> Hi,
>
> I'm just trying out get_eset() to fetch terms to go under a "possibly
> relevant terms" or similar heading on my search results pages. When
> compiling the RSet to feed to get_eset(), how many documents should I add?
> As I've just fetched results for a search, do I feed in all the result
> documents (a default of 20 on a page), or less?

The best answer is to play around, experiment, and see what seems to work for you. The dataset, and the types of queries you're doing, will both have a big effect. I've found that a value of 10 works well with some datasets - but you may find that it gives terrible results.

What you actually want is to feed the RSet only documents which are really relevant. One approach for doing this is to ask the user; but this often isn't possible. Another approach is to try and make use of log information for previous searchers in some way (but Xapian provides no support for this, of course).

> The code calls
> get_mset(0,500) in order to have exact "number of results" for less than 500
> results, so could conceivably feed up to 500 in, though I'm guessing that's
> not that helpful/fast.

Probably not, indeed. If you supply too many documents, you're likely to get lots of irrelevant terms being thrown up in this situation.

Incidentally, if you're just passing 500 to get an accurate result count, you might want to try using the "checkatleast" parameter for that, instead, e.g.: get_mset(0,20,500).

--
Richard
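For anyone reading this thread later, here is a rough sketch of how the two suggestions might fit together using the Xapian Python bindings: fetch one page with a "checkatleast" of 500, then build the RSet from only the top few matches. The helper names (top_docids, search_with_suggestions) and the default values are made up for illustration, the enquire object is assumed to already have its query set, and 10 is just the starting value mentioned above, not a recommendation.

```python
def top_docids(mset, limit=10):
    """Collect docids of the first `limit` matches from an MSet-like iterable."""
    docids = []
    for match in mset:
        if len(docids) >= limit:
            break
        docids.append(match.docid)
    return docids


def search_with_suggestions(enquire, page_size=20, check_at_least=500,
                            rset_size=10, max_terms=10):
    """Hypothetical helper: return (mset, suggested_terms) for `enquire`.

    Assumes enquire.set_query(...) has already been called.
    """
    # Imported here so top_docids() can be tried without Xapian installed.
    import xapian

    # Fetch a single page of results, but have Xapian check at least
    # `check_at_least` matches so the count is exact below that threshold.
    mset = enquire.get_mset(0, page_size, check_at_least)

    # Assumption: treat only the top few matches as "relevant".
    rset = xapian.RSet()
    for docid in top_docids(mset, rset_size):
        rset.add_document(docid)

    # Ask for expansion terms based on that small RSet.
    eset = enquire.get_eset(max_terms, rset)
    terms = []
    for item in eset:
        term = item.term
        # The Python 3 bindings return terms as bytes.
        terms.append(term.decode("utf-8") if isinstance(term, bytes) else term)
    return mset, terms
```

The result count for the page can then be read from mset.get_matches_estimated(). It is worth trying a few values of rset_size against your own data, since the right number depends on the dataset and the queries.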