For unsupervised approaches like BM25 this works well, but letor does not need special weighting for the title in this form, since it assigns weights to title features separately.

But I see your concern: it would be a problem when BM25 is used on the index with this setup. Hence it's preferable for xapian-letor to take note of this uplift in title weight and normalize it wherever the statistics are calculated.

Cheers,
Parth.

On Thu, Mar 20, 2014 at 2:35 AM, Olly Betts <olly at survex.com> wrote:
> On Mon, Mar 17, 2014 at 09:07:29PM +0100, Parth Gupta wrote:
> > Wouldn't setting the weight of terms in title back to normal (e.g. 5 to 1)
> > by below line, automatically adjust the wdfs and field lengths?
> >
> > indexer.index_text(title, 5, "S"); -> indexer.index_text(title, 1, "S");
> >
> > if it does not then we should include that part in the patch too. I like to
> > create a patch for xapian-letor for resolving common code of xapian.
>
> I'm not sure I follow.
>
> The reason we use 5 here is that matching terms in the title are usually
> a good indicator of a page that should be ranked highly for a search
> (note omindex is not usually working in a domain where evil SEOs are
> trying to distort the rankings).
>
> If we simply change 5 to 1 here, then the title won't be given any extra
> consideration, which would be a regression in this area.
>
> Cheers,
> Olly
On 22 Mar 2014, at 08:22, Parth Gupta <pargup8 at gmail.com> wrote:

> For unsupervised approaches like BM25 this approach works well but letor does not need special weighting for title in this form as it itself assigns weights to title features separately.
>
> But I see your concern it would be a problem when BM25 is used on the index with this setup. Hence it's preferable to take a note of this uplift in title weight for xapian-letor and normalize it everywhere calculating the statistics.

This would need configuring, though, wouldn't it? Not everyone (and I'm thinking of people who don't index using omindex here) applies a wdf of 5 while indexing titles; they may apply a different non-1 number, or just leave it at 1 (and possibly apply weighting at search time).

J

--
James Aylett, occasional trouble-maker
xapian.org
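
To make the configuration point concrete, here is a minimal sketch of what "normalizing the uplift" could look like if the wdf increment per prefix were supplied by the user rather than assumed to be 5. It is not xapian-letor code and uses no Xapian calls; `PrefixWeights`, `wdf_inc_for` and `normalized_wdf` are hypothetical names invented for illustration, and the sketch assumes the wdf increment was applied uniformly to every occurrence of a term under that prefix.

```cpp
#include <map>
#include <string>

// Hypothetical configuration: the wdf increment that was used at index time
// for each term prefix. Prefixes not listed are assumed to have used 1.
using PrefixWeights = std::map<std::string, unsigned>;

// Look up the configured wdf increment for a prefix, falling back to 1
// when the prefix is not configured (or is configured as 0).
unsigned wdf_inc_for(const PrefixWeights& weights, const std::string& prefix) {
    auto it = weights.find(prefix);
    if (it == weights.end() || it->second == 0) return 1;
    return it->second;
}

// Undo the index-time uplift so the value reflects actual occurrences,
// e.g. a raw wdf of 10 under prefix "S" indexed with increment 5 becomes 2.
double normalized_wdf(const PrefixWeights& weights,
                      const std::string& prefix,
                      unsigned raw_wdf) {
    return static_cast<double>(raw_wdf) / wdf_inc_for(weights, prefix);
}
```

With `PrefixWeights{{"S", 5}}` this matches the omindex line quoted earlier in the thread; someone who indexed titles with a different increment, or with 1, would supply their own map, and an empty map leaves every wdf unchanged.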
Yes James, is there any automatic way to know what weight was used for titles, or more generally for terms with a given prefix?

On Sat, Mar 22, 2014 at 1:35 PM, James Aylett <james-xapian at tartarus.org> wrote:
> On 22 Mar 2014, at 08:22, Parth Gupta <pargup8 at gmail.com> wrote:
>
> > For unsupervised approaches like BM25 this approach works well but letor
> > does not need special weighting for title in this form as it itself assigns
> > weights to title features separately.
> >
> > But I see your concern it would be a problem when BM25 is used on the
> > index with this setup. Hence it's preferable to take a note of this uplift
> > in title weight for xapian-letor and normalize it everywhere calculating
> > the statistics.
>
> This would need configuring, though, wouldn't it? Not everyone (and I'm
> thinking of people who don't index using omindex here) applies a wdf of 5
> while indexing titles; they may apply a different non-1 number, or just
> leave it at 1 (and possibly apply weighting at search time).
>
> J
>
> --
> James Aylett, occasional trouble-maker
> xapian.org