On Wed, Nov 26, 2003 at 08:31:00PM +0000, Olly Betts wrote:
> On Fri, Nov 21, 2003 at 11:52:44AM +0100, Franck Meunier wrote:
> > I indexed internal docs, and then asked omega to "MORELIKE" a
> > document.
> > It gives me only a link at 43% to the previous revision of this
> > document (I also index my archives)...
> >
> > The problem is that there were only two or three differences between
> > them.
> >
> > I found that to do a MORELIKE, omega constructs an RSet with the
> > document, extracts the first 6 terms of the corresponding ESet, and
> > creates a new query with them.
> >
> > 6 seems too few... I raised this value to 40, and it looks much
> > better (99% for my two documents).
> >
> > Have you ever experienced a problem with this parameter?
>
> The MORELIKE functionality was originally written as an experimental
> feature for EuroFerret:
>
> http://web.archive.org/web/19991013083615/http://euroferret.com/
>
> EuroFerret indexed each page by the 60 best terms, and I suspect that
> the choice of 6 is based on tuning for that database.
>
> As you're presumably indexing all the terms in each page, it's not at
> all surprising that a larger number gives better results. I wonder if we
> can either set a better pick threshold by looking at the expand weights,
> or perhaps just as a function of the number of terms indexing the
> document we're trying to find more like. I'll take a look.

Sorry, I've only just noticed this mail sitting awaiting attention.
I've simply raised the limit to 40 as you suggest for now. I'll make
a note to investigate a more dynamic approach. Feedback on this change
is welcome.
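
For illustration, the term-selection step discussed in this thread can be
sketched roughly as below. This is a toy model in Python, not omega's code:
the tf-idf style weight is only a stand-in for the real expand weights, and
the names toy_expand_weights, morelike_terms and dynamic_limit (including
its divisor, floor and ceiling parameters) are hypothetical.

```python
import math
from collections import Counter


def toy_expand_weights(doc_terms, collection):
    """Score each term of the document with a tf-idf style weight.

    ASSUMPTION: a stand-in for real expand weights, for illustration only.
    `collection` is a list of term sets, one per indexed document.
    """
    n_docs = len(collection)
    weights = {}
    for term, tf in Counter(doc_terms).items():
        # Number of documents in the collection containing this term.
        df = sum(1 for other in collection if term in other)
        weights[term] = tf * math.log((n_docs + 1) / (df + 0.5))
    return weights


def morelike_terms(doc_terms, collection, max_terms):
    """Return the `max_terms` best-weighted terms, to be OR-ed into a query."""
    weights = toy_expand_weights(doc_terms, collection)
    # Highest weight first; ties broken alphabetically so the order is stable.
    ranked = sorted(weights, key=lambda t: (-weights[t], t))
    return ranked[:max_terms]


def dynamic_limit(n_unique_terms, divisor=10, floor=6, ceiling=40):
    """Hypothetical rule: scale the cut-off with document size, clamped."""
    return max(floor, min(ceiling, n_unique_terms // divisor))


if __name__ == "__main__":
    collection = [{"the", "xapian", "omega"}, {"the", "cat"}, {"the", "dog"}]
    doc = ["the", "xapian", "omega", "the"]
    print(morelike_terms(doc, collection, max_terms=2))
    print(dynamic_limit(len(set(doc))))
```

The divisor of 10 is chosen only so that a document indexed by 60 terms
gets a cut-off of 6, with the current 40 as the ceiling; any real rule
would need tuning against actual data.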

Cheers,
Olly