On Sun, Apr 02, 2006 at 10:27:37AM +0530, durga bidaye wrote:
> Suppose footballer and footballs were given as terms to be indexed
> and both were stemmed to footbal. Now when we gave "footballs" as the
> query then we will get both, document containing footballs and document
> containing footballer, as search results with equal ranking (in absence
> of other factors like within document frequency, etc).
Correct.
> But ideally it should have given document containing "footballs"
> higher ranking and the one containing footballer lower ranking.
I don't follow why. Both "footballs" and "footballer" indicate that a
document is "about terms that stem to 'footbal'".
Perhaps you think that "footballer" indicates less "aboutness" than
"footballs"? I think that's a highly subjective judgement - it may
be true sometimes but in other cases the reverse is true. For example,
consider the query: footballers' wives - there "footballer" indicates
more relevance than "footballs".
> Isn't there a mechanism in xapian which makes this kind of ranking
> possible?
If you really want to do that, you can set a higher "wdfinc" when adding
postings for "footbal" when it comes from "footballs" than when it comes
from "footballer".
But you'll need to compile a list of which unstemmed forms indicate more
"aboutness" than others, and I'm unconvinced it's really a sensible
approach.
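For illustration only, here is a rough sketch of the per-form "wdfinc"
idea in plain Python. It does not use the Xapian bindings: the weight
table, the toy stemmer, and the function names are all made up for this
example, and a real indexer would pass the chosen increment as the
wdfinc argument of Document.add_posting() instead of summing it in a
dict.

```python
from collections import defaultdict

# Hypothetical hand-compiled table: unstemmed form -> wdf increment.
# Forms not listed fall back to the usual increment of 1.
WDFINC_BY_FORM = {"footballs": 2}
DEFAULT_WDFINC = 1

# Toy stand-in for a real stemmer; it only covers the words used here.
TOY_STEMS = {
    "footballs": "footbal",
    "footballer": "footbal",
    "footballers": "footbal",
}


def toy_stem(word):
    """Return the stem for a known word, or the word unchanged."""
    return TOY_STEMS.get(word, word)


def index_text(text):
    """Return {stemmed term: accumulated wdf} for one document."""
    wdf = defaultdict(int)
    for pos, word in enumerate(text.lower().split(), start=1):
        term = toy_stem(word)
        wdfinc = WDFINC_BY_FORM.get(word, DEFAULT_WDFINC)
        # With the Xapian bindings this step would be roughly:
        #   doc.add_posting(term, pos, wdfinc)
        wdf[term] += wdfinc
    return dict(wdf)


doc_a = index_text("new footballs arrived")
doc_b = index_text("the footballer arrived")
print(doc_a["footbal"], doc_b["footbal"])
```

With this table the "footballs" document ends up with a wdf of 2 for
"footbal" and the "footballer" document with 1, so a wdf-sensitive
weighting scheme would tend to rank the former higher. For a query like
footballers' wives that bias points the wrong way, and the table would
have to be maintained by hand.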
Cheers,
Olly