Hi all,

I have evaluated the new weighting schemes alongside their existing
counterparts in Xapian to compare which does a better job. I have also put
together all the result files for easy access here:
https://github.com/ivmarkp/xapian-evaluation/tree/evaluation/run
along with a README for getting started with the xapian-evaluation module.
Hopefully it will be of help to those who are new to evaluating weighting
schemes in Xapian :)

Comparing MAP to assess retrieval effectiveness, some interesting results
have emerged:

1. BM25+: 0.100415 and BM25: 0.101771

BM25 does a slightly better job here. My guess is that BM25+ falls short
because the collection may lack very long documents. I'm also thinking of
revisiting the BM25+ PR and cross-checking it against the original BM25+
formula, to spot any mistake in the implementation if there is one. Let me
know of any other ideas that could improve the performance of BM25+.

2. PL2+: 0.0781953 and PL2: 0.0752646

Here PL2+ does do a better job at retrieving relevant documents, although
by a small margin. I believe it should produce better results at scale in
practical use. At this point, we might want to consider replacing PL2 with
PL2+ in Xapian.

3. LMWeight_Dirplus: 0.100168 and LMWeight_Dir: 0.100168

These are the results for LMWeight with Dir+ and Dir smoothing
respectively; interestingly, they are identical. Ideally LMWeight_Dirplus
should perform better, and I have similar thoughts about it as for the
BM25+ and BM25 results.

The last addition to the weighting schemes (Piv+ normalization) is a work
in progress. I've been sick these past few days, so things moved slowly. I
will complete its implementation in the coming week, along with its
evaluation.

Regards,
Vivek
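P.S. For anyone new to the module: each comparison above boils down to
swapping the weighting scheme set on the Enquire object. A minimal C++
sketch (the database path and query are hypothetical, and the "+" classes
assume the in-progress PRs land with names parallel to the existing ones):

    #include <xapian.h>

    int main() {
        // Hypothetical database path and query, for illustration only.
        Xapian::Database db("fire-collection.db");
        Xapian::Enquire enquire(db);
        enquire.set_query(Xapian::Query("example"));

        // Each pair of runs above differs only in this one call:
        enquire.set_weighting_scheme(Xapian::BM25Weight());       // baseline
        // enquire.set_weighting_scheme(Xapian::BM25PlusWeight()); // BM25+ (PR)
        // enquire.set_weighting_scheme(Xapian::PL2Weight());
        // enquire.set_weighting_scheme(Xapian::PL2PlusWeight());  // PL2+ (PR)

        Xapian::MSet matches = enquire.get_mset(0, 10);
        return 0;
    }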
On Sun, Jul 24, 2016 at 04:47:15PM +0530, Vivek Pal wrote:

> I have evaluated the new weighting schemes alongside their existing
> counterparts in Xapian to compare which does a better job. I have also
> put together all the result files for easy access here:
> https://github.com/ivmarkp/xapian-evaluation/tree/evaluation/run

We probably don't want them committed in git when they're evaluation runs
(because we can recreate them); a gist might be more appropriate.

I can't tell, but are some of those files from FIRE? If so, they shouldn't
be committed either; access to FIRE is via our usage agreement, and it
shouldn't just be public on the internet anywhere. (Unless it's just files
that FIRE themselves make completely public, but even then it's better to
link to them.)

> along with a README for getting started with the xapian-evaluation
> module. Hopefully it will be of help to those who are new to evaluating
> weighting schemes in Xapian :)

In your instructions:

    $ mv xapian-evaluation /path/to/xapian && cd xapian && nano bootstrap

Is there time in your schedule to get evaluation into the main xapian
repo? That would avoid the first part of this. I don't think we're looking
at lots more work to get this done, are we?

You don't need to edit bootstrap; instead you can pass a list of modules
for it to bootstrap on the command line:

    $ ./bootstrap xapian-core xapian-evaluation

> Comparing MAP to assess retrieval effectiveness, some interesting
> results have emerged:

Can you remind me what sort of corpus you're using from FIRE for this? I
want to get an idea of what kinds of use cases it might match. Ideally
we'd be able to do this when recommending weighting schemes to users.

> 1. BM25+: 0.100415 and BM25: 0.101771
>
> BM25 does a slightly better job here. My guess is that BM25+ falls
> short because the collection may lack very long documents.

Do you have any idea what "very long" means in this case, in terms of
number of terms (or maybe as a multiple of the mean number of terms)?

> 2. PL2+: 0.0781953 and PL2: 0.0752646
>
> Here PL2+ does do a better job at retrieving relevant documents,
> although by a small margin.

Great!

> 3. LMWeight_Dirplus: 0.100168 and LMWeight_Dir: 0.100168
>
> These are the results for LMWeight with Dir+ and Dir smoothing
> respectively; interestingly, they are identical. Ideally
> LMWeight_Dirplus should perform better, and I have similar thoughts
> about it as for the BM25+ and BM25 results.

That sounds more like the impact of the smoothing option being limited in
this run. Is this pure Dirichlet, or two-stage smoothing using Dir+ versus
Dir? What smoothing parameters were you using?

> The last addition to the weighting schemes (Piv+ normalization) is a
> work in progress. I've been sick these past few days, so things moved
> slowly. I will complete its implementation in the coming week, along
> with its evaluation.

Sorry you've been sick; make sure you're fully recovered before diving
back in at full throttle!

Thanks for the (detailed!) update :)

J

-- 
James Aylett, occasional trouble-maker
xapian.org
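P.S. One quick way to make "very long" concrete is to compare each
document's length against avdl from the database. A minimal sketch, where
the 10x multiplier is an arbitrary threshold for illustration rather than
anything from the paper:

    #include <xapian.h>
    #include <iostream>

    int main(int argc, char** argv) {
        if (argc != 2) return 1;
        Xapian::Database db(argv[1]);
        double avdl = db.get_avlength();
        unsigned long very_long = 0;
        // The postlist for the empty term iterates every document in
        // the database.
        for (Xapian::PostingIterator it = db.postlist_begin(std::string());
             it != db.postlist_end(std::string()); ++it) {
            // Count documents at least 10x the mean length.
            if (db.get_doclength(*it) >= 10 * avdl)
                ++very_long;
        }
        std::cout << "avdl = " << avdl
                  << ", documents with |D| >= 10 * avdl: " << very_long
                  << std::endl;
        return 0;
    }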
Hi James,

> We probably don't want them committed in git when they're evaluation
> runs (because we can recreate them); a gist might be more appropriate.

Sorry, I have moved the result files over to a gist for each individual
weighting scheme. Link: https://gist.github.com/ivmarkp/secret

> I can't tell, but are some of those files from FIRE? If so, they
> shouldn't be committed either; access to FIRE is via our usage
> agreement, and it shouldn't just be public on the internet anywhere.

No, those files are generated each time a run completes, and just contain
the evaluation results that are displayed in the terminal.

> Is there time in your schedule to get evaluation into the main xapian
> repo? That would avoid the first part of this. I don't think we're
> looking at lots more work to get this done, are we?

No, getting the evaluation module merged into xapian is not part of the
project schedule, but it is one of the additional tasks kept for later
attention. Now that I've run some evaluations, I think the module is in
good shape, with support for more weighting schemes due to be added
through these PRs: https://goo.gl/D2fviW

> Can you remind me what sort of corpus you're using from FIRE for this?

The corpus we are using contains news articles/stories from two different
news providers, BDNews 24 and The Telegraph, sorted by section and time
period.

> Do you have any idea what "very long" means in this case, in terms of
> number of terms (or maybe as a multiple of the mean number of terms)?

Very long in terms of the number of terms, as specified in the paper: in
general, documents where |D| is much larger than avdl. The paper mentions
that "the MAP improvements of BM25+ over BM25 are much larger on Web
collections than on the news collection. In particular, the MAP
improvements on all Web collections are statistically significant." For
that they used four TREC collections (WT2G, WT10G, Terabyte, and
Robust04), which represent different sizes and genres of text collection.

> Is this pure Dirichlet, or two-stage smoothing using Dir+ versus Dir?
> What smoothing parameters were you using?

That is pure Dirichlet vs Dir+, and sorry, I should have also uploaded the
config, which has the parameter details. For Dir+ I used the following
parameters:

    lmparam_log 0.0
    lmparam_select_smoothing DIRICHLET_SMOOTHING
    lmparam_smoothing1 0.9
    lmparam_smoothing2 2000.0
    lmparam_delta 0.05
    lmparam_enable_dirplus 1

I've added the config files to the gists as well.

> Sorry you've been sick; make sure you're fully recovered before diving
> back in at full throttle!

Thanks, I've gotten better. There should be no more hindrance in the days
to come :)
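P.S. For reference, a rough sketch of how those config values would map
onto the core API through LMWeight's constructor. This shows the pure
Dirichlet run only; exactly how the Dir+ delta (lmparam_delta above) is
exposed once the Dir+ patch lands is an assumption left open in the
comments:

    #include <xapian.h>

    int main() {
        Xapian::Database db("fire-collection.db");  // hypothetical path
        Xapian::Enquire enquire(db);
        enquire.set_query(Xapian::Query("example"));

        // Pure Dirichlet run: the config values pass straight through to
        // LMWeight(param_log, smoothing type, param_smoothing1,
        // param_smoothing2); the roles of the smoothing parameters depend
        // on the smoothing type selected.
        enquire.set_weighting_scheme(
            Xapian::LMWeight(0.0,  // lmparam_log
                             Xapian::Weight::DIRICHLET_SMOOTHING,
                             0.9,       // lmparam_smoothing1
                             2000.0));  // lmparam_smoothing2
        // The Dir+ run additionally applies delta = 0.05 (lmparam_delta);
        // how the in-progress patch takes that value is not settled here.

        Xapian::MSet matches = enquire.get_mset(0, 10);
        return 0;
    }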