Hi all,

I have evaluated the new weighting schemes alongside their existing
counterparts in Xapian to compare which does a better job. I have also put
together all the result files for easy access here:
https://github.com/ivmarkp/xapian-evaluation/tree/evaluation/run
along with a README for getting started with the xapian-evaluation module.
Hopefully it will be of help to those who are new to evaluating weighting
schemes in Xapian :)

Comparing MAP to assess retrieval effectiveness, some interesting results
have emerged:

1. BM25+: 0.100415 and BM25: 0.101771

BM25 does a slightly better job here. My guess is that BM25+ falls short
because the collection may lack very long documents. I'm also thinking of
revisiting the BM25+ PR and cross-checking it against the original BM25+
formula, to spot any mistake in the implementation if there is one. Let me
know of any other ideas that could improve the performance of BM25+.

2. PL2+: 0.0781953 and PL2: 0.0752646

Here PL2+ does do a better job at retrieving relevant documents, although
by a small margin. I believe it should produce better results at scale in
practical use. At this point, we might want to consider replacing PL2 with
PL2+ in Xapian.

3. LMWeight_Dirplus: 0.100168 and LMWeight_Dir: 0.100168

These are the results for LMWeight with Dir+ and Dir smoothing
respectively; interestingly, they are identical. Ideally LMWeight_Dirplus
should perform better, and I have similar thoughts about it as for the
BM25+ and BM25 results.

The last addition to the weighting schemes (Piv+ normalization) is a work
in progress. I've been sick these past few days, so things moved slowly. I
will complete its implementation in the coming week, along with its
evaluation.

Regards,
Vivek
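P.S. For anyone new to the module: each comparison above boils down to
swapping the weighting scheme set on the Enquire object. A minimal C++
sketch (the database path and query are hypothetical, and the "+" classes
assume the in-progress PRs land with names parallel to the existing ones):

    #include <xapian.h>

    int main() {
        // Hypothetical database path and query, for illustration only.
        Xapian::Database db("fire-collection.db");
        Xapian::Enquire enquire(db);
        enquire.set_query(Xapian::Query("example"));

        // Each pair of runs above differs only in this one call:
        enquire.set_weighting_scheme(Xapian::BM25Weight());       // baseline
        // enquire.set_weighting_scheme(Xapian::BM25PlusWeight()); // BM25+ (PR)
        // enquire.set_weighting_scheme(Xapian::PL2Weight());
        // enquire.set_weighting_scheme(Xapian::PL2PlusWeight());  // PL2+ (PR)

        Xapian::MSet matches = enquire.get_mset(0, 10);
        return 0;
    }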
On Sun, Jul 24, 2016 at 04:47:15PM +0530, Vivek Pal wrote:

> I have evaluated the new weighting schemes alongside their existing
> counterparts in Xapian to compare which does a better job. I have also
> put together all the result files for easy access here:
> https://github.com/ivmarkp/xapian-evaluation/tree/evaluation/run

We probably don't want them committed in git when they're evaluation runs
(because we can recreate them); a gist might be more appropriate.

I can't tell, but are some of those files from FIRE? If so, they shouldn't
be committed either; access to FIRE is via our usage agreement, and it
shouldn't just be public on the internet anywhere. (Unless it's just files
that FIRE themselves make completely public, but even then it's better to
link to them.)

> along with a README for getting started with the xapian-evaluation
> module. Hopefully it will be of help to those who are new to evaluating
> weighting schemes in Xapian :)

In your instructions:

    $ mv xapian-evaluation /path/to/xapian && cd xapian && nano bootstrap

Is there time in your schedule to get evaluation into the main xapian
repo? That would avoid the first part of this. I don't think we're looking
at lots more work to get this done, are we?

You don't need to edit bootstrap; instead you can pass a list of modules
for it to bootstrap on the command line:

    $ ./bootstrap xapian-core xapian-evaluation

> Comparing MAP to assess retrieval effectiveness, some interesting
> results have emerged:

Can you remind me what sort of corpus you're using from FIRE for this? I
want to get an idea of what kinds of use cases it might match. Ideally
we'd be able to do this when recommending weighting schemes to users.

> 1. BM25+: 0.100415 and BM25: 0.101771
>
> BM25 does a slightly better job here. My guess is that BM25+ falls
> short because the collection may lack very long documents.

Do you have any idea what "very long" means in this case, in terms of
number of terms (or maybe as a multiple of the mean number of terms)?

> 2. PL2+: 0.0781953 and PL2: 0.0752646
>
> Here PL2+ does do a better job at retrieving relevant documents,
> although by a small margin.

Great!

> 3. LMWeight_Dirplus: 0.100168 and LMWeight_Dir: 0.100168
>
> These are the results for LMWeight with Dir+ and Dir smoothing
> respectively; interestingly, they are identical. Ideally
> LMWeight_Dirplus should perform better, and I have similar thoughts
> about it as for the BM25+ and BM25 results.

That sounds more like the impact of the smoothing option being limited in
this run. Is this pure Dirichlet, or two-stage smoothing using Dir+ versus
Dir? What smoothing parameters were you using?

> The last addition to the weighting schemes (Piv+ normalization) is a
> work in progress. I've been sick these past few days, so things moved
> slowly. I will complete its implementation in the coming week, along
> with its evaluation.

Sorry you've been sick; make sure you're fully recovered before diving
back in at full throttle!

Thanks for the (detailed!) update :)

J

-- 
James Aylett, occasional trouble-maker
xapian.org
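P.S. One quick way to make "very long" concrete is to compare each
document's length against avdl from the database. A minimal sketch, where
the 10x multiplier is an arbitrary threshold for illustration rather than
anything from the paper:

    #include <xapian.h>
    #include <iostream>

    int main(int argc, char** argv) {
        if (argc != 2) return 1;
        Xapian::Database db(argv[1]);
        double avdl = db.get_avlength();
        unsigned long very_long = 0;
        // The postlist for the empty term iterates every document in
        // the database.
        for (Xapian::PostingIterator it = db.postlist_begin(std::string());
             it != db.postlist_end(std::string()); ++it) {
            // Count documents at least 10x the mean length.
            if (db.get_doclength(*it) >= 10 * avdl)
                ++very_long;
        }
        std::cout << "avdl = " << avdl
                  << ", documents with |D| >= 10 * avdl: " << very_long
                  << std::endl;
        return 0;
    }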
Hi James,

> We probably don't want them committed in git when they're evaluation
> runs (because we can recreate them); a gist might be more appropriate.

Sorry, I have moved the result files over to a gist for each individual
weighting scheme. Link: https://gist.github.com/ivmarkp/secret

> I can't tell, but are some of those files from FIRE? If so, they
> shouldn't be committed either; access to FIRE is via our usage
> agreement, and it shouldn't just be public on the internet anywhere.

No, those files are generated each time a run completes, and just contain
the evaluation results that are displayed in the terminal.

> Is there time in your schedule to get evaluation into the main xapian
> repo? That would avoid the first part of this. I don't think we're
> looking at lots more work to get this done, are we?

No, getting the evaluation module merged into xapian is not part of the
project schedule, but it is one of the additional tasks kept for later
attention. Now that I've run some evaluations, I think the module is in
good shape, with support for more weighting schemes due to be added
through these PRs: https://goo.gl/D2fviW

> Can you remind me what sort of corpus you're using from FIRE for this?

The corpus we are using contains news articles/stories from two different
news providers, BDNews 24 and The Telegraph, sorted by section and time
period.

> Do you have any idea what "very long" means in this case, in terms of
> number of terms (or maybe as a multiple of the mean number of terms)?

Very long in terms of the number of terms, as specified in the paper: in
general, documents where |D| is much larger than avdl. The paper mentions
that "the MAP improvements of BM25+ over BM25 are much larger on Web
collections than on the news collection. In particular, the MAP
improvements on all Web collections are statistically significant." For
that they used four TREC collections (WT2G, WT10G, Terabyte, and
Robust04), which represent different sizes and genres of text collection.

> Is this pure Dirichlet, or two-stage smoothing using Dir+ versus Dir?
> What smoothing parameters were you using?

That is pure Dirichlet vs Dir+, and sorry, I should have also uploaded the
config, which has the parameter details. For Dir+ I used the following
parameters:

    lmparam_log 0.0
    lmparam_select_smoothing DIRICHLET_SMOOTHING
    lmparam_smoothing1 0.9
    lmparam_smoothing2 2000.0
    lmparam_delta 0.05
    lmparam_enable_dirplus 1

I've added the config files to the gists as well.

> Sorry you've been sick; make sure you're fully recovered before diving
> back in at full throttle!

Thanks, I've gotten better. There should be no more hindrance in the days
to come :)
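P.S. For reference, a rough sketch of how those config values would map
onto the core API through LMWeight's constructor. This shows the pure
Dirichlet run only; exactly how the Dir+ delta (lmparam_delta above) is
exposed once the Dir+ patch lands is an assumption left open in the
comments:

    #include <xapian.h>

    int main() {
        Xapian::Database db("fire-collection.db");  // hypothetical path
        Xapian::Enquire enquire(db);
        enquire.set_query(Xapian::Query("example"));

        // Pure Dirichlet run: the config values pass straight through to
        // LMWeight(param_log, smoothing type, param_smoothing1,
        // param_smoothing2); the roles of the smoothing parameters depend
        // on the smoothing type selected.
        enquire.set_weighting_scheme(
            Xapian::LMWeight(0.0,  // lmparam_log
                             Xapian::Weight::DIRICHLET_SMOOTHING,
                             0.9,       // lmparam_smoothing1
                             2000.0));  // lmparam_smoothing2
        // The Dir+ run additionally applies delta = 0.05 (lmparam_delta);
        // how the in-progress patch takes that value is not settled here.

        Xapian::MSet matches = enquire.get_mset(0, 10);
        return 0;
    }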