Hi James,

> We probably don't want them committed in git where they're evaluation
> runs (because we can recreate them); a gist might be more appropriate.

Sorry, I have moved the results files over to a gist for each individual weighting scheme.
Link: https://gist.github.com/ivmarkp/secret

> I can't tell, but are some of those files from FIRE? If so, they
> shouldn't be committed either; access to FIRE is via our usage
> agreement, and shouldn't be just public on the internet anywhere.

No, those files are generated each time a run is completed, and just contain the evaluation results that are displayed on the terminal.

> Is there time in your schedule to get evaluation into the main xapian
> repo? That would avoid the first part of this. I don't think we're
> looking at lots more work to get this done, are we?

No, getting the evaluation module merged into xapian is not part of the project schedule, but it is one of the additional tasks kept for later attention. Now that I've run some evaluations, I think the module is in good shape, with support for more weighting schemes due to be added through these PRs: https://goo.gl/D2fviW

> Can you remind me what sort of corpus you're using from FIRE for this?

The corpus we are using contains news articles/stories, sorted by section and time period, from two different news providers: BDNews24 and The Telegraph.

> Do you have any idea what 'very long' means in this case, in terms of
> number of terms (or maybe multiple of mean terms)

Very long in terms of the number of terms, as specified in the paper; in general, where |D| is much larger than avdl.

The paper mentions that "the MAP improvements of BM25+ over BM25 are much larger on Web collections than on the news collection. In particular, the MAP improvements on all Web collections are statistically significant." They seem to have used four TREC collections: WT2G, WT10G, Terabyte, and Robust04, which represent different sizes and genres of text collections.

> Is this pure Dirichlet, or two-stage smoothing using Dir+
> versus Dir? What smoothing parameters were you using?

That is pure Dirichlet vs Dir+. Sorry, I should have also uploaded the config, which has the parameter details. For Dir+ I used the following parameters:

lmparam_log 0.0
lmparam_select_smoothing DIRICHLET_SMOOTHING
lmparam_smoothing1 0.9
lmparam_smoothing2 2000.0
lmparam_delta 0.05
lmparam_enable_dirplus 1

I've added the config files to the gists as well.

> Sorry you've been sick; make sure you're fully recovered before diving
> back in full throttle!

Thanks, I've gotten better. There should be no more hindrance in the days to come :)
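(For reference, as I understand the Lv & Zhai lower-bounding formulation, pure Dirichlet and Dir+ differ only in the extra delta term, so with the config above (mu = 2000.0, delta = 0.05) the two schemes rank by roughly:

    \mathrm{Dir}:\quad score(Q,D) = \sum_{w \in Q \cap D} c(w,Q)\,\log\Bigl(1 + \frac{c(w,D)}{\mu\,p(w|C)}\Bigr) + |Q|\,\log\frac{\mu}{|D| + \mu}

    \mathrm{Dir^{+}}:\quad score(Q,D) = \sum_{w \in Q \cap D} c(w,Q)\,\log\Bigl(1 + \frac{c(w,D) + \delta}{\mu\,p(w|C)}\Bigr) + |Q|\,\log\frac{\mu}{|D| + \mu}

where c(w,D) and c(w,Q) are the counts of w in the document and the query, and p(w|C) is the collection language model.)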
On Mon, Jul 25, 2016 at 06:11:21PM +0530, Vivek Pal wrote:

> > We probably don't want them committed in git where they're evaluation
> > runs (because we can recreate them); a gist might be more appropriate.
>
> Sorry, I have moved results files over to gist for each individual
> weighting scheme.
> Link: https://gist.github.com/ivmarkp/secret

You need to share the actual URL of the gist, otherwise only you can see them, I think :-)

Or just make them public; there's nothing sensitive in these, I think.

(One gist can contain multiple files, and people can then clone or download the whole lot easily.)

> > I can't tell, but are some of those files from FIRE?
>
> No, those files are generated each time a run is completed, and just
> contain evaluation results that are displayed on terminal.

Okay, great.

> > Can you remind me what sort of corpus you're using from FIRE for this?
>
> The corpus we are using contains sorted news articles/stories based
> on section and time period from two different news providers; BDNews
> 24 and The Telegraph.

Great, thanks; it's worth noting this somewhere (maybe on your project wiki page).

> > Do you have any idea what 'very long' means in this case, in terms of
> > number of terms (or maybe multiple of mean terms)
>
> Very long documents in terms of no. of terms as specified in the paper; in
> general, where |D| is much larger than avdl.
>
> It is mentioned in the paper that "the MAP improvements of BM25+ over BM25
> are much larger on Web collections than on the news collection. In
> particular, the MAP improvements on all Web collections are statistically
> significant." Therefore, they seem to have used four TREC collections: WT2G,
> WT10G, Terabyte, and Robust04, which represent different sizes and genre of
> text collections.

Ah. If FIRE doesn't have something that can show this suitably, then maybe Parth can advise on access to TREC, as I know he's used some of them in the past.

Certainly until we have something where evaluation shows an improvement, we shouldn't change the default. It does sound like it should be possible to find a suitable dataset to demonstrate this on, though.

J

--
James Aylett, occasional trouble-maker
xapian.org
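(A short recap of why the BM25+ gains concentrate in very long documents, as I read the paper: BM25+ adds a fixed lower bound \delta to BM25's per-term contribution,

    score_{\mathrm{BM25^{+}}}(Q,D) = \sum_{w \in Q \cap D} \mathrm{IDF}(w)\left[\frac{(k_1 + 1)\,c(w,D)}{k_1\bigl(1 - b + b\,\frac{|D|}{avdl}\bigr) + c(w,D)} + \delta\right]

When |D| is much larger than avdl, the bracketed fraction is pushed towards zero, so under plain BM25 a very long document that does contain the query terms ends up scored almost as if it did not contain them at all; the \delta floor removes that over-penalisation, and the effect shows up far more on web collections with highly variable document lengths than on a news corpus.)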
> You need to share the actual URL of the gist, otherwise only you can see
> them I think :-)

Sorry, I've made all the gists public: https://gist.github.com/ivmarkp

> Great, thanks; it's worth noting this somewhere (maybe on your project
> wiki page).

Okay, I'll update the project plan page with more details about the dataset used for the evaluation runs.

> Certainly until we have something where evaluation shows an
> improvement, we shouldn't change the default.

Yes, I think the same, and I feel it would still be worth having these new weighting schemes as alternatives in Xapian; for example, PL2+ already shows somewhat better results on the news collection that we currently have. Likewise, we might see similarly promising results from the other weighting schemes by evaluating them on web collections.

Thanks,
Vivek
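To make the "alternatives" point above concrete, switching weighting schemes in Xapian is a one-line change on the Enquire object. Here is a minimal sketch (not from the evaluation module itself), assuming the Xapian 1.4 C++ API; the database path and query string are made-up placeholders, and the Dir+/PL2+ variants from the pending PRs would be dropped in the same way once merged:

    #include <xapian.h>
    #include <iostream>

    int main() {
        // Placeholder path; any Xapian index built from the evaluation corpus.
        Xapian::Database db("fire-news.db");

        Xapian::QueryParser qp;
        qp.set_database(db);
        Xapian::Query query = qp.parse_query("flood relief");  // example topic

        Xapian::Enquire enquire(db);
        enquire.set_query(query);

        // The default is BM25; switch to the pure-Dirichlet language model,
        // mirroring the lmparam_* values quoted earlier (mu = 2000).
        enquire.set_weighting_scheme(
            Xapian::LMWeight(0.0, Xapian::Weight::DIRICHLET_SMOOTHING, 0.9, 2000.0));

        // Or one of the other existing alternatives, e.g. PL2:
        // enquire.set_weighting_scheme(Xapian::PL2Weight(1.0));

        Xapian::MSet matches = enquire.get_mset(0, 10);
        for (Xapian::MSetIterator it = matches.begin(); it != matches.end(); ++it)
            std::cout << it.get_rank() << "\t" << *it << "\t"
                      << it.get_weight() << "\n";
        return 0;
    }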
> Ah. If FIRE doesn't have something that can show this suitably, then
> maybe Parth can advise on access to TREC, as I know he's used some of
> them in the past.

I can say FIRE is also a reliable source, but INEX/TREC are better. INEX can give you free access, while TREC is not freely available. I used INEX for Xapian in the past, and some details are here:
https://trac.xapian.org/wiki/GSoC2011/LTR/Notes#IREvaluationofLetorrankingscheme

I roughly remember that there was a discussion with this year's GSoC student Ayush about INEX data. He has also obtained it, so this would be a good way to collaborate with him :) and try to establish a common evaluation dataset for the future.

Cheers,
Parth