Hi James,

> We probably don't want them committed in git where they're evaluation
> runs (because we can recreate them); a gist might be more appropriate.

Sorry, I have moved the results files over to a gist for each individual weighting scheme.
Link: https://gist.github.com/ivmarkp/secret

> I can't tell, but are some of those files from FIRE? If so, they
> shouldn't be committed either; access to FIRE is via our usage
> agreement, and shouldn't be just public on the internet anywhere.

No, those files are generated each time a run is completed, and just contain the evaluation results that are displayed on the terminal.

> Is there time in your schedule to get evaluation into the main xapian
> repo? That would avoid the first part of this. I don't think we're
> looking at lots more work to get this done, are we?

No, getting the evaluation module merged into xapian is not part of the project schedule, but it is one of the additional tasks kept for later attention. Now that I've run some evaluations, I think the module is in good shape, with support for more weighting schemes due to be added through these PRs: https://goo.gl/D2fviW

> Can you remind me what sort of corpus you're using from FIRE for this?

The corpus we are using contains news articles/stories, sorted by section and time period, from two different news providers: BDNews24 and The Telegraph.

> Do you have any idea what 'very long' means in this case, in terms of
> number of terms (or maybe multiple of mean terms)

Very long in terms of the number of terms, as specified in the paper; in general, where |D| is much larger than avdl.

The paper mentions that "the MAP improvements of BM25+ over BM25 are much larger on Web collections than on the news collection. In particular, the MAP improvements on all Web collections are statistically significant." They seem to have used four TREC collections: WT2G, WT10G, Terabyte, and Robust04, which represent different sizes and genres of text collections.

> Is this pure Dirichlet, or two-stage smoothing using Dir+
> versus Dir? What smoothing parameters were you using?

That is pure Dirichlet vs Dir+. Sorry, I should have also uploaded the config, which has the parameter details. For Dir+ I used the following parameters:

lmparam_log 0.0
lmparam_select_smoothing DIRICHLET_SMOOTHING
lmparam_smoothing1 0.9
lmparam_smoothing2 2000.0
lmparam_delta 0.05
lmparam_enable_dirplus 1

I've added the config files to the gists as well.

> Sorry you've been sick; make sure you're fully recovered before diving
> back in full throttle!

Thanks, I've gotten better. There should be no more hindrance in the days to come :)
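(For reference, as I understand the Lv & Zhai lower-bounding formulation, pure Dirichlet and Dir+ differ only in the extra delta term, so with the config above (mu = 2000.0, delta = 0.05) the two schemes rank by roughly:

    \mathrm{Dir}:\quad score(Q,D) = \sum_{w \in Q \cap D} c(w,Q)\,\log\Bigl(1 + \frac{c(w,D)}{\mu\,p(w|C)}\Bigr) + |Q|\,\log\frac{\mu}{|D| + \mu}

    \mathrm{Dir^{+}}:\quad score(Q,D) = \sum_{w \in Q \cap D} c(w,Q)\,\log\Bigl(1 + \frac{c(w,D) + \delta}{\mu\,p(w|C)}\Bigr) + |Q|\,\log\frac{\mu}{|D| + \mu}

where c(w,D) and c(w,Q) are the counts of w in the document and the query, and p(w|C) is the collection language model.)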
On Mon, Jul 25, 2016 at 06:11:21PM +0530, Vivek Pal wrote:

> > We probably don't want them committed in git where they're evaluation
> > runs (because we can recreate them); a gist might be more appropriate.
>
> Sorry, I have moved results files over to gist for each individual
> weighting scheme.
> Link: https://gist.github.com/ivmarkp/secret

You need to share the actual URL of the gist, otherwise only you can see them, I think :-)

Or just make them public; there's nothing sensitive in these, I think.

(One gist can contain multiple files, and people can then clone or download the whole lot easily.)

> > I can't tell, but are some of those files from FIRE?
>
> No, those files are generated each time a run is completed, and just
> contain evaluation results that are displayed on terminal.

Okay, great.

> > Can you remind me what sort of corpus you're using from FIRE for this?
>
> The corpus we are using contains sorted news articles/stories based
> on section and time period from two different news providers; BDNews
> 24 and The Telegraph.

Great, thanks; it's worth noting this somewhere (maybe on your project wiki page).

> > Do you have any idea what 'very long' means in this case, in terms of
> > number of terms (or maybe multiple of mean terms)
>
> Very long documents in terms of no. of terms as specified in the paper; in
> general, where |D| is much larger than avdl.
>
> It is mentioned in the paper that "the MAP improvements of BM25+ over BM25
> are much larger on Web collections than on the news collection. In
> particular, the MAP improvements on all Web collections are statistically
> significant." Therefore, they seem to have used four TREC collections: WT2G,
> WT10G, Terabyte, and Robust04, which represent different sizes and genre of
> text collections.

Ah. If FIRE doesn't have something that can show this suitably, then maybe Parth can advise on access to TREC, as I know he's used some of them in the past.

Certainly until we have something where evaluation shows an improvement, we shouldn't change the default. It does sound like it should be possible to find a suitable dataset to demonstrate this on, though.

J

--
James Aylett, occasional trouble-maker
xapian.org
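(A short recap of why the BM25+ gains concentrate in very long documents, as I read the paper: BM25+ adds a fixed lower bound \delta to BM25's per-term contribution,

    score_{\mathrm{BM25^{+}}}(Q,D) = \sum_{w \in Q \cap D} \mathrm{IDF}(w)\left[\frac{(k_1 + 1)\,c(w,D)}{k_1\bigl(1 - b + b\,\frac{|D|}{avdl}\bigr) + c(w,D)} + \delta\right]

When |D| is much larger than avdl, the bracketed fraction is pushed towards zero, so under plain BM25 a very long document that does contain the query terms ends up scored almost as if it did not contain them at all; the \delta floor removes that over-penalisation, and the effect shows up far more on web collections with highly variable document lengths than on a news corpus.)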
> You need to share the actual URL of the gist, otherwise only you can see
> them I think :-)

Sorry, I've made all the gists public: https://gist.github.com/ivmarkp

> Great, thanks; it's worth noting this somewhere (maybe on your project
> wiki page).

Okay, I'll update the project plan page with more details about the dataset used for the evaluation runs.

> Certainly until we have something where evaluation shows an
> improvement, we shouldn't change the default.

Yes, I think the same, and I feel it would still be worth having these new weighting schemes as alternatives in Xapian; for example, PL2+ already shows somewhat better results on the news collection that we currently have. Likewise, we might see similarly promising results from the other weighting schemes by evaluating them on web collections.

Thanks,
Vivek
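To make the "alternatives" point above concrete, switching weighting schemes in Xapian is a one-line change on the Enquire object. Here is a minimal sketch (not from the evaluation module itself), assuming the Xapian 1.4 C++ API; the database path and query string are made-up placeholders, and the Dir+/PL2+ variants from the pending PRs would be dropped in the same way once merged:

    #include <xapian.h>
    #include <iostream>

    int main() {
        // Placeholder path; any Xapian index built from the evaluation corpus.
        Xapian::Database db("fire-news.db");

        Xapian::QueryParser qp;
        qp.set_database(db);
        Xapian::Query query = qp.parse_query("flood relief");  // example topic

        Xapian::Enquire enquire(db);
        enquire.set_query(query);

        // The default is BM25; switch to the pure-Dirichlet language model,
        // mirroring the lmparam_* values quoted earlier (mu = 2000).
        enquire.set_weighting_scheme(
            Xapian::LMWeight(0.0, Xapian::Weight::DIRICHLET_SMOOTHING, 0.9, 2000.0));

        // Or one of the other existing alternatives, e.g. PL2:
        // enquire.set_weighting_scheme(Xapian::PL2Weight(1.0));

        Xapian::MSet matches = enquire.get_mset(0, 10);
        for (Xapian::MSetIterator it = matches.begin(); it != matches.end(); ++it)
            std::cout << it.get_rank() << "\t" << *it << "\t"
                      << it.get_weight() << "\n";
        return 0;
    }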
> Ah. If FIRE doesn't have something that can show this suitably, then
> maybe Parth can advise on access to TREC, as I know he's used some of
> them in the past.

I can say FIRE is also a reliable source, but INEX/TREC are better. INEX can give you free access, while TREC is not freely available. I used INEX for Xapian in the past, and some details are here:
https://trac.xapian.org/wiki/GSoC2011/LTR/Notes#IREvaluationofLetorrankingscheme

I roughly remember that there was a discussion with this year's GSoC student Ayush about INEX data. He has also obtained it, so this would be a good way to collaborate with him :) and try to establish a common evaluation dataset for the future.

Cheers,
Parth