thr3ads.net - Xapian devel - [Xapian-devel] Questions on letor module [Mar 2014]

Jia Xu

2014-Mar-04 06:00 UTC

[Xapian-devel] Questions on letor module

Hi,

I have several questions regarding the letor module. I looked at the
framework of learning to rank in Xapian
(http://rishabhmehrotra.com/gsoc/17.png) and I am a little confused. Why use
deep learning to find unsupervised features in the test data? In my
understanding, a learning-to-rank model usually learns features from the
training data and then applies the model to the test data. Why do the test
set and training set have different features? Deep learning extracts
hidden features from the data set, and I don't think it is necessary
for this problem. Furthermore, I didn't see any deep learning
implementation in the source code; is it actually included in letor?

For the source code at
https://github.com/rishabhmehrotra/xapian/tree/397034af42c9b1998730160176d219d6f8f38b25/xapian-letor,
the last update was about 2 years ago. Is that the latest version of the
code? Several files, such as ranker.cc and evalmetric.cc, contain no
function implementations, and I don't know whether they have been
implemented somewhere else in the module (as far as I read through the
source code, I didn't see any).

For the tests, are there any benchmark results for the SVM-based or ListNet
models on sample datasets, and what are their NDCG or MAP scores? (I
didn't see any evaluation measures implemented in the current
module.) And how about cross-validation on the training set? Is there
any method for it included in the current project?

For the SVM method, I found that letor_learn_model() has been commented
out, but I didn't find any other file containing this function (or maybe
it is in letor_internal.cc?).

Finally, I found a file called letor_internal_refactored.cc. Is that
the latest version of letor_internal.cc? Is letor_internal.cc
still being used?

Thank you very much. I am waiting for your reply.

--
Jia Xu

Olly Betts

2014-Mar-04 12:13 UTC

On Mon, Mar 03, 2014 at 10:00:16PM -0800, Jia Xu wrote:
> For the source code at
> https://github.com/rishabhmehrotra/xapian/tree/397034af42c9b1998730160176d219d6f8f38b25/xapian-letor,
> the last update was about 2 years ago. Is that the latest version of the
> code? Several files, such as ranker.cc and evalmetric.cc, contain no
> function implementations, and I don't know whether they have been
> implemented somewhere else in the module (as far as I read through the
> source code, I didn't see any).

I believe that is the latest code.  The student working on the project
in 2012 dropped out (it turned out he was trying to do an internship as
well as GSoC, which really isn't a good idea) so things may not be in a
very polished state.

Your other questions all sound like good ones, but I don't really know
the answers.  Parth implemented the original letor code in 2011, and was
the one mentoring the 2012 project most actively, plus he's more up on the
theoretical side, so hopefully he can answer these questions.

Cheers,
    Olly

Parth Gupta

2014-Mar-04 21:59 UTC

Hi Jia,

> I have several questions regarding the letor module. I looked at the
> framework of learning to rank in Xapian
> (http://rishabhmehrotra.com/gsoc/17.png) and I am a little confused. Why use
> deep learning to find unsupervised features in the test data? In my
> understanding, a learning-to-rank model usually learns features from the
> training data and then applies the model to the test data. Why do the test
> set and training set have different features? Deep learning extracts
> hidden features from the data set, and I don't think it is necessary
> for this problem. Furthermore, I didn't see any deep learning
> implementation in the source code; is it actually included in letor?

The idea of the GSoC project proposed by Rishabh was based on extracting
unsupervised features using deep learning on top of the existing features
based on term frequency and related statistics. It was not a tested
hypothesis that this would help, but it was an added part. Later we dropped
the idea of adding this deep learning module, so you don't see any code
related to it.
>
> For the source code at
> https://github.com/rishabhmehrotra/xapian/tree/397034af42c9b1998730160176d219d6f8f38b25/xapian-letor,
> the last update was about 2 years ago. Is that the latest version of the
> code? Several files, such as ranker.cc and evalmetric.cc, contain no
> function implementations, and I don't know whether they have been
> implemented somewhere else in the module (as far as I read through the
> source code, I didn't see any).

That is the latest version of the code and the starting point of this
year's GSoC project. ranker.cc defines an abstract class that is inherited
by the implemented rankers such as SVM, ListMLE and ListNET; the
corresponding definitions can be found in their .cc files. The evaluation
part is yet to be completed, as per the instructions given in evalmetric.h.
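
A minimal sketch of the pattern described above, written in Python rather than the module's C++ for brevity. The class and method names (`Ranker`, `learn_model`, `score`, `DotProductRanker`) are hypothetical and are not taken from xapian-letor. The point is only that the base class declares the interface while each concrete ranker supplies the bodies, which would explain why the base file contains no implementations.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence


class Ranker(ABC):
    """Abstract interface; declares methods but implements none of them."""

    @abstractmethod
    def learn_model(self, features: Sequence[Sequence[float]],
                    labels: Sequence[float]) -> None:
        """Fit the model on training feature vectors and relevance labels."""

    @abstractmethod
    def score(self, feature_vector: Sequence[float]) -> float:
        """Return a relevance score for one document's feature vector."""

    def rank(self, docs: Sequence[Sequence[float]]) -> List[int]:
        """Shared helper: document indices ordered by descending score."""
        return sorted(range(len(docs)), key=lambda i: self.score(docs[i]),
                      reverse=True)


class DotProductRanker(Ranker):
    """Toy concrete ranker (hypothetical): score = weights . features."""

    def __init__(self) -> None:
        self.weights: List[float] = []

    def learn_model(self, features, labels) -> None:
        # Placeholder "training": weight each feature by its label-weighted
        # sum. A real SVM or ListNET ranker would optimise a loss here.
        dims = len(features[0])
        self.weights = [sum(f[d] * y for f, y in zip(features, labels))
                        for d in range(dims)]

    def score(self, feature_vector) -> float:
        return sum(w * x for w, x in zip(self.weights, feature_vector))
```

Trying to instantiate `Ranker` directly raises `TypeError`, while `DotProductRanker` can be trained and used through the shared `rank` helper.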

> For the tests, are there any benchmark results for the SVM-based or ListNet
> models on sample datasets, and what are their NDCG or MAP scores? (I
> didn't see any evaluation measures implemented in the current
> module.) And how about cross-validation on the training set? Is there
> any method for it included in the current project?

For the SVM-based model, benchmark results are available at
http://trac.xapian.org/wiki/GSoC2011/LTR/Notes#IREvaluationofLetorrankingscheme

Actually, the first step of the new project will be to generate this figure
for the SVM-based model with the new refactored code, which was mostly
written during GSoC 2012 but never tested. We would appreciate it if the
prospective students of the Letor project could generate this value before
the student selection deadline.
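
For readers unfamiliar with the NDCG metric named in the question, it can be sketched as follows. This is one common formulation (gain `2^rel - 1`, log2 discount); other variants exist, and nothing here is taken from xapian-letor. The function names are hypothetical, and the ideal ordering is assumed to be computed from the same list of labels.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k graded relevance labels."""
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG of the given ordering divided by DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

Here `relevances` holds the graded labels of the documents in the order the ranker returned them, so a perfectly ordered list scores 1.0.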

>
> For the SVM method, I found that letor_learn_model() has been commented
> out, but I didn't find any other file containing this function (or maybe
> it is in letor_internal.cc?).
>
> Finally, I found a file called letor_internal_refactored.cc. Is that
> the latest version of letor_internal.cc? Is letor_internal.cc
> still being used?

Right. svmranker.cc is still to be defined. Right now the SVM-based ranker
is available only in non-refactored form, which lives in
letor_internal_refactored.cc.

I think the best exercise is to prepare svmranker.cc from
letor_internal_refactored.cc by implementing the necessary methods and
generating the MAP score reported on INEX data; that would give you a better
grip of the code. I would love to see a patch for it.
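
For reference, mean average precision (MAP) can be computed as sketched below. This is the standard textbook definition, not code from xapian-letor. The function names and the input layout (one ranked list of document ids plus a set of relevant ids per query) are assumptions made for illustration, and the example in the note that follows is made up rather than an INEX result.

```python
from typing import Hashable, Iterable, List, Sequence, Set, Tuple


def average_precision(ranked: Sequence[Hashable],
                      relevant: Set[Hashable]) -> float:
    """Average of precision@k over the ranks k where a relevant doc appears.

    Divides by the total number of relevant documents, so relevant
    documents that were never retrieved contribute zero precision.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant)


def mean_average_precision(
        runs: Iterable[Tuple[Sequence[Hashable], Set[Hashable]]]) -> float:
    """Mean of average_precision over all (ranked list, relevant set) pairs."""
    scores: List[float] = [average_precision(r, rel) for r, rel in runs]
    return sum(scores) / len(scores) if scores else 0.0
```

For example, the ranking `["d1", "d2", "d3"]` with relevant set `{"d1", "d3"}` has average precision (1/1 + 2/3) / 2 = 5/6.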

Cheers,
Parth.


Jia Xu

2014-Mar-04 23:02 UTC

Thank you Parth. It is really helpful for me to understand the project.


-- 
Jia Xu

Jia Xu

2014-Mar-04 23:04 UTC

Thank you Olly, now I understand the current status of the code.


-- 
Jia Xu
