Hi,

I have several questions regarding the letor module. I looked at the framework for learning to rank in Xapian (http://rishabhmehrotra.com/gsoc/17.png) and I am a little confused.

Why use deep learning to find unsupervised features in the test data? In my understanding, a learning-to-rank model usually learns features from the training data and then applies the model to the test data, so why do the test set and the training set have different features? Deep learning is meant to extract hidden features from the data set, and I don't think it is necessary for this problem. Furthermore, I didn't see any implementation of deep learning in the source code. Is it actually included in letor?

For the source code at https://github.com/rishabhmehrotra/xapian/tree/397034af42c9b1998730160176d219d6f8f38b25/xapian-letor, the last update was about 2 years ago. Is that the latest version of the code? Several files, such as ranker.cc and evalmetric.cc, contain no function implementations, and I don't know whether they have been implemented somewhere else in the module (as far as I read through the source code, I didn't see any).

For the tests, are there any benchmark tests of the SVM-based or ListNet models on sample datasets, and what are their NDCG or MAP scores? (I didn't see any evaluation measures implemented in the current module.) And what about cross-validation on the training set? Is any such method included in the current project?

For the SVM method, I found that letor_learn_model() has been commented out, but I didn't find any other file containing this function (or maybe it is in letor_internal.cc?).

Finally, I found a file called letor_internal_refactored.cc. Is that the latest version of letor_internal.cc? Is letor_internal.cc still being used?

Thank you very much. I am waiting for your reply.

-- 
Jia Xu
On Mon, Mar 03, 2014 at 10:00:16PM -0800, Jia Xu wrote:
> For the source code
> https://github.com/rishabhmehrotra/xapian/tree/397034af42c9b1998730160176d219d6f8f38b25/xapian-letor,
> the last update is about 2 years ago, is that the latest version of the
> code? For several files such as ranker.cc, evalmetric.cc, there is no
> implementations of functions, I don't know if they have been implemented
> somewhere in the module(as far as I read through the source code, I didn't
> see any).

I believe that is the latest code. The student working on the project in 2012 dropped out (it turned out he was trying to do an internship as well as GSoC, which really isn't a good idea) so things may not be in a very polished state.

Your other questions all sound like good ones, but I don't really know the answers. Parth implemented the original letor code in 2011, and was mentoring the 2012 project most actively, plus he's more up on the theoretical side, so hopefully he can answer these questions.

Cheers,
    Olly
Hi Jia,

> I have several questions regarding the letor module,I looked at the
> framework of learning to rank in xapian
> http://rishabhmehrotra.com/gsoc/17.png, I am a little confused. Why using
> deep learning to find unsupervised features in test data? Since in my
> understanding, learning to rank model usually learn features from the
> training data then apply the model to the test data? Why test set and
> training set have different features? And deep learning is to extract
> hidden features from the data set, I don't think it is necessary to use it
> in this problem. Furthermore, I didn't see any implementation in the source
> code for deep learning, is it actually included in letor?

The GSoC project proposed by Rishabh was based on extracting unsupervised features using deep learning on top of the existing features, which are based on term frequency and related statistics. It was never a tested hypothesis that this would help; it was an added part. We later dropped the idea of adding this deep learning module, so you don't see any code related to it.

> For the source code
> https://github.com/rishabhmehrotra/xapian/tree/397034af42c9b1998730160176d219d6f8f38b25/xapian-letor,
> the last update is about 2 years ago, is that the latest version of the
> code? For several files such as ranker.cc, evalmetric.cc, there is no
> implementations of functions, I don't know if they have been implemented
> somewhere in the module(as far as I read through the source code, I didn't
> see any).

That is the latest version of the code and the starting point of this year's GSoC project. ranker.cc defines an abstract class which is inherited by the implemented rankers such as SVM, ListMLE and ListNET; the corresponding definitions can be found in their .cc files.
The evaluation part is yet to be completed as per the instructions given in evalmetric.h.

> For the tests, are there any benchmark tests on SVM based or listnet
> models on sample datasets and what the NDCG or MAP scores of them ( I
> didn't see any measure methods have been implemented in the current
> module)? And how about the cross validation for the training set? Is there
> any method included in the current project?

For the SVM based model, benchmarking is available at http://trac.xapian.org/wiki/GSoC2011/LTR/Notes#IREvaluationofLetorrankingscheme

Actually the first step of the new project will be to generate this figure for the SVM based model with the new refactored code, which was mostly written during GSoC 2012 but never tested. We would appreciate it if the prospective students of the Letor project could generate this value before the student selection deadline.

> For SVM method, I found letor_learn_model() has been commented out, but I
> didn't find any other file contain this function (or maybe in
> letor_internal.cc)?
>
> Finally I found a file called letor_internal_refactored.cc file, is that
> the latest version of letor_internal.cc ? Is letor_internal.cc
> still being used?

Right. The svmranker.cc is yet to be defined. Right now the SVM based ranker is available only in non-refactored form, which lies in letor_internal_refactored.cc

I think the best exercise is to prepare svmranker.cc from letor_internal_refactored.cc by implementing the necessary methods and generating the MAP score reported on INEX data; that would give you a better grip of the code. I would love to see a patch for it.

Cheers,
Parth.

> Thank you very much. I am waiting for your reply.
>
> --
> Jia Xu
>
> _______________________________________________
> Xapian-devel mailing list
> Xapian-devel at lists.xapian.org
> http://lists.xapian.org/mailman/listinfo/xapian-devel
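As a side note on the evaluation measures discussed in this thread (MAP and NDCG), here is a minimal sketch of how the two scores are commonly computed. It is written in Python for brevity, not in the module's C++. It is not code from xapian-letor and does not follow the instructions in evalmetric.h; all function names are hypothetical. It assumes binary relevance labels for MAP, graded gains with the (2^g - 1) / log2(rank + 1) formulation for NDCG, and, unless num_relevant is passed, that every relevant document appears somewhere in the ranked list.

```python
import math


def average_precision(relevances, num_relevant=None):
    """Average precision for one query.

    relevances: 0/1 labels of the returned documents, in ranked order.
    num_relevant: total relevant documents for the query; defaults to the
    number found in the list (an assumption of this sketch).
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevances, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    total = hits if num_relevant is None else num_relevant
    return precision_sum / total if total else 0.0


def mean_average_precision(ranked_lists):
    """MAP: the mean of average precision over all queries."""
    if not ranked_lists:
        return 0.0
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)


def dcg(gains, k=None):
    """Discounted cumulative gain over the top k graded labels."""
    top = gains if k is None else gains[:k]
    return sum((2 ** g - 1) / math.log2(rank + 1)
               for rank, g in enumerate(top, start=1))


def ndcg(gains, k=None):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0
```

For example, average_precision([1, 0, 1, 0]) is (1/1 + 2/3) / 2, roughly 0.833, and ndcg([3, 2, 1]) is 1.0 because the list is already in ideal order. Any comparison against the figure on the wiki page would need the same relevance judgements and the same cutoff as were used there.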
Thank you Parth. It is really helpful for me to understand the project.

On Tue, Mar 4, 2014 at 1:59 PM, Parth Gupta <pargup8 at gmail.com> wrote:
> [...]

-- 
Jia Xu
Thank you Olly, now I understand the current status of the code.

On Tue, Mar 4, 2014 at 4:13 AM, Olly Betts <olly at survex.com> wrote:
> [...]

-- 
Jia Xu