Mayank Chaudhary
2014-Mar-09 10:34 UTC
[Xapian-devel] Discuss a few things about already implemented methods in Rishabh's branch
Hi Parth,

Before I start working on svmranker.cc, I need to discuss a few things.

For *featurevector.h*:

1. I think it is a header file for the data structure used for storing a query's relevance, although the header itself says "This file responsible for transforming the document into the feature space" <https://github.com/rishabhmehrotra/xapian/blob/master/xapian-letor/featurevector.h#L1>. Also, all the methods there are *get* and *set* methods except *load_relevance*(). The same method is also present in featuremanager.h <https://github.com/rishabhmehrotra/xapian/blob/master/xapian-letor/featuremanager.h#L55>, and the two implementations are the same. I can't find the reason why the same method is present in two headers. http://trac.xapian.org/wiki/GSoC2012/LTR/TODO also shows that there shouldn't be a load_relevance() method in featurevector.h.

2. As Jiarong Wei mentioned in a mail, the data member *label* should be of type *bool* rather than *double*. The data member *fcount* is also unused.

3. As it is a feature vector, there should be a data member *queryid*, but I found that it is in ranklist.h <https://github.com/rishabhmehrotra/xapian/blob/master/xapian-letor/ranklist.h#L50>.

Other than that, I wanted to know whether ListMLE and ListNet have been tested. And what is autoencoder.cc for, and where is the "dimred/ya_ate_dimred.h" header that is included in it?

-Mayank
Parth Gupta
2014-Mar-09 11:10 UTC
[Xapian-devel] Discuss a few things about already implemented methods in Rishabh's branch
Hi Mayank,

> Before getting started to work on svmranker.cc, I need to discuss a few
> things.

Yes, it is a good idea to gain insight into the framework before actually starting to write something.

> For *featurevector.h*:
>
> 1. I think it is a header file for the data-structure used for storing a
> query relevance though it has been mentioned there that This file
> responsible for transforming the document into the feature space
> <https://github.com/rishabhmehrotra/xapian/blob/master/xapian-letor/featurevector.h#L1>.
> Also all the methods there are *get* and *set* except *load_relevance*().
> This same method is also present in featuremanager.h
> <https://github.com/rishabhmehrotra/xapian/blob/master/xapian-letor/featuremanager.h#L55>.
> Implementation wise too they are same. I can't find the reason why the
> same method is present in two headers.
> http://trac.xapian.org/wiki/GSoC2012/LTR/TODO also shows that there
> shouldn't be load_relevance() method in featurevector.h .

Some redundancy is to be expected, as the code has not been cleaned up and the project unfortunately could not be finished. Yes, load_relevance() belongs more naturally in featuremanager than in featurevector.

Basically, the featurevector operates at the document level and the ranklist operates at the query level. One query has many documents related to it. So all the values which are common to all of a query's documents will be in ranklist, and the information pertaining only to an individual document will reside in featurevector. Featuremanager does most of the work of constructing a featurevector and fetching the necessary statistics for it.

> 2. As it was mentioned in a mail by Jiarong Wei, the data member *label*
> should be of type *bool* rather than *double*. The data member *fcount*
> is also unused.

I just answered him that many Letor datasets have more than two relevance levels (Letor 3.0 and 4.0 have three relevance levels; the Yahoo! Letor dataset has 5). The idea behind keeping it a double is that when the ranking algorithm assigns a real-valued relevance to the feature vector, it will be stored in the same place. The evaluation metric should sort the documents based on this number. Yes, 'fcount' must be used, and that is a TODO.

> 3. As it is a feature vector then there should be data member *queryid*
> but I found out that it is in ranklist.h
> <https://github.com/rishabhmehrotra/xapian/blob/master/xapian-letor/ranklist.h#L50>.

See the explanation for point 1.

> Other than that I wanted to know that has ListMLE and ListNet been tested?
> And what is autoencoder.cc for and where is the "dimred/ya_ate_dimred.h"
> header that has been included in it?

ListMLE and ListNet have not been tested, and Rishabh did not mention their performance. We only have the benchmark evaluation of svmranker. Just ignore autoencoder.cc: it was part of Rishabh's idea to add unsupervised features, learned using deep learning, to the feature vector in addition to the conventional features.

Cheers,
Parth.
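
The split described in the reply (document-level data in featurevector, query-level data in ranklist, and a double-valued label that can hold either a graded relevance level or a ranker-assigned score) can be illustrated with a minimal C++ sketch. The names FeatureVectorSketch, RankListSketch and sort_by_label are hypothetical and chosen only for illustration; they are not the actual declarations in xapian-letor, and the real classes may differ.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the per-document data discussed in
// the thread. Not the real xapian-letor FeatureVector class.
struct FeatureVectorSketch {
    std::string docid;             // identifies one document
    std::vector<double> features;  // feature values for this document
    // Kept as double so it can hold a graded relevance level (0, 1, 2, ...)
    // from a dataset, or a real-valued score assigned later by a ranker.
    double label;
};

// Hypothetical stand-in for the per-query data: values shared by all
// documents of one query live here rather than in each feature vector.
struct RankListSketch {
    std::string queryid;                    // common to every document below
    std::vector<FeatureVectorSketch> docs;  // documents retrieved for the query

    // Order documents by descending label, as an evaluation metric would
    // need. stable_sort keeps the original order among equal labels.
    void sort_by_label() {
        std::stable_sort(docs.begin(), docs.end(),
                         [](const FeatureVectorSketch& a,
                            const FeatureVectorSketch& b) {
                             return a.label > b.label;
                         });
    }
};
```

In this sketch queryid appears once per rank list instead of once per document, which is the reason given above for it living in ranklist.h, and a bool label would not be able to represent either three-level or five-level relevance judgements or a real-valued score.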