Jiarong Wei
2014-Mar-09 00:36 UTC
[Xapian-devel] [GSOC 2014] Some questions about Letor module
Hi,

I've read the code of the letor module, and I have some questions about it:

1. In https://github.com/rishabhmehrotra/xapian/blob/master/xapian-letor/letor_internal.cc#L299, there is a write_to_file method, which saves a RankList into "train.txt". But the format of "train.txt" is different from the one described in http://trac.xapian.org/wiki/GSoC2011/LTR/Notes#QueryLevelNorm. And in https://github.com/rishabhmehrotra/xapian/blob/master/xapian-letor/letor_internal_refactored.cc#L716, Qid and DocID become optional. What format should we use for "train.txt"? Is there a sample "train.txt" available?

2. http://trac.xapian.org/wiki/GSoC2011/LTR/Notes#QueryLevelNorm says "the first column is the relevance judgement". I think the value of the relevance judgement is just 0 or 1, but the code saves it as a "double". Is that just for convenience, or have I misunderstood the whole thing?

3. I've got the qrels file for INEX 2010, but I can't find the query file. How can I get it? I can't find it on the INEX website.

Thank you!

Jiarong Wei
Parth Gupta
2014-Mar-09 08:52 UTC
[Xapian-devel] [GSOC 2014] Some questions about Letor module
Hi Jiarong Wei,

> 1. In https://github.com/rishabhmehrotra/xapian/blob/master/xapian-letor/letor_internal.cc#L299,
> there is a write_to_file method, which save RankList into "train.txt". But
> the format for "train.txt" is different from the one mentioned in
> http://trac.xapian.org/wiki/GSoC2011/LTR/Notes#QueryLevelNorm. And in
> https://github.com/rishabhmehrotra/xapian/blob/master/xapian-letor/letor_internal_refactored.cc#L716,
> Qid and DocID become optional. What format should we use for "train.txt"?
> Is there any sample "train.txt" available?

You can find a sample training file in the resources of the Learning-to-Rank project on the Xapian GSoC idea page.

> 2. In http://trac.xapian.org/wiki/GSoC2011/LTR/Notes#QueryLevelNorm, it
> mentioned "the first column is the relevance judgement". I think the value
> of the relevance judgement is just 0 or 1. But the code saves it as a
> "double". Is it just for convenience? Or I misunderstand the whole thing?

In the INEX set it is binary, but in other datasets it may take higher integer values and sometimes real values. Hence the double.

> 3. I've got qrels file of INEX 2010, but I can't find query file. How can I
> get it? I can't find it on INEX website.

Have you checked the instructions on that, which I recently added to the project idea page? Basically, you have to register on the INEX website to obtain the data.

Cheers,
Parth.
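[Editorial illustration, not part of the original thread.] To make the point about the "double" label and the optional Qid/DocID concrete, here is a minimal Python sketch of reading one training row. It assumes an SVMlight/LETOR-style layout of the form `<label> qid:<id> <feature>:<value> ... # docid = <id>`; the thread does not confirm that this is exactly what write_to_file produces, so check against the sample file mentioned above. The names `TrainingRow` and `parse_training_line` are hypothetical and do not come from xapian-letor.

```python
from typing import Dict, NamedTuple, Optional


class TrainingRow(NamedTuple):
    label: float                # relevance judgement (first column)
    qid: Optional[str]          # query id, None when the column is absent
    features: Dict[int, float]  # feature index -> feature value
    docid: Optional[str]        # document id from the trailing comment, if any


def parse_training_line(line: str) -> TrainingRow:
    """Parse one row of an ASSUMED SVMlight/LETOR-style training file."""
    # Everything after '#' is treated as a comment that may carry the docid.
    body, _, comment = line.partition("#")
    tokens = body.split()
    if not tokens:
        raise ValueError("empty training line")

    # Read the label as a float so that binary, graded and real-valued
    # judgements all go through the same code path.
    label = float(tokens[0])

    qid = None
    features = {}
    for token in tokens[1:]:
        key, sep, value = token.partition(":")
        if not sep:
            raise ValueError(f"malformed token: {token!r}")
        if key == "qid":
            qid = value
        else:
            features[int(key)] = float(value)

    # Accept "docid = 244", "docid=244" or a bare "244" in the comment.
    comment = comment.strip()
    docid = comment.split("=")[-1].strip() if comment else None

    return TrainingRow(label, qid, features, docid)
```

For example, `parse_training_line("1 qid:20 1:0.5 2:0.25 #docid = 244")` yields a row with a float label, a query id and a document id, while `parse_training_line("2.5 1:0.1")` still parses, with `qid` and `docid` left as `None`.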
Jiarong Wei
2014-Mar-09 09:58 UTC
[Xapian-devel] [GSOC 2014] Some questions about Letor module
Thanks for your reply!

For the third question: at https://inex.mmci.uni-saarland.de/data/documentcollection.jsp, I can find inex2010-article.qrels under the 2010 assessment, but I can't find the query files. Could you send me the link? I have registered on the INEX website.

And I also need to download ``INEX 2009 collection without annotation tags: (unofficial)`` from http://www.mpi-inf.mpg.de/departments/d5/software/inex/, right?

Thank you!

Jiarong Wei