Jiarong Wei
2014-Mar-09 09:58 UTC
[Xapian-devel] [GSOC 2014] Some questions about Letor module
Thanks for your reply!

For the third question: at https://inex.mmci.uni-saarland.de/data/documentcollection.jsp I can find inex2010-article.qrels under the 2010 assessments, but I can't find the query files. Could you send me the link? I have registered on the INEX website.

I also need to download ``INEX 2009 collection without annotation tags: (unofficial)`` from http://www.mpi-inf.mpg.de/departments/d5/software/inex/, right?

Thank you!

Jiarong Wei

On Mar 9, 2014, at 0:52, Parth Gupta <pargup8 at gmail.com> wrote:

> Hi Jiarong Wei,
>
>> 1. In https://github.com/rishabhmehrotra/xapian/blob/master/xapian-letor/letor_internal.cc#L299, there is a write_to_file method, which saves a RankList into "train.txt". But the format of "train.txt" is different from the one described in http://trac.xapian.org/wiki/GSoC2011/LTR/Notes#QueryLevelNorm. And in https://github.com/rishabhmehrotra/xapian/blob/master/xapian-letor/letor_internal_refactored.cc#L716, Qid and DocID become optional. What format should we use for "train.txt"? Is there a sample "train.txt" available?
>
> You can find a sample training file in the resources of the Learning-to-Rank project on the Xapian GSoC idea page.
>
>> 2. In http://trac.xapian.org/wiki/GSoC2011/LTR/Notes#QueryLevelNorm, it says "the first column is the relevance judgement". I think the value of the relevance judgement is just 0 or 1, but the code stores it as a "double". Is that just for convenience, or have I misunderstood the whole thing?
>
> In the INEX set it is binary, but for other datasets it may take higher integer values and sometimes real values. Hence the double.
>
>> 3. I've got the qrels file of INEX 2010, but I can't find the query file. How can I get it? I can't find it on the INEX website.
>
> Have you checked the instructions about that, which I recently added to the project idea page? Basically, you have to register on the INEX website to obtain the data.
>
> Cheers,
> Parth.
>
>> Thank you!
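For reference, the training-file layout raised in questions 1 and 2 can be sketched in a few lines of Python. This is only a minimal sketch under an assumption: that each line follows the common LETOR/SVMlight-style layout "<label> qid:<id> <index>:<value> ... #docid=<id>". The thread does not confirm that xapian-letor writes exactly this layout; the sample training file on the GSoC idea page remains the authority. The names TrainingRow and parse_training_line are hypothetical and are not part of Xapian. The sketch reads the label as a float (so binary, graded, and real-valued judgements all fit) and treats qid and docid as optional.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TrainingRow:
    """One line of a training file (assumed LETOR/SVMlight-style layout)."""
    label: float                  # relevance judgement, kept as a float
    qid: Optional[str] = None     # optional query id
    docid: Optional[str] = None   # optional document id from the trailing comment
    features: Dict[int, float] = field(default_factory=dict)


def parse_training_line(line: str) -> TrainingRow:
    """Parse '<label> [qid:<id>] <index>:<value> ... [#docid=<id>]'."""
    # Everything after '#' is treated as a comment that may carry the docid.
    body, _, comment = line.partition("#")
    tokens = body.split()
    if not tokens:
        raise ValueError("empty training line")

    # First column: the relevance judgement (0/1 for INEX, possibly graded or real elsewhere).
    row = TrainingRow(label=float(tokens[0]))

    for token in tokens[1:]:
        key, sep, value = token.partition(":")
        if not sep:
            raise ValueError(f"malformed token: {token!r}")
        if key == "qid":
            row.qid = value
        else:
            row.features[int(key)] = float(value)

    comment = comment.strip()
    if comment.startswith("docid"):
        # Accept both 'docid=123' and 'docid = 123'.
        row.docid = comment.partition("=")[2].strip() or None
    return row
```

A line with all fields, such as "1 qid:20 1:0.5 2:0.25 #docid=123", and a bare line such as "2.5 1:0.1" both parse; in the second case qid and docid are simply left as None.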
Parth Gupta
2014-Mar-09 10:07 UTC
[Xapian-devel] [GSOC 2014] Some questions about Letor module
The queries are usually referred to as topics.

> Thanks for your reply! For the third question: In
> https://inex.mmci.uni-saarland.de/data/documentcollection.jsp, I can
> find inex2010-article.qrels in 2010 assessment, but can't find query files.
> Could you send me the link?

2010: https://inex.mmci.uni-saarland.de/protected/adhoc/2010-topics.xml
2009: https://inex.mmci.uni-saarland.de/protected/adhoc/2009-topics.zip

> I have registered on INEX website. And I also need to download ``INEX 2009
> collection without annotation tags: (unofficial)`` on
> http://www.mpi-inf.mpg.de/departments/d5/software/inex/, right?

Right, those would be the documents to be indexed.

Parth.
Jiarong Wei
2014-Mar-09 10:12 UTC
[Xapian-devel] [GSOC 2014] Some questions about Letor module
Thank you very much, Parth!

Jiarong Wei