Hello,

I would like to discuss an idea about ranking in IR systems which I think could be a good extension to Xapian. If I am not too late to discuss it, please consider it.

First, some brief background: I am a Master's student working on my thesis in Information Retrieval. Only today I received a mail about GSoC, and more precisely about Xapian, from a professor in Europe whom I am going to join for my Ph.D.

Generally, ranking is unsupervised: the rank list is produced from the score given by a ranking function such as BM25 or TF-IDF, and documents are returned in decreasing order of that score.

Learning to rank, by contrast, involves supervised learning. If we can extract features for pairs made of a query and an initially retrieved document, then we can say which document should rank above which. A search engine needs the relevant documents at the top, because users generally never bother to click through to the next page of results; they would rather modify the query.

In Learning to Rank (Letor) we prepare features that represent a query-document pair. After the initial retrieval we take, say, the first 20 or 30 documents and represent them as feature vectors; then, based on the training data, the supervised learner gives a score to each document for a particular query. For example, if the learning is done by regression, we have to learn a vector 'W' which gives a score to the document's feature vector via a dot product.

The features can be term frequency, TF-IDF score, BM25 score and so on, as many good features as we can find. Many machine learning techniques are available for the learning step.

Regards,
Parth Gupta, M.Tech Candidate, DA-IICT, Gandhinagar, India.
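To make the re-ranking step in the message above concrete, here is a minimal sketch in Python. It assumes the weight vector 'W' has already been learned (for example by regression) and only shows how the top documents from an initial retrieval could be re-scored with a dot product. The document ids, feature values and weights are invented for illustration, and nothing here is Xapian API.

```python
# Hypothetical sketch: re-rank the top-k documents of an initial retrieval
# with a learned linear model. All names and numbers are illustrative.

def score(weights, features):
    """Dot product of the learned weight vector W and a feature vector."""
    return sum(w * x for w, x in zip(weights, features))

def rerank(weights, candidates):
    """Sort (doc_id, feature_vector) pairs by descending learned score."""
    return sorted(candidates, key=lambda c: score(weights, c[1]), reverse=True)

# Each feature vector: [term frequency, TF-IDF score, BM25 score].
candidates = [
    ("doc_a", [3.0, 0.42, 7.1]),
    ("doc_b", [5.0, 0.31, 6.4]),
    ("doc_c", [1.0, 0.77, 8.9]),
]

# Assumed to come from a separate training step; not learned here.
W = [0.1, 2.0, 0.5]

for doc_id, features in rerank(W, candidates):
    print(doc_id, round(score(W, features), 3))
```

In a real system the candidate list would be the first 20 or 30 results of the unsupervised ranking, and 'W' would be fitted on labelled query-document pairs rather than written by hand.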
Hey Olly and Richard,

Many research papers have shown that incorporating learning into ranking improves results on Information Retrieval evaluation measures such as MRR (Mean Reciprocal Rank) and MAP (Mean Average Precision). So I would certainly like to investigate it and incorporate it into the Xapian project.

Please give your feedback on whether this idea is worth exploring, so that I can incorporate it in my application.

Looking forward to your feedback.

Regards,
Parth
On Fri, Apr 01, 2011 at 02:48:28PM +0530, Parth Gupta wrote:
> In Learning to Rank (Letor) we prepare the features which can represent a
> query document pair. So now after the initial retrieval we take say first 20
> or 30 documents and represent them in form of feature vectors, now based on
> the training data our supervised learning will give a score to each document
> for a particular query. For example if this learning is from regression then
> we have to learn 'W' vector which will give a score to the document vector
> by dot product.
>
> Here the features can be term frequency, TF-IDF score, BM25 Score etc, as
> good as many. For Learning there are many machine learning techniques
> available.

What would be your plan for gathering data to train with? Some sort of click-through measurements?

On Sun, Apr 03, 2011 at 12:37:27PM +0530, Parth Gupta wrote:
> Please give your feedback on the possibility of exploration of the idea so
> that I can incorporate those things in my application.

It seems an interesting project to me, though I'm not sure I know enough about the area to offer much in the way of useful insights. I can probably ask some stupid questions though.

But I'm certainly happy to consider an application from you for working on this.

Cheers,
Olly