The Letor project involves a decent amount of Machine Learning, while all
the ranking-related projects are around IR. It's better to introduce your
idea on the mailing list, where all the mentors can have a detailed look at
it, potential mentors can respond, and the idea is kind of registered under
your name.

Cheers,
Parth.

On Wed, Feb 26, 2014 at 10:20 AM, Olly Betts <olly at survex.com> wrote:
> On Tue, Feb 25, 2014 at 03:58:09PM +0530, karthik iyer wrote:
> > I am C Karthik Iyer, a 3rd year B Tech student at NITK Surathkal. I am
> > interested in working on projects on Information Retrieval and Machine
> > Learning. I've had previous experience working on projects regarding
> > Question Answering Systems.
> > I have a project idea which includes both IR and ML but I don't know how
> > feasible the idea is. Could you guys say when you will be available on
> > IRC so that I can discuss the idea with you?
>
> I can't say for certain when I'll be monitoring IRC, but I'm in UTC+13.
> Other mentors are in a variety of timezones.
>
> If the idea is complex, email might be better though.
>
> Cheers,
> Olly
>
> _______________________________________________
> Xapian-devel mailing list
> Xapian-devel at lists.xapian.org
> http://lists.xapian.org/mailman/listinfo/xapian-devel
Hello,

So my idea goes like this. Basically I have been working on Question
Answering systems. I developed a QA system for "when" type questions (sorry,
I can't provide the source code at the moment because my paper is under
review at SIGIR 2014). I used part-of-speech information and developed a
weighted scoring system.

Now I basically plan on developing a generic QA system which encompasses a
large number of question types. The biggest drawback of my previous QA system
was the lack of a relevance-measuring mechanism. I want to develop a
relevance measure between a query and a sentence. I believe there already
exist many relevance measures, but those relate a query to a document (as
far as I know). To develop a relevance measure I need to take into
consideration a large number of sentences and questions so that a generic
feature set can be formed, which will then be employed in my ML algorithm.
This needs a huge dataset of documents, which I don't have due to lack of
any financial support. I was planning to use the AQUAINT 2 dataset but it
costs $500, which I cannot afford.

Now if I am successful at building a relevance-measuring system between a
query and a sentence, then I will take into consideration only those
sentences that are relevant. Then I will apply my scoring system to those
sentences, which will help me select the final answer sentence. In my
previous project I got an efficiency of ~74% tested on 200 test queries. I
believe that with a proper relevance measure I can cross the 90% mark.

Please give your suggestions on my project idea. It would be very helpful.

Regards
Karthik

On Wed, Feb 26, 2014 at 5:16 PM, Parth Gupta <pargup8 at gmail.com> wrote:
> The Letor project involves a decent amount of Machine Learning, while all
> the ranking-related projects are around IR. It's better to introduce your
> idea on the mailing list, where all the mentors can have a detailed look at
> it, potential mentors can respond, and the idea is kind of registered under
> your name.
>
> Cheers,
> Parth.
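The two-stage idea described in the message above (first keep only relevant sentences, then apply a scoring system to pick the answer) can be sketched roughly as follows. This is only an illustration under stated assumptions: `relevance` and `when_score` are hypothetical stand-ins (simple term overlap and a count of year-like tokens), not the proposed ML relevance measure or the part-of-speech based scoring system, neither of which is described in the thread.

```python
import re


def tokens(text):
    """Lowercase word tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def relevance(query, sentence):
    """Hypothetical stand-in: fraction of query terms found in the sentence."""
    q = set(tokens(query))
    s = set(tokens(sentence))
    return len(q & s) / len(q) if q else 0.0


def when_score(sentence):
    """Hypothetical stand-in for the scoring system: counts four-digit years."""
    return len(re.findall(r"\b(?:1[0-9]{3}|20[0-9]{2})\b", sentence))


def answer(query, sentences, threshold=0.5):
    """Stage 1: keep relevant sentences. Stage 2: return the best-scoring one."""
    relevant = [s for s in sentences if relevance(query, s) >= threshold]
    if not relevant:
        return None
    return max(relevant, key=when_score)


sentences = [
    "The Eiffel Tower was completed in 1889.",
    "Paris hosted the Olympics in 1900 and 1924.",
    "The tower is made of iron.",
]
best = answer("When was the Eiffel Tower completed?", sentences)
```

In this toy data the second sentence contains more years, but it is dropped by the relevance filter before scoring, which is the effect the filtering stage is meant to have.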
On Thu, Feb 27, 2014 at 01:11:24PM +0530, karthik iyer wrote:
> So my idea goes like this. Basically I have been working on Question
> Answering systems. I developed a QA system for "when" type questions (sorry
> I can't provide the source code at the moment because my paper is under
> review at SIGIR 2014). I used the part-of-speech and developed a weighted
> scoring system.
> Now I basically plan on developing a generic QA system which encompasses a
> large number of questions. The biggest drawback of my previous QA system
> was the lack of relevance measuring mechanism. I want to develop a
> relevance measure between a query and a sentence. I believe there already
> exist many relevance measuring codes but those relate a query to a
> document (as far as I know).

The term "document" is what the literature uses, but the mental image that
might conjure up of a multi-page printout with a staple through the corner
is misleading. The "documents" being matched could be single sentences.

> To develop a relevance measure I need to take
> into consideration a large number of sentences and questions so that a
> generic feature set can be formed which will further be employed in my ML
> algorithm. This needs a huge dataset of documents which I don't have due to
> lack of any financial support. I was planning to use the AQUAINT 2 dataset
> but it costs $500 which I cannot afford.
> Now if I am successful at building a relevance measuring system between a
> query and a sentence then I will take into consideration only those
> sentences that are relevant. Then I will apply my scoring system to those
> sentences which will help me select the final answer sentence. In my
> previous project I got an efficiency of ~74% tested on 200 test queries. I
> believe that with a proper relevance measure I can cross the 90% mark.
> Please give your suggestions on my project idea. It would be very helpful.

The first concern I have is whether this is something we actually have the
skills to mentor.
I personally don't have any previous experience of Question Answering
systems - I don't know about the other mentors.

I'm also unclear where Xapian fits into the picture. Are you talking about
building this as a new feature for Xapian? Or is it a framework or
application built on top of Xapian? Or is it a separate system entirely?

Cheers,
    Olly
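To illustrate the point that the "documents" being matched could be single sentences, here is a minimal sketch that treats each sentence as its own document and ranks them with a BM25-style score, a standard IR ranking function. It is plain Python and does not use the Xapian API; the function name `bm25_rank` and the parameter defaults `k1=1.2` and `b=0.75` are assumptions chosen for the example.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase word tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query, sentences, k1=1.2, b=0.75):
    """Rank sentences against a query, treating each sentence as one document."""
    docs = [Counter(tokens(s)) for s in sentences]
    n = len(docs)
    lengths = [sum(d.values()) for d in docs]
    avg_len = sum(lengths) / n if n else 0.0
    # Document frequency: number of sentences containing each term.
    df = Counter(term for d in docs for term in d)
    query_terms = set(tokens(query))
    scored = []
    for sentence, d, length in zip(sentences, docs, lengths):
        score = 0.0
        for term in query_terms:
            tf = d.get(term, 0)
            if tf == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf + k1 * (1 - b + b * length / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scored.append((score, sentence))
    # Highest score first; ties keep their input order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


sentences = [
    "Xapian is a search engine library.",
    "The weather is nice today.",
    "A search engine ranks documents for a query.",
]
ranked = bm25_rank("search engine library", sentences)
```

Nothing in the scoring depends on how long a "document" is, so the same query-to-document machinery applies unchanged at sentence granularity.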