thr3ads.net - Xapian devel - [Xapian-devel] GSoC 2014 [Feb 2014]

Parth Gupta

2014-Feb-26 11:46 UTC

[Xapian-devel] GSoC 2014

The Letor project involves a decent amount of Machine Learning, while all the
ranking-related projects are around IR. It's better to introduce your idea
on the mailing list, where all the mentors can have a detailed look at it,
potential mentors can respond, and the idea is kind of registered under your
name.

Cheers,
Parth.


On Wed, Feb 26, 2014 at 10:20 AM, Olly Betts <olly at survex.com> wrote:
> On Tue, Feb 25, 2014 at 03:58:09PM +0530, karthik iyer wrote:
> >     I am C Karthik Iyer, a 3rd year B Tech student at NITK Surathkal. I am
> > interested in working on projects on Information Retrieval and Machine
> > Learning. I've had previous experience working on projects regarding
> > Question Answering Systems.
> >     I have a project idea which includes both IR and ML but I don't know how
> > feasible the idea is. Could you guys say when you will be available on IRC
> > so that I can discuss the idea with you?
>
> I can't say for certain when I'll be monitoring IRC, but I'm in UTC+13.
> Other mentors are in a variety of timezones.
>
> If the idea is complex, email might be better though.
>
> Cheers,
>     Olly
>
> _______________________________________________
> Xapian-devel mailing list
> Xapian-devel at lists.xapian.org
> http://lists.xapian.org/mailman/listinfo/xapian-devel

karthik iyer

2014-Feb-27 07:41 UTC

[Xapian-devel] GSoC 2014

Hello,

So my idea goes like this. I have been working on Question Answering
systems. I developed a QA system for "when" type questions (sorry, I can't
provide the source code at the moment because my paper is under review at
SIGIR 2014). I used part-of-speech information and developed a weighted
scoring system.

Now I plan to develop a generic QA system which handles a large number of
questions. The biggest drawback of my previous QA system was the lack of a
relevance measuring mechanism. I want to develop a relevance measure between
a query and a sentence. I believe many relevance measures already exist, but
those relate a query to a document (as far as I know).

To develop a relevance measure I need to consider a large number of
sentences and questions, so that a generic feature set can be formed which
will then be used in my ML algorithm. This needs a huge dataset of
documents, which I don't have due to lack of any financial support. I was
planning to use the AQUAINT 2 dataset but it costs $500, which I cannot
afford.

If I succeed in building a relevance measure between a query and a sentence,
I will consider only those sentences that are relevant. Then I will apply my
scoring system to those sentences, which will help me select the final
answer sentence. In my previous project I got an efficiency of ~74%, tested
on 200 test queries. I believe that with a proper relevance measure I can
cross the 90% mark.

Please give your suggestions on my project idea. It would be very helpful.

Regards
Karthik


Olly Betts

2014-Feb-28 12:43 UTC

[Xapian-devel] GSoC 2014

On Thu, Feb 27, 2014 at 01:11:24PM +0530, karthik iyer wrote:
> So my idea goes like this. Basically I have been working on Question
> Answering systems. I developed a QA system for "when" type questions (sorry
> I can't provide the source code at the moment because my paper is under
> review at SIGIR 2014). I used the part-of-speech and developed a weighted
> scoring system.
> Now I basically plan on developing a generic QA system which encompasses a
> large number of questions. The biggest drawback of my previous QA system
> was the lack of relevance measuring mechanism. I want to develop a
> relevance measure between a query and a sentence. I believe there already
> exist many relevance measuring codes but those relate a query to a
> document (as far as I know).

The term "document" is what the literature uses, but the mental image
that might conjure up of a multi-page printout with a staple through the
corner is misleading.  The "documents" being matched could be single
sentences.
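
To make the sentence-as-document idea concrete, here is a minimal sketch in
plain Python. It does not use the Xapian API; it only illustrates that a
standard query-to-document weighting (a textbook BM25 formula here, the same
family as Xapian's default weighting scheme) works unchanged when each
"document" is one sentence. The function names `tokenize` and `bm25_rank`,
the parameter defaults, and the example sentences are all assumptions made
up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query, sentences, k1=1.2, b=0.75):
    """Score each sentence against the query, one sentence per 'document'.

    Returns a list of (score, sentence) pairs, highest score first.
    k1 and b are the usual BM25 tuning parameters (hypothetical defaults).
    """
    docs = [tokenize(s) for s in sentences]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs if n_docs else 0.0

    # Document frequency: in how many sentences does each term occur?
    df = Counter()
    for d in docs:
        df.update(set(d))

    query_terms = set(tokenize(query))
    scored = []
    for sentence, d in zip(sentences, docs):
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scored.append((score, sentence))

    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


if __name__ == "__main__":
    # Made-up example data, not taken from any real corpus.
    candidates = [
        "The library was released in 2005.",
        "It is raining today.",
        "Cats sleep a lot.",
    ]
    for score, sentence in bm25_rank("When was the library released?", candidates):
        print(f"{score:.3f}  {sentence}")
```

A filtering step like the one described in the proposal could then keep only
the sentences whose score is above some threshold before applying a separate
answer-scoring stage; where to set that threshold is left open here.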

> To develop a relevance measure I need to take
> into consideration a large number of sentences and questions so that a
> generic feature set can be formed which will further be employed in my ML
> algorithm. This needs a huge dataset of documents which I don't have due to
> lack of any financial support. I was planning to use the AQUAINT 2 dataset
> but it costs $500 which I cannot afford.
> Now if I am successful at building a relevance measuring system between a
> query and a sentence then I will take into consideration only those
> sentences that are relevant. Then I will apply my scoring system to those
> sentences which will help me select the final answer sentence. In my
> previous project I got an efficiency of ~74% tested on 200 test queries. I
> believe that with a proper relevance measure I can cross the 90% mark.
> Please give your suggestions on my project idea. It would be very helpful.

The first concern I have is whether this is something we actually have
the skills to mentor.  I personally don't have any previous experience
of Question Answering systems - I don't know about the other mentors.

I'm also unclear where Xapian fits into the picture.

Are you talking about building this as a new feature for Xapian?

Or is it a framework or application built on top of Xapian?

Or is it a separate system entirely?

Cheers,
    Olly
