Dear Sir,

I am Pankaj Singhal from Jaipur, India. I am very interested in getting involved in the Learning-to-Rank project and am strongly looking forward to it.

I have relevant experience in this field. Last semester I worked on a similar task: ranking the URLs of a large dataset based on their attribute values. The dataset contained hundreds of thousands of URLs, each with around 33,000 features and a binary class label of +1 or -1. I applied decision tree induction (Gini index) to filter the URLs and then a RANKSUM [1] metric, which uses a weighted-sum approach, to rank them.

The current implementation first ranks the documents for a query in an unsupervised way and then applies a supervised learning algorithm, SVM, to the first 'n' documents retrieved. A similar approach could be used to extend the ranking problem with a better supervised learning algorithm and a probabilistic model such as Bayesian belief networks, i.e. it could be applied after fetching 'n' documents from either of the two stages, the unsupervised ranking or the SVM ranking. Incorporating a pairwise approach would also be a good idea; various algorithms are available.

[1] Rank-Order Weighting of Web Attributes for Website Evaluation - Mehri Saeid, Abdul Azim Abd Ghani, and Hasan Selamat

regards,

--
Pankaj Singhal
III Year, CSE
The LNMIIT, Jaipur, India.

Mob: +918875053936
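For illustration only, a minimal sketch of the weighted-sum scoring step described above (not the RANKSUM implementation itself), assuming each URL has already been reduced to a small vector of attribute values by the decision-tree filter; the struct, function names and weights here are hypothetical:

    // Hypothetical weighted-sum ranking sketch: score each URL by the dot
    // product of its filtered attribute values and a weight vector, then
    // sort by descending score. The weights are made up for illustration.
    #include <algorithm>
    #include <iostream>
    #include <numeric>
    #include <string>
    #include <vector>

    struct UrlRecord {
        std::string url;
        std::vector<double> attrs;  // attribute values kept after filtering
    };

    double weighted_sum(const std::vector<double>& attrs,
                        const std::vector<double>& weights) {
        return std::inner_product(attrs.begin(), attrs.end(),
                                  weights.begin(), 0.0);
    }

    int main() {
        std::vector<double> weights = {0.5, 0.3, 0.2};  // hypothetical rank-order weights
        std::vector<UrlRecord> urls = {
            {"http://a.example", {0.9, 0.1, 0.4}},
            {"http://b.example", {0.2, 0.8, 0.7}},
        };
        std::sort(urls.begin(), urls.end(),
                  [&](const UrlRecord& x, const UrlRecord& y) {
                      return weighted_sum(x.attrs, weights) >
                             weighted_sum(y.attrs, weights);
                  });
        for (const auto& u : urls) std::cout << u.url << '\n';
    }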
Hello Pankaj,

Based on the updated information about the LTR project (link below), you should be able to think more specifically about your idea, and if you like you can discuss the finer details here or on IRC.

http://trac.xapian.org/wiki/GSoCProjectIdeas#Project:LearningtoRank

For the general details you may also want to refer to http://trac.xapian.org/wiki/GSoC2012

Thanks for your interest.

Parth.

On Sun, Mar 25, 2012 at 2:27 AM, pankaj singhal <pankajsinghal at ieee.org> wrote:
> [...]
Hi Pankaj,

Nice to see that you have chosen the algorithm. Yes, indeed ListMLE would be a nice choice, since the difference between ListNet and ListMLE is the loss function: the former minimises the cross entropy while the latter minimises the likelihood loss.

It would be good if you investigate this and try to cover it in your proposal.

Parth.

> Here is the idea which I want to incorporate and which would be a good
> extension to the LTR project and Xapian.
> I want to implement the ListMLE [1] algorithm in Xapian. The algorithm uses
> a listwise approach, with a neural network as the model and gradient descent
> as the optimisation algorithm (a highly optimised loss function). ListMLE is
> an extension of ListNet [2], which itself is (somewhat) an extension of
> RankNet [3]. ListMLE has shown better performance than the other two, and it
> has linear complexity.
>
> Regarding the features for each query-document pair, research has identified
> many good features that help tune the parameters of the ranking function so
> that it can differentiate documents better. These can be calculated from the
> basic set of features (tf, idf, BM25, etc.); the more, the better.
>
> For training data we can use the OHSUMED [4] data-set, a benchmark data-set
> released in LETOR 2.0 (Microsoft Research) and used by the developers of the
> algorithm for training and testing. This data-set is reliable because the
> relevance degrees of documents with respect to the queries are judged by
> humans, and it adopts the "standard" features proposed in the IR community.
> Similar features to those used in the data-set can be incorporated while
> implementing the algorithm in Xapian.
>
> Implementing this algorithm would definitely be a good improvement to the
> current LTR project, as it uses a listwise approach, which is considerably
> better than the current pointwise approach. The OHSUMED data-set also uses
> more and better features than those currently used, which we could adopt.
>
> Please give feedback on the idea and suggest any further exploration needed.
>
> [1] http://research.microsoft.com/en-us/people/tyliu/icml-listmle.pdf
> [2] http://research.microsoft.com/apps/pubs/default.aspx?id=70428
> [3] http://research.microsoft.com/en-us/um/people/cburges/papers/ICML_ranking.pdf
> [4] http://research.microsoft.com/en-us/um/beijing/projects/letor//letor-old.aspx
>
> regards,
>
> On Wed, Mar 28, 2012 at 7:58 PM, Parth Gupta <parthg.88 at gmail.com> wrote:
>
>> Pankaj,
>>
>> FANN looks fine, but in the proposal I would like to see something specific
>> about what you plan to do with it, such as implementing RankNet, ListNet, or
>> something else.
>>
>> Parth.
>>
>> On Wed, Mar 28, 2012 at 6:19 AM, Olly Betts <olly at survex.com> wrote:
>>
>>> On Tue, Mar 27, 2012 at 05:26:45PM +0530, pankaj singhal wrote:
>>> > I have come across these C++ neural-network frameworks:
>>> > FANN <http://leenissen.dk/fann/wp/>
>>> > Libann <http://www.nongnu.org/libann/doc/libann_4.html#SEC17>
>>>
>>> Did you check the licences? Libann's site clearly says it's GPL and, as
>>> I said in the message you replied to, we'd rather not add more GPL
>>> dependencies.
>>>
>>> > I want you to look at the libraries, as incorporating them reduces the
>>> > need to implement the ML algorithms from scratch.
>>>
>>> FANN says it is LGPL, which is probably OK. I've no idea if it fulfils
>>> the needs of the project. Parth may be able to comment more usefully,
>>> but ultimately you'll need to show us in your proposal that the
>>> libraries you're intending to use are suitable, so you'll need to look
>>> into this more deeply yourself.
>>>
>>> Cheers,
>>>     Olly
>>
>
> --
> Pankaj Singhal
> III Year, CSE
> The LNMIIT, Jaipur, India.
>
> Mob: +918875053936
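As a concrete reference point for the loss-function difference mentioned above, here is a minimal sketch of the ListMLE (Plackett-Luce) negative log-likelihood for a single query, assuming the model's scores have already been arranged in the ground-truth ranking order. This is only an illustration of the loss, not the project's or the paper's code, and the function name is hypothetical:

    // ListMLE loss sketch for one query.
    // `scores` holds f(x_i) for the query's documents, already ordered by the
    // ground-truth (ideal) ranking. The loss is
    //   sum_i [ log( sum_{k >= i} exp(s_k) ) - s_i ],
    // i.e. the negative log Plackett-Luce likelihood of that ordering.
    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <iostream>
    #include <vector>

    double listmle_loss(const std::vector<double>& scores) {
        double loss = 0.0;
        for (std::size_t i = 0; i < scores.size(); ++i) {
            // Log-sum-exp over the suffix, shifted by its maximum for stability.
            double max_s = scores[i];
            for (std::size_t k = i + 1; k < scores.size(); ++k)
                max_s = std::max(max_s, scores[k]);
            double sum = 0.0;
            for (std::size_t k = i; k < scores.size(); ++k)
                sum += std::exp(scores[k] - max_s);
            loss += max_s + std::log(sum) - scores[i];
        }
        return loss;
    }

    int main() {
        // Scores for three documents, listed in their ideal order.
        std::vector<double> scores = {2.0, 1.0, 0.5};
        std::cout << "ListMLE loss: " << listmle_loss(scores) << '\n';
    }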
Parth & Olly,

I have submitted the proposal on the Google Melange site. Please have a look at it and provide your comments so that I can improve it.

regards,

On Sat, Mar 31, 2012 at 12:06 AM, pankaj singhal <pankajsinghal at ieee.org> wrote:

> Parth,
>
> It would be very nice if you could send me the proposal you made last year
> so that I can refer to it for mine, if you are OK with that.
>
> On Fri, Mar 30, 2012 at 11:39 PM, pankaj singhal <pankajsinghal at ieee.org> wrote:
>
>> Parth,
>>
>> As I am new to the formalities of submitting the application and writing
>> a good proposal, I would like your feedback on my application so that I
>> can improve it accordingly. Also, how am I supposed to share the proposal?
>>
>> regards
>>
>> On Fri, Mar 30, 2012 at 5:34 PM, Parth Gupta <parthg.88 at gmail.com> wrote:
>>
>>> [...]

--
Pankaj Singhal
III Year, CSE
The LNMIIT, Jaipur, India.

Mob: +918875053936