thr3ads.net - Xapian devel - GSoC 2017: Letor Click Data Mining [Mar 2017]

Vivek Pal

2017-Mar-23 06:18 UTC

GSoC 2017: Letor Click Data Mining

> You could do that by identifying the search session instead of the user,
> which makes it closer to what we need than to something that might trip you
> into privacy concerns.
Okay, that would be much better. :)
> Third records some information about what sort of query it is — add,
> morelike or a plain query. Last provides the estimated match size and then
> the HTTP referrer if one were set. Neither is particularly interesting in
> this case.
Thanks for the explanation. So, as I understand it, we'll need some more info
to be logged than this to be able to train click models for relevance judgements.
> and you'll need a way to use letor from omega, or you'll have trained a
> model for no good reason :)
Sorry, I may have misunderstood you here, but why would we need a way to use
letor from omega? For training the Letor module, wouldn't we just need two files,
i.e. Query and Qrel, as mentioned in the xapian-letor docs? The Letor API can then
generate the final training file from those two files.

And to mine the relevance judgements for the Qrel file from the logs, we'll
need to train one of the click models, such as DBM.
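For illustration, here is a minimal Python sketch of one way click logs could be turned into Qrel lines. It uses the simplified DBN estimator (continuation probability fixed at 1, as in Chapelle and Zhang's DBN click model paper) rather than the full EM-trained model. It assumes a hypothetical per-session record holding a query id, the ranked document ids shown, and the set that was clicked; omega does not log this today. The `Session` type, the function names, the label mapping and the `query_id Q0 doc_id label` line layout (TREC-style) are all assumptions, not checked against the xapian-letor docs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical log record for one search session."""
    query_id: str
    shown: list[str]   # document ids in the rank order they were displayed
    clicked: set[str]  # document ids the user clicked


def sdbn_relevance(sessions):
    """Estimate relevance per (query_id, doc_id) with the simplified DBN.

    attractiveness = clicks / sessions where the doc was at or above the last click
    satisfaction   = times it was the last click / clicks
    relevance      = attractiveness * satisfaction
    """
    examined = defaultdict(int)
    clicks = defaultdict(int)
    last_clicks = defaultdict(int)
    for s in sessions:
        click_ranks = [i for i, d in enumerate(s.shown) if d in s.clicked]
        if not click_ranks:
            continue  # sessions without clicks are skipped in this sketch
        last = click_ranks[-1]
        # Every result down to and including the last click counts as examined.
        for d in s.shown[: last + 1]:
            key = (s.query_id, d)
            examined[key] += 1
            if d in s.clicked:
                clicks[key] += 1
        last_clicks[(s.query_id, s.shown[last])] += 1

    relevance = {}
    for key, n in examined.items():
        c = clicks[key]
        attractiveness = c / n
        satisfaction = last_clicks[key] / c if c else 0.0
        relevance[key] = attractiveness * satisfaction
    return relevance


def to_qrel_lines(relevance, max_label=4):
    """Map relevance in [0, 1] to integer labels 0..max_label (hypothetical)."""
    lines = []
    for (qid, did), r in sorted(relevance.items()):
        label = min(max_label, int(r * (max_label + 1)))
        lines.append(f"{qid} Q0 {did} {label}")
    return lines
```

With two sessions for q1 over [d1, d2, d3], one clicking only d2 and one clicking d1 then d3, this gives relevance 0.0, 0.5 and 1.0, i.e. labels 0, 2 and 4. Because the simplified model only counts, it needs no EM iterations, which makes it a reasonable baseline to compare a full DBN against.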

Is there a better way to mine the relevance judgements than click models?
> Yes. But if you follow the walkthrough, it copies the uninstalled version
> of the omega CGI. omega is the CGI (I think).
Oh, I thought it'd be a .cgi file. Okay, so I just need to copy this omega
from /usr/local/lib/xapian-omega/bin to /usr/lib/cgi-bin and work with it.

Thanks,
Vivek

James Aylett

2017-Mar-23 22:32 UTC

On 23 Mar 2017, at 06:18, Vivek Pal <vivekpal.dtu at gmail.com> wrote:

[existing omega logging]
> So, as I understand it, we'll need some more info
> to be logged than this to be able to train click models for relevance
> judgements.
Definitely.
>> and you'll need a way to use letor from omega, or you'll have trained a
>> model for no good reason :)
> 
> Sorry, I may have misunderstood you here but why would we need a way to use
> letor from omega? For training Letor module, wouldn't we just need two files
> i.e. Query and Qrel as mentioned in the xapian-letor docs? Letor API can then
> generate the final training file using those two files.
Yes, but you need to then _use_ letor in displaying omega results. Otherwise
you've just trained the model.
> Is there a better way to mine the relevance judgements than click models?
There may be, but that's really a different project. If you find anything
that sounds promising, maybe add it for possible follow-up; I suspect
there's more than enough for a summer project already.

J

-- 
 James Aylett
 devfort.com — spacelog.org — tartarus.org/james/

Vivek Pal

2017-Mar-24 02:18 UTC

> Yes, but you need to then _use_ letor in displaying omega results.
> Otherwise you've just trained the model.
Okay, I got your point. Thanks.

Now, after all the discussion, I have a pretty clear understanding of the
different parts of this project and I can see what the workflow might look
like. I can now proceed to write my proposal. I'll submit it to the GSoC system
as soon as it's ready so that I can get feedback from you and Olly on how to
improve it further. I realise I'm already running a few days late.
> There may be, but that's really a different project. If you find anything
> that sounds promising, maybe add it for possible follow-up; I suspect
> there's more than enough for a summer project already.
So far, all I've encountered in the papers I've read are click models of
varying complexity.

One possible stretch goal: once we have one click model in place and working
this summer, we could add variants of it, or maybe even some of the more recent
models. That way, people would have more than one click model to choose from,
depending on their needs, just as there are different weighting schemes to
choose from in xapian-core.
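To make the comparison with weighting schemes concrete, here is a minimal Python sketch of a pluggable interface: every model learns per-(query, document) relevance from sessions, and callers choose one by name. The class names, the `MODELS` registry, `make_click_model` and the session tuple layout are all hypothetical, and the two models are deliberately simple stand-ins (a click-through-rate baseline and the classic cascade model), not DBN.

```python
from abc import ABC, abstractmethod
from collections import defaultdict

# Hypothetical session record: (query_id, shown doc ids in rank order, clicked doc ids)
Session = tuple[str, list[str], set[str]]


class ClickModel(ABC):
    """Common interface so click models can be swapped like weighting schemes."""

    @abstractmethod
    def train(self, sessions: list[Session]) -> None: ...

    @abstractmethod
    def relevance(self, query_id: str, doc_id: str) -> float: ...


class ClickThroughRate(ClickModel):
    """Baseline: clicks / impressions for each (query, doc) pair."""

    def train(self, sessions):
        self.shown = defaultdict(int)
        self.clicks = defaultdict(int)
        for qid, shown, clicked in sessions:
            for d in shown:
                self.shown[(qid, d)] += 1
                if d in clicked:
                    self.clicks[(qid, d)] += 1

    def relevance(self, query_id, doc_id):
        n = self.shown.get((query_id, doc_id), 0)
        return self.clicks.get((query_id, doc_id), 0) / n if n else 0.0


class Cascade(ClickModel):
    """Cascade model: results are examined top-down until the first click."""

    def train(self, sessions):
        self.examined = defaultdict(int)
        self.clicks = defaultdict(int)
        for qid, shown, clicked in sessions:
            for d in shown:
                self.examined[(qid, d)] += 1
                if d in clicked:
                    self.clicks[(qid, d)] += 1
                    break  # the model assumes nothing below the first click is examined

    def relevance(self, query_id, doc_id):
        n = self.examined.get((query_id, doc_id), 0)
        return self.clicks.get((query_id, doc_id), 0) / n if n else 0.0


MODELS = {"ctr": ClickThroughRate, "cascade": Cascade}


def make_click_model(name: str) -> ClickModel:
    """Pick a click model by name from the hypothetical registry."""
    return MODELS[name]()
```

For one session that clicks both d1 and d2 plus one that clicks only d1, the CTR baseline scores d2 at 0.5, while the cascade model scores it 0.0 because, under its assumption, d2 was never examined below the first click. Code that writes Qrel files only depends on the ClickModel interface, so adding a model means adding one class and one registry entry.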

Thanks,
Vivek
