Hi Александр, and welcome to Xapian!
I'm not quite sure what question you're asking, but hopefully the
following is helpful.
First, you should look at the wiki pages of the previous project that started
this work: <https://trac.xapian.org/wiki/GSoC2017/LetorClickstream>. Those
cover what was planned and what actually happened (there's a week-by-week
journal, which we ask all our students to keep during GSoC), and include a
write-up (the "work product") that was submitted to Google at the end. The
write-up says what was completed and what was left unmerged (although the PR
mentioned has since been completed and merged), and lists some ideas for
future work, which should be read alongside the project description on our
main page (there's some overlap).
This will answer some of your questions, and most immediately:
1. You ask about the advanced training method. This is described in the paper
Vivek based his work on; the paper is linked from the project description, and
the ACM page is linked from Vivek's project plan. (There's a rough sketch of
the current simple counting approach just after this list.)
2. Other training methods are mentioned by name, and you should be able to
find them in an academic literature search. If you don't have access to an
academic library then let us know, but if you do then we expect you to be
able to hunt down references (and indeed to find new options through a
search: IR is a field with ongoing work in both academia and industry, and
new ideas and research appear every year).
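To make the "simple counting algorithm" in that description a little more
concrete, here's a rough Python sketch of how the simplified DBN parameters
can be estimated by counting, as I understand it from the paper. This isn't
the xapian-letor code: the session format, function name and variable names
are all invented for illustration, and the advanced method in the paper
replaces these counts with EM plus the forward-backward algorithm over the
latent examination and satisfaction variables.

    # Rough sketch (not the xapian-letor code) of the "simple counting"
    # estimate for the simplified DBN click model.  The session format and
    # all names here are invented purely for illustration.
    from collections import defaultdict

    def estimate_dbn_by_counting(sessions):
        """sessions: list of (ranking, clicks) pairs for a single query,
        where ranking is the ordered list of docids shown and clicks is
        the set of docids the user clicked."""
        shown = defaultdict(int)       # appearances at or above the last click
        clicked = defaultdict(int)     # clicks on the document
        last_click = defaultdict(int)  # times the document was the final click

        for ranking, clicks in sessions:
            if not clicks:
                continue  # no click => no examination evidence in this scheme
            last_pos = max(ranking.index(c) for c in clicks)
            for doc in ranking[:last_pos + 1]:
                shown[doc] += 1
                if doc in clicks:
                    clicked[doc] += 1
            last_click[ranking[last_pos]] += 1

        # attractiveness ~ clicks / examinations,
        # satisfaction   ~ "was the last click" / clicks
        return {doc: (clicked[doc] / shown[doc],
                      last_click[doc] / clicked[doc] if clicked[doc] else 0.0)
                for doc in shown}

For example, a single session (["d1", "d2", "d3"], {"d2"}) gives d2 an
attractiveness and satisfaction of 1.0, d1 an attractiveness of 0.0, and
tells us nothing about d3 (it's below the last click, so we don't know
whether it was examined).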
For the timeline itself, that's something we expect you to produce a first
draft of. We can provide feedback, but you need to demonstrate that you
understand the problem and the possible solutions well enough to propose a
timeline of work across the summer. That's a key skill both for breaking down
and planning larger software projects and for working autonomously, which is
how Open Source projects typically operate.
You asked in another email about examples of Letor and Omega use. Right now,
because this project is not complete, there are no examples of their use
together. It's worth reading through the documentation for Omega (including
articles linked on the wiki) to get a feel for how people use it. Then the
"end to end" part of this project would be to think about how Letor
should be incorporated into that — the steps people will have to take, and what
software and documentation is missing for people to be able to do so. The
documentation in Vivek's work product should provide a good starting point
for that — it gets you from a running Omega with letor to a trained letor model
that can be used to rerank queries — but that reranking is not currently
available in Omega.
Finally, you asked about qualifying tasks — these don't have to be completed
by April 9th, since the assessment period for applications continues until May.
A couple of things come to mind:
* pick any small task or bug as suggested in our guidance notes (this would
be the preferred option)
* open a WIP ("work in progress" — ie not to be merged) pull request
that changes omega to rerank based on a trained letor model. This would just be
a quick approach rather than satisfying the project (for instance it
wouldn't include configuration — it would always try to rerank), but would
enable you to get familiar with Omega and with Letor.
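To show the shape of that quick approach: the real change would be C++
inside Omega itself, calling into xapian-letor's ranker (check the
xapian-letor headers for the actual API), but the following sketch, using
the Xapian Python bindings just because they're compact, shows the idea.
letor_score() is a made-up placeholder standing in for the trained model.

    # Conceptual sketch only: a real WIP PR would modify Omega's C++ and
    # use xapian-letor's ranker (see the xapian-letor headers for its API).
    # letor_score() below is a made-up stand-in for a trained letor model.
    import xapian

    def letor_score(query, doc):
        # Placeholder: a real version would compute (query, document)
        # features and score them with the trained letor model.
        return len(doc.get_data())

    def search_and_rerank(db_path, query_string, top_n=10):
        db = xapian.Database(db_path)
        qp = xapian.QueryParser()
        qp.set_stemmer(xapian.Stem("en"))
        qp.set_database(db)
        query = qp.parse_query(query_string)

        enquire = xapian.Enquire(db)
        enquire.set_query(query)
        matches = list(enquire.get_mset(0, top_n))

        # Always rerank the probabilistically-ranked matches by the letor
        # score; exactly the "no configuration" shortcut described above.
        matches.sort(key=lambda m: letor_score(query, m.document),
                     reverse=True)
        return matches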
J
> On 6 Apr 2019, at 08:59, Александр Слесарев <alexander.g.slesarev at
gmail.com> wrote:
>
> Hi! Can you give some additional info about "Learning to Rank
Clickstream Data Mining/Currently, DBN click model training is based on a simple
counting algorithm. There's an advanced version of training method given by
a combination of EM and forward-backward algorithm in the paper which is worth
having." to help me make a time schedule?
--
James Aylett
devfort.com — spacelog.org — tartarus.org/james/