Hello,

I am a third-year B.Tech student of Mathematics and Computing at IIT Delhi, and I would like to contribute to the organization during GSoC. I have built the code from git on my machine.

I am interested in the project "Learning to Rank Click Data Mining". From the project details given, I think the aim of the project is to make the Letor module more usable by providing training data to it from real-time search results. The training data is to be generated from the click data, which is basically the query-document pair.

I have gone through the format of the training data as described in xapian-letor/docs/letor.rst. I want to know: are we saving the click data or search log somewhere?

Please give me some advice on my understanding of this project.

Thanking you,
Gautam Dudeja
On 2 Mar 2017, at 11:49, Gautam Dudeja <gautam9622 at gmail.com> wrote:

> I am interested in the Project: Learning to Rank Click Data Mining. From the project details given, I think the aim of the project is to make the Letor module more usable by providing the training data to it from the real time search results.
> The training data is to be generated from the click data which is basically the query-document pair.
> I have gone through the format of Training data as provided in xapian-letor/docs/letor.rst.
> I want to know, Are we saving the click data/ search log somewhere?
> Please provide me some advice about my understanding of this project.

Hi Gautam. You're basically on the right track here; the wrinkle is that there's no standard format for click data, and so we'll have to provide some ways for people to get it. That will probably involve extending Omega to log things suitably, and should definitely include some clear documentation so people can get it working if they aren't on Omega.

J

--
James Aylett, occasional troublemaker & project governance
xapian.org
On Thu, Mar 02, 2017 at 05:19:56PM +0530, Gautam Dudeja wrote:

> I am interested in the Project: Learning to Rank Click Data Mining. From
> the project details given, I think the aim of the project is to make the
> Letor module more usable by providing the training data to it from the real
> time search results.
> The training data is to be generated from the click data which is basically
> the query-document pair.

That's some key information, but it's possible other data might be useful. A few examples:

* the rank of the clicked result in the result set
* how long the user took to choose that result
* if they came back and clicked on a different result

> I have gone through the format of Training data as provided in
> xapian-letor/docs/letor.rst.
> I want to know, Are we saving the click data/ search log somewhere?
> Please provide me some advice about my understanding of this project.

There isn't such a log currently.

What we're expecting for this project is that the applicant would look through the academic literature and pick a promising-looking approach that's been shown to work already (the GSoC timescale is really too short to develop a fresh approach).

But what we need to log depends on what data the chosen approach needs. Once that's known, we can define a sensible log format (or look to see if there's an existing log format that would work).

I'd suggest putting most of the effort into the actual mining of the data, but if the log format used isn't something already produced then you probably need to allow time to prototype something that produces it, so the system can actually be tried out end-to-end. For example, maybe configure the omega CGI search front-end to produce such a log.

Cheers,
Olly
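To make the discussion above a little more concrete, here is a rough sketch of what a click log carrying the fields Olly mentions (the query-document pair, the clicked rank, the time taken to click, and repeat clicks within a session) might look like. Nothing here is an existing Xapian or Omega format: the thread says no such log exists yet, so the record layout, the field names, the tab-separated encoding and all function names are assumptions made purely for illustration. The right fields would depend on whichever click model is eventually chosen.

```python
import csv
import io
from collections import Counter
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class ClickEvent:
    # Hypothetical record layout; not a format produced by Omega.
    session_id: str          # groups clicks made during one search session
    query: str               # query string as the user entered it
    docid: int               # identifier of the clicked document
    rank: int                # 1-based position of the clicked result
    seconds_to_click: float  # delay between showing results and the click


def write_log(events):
    """Serialise events as tab-separated text, one click per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for ev in events:
        writer.writerow(astuple(ev))
    return buf.getvalue()


def read_log(text):
    """Parse tab-separated log text back into ClickEvent objects."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [
        ClickEvent(sid, query, int(docid), int(rank), float(secs))
        for sid, query, docid, rank, secs in reader
    ]


def click_counts(events):
    """Count clicks per (query, docid) pair, the most basic signal."""
    return Counter((ev.query, ev.docid) for ev in events)


def reclicked_sessions(events):
    """Find (session_id, query) pairs with clicks on more than one document."""
    docs_clicked = {}
    for ev in events:
        docs_clicked.setdefault((ev.session_id, ev.query), set()).add(ev.docid)
    return {key for key, docs in docs_clicked.items() if len(docs) > 1}
```

A mining step would read such a log, aggregate it per query-document pair, and turn the aggregates into relevance labels for the training file; how that last conversion is done is exactly what the chosen approach from the literature would define, so it is left out here.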