Hi James,

> ID: some identifier for each query
> QUERY: text of the query (when the query is run)
> URLs: every URL displayed (or alternatively, the Xapian docid — this
> might be easier)
> OFFSET: otherwise you'll have difficulty coping with result pages other
> than the first page (when this happens, the query ID should probably
> remain the same, and when you aggregate you can "glue" the different
> pages together)

I'm not clear on what the OFFSET really represents. Could you please explain a bit?

And, I think we certainly need the CLICKS field, as otherwise we can't capture the click information, which is essential to training the click model. This field will need to be of the same size and structure as the URLs field (i.e. a list), e.g. [0,1,2,0,0] for 5 urls in the result page.

> One would then be the clicks, so for each URL clicked in a result page,
> emit:
>
> ID: the query identifier that matches the entry in the search log
> URL: the URL redirected to (again, or the Xapian docid)
>
> This means you need to be able to generate ID for each query, and
> also that each clickable URL in the results page will need to go via the
> omega CGI using a different template whose job it is to log ID & URL
> to the click log and then redirect to URL. Once generated, the ID can
> be passed through from call to call (including on pagination)

So, whenever a click occurs on the result page, we log the query ID and the clicked url via a different template which will be triggered with each click event, but I'm not sure how we will be able to capture the click information if we don't record the number of times each url was clicked in a separate CLICKS field?

Also, just to be sure, we will log such pairs of query ID and URL in separate files to be aggregated later into a single file?

In the end, we will have two files it seems -- one created from the query template containing separate entries for each executed search as per the format you described previously, and another containing query IDs and click URLs logged using a different template?

I also wanted to ask how the log command ($log{query.log}) in the query template works. It doesn't seem to comply with the format mentioned in its documentation, as it expects two arguments but we provide only one here, i.e. query.log, and what does this argument mean?

Thanks,
Vivek
On 5 Jun 2017, at 05:17, Vivek Pal <vivekpal.dtu at gmail.com> wrote:

> > ID: some identifier for each query
> > QUERY: text of the query (when the query is run)
> > URLs: every URL displayed (or alternatively, the Xapian docid — this
> > might be easier)
> > OFFSET: otherwise you'll have difficulty coping with result pages other
> > than the first page (when this happens, the query ID should probably
> > remain the same, and when you aggregate you can "glue" the different
> > pages together)
>
> I'm not clear on what the OFFSET really represents. Could you please
> explain a bit?

Omega paginates results (as does Xapian's MSet, internally). So if you're displaying the second page of results, you'll need to know that when building training data. It's affected by TOPDOC and also by the <>[# CGI variables, but internally to omega there's one variable it's mapped onto.

In omegascript, you can find this using $topdoc.

> And, I think we certainly need the CLICKS field as
> otherwise we can't capture the click information which is essential
> to training the click model. This field will need to be of the same size
> and structure as the URLs field (i.e. a list) e.g. [0,1,2,0,0] for 5 urls
> in the result page.

You will need to generate a file in the format you proposed from the two logging files.

> So, whenever a click occurs on the result page, we log the query
> ID and the clicked url via a different template which will be triggered
> with each click event

Yes.

> but I'm not sure how we will be able to capture the
> click information if we don't record the number of times each url was
> clicked in a separate CLICKS field?

If you have a log line for each time a particular result was clicked, then you can generate CLICKS by adding them up.

> Also, just to be sure, we will log
> such pairs of query ID and URL in separate files to be aggregated
> later into a single file?

Well…that's kind of a deployment question. I suggest that the ID,URL (or QUERYID,DOCID) lines are logged to a file separate to the one used to log the query details, because it's easier to think about, and the code is slightly more straightforward. However in the general case, if you have multiple webservers for your site, then each is likely to log to its own file, and you'll later on have to add them all together.

> In the end, we will have two files it seems -- one created from the
> query template containing separate entries for each executed search
> as per the format you described previously and another containing
> query IDs and click URLs logged using a different template?

Yes, that's right. I recommend logging Xapian docids instead of click URLs for the reason previously discussed.

> I also wanted to ask how the log command ($log{query.log}) in
> the query template works.

It's documented (tersely) in the omegascript documentation. The format is:

$log{LOGFILE[,ENTRY]}

> It doesn't seem to comply with the format
> mentioned in its documentation as it expects two arguments but we
> provide only one here i.e. query.log and what does this argument
> mean?

The [] means that the second parameter is optional. Indeed, the documentation says:

> ENTRY defaults to a format similar to the Common Log Format used by webservers.

If you do provide ENTRY, it's more omegascript which is evaluated to produce the string written to LOGFILE. (This is hinted at, but not made quite explicit.) See around line 140 of xapian-applications/omega/query.cc for how the default is implemented.

J

--
James Aylett
devfort.com — spacelog.org — tartarus.org/james/
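To make the "adding them up" step concrete, here is a minimal Python sketch of how CLICKS could be derived from the two logs. The line formats are assumptions made purely for illustration, not something Omega produces by default: the search log is taken to be tab-separated ID, QUERY, comma-separated DOCIDS and OFFSET, and the click log tab-separated ID and DOCID. The function names count_clicks and attach_clicks are hypothetical.

```python
from collections import Counter


def count_clicks(click_lines):
    """Count clicks per (query ID, docid).

    Assumed click log format (hypothetical): ID<TAB>DOCID, one line per click.
    """
    counts = Counter()
    for line in click_lines:
        line = line.strip()
        if not line:
            continue
        query_id, docid = line.split("\t")
        counts[(query_id, int(docid))] += 1
    return counts


def attach_clicks(search_lines, click_counts):
    """Build one record per search log line, with CLICKS aligned to DOCIDS.

    Assumed search log format (hypothetical):
    ID<TAB>QUERY<TAB>DOCID,DOCID,...<TAB>OFFSET
    """
    records = []
    for line in search_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        query_id, query, docids_field, offset = line.split("\t")
        docids = [int(d) for d in docids_field.split(",") if d]
        # A Counter returns 0 for results that were never clicked.
        clicks = [click_counts[(query_id, d)] for d in docids]
        records.append({
            "id": query_id,
            "query": query,
            "offset": int(offset),
            "docids": docids,
            "clicks": clicks,
        })
    return records


if __name__ == "__main__":
    search_log = ["q1\tfoo bar\t10,11,12,13,14\t0\n"]
    click_log = ["q1\t11\n", "q1\t12\n", "q1\t12\n"]
    records = attach_clicks(search_log, count_clicks(click_log))
    print(records[0]["clicks"])  # [0, 1, 2, 0, 0]
```

Because count_clicks accepts any iterable of lines, the click logs from several webservers can be combined by chaining their lines (for example with itertools.chain) before counting.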
> > I'm not clear on what the OFFSET really represents. Could you
> > please explain a bit?
>
> Omega paginates results (as does Xapian's MSet, internally). So if
> you're displaying the second page of results, you'll need to know
> that when building training data. It's affected by TOPDOC and also
> by the <>[# CGI variables, but internally to omega there's one
> variable it's mapped onto.
>
> In omegascript, you can find this using $topdoc.

Thanks for the explanation. Understood now.

> > In the end, we will have two files it seems -- one created from the
> > query template containing separate entries for each executed search
> > as per the format you described previously and another containing
> > query IDs and click URLs logged using a different template?
>
> Yes, that's right. I recommend logging Xapian docids instead of click
> URLs for the reason previously discussed.

Yes, I'll use docids instead of click URLs as you recommend.

Now, for the first step, i.e. logging separate entries for each executed search from the query template, I wanted to know whether I should modify the existing log command or implement a separate one. I think that if we implement a new one, we'll have a certain level of flexibility for achieving our purpose.

> > It doesn't seem to comply with the format
> > mentioned in its documentation as it expects two arguments but
> > we provide only one here i.e. query.log and what does this
> > argument mean?
>
> The [] means that the second parameter is optional. Indeed, the
> documentation says:
>
> ENTRY defaults to a format similar to the Common Log Format
> used by webservers.

Thanks, it's clear to me now. I wasn't aware that parameters in square brackets are assumed to be optional.
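As an illustration of why OFFSET matters when aggregating, the following Python sketch "glues" the result pages of one query together by sorting them on their offset. It assumes each search log entry has already been parsed into a dict with the keys id, query, offset and docids, and that offset holds the value of $topdoc for that page. Both the record layout and the function name glue_pages are hypothetical.

```python
from collections import defaultdict


def glue_pages(records):
    """Merge result pages sharing a query ID into one ordered docid list.

    Each record is assumed (hypothetically) to look like:
    {"id": str, "query": str, "offset": int, "docids": [int, ...]}
    """
    pages_by_id = defaultdict(list)
    for record in records:
        pages_by_id[record["id"]].append(record)

    merged = {}
    for query_id, pages in pages_by_id.items():
        # Order pages by the position of their first result.
        pages.sort(key=lambda page: page["offset"])
        docids = []
        seen = set()
        for page in pages:
            for docid in page["docids"]:
                # Skip repeats, e.g. when the same page was viewed twice.
                if docid not in seen:
                    seen.add(docid)
                    docids.append(docid)
        merged[query_id] = {"query": pages[0]["query"], "docids": docids}
    return merged


if __name__ == "__main__":
    pages = [
        {"id": "q1", "query": "foo", "offset": 2, "docids": [30, 40]},
        {"id": "q1", "query": "foo", "offset": 0, "docids": [10, 20]},
    ]
    print(glue_pages(pages)["q1"]["docids"])  # [10, 20, 30, 40]
```

Deduplicating on docid is one possible design choice. It keeps a page that was loaded twice from appearing twice in the merged list, at the cost of hiding the repeat view from the training data.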