Hi Olly.

Thanks for your reply to the previous email. To have an appropriate subject I've started this new thread for further discussions.

> There's a $log{} command available in Omega templates. We can't log from
> the result page template, as the clicks happen after that is used, but we
> could make result links redirect via a second Omega template which does
> the logging.

To make sure I understand correctly -- we need a second Omega template to enable redirection of each result link from the opensearch template to that template after they are clicked. Should our first step be to implement such a template, or does one exist already? Please correct me if I'm wrong.

I've been exploring the Omega codebase for the past few days, but it would be great if you could elaborate a bit about how logging works internally. So far, I understand that the $log command recursively calls the eval function in which it's defined and prints the returned string from eval to a log file.

> So for that you'd also need to implement this result modification and then
> to use that new feature from Omega.

I've read the paper to understand the FairPairs algorithm. To implement it, we'd need to take the result links for each query from the opensearch template and feed them into the algorithm, which will rearrange the results using a uniform probability variable. Modified results can then be presented using the opensearch template, and clicks are recorded to adjust the relevance score of certain doc links.

> That indeed seems an unrealistic assumption, though I guess what really
> matters is how effective these models are in practice (after all, models
> are almost inherently simplifications of reality).

I'm planning to propose a preference pairs model along with FairPairs to eliminate position bias. Do you suggest looking into one of the sequential models as well?

Also, I'm following the Omega example wiki page to set up Omega at present, to get first-hand experience with it. I have xapian-core and Omega installed on my system (I had them installed earlier, but pulled the recent changes and installed again). But visiting localhost/cgi-bin/omega.cgi gives a 500 Internal Server Error. Looking into the error log file (https://paste.debian.net/922929/) reveals that it doesn't have permission to create a .libs directory in /usr/lib/cgi-bin, which shouldn't be the case as I set the permissions correctly using:

    sudo chmod 755 /usr/lib/cgi-bin/omega.cgi

Also, I didn't see the output below while indexing the data using omindex:

    [Entering directory /]
    Indexing "/ci_01.htm" as text/html ... added.
    Indexing "/ci_02.htm" as text/html ... added.
    ...

Did I miss something in the process?

Thanks,
Vivek
On 21 Mar 2017, at 09:17, Vivek Pal <vivekpal.dtu at gmail.com> wrote:

> > There's a $log{} command available in Omega templates. We can't log from
> > the result page template, as the clicks happen after that is used, but we
> > could make result links redirect via a second Omega template which does
> > the logging.
>
> To make sure I understand correctly -- we need a second Omega template to enable
> redirection of each result link from the opensearch template to that template after they are
> clicked. Should our first step be to implement such a template, or does one exist already?
> Please correct me if I'm wrong.

Isn't this from the query template, i.e. from the main web page of search results? (It might make sense from opensearch as well, though.) We need some way of logging when people click on a search result -- which you can build using a second OmegaScript template, as Olly suggested.

> I've been exploring the Omega codebase for the past few days, but it would be great if
> you could elaborate a bit about how logging works internally. So far, I understand that
> the $log command recursively calls the eval function in which it's defined and prints the
> returned string from eval to a log file.

You're overthinking things: look to the documentation first:

> $log{LOGFILE[,ENTRY]}
>
> write to the log file LOGFILE in directory log_dir (set in omega.conf). ENTRY is the OmegaScript for the log entry, and a linefeed is appended. If LOGFILE cannot be opened for writing, nothing is done (and ENTRY isn't evaluated). ENTRY defaults to a format similar to the Common Log Format used by webservers.

So the only thing you really need to know is the ENTRY format, so you can figure out how to log what you need. (Which you should identify before diving into code.)

> I've read the paper to understand the FairPairs algorithm. To implement it, we'd need to
> take the result links for each query from the opensearch template and feed them into the algorithm,
> which will rearrange the results using a uniform probability variable. Modified results can
> then be presented using the opensearch template, and clicks are recorded to adjust the relevance
> score of certain doc links.

You need to think more carefully about the layers involved here. We don't want to post-process the output of a template: we want to be able to render the template with the results rearranged.

Incidentally, this feels to me like it needs an MSet re-ordering system. So it may be worth looking at the discussion around doing this for Letor, which has a similar problem. This was the mailing list discussion initiated by Ayush (based on some previous IRC conversations, IIRC) as part of his Letor project last year:

https://lists.xapian.org/pipermail/xapian-devel/2016-July/002981.html

> Also, I'm following the Omega example wiki page to set up Omega at present, to get first-hand
> experience with it. I have xapian-core and Omega installed on my system (I had them installed
> earlier, but pulled the recent changes and installed again).

That page is ancient, so I hope you're actually installing the 1.4 series Xapian and Omega! This is the problem with overly-specific walkthroughs :-(

> But visiting localhost/cgi-bin/omega.cgi gives a 500 Internal Server Error. Looking into the error log
> file (https://paste.debian.net/922929/) reveals that it doesn't have permission to create a .libs
> directory

That looks to me like you haven't installed omega, but are trying to run with the development version (which is a libtool script that finds the right pieces and puts them together). That seems to be what the walkthrough tells you to do, which is unhelpful.

> in /usr/lib/cgi-bin, which shouldn't be the case as I set the permissions correctly using:
>
>     sudo chmod 755 /usr/lib/cgi-bin/omega.cgi

That is correct, but won't solve your problem. When you ran `make install` for omega, it will have copied the CGI somewhere, although I can't remember where; I'd guess /usr/local/lib/bin/xapian-omega by default, from eyeballing the Makefile.

> Also, I didn't see the output below while indexing the data using omindex:
>
>     [Entering directory /]
>     Indexing "/ci_01.htm" as text/html ... added.
>     Indexing "/ci_02.htm" as text/html ... added.
>     ...

What did you get? Nothing? If nothing, you may not have followed the instructions on unpacking the sample (book) data properly.

More generally, I'd recommend reading the omega documentation (particularly around omindex) to understand what it does, rather than just following the walkthrough.

J

-- 
James Aylett
devfort.com -- spacelog.org -- tartarus.org/james/
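The rearrangement step discussed above can be illustrated independently of Xapian as an operation on an ordered list of results. The Python sketch below is only an illustration under stated assumptions, not Omega or Xapian code: the names `fairpairs_rearrange` and `preference_from_click` are hypothetical, results are modelled as a plain list of document identifiers rather than an MSet, and the rule follows the usual summary of FairPairs (pick an offset of 0 or 1 uniformly, pair adjacent results starting at that offset, flip each pair with probability 0.5, and read a click on the lower result of a pair as a preference for it over the upper one).

```python
import random


def fairpairs_rearrange(results, rng=random):
    """Return (presented, pairs) for one query.

    `results` is the original ranking (best first). `pairs` lists the
    (upper_index, lower_index) positions in `presented` that were
    considered for flipping. All names here are hypothetical.
    """
    presented = list(results)
    # Offset 0 pairs ranks (1,2),(3,4),...; offset 1 pairs ranks (2,3),(4,5),...
    offset = rng.randrange(2)
    pairs = []
    for i in range(offset, len(presented) - 1, 2):
        # Flip each adjacent pair with probability 0.5.
        if rng.random() < 0.5:
            presented[i], presented[i + 1] = presented[i + 1], presented[i]
        pairs.append((i, i + 1))
    return presented, pairs


def preference_from_click(presented, pairs, clicked_index):
    """Return (preferred, less_preferred) if the click landed on the
    lower result of a pair, otherwise None (the click is ignored)."""
    for upper, lower in pairs:
        if clicked_index == lower:
            return presented[lower], presented[upper]
    return None
```

Because the function returns a reordered list rather than modifying rendered output, it matches the layering point made in the reply: the reordering would happen before a template is rendered, for example through whatever MSet re-ordering mechanism comes out of the Letor discussion.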
Hi James,

> Isn't this from the query template, i.e. from the main web page of search
> results? (It might make sense from opensearch as well, though.)

Yes, you are right; it is the query template. The reason I said opensearch template is that I haven't yet read all sections of the Omega docs and I'm still in the process. Thanks for pointing that out. I'm aiming to cover most of it in a day or two to have a good understanding of how the project will fit in. However, I won't be able to cover all the OmegaScript commands, but at least the most related ones like $log.

> We need some way of logging when people click on a search result -- which
> you can build using a second OmegaScript template, as Olly suggested.

Okay, so it will act between the query template and a linked document pointed to by a search result. Do you think we need to make this new template transparent to the user in some way, as we might have to record some information such as user ids in the form of IP addresses? In any case, we'll need a way to distinguish between different users by assigning unique ids to them.

> So the only thing you really need to know is the ENTRY format, so you can
> figure out how to log what you need. (Which you should identify before
> diving into code.)

I see, though wouldn't it be helpful to also have an example of this in the documentation? There's a DEFAULT_LOG_ENTRY string in query.cc that I came across while working on the word_in_list PR:

    "$or{$env{REMOTE_HOST},$env{REMOTE_ADDR},-}\t"
    "[$date{$now,%d/%b/%Y:%H:%M:%S} +0000]\t"
    "$if{$cgi{X},add,$if{$cgi{MORELIKE},morelike,query}}\t"
    "$dbname\t"
    "$query\t"
    "$msize$if{$env{HTTP_REFERER},\t$env{HTTP_REFERER}}";

Could you explain the meaning of the third and last strings?

> You need to think more carefully about the layers involved here. We don't
> want to post-process the output of a template...

Yes, so I thought about it in detail and I think the whole process would look like the following from a broad perspective:

1. Rearrangement: Input the original results to FairPairs, which will rearrange them, and the rearranged results will be presented on the query template.

2. Logging: Log the required data using a new template and store it in an appropriate format for further processing.

3. Click Models: These are successors of the preference pair models which I mentioned earlier. We have some options here, as described in the book "Click Models for Web Search", such as DBN, DCM, CCM etc., which will be trained on a relevance dataset to provide us with relevance scores for the result links in our logs, from which we'll generate the Qrel file as used by xapian-letor.

To train a click model, we'd need a relevance prediction dataset containing human-generated binary relevance labels for query-document pairs. I'm curious to know where we can obtain such a dataset. One that I know of is the Yandex web search challenge dataset on Kaggle.

And thanks for the link to the MSet re-ordering discussion. I'll check out the ideas that were discussed there.

> That page is ancient, so I hope you're actually installing the 1.4 series
> Xapian and Omega!

The latest stable release is the 1.4 series, but I actually have the 1.5 series installed, which I think is because I installed the dev version from the latest git master. I don't think that should be a problem here?

> That looks to me like you haven't installed omega, but are trying to run
> with the development version

I have all the Xapian-related executables in /usr/local/bin, including omindex. Does that suggest Omega is installed?

> When you ran `make install` for omega, it will have copied the CGI somewhere

In /usr/local/lib/xapian-omega/bin, I can't find the CGI, only these files: mhtml2html, omega, outlookmsg2html, rfc822tohtml and vcard2text.

> More generally, I'd recommend reading the omega documentation.

Yes, I'll go through it. I'll give it a second try after reading the docs, and maybe ask for help with setting up Omega on IRC if I run into an issue again.

Thanks,
Vivek
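For the logging step, it may help to see how lines written in the DEFAULT_LOG_ENTRY format quoted in this message could be read back for further processing. Reading the template literally, it yields tab-separated fields: the remote host (or address, or "-"), a bracketed timestamp, an action that is "add" when the CGI parameter X is set, otherwise "morelike" when MORELIKE is set, otherwise "query", then the database name, the query, and finally $msize followed by a tab and the HTTP referer when one is present. The Python sketch below parses such a line. It is an assumption-laden illustration: `LogEntry` and `parse_default_log_line` are hypothetical names, the query is assumed to contain no tab characters, and $msize is assumed to expand to an integer.

```python
from typing import NamedTuple, Optional


class LogEntry(NamedTuple):
    """One parsed log line (hypothetical structure for illustration)."""
    host: str
    timestamp: str
    action: str          # "add", "morelike" or "query"
    dbname: str
    query: str
    msize: int           # assumed to be an integer match count
    referer: Optional[str]


def parse_default_log_line(line):
    """Split a tab-separated line in the DEFAULT_LOG_ENTRY layout."""
    fields = line.rstrip("\n").split("\t")
    # Six mandatory fields, plus an optional trailing referer.
    if len(fields) not in (6, 7):
        raise ValueError("unexpected number of fields: %d" % len(fields))
    host, timestamp, action, dbname, query, msize = fields[:6]
    referer = fields[6] if len(fields) == 7 else None
    # Drop the surrounding square brackets from the timestamp.
    return LogEntry(host, timestamp.strip("[]"), action, dbname,
                    query, int(msize), referer)
```

A click-logging template would presumably pass its own ENTRY to $log{} with different fields (for example the clicked document and its rank), in which case the field list above would need to change to match.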