Hi James,

> Isn't this from the query template, ie from the main web page of search
> results? (It might make sense from opensearch as well, though.)

Yes, you are right; it is the query template. I said "opensearch template" because I haven't yet read all sections of the Omega docs; I'm still working through them. Thanks for pointing that out. I'm aiming to cover most of the docs in a day or two so that I understand how the project will fit in. I won't be able to cover all the OmegaScript commands, but I will at least cover the most relevant ones, such as $log.

> We need some way of logging when people click on a search result, which
> you can build using a second omegascript template, as Olly suggested.

Okay, so it will sit between the query template and the linked document that a search result points to. Do you think we need to make this new template transparent to the user in some way, since we might have to record some information such as user ids in the form of IP addresses? In any case, we'll need a way to distinguish between different users by assigning unique ids to them.

> So the only thing you really need to know is the ENTRY format, so you can
> figure out how to log what you need. (Which you should identify before
> diving into code.)

I see, though wouldn't it be helpful to also have an example of this in the documentation? There's a DEFAULT_LOG_ENTRY string in query.cc that I came across while working on the word_in_list PR:

    "$or{$env{REMOTE_HOST},$env{REMOTE_ADDR},-}\t"
    "[$date{$now,%d/%b/%Y:%H:%M:%S} +0000]\t"
    "$if{$cgi{X},add,$if{$cgi{MORELIKE},morelike,query}}\t"
    "$dbname\t"
    "$query\t"
    "$msize$if{$env{HTTP_REFERER},\t$env{HTTP_REFERER}}";

Could you explain the meaning of the third and last strings?

> You need to think more carefully about the layers involved here. We don't
> want to post-process the output of a template...

Yes, I've thought about it in detail, and from a broad perspective I think the whole process would look like the following:

1. Rearrangement: Feed the original results to FairPairs, which will rearrange them; the rearranged results will then be presented by the query template.

2. Logging: Log the required data using a new template and store it in an appropriate format for further processing.

3. Click Models: These are successors of the preference pair models which I mentioned earlier. We have some options here, as described in the book "Click Models for Web Search", such as DBN, DCM, CCM etc. These will be trained on a relevance dataset to give us relevance scores for the result links in our logs, from which we'll generate the Qrel file used by xapian-letor.

To train a click model, we'd need a relevance prediction dataset containing human-generated binary relevance labels for query-document pairs. I'm curious to know where we can obtain such a dataset. One that I know of is the Yandex web search challenge dataset on Kaggle.

And thanks for the link to the MSet re-ordering system. I'll check out the ideas that were discussed there.

> That page is ancient, so I hope you're actually installing the 1.4 series
> Xapian and Omega!

The latest stable release is the 1.4 series, but I actually have the 1.5 series installed, I think because I installed the dev version from the latest git master. I don't think that should be a problem here?

> That looks to me like you haven't installed omega, but are trying to run
> with the development version

I have all the Xapian-related executables in /usr/local/bin, including omindex. Does that suggest Omega is installed?

> When you ran `make install` for omega, it will have copied the CGI somewhere

In /usr/local/lib/xapian-omega/bin, I can't find a CGI, only these files: mhtml2html, omega, outlookmsg2html, rfc822tohtml and vcard2text.

> More generally, I'd recommend reading the omega documentation.

Yes, I'll go through it. I'll give it a second try after reading the docs, and maybe ask for help with setting up Omega on IRC if I run into an issue again.

Thanks,
Vivek
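For reference, the rearrangement in step 1 can be sketched roughly as follows. This is a minimal Python sketch of the FairPairs idea as it is usually described (pick a random offset of 0 or 1, treat the ranking as adjacent pairs starting at that offset, and swap each pair with probability 0.5). The function name `fair_pairs` and its `rng` parameter are hypothetical, and nothing here is Xapian or Omega API.

```python
import random


def fair_pairs(results, rng=None):
    """Return a copy of `results` with adjacent pairs randomly swapped.

    Hypothetical helper for illustration only, not part of Xapian.
    """
    if rng is None:
        rng = random.Random()
    ranked = list(results)
    # Offset 0 pairs (0,1), (2,3), ...; offset 1 leaves the top result
    # alone and pairs (1,2), (3,4), ...
    start = rng.choice((0, 1))
    for i in range(start, len(ranked) - 1, 2):
        # Each pair is flipped independently with probability 0.5.
        if rng.random() < 0.5:
            ranked[i], ranked[i + 1] = ranked[i + 1], ranked[i]
    return ranked


if __name__ == "__main__":
    print(fair_pairs(["d1", "d2", "d3", "d4", "d5"]))
```

Because a document only ever trades places with its neighbour, no result moves more than one rank from its original position, and the logging step would need to record which order was actually shown.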
On 22 Mar 2017, at 14:27, Vivek Pal <vivekpal.dtu at gmail.com> wrote:

>> We need some way of logging when people click on a search result, which
>> you can build using a second omegascript template, as Olly suggested.
>
> Okay, so it will act between the query template and a linked document pointed
> by a search result. Do you think we need to make this new template transparent
> to the user in some way as we might have to record some information such as
> user ids in the form of IP? In any case, we'll need a way to distinguish
> between different users by assigning unique ids to them.

You could do that by identifying the search session instead of the user, which makes it closer to what we need than to something that might trip you into privacy concerns.

>> So the only thing you really need to know is the ENTRY format, so you can
>> figure out how to log what you need. (Which you should identify before
>> diving into code.)
>
> I see; though it would be helpful to also have an example in the documentation
> for the same?

We don't really need an example; however I didn't read the documentation carefully, so it may warrant rewording. Or maybe I should just be more diligent in future.

> There's a DEFAULT_LOG_ENTRY string in query.cc that I came across
> while on the word_in_list PR:
>
> "$or{$env{REMOTE_HOST},$env{REMOTE_ADDR},-}\t"
> "[$date{$now,%d/%b/%Y:%H:%M:%S} +0000]\t"
> "$if{$cgi{X},add,$if{$cgi{MORELIKE},morelike,query}}\t"
> "$dbname\t"
> "$query\t"
> "$msize$if{$env{HTTP_REFERER},\t$env{HTTP_REFERER}}";
>
> Could you explain the meaning of the third and last strings?

Third records some information about what sort of query it is: add, morelike or a plain query. Last provides the estimated match size and then the HTTP referrer if one were set. Neither is particularly interesting in this case.

> 3. Click Models: These are successors of preference pair models which I
> mentioned earlier. We have some options here as described in book "Click
> Models for Web Search" such as DBN, DCM, CCM etc. which will be trained
> on a relevance dataset to provide us with relevance scores of results links in
> our logs using which we'll generate Qrel file as used by xapian-letor.

... and you'll need a way to use letor from omega, or you'll have trained a model for no good reason :)

> Latest stable release is 1.4 series but I actually have 1.5 series installed
> which I think is because I installed dev version from latest git master. I
> don't think that should be a problem here?

No, that's even better. I just didn't want you to be using the very old version mentioned in the walkthrough :)

>> That looks to me like you haven't installed omega, but are trying to run
>> with the development version
>
> I've all xapian related executables in /usr/local/bin including omindex. Does
> that suggest Omega is installed?

Yes. But if you follow the walkthrough, it copies the uninstalled version of the omega CGI.

>> When you ran `make install` for omega, it will have copied the CGI somewhere
>
> In /usr/local/lib/xapian-omega/bin, I can't find CGI but these files:
> mhtml2html, omega, outlookmsg2html, rfc822tohtml and vcard2text.

omega is the CGI (I think).

J

--
James Aylett
devfort.com - spacelog.org - tartarus.org/james/
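To make the ENTRY format discussed above concrete, here is a rough Python sketch that splits one line written in the DEFAULT_LOG_ENTRY layout into its tab-separated fields: host, timestamp, query type, database name, query, match size, and an optional referrer. `LogEntry` and `parse_log_line` are hypothetical names, and the sketch assumes that no field contains an embedded tab and that $msize expands to a plain integer.

```python
from datetime import datetime, timezone
from typing import NamedTuple, Optional


class LogEntry(NamedTuple):
    host: str               # $env{REMOTE_HOST}, else $env{REMOTE_ADDR}, else "-"
    when: datetime          # from "[$date{$now,...} +0000]"
    kind: str               # "add", "morelike" or "query"
    dbname: str
    query: str
    msize: int              # estimated match size
    referer: Optional[str]  # only present if HTTP_REFERER was set


def parse_log_line(line: str) -> LogEntry:
    """Parse one line in the DEFAULT_LOG_ENTRY layout (illustrative only)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) not in (6, 7):
        raise ValueError(f"expected 6 or 7 tab-separated fields, got {len(fields)}")
    host, stamp, kind, dbname, query, msize = fields[:6]
    # The timestamp is logged with a fixed "+0000" offset, so treat it as UTC.
    when = datetime.strptime(stamp, "[%d/%b/%Y:%H:%M:%S +0000]")
    when = when.replace(tzinfo=timezone.utc)
    referer = fields[6] if len(fields) == 7 else None
    return LogEntry(host, when, kind, dbname, query, int(msize), referer)


if __name__ == "__main__":
    sample = "192.0.2.1\t[22/Mar/2017:14:27:00 +0000]\tquery\tdefault\txapian omega\t42"
    print(parse_log_line(sample))
```

None of these fields identify which result was clicked or at which rank, which is the information a click log would have to add.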
> You could do that by identifying the search session instead of the user,
> which makes it closer to what we need than to something that might trip you
> into privacy concerns.

Okay, that would be much better. :)

> Third records some information about what sort of query it is: add,
> morelike or a plain query. Last provides the estimated match size and then
> the HTTP referrer if one were set. Neither is particularly interesting in
> this case.

Thanks for the explanation. So, as I understand it, we'll need to log more information than this to be able to train click models for relevance judgements.

> and you'll need a way to use letor from omega, or you'll have trained a
> model for no good reason :)

Sorry, I may have misunderstood you here, but why would we need a way to use letor from omega? To train the Letor module, wouldn't we just need two files, i.e. Query and Qrel, as mentioned in the xapian-letor docs? The Letor API can then generate the final training file from those two files. And to mine the relevance judgements for the Qrel file from logs, we'll need to train one of the click models, such as DBN. Is there a better way to mine the relevance judgements than click models?

> Yes. But if you follow the walkthrough, it copies the uninstalled version
> of the omega CGI.

> omega is the CGI (I think).

Oh, I thought it'd be a .cgi file. Okay, so I just need to copy this omega from /usr/local/lib/xapian-omega/bin to /usr/lib/cgi-bin and work with it.

Thanks,
Vivek
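As a point of comparison for the click-model question, here is a deliberately naive Python baseline: it is not DBN or any other model from the book, just a click-through-rate threshold over logged impressions. The function names, the `(query_id, doc_id, clicked)` record shape, and the threshold are all hypothetical, and the four-column "qid Q0 docid label" output is an assumption modelled on TREC-style qrel files that should be checked against the xapian-letor docs.

```python
from collections import defaultdict


def ctr_labels(impressions, threshold=0.5, min_shown=1):
    """Map (query_id, doc_id) to a binary label from click-through rate.

    `impressions` is an iterable of (query_id, doc_id, clicked) tuples,
    one per time a result was shown. Illustrative baseline only.
    """
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for qid, docid, was_clicked in impressions:
        shown[(qid, docid)] += 1
        if was_clicked:
            clicked[(qid, docid)] += 1
    return {
        key: int(clicked[key] / count >= threshold)
        for key, count in shown.items()
        if count >= min_shown
    }


def qrel_lines(labels):
    """Format labels as 'qid Q0 docid label' lines (assumed layout)."""
    return [f"{qid} Q0 {docid} {rel}" for (qid, docid), rel in sorted(labels.items())]


if __name__ == "__main__":
    log = [("q1", "10", True), ("q1", "10", True), ("q1", "11", False)]
    print("\n".join(qrel_lines(ctr_labels(log))))
```

A raw click-through rate ignores position bias (higher-ranked results get clicked more regardless of relevance), which is the problem that both the FairPairs rearrangement and the click models are meant to address.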