> There's a lot of flexibility already, because the log format is just
> omegascript. So I don't think you need to implement a new command to
> achieve this. (Although you might need a command to generate the query
> id. It depends on how you're going to do that.)

Ok, I'll try adapting the existing log command to achieve the kind of logging we want.

And, about the command to generate unique query ids, I've been thinking of tackling this as a hashing problem, where we provide the query text as input and get a unique id as output. Although, coming up with a 100% collision-free hashing algorithm for this task is something worth considering first.

Other caveats include the maximum length of the generated unique id string, and whether we should strip leading whitespace from the query text so that "essentially the same" queries are not recorded as different entries in the log file.

What do you suggest?

Thanks,
Vivek
On 6 Jun 2017, at 23:12, Vivek Pal <vivekpal.dtu at gmail.com> wrote:

> > There's a lot of flexibility already, because the log format is just
> > omegascript. So I don't think you need to implement a new command to
> > achieve this. (Although you might need a command to generate the query
> > id. It depends on how you're going to do that.)
>
> Ok, I'll try adapting the existing log command to achieve the kind of logging
> we want.

In case I wasn't clear: I don't think you have to modify the command at all. Just create a template that uses the command as it currently works.

> And, about the command to generate unique query ids, I've been thinking
> to tackle this as a kind of hashing problem where we'll basically provide the
> query text as input to generate a unique id as output. Although, coming
> up with a 100% collision-free hashing algorithm for this task is something
> worth considering first.

Don't worry about collisions; it isn't a catastrophe if this collides sometimes (especially as you can detect when that happens), so any algorithm that's fairly fast should be fine. (MD5 would give ~22 base64 characters, which sounds fine to me; we already have an implementation in the omega source code, so I'd probably use that.)

From the models you talked about, I assume you'll need to hash more than just the query text. I'm guessing something like the timestamp as well, and then pass the id between different invocations of the CGI (both for click-throughs and for navigating around the query pages).

> Other caveats include max length of the generated
> unique id string and whether we should truncate leading whitespaces from
> the query text to avoid "essentially same" queries from being recorded in
> different entries in the log file. What do you suggest?

Stripping whitespace at either end of the query string seems reasonable.

J

-- 
James Aylett
devfort.com, spacelog.org, tartarus.org/james/
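
[Illustrative note] The id scheme suggested in the message above (strip the query, hash it together with something like a timestamp using MD5, and encode the digest in base64) can be sketched roughly as follows. This is only a Python illustration, not Omega code: the function name make_query_id, the string form of the timestamp, and the NUL separator are all assumptions made for the example.

```python
import base64
import hashlib


def make_query_id(query, timestamp):
    """Hypothetical helper: derive a short id from query text plus a timestamp."""
    # Strip whitespace at either end so "essentially same" queries hash alike.
    normalised = query.strip()
    # A NUL separator (an assumption of this sketch) keeps the two fields
    # from running into each other before hashing.
    payload = normalised.encode("utf-8") + b"\0" + timestamp.encode("utf-8")
    digest = hashlib.md5(payload).digest()  # MD5 digests are 16 bytes long
    # Base64 of 16 bytes is 24 characters including "==" padding;
    # dropping the padding leaves the 22 characters mentioned above.
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```

The URL-safe base64 alphabet is used here because the id would have to travel between CGI invocations as a parameter; that choice is also an assumption of the sketch.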
> In case I wasn't clear: I don't think you have to modify the command
> at all. Just create a template that uses the command as it currently
> works.

I thought we needed a new template only for the second log file? To generate the first log file using the existing $log command, I have added another $log command to the query template, which looks like:

    $log{search.log,"$qid{$query}\t$query\t$did\t$topdoc"}

- search.log: log file name in log_dir (var/log/omega).
- $qid{$query}: returns the query id for the given query. I'm planning to use the existing MD5 implementation here, as you pointed out.
- $query: existing command that returns the query text.
- $did: returns a list of doc ids on the result page. I'm aware of the $id command, which returns the doc id of the "current" doc, but I'm not sure what "current doc" means there.
- $topdoc: existing command that returns the offset value.

I'm currently working on implementing support for the new commands, i.e. $qid and $did.

Example log entries, assuming that we allow only 4 docs on a single result page:

    q101 "simple query text" [doc0, doc1, doc2, doc3] 0
    q101 "simple query text" [doc4, doc5, doc6, doc7] 4
    q101 "simple query text" [doc8, doc9, doc10, doc11] 8

The qid is kept very simple for the purpose of this example, and I'm not really sure about the doc id format, so I assumed it to be like that.

Also, I noticed that the existing log command in the query template, i.e. $log{query.log}, doesn't really log anything. I created query.log in log_dir as specified in omega.conf, with read and write permission granted to the current system user, but I see no logs in that file. Should the log command be included inside the html body for it to work (it currently appears after the closing html tag)?

Another thing that concerns me is whether logging happens whenever a new result page is loaded, or just once for each search. We certainly don't want to log the same page again if the user returns to an already visited page, but we do want to log each page once, as that is how we'll be able to record offset values.

Thanks,
Vivek
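
[Illustrative note] One way to handle the repeated-page concern in the message above is to let the template log every page load and drop duplicates when the log is read, keyed on the query id and offset. The sketch below assumes the tab-separated layout from the proposed $log template and the bracketed, comma-separated doc id list from the example entries; neither format is confirmed, and parse_entry and unique_pages are hypothetical names. It also assumes the query text itself contains no tab characters.

```python
def parse_entry(line):
    """Split one assumed log line: qid, query, doc id list, offset (tab-separated)."""
    qid, query, docs, topdoc = line.rstrip("\n").split("\t")
    # The "[doc0, doc1]" list format is taken from the example entries only.
    doc_ids = [d.strip() for d in docs.strip("[]").split(",") if d.strip()]
    return qid, query, doc_ids, int(topdoc)


def unique_pages(lines):
    """Keep the first entry for each (qid, offset) pair and skip revisits."""
    seen = set()
    pages = []
    for line in lines:
        entry = parse_entry(line)
        key = (entry[0], entry[3])  # query id plus page offset
        if key in seen:
            continue  # same result page logged again, e.g. user went back
        seen.add(key)
        pages.append(entry)
    return pages
```

Deduplicating at read time keeps the template simple; it only works if the query id stays the same across pages of one search, which is what passing the id between CGI invocations is meant to achieve.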