I was thinking about this some more: Is there a reason I can't just weight by some function of recency at indexing time?

    $weight = get_weight_based_on_recency(...);
    $tg->index_text($txt,$weight);

If I wanted to allow the user the option of searching either in recency-weighted mode or not, I could index each document into 2 different databases, one with and one without. This avoids having to mess with subclassing PostingSource and C++ and all that.

- Alex Aminoff
NBER

On 05/03/2016 08:15 AM, James Aylett wrote:
> On Tue, May 03, 2016 at 07:56:19AM -0400, Alex Aminoff wrote:
>
>> Perhaps I am not understanding the basic concept, but I was figuring
>> we would just write a subclass of PostingSource in C++ that does
>> what we want, and not bother with the perl bindings. Is that not
>> possible? I realize that ideally we would develop the general
>> solution and share our code out to the community, but I assume that
>> would be more work.
> You should be able to subclass in C++ and then bind out to Perl to use
> it fairly easily. What would probably be more useful than having
> public, re-usable code is your experiences in using this approach to
> balance probabilistic weighting with reverse date weights.
>
> J
>
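A minimal sketch of what the hypothetical `get_weight_based_on_recency` might compute, written in Python rather than Perl purely as a language-neutral numeric illustration. It assumes exponential growth with an arbitrary half-life, anchored at a fixed reference date rather than at "now", so that the ratio between two documents' weights depends only on how far apart their dates are and does not change as time passes. `REFERENCE_EPOCH` and `HALF_LIFE_DAYS` are illustrative assumptions, not anything from Xapian.

```python
# Hypothetical parameters: a fixed reference date and a half-life in days.
REFERENCE_EPOCH = 1451606400  # 2016-01-01T00:00:00Z
HALF_LIFE_DAYS = 30.0
SECONDS_PER_DAY = 60 * 60 * 24


def get_weight_based_on_recency(doc_epoch):
    """Return a multiplier that doubles every HALF_LIFE_DAYS after REFERENCE_EPOCH.

    Because the weight is anchored at a fixed date instead of the current
    time, it never needs recomputing; newer documents simply get larger values.
    """
    days_since_reference = (doc_epoch - REFERENCE_EPOCH) / SECONDS_PER_DAY
    return 2.0 ** (days_since_reference / HALF_LIFE_DAYS)


# Two documents 30 days apart differ by a factor of 2, whenever you compute it.
newer = get_weight_based_on_recency(REFERENCE_EPOCH + 60 * SECONDS_PER_DAY)
older = get_weight_based_on_recency(REFERENCE_EPOCH + 30 * SECONDS_PER_DAY)
print(newer / older)  # 2.0
```

The cost of anchoring to a fixed date is that the multiplier grows without bound: with a 30-day half-life it grows more than 4,000-fold per year. Also note that `TermGenerator::index_text`'s second argument is an integer WDF increment (`Xapian::termcount`), so a fractional value like this would have to be rounded before being passed in.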
On Mon, May 16, 2016 at 12:35:53PM -0400, Alex Aminoff wrote:
> I was thinking about this some more: Is there a reason I can't just
> weight by some function of recency at indexing time?
>
> $weight = get_weight_based_on_recency(...);
> $tg->index_text($txt,$weight);

The second parameter there is a WDF multiplier, which isn't really "weight".

It depends on the weighting formula you're using (and the parameters set for it), but simply scaling up the WDF values for a whole document is likely to be counteracted by the corresponding increase in the document length (since that is SUM(WDF)). And the average document length will be fairly meaningless, which will probably make the relevance weighting less effective.

Also, recency changes with passing time, so you'll either have to reindex regularly, or else $weight will have to keep increasing as time passes.

So it seems a problematic approach to me. I think you'd need to try it to see if it can be made to work satisfactorily, and probably be prepared to tweak the weighting scheme parameters.

Cheers,
    Olly
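To see why scaling every WDF in a document tends to cancel out, consider the within-document part of the standard BM25 formula, which Xapian's default `BM25Weight` is based on: (k1 + 1)·wdf / (k1·((1 − b) + b·len/avglen) + wdf). The sketch below is a simplified version that ignores Xapian's extra parameters (such as `min_normlen`) and assumes k1 = 1 and b = 0.5, which I believe are `BM25Weight`'s defaults. It compares one term's contribution before and after multiplying all of a document's WDFs, and hence its length, by a factor, holding the average length fixed. The document length, average length and WDF values are made up for illustration.

```python
def bm25_tf_part(wdf, doc_len, avg_len, k1=1.0, b=0.5):
    """Within-document part of a BM25 term weight (simplified)."""
    norm_len = doc_len / avg_len
    return (k1 + 1.0) * wdf / (k1 * ((1.0 - b) + b * norm_len) + wdf)


def scaled(factor, wdf=3, doc_len=100, avg_len=100, **params):
    """Weight after multiplying every WDF in one document by `factor`.

    Document length is SUM(WDF), so it scales by the same factor; the
    average length is held fixed, as if only a few documents were boosted.
    """
    return bm25_tf_part(wdf * factor, doc_len * factor, avg_len, **params)


for b in (0.5, 1.0):
    base = scaled(1, b=b)
    boosted = scaled(10, b=b)
    print(f"b={b}: unscaled={base:.4f} x10={boosted:.4f} ratio={boosted / base:.3f}")
```

With b = 1 the boost cancels exactly; with b = 0.5 a tenfold WDF multiplier raises this term's contribution by only about 13%, and no multiplier can push it past k1 + 1. If many documents were boosted, avglen would rise as well, which is the average document length problem mentioned above.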
QueryParser is great, but I would like to build a query myself so I can filter results by a specified value (in this case, restricting results to an epoch time after a certain value).

My code looks like this, and it compiles and looks as though it should work according to the Perl source:

    my $query = $qp->parse_query($querystr);
    if ($datefilter) {
        my $filterepoch = time() - ($datefilter * 60 * 60 * 24);
        my $filterquery = Xapian::Query->new(OP_VALUE_GE,I_DATE,$filterepoch);
        $query = Xapian::Query->new(OP_FILTER,$query,$filterquery);
    }

This appears to die on Xapian::Query->new with:

    No matching function for overloaded 'new_Query' at /usr/local/lib/perl5/site_perl/Xapian.pm line 1282.

I see in Xapian.pm where Xapian::Query attempts to call Xapianc::new_Query. Is there some other way I am supposed to do this?

I should say that I am using xapian-bindings-1.4.4, which I compiled and installed myself.

Thanks,

- Alex Aminoff
NBER