Hey all, is there a way to exclude results by docid?

I'm using a combined DB and guaranteeing the order for
->add_database calls.  All sub DBs have monotonically
increasing docids, so the combined docid will remain
monotonically increasing.

I'll store the maximum docid from a previous search ($OLD_MAX),
and it would be nice to have Xapian only return results it
hasn't returned before.  Otherwise, I'll continue using:

	set_weighting_scheme(BoolWeight->new)
	set_docid_order(ENQ_DESCENDING)

...and stop iterating mset retrieval once $docid < $OLD_MAX

I'm using Perl Search::Xapian from Debian stable (buster).

Thanks

On Mon, Feb 08, 2021 at 06:06:38PM +0000, Eric Wong wrote:
> Hey all, is there a way to exclude results by docid?

There's nothing built in currently.  It can be done with a custom
PostingSource subclass, but that's not possible from Search::Xapian.

> I'm using a combined DB and guaranteeing the order for
> ->add_database calls.  All sub DBs have monotonically
> increasing docids, so the combined docid will remain
> monotonically increasing.
>
> I'll store the maximum docid from a previous search ($OLD_MAX),

I'm not sure I see how this works.  Say there are two databases with
docids in use:

A = {1,2}
B = {1,2,3,4}

Then the combined database is:

A+B = {1=A1, 2=B1, 3=A2, 4=B2, 6=B3, 8=B4} (and 5 and 7 are unused).

This means $OLD_MAX is 8.

Then we add a document to A:

A = {1,2,3}
B = {1,2,3,4}

A3 is 5 in the combined database, which is below $OLD_MAX, so this new
document won't be returned by an incremental search.

I think this would only work if you carefully add documents in a
round-robin fashion, or otherwise take care to avoid this issue.

Cheers,
    Olly

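The numbering in the example above can be reproduced with a few lines of arithmetic. This is only a sketch: the interleaving formula is inferred from the A+B listing in the message (docids from N sub-databases are interleaved in add_database order), the helper name combined_docid is made up for illustration, and Python is used purely for the arithmetic; nothing here calls Xapian.

```python
def combined_docid(shard_index, local_docid, n_shards):
    """Map a sub-database docid to its docid in the combined database.

    shard_index is 0-based and follows the add_database() order.
    Assumption: docids are interleaved across shards, which matches
    the A+B listing in the example (1=A1, 2=B1, 3=A2, ...).
    """
    return (local_docid - 1) * n_shards + shard_index + 1


# The example: A = {1,2}, B = {1,2,3,4}, added in the order A, B.
A, B = 0, 1
before = sorted(
    [combined_docid(A, d, 2) for d in (1, 2)]
    + [combined_docid(B, d, 2) for d in (1, 2, 3, 4)]
)
old_max = max(before)

# Adding a third document to A gives it a combined docid below old_max.
new_doc = combined_docid(A, 3, 2)

print(before)             # [1, 2, 3, 4, 6, 8]
print(old_max)            # 8
print(new_doc)            # 5
print(new_doc > old_max)  # False: an "only docids above old_max" search misses it
```

With equal-sized, round-robin additions the new docids would always land above the previous maximum, which is why that case works.
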
Just for interest, dunno if helpful (or a good way of doing it!): what
we do (for email alerting) is record an incrementing batch number
(e.g. B123) with each indexing pass of data, and then for an alert run
only search for results with batch numbers since the last alert run.

ATB,
Matthew

On Mon, 8 Feb 2021 at 18:07, Eric Wong <e at 80x24.org> wrote:
>
> Hey all, is there a way to exclude results by docid?
>
> I'm using a combined DB and guaranteeing the order for
> ->add_database calls.  All sub DBs have monotonically
> increasing docids, so the combined docid will remain
> monotonically increasing.
>
> I'll store the maximum docid from a previous search ($OLD_MAX),
> and it would be nice to have Xapian only return results it
> hasn't returned before.  Otherwise, I'll continue using:
>
>       set_weighting_scheme(BoolWeight->new)
>       set_docid_order(ENQ_DESCENDING)
>
> ...and stop iterating mset retrieval once $docid < $OLD_MAX
>
> I'm using Perl Search::Xapian from Debian stable (buster).
>
> Thanks
>

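A rough sketch of the batch-number idea, in plain Python with no Xapian calls. The message does not say whether the batch number is stored as a boolean term or in a value slot, so the "B" prefix as a term prefix, the function names, and the sample data below are all assumptions made for illustration. The point is that the filter depends on when a document was indexed, not on where its docid falls in the combined numbering.

```python
def batch_terms(last_alerted_batch, current_batch, prefix="B"):
    """Batch labels to match for an alert run (hypothetical helper).

    Returns one label per indexing pass made since the last alert run,
    e.g. ["B123"]; these could be OR-ed together as a filter.
    """
    return [f"{prefix}{n}" for n in range(last_alerted_batch + 1, current_batch + 1)]


def new_since(doc_batches, last_alerted_batch):
    """Docids whose batch number is newer than the last alert run."""
    return sorted(d for d, batch in doc_batches.items() if batch > last_alerted_batch)


# Made-up data: combined docid -> batch number of the pass that indexed it.
# Docid 5 was added later (batch 123) although docids 6 and 8 already existed.
doc_batches = {1: 122, 2: 122, 3: 122, 4: 122, 6: 122, 8: 122, 5: 123}

print(batch_terms(122, 123))        # ['B123']
print(new_since(doc_batches, 122))  # [5]
```
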