thr3ads.net - Xapian discuss - prioritizing aggregated DBs [Feb 2020]

Eric Wong

2020-Feb-08 18:04 UTC

prioritizing aggregated DBs

Olly Betts <olly at survex.com> wrote:
> On Fri, Feb 07, 2020 at 09:33:08PM +0000, Eric Wong wrote:
> > Hey all, I've been using ->add_database for a few years
> > to tie sharded DBs together and it works great.
> > 
> > Now, I want to be able to search across several DBs
> > which aren't sharded, say: linux-DB, glibc-DB, freebsd-DB.
> > 
> > I want to search for something across all of them, but
> > prioritize results to favor one or some of those DBs over
> > others.  Is there a way to do that without reindexing?
> 
> With git master you can achieve this with a PostingSource subclass as
> there's a new PostingSource::reset() method which gets passed the
> shard it is being called for, so you can set an extra weight
> contribution based on that.  This is a replacement for
> PostingSource::init() in 1.4, which doesn't know which shard it is being
> called for.
> 
> You can then combine this PostingSource with your query with AND_MAYBE
> (so it matches exactly what the query does, but takes an extra weight
> contribution from the PostingSource for matching documents).

Cool.  I'll keep that in mind down the line.  That could be a
while since some users are still on 1.2 and tend to stick to
what's provided by enterprise/LTS distros.
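
A toy model of the ranking effect described above, for readers who want to see it concretely. This is plain Python, not Xapian API code: `Hit`, `apply_shard_boost`, and the boost values are hypothetical names chosen for illustration. It only mimics what AND_MAYBE does to a result set (the same documents match, and matching documents pick up extra weight), assuming the extra weight is a constant per shard.

```python
from typing import Dict, List, NamedTuple


class Hit(NamedTuple):
    shard: int     # index of the sub-DB the document came from
    docid: int     # document ID within that sub-DB
    weight: float  # weight from the user's query alone


def apply_shard_boost(hits: List[Hit], boost_by_shard: Dict[int, float]) -> List[Hit]:
    """Add a per-shard bonus to each hit and re-rank by total weight.

    No hit is added or dropped, which is the property AND_MAYBE gives:
    the left-hand query alone decides what matches.
    """
    boosted = [
        Hit(h.shard, h.docid, h.weight + boost_by_shard.get(h.shard, 0.0))
        for h in hits
    ]
    return sorted(boosted, key=lambda h: h.weight, reverse=True)


# Shard 0 is the favored DB; shard 1 gets no bonus.
hits = [Hit(shard=1, docid=7, weight=1.5), Hit(shard=0, docid=3, weight=1.0)]
ranked = apply_shard_boost(hits, {0: 1.0})
```

With these made-up numbers the shard 0 document moves ahead of the shard 1 document even though its query weight was lower, and the bonus can be changed per search without reindexing.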

> > Or would I fiddle with wdf_inc for all ->index_text and ->add_term
> > calls on a per-DB basis?
> 
> That would probably work if you don't want to be able to vary the
> prioritisation dynamically.

That's a compromise I'll have to make, for now.  Thanks for the
response!
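
A rough sketch of why a larger wdf_inc favors one DB, using the term-frequency part of textbook BM25. This is an assumption-laden illustration rather than Xapian's exact weighting code: `bm25_tf` is a hypothetical helper, the IDF factor is left out, and the `k1` and `b` values are illustrative.

```python
def bm25_tf(wdf: float, len_ratio: float = 1.0,
            k1: float = 1.0, b: float = 0.5) -> float:
    """Term-frequency component of textbook BM25 (IDF omitted).

    wdf       -- within-document frequency of the term
    len_ratio -- document length divided by the average document length
    """
    return wdf * (k1 + 1.0) / (wdf + k1 * (1.0 - b + b * len_ratio))


# A term that occurs once, indexed with wdf_inc=1.
base = bm25_tf(wdf=1)

# The same term indexed with wdf_inc=2, ignoring any change in length.
boosted = bm25_tf(wdf=2)

# wdf_inc=2 everywhere also inflates the document's length; here the
# document is assumed to end up twice the average length.
boosted_longer = bm25_tf(wdf=2, len_ratio=2.0)
```

Under these assumptions the boosted term still scores higher than the baseline, but the gain shrinks once the longer document length is taken into account, and the component can never exceed `k1 + 1`. Doubling wdf_inc therefore does not double the weight, so the factor may need some experimentation.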

Olly Betts

2020-Feb-09 22:23 UTC

On Sat, Feb 08, 2020 at 06:04:42PM +0000, Eric Wong wrote:
> Olly Betts <olly at survex.com> wrote:
> > On Fri, Feb 07, 2020 at 09:33:08PM +0000, Eric Wong wrote:
> > > Or would I fiddle with wdf_inc for all ->index_text and ->add_term
> > > calls on a per-DB basis?
> > 
> > That would probably work if you don't want to be able to vary the
> > prioritisation dynamically.
> 
> That's a compromise I'll have to make, for now.  Thanks for the
> response!

BTW, for either approach try to add the databases which are more boosted
first.  That will tend to mean more good matches are found sooner, which
will help the matcher take short cuts.

Cheers,
    Olly

Eric Wong

2020-Feb-19 10:23 UTC

Olly Betts <olly at survex.com> wrote:
> On Sat, Feb 08, 2020 at 06:04:42PM +0000, Eric Wong wrote:
> > Olly Betts <olly at survex.com> wrote:
> > > On Fri, Feb 07, 2020 at 09:33:08PM +0000, Eric Wong wrote:
> > > > Or would I fiddle with wdf_inc for all ->index_text and ->add_term
> > > > calls on a per-DB basis?
> > > 
> > > That would probably work if you don't want to be able to vary the
> > > prioritisation dynamically.
> > 
> > That's a compromise I'll have to make, for now.  Thanks for the
> > response!
> 
> BTW, for either approach try to add the databases which are more boosted
> first.  That will tend to mean more good matches are found sooner, which
> will help the matcher take short cuts.

Thanks, I'll keep that in mind.  Btw, is there a way to quickly
figure out which sub-DB a retrieved document or mset item belongs to?

I suppose I could add that info to docdata since I'm having to
reindex, anyways...
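
On the question of which sub-DB a result came from: Xapian interleaves document IDs across the databases combined with add_database, so the shard can usually be recovered from docid arithmetic alone. A minimal sketch, assuming `n_shards` is the total number of shards in the combined database and that shard indexes follow the order in which the databases were added (`split_docid` is a hypothetical helper, not a Xapian function):

```python
from typing import Tuple


def split_docid(docid: int, n_shards: int) -> Tuple[int, int]:
    """Map a combined docid to (shard_index, docid_within_shard).

    Assumes the interleaving used for multi-database searches:
    combined = (local - 1) * n_shards + shard_index + 1,
    with shard_index counted from 0.
    """
    if docid < 1 or n_shards < 1:
        raise ValueError("docid and n_shards must both be >= 1")
    shard_index = (docid - 1) % n_shards
    local_docid = (docid - 1) // n_shards + 1
    return shard_index, local_docid


# With three sub-DBs added in the order linux, glibc, freebsd:
names = ["linux", "glibc", "freebsd"]
shard, local = split_docid(5, len(names))
source = names[shard]
```

If one of the added databases is itself sharded, each of its shards counts separately toward `n_shards`, so the index-to-name table needs one entry per underlying shard. Storing the source name in docdata, as suggested above, avoids relying on the add order at all.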
