Olly Betts <olly at survex.com> wrote:
> On Fri, Feb 07, 2020 at 09:33:08PM +0000, Eric Wong wrote:
> > Hey all, I've been using ->add_database for a few years
> > to tie sharded DBs together and it works great.
> >
> > Now, I want to be able to search across several DBs
> > which aren't sharded, say: linux-DB, glibc-DB, freebsd-DB.
> >
> > I want to search for something across all of them, but
> > prioritize results to favor one or some of those DBs over
> > others. Is there a way to do that without reindexing?
>
> With git master you can achieve this with a PostingSource subclass as
> there's a new PostingSource::reset() method which gets passed the
> shard it is being called for, so you can set an extra weight
> contribution based on that. This is a replacement for
> PostingSource::init() in 1.4, which doesn't know which shard it is being
> called for.
>
> You can then combine this PostingSource with your query with AND_MAYBE
> (so it matches exactly what the query does, but takes an extra weight
> contribution from the PostingSource for matching documents).

Cool. I'll keep that in mind down the line. That could be a while
since some users are still on 1.2 and tend to stick to what's
provided by enterprise/LTS distros.

> > Or would I fiddle with wdf_inc for all ->index_text and ->add_term
> > calls on a per-DB basis?
>
> That would probably work if you don't want to be able to vary the
> prioritisation dynamically.

That's a compromise I'll have to make, for now. Thanks for the
response!
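The AND_MAYBE combination described above can be illustrated with a toy model in plain Python. This is not Xapian code: `and_maybe_rank`, the `(db, docid, weight)` hit tuples and the per-database `boost` table are hypothetical stand-ins for the query's matches and for the extra weight a PostingSource would contribute per shard. The point it shows is that the boost only reorders documents the query already matched; it never adds or removes matches.

```python
# Toy model of "query AND_MAYBE per-database boost". NOT the Xapian API:
# hits and boosts are plain Python data standing in for real match results.
from typing import Dict, List, Tuple

# (database name, docid within that database, weight from the query alone)
Hit = Tuple[str, int, float]


def and_maybe_rank(hits: List[Hit], boost: Dict[str, float]) -> List[Hit]:
    """Rank query hits, adding a per-database boost to each matched document.

    As with AND_MAYBE, the set of matching documents is exactly the set the
    query matched; the boost (hypothetical, per database) only adds weight.
    """
    boosted = [(db, docid, w + boost.get(db, 0.0)) for db, docid, w in hits]
    # Highest total weight first; ties broken by db name, then docid.
    return sorted(boosted, key=lambda h: (-h[2], h[0], h[1]))


if __name__ == "__main__":
    hits = [("glibc", 7, 2.0), ("linux", 3, 1.5), ("freebsd", 9, 2.2)]
    boost = {"linux": 1.0}  # favour the linux DB
    for db, docid, weight in and_maybe_rank(hits, boost):
        print(db, docid, weight)
```

A database missing from the boost table simply gets no extra weight, which mirrors a PostingSource that returns zero for shards it does not favour.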
On Sat, Feb 08, 2020 at 06:04:42PM +0000, Eric Wong wrote:
> Olly Betts <olly at survex.com> wrote:
> > On Fri, Feb 07, 2020 at 09:33:08PM +0000, Eric Wong wrote:
> > > Or would I fiddle with wdf_inc for all ->index_text and ->add_term
> > > calls on a per-DB basis?
> >
> > That would probably work if you don't want to be able to vary the
> > prioritisation dynamically.
>
> That's a compromise I'll have to make, for now. Thanks for the
> response!

BTW, for either approach try to add the databases which are more boosted
first. That will tend to mean more good matches are found sooner, which
will help the matcher take short cuts.

Cheers,
    Olly
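Following the advice to add the more-boosted databases first, a minimal sketch is to sort the database list by its boost before combining. `order_by_boost` and the `boost` mapping are hypothetical; with the Python bindings, the ordered paths would then be opened and combined via `add_database` in that order (shown only as a comment, since it needs Xapian installed).

```python
# Hypothetical helper: order database paths so the most-boosted come first.
from typing import Dict, List


def order_by_boost(paths: List[str], boost: Dict[str, float]) -> List[str]:
    """Return paths sorted by descending boost.

    sorted() is stable, so databases with equal boost keep their original
    relative order. Paths missing from `boost` are treated as boost 0.
    """
    return sorted(paths, key=lambda p: -boost.get(p, 0.0))


# Usage with the Xapian Python bindings (not executed here):
#   import xapian
#   ordered = order_by_boost(paths, boost)
#   db = xapian.Database(ordered[0])
#   for p in ordered[1:]:
#       db.add_database(xapian.Database(p))
```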
Olly Betts <olly at survex.com> wrote:
> On Sat, Feb 08, 2020 at 06:04:42PM +0000, Eric Wong wrote:
> > Olly Betts <olly at survex.com> wrote:
> > > On Fri, Feb 07, 2020 at 09:33:08PM +0000, Eric Wong wrote:
> > > > Or would I fiddle with wdf_inc for all ->index_text and ->add_term
> > > > calls on a per-DB basis?
> > >
> > > That would probably work if you don't want to be able to vary the
> > > prioritisation dynamically.
> >
> > That's a compromise I'll have to make, for now. Thanks for the
> > response!
>
> BTW, for either approach try to add the databases which are more boosted
> first. That will tend to mean more good matches are found sooner, which
> will help the matcher take short cuts.

Thanks, I'll keep that in mind.

Btw, is there a way to quickly figure out which sub-DB a retrieved
document or mset item belongs to? I suppose I could add that info to
docdata since I'm having to reindex, anyways...
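On the sub-DB question, one possible approach that avoids storing extra docdata relies on the interleaved docid scheme that Xapian uses when databases are combined with `add_database`. This sketch assumes that scheme: combined docid = (shard docid - 1) * number of shards + shard index + 1, where shard index is the zero-based position in the order the databases were added. Please check the documentation for your Xapian version before relying on it. `locate` and `combine` are hypothetical helper names.

```python
# Sketch ASSUMING Xapian's interleaved docid mapping for combined databases:
#   combined = (shard_docid - 1) * n_shards + shard_index + 1
# shard_index is zero-based, in the order the databases were added.
from typing import Tuple


def locate(docid: int, n_shards: int) -> Tuple[int, int]:
    """Map a combined docid to (shard_index, docid within that shard)."""
    if docid < 1 or n_shards < 1:
        raise ValueError("docid and n_shards must both be >= 1")
    shard_index = (docid - 1) % n_shards
    shard_docid = (docid - 1) // n_shards + 1
    return shard_index, shard_docid


def combine(shard_index: int, shard_docid: int, n_shards: int) -> int:
    """Inverse of locate(): map (shard_index, shard docid) to a combined docid."""
    return (shard_docid - 1) * n_shards + shard_index + 1
```

For example, with three databases added as linux, glibc, freebsd, combined docid 4 would be `locate(4, 3) == (0, 2)`: docid 2 in the linux database.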