Hey all, I've been using ->add_database for a few years
to tie sharded DBs together and it works great.

Now, I want to be able to search across several DBs
which aren't sharded, say: linux-DB, glibc-DB, freebsd-DB.

I want to search for something across all of them, but
prioritize results to favor one or some of those DBs over
others. Is there a way to do that without reindexing?

Or would I fiddle with wdf_inc for all ->index_text and ->add_term
calls on a per-DB basis?

Thanks.
On Fri, Feb 07, 2020 at 09:33:08PM +0000, Eric Wong wrote:
> Hey all, I've been using ->add_database for a few years
> to tie sharded DBs together and it works great.
>
> Now, I want to be able to search across several DBs
> which aren't sharded, say: linux-DB, glibc-DB, freebsd-DB.
>
> I want to search for something across all of them, but
> prioritize results to favor one or some of those DBs over
> others. Is there a way to do that without reindexing?

With git master you can achieve this with a PostingSource subclass as
there's a new PostingSource::reset() method which gets passed the
shard it is being called for, so you can set an extra weight
contribution based on that. This is a replacement for
PostingSource::init() in 1.4, which doesn't know which shard it is being
called for.

You can then combine this PostingSource with your query with AND_MAYBE
(so it matches exactly what the query does, but takes an extra weight
contribution from the PostingSource for matching documents).

> Or would I fiddle with wdf_inc for all ->index_text and ->add_term
> calls on a per-DB basis?

That would probably work if you don't want to be able to vary the
prioritisation dynamically.

Cheers,
    Olly
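To make the AND_MAYBE behaviour described above concrete, here is a
rough pure-Python model. It does not call Xapian at all: the Hit class,
the rerank() helper, and the boost values are all hypothetical and only
illustrate the idea that the set of matching documents stays the same
while each match picks up an extra weight depending on its shard.

```python
from dataclasses import dataclass

# Hypothetical per-shard boosts; the names and values are illustrative only.
SHARD_BOOST = {"linux-DB": 2.0, "glibc-DB": 0.5, "freebsd-DB": 0.0}


@dataclass(frozen=True)
class Hit:
    shard: str
    docid: int
    weight: float  # weight contributed by the text query alone


def rerank(hits, boosts):
    """Model of `query AND_MAYBE source`: every hit of the query is kept,
    and each one gains the extra weight assigned to its shard."""
    boosted = [
        Hit(h.shard, h.docid, h.weight + boosts.get(h.shard, 0.0))
        for h in hits
    ]
    # Highest combined weight first, as in a relevance-ordered result list.
    return sorted(boosted, key=lambda h: h.weight, reverse=True)


hits = [
    Hit("glibc-DB", 7, 3.1),
    Hit("linux-DB", 42, 2.4),
    Hit("freebsd-DB", 3, 2.9),
]
ranked = rerank(hits, SHARD_BOOST)
print([(h.shard, h.docid) for h in ranked])
```

Because the boosts live in a plain dict that is read at query time, they
can be changed between searches without touching the index, which is the
"vary the prioritisation dynamically" property mentioned above.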
Olly Betts <olly at survex.com> wrote:
> On Fri, Feb 07, 2020 at 09:33:08PM +0000, Eric Wong wrote:
> > Hey all, I've been using ->add_database for a few years
> > to tie sharded DBs together and it works great.
> >
> > Now, I want to be able to search across several DBs
> > which aren't sharded, say: linux-DB, glibc-DB, freebsd-DB.
> >
> > I want to search for something across all of them, but
> > prioritize results to favor one or some of those DBs over
> > others. Is there a way to do that without reindexing?
>
> With git master you can achieve this with a PostingSource subclass as
> there's a new PostingSource::reset() method which gets passed the
> shard it is being called for, so you can set an extra weight
> contribution based on that. This is a replacement for
> PostingSource::init() in 1.4, which doesn't know which shard it is being
> called for.
>
> You can then combine this PostingSource with your query with AND_MAYBE
> (so it matches exactly what the query does, but takes an extra weight
> contribution from the PostingSource for matching documents).

Cool. I'll keep that in mind down the line. That could be a
while since some users are still on 1.2 and tend to stick to
what's provided by enterprise/LTS distros.

> > Or would I fiddle with wdf_inc for all ->index_text and ->add_term
> > calls on a per-DB basis?
>
> That would probably work if you don't want to be able to vary the
> prioritisation dynamically.

That's a compromise I'll have to make, for now.

Thanks for the response!
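For readers following along, the per-DB wdf_inc compromise can be
pictured with a small toy indexer. This is a sketch only and does not
use Xapian: the index_text() function below is a hypothetical stand-in
that mimics how a wdf increment is applied per term occurrence, and the
WDF_INC values are made up.

```python
from collections import Counter

# Hypothetical per-DB wdf increments, fixed at index time.
WDF_INC = {"linux-DB": 3, "glibc-DB": 2, "freebsd-DB": 1}


def index_text(text, wdf_inc=1):
    """Toy stand-in for an indexer: every occurrence of a term adds
    wdf_inc to that term's within-document frequency (wdf)."""
    wdf = Counter()
    for term in text.lower().split():
        wdf[term] += wdf_inc
    return wdf


text = "mmap fails after mmap retry"
per_db = {db: index_text(text, inc) for db, inc in WDF_INC.items()}
print({db: wdf["mmap"] for db, wdf in per_db.items()})
```

The same text ends up with a larger wdf in the favoured DB. Two caveats:
the increments are baked into each index, so changing the priorities
means reindexing, and with a BM25-style weighting scheme the score grows
less than linearly with wdf, so tripling wdf_inc should not be expected
to triple the weight.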