On 9 Aug 2018, at 10:09, Katja Abramova <katja.abramova at dimension.it> wrote:
> I need to do a search for a
> multi-word query in which particular fields are boosted - preferably at
> query time. That is, given a query like "the cat is lying on the mat" (with
> an OR operator, ignoring word positions but with stemming and stop words
> removed), I'd like to search for that query in both, say Title and Body of
> the documents but with Title field boosted to 4 and Body to 2.
Hi, Katja!
There are a few different things going on here, so I'll try to go through
them one at a time.
Field searching in Xapian is generally done using prefixes; the practical
example in our "getting started" guide discusses this, and has sample
code in Python (https://getting-started-with-xapian.readthedocs.io/).
I'd read it from the beginning, including the core concepts.
It also shows how to use the QueryParser to split and stem user-entered queries
into Xapian Query objects. You'll want to pass default_prefix when you
call QueryParser::parse_query; this is covered in the concepts section of the
getting started guide:
https://getting-started-with-xapian.readthedocs.io/en/latest/concepts/indexing/terms.html?highlight=default_prefix#fields-and-term-prefixes
You'll end up with Python that looks a little like this:
# Some code that sets up the queryparser (stemming, for instance).
# See the getting started guide for a complete example.
# ...
# S = Subject. Note that you can't use a keyword argument for
# default_prefix, so we have to provide the flags as well.
title_query = queryparser.parse_query(
    querystring, xapian.QueryParser.FLAG_DEFAULT, "S"
)
Then you need to use OP_SCALE_WEIGHT, as you've identified, to apply the
different weightings to the queries parsed against the two fields.
weighted_title_query = xapian.Query(
    xapian.Query.OP_SCALE_WEIGHT, title_query, 4
)
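The body side follows the same pattern. Here is a rough sketch that builds both weighted queries in one place, under a couple of assumptions you should check against your own index: that title terms were indexed with the "S" prefix, that body terms were indexed with no prefix at all, and that English stemming is what you want. The function name weighted_field_queries is just something I made up for illustration.

```python
def weighted_field_queries(querystring, title_boost=4.0, body_boost=2.0):
    """Parse querystring against title and body, scaling each side's weight."""
    # Imported here so the sketch only needs the Xapian bindings when called.
    import xapian

    # Parser setup modelled on the getting started guide; adjust to taste.
    queryparser = xapian.QueryParser()
    queryparser.set_stemmer(xapian.Stem("en"))
    queryparser.set_stemming_strategy(queryparser.STEM_SOME)

    flags = xapian.QueryParser.FLAG_DEFAULT
    # Assumption: title terms were indexed with the "S" prefix.
    title_query = queryparser.parse_query(querystring, flags, "S")
    # Assumption: body terms were indexed without any prefix.
    body_query = queryparser.parse_query(querystring, flags)

    # OP_SCALE_WEIGHT multiplies the weight of the wrapped query by a factor.
    weighted_title_query = xapian.Query(
        xapian.Query.OP_SCALE_WEIGHT, title_query, title_boost
    )
    weighted_body_query = xapian.Query(
        xapian.Query.OP_SCALE_WEIGHT, body_query, body_boost
    )
    return weighted_title_query, weighted_body_query
```

If your body text lives under its own prefix, pass that prefix as the third argument to the second parse_query call instead.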
Finally you need to combine the two weighted queries. You can do this using
OP_OR, which adds the weights together and so ranks a document higher when both
the title and the body match. Alternatively, OP_MAX may work better: it uses the
weight of whichever side scores higher, which will probably be the
higher-weighted one. Something like this:
final_query = xapian.Query(
    xapian.Query.OP_MAX, [weighted_title_query, weighted_body_query]
)
(Note that boosting title to 4 and body to 2 probably isn't better than just
boosting title to 2 and leaving body at standard weighting. Of course if you
have a more complex search structure going on then that may still make sense!)
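To make the last two points concrete, here is a toy model in plain Python (no Xapian involved). It treats OP_SCALE_WEIGHT as a multiplication, OP_OR as a sum and OP_MAX as a maximum, which is a simplification of what the matcher does. The three documents and their per-field weights are invented purely for illustration.

```python
# Hypothetical per-field weights for three documents; 0.0 means the query
# did not match that field. These numbers are made up.
docs = {
    "doc_a": {"title": 1.0, "body": 0.0},
    "doc_b": {"title": 0.0, "body": 1.5},
    "doc_c": {"title": 0.6, "body": 0.9},
}


def score(weights, title_boost, body_boost, combine):
    # Scaling: each side's weight is multiplied by its boost factor.
    title = title_boost * weights["title"]
    body = body_boost * weights["body"]
    # "or" adds the two sides together; "max" keeps only the larger one.
    return title + body if combine == "or" else max(title, body)


def ranking(title_boost, body_boost, combine):
    # Document names ordered from highest to lowest score.
    return sorted(
        docs,
        key=lambda name: score(docs[name], title_boost, body_boost, combine),
        reverse=True,
    )


for combine in ("or", "max"):
    print(combine, ranking(4, 2, combine), ranking(2, 1, combine))
```

In this model, boosts of 4 and 2 give the same ordering as boosts of 2 and 1, because only the ratio between the two factors matters when nothing else contributes to the score. The choice of operator does change the order: the document matching weakly in both fields comes first under "or" and last under "max".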
Hope this helps!
J
--
James Aylett, occasional troublemaker & project governance
xapian.org