On Fri, Apr 26, 2024 at 10:37:37PM +0000, Eric Wong wrote:
> Say I have a bunch of values which I want to filter a query against.
> If I had boolean terms, it could just OP_OR against the whole set.
> IOW, this is what notmuch does with terms:
>
> std::set<std::string> terms;
>
> // notmuch populates terms via terms.insert(*i)...
>
> Query(OP_OR, terms.begin(), terms.end());
The slicker way to do this (unless you need the std::set for other
reasons) would be:

    Xapian::Query filter = Xapian::Query::MatchAll;
    while (more_terms()) {
        filter |= Xapian::Query(get_next_term());
    }

Assuming you're using Xapian >= 1.4.10, |= on an OP_OR Query with
refcount 1 (as here) is specially optimised: it just appends a new
subquery, so you get a single OP_OR node, which is particularly
efficient.  If the refcount is higher it'll build a tree, which still
gets optimised the same way; it's just a bit less efficient because it
needs to allocate for each node in the tree.

One difference is that filter here will match everything if there are
no filter terms, so you can just always apply it:

    query = Xapian::Query(Xapian::Query::OP_FILTER, query, filter);

The notmuch way will match nothing in that case, so you need to apply
the filter conditionally (assuming you still want to match something
when there are no filter terms).

> With a set of integers I have (after sortable_serialise), would the
> best way be to OP_OR a bunch of OP_VALUE_RANGE queries together?
>
> So, perhaps something like:
>
> Query(OP_OR,
> Query(OP_VALUE_RANGE, column, v[0], v[0]),
> Query(OP_VALUE_RANGE, column, v[1], v[2]),
Did you mean v[1] and v[1] here?
> Query(OP_VALUE_RANGE, column, v[3], v[3]),
> ...
> Query(OP_VALUE_RANGE, column, v[LAST], v[LAST]))
>
> // Or (totally not even compile-tested and I don't know C++)
> // something like:
>
> std::vector<Xapian::Query> subq;
>
> for (size_t i = 0; i < nelem; i++) {
> std::string v = sortable_serialise(int_vals[i]));
>
> subq.insert(Query(OP_VALUE_RANGE, column, v, v));
> }
>
> Query(OP_OR, subq.begin(), subq.end());
You can build it up the same way with:

    filter |= Xapian::Query(Xapian::Query::OP_VALUE_RANGE, column, v, v);

> It seems what I'm really looking for is an OP_VALUE_OR or OP_VALUE_IN;
> but only OP_VALUE_{GE,LE,RANGE} exists.
Just use OP_VALUE_RANGE with equal bounds.
Another approach is to use a custom PostingSource which can fetch the
value for that slot for each document being considered and check if it's
one of the values you want.
Cheers,
Olly