On Fri, Apr 26, 2024 at 10:37:37PM +0000, Eric Wong wrote:
> Say I have a bunch of values which I want to filter a query against.
> If I had boolean terms, it could just OP_OR against the whole set.
> IOW, this is what notmuch does with terms:
>
>     std::set<std::string> terms;
>
>     // notmuch populates terms via terms.insert(*i)...
>
>     Query(OP_OR, terms.begin(), terms.end());

The slicker way to do this (unless you need the std::set for other
reasons) would be:

    Xapian::Query filter = Xapian::Query::MatchAll;
    while (more_terms()) {
        filter |= Xapian::Query(get_next_term());
    }

Assuming you're using Xapian >= 1.4.10 then |= on an OP_OR Query with
refcount 1 (as here) is specially optimised and just appends a new
subquery so you get a single OP_OR node and this is particularly
efficient (if the refcount is higher it'll build a tree, but still get
optimised the same way - it's just a bit less efficient because it needs
to allocate for each node in the tree).

One difference is that filter here will match everything if there are
no filter terms, so you can just always apply it:

    query = Xapian::Query(OP_FILTER, query, filter);

The notmuch way will match nothing for that case so you need to
conditionalise applying the filter (assuming you still want to match
something when there are no filter terms).

> With a set of integers I have (after sortable_serialise), would the
> best way be to OP_OR a bunch of OP_VALUE_RANGE queries together?
>
> So, perhaps something like:
>
>     Query(OP_OR,
>         Query(OP_VALUE_RANGE, column, v[0], v[0]),
>         Query(OP_VALUE_RANGE, column, v[1], v[2]),

Did you mean 1 and 1 here?

>         Query(OP_VALUE_RANGE, column, v[3], v[3]),
>         ...
>         Query(OP_VALUE_RANGE, column, v[LAST], v[LAST]))
>
>     // Or (totally not even compile-tested and I don't know C++)
>     // something like:
>
>     std::vector<Xapian::Query> subq;
>
>     for (size_t i = 0; i < nelem; i++) {
>         std::string v = sortable_serialise(int_vals[i]));
>
>         subq.insert(Query(OP_VALUE_RANGE, column, v, v));
>     }
>
>     Query(OP_OR, subq.begin(), subq.end());

You can build it up the same way with:

    filter |= Query(OP_VALUE_RANGE, column, v, v);

> It seems what I'm really looking for is an OP_VALUE_OR or OP_VALUE_IN;
> but only OP_VALUE_{GE,LE,RANGE} exists.

Just use OP_VALUE_RANGE with equal bounds.

Another approach is to use a custom PostingSource which can fetch the
value for that slot for each document being considered and check if it's
one of the values you want.

Cheers,
    Olly
Thank you, Mr. Wong and Mr. Betts, for your crucial input. The only thing I find missing is an intuitive GUI that doesn't burden us with the nitty-gritty and intricacies of the CUI/terminal, or with remembering bits of code. Such a GUI would welcome more users and make the program more popular. I am currently considering the options available from your input.

Thank you once again, and best wishes,
Rajib Etc.
On Sat, Apr 27, 2024 at 12:33:36AM +0100, Olly Betts wrote:
> On Fri, Apr 26, 2024 at 10:37:37PM +0000, Eric Wong wrote:
> > Say I have a bunch of values which I want to filter a query against.
> > If I had boolean terms, it could just OP_OR against the whole set.
> > IOW, this is what notmuch does with terms:
> >
> >     std::set<std::string> terms;
> >
> >     // notmuch populates terms via terms.insert(*i)...
> >
> >     Query(OP_OR, terms.begin(), terms.end());
>
> The slicker way to do this (unless you need the std::set for other
> reasons) would be:
>
>     Xapian::Query filter = Xapian::Query::MatchAll;
>     while (more_terms()) {
>         filter |= Xapian::Query(get_next_term());
>     }
>
> Assuming you're using Xapian >= 1.4.10 then |= on an OP_OR Query with
> refcount 1 (as here) is specially optimised and just appends a new
> subquery so you get a single OP_OR node and this is particularly
> efficient (if the refcount is higher it'll build a tree, but still get
> optimised the same way - it's just a bit less efficient because it needs
> to allocate for each node in the tree).
>
> One difference is that filter here will match everything if there are
> no filter terms, so you can just always apply it:
>
>     query = Xapian::Query(OP_FILTER, query, filter);
>
> The notmuch way will match nothing for that case so you need to
> conditionalise applying the filter (assuming you still want to match
> something when there are no filter terms).

Something else worthy of mention here is that there's another approach
using a shim iterator class which is useful for cases such as a synonym
or phrase query that can't have subqueries appended one by one.

You make a little custom iterator class which returns a subquery on each
iteration and then construct a Query object passing begin and end
iterators of this class. C++ templates then effectively turn that into
a loop without needing a container as temporary storage.
For an example, see SynonymIterator here:

https://git.xapian.org/?p=xapian;a=blob;f=xapian-core/queryparser/queryparser.lemony;h=0ffeb50eaa39a2dffa257b5b6913112099931d70;hb=refs/heads/master#l348

I don't think you can achieve this via the bindings though, whereas the
operator |= trick above should work for bindings which wrap that
operator in a usable way.

Cheers,
    Olly