Dustin Oprea
2021-Dec-27 04:41 UTC
Better Understanding of Programmatic Query Construction
It doesn't seem as if there is much documentation for query building. I've been mostly biased towards Python documentation in my searches. There doesn't appear to be a way to search the email archives.

What documentation there is mentions this example:

(
https://github.com/xapian/xapian-docsprint/commit/f04c97f4d1722c2796ba5d807f441d5d2d4eec4d#diff-6ae69d2eefbbb95e7140a8e82ce0751fa6872172d52a05a2c7586e938bf8e4d1R288
)

    subq = xapian.Query(xapian.Query.OP_AND, "hello", "world")
    q = xapian.Query(xapian.Query.OP_AND, [subq, "foo", xapian.Query("bar", 2)])

Based on this limited amount of information, I tried converting my original string query from something like:

    'TERM1' AND title:"TERM2"

to (each more unbounded/desperate than the previous):

1: q = xapian.Query(xapian.Query.OP_AND, "'TERM1'", "TERM2") (based on the first statement)
2: q = xapian.Query(xapian.Query.OP_AND, ["'TERM1'", "TERM2"])
3: q = xapian.Query(xapian.Query.OP_AND, ["TERM1", "TERM2"])
4: q = xapian.Query(xapian.Query.OP_AND, ["TERM1"])
5: q = xapian.Query(xapian.Query.OP_OR, ["TERM1"])

Whereas the string query yielded results, I got zero results in each of these. What am I doing wrong? I'd appreciate someone explaining how to do literal (read: unstemmed, proper noun) searches. I'm not sure if wrapping in an inner set of quotes makes sense in this situation.

Also, I'm assuming that the example translates to "hello AND world AND foo AND ??", but how does that *xapian.Query("bar", 2)* term translate?

Thank you.

Dustin
On Sun, Dec 26, 2021 at 11:41:08PM -0500, Dustin Oprea wrote:
> It doesn't seem as if there is much documentation for query building. I've
> been mostly biased towards Python documentation in my searches. There
> doesn't appear to be a way to search the email archives.
>
> What documentation there is mentions this example:
>
> (
> https://github.com/xapian/xapian-docsprint/commit/f04c97f4d1722c2796ba5d807f441d5d2d4eec4d#diff-6ae69d2eefbbb95e7140a8e82ce0751fa6872172d52a05a2c7586e938bf8e4d1R288
> )

You'll find a more readable version of that here:

https://xapian.org/docs/bindings/python3/introduction.html#query

As noted there:

| The Python API largely follows the C++ API - the differences and
| additions are noted below.

At present at least, you'll want to look at the C++ API docs for guidance, in this case:

https://xapian.org/docs/apidoc/html/classXapian_1_1Query.html

The document you're looking at only covers how the Python API differs from the C++ one.

> Based on this limited amount of information, I tried converting my original
> string query from something like:
>
> 'TERM1' AND title:"TERM2"
>
> to (each more unbounded/desperate than the previous):
>
> 1: q = xapian.Query(xapian.Query.OP_AND, "'TERM1'", "TERM2") (based on the
> first statement)
> 2: q = xapian.Query(xapian.Query.OP_AND, ["'TERM1'", "TERM2"])
> 3: q = xapian.Query(xapian.Query.OP_AND, ["TERM1", "TERM2"])
> 4: q = xapian.Query(xapian.Query.OP_AND, ["TERM1"])
> 5: q = xapian.Query(xapian.Query.OP_OR, ["TERM1"])

You can see the xapian.Query object that the QueryParser produces by calling str() on it:

    $ python3 -c 'import sys, xapian; qp = xapian.QueryParser(); qp.add_prefix("title", "S"); print(str(qp.parse_query(sys.stdin.readline())))'
    'TERM1' AND title:"TERM2"
    Query((term1@1 AND Sterm2@2))

(Here I've fed the query string in on stdin to avoid awkward quoting since your query string contains both single and double quotes.)
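For repeated experiments, the same inspection can be written as a small script. This is only a sketch: the helper name describe_query and the prefix mapping are made up for illustration, the "S" prefix for title: is an assumption about how the database was indexed, and the import is guarded so the file still loads where the xapian bindings are not installed.

```python
# Sketch: show the Query tree that QueryParser builds for a query string.
# describe_query() is a hypothetical helper, not part of the Xapian API.
try:
    import xapian
except ImportError:  # bindings not installed
    xapian = None

# The query string from the thread, with both kinds of quotes.
QUERY = "'TERM1' AND title:\"TERM2\""


def describe_query(query_string, prefixes):
    """Parse query_string and return str() of the resulting xapian.Query.

    prefixes maps a field name (e.g. "title") to a term prefix (e.g. "S").
    """
    qp = xapian.QueryParser()
    for field, prefix in prefixes.items():
        qp.add_prefix(field, prefix)
    return str(qp.parse_query(query_string))


if xapian is None:
    print("xapian bindings not available; nothing to show")
else:
    # Assumption: title: was indexed with the conventional "S" prefix.
    print(describe_query(QUERY, {"title": "S"}))
```

Putting the query string in a Python literal avoids the shell quoting problem in the same way that reading it from stdin does.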
The @1 and @2 are query positions, which mostly don't matter - the main thing they currently support is iterating terms in "query order", which might not always be the same as the order within the xapian.Query tree - e.g. the query `-foo bar` -> Query((Zbar@2 AND_NOT Zfoo@1))

Assuming you don't care about query positions, then:

    q = xapian.Query(xapian.Query.OP_AND, ["term1", "Sterm2"])

If you want the positions set too:

    q = xapian.Query(xapian.Query.OP_AND, [xapian.Query("term1", 1, 1), xapian.Query("Sterm2", 1, 2)])

> Whereas the string query yielded results, I got zero results in each of
> these. What am I doing wrong? I'd appreciate someone explaining how to do
> literal (read: unstemmed, proper noun) searches. I'm not sure if wrapping
> in an inner set of quotes makes sense in this situation.

Your problems are:

* Terms are normalised even without stemming (relevant here: case-folded to lower case, and some punctuation ignored)

* `title:` is mapped to a term prefix (I'm assuming you're using `S` as that's the usual term prefix for `title:`)

> subq = xapian.Query(xapian.Query.OP_AND, "hello", "world")
> q = xapian.Query(xapian.Query.OP_AND, [subq, "foo", xapian.Query("bar", 2)])

> Also, I'm assuming that the example translates to "hello AND world AND foo
> AND ??", but how does that *xapian.Query("bar", 2)* term translate?

It's `bar` but with within query frequency (wqf) set to 2.

I don't think there's a way to create exactly this query by parsing a query string. The QueryParser is intended to parse user-entered search queries, not to provide a way to generate every possible Query object tree from a string specification.

Aside from that detail, this would parse to the above:

    ("hello" AND "world") AND "foo" AND "bar"

The parentheses aren't important here though since the meaning is the same without them (and the query optimiser knows that).

I've quoted the terms here to prevent stemming (since xapian.Query just takes the term exactly as specified). If you haven't set a stemmer on the QueryParser then those are not needed.

Cheers,
Olly
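To tie the two problems above together, here is a minimal sketch of building the query by hand. The helper make_term is hypothetical and only approximates the parser's normalisation for simple ASCII words like these (strip surrounding quotes, lower-case, prepend the prefix); it is not Xapian's own algorithm. The "S" title prefix is again an assumption about the index, and the xapian import is guarded so the term-building part runs without the bindings.

```python
# Sketch: hand-build terms the way the parsed query spells them, then
# combine them with OP_AND. make_term() is a hypothetical helper.
try:
    import xapian
except ImportError:  # bindings not installed
    xapian = None

TITLE_PREFIX = "S"  # assumption: conventional prefix used for title:


def make_term(word, prefix=""):
    """Rough stand-in for the normalisation seen in the parsed query:
    drop surrounding quote characters, lower-case, prepend the prefix."""
    return prefix + word.strip("'\"").lower()


# 'TERM1' AND title:"TERM2"  ->  term1 AND Sterm2
terms = [make_term("'TERM1'"), make_term("TERM2", TITLE_PREFIX)]
print(terms)

if xapian is not None:
    # Without query positions.
    q = xapian.Query(xapian.Query.OP_AND, terms)
    print(str(q))

    # With wqf=1 and query positions 1 and 2, as in the parsed form.
    q_pos = xapian.Query(
        xapian.Query.OP_AND,
        [xapian.Query(term, 1, pos) for pos, term in enumerate(terms, start=1)],
    )
    print(str(q_pos))
```

Printing str() of each query and comparing it with the QueryParser output is a quick way to check that the hand-built terms match what was indexed.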