I have an application for synonyms for tags in notmuch, which means
synonym expansion for a particular boolean prefix. I have a vague memory
of Olly telling me this doesn't work, but I'm not sure about the
details.

My higher level goal is to support a kind of indirection with tags,
where query tag:foo can really generate tag:bar or tag:fub, depending on
some kind of configuration.

Please CC me on any replies, I'm not subscribed to the list.

######################################################################
import xapian

db = xapian.WritableDatabase("db", xapian.DB_CREATE_OR_OPEN)

# register "Kbar" as a synonym for the term "Kfoo"
db.add_synonym("Kfoo", "Kbar")
db.commit()

qp = xapian.QueryParser()
qp.set_database(db)

# replacing add_prefix with add_boolean_prefix stops synonym expansion, tested with 1.2.21
qp.add_prefix("tag", "K")

query = qp.parse_query("tag:foo", xapian.QueryParser.FLAG_AUTO_SYNONYMS)
print(query)
On Sun, Dec 27, 2015 at 11:24:36PM -0400, David Bremner wrote:
> I have an application for synonyms for tags in notmuch, which means
> synonym expansion for a particular boolean prefix. I have a vague memory
> of Olly telling me this doesn't work, but I'm not sure about the
> details.

Yes, synonym expansion isn't done for boolean terms (only "probabilistic
terms", i.e. words in text). Not sure if there's a reason why, or if
it's just something we didn't consider when synonyms were added.

> My higher level goal is to support a kind of indirection with tags,
> where query tag:foo can really generate tag:bar or tag:fub, depending on
> some kind of configuration.

A better option for this is probably a FieldProcessor - you set one for
a prefix and then it gets passed the value and returns a Query object
for it. E.g. in lua (where you can just pass an anon function for the
FieldProcessor - we ought to support C++11 lambdas for such things):

    require "xapian"

    foo_tag_term = "Kbar"

    qp = xapian.QueryParser()
    qp:add_boolean_prefix("tag", function (x)
        if x == "foo" then
            return xapian.Query(foo_tag_term)
        end
        return xapian.Query("K" .. x)
    end)
    print(qp:parse_query("tag:foo tag:x"))

Which parses to give:

    Query(0 * (Kbar OR Kx))

To achieve this with synonyms in a configurable way you'd need to
rewrite the synonyms in the database to match the current configuration,
so it's not as dynamic as the above.

FieldProcessor isn't in 1.2.x, but then support for synonyms for boolean
terms isn't in any version.

Cheers,
    Olly
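The decision made inside the callback above can be sketched on its own, without Xapian. This is only an illustration of the alias-resolution step: the function name resolve_tag and the alias table are hypothetical, and "K" is the tag prefix used in the examples in this thread. It extends the Lua example slightly so that one tag can map to several terms (tag:bar or tag:fub), which a field processor would then presumably combine into an OR query.

```python
TAG_PREFIX = "K"  # term prefix for tags, as used in the thread's examples


def resolve_tag(value, aliases):
    """Return the prefixed terms that a query for tag:<value> should match.

    `aliases` is a hypothetical configuration table mapping a tag name to
    the list of tag names it stands for. Unknown tags map to themselves.
    """
    targets = aliases.get(value, [value])
    return [TAG_PREFIX + target for target in targets]


# Hypothetical configuration: tag:foo really means tag:bar or tag:fub.
aliases = {"foo": ["bar", "fub"]}

print(resolve_tag("foo", aliases))
print(resolve_tag("x", aliases))
```

Because the table is consulted at query-parse time, changing the configuration takes effect on the next query without rewriting anything in the database.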
Olly Betts <olly at survex.com> writes:

> A better option for this is probably a FieldProcessor - you set one for
> a prefix and then it gets passed the value and returns a Query object
> for it. E.g. in lua (where you can just pass an anon function for the
> FieldProcessor - we ought to support C++11 lambdas for such things):

[snip]

> To achieve this with synonyms in a configurable way you'd need to
> rewrite the synonyms in the database to match the current configuration,
> so it's not as dynamic as the above.

Well, the configuration needs to be somewhere. Would it make sense from
a performance point of view to look up foo_tag_term in document
metadata? One of the attractions of using synonyms was that there is
already a persistent/atomic/configurable way of storing them. With a
field processor we'd have to manage that ourselves, and I'm hoping to
avoid managing my own cache, at least in the first revision.

> FieldProcessor isn't in 1.2.x, but then support for synonyms for boolean
> terms isn't in any version.

[ Debian specific discussion follows; non-Debian users might find it
  boring and incomprehensible ]

Yeah. I guess if there were 1.3 packages in Debian (experimental?), I'd
consider optionally depending on them. There are several places where
field processors could be useful for notmuch. I see the packages exist
in Ubuntu, so I guess there wouldn't be that much packaging work? I
guess this would be a perfect application of so-called "bike sheds", but
who knows when these will actually go live. Would it help anything if I
filed an RFP bug?

d
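For illustration only, here is one shape the metadata lookup asked about above could take. It is a sketch under assumptions, not something proposed in the thread: the key name "tag_aliases" and the JSON encoding are hypothetical, and a plain dict stands in for the database so the snippet runs without Xapian. The getter and setter are passed in as callables; with a real database they would presumably be db.get_metadata and db.set_metadata, which says nothing about the performance question.

```python
import json

ALIAS_KEY = "tag_aliases"  # hypothetical metadata key


def load_aliases(get_metadata):
    """Read the alias table through a get_metadata(key) callable.

    An empty value (no metadata stored yet) yields an empty table.
    """
    raw = get_metadata(ALIAS_KEY)
    if not raw:
        return {}
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    return json.loads(raw)


def save_aliases(set_metadata, aliases):
    """Store the alias table through a set_metadata(key, value) callable."""
    set_metadata(ALIAS_KEY, json.dumps(aliases, sort_keys=True))


# A dict stands in for the database's metadata store in this sketch.
store = {}
save_aliases(store.__setitem__, {"foo": ["bar", "fub"]})
print(load_aliases(lambda key: store.get(key, "")))
```

Keeping the table under a single key means it is read once per query-parser setup rather than once per tag, which avoids a separate cache at least in a first revision.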
Olly Betts <olly at survex.com> writes:

> A better option for this is probably a FieldProcessor - you set one for
> a prefix and then it gets passed the value and returns a Query object
> for it. E.g. in lua (where you can just pass an anon function for the
> FieldProcessor - we ought to support C++11 lambdas for such things):

By the way, is there an online version of the FieldProcessor docs?

Also, can I expect the API to lose the "experimental" epithet before 1.4
is released?

cheers,

d