PGNet Dev
2020-Nov-01 14:20 UTC
v2.3.11.3 solr plugin search via MUA fails to match accented ascii characters; cmd line exec of `doveadm fts lookup` PANICs (assertion failed) [proposed patch]
On 11/1/20 1:56 AM, John Fawcett wrote:
> At the moment I don't see other corrections needed in dovecot apart from
> command line doveadm fts which is not a show stopper. Via doveadm search
> I confirm - on my simple config - that search for accented or non
> accented characters works correctly as it does via imap connection.

thx. hopefully it'll get considered for a next release soon.

> Only thing I cannot vouch for is bringing dovecot fts library and config
> into the equation because my setup delegates almost everything to solr.

do i understand correctly that you're solr-indexing your dovecot mail store withOUT using dovecot fts plugin, and that -- with your aforementioned patch -- doveadm successfully uses the resulting indexes?

i hadn't yet seriously considered _circumventing_ fts plugin; if this^ does get resolved soonish, then it's not a big deal. if not, an fts-plugin-less setup would be interesting to know more abt!

> Can you get evidence of things not working? For example tests run with
> soft_commit configured - that's important since without it the updates
> don't show up immediately in searches, that do show that the update is
> happening in solr via solr log, but then search is not working on
> accented characters, despite it working on other text in the same
> message? The solr logs also show whether the text was found or not via
> the "hits=" value in the logged searches, for example:
>
> 2020-11-01 08:32:42.231 INFO  (qtp24119573-21) [   x:dovecot]
> o.a.s.c.S.Request [dovecot]  webapp=/solr path=/select
> params={q={!lucene+q.op%3DAND}body:también&fl=uid,score&sort=uid+asc&fq=%2Bbox:b1626f0fe8d9145e54100000c54a863a+%2Buser:john at voipsupport.it&rows=3202&wt=xml}
> hits=3 status=0 QTime=3
>
> But if no hits are found, then dovecot cannot be expected to display
> results. It still may be an indexing problem though.

my current config has soft_commit enabled,

  fts_solr = url=https://solr.example.com:8984/solr/dovecot/ use_libfts soft_commit=yes batch_size=250

i'll see abt getting some clearer test results ...
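As a rough aid for collecting the evidence John asks for, here is a minimal Python sketch (not part of Dovecot or Solr) that scans Solr log files for `/select` requests and prints each query with its `hits=` count, flagging searches that found nothing. It assumes each request is logged on a single line in the layout of the quoted example (the email wraps it across several lines); the regex and the names `parse_select_lines` and `main` are illustrative only.

```python
import re
import sys
from typing import Iterable, Iterator, NamedTuple
from urllib.parse import unquote_plus

# Assumed layout: "... path=/select params={...} hits=N status=..." on one line.
# The greedy .* runs to the last "}" before "hits=", so braces inside q are kept.
SELECT_RE = re.compile(r"path=/select\s+params=\{(?P<params>.*)\}\s+hits=(?P<hits>\d+)")


class SelectHit(NamedTuple):
    query: str  # decoded q= parameter
    hits: int   # value of hits= in the log line


def parse_select_lines(lines: Iterable[str]) -> Iterator[SelectHit]:
    """Yield the decoded query and hit count for every /select request line."""
    for line in lines:
        m = SELECT_RE.search(line)
        if not m:
            continue  # not a /select request
        query = ""
        for pair in m.group("params").split("&"):
            key, _, value = pair.partition("=")
            if key == "q":
                query = unquote_plus(value)  # "+" -> space, "%3D" -> "="
                break
        yield SelectHit(query, int(m.group("hits")))


def main(paths: list) -> None:
    """Print hits and query for each log file given on the command line."""
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as fh:
            for hit in parse_select_lines(fh):
                note = "  <-- no hits" if hit.hits == 0 else ""
                print(f"{hit.hits:6d}  {hit.query}{note}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run against the Solr log while repeating a failing MUA search. A zero-hit line for an accented term, next to non-zero hits for other words from the same message, points at indexing or analysis rather than at Dovecot's result handling.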
John Fawcett
2020-Nov-01 18:35 UTC
On 01/11/2020 15:20, PGNet Dev wrote:
> On 11/1/20 1:56 AM, John Fawcett wrote:
>> At the moment I don't see other corrections needed in dovecot apart from
>> command line doveadm fts which is not a show stopper. Via doveadm search
>> I confirm - on my simple config - that search for accented or non
>> accented characters works correctly as it does via imap connection.
>
> thx.  hopefully it'll get considered for a next release soon.
>
>> Only thing I cannot vouch for is bringing dovecot fts library and config
>> into the equation because my setup delegates almost everything to solr.
>
> do i understand correctly that you're solr-indexing your dovecot mail
> store withOUT using dovecot fts plugin, and that -- with your
> aforementioned patch -- doveadm successfully uses the resulting indexes?
>
> i hadn't yet seriously considered _circumventing_ fts plugin; if this^
> does get resolved soonish, then it's not a big deal.  if not, an
> fts-plugin-less setup would be interesting to know more abt!
>
>> Can you get evidence of things not working? For example tests run with
>> soft_commit configured - that's important since without it the updates
>> don't show up immediately in searches, that do show that the update is
>> happening in solr via solr log, but then search is not working on
>> accented characters, despite it working on other text in the same
>> message? The solr logs also show whether the text was found or not via
>> the "hits=" value in the logged searches, for example:
>>
>> 2020-11-01 08:32:42.231 INFO  (qtp24119573-21) [   x:dovecot]
>> o.a.s.c.S.Request [dovecot]  webapp=/solr path=/select
>> params={q={!lucene+q.op%3DAND}body:también&fl=uid,score&sort=uid+asc&fq=%2Bbox:b1626f0fe8d9145e54100000c54a863a+%2Buser:john at voipsupport.it&rows=3202&wt=xml}
>> hits=3 status=0 QTime=3
>>
>> But if no hits are found, then dovecot cannot be expected to display
>> results. It still may be an indexing problem though.
>
> my current config has soft_commit enabled,
>
>   fts_solr = url=https://solr.example.com:8984/solr/dovecot/ use_libfts soft_commit=yes batch_size=250
>
> i'll see abt getting some clearer test results ...

Yes, getting more data about any potential problem would be useful.

Just to clarify: I have had a fully working search setup for some time now, over various dovecot releases, so no patches were needed to get it working.

My setup does use the fts plugin and the fts-solr plugin, but it does not use lib-fts functionality (which has many features; for example, it was stopping you from indexing excluded words like tambien). On my setup without lib-fts, everything goes to solr, which does the work of indexing without all the features of lib-fts. My setup is like this not because of issues in lib-fts, but because I never had the need for it.

There is no evidence at the moment, however, that even with lib-fts enabled there are issues with dovecot indexing or searching. What is currently not working is the "doveadm fts" command line utility, but this is mitigated by being able to use a similar command line utility, "doveadm search". The issue with the "doveadm fts" command line utility has (so far as the available evidence suggests) no effect on indexing or imap searches.

fyi my working configuration includes the fts and fts_solr plugins

  mail_plugins = quota notify replication fts fts_solr

(and those are also included in the protocol-specific plugin settings for imap, lmtp, etc.). The specific config I am using for fts and fts_solr is:

  fts = solr
  fts_enforced = yes
  fts_solr = url=https://user@server.example.com:443/solr/dovecot/ batch_size=500 soft_commit=no

BTW I use soft_commit=no because I have periodic soft commits set up on solr, and I accept that newly indexed text won't become searchable for up to that interval; for your testing purposes, though, soft_commit=yes as you have it is much more useful.

John
PGNet Dev
2020-Nov-02 16:40 UTC
On 11/1/20 10:35 AM, John Fawcett wrote:
> Yes, getting more data about any potential problem would be useful.
>
> Just to clarify: I have a fully working search setup for some time now
> over various dovecot releases, so no patches needed to get it working.
>
> My setup does use fts plugin and fts-solr plugin, but it does not use
> lib-fts functionality (that has many features for example it was
> stopping you indexing excluded words like tambien). On my setup without
> lib-fts everything goes to solr which does the work of indexing without
> all the features of lib-fts.

withOUT libfts

  - fts_solr = url=https://solr.presence-group.net:8984/solr/dovecot/ use_libfts soft_commit=yes batch_size=250
  + fts_solr = url=https://solr.presence-group.net:8984/solr/dovecot/ soft_commit=yes batch_size=250

and unmodified dovecot-provided schema/config,

  /bin/cp -af /usr/share/doc/dovecot/solr-config-7.7.0.xml /path/to/solr/data/dovecot/conf/solrconfig.xml
  /bin/cp -af /usr/share/doc/dovecot/solr-schema-7.7.0.xml /path/to/solr/data/dovecot/conf/schema.xml

i suspect my config's now more similar to yours.
checking,

  doveadm fts rescan -u testuser@example.com
  doveadm index -u testuser@example.com -q '*'

as before,

  doveadm fts lookup -u testuser@example.com body "t?sting"

panics,

  doveadm(testuser@example.com): Panic: file mail-storage.c: line 2112 (mailbox_get_open_status): assertion failed: (box->opened)
  doveadm(testuser@example.com): Error: Raw backtrace:
    /usr/lib64/dovecot/libdovecot.so.0(backtrace_append+0x46) [0x7f7829b81cc6]
    -> /usr/lib64/dovecot/libdovecot.so.0(backtrace_get+0x22) [0x7f7829b81de2]
    -> /usr/lib64/dovecot/libdovecot.so.0(+0x10025b) [0x7f7829b8b25b]
    -> /usr/lib64/dovecot/libdovecot.so.0(+0x100297) [0x7f7829b8b297]
    -> /usr/lib64/dovecot/libdovecot.so.0(+0x59bc6) [0x7f7829ae4bc6]
    -> /usr/lib64/dovecot/libdovecot-storage.so.0(+0x4779e) [0x7f7829c9879e]
    -> /usr/lib64/dovecot/lib21_fts_solr_plugin.so(+0x5849) [0x7f78296ea849]
    -> /usr/lib64/dovecot/lib20_fts_plugin.so(fts_backend_lookup+0x51) [0x7f782930b7c1]
    -> /usr/lib64/dovecot/doveadm/lib20_doveadm_fts_plugin.so(+0x3280) [0x7f78270d0280]
    -> doveadm(+0x343cd) [0x55aa57edc3cd]
    -> doveadm(+0x34fe0) [0x55aa57edcfe0]
    -> doveadm(doveadm_cmd_ver2_to_mail_cmd_wrapper+0x22d) [0x55aa57edde2d]
    -> doveadm(doveadm_cmd_run_ver2+0x4e8) [0x55aa57eee8d8]
    -> doveadm(doveadm_cmd_try_run_ver2+0x3e) [0x55aa57eee92e]
    -> doveadm(main+0x1d4) [0x55aa57ecccf4]
    -> /lib64/libc.so.6(__libc_start_main+0xf2) [0x7f7829746042]
    -> doveadm(_start+0x2e) [0x55aa57ecd1ce]
  Aborted

but search, even for accented characters,

  doveadm search -u testuser@example.com subject "t?sting"
  42d73837f133a05fad4d0000f8839f03 1
  813ef60e984f1b5f5fc200005439fba4 293

  doveadm search -u testuser@example.com body "t?sting"
  ba899d0cfe33a05fbe4d0000f8839f03 1
  813ef60e984f1b5f5fc200005439fba4 293

appears to work.

next, to get tokenization -- at least email/url (UAX29URLEmailTokenizer) -- and lowercase & icu normalization working and verified.
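For the lowercase and normalization step, the key requirement is that indexed text and query terms are folded the same way; otherwise "tambien" and "también" are simply different terms. The following minimal Python approximation uses only the standard library: Unicode NFKD decomposition, dropping combining marks, then case-folding. It is only a rough stand-in for what Solr analyzer filters such as `ASCIIFoldingFilterFactory` or `ICUFoldingFilterFactory` do when placed in both the index and query analyzers of a field type. Those filters handle many more characters, for example `ø`, which has no Unicode decomposition and is left unchanged here. The names `fold` and `matches` are illustrative.

```python
import unicodedata


def fold(text: str) -> str:
    """Approximate accent folding: decompose, drop combining marks, case-fold."""
    decomposed = unicodedata.normalize("NFKD", text)  # "é" -> "e" + U+0301
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def matches(query_term: str, indexed_term: str) -> bool:
    """Terms match only when both sides fold to the same string."""
    return fold(query_term) == fold(indexed_term)
```

For example, `fold("TÉSTING")` returns `"testing"`, and `matches("tambien", "también")` is `True`, while the raw strings compare unequal.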