On 22/10/2020 10:23, John Fawcett wrote:> On 21/10/2020 19:00, John Fawcett wrote: >> On 21/10/2020 16:44, Patrik Peng wrote: >>> On 16.10.20 18:34, Patrik Peng wrote: >>>> On 16.10.20 18:00, Scott Q. wrote: >>>>> This reminds me, the way I was able to reproduce this consistently >>>>> was by having large headers ( 100+ lines ). >>>>> >>>>> >>>>> On Friday, 16/10/2020 at 11:49 Patrik Peng wrote: >>>>> >>>>> On 19.08.20 17:37, Josef 'Jeff' Sipek wrote: >>>>> >>>>>> On Wed, Aug 19, 2020 at 17:03:57 +0200, Alessio Cecchi wrote: >>>>>>> Hi, >>>>>>> after the upgrade to Dovecot 2.3.11.3, from 2.3.10.1, I see frequently >>>>>>> these errors from different users: >>>>>> It looks like this has been around for a while and you just got unlucky and >>>>>> started seeing this now. Here's a quick & dirty patch that should fix this. >>>>>> If you can try it, let us know how it went. >>>>>> Jeff. >>>>>> >>>>>> diff --git a/src/plugins/fts-solr/solr-connection.c b/src/plugins/fts-solr/solr-connection.c >>>>>> index ae720b5e2870a852c1b6c440939e3c7c0fa72b5c..9d364f93e2cd1b716b9ab61bd39656a6c5b1ea04 100644 >>>>>> --- a/src/plugins/fts-solr/solr-connection.c >>>>>> +++ b/src/plugins/fts-solr/solr-connection.c >>>>>> @@ -103,7 +103,7 @@ int solr_connection_init(const struct fts_solr_settings *solr_set, >>>>>> http_set.ssl = ssl_client_set; >>>>>> http_set.debug = solr_set->debug; >>>>>> http_set.rawlog_dir = solr_set->rawlog_dir; >>>>>> - solr_http_client = http_client_init(&http_set); >>>>>> + solr_http_client = http_client_init_private(&http_set); >>>>>> } >>>>>> *conn_r = conn; >>>>>> diff --git a/src/plugins/fts/fts-parser-tika.c b/src/plugins/fts/fts-parser-tika.c >>>>>> index a4b8b5c3034f57e22e77caa759c090da6b62f8ba..b8b57a350b9a710d101ac7ccbcc14560d415d905 100644 >>>>>> --- a/src/plugins/fts/fts-parser-tika.c >>>>>> +++ b/src/plugins/fts/fts-parser-tika.c >>>>>> @@ -77,7 +77,7 @@ tika_get_http_client_url(struct mail_user *user, struct http_url **http_url_r) >>>>>> http_set.request_timeout_msecs = 60*1000; >>>>>> http_set.ssl = &ssl_set; >>>>>> http_set.debug = user->mail_debug; >>>>>> - tika_http_client = http_client_init(&http_set); >>>>>> + tika_http_client = http_client_init_private(&http_set); >>>>>> } >>>>>> *http_url_r = tuser->http_url; >>>>>> return 0; >>>>> >>>>> Greetings >>>>> >>>>> I'm also experiencing these issues while running Dovecot >>>>> 2.3.11.3 with Solr 8.6.3 on FreeBSD 11.4. As mentioned in a >>>>> previous mail, the above patch is already applied to Dovecot's >>>>> FreeBSD Port, confirmed by the patches being present in the >>>>> portstree >>>>> (https://svnweb.freebsd.org/ports/branches/2020Q3/mail/dovecot/files/). >>>>> >>>>> In a FreeBSD VM with the official image >>>>> (https://download.freebsd.org/ftp/releases/VM-IMAGES/12.1-RELEASE/amd64/Latest/) >>>>> I compiled dovecot from git and was able to reproduce the >>>>> error with the patch mentioned above applied and also without >>>>> any patches at all. From these results i conclude, that >>>>> neither the patches applied in FreeBSDs portstree or the patch >>>>> above have any influence. >>>>> >>>>> I also managed to reproduce the same results on a Debian 10 >>>>> machine (also with and without the patch): >>>>> >>>>> doveadm(some.user at example.com): Panic: file http-client-request.c: line 1232 (http_client_request_send_more): assertion failed: (req->payload_input != NULL) >>>>> doveadm(some.user at example.com): Error: Raw backtrace: /usr/local/lib/dovecot/libdovecot.so <http://libdovecot.so>.0(backtrace_append+0x42) [0x7f093f7fc3c2] >>>>> -> /usr/local/lib/dovecot/libdovecot.so <http://libdovecot.so>.0(backtrace_get+0x1e) [0x7f093f7fc4ce] -> /usr/local/lib/dovecot/libdovecot.so <http://libdovecot.so>.0(+0xea341) [0x7f093f807341] >>>>> -> /usr/local/lib/dovecot/libdovecot.so <http://libdovecot.so>.0(+0xea381) [0x7f093f807381] -> /usr/local/lib/dovecot/libdovecot.so <http://libdovecot.so>.0(i_fatal+0) [0x7f093f75c074] >>>>> -> /usr/local/lib/dovecot/libdovecot.so <http://libdovecot.so>.0(http_client_request_send_more+0x378) [0x7f093f7a47a8] >>>>> -> /usr/local/lib/dovecot/libdovecot.so <http://libdovecot.so>.0(http_client_connection_output+0xe4) [0x7f093f7a90f4] >>>>> -> /usr/local/lib/dovecot/libssl_iostream_openssl.so <http://openssl.so>(+0x8bff) [0x7f093ec71bff] >>>>> -> /usr/local/lib/dovecot/libdovecot.so <http://libdovecot.so>.0(+0x1148b0) [0x7f093f8318b0] >>>>> -> /usr/local/lib/dovecot/libdovecot.so <http://libdovecot.so>.0(io_loop_call_io+0x69) [0x7f093f820259] >>>>> -> /usr/local/lib/dovecot/libdovecot.so <http://libdovecot.so>.0(io_loop_handler_run_internal+0x11b) [0x7f093f821b6b] >>>>> -> /usr/local/lib/dovecot/libdovecot.so <http://libdovecot.so>.0(io_loop_handler_run+0x59) [0x7f093f820369] >>>>> -> /usr/local/lib/dovecot/libdovecot.so <http://libdovecot.so>.0(io_loop_run+0x38) [0x7f093f820598] >>>>> -> /usr/local/lib/dovecot/libdovecot.so <http://libdovecot.so>.0(+0x86d1e) [0x7f093f7a3d1e] >>>>> -> /usr/local/lib/dovecot/libdovecot.so <http://libdovecot.so>.0(http_client_request_finish_payload+0x2e) [0x7f093f7a407e] >>>>> -> /usr/local/lib/dovecot/lib21_fts_solr_plugin.so <http://plugin.so>(solr_connection_post_end+0x32) [0x7f093b8492c2] >>>>> -> /usr/local/lib/dovecot/lib21_fts_solr_plugin.so <http://plugin.so>(+0x3a45) [0x7f093b844a45] >>>>> -> /usr/local/lib/dovecot/lib20_fts_plugin.so <http://plugin.so>(+0x94cc) [0x7f093e1104cc] >>>>> -> /usr/local/lib/dovecot/lib20_fts_plugin.so <http://plugin.so>(fts_backend_update_deinit+0x23) [0x7f093e110503] >>>>> -> /usr/local/lib/dovecot/lib20_fts_plugin.so <http://plugin.so>(+0x10a9b) [0x7f093e117a9b] >>>>> -> /usr/local/lib/dovecot/lib20_fts_plugin.so <http://plugin.so>(+0x119ca) [0x7f093e1189ca] >>>>> -> /usr/local/lib/dovecot/libdovecot-storage.so <http://libdovecot-storage.so>.0(mailbox_transaction_commit_get_changes+0x56) [0x7f093fb16076] >>>>> -> /usr/local/lib/dovecot/libdovecot-storage.so <http://libdovecot-storage.so>.0(mailbox_transaction_commit+0x1e) [0x7f093fb1615e] >>>>> -> doveadm(+0x31370) [0x5607cfa1f370] -> doveadm(+0x2b2a8) [0x5607cfa192a8] >>>>> -> doveadm(+0x2bfb2) [0x5607cfa19fb2] -> doveadm(doveadm_cmd_ver2_to_mail_cmd_wrapper+0x215) [0x5607cfa1ae05] >>>>> -> doveadm(doveadm_cmd_run_ver2+0x57c) [0x5607cfa2bbec] -> doveadm(doveadm_cmd_try_run_ver2+0x37) [0x5607cfa2bc37] >>>>> -> doveadm(main+0x1d2) [0x5607cfa09492] >>>>> Aborted >>>>> >>>>> During my tests I also did notice, that the error appears more >>>>> often depending of mail size and amount of mails in a folder: >>>>> >>>>> Tested with: doveadm -v fts rescan -u some.user at example.com && doveadm -v index -u some.user at example.com '*' >>>>> 1 Mail in INBOX with 9KB -> Error appeared 0 out of 20 times >>>>> 1 Mail in INBOX with 136KB -> Error appeared 17 out of 20 times >>>>> 3 Mails in INBOX with 408KB -> Error appeared 12 out of 20 times >>>>> 20 Mails in INBOX with ~2MB -> Error appeared 0 out of 20 times >>>>> >>>>> Maybe this info helps anyone. >>>>> >>>>> Patrik >>>>> >>>> Yeah, I read your mail and that's why I tested with different mail >>>> sizes. >>>> I did some more tests, one with large headers (around 700 lines of >>>> a long header line) but small body: >>>> >>>> 1 Mail in INBOX with 583KB -> Error appeared 5 out of 20 times >>>> >>>> and another with a large mail body but normal headers: >>>> >>>> 1 Mail in INBOX with 585KB -> Error appeared 0 out of 20 times >>>> I guess this kinda confirms your guess, but interestingly shows >>>> less errors than my previous test with a large header 136KB Mail. >>>> >>>> >>>> Patrik >>>> >>> I did notice that changing the batch_size in >>> >>> fts_solr = url=http://solr.example.org:8983/solr/ soft_commit=yes batch_size=1000 >>> >>> does have an influence in how often the error occurs. Setting it to >>> 1 or some huge number like 10000 reduces the chances quite a bit but >>> not completely and also causes lots of small , or few but quite >>> large requests to Solr, so its not a practical workaround. >>> >>> Whats the current state of this bug? A fix for it would be very >>> welcome, as it causes some trouble in our setup. >>> >>> >>> Regards >>> >> Hi >> >> would someone care to post (preferably a link) to a test email with >> which the problem can be reproduced? >> >> thanks >> >> Joh >> > I can confirm that this is an error that was introduced in 2.3.11.3. > It did not happen on 2.3.10. I see it occurring on a Centos 7 > installation compiled from source. Between these two releases I didn't > see changes on the fts-solr code, but I do see some changes on the > lib-http code. I suspect that the changes done on lib-http have > introduced this issue. > > One temporary solution might be to roll back the relevant commits that > impacted lib-http and see if the problem resolves. Given the info in > this thread about the records arriving on solr, I'm wondering if this > is just a problem where the data has been split into more than one > request, but the last one has no data left in it. > > John >Hi I did a bit of tracing. From the albeit limited examples that I saw the panic and assert segfault always happens when? req->payload_finished is true. If that observation is correct in general, then the patch is simple, just put the test for payload_finished before the assert for payload_input and payload_output. Would anyone with the problem care to try it and give feedback? diff -ur dovecot-2.3.11.3-orig/src/lib-http/http-client-request.c dovecot-2.3.11.3/src/lib-http/http-client-request.c --- dovecot-2.3.11.3-orig/src/lib-http/http-client-request.c??? 2020-08-12 14:20:41.000000000 +0200 +++ dovecot-2.3.11.3/src/lib-http/http-client-request.c 2020-10-27 13:06:09.352973130 +0100 @@ -1229,12 +1229,12 @@ ??????? const char *error; ??????? uoff_t offset; -?????? i_assert(req->payload_input != NULL); -?????? i_assert(req->payload_output != NULL); - ??????? if (req->payload_finished) ??????????????? return http_client_request_finish_payload_out(req); +?????? i_assert(req->payload_input != NULL); +?????? i_assert(req->payload_output != NULL); + ??????? io_remove(&conn->io_req_payload); ??????? /* chunked ostream needs to write to the parent stream's buffer */ -------------- next part -------------- An HTML attachment was scrubbed... URL: <https://dovecot.org/pipermail/dovecot/attachments/20201027/b7235a2c/attachment-0001.html>
I have implemented this change and the core dumps went away. It appears so far that it's indeed the right fix. Has this change been introduced in 2.3.11.3 ?? Thanks! On Tuesday, 27/10/2020 at 08:20 John Fawcett wrote: On 22/10/2020 10:23, John Fawcett wrote: On 21/10/2020 19:00, John Fawcett wrote: On 21/10/2020 16:44, Patrik Peng wrote: On 16.10.20 18:34, Patrik Peng wrote: On 16.10.20 18:00, Scott Q. wrote: This reminds me, the way I was able to reproduce this consistently was by having large headers ( 100+ lines ). On Friday, 16/10/2020 at 11:49 Patrik Peng wrote: On 19.08.20 17:37, Josef 'Jeff' Sipek wrote: On Wed, Aug 19, 2020 at 17:03:57 +0200, Alessio Cecchi wrote: Hi, after the upgrade to Dovecot 2.3.11.3, from 2.3.10.1, I see frequently these errors from different users: It looks like this has been around for a while and you just got unlucky and started seeing this now. Here's a quick & dirty patch that should fix this. If you can try it, let us know how it went. Jeff. diff --git a/src/plugins/fts-solr/solr-connection.c b/src/plugins/fts-solr/solr-connection.c index ae720b5e2870a852c1b6c440939e3c7c0fa72b5c..9d364f93e2cd1b716b9ab61bd39656a6c5b1ea04 100644 --- a/src/plugins/fts-solr/solr-connection.c +++ b/src/plugins/fts-solr/solr-connection.c @@ -103,7 +103,7 @@ int solr_connection_init(const struct fts_solr_settings *solr_set, http_set.ssl = ssl_client_set; http_set.debug = solr_set->debug; http_set.rawlog_dir = solr_set->rawlog_dir; -solr_http_client http_client_init(&http_set); +solr_http_client http_client_init_private(&http_set); } *conn_r = conn; diff --git a/src/plugins/fts/fts-parser-tika.c b/src/plugins/fts/fts-parser-tika.c index a4b8b5c3034f57e22e77caa759c090da6b62f8ba..b8b57a350b9a710d101ac7ccbcc14560d415d905 100644 --- a/src/plugins/fts/fts-parser-tika.c +++ b/src/plugins/fts/fts-parser-tika.c @@ -77,7 +77,7 @@ tika_get_http_client_url(struct mail_user *user, struct http_url **http_url_r) http_set.request_timeout_msecs = 60*1000; http_set.ssl &ssl_set; http_set.debug = user->mail_debug; -tika_http_client http_client_init(&http_set); +tika_http_client http_client_init_private(&http_set); } *http_url_r = tuser->http_url; return 0; Greetings I'm also experiencing these issues while running Dovecot 2.3.11.3 with Solr 8.6.3 on FreeBSD 11.4. As mentioned in a previous mail, the above patch is already applied to Dovecot's FreeBSD Port, confirmed by the patches being present in the portstree (https://svnweb.freebsd.org/ports/branches/2020Q3/mail/dovecot/files/). In a FreeBSD VM with the official image (https://download.freebsd.org/ftp/releases/VM-IMAGES/12.1-RELEASE/amd64/Latest/) I compiled dovecot from git and was able to reproduce the error with the patch mentioned above applied and also without any patches at all.>From these results i conclude, that neither the patches applied inFreeBSDs portstree or the patch above have any influence. I also managed to reproduce the same results on a Debian 10 machine (also with and without the patch): doveadm(some.user at example.com [1]): Panic: file http-client-request.c: line 1232 (http_client_request_send_more): assertion failed: (req->payload_input != NULL) doveadm(some.user at example.com [1]): Error: Raw backtrace: /usr/local/lib/dovecot/libdovecot.so [2].0(backtrace_append+0x42) [0x7f093f7fc3c2] -> /usr/local/lib/dovecot/libdovecot.so [2].0(backtrace_get+0x1e) [0x7f093f7fc4ce] -> /usr/local/lib/dovecot/libdovecot.so [2].0(+0xea341) [0x7f093f807341] -> /usr/local/lib/dovecot/libdovecot.so [2].0(+0xea381) [0x7f093f807381] -> /usr/local/lib/dovecot/libdovecot.so [2].0(i_fatal+0) [0x7f093f75c074] -> /usr/local/lib/dovecot/libdovecot.so [2].0(http_client_request_send_more+0x378) [0x7f093f7a47a8] -> /usr/local/lib/dovecot/libdovecot.so [2].0(http_client_connection_output+0xe4) [0x7f093f7a90f4] -> /usr/local/lib/dovecot/libssl_iostream_openssl.so [3](+0x8bff) [0x7f093ec71bff] -> /usr/local/lib/dovecot/libdovecot.so [2].0(+0x1148b0) [0x7f093f8318b0] -> /usr/local/lib/dovecot/libdovecot.so [2].0(io_loop_call_io+0x69) [0x7f093f820259] -> /usr/local/lib/dovecot/libdovecot.so [2].0(io_loop_handler_run_internal+0x11b) [0x7f093f821b6b] -> /usr/local/lib/dovecot/libdovecot.so [2].0(io_loop_handler_run+0x59) [0x7f093f820369] -> /usr/local/lib/dovecot/libdovecot.so [2].0(io_loop_run+0x38) [0x7f093f820598] -> /usr/local/lib/dovecot/libdovecot.so [2].0(+0x86d1e) [0x7f093f7a3d1e] -> /usr/local/lib/dovecot/libdovecot.so [2].0(http_client_request_finish_payload+0x2e) [0x7f093f7a407e] -> /usr/local/lib/dovecot/lib21_fts_solr_plugin.so [4](solr_connection_post_end+0x32) [0x7f093b8492c2] -> /usr/local/lib/dovecot/lib21_fts_solr_plugin.so [4](+0x3a45) [0x7f093b844a45] -> /usr/local/lib/dovecot/lib20_fts_plugin.so [4](+0x94cc) [0x7f093e1104cc] -> /usr/local/lib/dovecot/lib20_fts_plugin.so [4](fts_backend_update_deinit+0x23) [0x7f093e110503] -> /usr/local/lib/dovecot/lib20_fts_plugin.so [4](+0x10a9b) [0x7f093e117a9b] -> /usr/local/lib/dovecot/lib20_fts_plugin.so [4](+0x119ca) [0x7f093e1189ca] -> /usr/local/lib/dovecot/libdovecot-storage.so [5].0(mailbox_transaction_commit_get_changes+0x56) [0x7f093fb16076] -> /usr/local/lib/dovecot/libdovecot-storage.so [5].0(mailbox_transaction_commit+0x1e) [0x7f093fb1615e] -> doveadm(+0x31370) [0x5607cfa1f370] -> doveadm(+0x2b2a8) [0x5607cfa192a8] -> doveadm(+0x2bfb2) [0x5607cfa19fb2] -> doveadm(doveadm_cmd_ver2_to_mail_cmd_wrapper+0x215) [0x5607cfa1ae05] -> doveadm(doveadm_cmd_run_ver2+0x57c) [0x5607cfa2bbec] -> doveadm(doveadm_cmd_try_run_ver2+0x37) [0x5607cfa2bc37] -> doveadm(main+0x1d2) [0x5607cfa09492] Aborted During my tests I also did notice, that the error appears more often depending of mail size and amount of mails in a folder: Tested with: doveadm -v fts rescan -u some.user at example.com && doveadm -v index -u some.user at example.com '*' 1 Mail in INBOX with 9KB -> Error appeared 0 out of 20 times 1 Mail in INBOX with 136KB -> Error appeared 17 out of 20 times 3 Mails in INBOX with 408KB -> Error appeared 12 out of 20 times 20 Mails in INBOX with ~2MB -> Error appeared 0 out of 20 times Maybe this info helps anyone. Patrik Yeah, I read your mail and that's why I tested with different mail sizes. I did some more tests, one with large headers (around 700 lines of a long header line) but small body: 1 Mail in INBOX with 583KB -> Error appeared 5 out of 20 times and another with a large mail body but normal headers: 1 Mail in INBOX with 585KB -> Error appeared 0 out of 20 times I guess this kinda confirms your guess, but interestingly shows less errors than my previous test with a large header 136KB Mail. Patrik I did notice that changing the batch_size in fts_solr = url=http://solr.example.org:8983/solr/ soft_commit=yes batch_size=1000 does have an influence in how often the error occurs. Setting it to 1 or some huge number like 10000 reduces the chances quite a bit but not completely and also causes lots of small , or few but quite large requests to Solr, so its not a practical workaround. Whats the current state of this bug? A fix for it would be very welcome, as it causes some trouble in our setup. Regards Hi would someone care to post (preferably a link) to a test email with which the problem can be reproduced? thanks Joh I can confirm that this is an error that was introduced in 2.3.11.3. It did not happen on 2.3.10. I see it occurring on a Centos 7 installation compiled from source. Between these two releases I didn't see changes on the fts-solr code, but I do see some changes on the lib-http code. I suspect that the changes done on lib-http have introduced this issue. One temporary solution might be to roll back the relevant commits that impacted lib-http and see if the problem resolves. Given the info in this thread about the records arriving on solr, I'm wondering if this is just a problem where the data has been split into more than one request, but the last one has no data left in it. John Hi I did a bit of tracing. From the albeit limited examples that I saw the panic and assert segfault always happens when? req->payload_finished is true. If that observation is correct in general, then the patch is simple, just put the test for payload_finished before the assert for payload_input and payload_output. Would anyone with the problem care to try it and give feedback? diff -ur dovecot-2.3.11.3-orig/src/lib-http/http-client-request.c dovecot-2.3.11.3/src/lib-http/http-client-request.c --- dovecot-2.3.11.3-orig/src/lib-http/http-client-request.c??? 2020-08-12 14:20:41.000000000 +0200 +++ dovecot-2.3.11.3/src/lib-http/http-client-request.c 2020-10-27 13:06:09.352973130 +0100 @@ -1229,12 +1229,12 @@ ??????? const char *error; ??????? uoff_t offset; -?????? i_assert(req->payload_input != NULL); -?????? i_assert(req->payload_output != NULL); - ??????? if (req->payload_finished) ??????????????? return http_client_request_finish_payload_out(req); +?????? i_assert(req->payload_input != NULL); +?????? i_assert(req->payload_output != NULL); + ??????? io_remove(&conn->io_req_payload); ??????? /* chunked ostream needs to write to the parent stream's buffer */ Links: ------ [1] mailto:some.user2 at example.com [2] http://libdovecot.so [3] http://openssl.so [4] http://plugin.so [5] http://libdovecot-storage.so -------------- next part -------------- An HTML attachment was scrubbed... URL: <https://dovecot.org/pipermail/dovecot/attachments/20201030/5591abb5/attachment-0001.html>
On 31/10/2020 00:18, Scott Q. wrote:> I have implemented this change and the core dumps went away. > > It appears so far that it's indeed the right fix. > > Has this change been introduced in 2.3.11.3 ?? > > Thanks!I've also been running it for 4 days without any further panics / segfaults and no noticed adverse effects. I'd recommend the dovecot team to consider including the patch in the next update. John diff -ur dovecot-2.3.11.3-orig/src/lib-http/http-client-request.c dovecot-2.3.11.3/src/lib-http/http-client-request.c --- dovecot-2.3.11.3-orig/src/lib-http/http-client-request.c??? 2020-08-12 14:20:41.000000000 +0200 +++ dovecot-2.3.11.3/src/lib-http/http-client-request.c 2020-10-27 13:06:09.352973130 +0100 @@ -1229,12 +1229,12 @@ ??????? const char *error; ??????? uoff_t offset; -?????? i_assert(req->payload_input != NULL); -?????? i_assert(req->payload_output != NULL); - ??????? if (req->payload_finished) ??????????????? return http_client_request_finish_payload_out(req); +?????? i_assert(req->payload_input != NULL); +?????? i_assert(req->payload_output != NULL); + ??????? io_remove(&conn->io_req_payload); ??????? /* chunked ostream needs to write to the parent stream's buffer */ -------------- next part -------------- An HTML attachment was scrubbed... URL: <https://dovecot.org/pipermail/dovecot/attachments/20201031/5b418979/attachment.html>
On 31.10.20 00:18, Scott Q. wrote:> I have implemented this change and the core dumps went away. > > It appears so far that it's indeed the right fix. > > Has this change been introduced in 2.3.11.3 ? > > Thanks! > > On Tuesday, 27/10/2020 at 08:20 John Fawcett wrote: > > On 22/10/2020 10:23, John Fawcett wrote: >> On 21/10/2020 19:00, John Fawcett wrote: >>> On 21/10/2020 16:44, Patrik Peng wrote: >>>> On 16.10.20 18:34, Patrik Peng wrote: >>>>> On 16.10.20 18:00, Scott Q. wrote: >>>>>> This reminds me, the way I was able to reproduce this >>>>>> consistently was by having large headers ( 100+ lines ). >>>>>> >>>>>> >>>>>> On Friday, 16/10/2020 at 11:49 Patrik Peng wrote: >>>>>> >>>>>> On 19.08.20 17:37, Josef 'Jeff' Sipek wrote: >>>>>> >>>>>>> On Wed, Aug 19, 2020 at 17:03:57 +0200, Alessio Cecchi wrote: >>>>>>>> Hi, >>>>>>>> after the upgrade to Dovecot 2.3.11.3, from 2.3.10.1, I see frequently >>>>>>>> these errors from different users: >>>>>>> It looks like this has been around for a while and you just got unlucky and >>>>>>> started seeing this now. Here's a quick & dirty patch that should fix this. >>>>>>> If you can try it, let us know how it went. >>>>>>> Jeff. >>>>>>> diff --git a/src/plugins/fts-solr/solr-connection.c b/src/plugins/fts-solr/solr-connection.c >>>>>>> index ae720b5e2870a852c1b6c440939e3c7c0fa72b5c..9d364f93e2cd1b716b9ab61bd39656a6c5b1ea04 100644 >>>>>>> --- a/src/plugins/fts-solr/solr-connection.c >>>>>>> +++ b/src/plugins/fts-solr/solr-connection.c >>>>>>> @@ -103,7 +103,7 @@ int solr_connection_init(const struct fts_solr_settings *solr_set, >>>>>>> http_set.ssl = ssl_client_set; >>>>>>> http_set.debug = solr_set->debug; >>>>>>> http_set.rawlog_dir = solr_set->rawlog_dir; >>>>>>> - solr_http_client = http_client_init(&http_set); >>>>>>> + solr_http_client = http_client_init_private(&http_set); >>>>>>> } >>>>>>> *conn_r = conn; >>>>>>> diff --git a/src/plugins/fts/fts-parser-tika.c b/src/plugins/fts/fts-parser-tika.c >>>>>>> index a4b8b5c3034f57e22e77caa759c090da6b62f8ba..b8b57a350b9a710d101ac7ccbcc14560d415d905 100644 >>>>>>> --- a/src/plugins/fts/fts-parser-tika.c >>>>>>> +++ b/src/plugins/fts/fts-parser-tika.c >>>>>>> @@ -77,7 +77,7 @@ tika_get_http_client_url(struct mail_user *user, struct http_url **http_url_r) >>>>>>> http_set.request_timeout_msecs = 60*1000; >>>>>>> http_set.ssl = &ssl_set; >>>>>>> http_set.debug = user->mail_debug; >>>>>>> - tika_http_client = http_client_init(&http_set); >>>>>>> + tika_http_client = http_client_init_private(&http_set); >>>>>>> } >>>>>>> *http_url_r = tuser->http_url; >>>>>>> return 0; >>>>>> >>>>>> Greetings >>>>>> >>>>>> I'm also experiencing these issues while running Dovecot >>>>>> 2.3.11.3 with Solr 8.6.3 on FreeBSD 11.4. As mentioned in >>>>>> a previous mail, the above patch is already applied to >>>>>> Dovecot's FreeBSD Port, confirmed by the patches being >>>>>> present in the portstree >>>>>> (https://svnweb.freebsd.org/ports/branches/2020Q3/mail/dovecot/files/). >>>>>> >>>>>> In a FreeBSD VM with the official image >>>>>> (https://download.freebsd.org/ftp/releases/VM-IMAGES/12.1-RELEASE/amd64/Latest/) >>>>>> I compiled dovecot from git and was able to reproduce the >>>>>> error with the patch mentioned above applied and also >>>>>> without any patches at all. From these results i >>>>>> conclude, that neither the patches applied in FreeBSDs >>>>>> portstree or the patch above have any influence. >>>>>> >>>>>> I also managed to reproduce the same results on a Debian >>>>>> 10 machine (also with and without the patch): >>>>>> >>>>>> doveadm(some.user at example.com): Panic: file http-client-request.c: line 1232 (http_client_request_send_more): assertion failed: (req->payload_input != NULL) >>>>>> doveadm(some.user at example.com): Error: Raw backtrace: /usr/local/lib/dovecot/libdovecot.so <http://libdovecot.so>.0(backtrace_append+0x42) [0x7f093f7fc3c2] >>>>>> -> /usr/local/lib/dovecot/libdovecot.so <http://libdovecot.so>.0(backtrace_get+0x1e) [0x7f093f7fc4ce] -> /usr/local/lib/dovecot/libdovecot.so <http://libdovecot.so>.0(+0xea341) [0x7f093f807341] >>>>>> -> /usr/local/lib/dovecot/libdovecot.so <http://libdovecot.so>.0(+0xea381) [0x7f093f807381] -> /usr/local/lib/dovecot/libdovecot.so <http://libdovecot.so>.0(i_fatal+0) [0x7f093f75c074] >>>>>> -> /usr/local/lib/dovecot/libdovecot.so <http://libdovecot.so>.0(http_client_request_send_more+0x378) [0x7f093f7a47a8] >>>>>> -> /usr/local/lib/dovecot/libdovecot.so <http://libdovecot.so>.0(http_client_connection_output+0xe4) [0x7f093f7a90f4] >>>>>> -> /usr/local/lib/dovecot/libssl_iostream_openssl.so <http://openssl.so>(+0x8bff) [0x7f093ec71bff] >>>>>> -> /usr/local/lib/dovecot/libdovecot.so <http://libdovecot.so>.0(+0x1148b0) [0x7f093f8318b0] >>>>>> -> /usr/local/lib/dovecot/libdovecot.so <http://libdovecot.so>.0(io_loop_call_io+0x69) [0x7f093f820259] >>>>>> -> /usr/local/lib/dovecot/libdovecot.so <http://libdovecot.so>.0(io_loop_handler_run_internal+0x11b) [0x7f093f821b6b] >>>>>> -> /usr/local/lib/dovecot/libdovecot.so <http://libdovecot.so>.0(io_loop_handler_run+0x59) [0x7f093f820369] >>>>>> -> /usr/local/lib/dovecot/libdovecot.so <http://libdovecot.so>.0(io_loop_run+0x38) [0x7f093f820598] >>>>>> -> /usr/local/lib/dovecot/libdovecot.so <http://libdovecot.so>.0(+0x86d1e) [0x7f093f7a3d1e] >>>>>> -> /usr/local/lib/dovecot/libdovecot.so <http://libdovecot.so>.0(http_client_request_finish_payload+0x2e) [0x7f093f7a407e] >>>>>> -> /usr/local/lib/dovecot/lib21_fts_solr_plugin.so <http://plugin.so>(solr_connection_post_end+0x32) [0x7f093b8492c2] >>>>>> -> /usr/local/lib/dovecot/lib21_fts_solr_plugin.so <http://plugin.so>(+0x3a45) [0x7f093b844a45] >>>>>> -> /usr/local/lib/dovecot/lib20_fts_plugin.so <http://plugin.so>(+0x94cc) [0x7f093e1104cc] >>>>>> -> /usr/local/lib/dovecot/lib20_fts_plugin.so <http://plugin.so>(fts_backend_update_deinit+0x23) [0x7f093e110503] >>>>>> -> /usr/local/lib/dovecot/lib20_fts_plugin.so <http://plugin.so>(+0x10a9b) [0x7f093e117a9b] >>>>>> -> /usr/local/lib/dovecot/lib20_fts_plugin.so <http://plugin.so>(+0x119ca) [0x7f093e1189ca] >>>>>> -> /usr/local/lib/dovecot/libdovecot-storage.so <http://libdovecot-storage.so>.0(mailbox_transaction_commit_get_changes+0x56) [0x7f093fb16076] >>>>>> -> /usr/local/lib/dovecot/libdovecot-storage.so <http://libdovecot-storage.so>.0(mailbox_transaction_commit+0x1e) [0x7f093fb1615e] >>>>>> -> doveadm(+0x31370) [0x5607cfa1f370] -> doveadm(+0x2b2a8) [0x5607cfa192a8] >>>>>> -> doveadm(+0x2bfb2) [0x5607cfa19fb2] -> doveadm(doveadm_cmd_ver2_to_mail_cmd_wrapper+0x215) [0x5607cfa1ae05] >>>>>> -> doveadm(doveadm_cmd_run_ver2+0x57c) [0x5607cfa2bbec] -> doveadm(doveadm_cmd_try_run_ver2+0x37) [0x5607cfa2bc37] >>>>>> -> doveadm(main+0x1d2) [0x5607cfa09492] >>>>>> Aborted >>>>>> >>>>>> During my tests I also did notice, that the error appears >>>>>> more often depending of mail size and amount of mails in >>>>>> a folder: >>>>>> >>>>>> Tested with: doveadm -v fts rescan -usome.user at example.com && doveadm -v index -usome.user at example.com '*' >>>>>> 1 Mail in INBOX with 9KB -> Error appeared 0 out of 20 times >>>>>> 1 Mail in INBOX with 136KB -> Error appeared 17 out of 20 times >>>>>> 3 Mails in INBOX with 408KB -> Error appeared 12 out of 20 times >>>>>> 20 Mails in INBOX with ~2MB -> Error appeared 0 out of 20 times >>>>>> >>>>>> Maybe this info helps anyone. >>>>>> >>>>>> Patrik >>>>>> >>>>> Yeah, I read your mail and that's why I tested with different >>>>> mail sizes. >>>>> I did some more tests, one with large headers (around 700 >>>>> lines of a long header line) but small body: >>>>> >>>>> 1 Mail in INBOX with 583KB -> Error appeared 5 out of 20 times >>>>> >>>>> and another with a large mail body but normal headers: >>>>> >>>>> 1 Mail in INBOX with 585KB -> Error appeared 0 out of 20 times >>>>> I guess this kinda confirms your guess, but interestingly >>>>> shows less errors than my previous test with a large header >>>>> 136KB Mail. >>>>> >>>>> >>>>> Patrik >>>>> >>>> I did notice that changing the batch_size in >>>> >>>> fts_solr = url=http://solr.example.org:8983/solr/ soft_commit=yes batch_size=1000 >>>> >>>> does have an influence in how often the error occurs. Setting >>>> it to 1 or some huge number like 10000 reduces the chances >>>> quite a bit but not completely and also causes lots of small , >>>> or few but quite large requests to Solr, so its not a practical >>>> workaround. >>>> >>>> Whats the current state of this bug? A fix for it would be very >>>> welcome, as it causes some trouble in our setup. >>>> >>>> >>>> Regards >>>> >>> Hi >>> >>> would someone care to post (preferably a link) to a test email >>> with which the problem can be reproduced? >>> >>> thanks >>> >>> Joh >>> >> I can confirm that this is an error that was introduced in >> 2.3.11.3. It did not happen on 2.3.10. I see it occurring on a >> Centos 7 installation compiled from source. Between these two >> releases I didn't see changes on the fts-solr code, but I do see >> some changes on the lib-http code. I suspect that the changes >> done on lib-http have introduced this issue. >> >> One temporary solution might be to roll back the relevant commits >> that impacted lib-http and see if the problem resolves. Given the >> info in this thread about the records arriving on solr, I'm >> wondering if this is just a problem where the data has been split >> into more than one request, but the last one has no data left in it. >> >> John >> > Hi > > I did a bit of tracing. From the albeit limited examples that I > saw the panic and assert segfault always happens when? > req->payload_finished is true. If that observation is correct in > general, then the patch is simple, just put the test for > payload_finished before the assert for payload_input and > payload_output. Would anyone with the problem care to try it and > give feedback? > > diff -ur dovecot-2.3.11.3-orig/src/lib-http/http-client-request.c > dovecot-2.3.11.3/src/lib-http/http-client-request.c > --- dovecot-2.3.11.3-orig/src/lib-http/http-client-request.c > 2020-08-12 14:20:41.000000000 +0200 > +++ dovecot-2.3.11.3/src/lib-http/http-client-request.c 2020-10-27 > 13:06:09.352973130 +0100 > @@ -1229,12 +1229,12 @@ > ??????? const char *error; > ??????? uoff_t offset; > > -?????? i_assert(req->payload_input != NULL); > -?????? i_assert(req->payload_output != NULL); > - > ??????? if (req->payload_finished) > ??????????????? return http_client_request_finish_payload_out(req); > > +?????? i_assert(req->payload_input != NULL); > +?????? i_assert(req->payload_output != NULL); > + > ??????? io_remove(&conn->io_req_payload); > > ??????? /* chunked ostream needs to write to the parent stream's > buffer */ >I can confirm the same behavior after applying these changes. No more core dumps so far. Thanks as well :) -------------- next part -------------- An HTML attachment was scrubbed... URL: <https://dovecot.org/pipermail/dovecot/attachments/20201106/3b69fbf3/attachment-0001.html>