My auth process is dumping core. This happens several times per day but dovecot can operate normally for hours between errors. The crash occurs in src/auth/auth-policy.c, line 356: t at 1 (l at 1) program terminated by signal SEGV (no mapping at the fault address) Current function is auth_policy_parse_response 356 context->request->policy_refusal = FALSE; ...context->request is null. Add markers to the code... } if (context->request == NULL) fprintf(stderr, "2222222222222222222222222222 context->request == NULL\n"); i_stream_unref(&context->payload); if (context->request == NULL) fprintf(stderr, "1111111111111111111111111111 context->request == NULL\n"); if (context->parse_error) { context->result = (context->set->policy_reject_on_fail ? -1 : 0); } if (context->request == NULL) fprintf(stderr, "AAAAAAAAAAAAAAAAAAAAAAAAAAAA context->request == NULL\n"); context->request->policy_refusal = FALSE; if (context->result < 0) { ...gives at the time of the crash... Aug 1 14:25:44 mailhost dovecot: [ID 702911 mail.error] auth: Error: 1111111111111111111111111111 context->request == NULL Aug 1 14:25:44 mailhost dovecot: [ID 702911 mail.error] auth: Error: AAAAAAAAAAAAAAAAAAAAAAAAAAAA context->request == NULL ...so context->result is not null before the call (no 222) to i_stream_unref but is after. dovecot.conf has: auth_policy_server_url = http://policyserver.lan/ auth_policy_server_timeout_msecs = 3000 auth_policy_hash_nonce = Ohr9phaeSeip2Pahaez2raiGohxoo5Ia auth_policy_request_attributes = remote=%{rip} auth_policy_check_before_auth = yes auth_policy_check_after_auth = yes auth_policy_report_after_auth = yes To simplify the problem I used a dummy policy server, in nginx.conf: location / { default_type application/json; return 200 "{\"status\":0,\"msg\":\"accepted\"}"; } however no matter what rubbish a policy server sends back it should not cause dovecot to crash. I've tried 32 and 64 bit and two compilers (gcc and SunStudio), all result in crashes. Adding: keepalive_timeout 0; ...to nginx.conf appears to reduce the crashes. It happens with a variety of users and with debug output I see no pattern. James.
On 02/08/2019 11:45, James via dovecot wrote:> My auth process is dumping core. This happens several times per day but > dovecot can operate normally for hours between errors. > > The crash occurs in src/auth/auth-policy.c, line 356: > > t at 1 (l at 1) program terminated by signal SEGV (no mapping at the fault > address) > Current function is auth_policy_parse_response > 356 context->request->policy_refusal = FALSE;Further tracking shows this sets context->request to NULL: "src/lib/iostream.c" line 54 array_foreach(&stream->destroy_callbacks, dc) dc->callback(dc->context); Very occasionally I see: Aug 3 11:00:35 mailhost dovecot: [ID 702911 mail.crit] auth: Panic: file http-client-request.c: line 283 (http_client_request_unref): assertion failed: (req->refcount > 0) Swapping keep-alive on/off changes crashing from very approximately once per day to some per hour. I guess there is some fundamental thread clash or keep alive time out clean-up failure. James.
On 2.8.2019 13.45, James via dovecot wrote:> My auth process is dumping core.? This happens several times per day > but dovecot can operate normally for hours between errors. > > The crash occurs in src/auth/auth-policy.c, line 356: > > t at 1 (l at 1) program terminated by signal SEGV (no mapping at the fault > address) > Current function is auth_policy_parse_response > ? 356?????????? context->request->policy_refusal = FALSE; > > ...context->request is null.? Add markers to the code... > > ??? } > ??? if (context->request == NULL) fprintf(stderr, > "2222222222222222222222222222 context->request == NULL\n"); > ??? i_stream_unref(&context->payload); > > ??? if (context->request == NULL) fprintf(stderr, > "1111111111111111111111111111 context->request == NULL\n"); > ??? if (context->parse_error) { > ??????? context->result = (context->set->policy_reject_on_fail ? -1 : 0); > ??? } > > ??? if (context->request == NULL) fprintf(stderr, > "AAAAAAAAAAAAAAAAAAAAAAAAAAAA context->request == NULL\n"); > ??? context->request->policy_refusal = FALSE; > > ??? if (context->result < 0) { > > ...gives at the time of the crash... > > Aug? 1 14:25:44 mailhost dovecot: [ID 702911 mail.error] auth: Error: > 1111111111111111111111111111 context->request == NULL > Aug? 1 14:25:44 mailhost dovecot: [ID 702911 mail.error] auth: Error: > AAAAAAAAAAAAAAAAAAAAAAAAAAAA context->request == NULL > > ...so context->result is not null before the call (no 222) to > i_stream_unref but is after. > > > > dovecot.conf has: > > auth_policy_server_url = http://policyserver.lan/ > auth_policy_server_timeout_msecs = 3000 > auth_policy_hash_nonce = Ohr9phaeSeip2Pahaez2raiGohxoo5Ia > auth_policy_request_attributes = remote=%{rip} > auth_policy_check_before_auth = yes > auth_policy_check_after_auth = yes > auth_policy_report_after_auth = yes > > > To simplify the problem I used a dummy policy server, in nginx.conf: > > ??? location / { > ??????? default_type? application/json; > ??????? return 200 "{\"status\":0,\"msg\":\"accepted\"}"; > ??? } > > however no matter what rubbish a policy server sends back it should > not cause dovecot to crash. > > I've tried 32 and 64 bit and two compilers (gcc and SunStudio), all > result in crashes.? Adding: > ??? keepalive_timeout 0; > ...to nginx.conf appears to reduce the crashes.? It happens with a > variety of users and with debug output I see no pattern. > > > > James. >Hi! There is an easy fix for this, attached. Aki -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-auth-Do-not-unref-policy-payload-too-early.patch Type: text/x-patch Size: 1024 bytes Desc: not available URL: <https://dovecot.org/pipermail/dovecot/attachments/20190806/c5757b60/attachment-0001.bin>
On 06/08/2019 06:46, Aki Tuomi via dovecot wrote:> > On 2.8.2019 13.45, James via dovecot wrote: >> My auth process is dumping core. This happens several times per day...> There is an easy fix for this, attached.Patch applied; no core dump in 24 hours. This appears to have fixed the problem. I found that it crashed when the policy server responded too quickly. As the before and after auth command=allow request are the same I cache the first, leading to a fast second response. Removing the cache (nginx proxy_cache ...) must change the timings and circumvented the crash. Why use both check before and after auth? roundcube webmail reports an error with only auth_policy_check_before_auth. I cannot see why. The simple and lazy solution is to use double auth_policy_check_! Thank you Aki for looking at this and finding a solution so quickly.