My auth process is dumping core.  This happens several times per day but 
dovecot can operate normally for hours between errors.
The crash occurs in src/auth/auth-policy.c, line 356:
t at 1 (l at 1) program terminated by signal SEGV (no mapping at the fault 
address)
Current function is auth_policy_parse_response
   356           context->request->policy_refusal = FALSE;
...context->request is null.  Add markers to the code...
     }
     if (context->request == NULL) fprintf(stderr, 
"2222222222222222222222222222 context->request == NULL\n");
     i_stream_unref(&context->payload);
     if (context->request == NULL) fprintf(stderr, 
"1111111111111111111111111111 context->request == NULL\n");
     if (context->parse_error) {
         context->result = (context->set->policy_reject_on_fail ? -1 :
0);
     }
     if (context->request == NULL) fprintf(stderr, 
"AAAAAAAAAAAAAAAAAAAAAAAAAAAA context->request == NULL\n");
     context->request->policy_refusal = FALSE;
     if (context->result < 0) {
...gives at the time of the crash...
Aug  1 14:25:44 mailhost dovecot: [ID 702911 mail.error] auth: Error: 
1111111111111111111111111111 context->request == NULL
Aug  1 14:25:44 mailhost dovecot: [ID 702911 mail.error] auth: Error: 
AAAAAAAAAAAAAAAAAAAAAAAAAAAA context->request == NULL
...so context->result is not null before the call (no 222) to 
i_stream_unref but is after.
dovecot.conf has:
auth_policy_server_url = http://policyserver.lan/
auth_policy_server_timeout_msecs = 3000
auth_policy_hash_nonce = Ohr9phaeSeip2Pahaez2raiGohxoo5Ia
auth_policy_request_attributes = remote=%{rip}
auth_policy_check_before_auth = yes
auth_policy_check_after_auth = yes
auth_policy_report_after_auth = yes
To simplify the problem I used a dummy policy server, in nginx.conf:
     location / {
         default_type  application/json;
         return 200
"{\"status\":0,\"msg\":\"accepted\"}";
     }
however no matter what rubbish a policy server sends back it should not 
cause dovecot to crash.
I've tried 32 and 64 bit and two compilers (gcc and SunStudio), all 
result in crashes.  Adding:
     keepalive_timeout 0;
...to nginx.conf appears to reduce the crashes.  It happens with a 
variety of users and with debug output I see no pattern.
James.
On 02/08/2019 11:45, James via dovecot wrote:> My auth process is dumping core. This happens several times per day but > dovecot can operate normally for hours between errors. > > The crash occurs in src/auth/auth-policy.c, line 356: > > t at 1 (l at 1) program terminated by signal SEGV (no mapping at the fault > address) > Current function is auth_policy_parse_response > 356 context->request->policy_refusal = FALSE;Further tracking shows this sets context->request to NULL: "src/lib/iostream.c" line 54 array_foreach(&stream->destroy_callbacks, dc) dc->callback(dc->context); Very occasionally I see: Aug 3 11:00:35 mailhost dovecot: [ID 702911 mail.crit] auth: Panic: file http-client-request.c: line 283 (http_client_request_unref): assertion failed: (req->refcount > 0) Swapping keep-alive on/off changes crashing from very approximately once per day to some per hour. I guess there is some fundamental thread clash or keep alive time out clean-up failure. James.
On 2.8.2019 13.45, James via dovecot wrote:> My auth process is dumping core.? This happens several times per day > but dovecot can operate normally for hours between errors. > > The crash occurs in src/auth/auth-policy.c, line 356: > > t at 1 (l at 1) program terminated by signal SEGV (no mapping at the fault > address) > Current function is auth_policy_parse_response > ? 356?????????? context->request->policy_refusal = FALSE; > > ...context->request is null.? Add markers to the code... > > ??? } > ??? if (context->request == NULL) fprintf(stderr, > "2222222222222222222222222222 context->request == NULL\n"); > ??? i_stream_unref(&context->payload); > > ??? if (context->request == NULL) fprintf(stderr, > "1111111111111111111111111111 context->request == NULL\n"); > ??? if (context->parse_error) { > ??????? context->result = (context->set->policy_reject_on_fail ? -1 : 0); > ??? } > > ??? if (context->request == NULL) fprintf(stderr, > "AAAAAAAAAAAAAAAAAAAAAAAAAAAA context->request == NULL\n"); > ??? context->request->policy_refusal = FALSE; > > ??? if (context->result < 0) { > > ...gives at the time of the crash... > > Aug? 1 14:25:44 mailhost dovecot: [ID 702911 mail.error] auth: Error: > 1111111111111111111111111111 context->request == NULL > Aug? 1 14:25:44 mailhost dovecot: [ID 702911 mail.error] auth: Error: > AAAAAAAAAAAAAAAAAAAAAAAAAAAA context->request == NULL > > ...so context->result is not null before the call (no 222) to > i_stream_unref but is after. > > > > dovecot.conf has: > > auth_policy_server_url = http://policyserver.lan/ > auth_policy_server_timeout_msecs = 3000 > auth_policy_hash_nonce = Ohr9phaeSeip2Pahaez2raiGohxoo5Ia > auth_policy_request_attributes = remote=%{rip} > auth_policy_check_before_auth = yes > auth_policy_check_after_auth = yes > auth_policy_report_after_auth = yes > > > To simplify the problem I used a dummy policy server, in nginx.conf: > > ??? location / { > ??????? default_type? application/json; > ??????? return 200 "{\"status\":0,\"msg\":\"accepted\"}"; > ??? } > > however no matter what rubbish a policy server sends back it should > not cause dovecot to crash. > > I've tried 32 and 64 bit and two compilers (gcc and SunStudio), all > result in crashes.? Adding: > ??? keepalive_timeout 0; > ...to nginx.conf appears to reduce the crashes.? It happens with a > variety of users and with debug output I see no pattern. > > > > James. >Hi! There is an easy fix for this, attached. Aki -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-auth-Do-not-unref-policy-payload-too-early.patch Type: text/x-patch Size: 1024 bytes Desc: not available URL: <https://dovecot.org/pipermail/dovecot/attachments/20190806/c5757b60/attachment-0001.bin>
On 06/08/2019 06:46, Aki Tuomi via dovecot wrote:> > On 2.8.2019 13.45, James via dovecot wrote: >> My auth process is dumping core. This happens several times per day...> There is an easy fix for this, attached.Patch applied; no core dump in 24 hours. This appears to have fixed the problem. I found that it crashed when the policy server responded too quickly. As the before and after auth command=allow request are the same I cache the first, leading to a fast second response. Removing the cache (nginx proxy_cache ...) must change the timings and circumvented the crash. Why use both check before and after auth? roundcube webmail reports an error with only auth_policy_check_before_auth. I cannot see why. The simple and lazy solution is to use double auth_policy_check_! Thank you Aki for looking at this and finding a solution so quickly.