Tom Sommer
2016-Apr-06 19:14 UTC
Internal quota calculation error (redis), double mail delivery
On 2016-04-06 20:27, Timo Sirainen wrote:> On 06 Apr 2016, at 16:54, Tom Sommer <mail at tomsommer.dk> wrote: >> >> Sometimes my redis connection times out, both for pop3, imap and lmtp. >> I have no idea why, when I do a "doveadm quota recalc -u" it works >> just fine? >> I'm thinking the connection is lost somehow and not retried? >> >> Apr 06 15:42:00 lmtp(xxx at xxx.com): Error: redis: Lookup timed out in >> 30.000 secs >> Apr 06 15:42:00 lmtp(xxx at xxx.com): Error: Internal quota calculation >> error >> Apr 06 15:42:00 lmtp(xxx at xxx.com): Error: Internal quota calculation >> error >> Apr 06 15:42:00 lmtp(xxx at xxx.com): Info: xxxx: msgid=<xxxx at xxxx.com>: >> saved mail to INBOX >> Apr 06 15:49:23 lmtp(xxx at xxx.com): Info: xxxx: msgid=<xxxx at xxxx.com>: >> saved mail to INBOX <-- Same mail >> >> This happens a lot, and it's really bad, because the mail is actually >> saved correctly, but the error is reported back to the Director, >> causing the delivery to be retried by the MTA - so the mail actually >> lands *twice* in the user's inbox. > > 1) Shrink the timeout in Redis with timeout_msecs parameter. > http://wiki2.dovecot.org/DictionaryYea, but why is it taking 30 seconds anyway? I'll try and lower it, but I suspect I'll just get 10.000 secs errors instead?> 2) Actually it looks like the timeout handling code doesn't work at > all. It simply keeps on logging timeouts until it succeeds or the > connection gets disconnected. Do you see such repeated timeout logs, > or does the connection also happen to get closed exactly at 30 seconds > anyway? This fixes: > https://github.com/dovecot/core/commit/2fb829dc7cc8c61dee925a15e51768a4b82e45daI see a lot of them, but none repeated after eachother. It's more of a random pattern.> 3) Is there some firewall between Dovecot and Redis that closes idling > connections and causes them to hang? Dovecot doesn't close idle Redis > connections, although I suppose it would be useful too. The attached > patch probably helps here by adding a new idle_timeout_msecs setting. > I didn't finish testing it.No firewall at all. I'm unsure if redis closes idle connections, that could be the cause of it. What about the issue of lmtp saving the mail, but sending an error back to Director? Thanks :) -- Tom
Timo Sirainen
2016-Apr-06 19:22 UTC
Internal quota calculation error (redis), double mail delivery
On 06 Apr 2016, at 22:14, Tom Sommer <mail at tomsommer.dk> wrote:> > > On 2016-04-06 20:27, Timo Sirainen wrote: >> On 06 Apr 2016, at 16:54, Tom Sommer <mail at tomsommer.dk> wrote: >>> Sometimes my redis connection times out, both for pop3, imap and lmtp. I have no idea why, when I do a "doveadm quota recalc -u" it works just fine? >>> I'm thinking the connection is lost somehow and not retried? >>> Apr 06 15:42:00 lmtp(xxx at xxx.com): Error: redis: Lookup timed out in 30.000 secs >>> Apr 06 15:42:00 lmtp(xxx at xxx.com): Error: Internal quota calculation error >>> Apr 06 15:42:00 lmtp(xxx at xxx.com): Error: Internal quota calculation error >>> Apr 06 15:42:00 lmtp(xxx at xxx.com): Info: xxxx: msgid=<xxxx at xxxx.com>: saved mail to INBOX >>> Apr 06 15:49:23 lmtp(xxx at xxx.com): Info: xxxx: msgid=<xxxx at xxxx.com>: saved mail to INBOX <-- Same mail >>> This happens a lot, and it's really bad, because the mail is actually saved correctly, but the error is reported back to the Director, causing the delivery to be retried by the MTA - so the mail actually lands *twice* in the user's inbox. >> 1) Shrink the timeout in Redis with timeout_msecs parameter. >> http://wiki2.dovecot.org/Dictionary > > Yea, but why is it taking 30 seconds anyway? I'll try and lower it, but I suspect I'll just get 10.000 secs errors instead?I don't know. Possibly a hanging TCP connection. That's why I was asking about the firewall. If it's not that, I don't really know. And yes, just shrinking this without the below fix won't help anything.>> 2) Actually it looks like the timeout handling code doesn't work at >> all. It simply keeps on logging timeouts until it succeeds or the >> connection gets disconnected. Do you see such repeated timeout logs, >> or does the connection also happen to get closed exactly at 30 seconds >> anyway? This fixes: >> https://github.com/dovecot/core/commit/2fb829dc7cc8c61dee925a15e51768a4b82e45da > > I see a lot of them, but none repeated after eachother. It's more of a random pattern.Maybe something triggers after 30 seconds elsewhere that also aborts the lookup..>> 3) Is there some firewall between Dovecot and Redis that closes idling >> connections and causes them to hang? Dovecot doesn't close idle Redis >> connections, although I suppose it would be useful too. The attached >> patch probably helps here by adding a new idle_timeout_msecs setting. >> I didn't finish testing it. > > No firewall at all. I'm unsure if redis closes idle connections, that could be the cause of it.Closing a connection is fine. That causes Dovecot to just reconnect.> What about the issue of lmtp saving the mail, but sending an error back to Director?I guess the LMTP proxy_timeout in director is 30 seconds or lower. If backend's timeouts are smaller than that, director won't have to disconnect LMTP due to timeout and there won't be duplicates. Or just increase the director's proxy_timeout to several minutes.
Tom Sommer
2016-Apr-06 19:33 UTC
Internal quota calculation error (redis), double mail delivery
On 2016-04-06 21:22, Timo Sirainen wrote:>> Yea, but why is it taking 30 seconds anyway? I'll try and lower it, >> but I suspect I'll just get 10.000 secs errors instead? > > I don't know. Possibly a hanging TCP connection. That's why I was > asking about the firewall. If it's not that, I don't really know. And > yes, just shrinking this without the below fix won't help anything.Indeed. I have antivirus on the same server doing redis to the same redis-server, with no errors.>> What about the issue of lmtp saving the mail, but sending an error >> back to Director? > > I guess the LMTP proxy_timeout in director is 30 seconds or lower. If > backend's timeouts are smaller than that, director won't have to > disconnect LMTP due to timeout and there won't be duplicates. Or just > increase the director's proxy_timeout to several minutes.proxy_timeout is not set, I'm unsure of the default. I've raised it to 120 now. This is the log from director: Apr 06 15:42:00 lmtp(46187): Error: xxxx: Failed to send message to <xxxx at xxx.xxx> at xxx.xxx.com:24: 451 4.4.0 Remote server not answering (timeout while waiting for reply to DATA reply) <5vWKIVUIBVdrtAAA1VGibg> (1/1 at 30067 ms) So yea, guess the timeout was 30 there as well.