I'm running 2.0.17 and I'm still seeing a decent amount of "MySQL server has gone away" errors, despite having multiple hosts defined in my auth userdb 'connect'. This is Debian Lenny 32-bit and I'm seeing the same thing with 2.0.16 on Debian Squeeze 64-bit. E.g.: Jan 12 20:30:33 auth-worker: Error: mysql: Query failed, retrying: MySQL server has gone away Our mail mysql servers are busy enough that wait_timeout is set to a whopping 30 seconds. On my regular boxes, I see a good deal of these in the logs. I've been doing a lot of mucking with doveadm/dsync (working on maildir->mdbox migration finally, yay!) on test boxes (same dovecot package & version) and when I get this error, despite the log saying it's retrying, it doesn't seem to be. Instead I get: dsync(root): Error: user ...: Auth USER lookup failed dsync(root): Fatal: User lookup failed: Internal error occurred. Refer to server log for more information. Watching tcpdump at the same time, it looks like it's going through some of the mysql servers, but all of them have by now disconnected and are in CLOSE_WAIT. Here's an (edited) example after doing a dsync that completes without errors, with tcpdump running in the background: # sleep 30; netstat -ant | grep 3306; dsync -C^ -u mailbox at test.com backup mdbox:~/mdbox tcp 1 0 10.1.15.129:57436 10.1.52.48:3306 CLOSE_WAIT tcp 1 0 10.1.15.129:49917 10.1.52.49:3306 CLOSE_WAIT tcp 1 0 10.1.15.129:35904 10.1.52.47:3306 CLOSE_WAIT 20:49:59.725005 IP 10.1.15.129.35904 > 10.1.52.47.3306: F 1126:1126(0) ack 807 win 1004 <nop,nop,timestamp 312603858 766667259> 20:49:59.725459 IP 10.1.52.47.3306 > 10.1.15.129.35904: . ack 1127 win 123 <nop,nop,timestamp 766667998 312603858> 20:49:59.725568 IP 10.1.15.129.57436 > 10.1.52.48.3306: F 1126:1126(0) ack 807 win 1004 <nop,nop,timestamp 312603858 1842560856> 20:49:59.725779 IP 10.1.52.48.3306 > 10.1.15.129.57436: . ack 1127 win 123 <nop,nop,timestamp 1842561225 312603858> dsync(root): Error: user mailbox at test.com: Auth USER lookup failed dsync(root): Fatal: User lookup failed: Internal error occurred. Refer to server log for more information. 10.1.15.129 in this case is the dovecot server, and the 10.1.52.0/24 boxes are mysql servers. That's the same pattern I've seen almost every time. Just a FIN packet to two of the servers (ack'd by the mysql server) and then it fails. Is the retry mechanism supposed to transparently start a new connection, or is this how it works? In connecting remotely to these same servers (which aren't getting production traffic, so I'm the only person connecting to them), I get seemingly random disconnects via IMAP, always coinciding with a "MySQL server has gone away" error in the logs. This is non-production, so I'm happy to turn on whatever debugging would be useful. Here's doveconf -n from the box the tcpdump was on. This box is just configured for lmtp (but have seen the same thing on one configured for IMAP/POP as well), so it's pretty small, config-wise: # 2.0.17: /etc/dovecot/dovecot/dovecot.conf # OS: Linux 3.0.9-nx i686 Debian 5.0.9 auth_cache_negative_ttl = 0 auth_cache_ttl = 0 auth_debug = yes auth_failure_delay = 0 base_dir = /var/run/dovecot/ debug_log_path = /var/log/dovecot/debug.log default_client_limit = 3005 default_internal_user = doveauth default_process_limit = 1500 deliver_log_format = M=%m, F=%f, S="%s" => %$ disable_plaintext_auth = no first_valid_uid = 199 last_valid_uid = 201 lda_mailbox_autocreate = yes listen = * log_path = /var/log/dovecot/mail.log mail_debug = yes mail_fsync = always mail_location = maildir:~/Maildir:INDEX=/var/cache/dovecot/%2Mu/%2.2Mu/%u mail_nfs_index = yes mail_nfs_storage = yes mail_plugins = zlib quota mail_privileged_group = mail mail_uid = 200 managesieve_notify_capability = mailto managesieve_sieve_capability = fileinto reject envelope encoded-character vacation subaddress comparator-i;ascii-numeric relational regex imap4flags copy include variables body enotify environment mailbox date ihave mdbox_rotate_interval = 1 days mmap_disable = yes namespace { hidden = no inbox = yes list = yes location prefix = INBOX. separator = . subscriptions = yes type = private } passdb { args = /opt/dovecot/etc/lmtp/sql.conf driver = sql } plugin { info_log_path = /var/log/dovecot/dovecot-deliver.log log_path = /var/log/dovecot/dovecot-deliver.log quota = maildir:User quota quota_rule = *:bytes=25M quota_rule2 = INBOX.Trash:bytes=+10%% quota_rule3 = *:messages=3000 sieve = ~/sieve/dovecot.sieve sieve_before = /etc/dovecot/scripts/spam.sieve sieve_dir = ~/sieve/ zlib_save = gz zlib_save_level = 3 } protocols = lmtp sieve service auth-worker { unix_listener auth-worker { mode = 0666 } user = doveauth } service auth { client_limit = 8000 unix_listener login/auth { mode = 0666 } user = doveauth } service lmtp { executable = lmtp -L process_min_avail = 10 unix_listener lmtp { mode = 0666 } } ssl = no userdb { driver = prefetch } userdb { args = /opt/dovecot/etc/lmtp/sql.conf driver = sql } verbose_proctitle = yes protocol lmtp { mail_plugins = zlib quota sieve } Thanks!
On 1/12/2012 6:00 PM, Mark Moseley wrote:> Jan 12 20:30:33 auth-worker: Error: mysql: Query failed, retrying: > MySQL server has gone awayI've actually been meaning to send a similar message for the last couple of months :). We run dovecot solely as a sasl authentication provider to postfix for smtp authentication. We're currently running 2.0.15 with a handful of patches from a few months ago when Timo fixed mysql failover. We also see sporadic messages like that in the logs: Jan 11 01:00:57 sparky dovecot: auth-worker: Error: mysql: Query failed, retrying: MySQL server has gone away We do have a timeout on the mysql servers, so I don't necessarily mind this message, except we also see some number of these: Jan 11 01:00:57 sparky dovecot: auth-worker: Error: sql(clgeurts,108.38.64.98): Password query failed: MySQL server has gone away The mysql servers have never been down or unresponsive, if it retries, it should succeed. I'm not sure what's happening here, perhaps it tries the query on one mysql server connection (we have two configured) which has timed out, and then tries the other one, and if the other one has also timed out just fails? I also see some auth timeouts: Jan 11 22:06:02 sparky dovecot: auth: CRAM-MD5(?,200.37.175.14): Request 10232.28 timeouted after 150 secs, state=2 I'm not sure if they're related to the mysql timeouts. There are also some postfix auth errors: Jan 11 23:55:41 sparky postfix/smtpd[20994]: warning: unknown[200.37.175.14]: SASL CRAM-MD5 authentication failed: Connection lost to authentication server Which I think happen when dovecot takes too long to respond. I haven't had time to dig into it or get any debugging info, but just thought I'd pipe up when I saw your similar question :). -- Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/ Operating Systems and Network Analyst | henson at csupomona.edu California State Polytechnic University | Pomona CA 91768
On 13.1.2012, at 4.00, Mark Moseley wrote:> I'm running 2.0.17 and I'm still seeing a decent amount of "MySQL > server has gone away" errors, despite having multiple hosts defined in > my auth userdb 'connect'. This is Debian Lenny 32-bit and I'm seeing > the same thing with 2.0.16 on Debian Squeeze 64-bit. > > E.g.: > > Jan 12 20:30:33 auth-worker: Error: mysql: Query failed, retrying: > MySQL server has gone away > > Our mail mysql servers are busy enough that wait_timeout is set to a > whopping 30 seconds. On my regular boxes, I see a good deal of these > in the logs. I've been doing a lot of mucking with doveadm/dsync > (working on maildir->mdbox migration finally, yay!) on test boxes > (same dovecot package & version) and when I get this error, despite > the log saying it's retrying, it doesn't seem to be. Instead I get: > > dsync(root): Error: user ...: Auth USER lookup failedTry with only one host in the "connect" string? My guess: Both the connections have timed out, and the retrying fails as well (there is only one retry). Although if the retrying lookup fails, there should be an error logged about it also (you don't see one?) Also another idea to avoid them in the first place: service auth-worker { idle_kill = 20 }
Am 28.01.2012 23:17, schrieb Timo Sirainen:> On 28.1.2012, at 23.06, Robert Schetterer wrote: > >> doc/example-config/dovecot-sql.conf.ext >> from hg >> has something like >> >> # Database connection string. This is driver-specific setting. >> # HA / round-robin load-balancing is supported by giving multiple host >> # settings, like: host=sql1.host.org host=sql2.host.org >> >> but i dont find it in >> http://wiki2.dovecot.org/AuthDatabase/SQL > > I added something about it there. >cool thanks ! -- Best Regards MfG Robert Schetterer Germany/Munich/Bavaria
Am 15.02.2012 14:07, schrieb Timo Sirainen:> On Wed, 2012-02-15 at 09:10 +0100, Robert Schetterer wrote: >> Hi Timo,just to make sure >> i have an extra >> dovecot-dict-quota-sql.conf.ext ( dove 2.0.18 ) >> >> connect = host=192.168.123.150 dbname=.. user=... password=... > .. >> is it possible to have i.e >> >> connect = host=192.168.123.150 host=127.0.0.1 dbname=.. user=... >> password=... >> >> there too? > > Assuming your SQL database is configured to do synchronous multi-master > replication, sure. > >jep ,thx Timo -- Best Regards MfG Robert Schetterer Germany/Munich/Bavaria
Am 15.02.2012 17:10, schrieb Robert Schetterer:> Am 15.02.2012 14:07, schrieb Timo Sirainen: >> On Wed, 2012-02-15 at 09:10 +0100, Robert Schetterer wrote: >>> Hi Timo,just to make sure >>> i have an extra >>> dovecot-dict-quota-sql.conf.ext ( dove 2.0.18 ) >>> >>> connect = host=192.168.123.150 dbname=.. user=... password=... >> .. >>> is it possible to have i.e >>> >>> connect = host=192.168.123.150 host=127.0.0.1 dbname=.. user=... >>> password=... >>> >>> there too? >> >> Assuming your SQL database is configured to do synchronous multi-master >> replication, sure. >> >> > jep ,thx Timo >Hi Timo, sorry for some more question what is the default behave if dict quota over sql cant be reached i.e with lmtp something like log warning and deliver anyway ? -- Best Regards MfG Robert Schetterer Germany/Munich/Bavaria
On 15.2.2012, at 20.22, Robert Schetterer wrote:> sorry for some more question > what is the default behave if dict quota over sql cant be reached > i.e with lmtp > > something like log warning and deliver anyway ?I think it tempfails. Try.
Am 15.02.2012 19:43, schrieb Timo Sirainen:> On 15.2.2012, at 20.22, Robert Schetterer wrote: > >> sorry for some more question >> what is the default behave if dict quota over sql cant be reached >> i.e with lmtp >> >> something like log warning and deliver anyway ? > > I think it tempfails. Try.hm, test servers arent up recently cant do it on production but i will test before i go try dove 2.1 and report, @list perhaps anyone other having it tested before?>-- Best Regards MfG Robert Schetterer Germany/Munich/Bavaria