We are running dovecot to provide authentication for postfix, using two mysql servers in a multi-master replication set as the password source:

----------------------------------------
# 2.0.13: /etc/dovecot/dovecot.conf
# OS: Linux 2.6.37-gentoo-r4 x86_64 Gentoo Base System release 2.0.2
auth_mechanisms = plain login digest-md5 cram-md5
auth_verbose = yes
passdb {
  args = /etc/dovecot/dovecot-sql.conf
  driver = sql
}
protocols = none
service auth-worker {
  unix_listener auth-worker {
    user = postfix
  }
  user = $default_internal_user
}
service auth {
  unix_listener /var/spool/postfix/private/auth {
    group = postfix
    mode = 0660
    user = postfix
  }
  user = postfix
}
ssl = no
userdb {
  driver = passwd
}
---------------------------------------

With an sql config of:

-------------------------
driver = mysql
connect = host=mysql-1.unx.csupomona.edu host=mysql-2.unx.csupomona.edu dbname=idmgmt user=postfix password=XXXXXXX
default_pass_scheme = PLAIN
password_query = XXXXXXXXX
-------------------------

According to the sample SQL configuration file "HA / round-robin load-balancing is supported by giving multiple host settings, like: host=sql1.host.org host=sql2.host.org".

However, as far as I can tell dovecot only connects to the first listed host, and processes all queries through it, there does not appear to be any load-balancing going on.

That's not necessarily a dealbreaker; however, high-availability does not appear to be working either.

If I shutdown the first mysql server, dovecot starts to log connection failures:

Sep 9 15:47:34 tweak dovecot: auth: Error: mysql(mysql-1.unx.csupomona.edu): Connect failed to database (idmgmt): Can't connect to MySQL server on 'mysql-1.unx.csupomona.edu' (111) - waiting for 1 seconds before retry
Sep 9 15:47:39 tweak dovecot: auth: Error: mysql(mysql-1.unx.csupomona.edu): Connect failed to database (idmgmt): Can't connect to MySQL server on 'mysql-1.unx.csupomona.edu' (111) - waiting for 25 seconds before retry

And postfix starts to fail authentications:

Sep 9 15:47:35 tweak postfix/smtpd[5119]: warning: bender.iitsys.csupomona.edu[134.71.250.134]: SASL DIGEST-MD5 authentication failed: Connection lost to authentication server

Now and again the authentication process dies:

Sep 9 15:47:39 tweak dovecot: auth: Panic: file auth-request-handler.c: line 697 (auth_request_handler_flush_failures): assertion failed: (auth_request->state == AUTH_REQUEST_STATE_FINISHED)
Sep 9 15:47:39 tweak dovecot: auth: Error: Raw backtrace: /usr/lib64/dovecot/libdovecot.so.0(+0x3f71a) [0x7f25822ca71a] -> /usr/lib64/dovecot/libdovecot.so.0(+0x3f766) [0x7f25822ca766] -> /usr/lib64/dovecot/libdovecot.so.0(+0x198ca) [0x7f25822a48ca] -> dovecot/auth() [0x4137f4] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_handle_timeouts+0xd4) [0x7f25822d5fe4] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_handler_run+0x5b) [0x7f25822d6bcb] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_run+0x28) [0x7f25822d5c48] -> /usr/lib64/dovecot/libdovecot.so.0(master_service_run+0x13) [0x7f25822c3de3] -> dovecot/auth(main+0x2be) [0x4179de] -> /lib64/libc.so.6(__libc_start_main+0xfd) [0x7f2581898bbd] -> dovecot/auth() [0x40bdc9]
Sep 9 15:47:39 tweak dovecot: master: Error: service(auth): child 4154 killed with signal 6 (core dumps disabled)

Requests start to pile up:

Sep 9 15:51:46 tweak dovecot: auth: Warning: auth workers: Auth request was queued for 25 seconds, 45 left in queue

Lookups time out:

Sep 9 15:57:22 tweak dovecot: auth: Error: auth worker: Aborted request: Lookup timed out

This occasionally pops up:

Sep 9 15:58:38 tweak dovecot: auth: Fatal: net_connect_unix(auth-worker) failed: Resource temporarily unavailable

And sometimes the auth process gets temporarily disabled:

Sep 9 15:58:57 tweak dovecot: master: Error: service(auth): command startup failed, throttling

Resulting in more postfix authentication failures:

Sep 9 15:58:57 tweak postfix/smtpd[6531]: warning: bender.iitsys.csupomona.edu[134.71.250.134]: SASL DIGEST-MD5 authentication failed:
Sep 9 15:59:08 tweak postfix/smtpd[6551]: fatal: no SASL authentication mechanisms

To the point where postfix also temporarily throttles smtpd:

Sep 9 15:59:21 tweak postfix/master[6526]: warning: /usr/lib64/postfix/smtpd: bad command startup -- throttling

Resulting in a complete unavailability of smtp service, not just unavailability of authenticated services.

I don't think all authentications fail during the scenario, but I think the majority do. Based on the network traffic, dovecot is almost continuously trying to connect to the first listed server. It sometimes connects to the second listed server, but when it does, the connection does not persist, it goes away almost immediately.

Ideally, I would like no authentications to fail if one of the MySQL servers is unavailable. If a few fail just when the server dies, that would be undesirable but acceptable as long as they do not continuously fail while the server is down.

Am I doing something wrong? Does the example sql config have incorrect information?

We were previously running dovecot 1.2.11, we just recently upgraded to 2. In the previous version, we actually had two different passdb's configured, each one listing only one of the mysql servers. I seem to recall that was the recommendation at the time for high-availability. When that configuration did not seem to work under version 2, I found an updated recommendation to list both servers in the same passdb, which also does not appear to work correctly.

I actually went back and tested the older version, and determined it seemed to work okay in the case where the server was up but the service was down, and connections were refused, but also failed a large number of authentication attempts when the server was completely down and connections were timing out.

Thanks much...

-- 
Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst | henson at csupomona.edu
California State Polytechnic University | Pomona CA 91768
On Fri, 2011-09-09 at 19:33 -0700, Paul B. Henson wrote:

> default_pass_scheme = PLAIN

Ugh, I'll pretend I didn't see that :)

> According to the sample SQL configuration file "HA / round-robin load-balancing is supported by giving multiple host settings, like: host=sql1.host.org host=sql2.host.org".
>
> However, as far as I can tell dovecot only connects to the first listed host, and processes all queries through it, there does not appear to be any load-balancing going on.

I suspect the wording here is incorrect; it's just a failover AFAIK: it only hits the first entry, failing over to the second if there is no response. HA would be like running a mysql slave on all the front ends, failing over to the master on your CRM server etc., which is what I do and suggest: having just one master server. After all, dovecot and postfix just need to read, not alter/update/insert etc.

> That's not necessarily a dealbreaker; however, high-availability does not appear to be working either.
>
> If I shutdown the first mysql server, dovecot starts to log connection failures:
>
> Sep 9 15:47:34 tweak dovecot: auth: Error: mysql(mysql-1.unx.csupomona.edu): Connect failed to database (idmgmt): Can't connect to MySQL server on 'mysql-1.unx.csupomona.edu' (111) - waiting for 1 seconds before retry
>
> Sep 9 15:47:39 tweak dovecot: auth: Error: mysql(mysql-1.unx.csupomona.edu): Connect failed to database (idmgmt): Can't connect to MySQL server on 'mysql-1.unx.csupomona.edu' (111) - waiting for 25 seconds before retry

Yep, that's correct, because it has "gone away", but it still uses the second host immediately; that's just dovecot trying to re-establish its link with the primary.

> And postfix starts to fail authentications:

Err, postfix is not dovecot; you also need to add failover in postfix's sql lookup commands:

hosts = unix:/var/run/mysql/mysql.sock 10.10.10.2

(assuming .2 is your master sql server)

> Resulting in a complete unavailability of smtp service, not just unavailability of authenticated services.

You could have a higher sec mx smtp box that uses postfix for virtual transport for cases where dovecot is unavailable. This of course means storing partial paths in your mail db, for use only by that one separated sec mx that is not behind the load balancer. Of course this won't solve the users' issue of sending unless you have multiple smtp behind a load balancer, but it still allows for inbound. It depends on how big your setup (and budget) is or can be :)

(note: I talk of load balancer as in a real hardware device, not as in pretend LBs as in software)

> Does the example sql config have incorrect information?

I suspect so.
On Fri, Sep 09, 2011 at 08:02:57PM -0700, Noel Butler wrote:

> Ugh, I'll pretend I didn't see that :)

We only use dovecot to provide sasl authentication to postfix smtp clients, using a separate password just for that purpose. Storing it in plaintext is the only way to support all authentication types.

> I suspect the wording here is incorrect; it's just a failover AFAIK: it only hits the first entry, failing over to the second if there is no response.

Hmm, that would work for me, if it worked ;).

> suggest: having just one master server. After all, dovecot and postfix just need to read, not alter/update/insert etc.

True; but the pieces that are altering/updating/inserting the data that postfix/dovecot need to read need redundancy as well :).

> Yep, that's correct, because it has "gone away", but it still uses the second host immediately; that's just dovecot trying to re-establish its link with the primary.

Based on my testing, it doesn't use the second host immediately, but only sporadically, with most of the authentications failing.

> Err, postfix is not dovecot; you also need to add failover in postfix's sql lookup commands

postfix relies on dovecot for authentication; this postfix error message is the result of dovecot not successfully processing an authentication request. postfix itself handles mysql failure well: it both load balances queries across both servers and also continues to function when one isn't available.

> (note: I talk of load balancer as in a real hardware device, not as in pretend LBs as in software)

We actually have a hardware load balancer, and I've considered just sticking the mysql servers behind it. But everything else using them handles failover ok, and initially I'd rather get dovecot doing the same before changing the current architecture.

Thanks for the reply...

-- 
Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst | henson at csupomona.edu
California State Polytechnic University | Pomona CA 91768
On Fri, 2011-09-09 at 19:33 -0700, Paul B. Henson wrote:

> According to the sample SQL configuration file "HA / round-robin load-balancing is supported by giving multiple host settings, like: host=sql1.host.org host=sql2.host.org".
>
> However, as far as I can tell dovecot only connects to the first listed host, and processes all queries through it, there does not appear to be any load-balancing going on.

The current code creates a connection to the second server only when the first connection is already busy with an SQL query, or when it's not working. Once there are more connections, it starts doing round robin lookups. This works okay enough with PostgreSQL because it does asynchronous lookups, so two simultaneous lookups create a second connection. MySQL does synchronous lookups though, so the second connection is normally never created. I suppose the fix to this would be to always connect to all SQL servers at startup.

> That's not necessarily a dealbreaker; however, high-availability does not appear to be working either.
>
> If I shutdown the first mysql server, dovecot starts to log connection failures:
>
> Sep 9 15:47:34 tweak dovecot: auth: Error: mysql(mysql-1.unx.csupomona.edu): Connect failed to database (idmgmt): Can't connect to MySQL server on 'mysql-1.unx.csupomona.edu' (111) - waiting for 1 seconds before retry
>
> Sep 9 15:47:39 tweak dovecot: auth: Error: mysql(mysql-1.unx.csupomona.edu): Connect failed to database (idmgmt): Can't connect to MySQL server on 'mysql-1.unx.csupomona.edu' (111) - waiting for 25 seconds before retry

Those are intentional.

> And postfix starts to fail authentications:
>
> Sep 9 15:47:35 tweak postfix/smtpd[5119]: warning: bender.iitsys.csupomona.edu[134.71.250.134]: SASL DIGEST-MD5 authentication failed: Connection lost to authentication server

It should have created the second connection here and not failed.

> Now and again the authentication process dies:
>
> Sep 9 15:47:39 tweak dovecot: auth: Panic: file auth-request-handler.c: line 697 (auth_request_handler_flush_failures): assertion failed: (auth_request->state == AUTH_REQUEST_STATE_FINISHED)

And this of course shouldn't happen either.

> Requests start to pile up:
>
> Sep 9 15:51:46 tweak dovecot: auth: Warning: auth workers: Auth request was queued for 25 seconds, 45 left in queue
>
> Lookups time out:
>
> Sep 9 15:57:22 tweak dovecot: auth: Error: auth worker: Aborted request: Lookup timed out

These are the result of the previous failures.

> This occasionally pops up:
>
> Sep 9 15:58:38 tweak dovecot: auth: Fatal: net_connect_unix(auth-worker) failed: Resource temporarily unavailable

Probably this too.

> And sometimes the auth process gets temporarily disabled:
>
> Sep 9 15:58:57 tweak dovecot: master: Error: service(auth): command startup failed, throttling

Most likely related to the crash, although I think this still shouldn't have happened.

> I don't think all authentications fail during the scenario, but I think the majority do. Based on the network traffic, dovecot is almost continuously trying to connect to the first listed server. It sometimes connects to the second listed server, but when it does, the connection does not persist, it goes away almost immediately.

There are multiple auth-worker processes, each one having its own internal MySQL connections with separate retry counters. I'll try to debug this soon.
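Editorial sketch: the connection behaviour described in the reply above (a second connection is opened only when every existing one is busy or broken, and round robin starts once more than one connection exists) can be illustrated with a toy model. This is not Dovecot's code. The class `ToyPool`, its methods, the helper `run_sync`, and the host names `sql1`/`sql2` are all invented for illustration, and the model assumes a single process with no retry timers.

```python
class ToyPool:
    """Toy model of lazily opened SQL connections (hypothetical, not Dovecot's code)."""

    def __init__(self, hosts, down=()):
        self.hosts = list(hosts)   # hosts in the order they are listed
        self.down = set(down)      # hosts that cannot be reached
        self.open = []             # hosts with an open connection, in open order
        self.busy = set()          # hosts whose connection is running a query
        self.cursor = 0            # round-robin position within self.open

    def acquire(self):
        """Pick the host that serves the next lookup, or None if none can."""
        n = len(self.open)
        # Round robin over connections that already exist and are usable.
        for step in range(n):
            host = self.open[(self.cursor + step) % n]
            if host not in self.busy and host not in self.down:
                self.cursor = (self.cursor + step + 1) % n
                self.busy.add(host)
                return host
        # Every open connection is busy or broken: only now open a new one.
        for host in self.hosts:
            if host not in self.open and host not in self.down:
                self.open.append(host)
                self.busy.add(host)
                return host
        return None

    def release(self, host):
        """Mark the lookup on this host as finished."""
        self.busy.discard(host)


def run_sync(pool, count):
    """Synchronous lookups: each one finishes before the next one starts."""
    served = []
    for _ in range(count):
        host = pool.acquire()
        served.append(host)
        pool.release(host)
    return served


if __name__ == "__main__":
    hosts = ["sql1", "sql2"]
    # Synchronous style (as described for MySQL): sql2 is never opened.
    print(run_sync(ToyPool(hosts), 4))
    # Two overlapping lookups (as described for PostgreSQL) open sql2,
    # after which later lookups alternate between the two hosts.
    pool = ToyPool(hosts)
    first, second = pool.acquire(), pool.acquire()
    pool.release(first)
    pool.release(second)
    print(run_sync(pool, 4))
```

In this model a synchronous caller never has a busy connection at the moment it asks for one, which is why the second host is only ever contacted when the first is marked as down.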
On Fri, 2011-09-09 at 19:33 -0700, Paul B. Henson wrote:

> Sep 9 15:47:34 tweak dovecot: auth: Error: mysql(mysql-1.unx.csupomona.edu): Connect failed to database (idmgmt): Can't connect to MySQL server on 'mysql-1.unx.csupomona.edu' (111) - waiting for 1 seconds before retry

I did several fixes related to this in v2.0 hg.

> And postfix starts to fail authentications:
>
> Sep 9 15:47:35 tweak postfix/smtpd[5119]: warning: bender.iitsys.csupomona.edu[134.71.250.134]: SASL DIGEST-MD5 authentication failed: Connection lost to authentication server

The reason it kept failing with Postfix was that Dovecot had a 10 second timeout for SQL connecting, and Postfix had a 10 second timeout before failing authentication. So Postfix never waited long enough for Dovecot to attempt a second connection to the second MySQL server. I dropped Dovecot's SQL connect timeout to 5 seconds.

> Now and again the authentication process dies:
>
> Sep 9 15:47:39 tweak dovecot: auth: Panic: file auth-request-handler.c: line 697 (auth_request_handler_flush_failures): assertion failed: (auth_request->state == AUTH_REQUEST_STATE_FINISHED)

This happened only with non-plaintext authentication (e.g. DIGEST-MD5). This is fixed as well.
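Editorial sketch: the timeout interaction described above comes down to simple arithmetic, shown here as a rough model. The function `failover_succeeds` and its parameters are hypothetical. The model assumes that connection attempts are made one after another, that each unreachable server consumes the full connect timeout (a host that is completely down, not one refusing connections), and that the lookup on the healthy server takes an assumed 0.1 seconds.

```python
def failover_succeeds(connect_timeout, client_timeout,
                      dead_hosts_first=1, query_time=0.1):
    """Return True if a lookup finishes before the client gives up.

    Assumptions (not taken from Dovecot's source): hosts are tried
    sequentially, each dead host costs one full connect_timeout, and the
    query on the first live host takes query_time seconds.
    """
    elapsed = dead_hosts_first * connect_timeout + query_time
    return elapsed < client_timeout


if __name__ == "__main__":
    # 10 s connect timeout against a 10 s client timeout:
    # the client gives up before the second server is tried.
    print(failover_succeeds(connect_timeout=10, client_timeout=10))
    # 5 s connect timeout leaves room for one failed attempt.
    print(failover_succeeds(connect_timeout=5, client_timeout=10))
```

Under these assumptions the shorter connect timeout covers one unreachable server; two unreachable servers tried in sequence would still exceed a 10 second client timeout.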
On 15.09.2011 13:43, Timo Sirainen wrote:

> On Thu, 2011-09-15 at 13:39 +0200, Robert Schetterer wrote:
>>
>> Is there really native MySQL failover in dovecot?
>> I can't remember this; I only remember it as part of dovecot proxying.
>
> For SQL authentication it can use multiple SQL server hosts (with both MySQL and PostgreSQL) and do HA/load balancing.

OK, I see, but I have nearly all possible parameters in mysql (I use a mysql cluster). Thanks anyway for the answer.

-- 
Best Regards
Robert Schetterer
Germany/Munich/Bavaria