We are running dovecot to provide authentication for postfix, using two
mysql servers in a multi-master replication set as the password source:
----------------------------------------
# 2.0.13: /etc/dovecot/dovecot.conf
# OS: Linux 2.6.37-gentoo-r4 x86_64 Gentoo Base System release 2.0.2
auth_mechanisms = plain login digest-md5 cram-md5
auth_verbose = yes
passdb {
args = /etc/dovecot/dovecot-sql.conf
driver = sql
}
protocols = none
service auth-worker {
unix_listener auth-worker {
user = postfix
}
user = $default_internal_user
}
service auth {
unix_listener /var/spool/postfix/private/auth {
group = postfix
mode = 0660
user = postfix
}
user = postfix
}
ssl = no
userdb {
driver = passwd
}
---------------------------------------
With an sql config of:
-------------------------
driver = mysql
connect = host=mysql-1.unx.csupomona.edu host=mysql-2.unx.csupomona.edu
dbname=idmgmt user=postfix password=XXXXXXX
default_pass_scheme = PLAIN
password_query = XXXXXXXXX
-------------------------
According to the sample SQL configuration file "HA / round-robin
load-balancing is supported by giving multiple host settings, like:
host=sql1.host.org host=sql2.host.org".
However, as far as I can tell dovecot only connects to the first listed
host, and processes all queries through it, there does not appear to be
any load-balancing going on.
That's not necessarily a dealbreaker; however, high-availability does
not appear to be working either.
If I shut down the first mysql server, dovecot starts to log connection
failures:
Sep 9 15:47:34 tweak dovecot: auth: Error:
mysql(mysql-1.unx.csupomona.edu): Connect failed to database (idmgmt):
Can't connect to MySQL server on 'mysql-1.unx.csupomona.edu' (111) -
waiting for 1 seconds before retry
Sep 9 15:47:39 tweak dovecot: auth: Error:
mysql(mysql-1.unx.csupomona.edu): Connect failed to database (idmgmt):
Can't connect to MySQL server on 'mysql-1.unx.csupomona.edu' (111) -
waiting for 25 seconds before retry
And postfix starts to fail authentications:
Sep 9 15:47:35 tweak postfix/smtpd[5119]: warning:
bender.iitsys.csupomona.edu[134.71.250.134]: SASL DIGEST-MD5
authentication failed: Connection lost to authentication server
Now and again the authentication process dies:
Sep 9 15:47:39 tweak dovecot: auth: Panic: file auth-request-handler.c:
line 697 (auth_request_handler_flush_failures): assertion failed:
(auth_request->state == AUTH_REQUEST_STATE_FINISHED)
Sep 9 15:47:39 tweak dovecot: auth: Error: Raw backtrace:
/usr/lib64/dovecot/libdovecot.so.0(+0x3f71a) [0x7f25822ca71a] ->
/usr/lib64/dovecot/libdovecot.so.0(+0x3f766) [0x7f25822ca766] ->
/usr/lib64/dovecot/libdovecot.so.0(+0x198ca) [0x7f25822a48ca] ->
dovecot/auth() [0x4137f4] ->
/usr/lib64/dovecot/libdovecot.so.0(io_loop_handle_timeouts+0xd4)
[0x7f25822d5fe4] ->
/usr/lib64/dovecot/libdovecot.so.0(io_loop_handler_run+0x5b)
[0x7f25822d6bcb] -> /usr/lib64/dovecot/libdovecot.so.0(io_loop_run+0x28)
[0x7f25822d5c48] ->
/usr/lib64/dovecot/libdovecot.so.0(master_service_run+0x13)
[0x7f25822c3de3] -> dovecot/auth(main+0x2be) [0x4179de] ->
/lib64/libc.so.6(__libc_start_main+0xfd) [0x7f2581898bbd] ->
dovecot/auth() [0x40bdc9]
Sep 9 15:47:39 tweak dovecot: master: Error: service(auth): child 4154
killed with signal 6 (core dumps disabled)
Requests start to pile up:
Sep 9 15:51:46 tweak dovecot: auth: Warning: auth workers: Auth request
was queued for 25 seconds, 45 left in queue
Lookups time out:
Sep 9 15:57:22 tweak dovecot: auth: Error: auth worker: Aborted
request: Lookup timed out
This occasionally pops up:
Sep 9 15:58:38 tweak dovecot: auth: Fatal:
net_connect_unix(auth-worker) failed: Resource temporarily unavailable
And sometimes the auth process gets temporarily disabled:
Sep 9 15:58:57 tweak dovecot: master: Error: service(auth): command
startup failed, throttling
Resulting in more postfix authentication failures:
Sep 9 15:58:57 tweak postfix/smtpd[6531]: warning:
bender.iitsys.csupomona.edu[134.71.250.134]: SASL DIGEST-MD5
authentication failed:
Sep 9 15:59:08 tweak postfix/smtpd[6551]: fatal: no SASL authentication
mechanisms
To the point where postfix also temporarily throttles smtpd:
Sep 9 15:59:21 tweak postfix/master[6526]: warning:
/usr/lib64/postfix/smtpd: bad command startup -- throttling
Resulting in a complete unavailability of smtp service, not just
unavailability of authenticated services.
I don't think all authentications fail during the scenario, but I think
the majority do. Based on the network traffic, dovecot is almost
continuously trying to connect to the first listed server. It sometimes
connects to the second listed server, but when it does, the connection
does not persist, it goes away almost immediately.
Ideally, I would like no authentications to fail if one of the MySQL
servers is unavailable. If a few fail just when the server dies, that
would be undesirable but acceptable as long as they do not continuously
fail while the server is down.
Am I doing something wrong? Does the example sql config have incorrect
information?
We were previously running dovecot 1.2.11; we just recently upgraded to
version 2. In the previous version, we actually had two different passdbs
configured, each one listing only one of the mysql servers. I seem to
recall that was the recommendation at the time for high-availability.
When that configuration did not seem to work under version 2, I found an
updated recommendation to list both servers in the same passdb, which
also does not appear to work correctly. I actually went back and tested
the older version: it seemed to work okay when the server was up but the
service was down (connections refused), but it also failed a large
number of authentication attempts when the server was completely down
and connections were timing out.
Thanks much...
--
Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst | henson at csupomona.edu
California State Polytechnic University | Pomona CA 91768

On Fri, 2011-09-09 at 19:33 -0700, Paul B. Henson wrote:
> default_pass_scheme = PLAIN

Ugh, I'll pretend I didn't see that :)

> According to the sample SQL configuration file "HA / round-robin
> load-balancing is supported by giving multiple host settings, like:
> host=sql1.host.org host=sql2.host.org".
>
> However, as far as I can tell dovecot only connects to the first listed
> host, and processes all queries through it, there does not appear to be
> any load-balancing going on.

I suspect the wording here is incorrect, it's just a failover AFAIK, it
only hits the first entry, failing over to the second if no response.
HA would be like running a mysql slave on all the front ends, failing
over to the master on your CRM server etc., which is what I do and
suggest, having just one master server, after all, dovecot and postfix
just need to read, not alter/update/insert etc.

> That's not necessarily a dealbreaker; however, high-availability does
> not appear to be working either.
>
> If I shut down the first mysql server, dovecot starts to log connection
> failures:
>
> Sep 9 15:47:34 tweak dovecot: auth: Error:
> mysql(mysql-1.unx.csupomona.edu): Connect failed to database (idmgmt):
> Can't connect to MySQL server on 'mysql-1.unx.csupomona.edu' (111) -
> waiting for 1 seconds before retry
>
> Sep 9 15:47:39 tweak dovecot: auth: Error:
> mysql(mysql-1.unx.csupomona.edu): Connect failed to database (idmgmt):
> Can't connect to MySQL server on 'mysql-1.unx.csupomona.edu' (111) -
> waiting for 25 seconds before retry

yep, that's correct because it has "gone away", but it still uses the
second host immediately, that's just dovecot trying to re-establish its
link with primary

> And postfix starts to fail authentications:

err, postfix is not dovecot, you need to also add failover in postfix's
sql lookup commands

    hosts = unix:/var/run/mysql/mysql.sock 10.10.10.2

(assuming .2 is your master sql server)

> Resulting in a complete unavailability of smtp service, not just
> unavailability of authenticated services.

You could have a higher sec mx smtp box that uses postfix for virtual
transport for cases where dovecot is unavailable. This of course means
storing partial paths in your mail db, for use only by that one separate
sec mx that is not behind the load balancer. Of course this won't solve
the users' issue of sending unless you have multiple smtp servers behind
a load balancer, but it still allows for inbound. It depends on how big
your setup (and budget) is or can be :)

(note: I talk of load balancer as in real hardware device, not as in
pretend LB's as in software)

> Does the example sql config have incorrect
> information?

I suspect so.

On Fri, Sep 09, 2011 at 08:02:57PM -0700, Noel Butler wrote:
> Ugh, I'll pretend I didn't see that :)

We only use dovecot to provide sasl authentication to postfix smtp
clients, using a separate password just for that purpose. Storing it in
plaintext is the only way to support all authentication types.

> I suspect the wording here is incorrect, it's just a failover AFAIK, it
> only hits the first entry, failing over to the second if no response.

Hmm, that would work for me, if it worked ;).

> suggest, having just one master server, after all, dovecot and postfix
> just need to read, not alter/update/insert etc.

True; but the pieces that are altering/updating/inserting the data that
postfix/dovecot need to read need redundancy as well :).

> yep, that's correct because it has "gone away", but it still uses the
> second host immediately, that's just dovecot trying to re-establish its
> link with primary

Based on my testing, it doesn't use the second host immediately, but
only sporadically, with most of the authentications failing.

> err, postfix is not dovecot, you need to also add failover in postfix's
> sql lookup commands

postfix relies on dovecot for authentication; this postfix error message
is the result of dovecot not successfully processing an authentication
request. postfix itself handles mysql failure well: it both load
balances queries across both servers and also continues to function when
one isn't available.

> (note: I talk of load balancer as in real hardware device, not as in
> pretend LB's as in software)

We actually have a hardware load balancer, and I've considered just
sticking the mysql servers behind it. But everything else using them
handles failover ok, and initially I'd rather get dovecot doing the same
before changing the current architecture.

Thanks for the reply...

--
Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst | henson at csupomona.edu
California State Polytechnic University | Pomona CA 91768

On Fri, 2011-09-09 at 19:33 -0700, Paul B. Henson wrote:
> According to the sample SQL configuration file "HA / round-robin
> load-balancing is supported by giving multiple host settings, like:
> host=sql1.host.org host=sql2.host.org".
>
> However, as far as I can tell dovecot only connects to the first listed
> host, and processes all queries through it, there does not appear to be
> any load-balancing going on.

The current code creates a connection to the second server only when the
first connection is already busy with an SQL query, or when it's not
working. Once there are more connections, it starts doing round robin
lookups. This works okay enough with PostgreSQL because it does
asynchronous lookups, so two simultaneous lookups create a second
connection. MySQL does synchronous lookups though, so the second
connection is normally never created. I suppose the fix to this would be
to always connect to all SQL servers at startup.

> That's not necessarily a dealbreaker; however, high-availability does
> not appear to be working either.
>
> If I shut down the first mysql server, dovecot starts to log connection
> failures:
>
> Sep 9 15:47:34 tweak dovecot: auth: Error:
> mysql(mysql-1.unx.csupomona.edu): Connect failed to database (idmgmt):
> Can't connect to MySQL server on 'mysql-1.unx.csupomona.edu' (111) -
> waiting for 1 seconds before retry
>
> Sep 9 15:47:39 tweak dovecot: auth: Error:
> mysql(mysql-1.unx.csupomona.edu): Connect failed to database (idmgmt):
> Can't connect to MySQL server on 'mysql-1.unx.csupomona.edu' (111) -
> waiting for 25 seconds before retry

Those are intentional.

> And postfix starts to fail authentications:
>
> Sep 9 15:47:35 tweak postfix/smtpd[5119]: warning:
> bender.iitsys.csupomona.edu[134.71.250.134]: SASL DIGEST-MD5
> authentication failed: Connection lost to authentication server

It should have created the second connection here and not failed.

> Now and again the authentication process dies:
>
> Sep 9 15:47:39 tweak dovecot: auth: Panic: file auth-request-handler.c:
> line 697 (auth_request_handler_flush_failures): assertion failed:
> (auth_request->state == AUTH_REQUEST_STATE_FINISHED)

And this of course shouldn't happen either.

> Requests start to pile up:
>
> Sep 9 15:51:46 tweak dovecot: auth: Warning: auth workers: Auth request
> was queued for 25 seconds, 45 left in queue
>
> Lookups time out:
>
> Sep 9 15:57:22 tweak dovecot: auth: Error: auth worker: Aborted
> request: Lookup timed out

These are the result of the previous failures.

> This occasionally pops up:
>
> Sep 9 15:58:38 tweak dovecot: auth: Fatal:
> net_connect_unix(auth-worker) failed: Resource temporarily unavailable

Probably this too.

> And sometimes the auth process gets temporarily disabled:
>
> Sep 9 15:58:57 tweak dovecot: master: Error: service(auth): command
> startup failed, throttling

Most likely related to the crash, although I think this still shouldn't
have happened.

> I don't think all authentications fail during the scenario, but I think
> the majority do. Based on the network traffic, dovecot is almost
> continuously trying to connect to the first listed server. It sometimes
> connects to the second listed server, but when it does, the connection
> does not persist, it goes away almost immediately.

There are multiple auth-worker processes, each one having its own
internal MySQL connections with separate retry counters. I'll try to
debug this soon.
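
For readers following along, here is a minimal Python sketch of the
connection-selection behaviour described in the reply above. It is not
Dovecot's code: Conn, Pool, pick(), run_sync() and run_overlapping() are
hypothetical names made up for illustration, and the model only covers
the stated rules (open another host only when every open connection is
busy or broken, then round robin over what is open).

```python
from dataclasses import dataclass, field


@dataclass
class Conn:
    host: str
    busy: bool = False      # a query is currently in flight
    working: bool = True    # False once a connect or query has failed


@dataclass
class Pool:
    hosts: list
    conns: list = field(default_factory=list)
    next_idx: int = 0       # round-robin cursor over opened connections

    def pick(self):
        """Return an idle, working connection; open a new host only if needed."""
        n = len(self.conns)
        for step in range(n):
            conn = self.conns[(self.next_idx + step) % n]
            if conn.working and not conn.busy:
                self.next_idx = (self.next_idx + step + 1) % n
                return conn
        # Every opened connection is busy or broken: try a host not yet used.
        opened = {c.host for c in self.conns}
        for host in self.hosts:
            if host not in opened:
                conn = Conn(host)
                self.conns.append(conn)
                return conn
        return None  # nothing usable; a caller would have to queue or fail


def run_sync(pool, lookups):
    """Synchronous driver: each lookup completes before the next one starts."""
    used = []
    for _ in range(lookups):
        conn = pool.pick()
        conn.busy = True
        used.append(conn.host)
        conn.busy = False
    return used


def run_overlapping(pool, rounds):
    """Asynchronous driver: two lookups are in flight at the same time."""
    used = []
    for _ in range(rounds):
        a = pool.pick()
        a.busy = True
        b = pool.pick()
        b.busy = True
        used.extend([a.host, b.host])
        a.busy = b.busy = False
    return used


if __name__ == "__main__":
    print(run_sync(Pool(["mysql-1", "mysql-2"]), 4))
    print(run_overlapping(Pool(["mysql-1", "mysql-2"]), 2))
```

In this model the synchronous driver never leaves mysql-1, because the
only open connection is always idle again by the time the next lookup
arrives. The overlapping driver opens mysql-2 on the first round and
alternates between the two hosts afterwards.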

On Fri, 2011-09-09 at 19:33 -0700, Paul B. Henson wrote:
> Sep 9 15:47:34 tweak dovecot: auth: Error:
> mysql(mysql-1.unx.csupomona.edu): Connect failed to database (idmgmt):
> Can't connect to MySQL server on 'mysql-1.unx.csupomona.edu' (111) -
> waiting for 1 seconds before retry

I did several fixes related to this in v2.0 hg.

> And postfix starts to fail authentications:
>
> Sep 9 15:47:35 tweak postfix/smtpd[5119]: warning:
> bender.iitsys.csupomona.edu[134.71.250.134]: SASL DIGEST-MD5
> authentication failed: Connection lost to authentication server

The reason why it kept failing with Postfix was that Dovecot had a 10
second timeout for SQL connecting, and Postfix had a 10 second timeout
before failing authentication. So Postfix never waited long enough for
Dovecot to attempt a connection to the second MySQL server. I dropped
Dovecot's SQL connect timeout to 5 seconds.

> Now and again the authentication process dies:
>
> Sep 9 15:47:39 tweak dovecot: auth: Panic: file auth-request-handler.c:
> line 697 (auth_request_handler_flush_failures): assertion failed:
> (auth_request->state == AUTH_REQUEST_STATE_FINISHED)

This happened only with non-plaintext authentication (e.g. DIGEST-MD5).
Fixed also.
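
The timeout interaction described above can be worked through with a
small Python sketch. The function failover_outcome() and its
second_connect_time parameter are hypothetical, and the model assumes
the first host does not answer at all, so the connect attempt runs to
its full timeout before the second host is tried. The 10 and 5 second
values are the ones given in the reply.

```python
def failover_outcome(connect_timeout, client_timeout, second_connect_time=0.1):
    """Model one auth request while the first SQL host is unreachable.

    connect_timeout: seconds the auth side waits on the dead first host.
    client_timeout: seconds the SMTP side waits before failing the auth.
    second_connect_time: assumed time to connect to and query the second
        host (an illustrative guess, not a measured value).

    Returns (answered, seconds_elapsed).
    """
    answered_at = connect_timeout + second_connect_time
    if answered_at < client_timeout:
        return True, answered_at
    # The client gave up before the second host could be tried or answer.
    return False, client_timeout


if __name__ == "__main__":
    # Old behaviour: both timeouts are 10 seconds.
    print(failover_outcome(connect_timeout=10, client_timeout=10))
    # After the fix: the SQL connect timeout is 5 seconds.
    print(failover_outcome(connect_timeout=5, client_timeout=10))
```

With both timeouts at 10 seconds the client always gives up first. With
a 5 second connect timeout the second host gets roughly 5 seconds to
answer before the client's deadline.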

On 15.09.2011 13:43, Timo Sirainen wrote:
> On Thu, 2011-09-15 at 13:39 +0200, Robert Schetterer wrote:
>>
>> is there really a native failover mysql in dovecot?
>> can't remember this, I only remember this as part of dovecot proxying
>
> For SQL authentication it can use multiple SQL server hosts (with both
> MySQL and PostgreSQL) and do HA/load balancing.

ok, I see, but I have nearly all possible parameters in mysql (I use a
mysql cluster), thanks anyway for the answer

--
Best Regards
Robert Schetterer
Germany/Munich/Bavaria