Chris Wakelin
2006-Sep-22 10:24 UTC
[Dovecot] Dovecot 1.0.rc7 ioloop-poll.c assertion failed after SIGHUP
Anybody else seen this one? We're running Dovecot 1.0.rc7 on a different port on our live server (we're generally live with 1.0.beta5) and after a HUP to reload the user database (in a passwd-file) this morning we got:-> dovecot: Sep 22 07:45:05 Info: IMAP 14482 someuser xxx.xxx.xxx.xxx : Disconnected in IDLE > dovecot: Sep 22 07:45:05 Warning: imap-login: SSL_read() syscall failed: No route to host [xxx.xxx.xxx.xxx] > dovecot: Sep 22 07:45:05 Warning: SIGHUP received - reloading configuration > dovecot: Sep 22 07:45:05 Error: Login process died too early - shutting down > dovecot: Sep 22 07:45:05 Panic: file ioloop-poll.c: line 105 (io_loop_handle_remove): assertion failed: (index >= 0 && (unsigned int) index < ctx->fds_count)and Dovecot had died. "someuser" was me, and I'd left my Thunderbird client logged in (via IMAPS) by mistake when I hibernated my PC last night. I'd seen exactly the same assert on a test box the day before yesterday, (this time without a machine hibernating):-> dovecot: Sep 19 19:00:45 Info: imap-login: Aborted login: user=<testuser>, method=PLAIN, rip=yyy.yyy.yyy.yyy, lip=yyy.yyy.yyy.yyy, secured > dovecot: Sep 19 19:07:42 Warning: SIGHUP received - reloading configuration > dovecot: Sep 19 19:07:42 Error: Login process died too early - shutting down > dovecot: Sep 19 19:07:42 Panic: file ioloop-poll.c: line 105 (io_loop_handle_remove): assertion failed: (index >= 0 && (unsigned int) index < ctx->fds_count)Where yyy.yyy.yyy.yyy is actually the test box (I was logging in from the server itself using "openssl s_client"). The thing is, I was hoping to upgrade everyone (~20,000 users) to 1.0.rc7 next Tuesday, but now I'm not so sure! Chris -- --+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+- Christopher Wakelin, c.d.wakelin at reading.ac.uk IT Services Centre, The University of Reading, Tel: +44 (0)118 378 8439 Whiteknights, Reading, RG6 2AF, UK Fax: +44 (0)118 975 3094
Chris Wakelin
2006-Sep-25 10:16 UTC
[Dovecot] Dovecot 1.0.rc7 ioloop-poll.c assertion failed after SIGHUP
Chris Wakelin wrote:> Anybody else seen this one? We're running Dovecot 1.0.rc7 on a different > port on our live server (we're generally live with 1.0.beta5) and after > a HUP to reload the user database (in a passwd-file) this morning we got:- > >> dovecot: Sep 22 07:45:05 Info: IMAP 14482 someuser xxx.xxx.xxx.xxx : Disconnected in IDLE >> dovecot: Sep 22 07:45:05 Warning: imap-login: SSL_read() syscall failed: No route to host [xxx.xxx.xxx.xxx] >> dovecot: Sep 22 07:45:05 Warning: SIGHUP received - reloading configuration >> dovecot: Sep 22 07:45:05 Error: Login process died too early - shutting down >> dovecot: Sep 22 07:45:05 Panic: file ioloop-poll.c: line 105 (io_loop_handle_remove): assertion failed: (index >= 0 && (unsigned int) index < ctx->fds_count) > > and Dovecot had died. "someuser" was me, and I'd left my Thunderbird > client logged in (via IMAPS) by mistake when I hibernated my PC last night. > > I'd seen exactly the same assert on a test box the day before yesterday, > (this time without a machine hibernating):- > >> dovecot: Sep 19 19:00:45 Info: imap-login: Aborted login: user=<testuser>, method=PLAIN, rip=yyy.yyy.yyy.yyy, lip=yyy.yyy.yyy.yyy, secured >> dovecot: Sep 19 19:07:42 Warning: SIGHUP received - reloading configuration >> dovecot: Sep 19 19:07:42 Error: Login process died too early - shutting down >> dovecot: Sep 19 19:07:42 Panic: file ioloop-poll.c: line 105 (io_loop_handle_remove): assertion failed: (index >= 0 && (unsigned int) index < ctx->fds_count) > > Where yyy.yyy.yyy.yyy is actually the test box (I was logging in from > the server itself using "openssl s_client"). > > The thing is, I was hoping to upgrade everyone (~20,000 users) to > 1.0.rc7 next Tuesday, but now I'm not so sure! > > Chris >I've had several goes at trying to cause this "on-demand", but I can only get it to fail occasionally. My best recipe so far is to 1) Login via SSL and open INBOX (I'm using "openssl s_client" as my "debug" IMAP client). 2) IDLE then break the connection (with CTRL-C) 3) Replace the userdb passwd-file (mv userdb userdb.temp;cp userdb.temp userdb) to clobber any caching 4) Send SIGHUP to the Dovecot master process. This fails sometimes:- dovecot: Sep 25 10:38:06 Info: IMAP:6196:someuser:yyy.yyy.yyy.yyy: Disconnected in IDLE dovecot: Sep 25 10:38:11 Warning: SIGHUP received - reloading configuration dovecot: Sep 25 10:38:12 Error: invalid I/O fd 29, callback 17d2c dovecot: Sep 25 10:38:12 Error: login: fd_read() failed: Resource temporarily unavailable dovecot: Sep 25 10:38:12 Error: Login process died too early - shutting down dovecot: Sep 25 10:38:13 Panic: file ioloop-poll.c: line 105 (io_loop_handle_remove): assertion failed: (index >= 0 && (unsigned int) index < ctx->fds_count) I once managed to get an "invalid I/O fd" without the crash as well, just to make things more interesting :) I'm guessing it's some sort of race condition on a login-process ending (using SSL forces it to persist while it proxies the IMAP connection) and the forcing of a (slow?) reload of the userdb. It looks like the assertion failure is caused by the login process dying, rather than the other way round. I should have said these are Solaris 8 (sparc) boxes, and we have "login_process_per_connection = no" Our upgrade to 1.0rc7 is on ice for now, until we get this sorted out. (We're actually on 1.0beta7 on the "live" service, not beta5 as I said above.) Best Wishes, Chris -- --+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+- Christopher Wakelin, c.d.wakelin at reading.ac.uk IT Services Centre, The University of Reading, Tel: +44 (0)118 378 8439 Whiteknights, Reading, RG6 2AF, UK Fax: +44 (0)118 975 3094