Erik A Johnson
2012-Nov-10 10:44 UTC
[Dovecot] imap-login hanging when firewall blocks ssl handshaking
imap-login processes are hanging (using 100% of CPU) when connected from a client that is partially blocked by a firewall. It appears that imap-login is stuck in a loop trying to complete an ssl handshake. imap-login is working fine for other clients not blocked by the firewall (including localhost). This is dovecot 2.1.10 under Mac OS X 10.8.2 (compiled from sources); the firewall is Little Snitch 3.0.1 blocking port 993, which appears to let the connection initiate but then squashes and disconnects the socket during ssl handshaking. gdb backtrace and Activity Monitor's "Sample Process" show that imap-login is stuck calling ioloop-kqueue's io_loop_handler_run -> io_loop_call_io -> ssl_step repeatedly; dtruss shows that it is repeatedly making system calls to kevent and read, the latter returning -1 with errno 57=ENOTCONN="Socket is not connected". (I also tried ./configure --with-ioloop=poll and --with-iopoll=select instead of the default best = kqueue but the results were the same; --with-iopoll=epoll didn't work because epoll is not available on this machine.) The client, initiated by the command "openssl s_client -connect SERVER:993", first responds "CONNECTED(00000003)" but then immediately the error "60278:error:140790E5:SSL routines:SSL23_WRITE:ssl handshake failure:/SourceCache/OpenSSL098/OpenSSL098-44/src/ssl/s23_lib.c:182:". The infinite loop is in src/lib/ioloop.c in the function "io_loop_run" where the statement "while (ioloop->running) io_loop_handler_run(ioloop)" is executed. While I think the firewall is not doing the right thing (i.e., it lets a connection start but then kills it), it seems that imap-login should fail gracefully when the socket is no longer connected. Ideas? Thanks. ----------------------------------- /usr/local/sbin/dovecot -n # 2.1.10: /usr/local/etc/dovecot/dovecot.conf # OS: Darwin 12.2.0 x86_64 default_internal_user = _dovecot default_login_user = _dovenull namespace { hidden = no list = yes location = mbox:/Users/%u/Documents/Mailboxes prefix = mbox/ separator = / type = private } namespace { inbox = yes location = maildir:/Users/%u/Documents/Maildir mailbox Drafts { special_use = \Drafts } mailbox Junk { special_use = \Junk } mailbox Sent { special_use = \Sent } mailbox Trash { special_use = \Trash } prefix = separator = / type = private } passdb { driver = pam } protocols = imap lmtp ssl = required ssl_cert = </usr/local/etc/dovecot/dovecot.pem ssl_key = </usr/local/etc/dovecot/dovecot.pem userdb { driver = passwd } protocol imap { mail_plugins = " zlib imap_zlib" } ----------------------------------- gdb's "bt full" gives various results depending on where in the loop, but they are they same from ssl23_accept on down. (gdb) bt full #0 0x00007fff92d19f63 in sha1_block_data_order () #1 0x00007fff92d1a4ae in SHA1_Final () #2 0x00007fff92d45430 in EVP_DigestFinal_ex () #3 0x00007fff92d17d7a in ssleay_rand_add () #4 0x00007fff974529be in ssl23_accept () #5 0x000000010cac0729 in ssl_handshake [inlined] () at /private/tmp/johnsone/dovecot-2.1.10/src/login-common/ssl-proxy-openssl.c:465 ret = -949939536 #6 0x000000010cac0729 in ssl_step (proxy=0x7fff53152b90) at ssl-proxy-openssl.c:528 ret = -949939536 #7 0x000000010cb11bd1 in io_loop_call_io (io=0x7f7fa0c1c240) at ioloop.c:379 ioloop = (struct ioloop *) 0x7f7fa0c03b00 t_id = 2 #8 0x000000010cb12c69 in io_loop_handler_run (ioloop=0x0) at ioloop-kqueue.c:148 tv = { tv_sec = 130, tv_usec = 124797 } ctx = (struct ioloop_handler_context *) 0x7f7fa0c1a890 ret = 1 ts = { tv_sec = 130, tv_nsec = 124797000 } events = (struct kevent *) 0x10cb4ee78 #9 0x000000010cb11b88 in io_loop_run (ioloop=0x7fff53152b90) at ioloop.c:398 No locals. #10 0x000000010cafde9b in master_service_run (service=0x7f7fa0c039f0, callback=0x7f7fa0c1c2d8) at master-service.c:543 No locals. #11 0x000000010cabdcd1 in main_deinit [inlined] () at /private/tmp/johnsone/dovecot-2.1.10/src/login-common/main.c:406 set_pool = (pool_t) 0x7f7fa100b420 login_socket = 0x10cb4f7f8 "?9??" #12 0x000000010cabdcd1 in login_binary_run (argc=1, argv=0x7f7fa0c037c0, binary=0x7fff53152b90) at main.c:407 set_pool = (pool_t) 0x7f7fa100b420 login_socket = 0x10cb4f7f8 "?9??" #13 0x00007fff8ac847e1 in start () (gdb) bt full #0 0x00007fff96e4c110 in malloc_zone_malloc () #1 0x00007fff96e4cba7 in malloc () #2 0x00007fff92d2c9e2 in CRYPTO_malloc () #3 0x00007fff92d4535f in EVP_DigestInit_ex () #4 0x00007fff92d17caa in ssleay_rand_add () #5 0x00007fff974529be in ssl23_accept () ... #0 0x00007fff94775ffa in read () #1 0x00007fff92cf41e6 in sock_read () #2 0x00007fff92d15598 in BIO_read () #3 0x00007fff9745294a in ssl23_read_bytes () #4 0x00007fff97452c3e in ssl23_get_client_hello () #5 0x00007fff97452b4d in ssl23_accept () ... -----------------------------------
Timo Sirainen
2012-Nov-23 06:52 UTC
[Dovecot] imap-login hanging when firewall blocks ssl handshaking
On 10.11.2012, at 12.44, Erik A Johnson wrote:> imap-login processes are hanging (using 100% of CPU) when connected from a client that is partially blocked by a firewall. It appears that imap-login is stuck in a loop trying to complete an ssl handshake. imap-login is working fine for other clients not blocked by the firewall (including localhost). > > This is dovecot 2.1.10 under Mac OS X 10.8.2 (compiled from sources); the firewall is Little Snitch 3.0.1 blocking port 993, which appears to let the connection initiate but then squashes and disconnects the socket during ssl handshaking. > > gdb backtrace and Activity Monitor's "Sample Process" show that imap-login is stuck calling ioloop-kqueue's io_loop_handler_run -> io_loop_call_io -> ssl_step repeatedly; dtruss shows that it is repeatedly making system calls to kevent and read, the latter returning -1 with errno 57=ENOTCONN="Socket is not connected". (I also tried ./configure --with-ioloop=poll and --with-iopoll=select instead of the default best = kqueue but the results were the same; --with-iopoll=epoll didn't work because epoll is not available on this machine.) The client, initiated by the command "openssl s_client -connect SERVER:993", first responds "CONNECTED(00000003)" but then immediately the error "60278:error:140790E5:SSL routines:SSL23_WRITE:ssl handshake failure:/SourceCache/OpenSSL098/OpenSSL098-44/src/ssl/s23_lib.c:182:". The infinite loop is in src/lib/ioloop.c in the function "io_loop_run" where the statement "while (ioloop->running) io_loop_handler_run(ioloop)" is executed.I wonder if this fixes it? http://hg.dovecot.org/dovecot-2.1/rev/e95479f439aa
Erik A Johnson
2012-Dec-05 09:08 UTC
[Dovecot] imap-login hanging when firewall blocks ssl handshaking
FYI, the tcpdump I sent previously was with one of our previously-discussed patches in place: if (!proxy->client_proxy && net_geterror(proxy->fd_ssl) == EBADF) { I'm attaching that dump again (as tcpdump_output_witholdpatch_headeronly.txt), as well as a dump without any of the patches (tcpdump_output_withoutpatches_headeronly.txt) in case there is any difference. On December 4, 2012 6:18:23 AM PST, Ben Morrow <ben at morrow.me.uk> wrote:> At 12PM +0000 on 4/12/12 Ben Morrow wrote: >> Well, it looks to me as though xnu/bsd/kern/uipc_socket.c:soreceive will >> indeed return ENOTCONN for a socket which was once successfully >> connected but has now been disconnected. This happens when the socket is >> in the DEFUNCT state, which is a state that doesn't exist in FreeBSD; >> it's not completely clear but I suspect firewalls may be able to put >> arbitrary sockets into that state. > > Investigating a little further, it should be possible to test for this > situation directly. Assuming I'm correct about what's going on here, > this should be both cleaner and safer than mucking about looking for > ENOTCONN and guessing about what's happening. > > Erik, does this make the problem go away? I left out the > proxy->client_proxy test, since AFAICT this is just as likely to happen > on a client socket. > > Ben > > #ifdef SO_ISDEFUNCT > + if (getsockopt(proxy->fd_ssl, SOL_SOCKET, SO_ISDEFUNCT, > + (void *)&err, sizeof(err)) == 0 && err) { > + errstr = t_strdup_printf( > + "%s: socket is defunct", func_name); > + break; > + } > +#endifNope, SO_ISDEFUNCT isn't defined. -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: tcpdump_output_witholdpatch_headeronly.txt URL: <http://dovecot.org/pipermail/dovecot/attachments/20121205/a4a3be52/attachment-0008.txt> -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: tcpdump_output_withoutpatches_headeronly.txt URL: <http://dovecot.org/pipermail/dovecot/attachments/20121205/a4a3be52/attachment-0009.txt>
Ben Morrow
2012-Dec-05 10:07 UTC
[Dovecot] imap-login hanging when firewall blocks ssl handshaking
At 1AM -0800 on 5/12/12 Erik A Johnson wrote:> FYI, the tcpdump I sent previously was with one of our > previously-discussed patches in place: > > if (!proxy->client_proxy && net_geterror(proxy->fd_ssl) == EBADF) { > > I'm attaching that dump again (as > tcpdump_output_witholdpatch_headeronly.txt), as well as a dump without > any of the patches (tcpdump_output_withoutpatches_headeronly.txt) in > case there is any difference.Well, they're certainly different. Are you sure the second trace (withoutpatches) was of a session which went into an infinite loop? The only thing peculiar about that trace is that the server closes the connection after receiving the first packet from the client, but it does so perfectly properly: it ACKs the client's data packet, and does the FIN-FIN/ACK exchance properly. You will notice there are no [R] packets, which indicate something odd is happening at the server end.> On December 4, 2012 6:18:23 AM PST, Ben Morrow <ben at morrow.me.uk> wrote: > > At 12PM +0000 on 4/12/12 Ben Morrow wrote: > >> Well, it looks to me as though xnu/bsd/kern/uipc_socket.c:soreceive will > >> indeed return ENOTCONN for a socket which was once successfully > >> connected but has now been disconnected. This happens when the socket is > >> in the DEFUNCT state, which is a state that doesn't exist in FreeBSD; > >> it's not completely clear but I suspect firewalls may be able to put > >> arbitrary sockets into that state. > > > > Investigating a little further, it should be possible to test for this > > situation directly. Assuming I'm correct about what's going on here, > > this should be both cleaner and safer than mucking about looking for > > ENOTCONN and guessing about what's happening. > > > > Erik, does this make the problem go away? I left out the > > proxy->client_proxy test, since AFAICT this is just as likely to happen > > on a client socket. > > > > Ben > > > > #ifdef SO_ISDEFUNCT > > + if (getsockopt(proxy->fd_ssl, SOL_SOCKET, SO_ISDEFUNCT, > > + (void *)&err, sizeof(err)) == 0 && err) { > > + errstr = t_strdup_printf( > > + "%s: socket is defunct", func_name); > > + break; > > + } > > +#endif > > Nope, SO_ISDEFUNCT isn't defined.Oh, sorry, that needs #include <sys/socket.h> at the top. If that doesn't work, then which version of the OS are you building for? AFAICT the DEFUNCT socket flag has been present since at least 10.5, but the SO_ISDEFUNCT option was only introduced in 10.7. This is irritating, actually: it means that to properly fix this on all versions of Mac OS Dovecot would need to include the previous ENOTCONN code #ifndef SO_ISDEFUNCT. Ben
Erik A Johnson
2012-Dec-06 05:43 UTC
[Dovecot] imap-login hanging when firewall blocks ssl handshaking
On December 4, 2012 at 4:43:53 AM PST, Ben Morrow <ben at morrow.me.uk> wrote:> So, it looks to me as though you have a firewall problem. You may be > able to get more information by setting the kern.ipc.sodefunctlog sysctl > to 1: this should make the kernel log to syslog (or wherever the OSX > kernel logs go) when sockets are made DEFUNCT and when reads fail for > that reason.sudo sysctl -w kern.ipc.sodefunctlog=1 gives the following in the log: 12/5/12 9:10:00.000 PM kernel[0]: sosetdefunct[60169]: (target pid 60169 level 0) so 0xffffff803159c738 [2,1] marked as defunct 12/5/12 9:10:00.000 PM kernel[0]: sodefunct[60169]: (target pid 60169 level 0) so 0xffffff803159c738 [2,1] is now defunct [rcv_si 0x0, snd_si 0x0, rcv_fl 0x9400, snd_fl 0x1400] 12/5/12 9:10:00.000 PM kernel[0]: soreceive[60169]: defunct so 0xffffff803159c738 [2,1] (57) The last line is repeated about once every 4 microseconds until I kill it.