When we upgrade our kernels from 2.6.32.2 to 3.2.16 something strange has happened. On high traffic dovecot/auth looks like not responding. We found a lot of this lines at the log: dovecot: pop3-login: Error: net_connect_unix(pop3) failed: Resource temporarily unavailable (...) and clients stop authorizing Some other errors follow in the wake of: dovecot: pop3: Error: Raw backtrace: /opt/dovecot2/lib/dovecot/libdovecot.so.0(+0x373ca) [0x7768a3ca] -> /opt/dovecot2/lib/dovecot/libdovecot.so.0(+0x3743b) [0x7768a43b] -> /opt/dovecot2/lib/dovecot/libdovecot.so.0(i_fatal+0) [0x7766048b] -> /opt/dovecot2/lib/dovecot/libdovecot.so.0(+0x4593a) [0x7769893a] -> /opt/dovecot2/lib/dovecot/libdovecot.so.0(io_add+0xaf) [0x7769757f] -> /opt/dovecot2/lib/dovecot/libdovecot.so.0(master_service_init_finish+0x19a) [0x77683c2a] -> dovecot/pop3(main+0xfc) [0x804a90c] -> /lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x774c04d3] -> dovecot/pop3() [0x804aba9] dovecot: pop3: Error: Raw backtrace: /opt/dovecot2/lib/dovecot/libdovecot.so.0(+0x373ca) [0x7768a3ca] -> /opt/dovecot2/lib/dovecot/libdovecot.so.0(+0x3743b) [0x7768a43b] -> /opt/dovecot2/lib/dovecot/libdovecot.so.0(i_fatal+0) [0x7766048b] -> /opt/dovecot2/lib/dovecot/libdovecot.so.0(+0x4593a) [0x7769893a] -> /opt/dovecot2/lib/dovecot/libdovecot.so.0(io_add+0xaf) [0x7769757f] -> /opt/dovecot2/lib/dovecot/libdovecot.so.0(master_service_init_finish+0x19a) [0x77683c2a] -> dovecot/pop3(main+0xfc) [0x804a90c] -> /lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x774c04d3] -> dovecot/pop3() [0x804aba9] dovecot: master: Error: service(pop3): child 18756 killed with signal 6 (core dumped) dovecot: master: Error: service(pop3): child 18756 killed with signal 6 (core dumped) dovecot: master: Error: service(pop3): command startup failed, throttling dovecot: master: Error: service(pop3): command startup failed, throttling dovecot: pop3-login: Error: Raw backtrace: /opt/dovecot2/lib/dovecot/libdovecot.so.0(+0x373ca) [0x776b73ca] -> /opt/dovecot2/lib/dovecot/libdovecot.so.0(+0x3743b) [0x776b743b] -> /opt/dovecot2/lib/dovecot/libdovecot.so.0(i_fatal+0) [0x7768d48b] -> /opt/dovecot2/lib/dovecot/libdovecot.so.0(+0x4593a) [0x776c593a] -> /opt/dovecot2/lib/dovecot/libdovecot.so.0(io_add+0xaf) [0x776c457f] -> /opt/dovecot2/lib/dovecot/libdovecot.so.0(master_service_init_finish+0x19a) [0x776b0c2a] -> /opt/dovecot2/lib/dovecot/libdovecot-login.so.0(main+0x143) [0x77705383] -> /lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x774ed4d3] -> dovecot/pop3-login() [0x8049471] dovecot: pop3-login: Error: Raw backtrace: /opt/dovecot2/lib/dovecot/libdovecot.so.0(+0x373ca) [0x776fd3ca] -> /opt/dovecot2/lib/dovecot/libdovecot.so.0(+0x3743b) [0x776fd43b] -> /opt/dovecot2/lib/dovecot/libdovecot.so.0(i_fatal+0) [0x776d348b] -> /opt/dovecot2/lib/dovecot/libdovecot.so.0(+0x4593a) [0x7770b93a] -> /opt/dovecot2/lib/dovecot/libdovecot.so.0(io_add+0xaf) [0x7770a57f] -> /opt/dovecot2/lib/dovecot/libdovecot.so.0(master_service_init_finish+0x19a) [0x776f6c2a] -> /opt/dovecot2/lib/dovecot/libdovecot-login.so.0(main+0x143) [0x7774b383] -> /lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x775334d3] -> dovecot/pop3-login() [0x8049471] And example stack trace (from pop3, pop3-login throws almost the same): #0 0x776f6424 in __kernel_vsyscall () No symbol table info available. #1 0x7744d1ef in __GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 resultvar = <optimized out> resultvar = <optimized out> pid = 2002518004 selftid = 25476 #2 0x77450835 in __GI_abort () at abort.c:91 save_stage = 2 act = {__sigaction_handler = {sa_handler = 0x9bce4a8, sa_sigaction = 0x9bce4a8}, sa_mask = {__val = {163409408, 2002781570, 163374248, 603, 163374280, 604, 163374280, 2001703379, 0, 2002790760, 2140717252, 163374280, 0, 2003786736, 2002596704, 0, 2002618953, 2003087348, 2140717196, 0, 163409408, 2001286473, 163374248, 10, 2000834616, 2002534400, 604, 2003087348, 604, 2002791863, 4294967295, 10}}, sa_flags = 2140717316, sa_restorer = 0x7764bd84 <pid.3786>} sigs = {__val = {32, 0 <repeats 31 times>}} #3 0x77603390 in default_fatal_finish (type=<optimized out>, status=<optimized out>) at failures.c:187 backtrace = 0x9bce098 "/opt/dovecot2/lib/dovecot/libdovecot.so.0(+0x3837a) [0x7760337a] -> /opt/dovecot2/lib/dovecot/libdovecot.so.0(+0x383eb) [0x776033eb] -> /opt/dovecot2/lib/dovecot/libdovecot.so.0(i_fatal+0) [0x775d877a"... #4 0x776033eb in i_internal_fatal_handler (ctx=0x7f98c174, format=0x9bd6db0 "(25476) epoll_ctl(%s, %d) failed: %m", args=0x7f98c194 "\\\323bw\006") at failures.c:688 status = 0 #5 0x775d877a in i_panic (format=0x7762d364 "epoll_ctl(%s, %d) failed: %m") at failures.c:263 ctx = {type = LOG_TYPE_PANIC, exit_status = 0, timestamp = 0x0} args = 0x7f98c194 "\\\323bw\006" f = <optimized out> #6 0x77611d8a in io_loop_handle_add (io=0x9bd6d78) at ioloop-epoll.c:119 ---Type <return> to continue, or q <return> to quit--- ctx = 0x9bd64d8 list = <optimized out> event = {events = 24, data = {ptr = 0x9bd6da0, fd = 163409312, u32 = 163409312, u64 = 163409312}} op = 1 first = true #7 0x776109af in io_add (fd=6, condition=IO_ERROR, callback=0x775fd030 <master_status_error>, context=0x9bd6300) at ioloop.c:54 io = <optimized out> __FUNCTION__ = "io_add" #8 0x775fca9a in master_service_init_finish (service=0x9bd6300) at master-service.c:381 sigint_flags = <optimized out> st = {st_dev = 8, __pad1 = 0, __st_ino = 1286567453, st_mode 4480, st_nlink = 1, st_uid = 0, st_gid = 0, st_rdev = 0, __pad2 = 0, st_size = 0, st_blksize = 4096, st_blocks = 0, st_atim = {tv_sec 1338389522, tv_nsec = 269436805}, st_mtim = {tv_sec = 1338389522, tv_nsec = 269436805}, st_ctim = { tv_sec = 1338389522, tv_nsec = 269436805}, st_ino = 1286567453} value = <optimized out> count = 1 __FUNCTION__ = "master_service_init_finish" #9 0x0804a954 in main (argc=1, argv=0x7f98c414) at main.c:232 set_roots = {0x804ee40, 0x0} service_flags = <optimized out> storage_service_flags = MAIL_STORAGE_SERVICE_FLAG_DISALLOW_ROOT postlogin_socket_path = 0x0 username = 0x0 c = <optimized out> We do some investigation. Restart helped for a moment, but problem was returning. When we back to older kernel - everything back to normal. We compile Dovecot with poll instead of epoll (--with-ioloop=poll) and this works for us. Is any problem with epoll on 3.2.x kernels? Or maybe this is Dovecot problem? Maybe this is not connected with epoll, but epoll is interferes with this. Problem on Dovecot 2.0.11 and 2.0.20. -- Len7hir
Javier Miguel RodrÃguez
2012-May-30 16:14 UTC
[Dovecot] Strange Dovecot 2.0.20 auth chokes and cores
There is a known Problem with epoll, at least on Red Hat / CentOS, this bugzilla may give you additional info (comments of Timo inside) https://bugzilla.redhat.com/show_bug.cgi?id=681578 Regards Javier El 30/05/2012 17:45, Konrad . escribi?:> When we upgrade ourkernels from 2.6.32.2 to 3.2.16 something strange> has happened. > Onhigh traffic dovecot/auth looks like not responding.> > We found a lotof this lines at the log:> dovecot: pop3-login: Error:net_connect_unix(pop3) failed: Resource> temporarily unavailable >(...) and clients stop authorizing> > Some other errors follow in thewake of:> dovecot: pop3: Error: Raw backtrace: >/opt/dovecot2/lib/dovecot/libdovecot.so.0(+0x373ca) [0x7768a3ca] ->>/opt/dovecot2/lib/dovecot/libdovecot.so.0(+0x3743b) [0x7768a43b] ->>/opt/dovecot2/lib/dovecot/libdovecot.so.0(i_fatal+0) [0x7766048b] ->>/opt/dovecot2/lib/dovecot/libdovecot.so.0(+0x4593a) [0x7769893a] ->>/opt/dovecot2/lib/dovecot/libdovecot.so.0(io_add+0xaf) [0x7769757f] ->>/opt/dovecot2/lib/dovecot/libdovecot.so.0(master_service_init_finish+0x19a)>[0x77683c2a] -> dovecot/pop3(main+0xfc) [0x804a90c] ->>/lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x774c04d3] ->>dovecot/pop3() [0x804aba9]> dovecot: pop3: Error: Raw backtrace: >/opt/dovecot2/lib/dovecot/libdovecot.so.0(+0x373ca) [0x7768a3ca] ->>/opt/dovecot2/lib/dovecot/libdovecot.so.0(+0x3743b) [0x7768a43b] ->>/opt/dovecot2/lib/dovecot/libdovecot.so.0(i_fatal+0) [0x7766048b] ->>/opt/dovecot2/lib/dovecot/libdovecot.so.0(+0x4593a) [0x7769893a] ->>/opt/dovecot2/lib/dovecot/libdovecot.so.0(io_add+0xaf) [0x7769757f] ->>/opt/dovecot2/lib/dovecot/libdovecot.so.0(master_service_init_finish+0x19a)>[0x77683c2a] -> dovecot/pop3(main+0xfc) [0x804a90c] ->>/lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x774c04d3] ->>dovecot/pop3() [0x804aba9]> dovecot: master: Error: service(pop3):child 18756 killed with signal> 6 (core dumped) > dovecot: master:Error: service(pop3): child 18756 killed with signal> 6 (core dumped) >dovecot: master: Error: service(pop3): command startup failed, throttling> dovecot: master: Error: service(pop3): command startupfailed, throttling> dovecot: pop3-login: Error: Raw backtrace: >/opt/dovecot2/lib/dovecot/libdovecot.so.0(+0x373ca) [0x776b73ca] ->>/opt/dovecot2/lib/dovecot/libdovecot.so.0(+0x3743b) [0x776b743b] ->>/opt/dovecot2/lib/dovecot/libdovecot.so.0(i_fatal+0) [0x7768d48b] ->>/opt/dovecot2/lib/dovecot/libdovecot.so.0(+0x4593a) [0x776c593a] ->>/opt/dovecot2/lib/dovecot/libdovecot.so.0(io_add+0xaf) [0x776c457f] ->>/opt/dovecot2/lib/dovecot/libdovecot.so.0(master_service_init_finish+0x19a)>[0x776b0c2a] ->>/opt/dovecot2/lib/dovecot/libdovecot-login.so.0(main+0x143)>[0x77705383] -> /lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf3)>[0x774ed4d3] -> dovecot/pop3-login() [0x8049471]> dovecot: pop3-login:Error: Raw backtrace:>/opt/dovecot2/lib/dovecot/libdovecot.so.0(+0x373ca) [0x776fd3ca] ->>/opt/dovecot2/lib/dovecot/libdovecot.so.0(+0x3743b) [0x776fd43b] ->>/opt/dovecot2/lib/dovecot/libdovecot.so.0(i_fatal+0) [0x776d348b] ->>/opt/dovecot2/lib/dovecot/libdovecot.so.0(+0x4593a) [0x7770b93a] ->>/opt/dovecot2/lib/dovecot/libdovecot.so.0(io_add+0xaf) [0x7770a57f] ->>/opt/dovecot2/lib/dovecot/libdovecot.so.0(master_service_init_finish+0x19a)>[0x776f6c2a] ->>/opt/dovecot2/lib/dovecot/libdovecot-login.so.0(main+0x143)>[0x7774b383] -> /lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf3)>[0x775334d3] -> dovecot/pop3-login() [0x8049471]> > And example stacktrace (from pop3, pop3-login throws almost the same):> #0 0x776f6424 in__kernel_vsyscall ()> No symbol table info available. > #1 0x7744d1efin __GI_raise (sig=6) at> ../nptl/sysdeps/unix/sysv/linux/raise.c:64 >resultvar => resultvar = > pid = 2002518004 > selftid = 25476 > #20x77450835 in __GI_abort () at abort.c:91> save_stage = 2 > act {__sigaction_handler = {sa_handler = 0x9bce4a8, > sa_sigaction 0x9bce4a8}, sa_mask = {__val = {163409408, 2002781570, > 163374248, 603,163374280,> 604, 163374280, 2001703379, 0, 2002790760, 2140717252, >163374280, 0, 2003786736, 2002596704, 0, 2002618953, 2003087348,>2140717196, 0, 163409408,> 2001286473, 163374248, 10, 2000834616,2002534400, 604,> 2003087348, 604, 2002791863, 4294967295, 10}},sa_flags = 2140717316,> sa_restorer = 0x7764bd84 } > sigs = {__val {32, 0 }} > #3 0x77603390 in default_fatal_finish (type=, > status=) atfailures.c:187> backtrace = 0x9bce098 >"/opt/dovecot2/lib/dovecot/libdovecot.so.0(+0x3837a) [0x7760337a] ->>/opt/dovecot2/lib/dovecot/libdovecot.so.0(+0x383eb) [0x776033eb] ->>/opt/dovecot2/lib/dovecot/libdovecot.so.0(i_fatal+0) [0x775d877a"...>#4 0x776033eb in i_internal_fatal_handler (ctx=0x7f98c174,>format=0x9bd6db0 "(25476) epoll_ctl(%s, %d) failed: %m",>args=0x7f98c194 "\323bw06")> at failures.c:688 > status = 0 > #50x775d877a in i_panic (format=0x7762d364 "epoll_ctl(%s, %d)> failed:%m") at failures.c:263> ctx = {type = LOG_TYPE_PANIC, exit_status = 0,timestamp = 0x0}> args = 0x7f98c194 "\323bw06" > f = > #6 0x77611d8ain io_loop_handle_add (io=0x9bd6d78) at ioloop-epoll.c:119> ---Type tocontinue, or q to quit---> ctx = 0x9bd64d8 > list = > event = {events= 24, data = {ptr = 0x9bd6da0, fd = 163409312,> u32 = 163409312, u64 163409312}} > op = 1 > first = true > #7 0x776109af in io_add (fd=6,condition=IO_ERROR,> callback=0x775fd030 , context=0x9bd6300) at >ioloop.c:54> io = > __FUNCTION__ = "io_add" > #8 0x775fca9a inmaster_service_init_finish (service=0x9bd6300) at>master-service.c:381> sigint_flags = > st = {st_dev = 8, __pad1 = 0,__st_ino = 1286567453, st_mode > 4480, st_nlink = 1, st_uid = 0, st_gid = 0, st_rdev = 0, __pad2 = 0,> st_size = 0, > st_blksize = 4096,st_blocks = 0, st_atim = {tv_sec > 1338389522, tv_nsec = 269436805}, st_mtim = {tv_sec = 1338389522,> tv_nsec = 269436805}, st_ctim = { >tv_sec = 1338389522, tv_nsec = 269436805}, st_ino = 1286567453}> value=> count = 1 > __FUNCTION__ = "master_service_init_finish" > #90x0804a954 in main (argc=1, argv=0x7f98c414) at main.c:232> set_roots {0x804ee40, 0x0} > service_flags = > storage_service_flags MAIL_STORAGE_SERVICE_FLAG_DISALLOW_ROOT > postlogin_socket_path = 0x0 >username = 0x0> c = > > We do some investigation. > Restart helpedfor a moment, but problem was returning.> When we back to older kernel- everything back to normal.> > We compile Dovecot with poll insteadof epoll (--with-ioloop=poll) and> this works for us. > > Is anyproblem with epoll on 3.2.x kernels? Or maybe this is Dovecot problem?>Maybe this is not connected with epoll, but epoll is interferes with this.> > Problem on Dovecot 2.0.11 and 2.0.20.
> From: "Konrad ."<len7hir at gmail.com>> When we upgrade our kernels from 2.6.32.2 to 3.2.16 something strange > has happened. > On high traffic dovecot/auth looks like not responding. > > We found a lot of this lines at the log:[snip]> We compile Dovecot with poll instead of epoll (--with-ioloop=poll) and > this works for us. > > > Is any problem with epoll on 3.2.x kernels?Yes - and it's been discussed here. Some "bright spark" rewrote the kernel epoll code to prevent DoS attacks caused by "excessive forking". That "bright spark" clearly doesn't use imap or webserver software. Redhat backported this change to kernels released after February 2012 and are about to release an emergency kernel release which removes it. A number of $LARGE companies were "not happy" As you've posted, compiling without epoll fixes the problem, but that doesn't help anyone using distro or repo packages.
Possibly Parallel Threads
- dovecot/auth killed with signal 6
- Problem about dovecot Panic
- v2.0.13 problems after kernel patch for CVE-2011-1083 applied on Centos 5
- Exit status code 134; what is it, in the context of Dovecot Antispam plug-in?
- How to dsync mdbox compressed to maildir uncompressed