On 18 Sep 2018, at 13.29, Simone Lazzaris <s.lazzaris at interactive.eu> wrote:
>
> > Hi all, again;
> >
> > I've enabled the core dumps and let it go for some days waiting for the issue
> > to reoccur.
> >
> > Meantime I've also upgraded the poolmon script, as Sami suggested.
> >
> > It seems that the upgrade has scared the issue away, because it no longer
> > occurred.
> >
> > Maybe the problem is related to the way the old poolmon talked to the
> > director daemon? I'm not very inclined to downgrade poolmon to catch a
> > traceback, but can do if necessary.
>
> Well, maybe it's not necessary ;)
> I've performed some maintenance operations on the backends and that triggered the crash. It seems that something goes wrong when one backend comes back online.

It's weird how easily you can reproduce the crash. I've run all kinds of (stress) tests and I can't reproduce this crash. I was able to reproduce the original hang though.

> Unfortunately, the core was not dumped.... And I don't know what to do: the director service was not chrooted, and ulimit -c is unlimited.

Do you have:

sysctl -w fs.suid_dumpable=2
Simone Lazzaris
2018-Sep-18 12:01 UTC
Auth process sometimes stop responding after upgrade
Alas, I've set fs.suid_dumpable to 2 but the core is not dumped. So far I've checked:

- ulimit -c unlimited, done
- /proc/sys/kernel/core_pattern is set to /var/tmp/core.%p
- /var/tmp is chmod 1777
- daemon is not chrooted
- sysctl -w fs.suid_dumpable=2
- dir /var/tmp is empty and filesystem has 2GB free

This is the logfile:

Sep 18 13:54:22 imap-front4 dovecot: director: doveadm: Host 192.168.1.145 changed down (vhost_count=100 last_updown_change=0)
Sep 18 13:54:52 imap-front4 dovecot: director: doveadm: Host 192.168.1.145 changed up (vhost_count=100 last_updown_change=1537271662)
Sep 18 13:54:52 imap-front4 dovecot: director: Panic: file doveadm-connection.c: line 1097 (doveadm_connection_deinit): assertion failed: (conn->to_ring_sync_abort == NULL)
Sep 18 13:54:52 imap-front4 dovecot: director: Error: Raw backtrace: /usr/local/lib/dovecot/libdovecot.so.0(+0xa15be) [0xb76fa5be] -> /usr/local/lib/dovecot/libdovecot.so.0(+0xa1641) [0xb76fa641] -> /usr/local/lib/dovecot/libdovecot.so.0(i_fatal+0) [0xb768035e] -> dovecot/director() [0x80574f7] -> dovecot/director() [0x8057f03] -> /usr/local/lib/dovecot/libdovecot.so.0(io_loop_call_io+0x6b) [0xb77133db] -> /usr/local/lib/dovecot/libdovecot.so.0(io_loop_handler_run_internal+0xfe) [0xb7714e1e] -> /usr/local/lib/dovecot/libdovecot.so.0(io_loop_handler_run+0x46) [0xb7713496] -> /usr/local/lib/dovecot/libdovecot.so.0(io_loop_run+0x48) [0xb7713658] -> /usr/local/lib/dovecot/libdovecot.so.0(master_service_run+0x2e) [0xb768c45e] -> dovecot/director(main+0x49e) [0x804cf5e] -> /lib/i386-linux-gnu/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0xb7501e46] -> dovecot/director() [0x804d081]
Sep 18 13:54:52 imap-front4 dovecot: director: Fatal: master: service(director): child 8059 killed with signal 6 (core not dumped)
Sep 18 13:54:52 imap-front4 dovecot: director: Connecting to 212.183.164.161:9090 (as 212.183.164.164): Initial connection
Sep 18 13:54:52 imap-front4 dovecot: director: Incoming connection from director 212.183.164.163/in
Sep 18 13:54:52 imap-front4 dovecot: director: director(212.183.164.163/in): Host 192.168.1.142 vhost count changed from 100 to 100
Sep 18 13:54:52 imap-front4 dovecot: director: director(212.183.164.163/in): Host 192.168.1.143 vhost count changed from 100 to 100
Sep 18 13:54:52 imap-front4 dovecot: director: director(212.183.164.163/in): Host 192.168.1.144 vhost count changed from 100 to 100
Sep 18 13:54:52 imap-front4 dovecot: director: director(212.183.164.163/in): Host 192.168.1.145 vhost count changed from 100 to 100
Sep 18 13:54:52 imap-front4 dovecot: director: director(212.183.164.163/in): Host 192.168.1.216 vhost count changed from 100 to 100
Sep 18 13:54:52 imap-front4 dovecot: director: director(212.183.164.163/in): Host 192.168.1.217 vhost count changed from 100 to 100
Sep 18 13:54:52 imap-front4 dovecot: director: director(212.183.164.163/in): Host 192.168.1.218 vhost count changed from 100 to 100
Sep 18 13:54:52 imap-front4 dovecot: director: director(212.183.164.163/in): Host 192.168.1.219 vhost count changed from 100 to 100
Sep 18 13:54:52 imap-front4 dovecot: director: director(212.183.164.161:9090/out): Handshake finished in 0.001 secs (bytes in=61, bytes out=791, 0+0 USERs received, last input 0.000 s ago, last output 0.001 s ago, connected 0.001 s ago, 408 peak output buffer size, 0.000 CPU secs since connected)
Sep 18 13:54:52 imap-front4 dovecot: director: director(212.183.164.163/in): Handshake finished in 0.006 secs (bytes in=111411, bytes out=56, 2940+0 USERs received, last input 0.000 s ago, last output 0.006 s ago, connected 0.006 s ago, 0 peak output buffer size, 0.004 CPU secs since connected)

I can confirm that I can trigger the issue by having one of the backends flap down and up.

--
Simone Lazzaris
Qcom S.p.A.
simone.lazzaris at qcom.it | www.qcom.it
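The correlation visible in the log above (a backend changes state, then the director panics within the same second) can be checked across a longer log with a small script. This is only a sketch: the function names, the 5-second window and the assumed year are illustrative choices, and the regular expressions are written against the line shapes shown in this log, not against any documented Dovecot log format.

```python
import re
from datetime import datetime

# Patterns modelled on the log lines quoted in this thread (an assumption,
# not a documented format).
HOST_CHANGE = re.compile(r"Host (?P<host>\S+) changed (?P<state>up|down)")
PANIC = re.compile(r"director: Panic: (?P<msg>.*)")


def parse_time(line, year=2018):
    """Parse the leading syslog timestamp; syslog omits the year, so it is supplied."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def panics_after_host_change(lines, window_secs=5):
    """Return (host, state, delay_secs, panic_msg) for each director panic
    logged within window_secs after the most recent host up/down change."""
    last_change = None
    hits = []
    for line in lines:
        change = HOST_CHANGE.search(line)
        if change:
            last_change = (parse_time(line), change["host"], change["state"])
            continue
        panic = PANIC.search(line)
        if panic and last_change is not None:
            delay = (parse_time(line) - last_change[0]).total_seconds()
            if 0 <= delay <= window_secs:
                hits.append((last_change[1], last_change[2], delay, panic["msg"]))
    return hits
```

Feeding it the lines of the syslog file (for example `panics_after_host_change(open(path))`, with `path` pointing at wherever the director logs) lists each panic together with the host change that preceded it.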
If you are using systemd, create /etc/systemd/system/dovecot.service.d/limits.conf and put

[Service]
LimitCORE=infinity

and run

systemctl daemon-reload
systemctl restart dovecot

---
Aki Tuomi
Dovecot oy

-------- Original message --------
From: Simone Lazzaris <s.lazzaris at interactive.eu>
Date: 18/09/2018 15:01 (GMT+02:00)
To: Timo Sirainen <tss at iki.fi>
Cc: dovecot at dovecot.org
Subject: Re: Auth process sometimes stop responding after upgrade

> Alas, I've set fs.suid_dumpable to 2 but the core is not dumped.
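A possible reason the systemd limit matters: `ulimit -c` set in a shell applies to that shell and its children, not to a daemon started by the init system. One way to see what the running director process actually inherited is to read its `/proc/<pid>/limits` entry on Linux. The sketch below does that and also reads the two sysctl values discussed in this thread; the function names are hypothetical, and finding the director's PID is left to the reader.

```python
from pathlib import Path

CORE_ROW = "Max core file size"


def core_limit(limits_text):
    """Return the (soft, hard) core file size limits from the text of
    /proc/<pid>/limits, or None if the row is missing."""
    for line in limits_text.splitlines():
        if line.startswith(CORE_ROW):
            soft, hard = line[len(CORE_ROW):].split()[:2]
            return soft, hard
    return None


def core_dump_settings(pid, proc=Path("/proc")):
    """Collect the core-dump settings that apply to one running process (Linux only)."""
    return {
        "core_limit": core_limit((proc / str(pid) / "limits").read_text()),
        "core_pattern": (proc / "sys/kernel/core_pattern").read_text().strip(),
        "suid_dumpable": (proc / "sys/fs/suid_dumpable").read_text().strip(),
    }
```

If `core_dump_settings(pid)["core_limit"]` reports a soft limit of `0` for the director process after the restart, the `LimitCORE=infinity` drop-in has presumably not taken effect for that service.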