Christoph Kluge
2017-Feb-27 22:36 UTC
Fwd: Some mails do not get replicated anymore after memory-exhaust
Hey guys,

overall I have a working Dovecot replication setup between two servers running on the Amazon cloud. Sadly, I got some messages that my server ran out of memory. After investigating a little further I realized that some mails didn't get replicated, but I'm not sure whether this is related to the memory exhaustion. I was expecting the full sync to catch them up, but sadly it doesn't.

Below I'm including:
* /etc/dovecot/dovecot.conf from both servers
* one sample of my memory-exhaust exception
* Maildir directory listing of one mailbox on both servers
* commands + output of a manual attempt at full replication
* grep information for the missing mail inside the Maildir on both servers

Here is my configuration from both servers. The configuration is exactly the same except for the mail_replica server. Please note that one server runs Debian 8.7 and the other one 7.11.

---- SERVER A
> # dovecot -n
> # 2.2.13: /etc/dovecot/dovecot.conf
> # OS: Linux 3.2.0-4-amd64 x86_64 Debian 8.7

---- SERVER B
> # dovecot -n
> # 2.2.13: /etc/dovecot/dovecot.conf
> # OS: Linux 2.6.32-34-pve i686 Debian 7.11
> auth_mechanisms = plain login
> disable_plaintext_auth = no
> doveadm_password = ****
> doveadm_port = 12345
> listen = *,[::]
> log_timestamp = "%Y-%m-%d %H:%M:%S "
> mail_max_userip_connections = 100
> mail_plugins = notify replication quota
> mail_privileged_group = vmail
> passdb {
>   args = /etc/dovecot/dovecot-sql.conf
>   driver = sql
> }
> plugin {
>   mail_replica = tcp:*.****.de
>   quota = dict:user::file:/var/vmail/%d/%n/.quotausage
>   replication_full_sync_interval = 1 hours
>   sieve = /var/vmail/%d/%n/.sieve
>   sieve_max_redirects = 25
> }
> protocols = imap
> replication_max_conns = 2
> service aggregator {
>   fifo_listener replication-notify-fifo {
>     mode = 0666
>     user = vmail
>   }
>   unix_listener replication-notify {
>     mode = 0666
>     user = vmail
>   }
> }
> service auth {
>   unix_listener /var/spool/postfix/private/auth {
>     group = postfix
>     mode = 0660
>     user = postfix
>   }
>   unix_listener auth-userdb {
>     group = vmail
>     mode = 0600
>     user = vmail
>   }
>   user = root
> }
> service config {
>   unix_listener config {
>     user = vmail
>   }
> }
> service doveadm {
>   inet_listener {
>     port = 12345
>   }
>   user = vmail
> }
> service imap-login {
>   client_limit = 1000
>   process_limit = 512
> }
> service lmtp {
>   unix_listener /var/spool/postfix/private/dovecot-lmtp {
>     group = postfix
>     mode = 0600
>     user = postfix
>   }
> }
> service replicator {
>   process_min_avail = 1
>   unix_listener replicator-doveadm {
>     mode = 0666
>   }
> }
> ssl_cert = </etc/postfix/smtpd.cert
> ssl_key = </etc/postfix/smtpd.key
> ssl_protocols = !SSLv2 !SSLv3
> userdb {
>   driver = prefetch
> }
> userdb {
>   args = /etc/dovecot/dovecot-sql.conf
>   driver = sql
> }
> protocol imap {
>   mail_plugins = notify replication quota imap_quota
> }
> protocol pop3 {
>   mail_plugins = quota
>   pop3_uidl_format = %08Xu%08Xv
> }
> protocol lda {
>   mail_plugins = notify replication quota sieve
>   postmaster_address = webmaster at localhost
> }
> protocol lmtp {
>   mail_plugins = notify replication quota sieve
>   postmaster_address = webmaster at localhost
> }

This is the exception which I got several times:

> Feb 26 16:16:39 mx dovecot: replicator: Panic: data stack: Out of memory when allocating 268435496 bytes
> Feb 26 16:16:39 mx dovecot: replicator: Error: Raw backtrace:
>   /usr/lib/dovecot/libdovecot.so.0(+0x6b6fe) [0x7f7ca2b0a6fe] ->
>   /usr/lib/dovecot/libdovecot.so.0(+0x6b7ec) [0x7f7ca2b0a7ec] ->
>   /usr/lib/dovecot/libdovecot.so.0(i_fatal+0) [0x7f7ca2ac18fb] ->
>   /usr/lib/dovecot/libdovecot.so.0(+0x6977e) [0x7f7ca2b0877e] ->
>   /usr/lib/dovecot/libdovecot.so.0(+0x699db) [0x7f7ca2b089db] ->
>   /usr/lib/dovecot/libdovecot.so.0(+0x82198) [0x7f7ca2b21198] ->
>   /usr/lib/dovecot/libdovecot.so.0(+0x6776d) [0x7f7ca2b0676d] ->
>   /usr/lib/dovecot/libdovecot.so.0(buffer_write+0x6c) [0x7f7ca2b069dc] ->
>   dovecot/replicator(replicator_queue_push+0x14e) [0x7f7ca2fa17ae] ->
>   dovecot/replicator(+0x4f9e) [0x7f7ca2fa0f9e] ->
>   dovecot/replicator(+0x4618) [0x7f7ca2fa0618] ->
>   dovecot/replicator(+0x4805) [0x7f7ca2fa0805] ->
>   /usr/lib/dovecot/libdovecot.so.0(io_loop_call_io+0x3f) [0x7f7ca2b1bd0f] ->
>   /usr/lib/dovecot/libdovecot.so.0(io_loop_handler_run_internal+0xf9) [0x7f7ca2b1cd09] ->
>   /usr/lib/dovecot/libdovecot.so.0(io_loop_handler_run+0x9) [0x7f7ca2b1bd79] ->
>   /usr/lib/dovecot/libdovecot.so.0(io_loop_run+0x38) [0x7f7ca2b1bdf8] ->
>   /usr/lib/dovecot/libdovecot.so.0(master_service_run+0x13) [0x7f7ca2ac6dc3] ->
>   dovecot/replicator(main+0x195) [0x7f7ca2f9f8b5] ->
>   /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f7ca2715b45] ->
>   dovecot/replicator(+0x395d) [0x7f7ca2f9f95d]
> Feb 26 16:16:39 mx dovecot: imap(***.com): Warning: replication(***.com): Sync failure:
> Feb 26 16:16:39 mx dovecot: replicator: Fatal: master: service(replicator): child 24012 killed with signal 6 (core dumps disabled)

This is the current Maildir listing on Server A:

# ls -la /var/vmail/*.eu/*h/Maildir/new/
> total 24
> drwx------  2 vmail vmail 4096 Feb 27 18:12 .
> drwx------ 15 vmail vmail 4096 Feb 27 21:47 ..
> -rw-------  1 vmail vmail 3600 Feb 27 14:49 1488206976.M277562P25620.mail,S=3600,W=3671
> -rw-------  1 vmail vmail 4390 Feb 27 15:17 1488208642.M513542P27111.mail,S=4390,W=4478:2,S
> -rw-------  1 vmail vmail 3577 Feb 27 16:32 1488213157.M307300P30773.mail,S=3577,W=3648:2,S

This is the current Maildir listing on Server B:

# ls -la /var/vmail/*.eu/*h/Maildir/new/
> total 16
> drwx------  2 vmail vmail 12288 Feb 27 16:45 .
> drwx------ 15 vmail vmail  4096 Feb 27 21:47 ..

This is how I tried to sync it manually:

doveadm -v sync -u *h@*.eu -f tcp:mx.***.de:12345

This is the user's sync status:

# doveadm replicator status 'cheecoh at ragequit.eu'
> username  priority  fast sync  full sync  failed
> *h@*.eu   none      00:24:47   10:57:04   -

Then I tried to look up the mail ID, which is also the same on both servers:

# grep -ri "M277562P25620" /var/vmail/*.eu/*h/
> /var/vmail/*.eu/*h/Maildir/dovecot-uidlist:493 :1488206976.M277562P25620.mail,S=3600,W=3671

I have no idea what else I could do. I could also provide the "doveadm -Dv sync" output, but it is really huge.

Best Regards
Christoph Kluge
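The status output above reports a full sync time of 10:57:04 while replication_full_sync_interval is set to 1 hours. Assuming the "fast sync" and "full sync" columns show the time elapsed since the last sync of each kind, and that every data row has five whitespace-separated fields in HH:MM:SS form (both are assumptions, not confirmed in this thread), a minimal Python sketch for flagging users whose full sync looks overdue could be the following. The function names are hypothetical helpers, not part of Dovecot.

```python
from datetime import timedelta


def parse_elapsed(text):
    """Convert an 'HH:MM:SS' column value into a timedelta; '-' means no value."""
    if text == "-":
        return None
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def overdue_full_syncs(status_output, interval):
    """Return (username, elapsed) pairs whose full sync is older than interval."""
    overdue = []
    for line in status_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue  # not a data row in the assumed five-column layout
        username, _priority, _fast, full = fields[:4]
        elapsed = parse_elapsed(full)
        if elapsed is not None and elapsed > interval:
            overdue.append((username, elapsed))
    return overdue


# Status text as pasted in the message above (user name masked as in the post).
SAMPLE = """\
username priority fast sync full sync failed
*h@*.eu  none     00:24:47  10:57:04  -
"""

if __name__ == "__main__":
    # Interval taken from replication_full_sync_interval = 1 hours.
    for user, elapsed in overdue_full_syncs(SAMPLE, timedelta(hours=1)):
        print(f"{user}: last full sync {elapsed} ago")
```

If the column meaning assumed here is right, a value far above the configured interval would suggest that the replicator is not getting around to full syncs, for instance because the process keeps being killed as in the log above.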
Christoph Kluge
2017-Mar-02 09:10 UTC
Some mails do not get replicated anymore after memory-exhaust
The number of non-replicated mails on the mirror keeps growing, without any exceptions in the log.

Is there a way to enforce a full replication, including directory scans, through the doveadm utility?

Besides that, are there any arguments against a non-destructive rsync? Could it break anything, e.g. flags or duplicates?

Best
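On the rsync question, one concern is that a plain file-name comparison treats the same message as two different files as soon as its flag suffix (the part after ":2,") differs between the servers. The following Python sketch illustrates that by comparing Maildir names on their base part only. It is an illustration under assumptions: the helper names are hypothetical, the file names are the ones from the Server A listing in the first message, and it does not cover messages that the replica stores under an entirely different file name.

```python
def maildir_base(filename):
    """Strip the ':2,<flags>' suffix and the ',S=...,W=...' size fields."""
    base = filename.split(":", 1)[0]
    return base.split(",", 1)[0]


def names_missing_on_replica(primary_files, replica_files):
    """List primary file names whose base name has no match on the replica."""
    replica_bases = {maildir_base(name) for name in replica_files}
    return [name for name in primary_files
            if maildir_base(name) not in replica_bases]


# File names from the new/ listing on Server A in the first message.
SERVER_A_NEW = [
    "1488206976.M277562P25620.mail,S=3600,W=3671",
    "1488208642.M513542P27111.mail,S=4390,W=4478:2,S",
    "1488213157.M307300P30773.mail,S=3577,W=3648:2,S",
]
# new/ on Server B was empty in the listing.
SERVER_B_NEW = []

if __name__ == "__main__":
    for name in names_missing_on_replica(SERVER_A_NEW, SERVER_B_NEW):
        print("missing on replica:", name)
```

A tool that copies by full file name would not apply this normalisation, so a message whose flags changed on one side could end up on the other side twice.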