Steven Varco
2021-Jul-05 08:00 UTC
dsync replication fails with No space left on device / Out of memory
> Aki Tuomi aki.tuomi at open-xchange.com
> Fri Jul 2 09:14:47 EEST 2021
>
> The disk issue is likely that disk space on mail_temp_dir runs out, which is usually /tmp.

Hi Aki

Many thanks for that hint, it actually led me to the root cause of the problem! :)

During the process the /tmp filesystem fills up and then empties again so fast that I could not even see it filling up when actively monitoring it with the watch command. It took like a microsecond, during which I could only see that /tmp usage increased somehow and immediately decreased again. That's why I did not notice this in the first place.

I then increased the filesystem size and all the problems suddenly vanished. Not just the "No space left on device" error; surprisingly, the "Out of memory" error log message is gone now as well, so they were somehow connected to each other.

cheers,
Steven

--
https://steven.varco.ch/
https://www.tech-island.com/

> On 02.07.2021 at 07:43, Jörg Faudin Schulz <js at faudin.de> wrote:
>
> Hi,
>
> the memory issue has already been reported, not resolved yet:
>
> https://www.mail-archive.com/dovecot at dovecot.org/msg83763.html
>
> the disk-free issue is something different. Increasing memory parameters doesn't help; the sync only crashes later.
>
> Here, everything seems to be synced fine nevertheless.
>
> On 02.07.21 at 02:56, Harlan Stenn wrote:
>> Inodes?
>> df -i
>>
>> On 7/1/2021 5:07 PM, Steven Varco wrote:
>>> Hi All
>>>
>>> Since I configured dsync replication I get strange errors in the maillog on my two Dovecot mail nodes:
>>>
>>> PRIMARY:
>>> Jul 2 01:21:42 mx01.example.com dovecot: doveadm: Error: read(mx02.example.com) failed: read(size=3148) failed: Connection reset by peer (last sent=mail, last recv=mail (EOL))
>>>
>>> The secondary is more interesting:
>>>
>>> SECONDARY:
>>> Jul 2 01:21:42 mx02 dovecot: doveadm: Error: close(-1[istream-seekable.c:237]) failed: No space left on device
>>> Jul 2 01:21:43 mx02 dovecot: doveadm: Fatal: pool_system_realloc(268435456): Out of memory
>>> Jul 2 01:21:43 mx02 dovecot: doveadm: Error: Raw backtrace: /usr/lib64/dovecot/libdovecot.so.0(+0xa192e) [0x7f2e9be4c92e] -> /usr/lib64/dovecot/libdovecot.so.0(+0xa1a0e) [0x7f2e9be4ca0e] -> /usr/lib64/dovecot/libdovecot.so.0(i_error+0) [0x7f2e9bddc3d3] -> /usr/lib64/dovecot/libdo
>>> Jul 2 01:21:43 mx02 dovecot: doveadm: Fatal: master: service(doveadm): child 2876 returned error 83 (Out of memory (service doveadm { vsz_limit=256 MB }, you may need to increase it) - set CORE_OUTOFMEM=1 environment to get core dump)
>>> Jul 2 01:21:51 mx02 dovecot: dsync-local(user at example.com): Error: Raw backtrace: /usr/lib64/dovecot/libdovecot.so.0(+0xa192e) [0x7fd56e17e92e] -> /usr/lib64/dovecot/libdovecot.so.0(+0xa1a0e) [0x7fd56e17ea0e] -> /usr/lib64/dovecot/libdovecot.so.0(i_error+0) [0x7fd56e10e3d3] -> /us
>>> Jul 2 01:21:51 mx02 dovecot: dsync-local(user at example.com): Fatal: master: service(doveadm): child 2882 returned error 83 (Out of memory (service doveadm { vsz_limit=256 MB }, you may need to increase it) - set CORE_OUTOFMEM=1 environment to get core dump)
>>>
>>> The error messages state that disk space and/or memory is a problem, but enough disk space and memory are available:
>>>
>>> mx02 [~] # df -h /srv/mail/
>>> Filesystem               Size  Used Avail Use% Mounted on
>>> /dev/mapper/system-mail   10G  5.7G  4.3G  58% /srv/mail
>>>
>>> mx02 [~] # free -m
>>>               total        used        free      shared  buff/cache   available
>>> Mem:           3789        1602        1088         199        1097        1759
>>> Swap:           471          93         378
>>>
>>> I also tried to increase vsz_limit from 256 MB to 512 MB, which did not help.
>>>
>>> And for the sake of completeness, the connection to the doveadm port also works well from both nodes:
>>>
>>> mx01-prod [~] # telnet mx02 14310
>>> Trying 172.20.19.225...
>>> Connected to mx02.
>>> Escape character is '^]'.
>>> ^]
>>>
>>> mx02 [~] # telnet mx01 14310
>>> Trying 172.20.19.251...
>>> Connected to mx01.
>>> Escape character is '^]'.
>>> ^]
>>>
>>> Although mail replication seems to be working properly and mails are in sync on both nodes (as far as I could see), I would like to find the cause of these messages, as this definitely doesn't look normal…
>>>
>>> I'm grateful for any help, since I'm quite struggling now…
>>>
>>> Steven
>>>
>>> Here's my config
>>> --------------------------------------------------------------------------------
>>> # doveconf -n
>>> # 2.2.36 (1f10bfa63): /etc/dovecot/dovecot.conf
>>> # Pigeonhole version 0.4.24 (124e06aa)
>>> # OS: Linux 3.10.0-1160.31.1.el7.x86_64 x86_64 CentOS Linux release 7.9.2009 (Core)
>>> # Hostname: mx01.example.com
>>> auth_mechanisms = plain login
>>> auth_verbose = yes
>>> dict {
>>>   sqlquota = mysql:/etc/dovecot/dict-sqlquota.conf.ext
>>> }
>>> doveadm_password = # hidden, use -P to show it
>>> doveadm_port = 14310
>>> first_valid_uid = 1000
>>> mail_plugins = quota notify replication
>>> managesieve_notify_capability = mailto
>>> managesieve_sieve_capability = fileinto reject envelope encoded-character vacation subaddress comparator-i;ascii-numeric relational regex imap4flags copy include variables body enotify environment mailbox date index ihave duplicate mime foreverypart extracttext
>>> mbox_write_locks = fcntl
>>> namespace inbox {
>>>   inbox = yes
>>>   location =
>>>   mailbox Drafts {
>>>     special_use = \Drafts
>>>   }
>>>   mailbox Junk {
>>>     special_use = \Junk
>>>   }
>>>   mailbox Sent {
>>>     special_use = \Sent
>>>   }
>>>   mailbox "Sent Messages" {
>>>     special_use = \Sent
>>>   }
>>>   mailbox Trash {
>>>     special_use = \Trash
>>>   }
>>>   prefix =
>>>   separator = /
>>>   type = private
>>> }
>>> passdb {
>>>   args = /etc/dovecot/dovecot-sql.conf.ext
>>>   driver = sql
>>> }
>>> plugin {
>>>   mail_replica = tcp:mx02.example.com
>>>   quota = maildir:User quota
>>>   quota_exceeded_message = Quota exceeded, please go to http://www.example.com/over_quota_help for instructions on how to fix this.
>>>   quota_rule2 = INBOX.Trash:storage=+100M
>>>   quota_status_nouser = DUNNO
>>>   quota_status_overquota = 552 5.2.2 Mailbox is full / Mailbox ist voll
>>>   quota_status_success = DUNNO
>>>   quota_warning = storage=90%% quota-warning 90 %u
>>>   quota_warning2 = -storage=90%% quota-warning below %u
>>>   sieve = file:~/sieve;active=~/.dovecot.sieve
>>> }
>>> postmaster_address = postmaster at example.com
>>> protocols = imap pop3 lmtp sieve
>>> replication_dsync_parameters = -d -l 30 -U
>>> service aggregator {
>>>   fifo_listener replication-notify-fifo {
>>>     user = vmail
>>>   }
>>>   unix_listener replication-notify {
>>>     user = vmail
>>>   }
>>> }
>>> service auth {
>>>   unix_listener /var/spool/postfix/private/auth {
>>>     group = postfix
>>>     mode = 0660
>>>     user = postfix
>>>   }
>>>   unix_listener auth-userdb {
>>>     user = vmail
>>>   }
>>> }
>>> service dict {
>>>   unix_listener dict {
>>>     user = vmail
>>>   }
>>> }
>>> service doveadm {
>>>   inet_listener {
>>>     port = 14310
>>>     ssl = no
>>>   }
>>> }
>>> service managesieve-login {
>>>   inet_listener sieve {
>>>     port = 4190
>>>   }
>>> }
>>> service quota-status {
>>>   client_limit = 1
>>>   executable = quota-status -p postfix
>>>   inet_listener {
>>>     port = 14340
>>>   }
>>> }
>>> service quota-warning {
>>>   executable = script /usr/local/libexec/dovecot/quota-warning.sh
>>>   unix_listener quota-warning {
>>>     user = vmail
>>>   }
>>>   user = vmail
>>> }
>>> service replicator {
>>>   process_min_avail = 1
>>>   unix_listener replicator-doveadm {
>>>     mode = 0600
>>>     user = vmail
>>>   }
>>> }
>>> ssl = required
>>> ssl_cert = </etc/ssl/acme/certs/mail.example.com.chain.crt
>>> ssl_key = # hidden, use -P to show it
>>> userdb {
>>>   args = /etc/dovecot/dovecot-sql.conf.ext
>>>   driver = sql
>>> }
>>> verbose_proctitle = yes
>>> protocol lmtp {
>>>   mail_plugins = quota notify replication sieve
>>> }
>>> protocol lda {
>>>   mail_plugins = quota notify replication sieve
>>> }
>>> protocol imap {
>>>   mail_max_userip_connections = 20
>>>   mail_plugins = quota notify replication imap_quota
>>> }
>>> --------------------------------------------------------------------------------
>>>
>>> mx02.example.com has exactly the same config, except for:
>>> --------------------------------------------------------------------------------
>>> plugin {
>>>   mail_replica = tcp:mx01.example.com
>>> --------------------------------------------------------------------------------
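
A note on the monitoring problem described in this message: the watch command refreshes every 2 seconds by default, so a /tmp that fills and empties within a fraction of a second is easy to miss. The following is only a rough sketch of one way to catch such a spike, not something taken from the thread. It polls the filesystem statistics in a tight loop and remembers the highest usage seen. The function name `sample_peak_usage`, the 1 ms interval, and the assumption that mail_temp_dir lives on /tmp are all illustrative choices, and a very short spike can still fall between two samples.

```python
import os
import time


def sample_peak_usage(path="/tmp", duration=10.0, interval=0.001):
    """Poll the filesystem that holds `path` and report its peak usage.

    Hypothetical helper for illustration: `path` should be whatever
    mail_temp_dir points to (assumed to be /tmp here).
    Returns sizes in bytes plus the number of samples taken.
    """
    peak_used = 0
    total = 0
    samples = 0
    deadline = time.monotonic() + duration
    while True:
        st = os.statvfs(path)
        # f_frsize is the fragment size that f_blocks/f_bfree are counted in.
        total = st.f_blocks * st.f_frsize
        used = (st.f_blocks - st.f_bfree) * st.f_frsize
        peak_used = max(peak_used, used)
        samples += 1
        if time.monotonic() >= deadline:
            break
        time.sleep(interval)
    return {"total": total, "peak_used": peak_used, "samples": samples}
```

Calling `sample_peak_usage("/tmp", duration=60.0)` on the secondary node while a sync runs would return the filesystem size and the highest usage observed during that minute. If `peak_used` comes close to `total`, that would be consistent with the temp directory running out of space.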
@lbutlr
2021-Jul-07 08:34 UTC
dsync replication fails with No space left on device / Out of memory
On 2021 Jul 05, at 02:00, Steven Varco <dovecot.org at bbs.varco.ch> wrote:
> I then increased the filesystem size and all the problems suddenly vanished.

How large was your tmp before and after the change, out of curiosity?

--
-=> <http://xkcd.com/241/> <http://xkcd.com/304/> <http://xkcd.com/635/> <=-