Paul Kudla (SCOM.CA Internet Services Inc.)
2022-Jun-04 12:39 UTC
Replicator: Panic: data stack: Out of memory
Actually, the suggestion below is a good idea.

Run ps -axww (or top) to list the active processes; this will give you some hints. top is better for overall memory.

I also have a Perl script that shows actual memory usage, free memory, etc. Utilities like this are handy to have.

Also, I found I had to set this in dovecot.conf:

default_process_limit = 16384

Also, are you running with debug on?

auth_debug = no
auth_debug_passwords = no
mail_debug = no

i.e. set debug to yes? That might give more detail if this really is a Dovecot issue.

Other background processes can eat memory. I run MailScanner, for example, and every once in a while someone tries to crash it! It recovers, but lord knows.

mem outputs:

# mem

SYSTEM MEMORY SUMMARY:
mem_used:           16GB [ 12%] Logically used memory
mem_avail:   +     111GB [ 87%] Logically available memory
-------------- ------------ ----------- ------
mem_total:   =     128GB [100%] Logically total memory

SYSTEM MEMORY INFORMATION:
mem_wire:           13GB [ 10%] Wired: disabled for paging out
mem_active:  +       0GB [  0%] Active: recently referenced
mem_inactive:+      71GB [ 57%] Inactive: recently not referenced
mem_cache:   +       0GB [  0%] Cached: almost avail. for allocation
mem_free:    +      40GB [ 32%] Free: fully available for allocation
mem_gap_vm:  +       0GB [  0%] Memory gap: UNKNOWN
-------------- ------------ ----------- ------
mem_all:     =     124GB [100%] Total real memory managed
mem_gap_sys: +       3GB        Memory gap: Kernel?!
-------------- ------------ -----------
mem_phys:    =     127GB        Total real memory available
mem_gap_hw:  +       0GB        Memory gap: Segment Mappings?!
-------------- ------------ -----------
mem_hw:      =     128GB        Total real memory installed

-----------------------------------------------------------------------

# cat /programs/common/mem
#!/usr/local/bin/perl
##
##  freebsd-memory -- List Total System Memory Usage
##  Copyright (c) 2003-2004 Ralf S. Engelschall <rse at engelschall.com>
##
##  Redistribution and use in source and binary forms, with or without
##  modification, are permitted provided that the following conditions
##  are met:
##  1. Redistributions of source code must retain the above copyright
##     notice, this list of conditions and the following disclaimer.
##  2. Redistributions in binary form must reproduce the above copyright
##     notice, this list of conditions and the following disclaimer in the
##     documentation and/or other materials provided with the distribution.
##
##  THIS SOFTWARE IS PROVIDED BY AUTHOR AND CONTRIBUTORS ``AS IS'' AND
##  ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
##  IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
##  ARE DISCLAIMED. IN NO EVENT SHALL AUTHOR OR CONTRIBUTORS BE LIABLE
##  FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
##  DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
##  OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
##  HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
##  LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
##  OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
##  SUCH DAMAGE.
##

#   query the system through the generic sysctl(8) interface
#   (this does not require special privileges)
my $sysctl = {};
my $sysctl_output = `/sbin/sysctl -a`;
foreach my $line (split(/\n/, $sysctl_output)) {
    if ($line =~ m/^([^:]+):\s+(.+)\s*$/s) {
        $sysctl->{$1} = $2;
    }
}

#   round the physical memory size to the next power of two which is
#   reasonable for memory cards. We do this by first determining the
#   guessed memory card size under the assumption that usual computer
#   hardware has an average of a maximally eight memory cards installed
#   and those are usually of equal size.
sub mem_rounded {
    my ($mem_size) = @_;
    my $chip_size  = 1;
    my $chip_guess = ($mem_size / 8) - 1;
    while ($chip_guess != 0) {
        $chip_guess >>= 1;
        $chip_size  <<= 1;
    }
    my $mem_round = (int($mem_size / $chip_size) + 1) * $chip_size;
    return $mem_round;
}

#   determine the individual known information
#   NOTICE: forget hw.usermem, it is just (hw.physmem - vm.stats.vm.v_wire_count).
#   NOTICE: forget vm.stats.misc.zero_page_count, it is just the subset of
#           vm.stats.vm.v_free_count which is already pre-zeroed.
my $mem_hw       = &mem_rounded($sysctl->{"hw.physmem"});
my $mem_phys     = $sysctl->{"hw.physmem"};
my $mem_all      = $sysctl->{"vm.stats.vm.v_page_count"}     * $sysctl->{"hw.pagesize"};
my $mem_wire     = $sysctl->{"vm.stats.vm.v_wire_count"}     * $sysctl->{"hw.pagesize"};
my $mem_active   = $sysctl->{"vm.stats.vm.v_active_count"}   * $sysctl->{"hw.pagesize"};
my $mem_inactive = $sysctl->{"vm.stats.vm.v_inactive_count"} * $sysctl->{"hw.pagesize"};
my $mem_cache    = $sysctl->{"vm.stats.vm.v_cache_count"}    * $sysctl->{"hw.pagesize"};
my $mem_free     = $sysctl->{"vm.stats.vm.v_free_count"}     * $sysctl->{"hw.pagesize"};

#   determine the individual unknown information
my $mem_gap_vm  = $mem_all - ($mem_wire + $mem_active + $mem_inactive + $mem_cache + $mem_free);
my $mem_gap_sys = $mem_phys - $mem_all;
my $mem_gap_hw  = $mem_hw   - $mem_phys;

#   determine logical summary information
my $mem_total = $mem_hw;
my $mem_avail = $mem_inactive + $mem_cache + $mem_free;
my $mem_used  = $mem_total - $mem_avail;

#   information annotations
my $info = {
    "mem_wire"     => 'Wired: disabled for paging out',
    "mem_active"   => 'Active: recently referenced',
    "mem_inactive" => 'Inactive: recently not referenced',
    "mem_cache"    => 'Cached: almost avail. for allocation',
    "mem_free"     => 'Free: fully available for allocation',
    "mem_gap_vm"   => 'Memory gap: UNKNOWN',
    "mem_all"      => 'Total real memory managed',
    "mem_gap_sys"  => 'Memory gap: Kernel?!',
    "mem_phys"     => 'Total real memory available',
    "mem_gap_hw"   => 'Memory gap: Segment Mappings?!',
    "mem_hw"       => 'Total real memory installed',
    "mem_used"     => 'Logically used memory',
    "mem_avail"    => 'Logically available memory',
    "mem_total"    => 'Logically total memory',
};

#   print system results
printf("\n");
printf("SYSTEM MEMORY SUMMARY:\n");
printf("mem_used:      %7dGB [%3d%%] %s\n", $mem_used / (1024*1024*1024), ($mem_used / $mem_total) * 100, $info->{"mem_used"});
printf("mem_avail:   + %7dGB [%3d%%] %s\n", $mem_avail / (1024*1024*1024), ($mem_avail / $mem_total) * 100, $info->{"mem_avail"});
printf("-------------- ------------ ----------- ------\n");
printf("mem_total:   = %7dGB [100%%] %s\n", $mem_total / (1024*1024*1024), $info->{"mem_total"});
printf("\n");
printf("SYSTEM MEMORY INFORMATION:\n");
printf("mem_wire:      %7dGB [%3d%%] %s\n", $mem_wire / (1024*1024*1024), ($mem_wire / $mem_all) * 100, $info->{"mem_wire"});
printf("mem_active:  + %7dGB [%3d%%] %s\n", $mem_active / (1024*1024*1024), ($mem_active / $mem_all) * 100, $info->{"mem_active"});
printf("mem_inactive:+ %7dGB [%3d%%] %s\n", $mem_inactive / (1024*1024*1024), ($mem_inactive / $mem_all) * 100, $info->{"mem_inactive"});
printf("mem_cache:   + %7dGB [%3d%%] %s\n", $mem_cache / (1024*1024*1024), ($mem_cache / $mem_all) * 100, $info->{"mem_cache"});
printf("mem_free:    + %7dGB [%3d%%] %s\n", $mem_free / (1024*1024*1024), ($mem_free / $mem_all) * 100, $info->{"mem_free"});
printf("mem_gap_vm:  + %7dGB [%3d%%] %s\n", $mem_gap_vm / (1024*1024*1024), ($mem_gap_vm / $mem_all) * 100, $info->{"mem_gap_vm"});
printf("-------------- ------------ ----------- ------\n");
printf("mem_all:     = %7dGB [100%%] %s\n", $mem_all / (1024*1024*1024), $info->{"mem_all"});
printf("mem_gap_sys: + %7dGB        %s\n", $mem_gap_sys / (1024*1024*1024), $info->{"mem_gap_sys"});
printf("-------------- ------------ -----------\n");
printf("mem_phys:    = %7dGB        %s\n", $mem_phys / (1024*1024*1024), $info->{"mem_phys"});
printf("mem_gap_hw:  + %7dGB        %s\n", $mem_gap_hw / (1024*1024*1024), $info->{"mem_gap_hw"});
printf("-------------- ------------ -----------\n");
printf("mem_hw:      = %7dGB        %s\n", $mem_hw / (1024*1024*1024), $info->{"mem_hw"});

#   print logical results

------------------------------------------------------------------

top will display something like this:

last pid: 85373;  load averages:  0.71,  0.48,  0.38    up 72+04:51:41  08:22:11
207 processes: 1 running, 206 sleeping
CPU:  1.5% user,  0.0% nice,  0.4% system,  0.0% interrupt, 98.0% idle
Mem: 336M Active, 71G Inact, 139M Laundry, 13G Wired, 770M Buf, 40G Free
ARC: 4319M Total, 1346M MFU, 501M MRU, 2368K Anon, 55M Header, 2414M Other
     383M Compressed, 1469M Uncompressed, 3.83:1 Ratio
Swap: 16G Total, 16G Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE    C    TIME    WCPU COMMAND
 1986 pgsql       1  26    0   195M    46M select  14  426:27  11.47% postgres
83810 pgsql       1  27    0   200M   171M select  13    3:15   9.19% postgres
 1882 root      128  20    0    11M  3732K rpcsvc  15   29.7H   2.26% nfsd
 1987 pgsql       1  20    0   195M    47M select   5   33:21   1.84% postgres
 1985 root       34  21    0   141M    88M sigwai  14   72:22   1.32% named
 1937 root        1  20    0    27M    15M select  15  491:36   0.90% python3.8
99555 root        1  20    0    28M    18M select  10  634:23   0.88% python3.8
 1939 root        1  20    0    27M    15M select   1  939:47   0.87% python3.8
 1988 pgsql       1  20    0   195M    47M select   7    6:58   0.28% postgres
 1989 pgsql       1  20    0   195M    47M select   8    2:14   0.17% postgres
 1964 pgsql       1  20    0   194M   164M select   9   10:02   0.08% postgres
85373 root        1  20    0    14M  3644K CPU0     0    0:00   0.07% top
 3150 pgsql       1  20    0   195M    42M select   6   39:21   0.06% postgres

ps -axw or ps -axww on FreeBSD:

# ps -axww
  PID TT STAT          TIME COMMAND
    0 -  DLs     3788:48.94 [kernel]
    1 -  ILs        0:05.38 /sbin/init --
    2 -  DL         0:00.00 [crypto]
    3 -  DL         0:00.00 [crypto returns 0]
    4 -  DL         0:00.00 [crypto returns 1]
    5 -  DL         0:00.00 [crypto returns 2]
    6 -  DL         0:00.00 [crypto returns 3]
    7 -  DL         0:00.00 [crypto returns 4]
    8 -  DL         0:00.00 [crypto returns 5]
    9 -  DL         0:00.00 [crypto returns 6]
   10 -  DL         0:00.00 [audit]
   11 -  RNL  1629112:33.34 [idle]
   12 -  WL       180:00.70 [intr]
   13 -  DL       123:57.70 [geom]
   14 -  DL         0:00.00 [crypto returns 7]
   15 -  DL         0:00.00 [crypto returns 8]
   16 -  DL         0:00.00 [crypto returns 9]
   17 -  DL         0:00.00 [crypto returns 10]
   18 -  DL         0:00.00 [crypto returns 11]
   19 -  DL         0:00.00 [crypto returns 12]
   20 -  DL         0:00.00 [crypto returns 13]
   21 -  DL         0:00.00 [crypto returns 14]
   22 -  DL         0:00.00 [crypto returns 15]
   23 -  DL         0:00.00 [sequencer 00]
   24 -  DL         0:00.00 [cam]
   25 -  DL         5:42.32 [usb]
   26 -  DL         0:00.47 [soaiod1]
   27 -  DL         0:00.47 [soaiod2]
   28 -  DL         0:00.46 [soaiod3]
   29 -  DL         0:00.47 [soaiod4]
   30 -  DL      1714:58.15 [zfskern]
   31 -  DL         0:00.00 [sctp_iterator]
   32 -  DL        12:50.77 [pf purge]
   33 -  DL         2:16.82 [rand_harvestq]
   34 -  DL        29:00.62 [pagedaemon]
   35 -  DL         0:00.00 [vmdaemon]
   36 -  DL         5:25.68 [bufdaemon]
   37 -  DL         1:44.98 [vnlru]
   38 -  DL      2040:33.82 [syncer]
 1657 -  Is         0:01.21 /sbin/devd
 1863 -  Ss         0:03.44 /usr/sbin/rpcbind
 1878 -  Is         0:00.08 /usr/sbin/mountd -r -S
 1880 -  Is         0:00.27 nfsd: master (nfsd)
 1882 -  S       1780:23.16 nfsd: server (nfsd)
 1907 -  Ss        10:01.06 /usr/sbin/syslogd -s
 1909 -  Is         0:00.55 /usr/sbin/inetd -wW -C 50 -s 500
 1911 -  Is         0:00.25 /usr/sbin/sshd
 1955 -  Is        24:50.70 /usr/local/sbin/clamd
 1964 -  Ss        10:02.28 postmaster: checkpointer (postgres)
 1965 -  Ss         1:38.52 postmaster: background writer (postgres)
 1966 -  Ss         3:48.60 postmaster: walwriter (postgres)
 1967 -  Ss         2:03.84 postmaster: autovacuum launcher (postgres)
 1968 -  Ss        12:41.60 postmaster: stats collector (postgres)
 1969 -  Is         0:01.82 postmaster: logical replication launcher (postgres)
 1974 -  Ss        37:19.26 postmaster: walsender pgsql 10.221.0.16(30421) (postgres)
 1976 -  Ss        39:37.29 postmaster: walsender pgsql 10.221.0.10(64872) (postgres)
 1985 -  Is        72:21.96 /usr/local/sbin/named -d 0 -4
 1986 -  Ss       426:29.15 postmaster: pgsql scom_billing 10.221.0.18(52852) (postgres)
 1987 -  Ss        33:21.50 postmaster: pgsql scom_billing 10.221.0.18(60830) (postgres)
 1988 -  Ss         6:57.70 postmaster: pgsql scom_billing 10.221.0.18(34255) (postgres)
 1989 -  Ss         2:13.52 postmaster: pgsql scom_billing 10.221.0.18(17265) (postgres)
 2073 -  Ss        10:12.46 /usr/local/libexec/postfix/master -w
 2076 -  I          0:07.82 qmgr -l -t fifo -u
 2166 -  Is         1:53.61 /usr/local/libexec/postfix/master -w
 2168 -  I          0:55.23 qmgr -l -t fifo -u
 2238 -  Is         1:49.77 /usr/local/libexec/postfix/master -w
 2240 -  I          1:01.17 qmgr -l -t fifo -u
 2253 -  I          0:39.34 tlsmgr -l -t unix -u
 2397 -  Is         0:05.58 MailScanner: starting child (perl)
 2513 -  Is         0:20.43 /usr/sbin/cron -s
 3150 -  Rs        39:21.01 postmaster: walsender pgsql 10.221.0.6(10000) (postgres)
 3175 -  Is         0:00.35 postmaster: pgsql scom_billing 10.221.0.6(10017) (postgres)
 3176 -  Is         0:10.80 postmaster: pgsql scom_billing 10.221.0.6(10018) (postgres)
 3177 -  Ss         1:10.22 postmaster: pgsql scom_billing 10.221.0.6(10019) (postgres)

Happy Saturday !!!
Thanks - paul

Paul Kudla

Scom.ca Internet Services <http://www.scom.ca>
004-1009 Byron Street South
Whitby, Ontario - Canada
L1N 4S3

Toronto 416.642.7266
Main 1.866.411.7266
Fax 1.888.892.7266
Email paul at scom.ca

On 6/4/2022 5:15 AM, dovecot-bounces at dovecot.org wrote:
>
> On 2022-06-04 02:46, Ivan Juri?i? wrote:
>>> Ok a little more help :
>>> vsz_limit = 0 --> means unlimited ram for allocation, change
>>> this/try 2g etc pending available ram.
>>
>> I tried 524M, 1G, 2G, 4G and 8G, but in every case the replicator
>> process crashed.
>
> Maybe there is another service process causing the OOM? E.g. check clamd;
> antivirus DBs tend to be quite big, and while updating they can for some
> time be double the size due to reloading.
>
> Also, sometimes the httpd service, when using the event worker and not
> tuned properly, will cause an OOM crash in other services along with itself.
>
> Good luck.
>
> Zakaria.
>
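For anyone who does not read Perl, here is a rough Python sketch of the arithmetic in the mem script above: the rounding of hw.physmem to an "installed" size, and the "logically used / available" split. The function names and the GIB constant are made up for this illustration, and the max(..., 0) guard is an addition that the Perl original does not have. Treat it as a reading aid, not as a replacement for the script.

```python
GIB = 1024 ** 3


def mem_rounded(mem_size: int) -> int:
    """Round a byte count up to the next multiple of a guessed module size.

    Mirrors the Perl mem_rounded(): assume eight equal modules, take the
    power of two just above mem_size / 8 - 1 as the module size, then go
    to the next multiple of that size.
    """
    chip_size = 1
    # max(..., 0) is a guard added in this sketch; the Perl original has none.
    chip_guess = max(mem_size // 8 - 1, 0)
    while chip_guess != 0:
        chip_guess >>= 1
        chip_size <<= 1
    return (mem_size // chip_size + 1) * chip_size


def logical_summary(mem_hw: int, inactive: int, cache: int, free: int):
    """Return (used, avail) the way the script defines them."""
    avail = inactive + cache + free
    used = mem_hw - avail
    return used, avail


# 127 GiB is an illustrative stand-in for hw.physmem, not a measured value.
print(mem_rounded(127 * GIB) // GIB)  # 128
```

Note that, as in the Perl, a value that is already an exact multiple of the guessed module size is still bumped up by one module, so the result is only meaningful when hw.physmem sits a little below the installed size.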
> On 04.06.2022 15:34, Paul Kudla (SCOM.CA Internet Services Inc.) wrote:
> ok thanks for the info
> from here you need to turn on full debugging and then filter the log
> by "replicat"

Replication now works after I set vsz_limit in service aggregator and removed the replication_dsync_parameters and replication_full_sync_interval parameters from my 90-replicator.conf.

With this configuration, replication to the other mail server now works.

Config file for replication: /etc/dovecot/conf.d/90-replicator.conf
------------------------------------------------------------------
service aggregator {
  vsz_limit = 256M
  fifo_listener replication-notify-fifo {
    user = vmail
  }
  unix_listener replication-notify {
    user = vmail
  }
}

service replicator {
  process_min_avail = 1
  unix_listener replicator-doveadm {
    mode = 0600
    user = vmail
  }
}

service doveadm {
  inet_listener {
    port = 12345
    ssl = no
  }
}

replication_max_conns = 100
#replication_dsync_parameters = -d -N -l 30 -U
#replication_full_sync_interval = 1 days

doveadm_port = 12345
doveadm_password = Jados82!

plugin {
  mail_replica = tcp:imap.myserv2.local:12345
}
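
The advice quoted at the top of this message (turn on full debugging, then filter the log by "replicat") can be done with grep, or with a few lines of Python as sketched here. The function name filter_log is made up for this example, and the log file path is whatever your own setup uses; nothing here is part of Dovecot itself.

```python
import sys


def filter_log(lines, needle="replicat"):
    """Return the lines containing needle, compared case-insensitively."""
    needle = needle.lower()
    return [line for line in lines if needle in line.lower()]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 filter_log.py <path to your mail log>
    with open(sys.argv[1], errors="replace") as fh:
        sys.stdout.writelines(filter_log(fh))
```

Matching on the stem "replicat" catches both "replicator" and "replication" lines, which is presumably why the shorter string was suggested.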