Hi!

Long story short: 11.2-STABLE/amd64 r335757 has leaked over 4600MB of kernel wired memory
over 81 days of uptime, out of 8GB total RAM. Details follow.

I have a workstation running Xorg, Firefox, Thunderbird, LibreOffice and occasionally
VirtualBox for a single VM.

It has two identical 320GB HDDs combined into a single graid-based array with "Intel"
on-disk format, having 3 volumes:
- one "RAID1" volume /dev/raid/r0 occupies the first 10GB of each HDD;
- two "SINGLE" volumes /dev/raid/r1 and /dev/raid/r2 utilize the "tails" of the HDDs (310GB each).

/dev/raid/r0 (10GB) has MBR partitioning and two slices:
- /dev/raid/r0s1 (8GB) is used for swap;
- /dev/raid/r0s2 (2GB) is used by a non-redundant ZFS pool named "os" that contains only
  the root file system (177M used) and the /usr file system (340M used).

There is also a second pool (ZMIRROR) named "z" built directly on top of the /dev/raid/r[12]
volumes; this pool contains all other file systems, including /var, /home, /usr/ports,
/usr/local, /usr/{src|obj} etc.

# zpool list
NAME   SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
os    1,98G   520M  1,48G        -         -    55%    25%  1.00x  ONLINE  -
z      288G  79,5G   209G        -         -    34%    27%  1.00x  ONLINE  -

This way I have swap outside of ZFS, boot blocks and partitioning mirrored by means of
GEOM_RAID, and I can use the local console to break into single-user mode, unmount all
file systems other than root and /usr, and even export the bigger ZFS pool "z". I did
exactly that and saw ARC usage (limited with vfs.zfs.arc_max="3G" in /boot/loader.conf)
drop from over 2500MB down to 44MB, but "Wired" stayed high. Now, after importing "z"
back and booting to multi-user mode, top(1) shows:

last pid: 51242;  load averages:  0.24, 0.16, 0.13    up 81+02:38:38  22:59:18
104 processes: 1 running, 103 sleeping
CPU:  0.0% user,  0.0% nice,  0.4% system,  0.2% interrupt, 99.4% idle
Mem: 84M Active, 550M Inact, 4K Laundry, 4689M Wired, 2595M Free
ARC: 273M Total, 86M MFU, 172M MRU, 64K Anon, 1817K Header, 12M Other
     117M Compressed, 333M Uncompressed, 2.83:1 Ratio
Swap: 8192M Total, 940K Used, 8191M Free

I also have KDB and DDB in my custom kernel. How do I debug the leak further?

I use the nvidia-driver-340-340.107 driver for a GK208 [GeForce GT 710B] video card.

Here are the outputs of "vmstat -m": http://www.grosbein.net/freebsd/leak/vmstat-m.txt
and "vmstat -z": http://www.grosbein.net/freebsd/leak/vmstat-z.txt
as well as "sysctl hw": http://www.grosbein.net/freebsd/leak/sysctl-hw.txt
and "sysctl vm": http://www.grosbein.net/freebsd/leak/sysctl-vm.txt
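For a first pass at pinning down where the wired memory sits, a rough sketch along
these lines may help (assuming the standard 11.x sysctl names and the stock
vmstat -m / vmstat -z column layouts; adjust as needed):

# Total wired memory in MB (v_wire_count is in pages):
echo "$(( $(sysctl -n vm.stats.vm.v_wire_count) * $(sysctl -n hw.pagesize) / 1048576 )) MB wired"

# Current ARC size in MB, for comparison:
echo "$(( $(sysctl -n kstat.zfs.misc.arcstats.size) / 1048576 )) MB ARC"

# Largest malloc(9) consumers, sorted by MemUse (column 3 of vmstat -m):
vmstat -m | sort -rn -k 3 | head -n 20

# Largest UMA zones by memory actually in use (SIZE * USED):
vmstat -z | awk -F '[:,] *' 'NR > 1 && $2 > 0 { printf "%12.1f KB  %s\n", $2 * $4 / 1024, $1 }' | sort -rn | head -n 20

Since KDB/DDB are compiled in, DDB's "show malloc" and "show uma" commands print similar
per-type and per-zone summaries from the debugger.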
On Tue, Feb 12, 2019 at 11:14:31PM +0700, Eugene Grosbein wrote:
> Hi!
>
> Long story short: 11.2-STABLE/amd64 r335757 leaked over 4600MB kernel wired memory
> over 81 days uptime out of 8GB total RAM.
[...]
> I have KDB and DDB in my custom kernel also. How do I debug the leak further?
>
> I use nvidia-driver-340-340.107 driver for GK208 [GeForce GT 710B] video card.
> Here are outputs of "vmstat -m": http://www.grosbein.net/freebsd/leak/vmstat-m.txt
> and "vmstat -z": http://www.grosbein.net/freebsd/leak/vmstat-z.txt

I suspect that the "leaked" memory is simply being used to cache UMA items. Note that
the values in the FREE column of the vmstat -z output are quite large. The cached items
are reclaimed only when the page daemon wakes up to reclaim memory; if there are no
memory shortages, large amounts of memory may accumulate in UMA caches. In this case,
the sum of the products of columns 2 and 5 gives a total of roughly 4GB cached.

> as well as "sysctl hw": http://www.grosbein.net/freebsd/leak/sysctl-hw.txt
> and "sysctl vm": http://www.grosbein.net/freebsd/leak/sysctl-vm.txt
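For reference, a small sketch of that arithmetic run against the live system
(assuming the stock vmstat -z format "NAME: SIZE, LIMIT, USED, FREE, REQ, FAIL, SLEEP",
i.e. column 2 is the item size and column 5 the number of cached free items):

# Sum SIZE * FREE over all zones to estimate memory held in UMA free caches.
vmstat -z | awk -F '[:,] *' '
    NR > 1 && $2 > 0 { cached += $2 * $5 }
    END { printf "%.1f MB cached in UMA free lists\n", cached / 1048576 }'

If that figure tracks the otherwise unexplained wired total, the memory is reclaimable
cache rather than a leak, and it should shrink once the page daemon runs under memory
pressure.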
In article <d8c7abc0-3ba1-40e4-22b1-1b30d28ced14 at grosbein.net> eugen at grosbein.net writes:

> Long story short: 11.2-STABLE/amd64 r335757 leaked over 4600MB kernel wired memory
> over 81 days uptime out of 8GB total RAM.

Not a whole lot of evidence yet, but anecdotally I'm seeing the same thing on some
huge-memory NFS servers running releng/11.2. They seem to run fine for a few weeks,
then mysteriously start swapping continuously, a few hundred pages a second. This
continues for hours at a time, and then stops just as mysteriously. Over time the
total memory dedicated to the ZFS ARC goes down, but there is no decrease in wired
memory. I've tried disabling swap, but this seems to make the server unstable.

I have yet to find any obvious commonality, aside from the fact that these are all
large-memory NFS servers which don't do much of anything else -- the only software
running on them is related to managing and monitoring the NFS service.

-GAWollman
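One hedged way to capture that slow divergence would be a periodic logger comparing
ARC size, wired memory and swap usage (assuming the usual 11.x sysctl names; the log
path and interval are only placeholders):

# Log ARC, wired and swap usage every 5 minutes to see when they drift apart.
pagesize=$(sysctl -n hw.pagesize)
while :; do
    arc_mb=$(( $(sysctl -n kstat.zfs.misc.arcstats.size) / 1048576 ))
    wired_mb=$(( $(sysctl -n vm.stats.vm.v_wire_count) * pagesize / 1048576 ))
    swap_mb=$(swapinfo -m | awk 'END { print $3 }')   # "Used" column, last line
    echo "$(date '+%F %T') arc=${arc_mb}MB wired=${wired_mb}MB swap_used=${swap_mb}MB"
    sleep 300
done >> /var/log/wired-vs-arc.log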
Hi,

I just want to report a similar issue here with 11.2-RELEASE-p8. The affected machine
has 64 GB of RAM and does daily backups of several machines during the night; in the
daytime there are parallel runs of clamav on a specific dataset.

One symptom is basic I/O performance: after upgrading from 11.1 to 11.2, backup times
have increased and are still increasing. After one week of operation, backup times had
doubled, without anything else having changed.

Then there is the wired memory issue and the far too lazy reclaiming of memory for user
processes: the clamav scans start at 10:30 and get swapped out immediately. Although
vfs.zfs.arc_max=48G, wired is at 62 GB before the scans, and it takes about 10 minutes
for the scan processes to actually run from system RAM rather than swap.

There is obviously something broken here, as there are several threads with similar
observations.

with kind regards,
Robert Schulze
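If the extra wired memory here is also cached UMA items, a simple before/after comparison
around the 10:30 scan start could show whether the page daemon actually drains those
caches once the clamav processes need the RAM (file names and times below are only
placeholders):

# Snapshot UMA zone statistics before and after the scans start and compare the
# FREE column; it should shrink if cached items are being reclaimed under pressure.
vmstat -z > /tmp/uma-before.txt     # e.g. at 10:25
sleep 900                           # wait through the slow start-up window
vmstat -z > /tmp/uma-after.txt      # e.g. at 10:40
diff -u /tmp/uma-before.txt /tmp/uma-after.txt | less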