Ahoy. This morning, I awoke to the following on one of my servers:

pid 59630 (httpd), uid 80, was killed: out of swap space
pid 59341 (find), uid 0, was killed: out of swap space
pid 23134 (irssi), uid 1001, was killed: out of swap space
pid 49332 (sshd), uid 1001, was killed: out of swap space
pid 69074 (httpd), uid 0, was killed: out of swap space
pid 11879 (eggdrop-1.6.19), uid 1001, was killed: out of swap space
...

And so on.

The machine is:

FreeBSD exodus.poly.edu 8.2-PRERELEASE FreeBSD 8.2-PRERELEASE #2: Thu Dec 2 11:39:21 EST 2010
spawk@exodus.poly.edu:/usr/obj/usr/src/sys/EXODUS amd64

10:13AM up 120 days, 20:06, 2 users, load averages: 0.00, 0.01, 0.00

The memory line from top intrigued me:

Mem: 16M Active, 48M Inact, 6996M Wired, 229M Cache, 828M Buf, 605M Free

The machine has 8 gigs of memory, and I don't know what all that wired
memory is being used for. There is a large-ish (6 x 1.5-TB) ZFS RAID-Z2
on it which has had a disk in the UNAVAIL state for a few months:

# zpool status
  pool: home
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        home        DEGRADED     0     0     0
          raidz2    DEGRADED     0     0     0
            ada0    ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
            ada4    ONLINE       0     0     0
            ada5    UNAVAIL      0    85    11  experienced I/O failures

errors: No known data errors

"vmstat -m" and "vmstat -z" output:

http://acm.poly.edu/~spawk/vmstat-m.txt
http://acm.poly.edu/~spawk/vmstat-z.txt

Anyone have a clue? I know it's just going to happen again if I reboot
the machine. It is still up in case there are diagnostics for me to run.

-Boris
On Sat, Apr 02, 2011 at 10:17:27AM -0400, Boris Kochergin wrote:
> Ahoy. This morning, I awoke to the following on one of my servers:
>
> pid 59630 (httpd), uid 80, was killed: out of swap space
> pid 59341 (find), uid 0, was killed: out of swap space
> pid 23134 (irssi), uid 1001, was killed: out of swap space
> pid 49332 (sshd), uid 1001, was killed: out of swap space
> pid 69074 (httpd), uid 0, was killed: out of swap space
> pid 11879 (eggdrop-1.6.19), uid 1001, was killed: out of swap space
> ...
>
> And so on.
>
> The machine is:
>
> FreeBSD exodus.poly.edu 8.2-PRERELEASE FreeBSD 8.2-PRERELEASE #2:
> Thu Dec 2 11:39:21 EST 2010
> spawk@exodus.poly.edu:/usr/obj/usr/src/sys/EXODUS amd64
>
> 10:13AM up 120 days, 20:06, 2 users, load averages: 0.00, 0.01, 0.00
>
> The memory line from top intrigued me:
>
> Mem: 16M Active, 48M Inact, 6996M Wired, 229M Cache, 828M Buf, 605M Free
>
> The machine has 8 gigs of memory, and I don't know what all that
> wired memory is being used for. There is a large-ish (6 x 1.5-TB)
> ZFS RAID-Z2 on it which has had a disk in the UNAVAIL state for a
> few months:

The ZFS ARC is what's responsible for your large wired count.

How much swap space do you have? You excluded that line from top.
"swapinfo" would also be helpful, but would indicate the same thing.

If you lack swap (which is a bad idea for a lot of reasons), then the
machine running out of available memory for userspace (a process which
grew too large, thus impacting others which were trying to malloc() at
the time) would make sense.

Can you please provide /boot/loader.conf and /etc/sysctl.conf ?

> # zpool status
>   pool: home
>  state: DEGRADED
> status: One or more devices could not be used because the label is missing or
>         invalid.  Sufficient replicas exist for the pool to continue
>         functioning in a degraded state.
> action: Replace the device using 'zpool replace'.
>    see: http://www.sun.com/msg/ZFS-8000-4J
>  scrub: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         home        DEGRADED     0     0     0
>           raidz2    DEGRADED     0     0     0
>             ada0    ONLINE       0     0     0
>             ada1    ONLINE       0     0     0
>             ada2    ONLINE       0     0     0
>             ada3    ONLINE       0     0     0
>             ada4    ONLINE       0     0     0
>             ada5    UNAVAIL      0    85    11  experienced I/O failures
>
> errors: No known data errors

I would also recommend fixing ada5; I'm not sure why any SA would let a
bad disk sit in a machine for "a few months". Though, hopefully, this
doesn't cause extra memory usage or something odd behind the scenes (in
the kernel). I'm going to assume the two things are completely
unrelated.

> "vmstat -m" and "vmstat -z" output:
>
> http://acm.poly.edu/~spawk/vmstat-m.txt
> http://acm.poly.edu/~spawk/vmstat-z.txt
>
> Anyone have a clue? I know it's just going to happen again if I
> reboot the machine. It is still up in case there are diagnostics for
> me to run.

The above vmstat data won't be too helpful since you need to see what's
going on "over time" and not what the values are right now. There may
be one of them that indicates available userspace vs. available kmem.

Basically what you need is the equivalent of Solaris sar(1), so that
you can see memory usage of processes/etc. over time and find out if
something went crazy and started going malloc-crazy. If the kernel
itself ran out, you'd be seeing a panic.

Sorry if these ideas/comments seem like a ramble, I've been up all
night trying to decode a circa-1992 font routine in 65816 assembly,
heh. :-)

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |
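To illustrate the "over time" point above, here is a minimal sketch rather than a finished tool. It assumes top(1) prints a "Mem:" line in the format Boris quoted, with every value in megabytes (real output may use K or G suffixes, which this does not handle). The function name mem_line_to_csv is made up for this example.

```shell
#!/bin/sh
# Turn one top(1) "Mem:" line into "Name=MB,Name=MB,..." so that
# successive samples can be appended to a log and compared later.
# Assumption: all values carry an "M" suffix, as in the line from this thread.
mem_line_to_csv() {
    # $1: a line such as "Mem: 16M Active, 48M Inact, 6996M Wired, ..."
    printf '%s\n' "$1" | awk '{
        gsub(/,/, "")                 # drop the commas between entries
        out = ""
        for (i = 2; i < NF; i += 2) { # fields come in pairs: value, name
            v = $i
            sub(/M$/, "", v)          # strip the megabyte suffix
            out = out (out == "" ? "" : ",") $(i + 1) "=" v
        }
        print out
    }'
}

# Example with the line quoted in this thread:
mem_line_to_csv 'Mem: 16M Active, 48M Inact, 6996M Wired, 229M Cache, 828M Buf, 605M Free'
```

Writing a timestamp plus one such line to a log every minute or so (from cron, say) would give a rough sar-like history to look at after the next round of kills. How the "Mem:" line is captured is left open here; check top(1) on your release for its batch-mode options.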
On Sat, Apr 02, 2011 at 10:17:27AM -0400, Boris Kochergin wrote:
> Anyone have a clue? I know it's just going to happen again if I reboot
> the machine. It is still up in case there are diagnostics for me to run.

Try r218795. Most likely, your issue is not a leak.
On 04/05/11 10:04, Pete French wrote:
>> Adding some swap would help a lot more.
> So, I run a lot of systems without swap - basically my
> thinking at the time I set them up went like this.
>
> "I have 4 gig of memory, and 4 gig of swap. Surely running 8 gig of
> memory and no swap will be just as good ?"
>
> but, is that actually true ? Is real RAM as good as an equivalent amount
> of swap, or is there something special about swap which means you should
> have some no matter how much RAM you have ?
>
> -pete.

I guess swap is special since I assume memory used by the kernel will
never be offloaded to it (could be wrong), but userspace memory will, so
it is guaranteed to be available to userspace processes only.

-Boris
on 05/04/2011 17:04 Pete French said the following:
>> Adding some swap would help a lot more.
>
> So, I run a lot of systems without swap - basically my
> thinking at the time I set them up went like this.
>
> "I have 4 gig of memory, and 4 gig of swap. Surely running 8 gig of
> memory and no swap will be just as good ?"
>
> but, is that actually true ? Is real RAM as good as an equivalent amount
> of swap, or is there something special about swap which means you should
> have some no matter how much RAM you have ?

I think that it depends.
I usually do use swap for the following reasons:

1. some anonymous memory ("malloced") may reasonably go to swap to free some RAM
for caching data; that can have overall performance benefits depending on system
usage patterns;

2. VM is happy dealing out RAM for any uses until some low watermarks are
reached, then the system tries to free up some RAM. Depending on the amount of
memory (and those thresholds) and "burstiness" of memory demand a system may
potentially run completely out of memory and would have to kill some processes.
Having swap provides some cushion. Swap kind of smooths any bursts. (And it can
also slow things down as a side effect)

Of course, the system can run out of swap as well, but that would mean that you
really need more RAM.

-- 
Andriy Gapon
On Tue, Apr 05, 2011 at 03:04:22PM +0100, Pete French wrote:
> > Adding some swap would help a lot more.
>
> So, I run a lot of systems without swap - basically my
> thinking at the time I set them up went like this.
>
> "I have 4 gig of memory, and 4 gig of swap. Surely running 8 gig of
> memory and no swap will be just as good ?"
>
> but, is that actually true ? Is real RAM as good as an equivalent amount
> of swap, or is there something special about swap which means you should
> have some no matter how much RAM you have ?

I believe some things (caches/buffers and the like) are sized according
to how much real RAM you have, i.e. if you have 8G RAM the system will
actually use more memory than if you have only 4G RAM.

I also think that parts of the system are designed with the assumption
that there is some swap available that can act as some sort of
"overflow buffer" from time to time.

-- 
<Insert your favourite quote here.>
Erik Trulsson
ertr1013@student.uu.se
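The ZFS ARC, named earlier in this thread as the likely owner of the wired memory, is one cache of this RAM-sized kind. As a rough sketch only, the share of physical memory such a cache holds can be worked out with integer shell arithmetic. The function name cache_share_pct is invented for this example, and the figures below treat Boris's whole 6996M wired count as an upper bound for the ARC, which is an assumption rather than a measurement.

```shell
#!/bin/sh
# Integer percentage of physical memory held by a cache.
# $1: cache size in bytes, $2: physical memory in bytes
cache_share_pct() {
    echo $(( $1 * 100 / $2 ))
}

# On FreeBSD with ZFS loaded, the two inputs could come from
# (shown as comments, not run here):
#   sysctl -n kstat.zfs.misc.arcstats.size
#   sysctl -n hw.physmem
# Illustration with this thread's numbers: 6996M wired on an 8G machine.
cache_share_pct $((6996 * 1024 * 1024)) $((8192 * 1024 * 1024))
```

If the ARC really is the bulk of that figure, capping it with the vfs.zfs.arc_max tunable in /boot/loader.conf is the usual knob to look at. The right value depends on the workload and is not something this sketch can suggest.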
> Having swap provides some cushion. Swap kind of smooths any bursts. (And it can
> also slow things down as a side effect)

This is why I got rid of it - my application is a lot of CGI scripts. The
overload condition is that we run out of memory - and we run *way* out of
memory .... it's never just a little overflow, it's either handleable or
completely crushed. But swap makes that more likely to happen, because as
the processes are swapped out they run slower, take longer to finish and
thus use memory for longer. What I saw was that as soon as any web server
would start to swap it would swiftly fall down. Without swap they stay up,
but reject requests. It's a better failure mode...

these days I run a compromise - swap on internal machines, and no swap
on customer facing ones, but lots of RAM (16 gig).

-pete.
2011/4/2 Boris Kochergin <spawk@acm.poly.edu>:
> pid 59630 (httpd), uid 80, was killed: out of swap space
> pid 59341 (find), uid 0, was killed: out of swap space
> pid 23134 (irssi), uid 1001, was killed: out of swap space
> pid 49332 (sshd), uid 1001, was killed: out of swap space
> pid 69074 (httpd), uid 0, was killed: out of swap space
> pid 11879 (eggdrop-1.6.19), uid 1001, was killed: out of swap space

Like others, I'll also suggest adding at least a little swap. If you
don't have disk space outside of the ZFS pool (recommended way to
create a swap), you can create one inside, with a zvol:

  zfs create -V 2G -o org.freebsd:swap=on -o primarycache=none -o secondarycache=none tank/swap

I sometimes use "-b 8K" and "-o checksum=off" for the swap, but haven't
stress tested this under 9-CURRENT and ZFS v28.

> # zpool status
>   pool: home
>  state: DEGRADED
> status: One or more devices could not be used because the label is missing or
>         invalid.  Sufficient replicas exist for the pool to continue
>         functioning in a degraded state.
> action: Replace the device using 'zpool replace'.
>    see: http://www.sun.com/msg/ZFS-8000-4J
>  scrub: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         home        DEGRADED     0     0     0
>           raidz2    DEGRADED     0     0     0
>             ada0    ONLINE       0     0     0
>             ada1    ONLINE       0     0     0
>             ada2    ONLINE       0     0     0
>             ada3    ONLINE       0     0     0
>             ada4    ONLINE       0     0     0
>             ada5    UNAVAIL      0    85    11  experienced I/O failures
>
> errors: No known data errors

Like others, I'll also *strongly* suggest fixing that ada5 problem. Try
to run smartctl on the disk to see the problem. If the disk is bad,
replace it! Don't wait "for a few months" if you don't want to
permanently lose your data.

Cheers

-- 
Olivier Smedts                                                 _
                                        ASCII ribbon campaign ( )
e-mail: olivier@gid0.org        - against HTML email & vCards  X
www: http://www.gid0.org    - against proprietary attachments / \

  "There are only 10 kinds of people in the world:
  those who understand binary,
  and those who don't."
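For Boris's machine the zvol suggestion above might look like the following. This is a dry-run sketch that only prints the commands: the helper name swap_zvol_cmds is made up, the pool name "home" is taken from the zpool status output in this thread, and 2G is simply the size used in the example above, not a recommendation.

```shell
#!/bin/sh
# Print (do not execute) the commands for a swap zvol inside a pool.
# $1: pool name, $2: zvol size as understood by "zfs create -V"
swap_zvol_cmds() {
    pool=$1
    size=$2
    # Create the volume, flagged as swap and kept out of the ARC/L2ARC.
    printf 'zfs create -V %s -o org.freebsd:swap=on -o primarycache=none -o secondarycache=none %s/swap\n' "$size" "$pool"
    # Enable it right away instead of waiting for the next boot.
    printf 'swapon /dev/zvol/%s/swap\n' "$pool"
}

swap_zvol_cmds home 2G
```

Review the printed lines and run them by hand as root once they look right. "swapinfo" should then list the new device.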