thr3ads.net - freebsd stable - Kernel memory leak in 8.2-PRERELEASE? [Apr 2011]

Boris Kochergin

2011-Apr-02 14:44 UTC

Kernel memory leak in 8.2-PRERELEASE?

Ahoy. This morning, I awoke to the following on one of my servers:

pid 59630 (httpd), uid 80, was killed: out of swap space
pid 59341 (find), uid 0, was killed: out of swap space
pid 23134 (irssi), uid 1001, was killed: out of swap space
pid 49332 (sshd), uid 1001, was killed: out of swap space
pid 69074 (httpd), uid 0, was killed: out of swap space
pid 11879 (eggdrop-1.6.19), uid 1001, was killed: out of swap space
...

And so on.

The machine is:

FreeBSD exodus.poly.edu 8.2-PRERELEASE FreeBSD 8.2-PRERELEASE #2: Thu Dec  2 11:39:21 EST 2010     spawk@exodus.poly.edu:/usr/obj/usr/src/sys/EXODUS  amd64

10:13AM  up 120 days, 20:06, 2 users, load averages: 0.00, 0.01, 0.00

The memory line from top intrigued me:

Mem: 16M Active, 48M Inact, 6996M Wired, 229M Cache, 828M Buf, 605M Free

The machine has 8 gigs of memory, and I don't know what all that wired 
memory is being used for. There is a large-ish (6 x 1.5-TB) ZFS RAID-Z2 
on it which has had a disk in the UNAVAIL state for a few months:

# zpool status
   pool: home
  state: DEGRADED
status: One or more devices could not be used because the label is missing or
         invalid.  Sufficient replicas exist for the pool to continue
         functioning in a degraded state.
action: Replace the device using 'zpool replace'.
    see: http://www.sun.com/msg/ZFS-8000-4J
  scrub: none requested
config:

         NAME        STATE     READ WRITE CKSUM
         home        DEGRADED     0     0     0
           raidz2    DEGRADED     0     0     0
             ada0    ONLINE       0     0     0
             ada1    ONLINE       0     0     0
             ada2    ONLINE       0     0     0
             ada3    ONLINE       0     0     0
             ada4    ONLINE       0     0     0
             ada5    UNAVAIL      0    85    11  experienced I/O failures

errors: No known data errors

"vmstat -m" and "vmstat -z" output:

http://acm.poly.edu/~spawk/vmstat-m.txt
http://acm.poly.edu/~spawk/vmstat-z.txt

Anyone have a clue? I know it's just going to happen again if I reboot 
the machine. It is still up in case there are diagnostics for me to run.

-Boris

Jeremy Chadwick

2011-Apr-02 15:31 UTC

Kernel memory leak in 8.2-PRERELEASE?

On Sat, Apr 02, 2011 at 10:17:27AM -0400, Boris Kochergin wrote:
> Ahoy. This morning, I awoke to the following on one of my servers:
> 
> pid 59630 (httpd), uid 80, was killed: out of swap space
> pid 59341 (find), uid 0, was killed: out of swap space
> pid 23134 (irssi), uid 1001, was killed: out of swap space
> pid 49332 (sshd), uid 1001, was killed: out of swap space
> pid 69074 (httpd), uid 0, was killed: out of swap space
> pid 11879 (eggdrop-1.6.19), uid 1001, was killed: out of swap space
> ...
> 
> And so on.
>
> The machine is:
> 
> FreeBSD exodus.poly.edu 8.2-PRERELEASE FreeBSD 8.2-PRERELEASE #2:
> Thu Dec  2 11:39:21 EST 2010
> spawk@exodus.poly.edu:/usr/obj/usr/src/sys/EXODUS  amd64
> 
> 10:13AM  up 120 days, 20:06, 2 users, load averages: 0.00, 0.01, 0.00
> 
> The memory line from top intrigued me:
> 
> Mem: 16M Active, 48M Inact, 6996M Wired, 229M Cache, 828M Buf, 605M Free
> 
> The machine has 8 gigs of memory, and I don't know what all that
> wired memory is being used for. There is a large-ish (6 x 1.5-TB)
> ZFS RAID-Z2 on it which has had a disk in the UNAVAIL state for a
> few months:
The ZFS ARC is what's responsible for your large wired count.
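
To put a number on that wired count, here is a minimal Python sketch that parses the "Mem:" line quoted above and works out what share of the accounted memory is wired. The helper name parse_top_mem is made up for illustration, and leaving "Buf" out of the total is an assumption (it is treated here as overlapping with the other categories rather than adding to them).

```python
import re


def parse_top_mem(line):
    """Return a dict such as {'Wired': 6996} (values in MiB) from a top(1) 'Mem:' line."""
    units = {"K": 1 / 1024, "M": 1, "G": 1024}
    fields = {}
    for value, unit, name in re.findall(r"(\d+)([KMG]) (\w+)", line):
        fields[name] = int(value) * units[unit]
    return fields


mem = parse_top_mem(
    "Mem: 16M Active, 48M Inact, 6996M Wired, 229M Cache, 828M Buf, 605M Free"
)

# Assumption: Buf overlaps with the other categories, so it is not summed.
total = sum(v for k, v in mem.items() if k != "Buf")
wired_share = mem["Wired"] / total

print(f"accounted: {total:.0f}M, wired: {wired_share:.1%}")
```

With the figures from the original post this accounts for 7894M, of which roughly 89% is wired, leaving very little for userspace on a machine without swap.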

How much swap space do you have?  You excluded that line from top.
"swapinfo" would also be helpful, but would indicate the same thing.

If you lack swap (which is a bad idea for a lot of reasons), then the
machine running out of available memory for userspace (a process which
grew too large, thus impacting others which were trying to malloc() at
the time) would make sense.

Can you please provide /boot/loader.conf and /etc/sysctl.conf ?
> # zpool status
>   pool: home
>  state: DEGRADED
> status: One or more devices could not be used because the label is
> missing or
>         invalid.  Sufficient replicas exist for the pool to continue
>         functioning in a degraded state.
> action: Replace the device using 'zpool replace'.
>    see: http://www.sun.com/msg/ZFS-8000-4J
>  scrub: none requested
> config:
> 
>         NAME        STATE     READ WRITE CKSUM
>         home        DEGRADED     0     0     0
>           raidz2    DEGRADED     0     0     0
>             ada0    ONLINE       0     0     0
>             ada1    ONLINE       0     0     0
>             ada2    ONLINE       0     0     0
>             ada3    ONLINE       0     0     0
>             ada4    ONLINE       0     0     0
>             ada5    UNAVAIL      0    85    11  experienced I/O failures
> 
> errors: No known data errors
I would also recommend fixing ada5; I'm not sure why any SA would let a
bad disk sit in a machine for "a few months".  Though, hopefully, this
doesn't cause extra memory usage or something odd behind the scenes (in
the kernel).  I'm going to assume the two things are completely
unrelated.
> "vmstat -m" and "vmstat -z" output:
> 
> http://acm.poly.edu/~spawk/vmstat-m.txt
> http://acm.poly.edu/~spawk/vmstat-z.txt
> 
> Anyone have a clue? I know it's just going to happen again if I
> reboot the machine. It is still up in case there are diagnostics for
> me to run.
The above vmstat data won't be too helpful since you need to see what's
going on "over time" and not what the values are right now.  There may
be one of them that indicates available userspace vs. available kmem.

Basically what you need is the equivalent of Solaris sar(1), so that you
can see memory usage of processes/etc. over time and find out whether
some process suddenly started allocating far more memory than usual.
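
One rough way to get that kind of history is a small sampler that records a few counters at a fixed interval. The Python sketch below is only an illustration, not a sar(1) replacement: it assumes the host exposes the vm.stats.vm.* page counters, hw.pagesize and kstat.zfs.misc.arcstats.size through sysctl(8), and the function names (read_sysctl, sample, log_forever) are made up for this example.

```python
import subprocess
import time

# sysctl names assumed to exist on the FreeBSD host being watched
COUNTERS = {
    "wired": "vm.stats.vm.v_wire_count",
    "active": "vm.stats.vm.v_active_count",
    "inactive": "vm.stats.vm.v_inactive_count",
    "free": "vm.stats.vm.v_free_count",
}
ARC_SIZE = "kstat.zfs.misc.arcstats.size"  # reported in bytes


def read_sysctl(name):
    """Read one numeric sysctl by running `sysctl -n NAME`."""
    out = subprocess.run(
        ["sysctl", "-n", name], capture_output=True, text=True, check=True
    )
    return int(out.stdout.strip())


def sample(read=read_sysctl):
    """Return one snapshot in MiB; page counts are scaled by hw.pagesize."""
    page = read("hw.pagesize")
    snap = {label: read(name) * page // 2**20 for label, name in COUNTERS.items()}
    snap["arc"] = read(ARC_SIZE) // 2**20
    return snap


def log_forever(interval=60):
    """Print one timestamped line per interval until interrupted."""
    while True:
        snap = sample()
        fields = " ".join(f"{k}={v}M" for k, v in snap.items())
        print(time.strftime("%Y-%m-%dT%H:%M:%S"), fields, flush=True)
        time.sleep(interval)
```

Calling log_forever() with its output redirected to a file gives a timeline that can be lined up against the "out of swap space" messages, which shows whether the wired and ARC figures were growing steadily or a userspace burst came first.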

If the kernel itself ran out, you'd be seeing a panic.

Sorry if these ideas/comments seem like a ramble, I've been up all night
trying to decode a circa-1992 font routine in 65816 assembly, heh.  :-)

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |

Kostik Belousov

2011-Apr-02 15:33 UTC

Kernel memory leak in 8.2-PRERELEASE?

On Sat, Apr 02, 2011 at 10:17:27AM -0400, Boris Kochergin wrote:
> Ahoy. This morning, I awoke to the following on one of my servers:
> 
> pid 59630 (httpd), uid 80, was killed: out of swap space
> pid 59341 (find), uid 0, was killed: out of swap space
> pid 23134 (irssi), uid 1001, was killed: out of swap space
> pid 49332 (sshd), uid 1001, was killed: out of swap space
> pid 69074 (httpd), uid 0, was killed: out of swap space
> pid 11879 (eggdrop-1.6.19), uid 1001, was killed: out of swap space
> ...
> 
> And so on.
> 
> The machine is:
> 
> FreeBSD exodus.poly.edu 8.2-PRERELEASE FreeBSD 8.2-PRERELEASE #2: Thu 
> Dec  2 11:39:21 EST 2010     
> spawk@exodus.poly.edu:/usr/obj/usr/src/sys/EXODUS  amd64
> 
> 10:13AM  up 120 days, 20:06, 2 users, load averages: 0.00, 0.01, 0.00
> 
> The memory line from top intrigued me:
> 
> Mem: 16M Active, 48M Inact, 6996M Wired, 229M Cache, 828M Buf, 605M Free
> 
> The machine has 8 gigs of memory, and I don't know what all that wired 
> memory is being used for. There is a large-ish (6 x 1.5-TB) ZFS RAID-Z2 
> on it which has had a disk in the UNAVAIL state for a few months:
> 
> # zpool status
>   pool: home
>  state: DEGRADED
> status: One or more devices could not be used because the label is 
> missing or
>         invalid.  Sufficient replicas exist for the pool to continue
>         functioning in a degraded state.
> action: Replace the device using 'zpool replace'.
>    see: http://www.sun.com/msg/ZFS-8000-4J
>  scrub: none requested
> config:
> 
>         NAME        STATE     READ WRITE CKSUM
>         home        DEGRADED     0     0     0
>           raidz2    DEGRADED     0     0     0
>             ada0    ONLINE       0     0     0
>             ada1    ONLINE       0     0     0
>             ada2    ONLINE       0     0     0
>             ada3    ONLINE       0     0     0
>             ada4    ONLINE       0     0     0
>             ada5    UNAVAIL      0    85    11  experienced I/O failures
> 
> errors: No known data errors
> 
> "vmstat -m" and "vmstat -z" output:
> 
> http://acm.poly.edu/~spawk/vmstat-m.txt
> http://acm.poly.edu/~spawk/vmstat-z.txt
> 
> Anyone have a clue? I know it's just going to happen again if I reboot 
> the machine. It is still up in case there are diagnostics for me to run.
Try r218795. Most likely, your issue is not a leak.

Boris Kochergin

2011-Apr-05 14:09 UTC

Kernel memory leak in 8.2-PRERELEASE?

On 04/05/11 10:04, Pete French wrote:
>> Adding some swap would help a lot more.
> So, I run a lot of systems without swap - basically my
> thinking at the time I set them up went like this.
>
> "I have 4 gig of memory, and 4 gig of swap. Surely running 8 gig of
> memory and no swap will be just as good ?"
>
> but, is that actually true ? Is real RAM as good as an equivalent amount
> of swap, or is there something special about swap which means you should
> have some no matter how much RAM you have ?
>
> -pete.
I guess swap is special since I assume memory used by the kernel will 
never be offloaded to it (could be wrong), but userspace memory will, so 
it is guaranteed to be available to userspace processes only.

-Boris

Andriy Gapon

2011-Apr-05 14:30 UTC

Kernel memory leak in 8.2-PRERELEASE?

on 05/04/2011 17:04 Pete French said the following:
>> Adding some swap would help a lot more.
> 
> So, I run a lot of systems without swap - basically my
> thinking at the time I set them up went like this.
> 
> "I have 4 gig of memory, and 4 gig of swap. Surely running 8 gig of
> memory and no swap will be just as good ?"
> 
> but, is that actually true ? Is real RAM as good as an equivalent amount
> of swap, or is there something special about swap which means you should
> have some no matter how much RAM you have ?
I think that it depends.
I usually do use swap for the following reasons:
1. some anonymous memory ("malloced") may reasonably go to swap to free some RAM
for caching data; that can have overall performance benefits depending on system
usage patterns;
2. VM is happy dealing out RAM for any uses until some low watermarks are reached,
then the system tries to free up some RAM.  Depending on the amount of memory (and
those thresholds) and "burstiness" of memory demand a system may potentially run
completely out of memory and would have to kill some processes.
Having swap provides some cushion.  Swap kind of smooths any bursts. (And it can
also slow things down as a side effect)
Of course, the system can run out of swap as well, but that would mean that you
really need more RAM.
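
The cushion argument can be illustrated with a deliberately crude Python toy model. It ignores page reclaim, the watermarks themselves and kernel memory entirely; the function name run and the demand figures are hypothetical and chosen only to show two short bursts above the amount of RAM.

```python
def run(demand, ram, swap=0):
    """Classify each step of a memory-demand series (all values in MiB).

    Returns (steps that spilled into swap, steps where demand exceeded RAM + swap).
    """
    swapped = sum(1 for d in demand if ram < d <= ram + swap)
    exhausted = sum(1 for d in demand if d > ram + swap)
    return swapped, exhausted


# Hypothetical demand trace with two short bursts above 8192 MiB of RAM
demand = [6000, 7000, 8600, 7500, 9000, 7000]

print(run(demand, ram=8192))             # no swap
print(run(demand, ram=8192, swap=2048))  # 2 GiB of swap as a cushion
```

In this model the swapless run hits two steps where something would have to be killed, while the run with swap absorbs both bursts at the cost of two steps spent paging.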

-- 
Andriy Gapon

Erik Trulsson

2011-Apr-05 14:37 UTC

Kernel memory leak in 8.2-PRERELEASE?

On Tue, Apr 05, 2011 at 03:04:22PM +0100, Pete French wrote:
> > Adding some swap would help a lot more.
> 
> So, I run a lot of systems without swap - basically my
> thinking at the time I set them up went like this.
> 
> "I have 4 gig of memory, and 4 gig of swap. Surely running 8 gig of
> memory and no swap will be just as good ?"
> 
> but, is that actually true ? Is real RAM as good as an equivalent amount
> of swap, or is there something special about swap which means you should
> have some no matter how much RAM you have ?
I believe some things (caches/buffers and the like) are sized according
to how much real RAM you have, i.e. if you have 8G RAM the system will
actually use more memory than if you have only 4G RAM.

I also think that parts of the system are designed with the assumption
that there is some swap available that can act as some sort of
"overflow buffer" from time to time.






-- 
<Insert your favourite quote here.>
Erik Trulsson
ertr1013@student.uu.se

Pete French

2011-Apr-05 14:55 UTC

Kernel memory leak in 8.2-PRERELEASE?

> Having swap provides some cushion.  Swap kind of smooths any bursts. (And it can
> also slow things down as a side effect)
This is why I got rid of it - my application is a lot of CGI scripts. The
overload condition is that we run out of memory - and we run *way* out
of memory .... it's never just a little overflow, it's either handleable or
completely crushed. But swap makes that more likely to happen, because
as the processes are swapped out they run slower, take longer to
finish and thus use memory for longer.
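
The feedback loop described here follows from Little's law: the average number of requests in flight equals the arrival rate times the time each one takes. A small Python sketch, with every figure hypothetical and the function name memory_in_flight made up for illustration:

```python
def memory_in_flight(requests_per_sec, seconds_per_request, mib_per_process):
    """Little's law: average concurrency = arrival rate * time in system."""
    concurrent = requests_per_sec * seconds_per_request
    return concurrent * mib_per_process


# All figures below are hypothetical.
normal = memory_in_flight(50, 0.5, 40)    # 25 concurrent scripts
swapping = memory_in_flight(50, 4.0, 40)  # same load, each request 8x slower

print(normal, swapping)
```

At the same request rate, an eightfold slowdown per request means eight times as much memory held at once, which in turn pushes more processes into swap.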

What I saw was that as soon as any web server would start to swap it would
swiftly fall down. Without swap they stay up, but reject requests. It's a better
failure mode...

These days I run a compromise - swap on internal machines, and no swap
on customer facing ones, but lots of RAM (16 gig).

-pete.

Olivier Smedts

2011-Apr-06 12:37 UTC

Kernel memory leak in 8.2-PRERELEASE?

2011/4/2 Boris Kochergin <spawk@acm.poly.edu>:
> pid 59630 (httpd), uid 80, was killed: out of swap space
> pid 59341 (find), uid 0, was killed: out of swap space
> pid 23134 (irssi), uid 1001, was killed: out of swap space
> pid 49332 (sshd), uid 1001, was killed: out of swap space
> pid 69074 (httpd), uid 0, was killed: out of swap space
> pid 11879 (eggdrop-1.6.19), uid 1001, was killed: out of swap space
Like others, I'll also suggest adding at least a little swap. If you
don't have disk space outside of the ZFS pool (recommended way to
create a swap), you can create one inside, with a zvol:
zfs create -V 2G -o org.freebsd:swap=on -o primarycache=none -o secondarycache=none tank/swap

I sometimes use "-b 8K" and "-o checksum=off" for the swap, but
haven't stress tested this under 9-CURRENT and ZFS v28.
> # zpool status
>   pool: home
>  state: DEGRADED
> status: One or more devices could not be used because the label is missing or
>         invalid.  Sufficient replicas exist for the pool to continue
>         functioning in a degraded state.
> action: Replace the device using 'zpool replace'.
>    see: http://www.sun.com/msg/ZFS-8000-4J
>  scrub: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         home        DEGRADED     0     0     0
>           raidz2    DEGRADED     0     0     0
>             ada0    ONLINE       0     0     0
>             ada1    ONLINE       0     0     0
>             ada2    ONLINE       0     0     0
>             ada3    ONLINE       0     0     0
>             ada4    ONLINE       0     0     0
>             ada5    UNAVAIL      0    85    11  experienced I/O failures
>
> errors: No known data errors
Like others, I'll also *strongly* suggest fixing that ada5 problem.
Try to run smartctl on the disk to see the problem. If the disk is
bad, replace it! Don't wait "for a few months" if you don't want to
permanently lose your data.

Cheers
-- 
Olivier Smedts                                                 _
                                        ASCII ribbon campaign ( )
e-mail: olivier@gid0.org        - against HTML email & vCards  X
www: http://www.gid0.org    - against proprietary attachments / \

  "There are only 10 kinds of people in the world:
  those who understand binary,
  and those who don't."
