Hi all,

I'm having some trouble with some production 8.2-RELEASE servers where the 'Active' and 'Inact' memory values reported by top don't seem to correspond with the processes running on the machine. I have two near-identical machines (with slightly different workloads); on one, let's call it A, active + inactive is small (6.5G), and on the other (B) active + inactive is large (13.6G), even though they have almost identical sums-of-resident memory (8.3G on A and 9.3G on B).

The only difference is that A has a smaller number of quite long-running processes (it's hosting a small number of busy sites) and B has a larger number of more frequently killed/recycled processes (it's hosting a larger number of quiet sites, so the FastCGI processes get killed and restarted frequently). Notably, B has many more ZFS filesystems mounted than A (around 4,000 versus 100). The machines are otherwise under similar amounts of load. I hope the community can help me understand what's going on with the worryingly large amount of active + inactive memory on B.

Both machines are ZFS-on-root with FreeBSD 8.2-RELEASE and uptimes of around 5-6 days. I have recently reduced the ARC cache on both machines since my previous thread [1], and Wired memory usage is now stable at 6G on A and 7G on B with an arc_max of 4G on both machines.

Neither machine has any swap in use:

    Swap: 10G Total, 10G Free

My current (probably quite simplistic) understanding of the FreeBSD virtual memory system is that, for each process as reported by top:

    * Size corresponds to the total size of all the text pages for the
      process (those belonging to code in the binary itself and linked
      libraries) plus data pages (including stack and malloc()'d but
      not-yet-written-to memory segments).
    * Resident corresponds to a subset of the pages above: those pages
      which actually occupy physical/core memory. Notably, a page may
      appear in size but not in resident, e.g. a read-only text page
      from a library which has not been touched yet, or memory which
      has been malloc()'d but not yet written to.

My understanding of the values for the system as a whole (at the top of 'top') is as follows:

    * Active and inactive memory are essentially the same thing:
      resident memory belonging to processes. Being on the inactive
      rather than the active list simply means the pages in question
      were less recently used and are therefore more likely to be
      swapped out if the machine comes under memory pressure.
    * Wired is mostly kernel memory.
    * Cache is freed memory which the kernel has decided to keep in
      case it corresponds to a useful page in the future; it can be
      cheaply evicted onto the free list.
    * Free memory is not being used for anything.

It seems that pages which occur on the active + inactive lists must occur in the resident memory of one or more processes ("or more" since processes can share pages, e.g. read-only shared libraries or COW forked address space). Conversely, if a page *does not* occur in the resident memory of any process, it must not occupy any space on the active + inactive lists.

Therefore the active + inactive memory should always be less than or equal to the sum of the resident memory of all the processes on the system, right?

But it's not.
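As a sanity check on the numbers themselves, the system-wide queue sizes can also be read directly from sysctl rather than scraped from top; here is a minimal sketch (assuming the standard vm.stats.vm.* counters, which report page counts, and Python 2.7+):

    #!/usr/bin/env python
    # Minimal sketch: print the VM page queue sizes straight from sysctl.
    # Assumes FreeBSD's vm.stats.vm.* counters (values are page counts).
    import subprocess

    def sysctl(name):
        # 'sysctl -n' prints just the value
        return int(subprocess.check_output(["sysctl", "-n", name]).decode().strip())

    page_size = sysctl("vm.stats.vm.v_page_size")
    for q in ("v_active_count", "v_inactive_count", "v_wire_count",
              "v_cache_count", "v_free_count"):
        pages = sysctl("vm.stats.vm." + q)
        print("%-16s %6d MB" % (q, pages * page_size // (1024 * 1024)))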
So, I wrote a very simple Python script to add up the resident memory values in the output from 'top' and, on machine A:

    Mem: 3388M Active, 3209M Inact, 6066M Wired, 196K Cache, 11G Free

    There were 246 processes totalling 8271 MB resident memory

Whereas on machine B:

    Mem: 11G Active, 2598M Inact, 7177M Wired, 733M Cache, 1619M Free

    There were 441 processes totalling 9297 MB resident memory

Now, on machine A:

    3388M active + 3209M inactive - 8271M sum-of-resident = -1674M

I can attribute this negative value to shared libraries between the running processes (which the sum-of-resident is double-counting but active + inactive is not). But on machine B:

    11264M active + 2598M inactive - 9297M sum-of-resident = 4565M

I'm struggling to explain how, when there are only 9.2G (worst case, discounting shared pages) of resident processes, the system is using 11264M + 2598M = 13.5G of memory!

This "missing memory" is scary, because it seems to be increasing over time, and eventually, when the system runs out of free memory, I'm certain it will crash in the same way described in my previous thread [1].

Is my understanding of the virtual memory system badly broken - in which case please educate me ;-) - or is there a real problem here? If so, how can I dig deeper to help uncover/fix it?

Best Regards,
Luke Marsden

[1] lists.freebsd.org/pipermail/freebsd-fs/2012-February/013775.html
[2] https://gist.github.com/1988153

--
CTO, Hybrid Logic
+447791750420 | +1-415-449-1165 | www.hybrid-cluster.com
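P.S. For anyone who wants to reproduce these numbers, a rough sketch of this kind of summing (not the exact script I used - this version parses 'ps' rather than 'top' and assumes FreeBSD's ps reports rss in kilobytes):

    #!/usr/bin/env python
    # Rough sketch: sum the resident set sizes of every process and compare
    # against the Active + Inact page queues. Assumes FreeBSD ps (rss in KB)
    # and the vm.stats.vm.* sysctls.
    import subprocess

    def sysctl(name):
        return int(subprocess.check_output(["sysctl", "-n", name]).decode().strip())

    page_size = sysctl("vm.stats.vm.v_page_size")
    act_inact = (sysctl("vm.stats.vm.v_active_count") +
                 sysctl("vm.stats.vm.v_inactive_count")) * page_size

    # 'ps -axo rss=' prints one rss value (in KB) per process, no header.
    rss_kb = subprocess.check_output(["ps", "-axo", "rss="]).decode().split()
    sum_resident = sum(int(kb) for kb in rss_kb) * 1024

    mb = 1024 * 1024
    print("sum of resident: %6d MB" % (sum_resident // mb))
    print("active + inact:  %6d MB" % (act_inact // mb))
    print("difference:      %6d MB" % ((act_inact - sum_resident) // mb))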
Thanks for your email, Chuck.

> > Conversely, if a page *does not* occur in the resident
> > memory of any process, it must not occupy any space in the active +
> > inactive lists.
>
> Hmm... if a process gets swapped out entirely, the pages for it will be
> moved to the cache list, flushed, and then reused as soon as the disk I/O
> completes. But there is a window where the process can be marked as
> swapped out (and considered no longer resident) but still has some of its
> pages in physical memory.

There's no swapping happening on these machines (intentionally so, because as soon as we hit swap everything goes tits up), so this window doesn't concern me. I'm trying to confirm that, on a system with no pages swapped out, the following is a true statement:

    a page is accounted for in active + inactive if and only if it
    corresponds to one or more of the pages accounted for in the
    resident memory of the processes on the system (as per the
    output of 'top' and 'ps')

> > Therefore the active + inactive memory should always be less than or
> > equal to the sum of the resident memory of all the processes on the
> > system, right?
>
> No. If you've got a lot of process pages shared (i.e., a webserver with
> lots of httpd children, or a database pulling in a large common shmem
> area), then your process resident sizes can be very large compared to the
> system-wide active+inactive count.

But that's what I'm saying...

    sum(process resident sizes) >= active + inactive

Or, as I said it above, equivalently:

    active + inactive <= sum(process resident sizes)

The data I've got from this system, and what's killing us, shows the opposite:

    active + inactive > sum(process resident sizes)

- by nearly 5GB now and growing, which is what keeps causing these machines to crash. In particular:

    Mem: 13G Active, 1129M Inact, 7543M Wired, 120M Cache, 1553M Free

But the total sum of resident memories is 9457M (according to summing the output from ps or top):

    13G + 1129M = 14441M (active + inact) > 9457M (sum of res)

That's 4984M out, and that's almost enough to push us over the edge.

If my understanding of VM is correct, I don't see how this can happen. But it's happening, and it's causing real trouble here because our free memory keeps hitting zero and then we swap-spiral.

What can I do to investigate this discrepancy? Are there some tools I can use to debug the memory accounted as "active", to find out where it's going if not to resident process memory?

Thanks,
Luke

--
CTO, Hybrid Logic
+447791750420 | +1-415-449-1165 | www.hybrid-cluster.com
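In the meantime, one cheap way to at least quantify how fast the gap is growing is to leave something like this running (a minimal sketch, same assumptions as before: FreeBSD ps, the vm.stats.vm.* sysctls, Python 2.7+):

    #!/usr/bin/env python
    # Minimal sketch: log (active + inactive) - sum(resident) once a minute,
    # to see whether the unexplained gap really does grow over time.
    import subprocess, time

    def sysctl(name):
        return int(subprocess.check_output(["sysctl", "-n", name]).decode().strip())

    page_size = sysctl("vm.stats.vm.v_page_size")
    while True:
        act_inact = (sysctl("vm.stats.vm.v_active_count") +
                     sysctl("vm.stats.vm.v_inactive_count")) * page_size
        rss = sum(int(kb) for kb in
                  subprocess.check_output(["ps", "-axo", "rss="]).decode().split()) * 1024
        print("%s gap=%d MB" % (time.strftime("%Y-%m-%d %H:%M:%S"),
                                (act_inact - rss) // (1024 * 1024)))
        time.sleep(60)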
On Tue, 2012-03-06 at 19:13 +0000, Luke Marsden wrote:
> [...]
> Therefore the active + inactive memory should always be less than or
> equal to the sum of the resident memory of all the processes on the
> system, right?
>
> But it's not.
In my experience, the bulk of the memory in the inactive category is cached disk blocks, at least for ufs (I think zfs does things differently). On this desktop machine I have 12G physical and typically have roughly 11G inactive, and I can unmount one particular filesystem where most of my work is done and instantly I have almost no inactive and roughly 11G free.

-- Ian
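To see the same thing on your own machine, the before/after numbers are easy to capture; a minimal sketch (assumes the vm.stats.vm.* sysctls, root, and a mount point that is safe to unmount - the path below is just a placeholder):

    #!/usr/bin/env python
    # Minimal sketch: show how much Inact/Free change when a filesystem whose
    # file data is sitting in the page cache gets unmounted. MOUNT_POINT is a
    # placeholder - substitute a filesystem you can safely unmount.
    import subprocess

    def sysctl(name):
        return int(subprocess.check_output(["sysctl", "-n", name]).decode().strip())

    def queue_mb(counter):
        return sysctl(counter) * sysctl("vm.stats.vm.v_page_size") // (1024 * 1024)

    MOUNT_POINT = "/data/workfs"   # hypothetical path

    before = (queue_mb("vm.stats.vm.v_inactive_count"),
              queue_mb("vm.stats.vm.v_free_count"))
    subprocess.check_call(["umount", MOUNT_POINT])
    after = (queue_mb("vm.stats.vm.v_inactive_count"),
             queue_mb("vm.stats.vm.v_free_count"))

    print("inactive: %d MB -> %d MB" % (before[0], after[0]))
    print("free:     %d MB -> %d MB" % (before[1], after[1]))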
On Wed, 07 Mar 2012 10:23:38 +0200, Konstantin Belousov wrote:
> On Wed, Mar 07, 2012 at 12:36:21AM +0000, Luke Marsden wrote:
> ...
>> I'm trying to confirm that, on a system with no pages swapped out,
>> the following is a true statement:
>>
>>     a page is accounted for in active + inactive if and only if it
>>     corresponds to one or more of the pages accounted for in the
>>     resident memory of the processes on the system (as per the
>>     output of 'top' and 'ps')
>
> No.
>
> The pages belonging to a vnode vm object can be active or inactive or
> cached but not mapped into any process address space.

I wonder if some of the ideas from Denys Vlasenko in this thread

    http://comments.gmane.org/gmane.linux.redhat.fedora.devel/157706

would be useful?

...
"Today, I'm looking at my process list, sorted by amount of dirtied pages (which very closely matches the amount of malloced and used space - that is, malloced but not-written-to memory areas are not included). This is the most expensive type of page: it can't be discarded. If we were in a memory squeeze, the kernel would have to swap these pages out, if swap exists; otherwise the kernel can't do anything at all."
...
"Note that any shared pages (such as glibc) are not freed this way; also, non-mapped pages (such as large but unused malloced space, or large but unused file mappings) also do not contribute to the MemFree increase."

jb
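A quick way to convince oneself of Konstantin's point: pull a large file through plain read()/write() on a UFS filesystem and watch active + inactive grow while the reading process's own resident size barely moves (on ZFS the data would mostly land in the ARC, i.e. Wired, instead). A minimal sketch, assuming a UFS-backed path with a few hundred MB free and the vm.stats.vm.* sysctls; the exact numbers will depend on how quickly dirty buffers get flushed:

    #!/usr/bin/env python
    # Minimal sketch: file data touched only via read()/write() ends up in the
    # Active/Inact queues without being resident in any process. PATH is a
    # placeholder for a location on a UFS filesystem with ~300 MB free.
    import os, subprocess

    def sysctl(name):
        return int(subprocess.check_output(["sysctl", "-n", name]).decode().strip())

    def act_inact_mb():
        pages = (sysctl("vm.stats.vm.v_active_count") +
                 sysctl("vm.stats.vm.v_inactive_count"))
        return pages * sysctl("vm.stats.vm.v_page_size") // (1024 * 1024)

    def my_rss_mb():
        out = subprocess.check_output(["ps", "-o", "rss=", "-p", str(os.getpid())])
        return int(out.decode().strip()) // 1024

    PATH = "/ufs-tmp/bigfile"      # hypothetical path on a UFS filesystem
    SIZE_MB = 256

    print("before: act+inact=%d MB, my rss=%d MB" % (act_inact_mb(), my_rss_mb()))
    with open(PATH, "wb") as f:                    # create a large file...
        for _ in range(SIZE_MB):
            f.write(b"\0" * (1024 * 1024))
    with open(PATH, "rb") as f:                    # ...and read it back
        while f.read(1 << 20):
            pass
    print("after:  act+inact=%d MB, my rss=%d MB" % (act_inact_mb(), my_rss_mb()))
    os.unlink(PATH)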