thr3ads.net - freebsd stable - Significant memory leak in 9.3p10? [Mar 2015]

J David

2015-Mar-16 22:59 UTC

Significant memory leak in 9.3p10?

Recently we have seen a large-scale memory leak on amd64 machines
running FreeBSD 9.3-RELEASE-p10.

This was first observed on 9.3p2 but has since shown up all the way through p10.

Here's what the header of top shows:

last pid: 32329;  load averages:  0.00,  0.01,  0.21    up 3+15:37:29  22:34:04
25 processes:  2 running, 22 sleeping, 1 waiting
CPU:     % user,     % nice,     % system,     % interrupt,     % idle
Mem: 4072M Active, 895M Inact, 1284M Wired, 125M Cache, 826M Buf, 1521M Free
Swap: 1024M Total, 874M Used, 149M Free, 85% Inuse

About 4G actively being used, another 895M inactive, and another 874M
in swap.  So it seems like this is a user-space leak, rather than a
kernel-space leak.
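
The arithmetic behind that estimate can be checked with a short sketch. It is illustrative only: the two header lines are copied from the top output above, the helper name parse_top_line is made up for this example, and the regular expression assumes every size is printed with an "M" suffix, as it is here.

```python
import re

# Header lines copied verbatim from the top(1) output in this post.
TOP_MEM = "Mem: 4072M Active, 895M Inact, 1284M Wired, 125M Cache, 826M Buf, 1521M Free"
TOP_SWAP = "Swap: 1024M Total, 874M Used, 149M Free, 85% Inuse"


def parse_top_line(line):
    """Return {label: megabytes} for each '<n>M <Label>' pair in a header line.

    Hypothetical helper; it only understands sizes printed in megabytes.
    """
    return {label: int(size) for size, label in re.findall(r"(\d+)M (\w+)", line)}


mem = parse_top_line(TOP_MEM)
swap = parse_top_line(TOP_SWAP)

# Memory that looks like user-space usage: active + inactive + swapped out.
unexplained_mb = mem["Active"] + mem["Inact"] + swap["Used"]
print(f"Active + Inact + swap used = {unexplained_mb}M")
```

Wired memory is left out of the sum on purpose, since it is treated here as kernel-side usage.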

At the time of measurement, this machine was not doing anything and
every possible process had been killed trying to find a culprit.  The
entire output of "ps axlww" is:

UID   PID  PPID CPU PRI NI   VSZ  RSS MWCHAN   STAT TT        TIME COMMAND
  0     0     0   0 -52  0     0  224 -        DLs  ??     0:00.82 [kernel]
  0     1     0   0  20  0  6280  556 wait     SLs  ??     0:00.57 /sbin/init --
  0     2     0   0 -16  0     0   16 pftm     DL   ??     0:00.85 [pfpurge]
  0     3     0   0 -16  0     0   16 waiting_ DL   ??     0:00.00 [sctp_iterator]
  0     4     0   0 -16  0     0   16 -        DL   ??     0:00.00 [xpt_thrd]
  0     5     0   0 -16  0     0   16 psleep   DL   ??     0:28.85 [pagedaemon]
  0     6     0   0 -16  0     0   16 psleep   DL   ??     0:45.03 [vmdaemon]
  0     7     0   0 -16  0     0   16 pollid   DL   ??     0:00.23 [idlepoll]
  0     8     0   0 155  0     0   16 pgzero   DL   ??     0:00.00 [pagezero]
  0     9     0   0 -16  0     0   16 psleep   DL   ??     0:00.83 [bufdaemon]
  0    10     0   0 -16  0     0   16 audit_wo DL   ??     0:00.00 [audit]
  0    11     0   0 155  0     0   32 -        RL   ??  8317:13.37 [idle]
  0    12     0   0 -76  0     0  240 -        WL   ??   301:43.54 [intr]
  0    13     0   0  -8  0     0   48 -        DL   ??     0:09.89 [geom]
  0    14     0   0 -16  0     0   16 -        DL   ??     2:58.88 [yarrow]
  0    15     0   0 -68  0     0   64 -        DL   ??     0:02.32 [usb]
  0    16     0   0 -16  0     0   16 vlruwt   DL   ??     0:06.35 [vnlru]
  0    17     0   0  16  0     0   16 syncer   DL   ??     5:28.89 [syncer]
  0    18     0   0 -16  0     0   16 sdflush  DL   ??     0:10.27 [softdepflush]
  0    19     0   0 -16  0     0   16 -        DL   ??     0:55.09 [racctd]
  0   830     1   0  20  0 45348 2396 wait     Is   u0     0:00.07 login [pam] (login)
500 32269   830   0  20  0 14556 2428 wait     S    u0     0:00.09 -sh (sh)
500 32340 32269   0  20  0 16296 1908 -        R+   u0     0:00.00 ps axlww

Since the issue doesn't seem related to kernel memory usage, the output
of vmstat -m and vmstat -z is omitted here.  Nothing in it jumps out as
using gigs of RAM; it appears consistent with 1284M of wired memory,
which is not unreasonable for the affected machines' tuning and workload.

The only user-space processes running are login, sh, and ps.  So where
did 5.5G of userspace RAM go?
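
To put a number on how little the remaining processes hold, here is a rough sketch that sums the RSS column (in kilobytes) for a few rows copied from the ps output above. Treating "VSZ greater than zero" as the test for a user-space process is an assumption made for this example; it works for this listing because kernel threads show a VSZ of 0. The function name user_rss_kb is hypothetical.

```python
# A few rows copied from the "ps axlww" output in this post (wrapped rows rejoined).
PS_ROWS = """\
  0     0     0   0 -52  0     0  224 -        DLs  ??     0:00.82 [kernel]
  0     1     0   0  20  0  6280  556 wait     SLs  ??     0:00.57 /sbin/init --
  0    11     0   0 155  0     0   32 -        RL   ??  8317:13.37 [idle]
  0   830     1   0  20  0 45348 2396 wait     Is   u0     0:00.07 login [pam] (login)
500 32269   830   0  20  0 14556 2428 wait     S    u0     0:00.09 -sh (sh)
500 32340 32269   0  20  0 16296 1908 -        R+   u0     0:00.00 ps axlww
"""


def user_rss_kb(ps_text):
    """Sum RSS (KB) over rows whose VSZ is non-zero.

    Columns: UID PID PPID CPU PRI NI VSZ RSS MWCHAN STAT TT TIME COMMAND.
    Assumption: kernel threads report VSZ 0, user processes do not.
    """
    total = 0
    for row in ps_text.splitlines():
        fields = row.split(None, 12)  # keep COMMAND (which may contain spaces) whole
        vsz, rss = int(fields[6]), int(fields[7])
        if vsz > 0:
            total += rss
    return total


total_kb = user_rss_kb(PS_ROWS)
print(f"user-space RSS: {total_kb} KB (about {total_kb / 1024:.1f} MB)")
```

A few megabytes of resident memory across init, login, sh, and ps is nowhere near the gigabytes reported as Active and Inact.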

The only other potentially useful information is that when this
happens, shutting down the system will hang for about ten minutes.

$ sudo halt -p
Waiting (max 60 seconds) for system process `vnlru' to stop...done
Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining...0 0 0 0 0 0 0 0 0 done
All buffers synced.  <----- 10 MINUTE HANG AFTER PRINTING THIS
Uptime: 3d15h56m32s
usbus0: Controller shutdown
uhub0: at usbus0, port 1, addr 1 (disconnected)
usbus0: controller did not stop
usbus0: Controller shutdown complete
acpi0: Powering system off
Connection closed by foreign host.

So it seems like somewhere after "All buffers synced" and printing the
uptime, it's very slowly unwinding whatever is using up all that RAM
and swap.

Does anyone have any idea what might be causing this or how to fix/prevent it?

Thanks in advance for any advice!

Konstantin Belousov

2015-Mar-16 23:24 UTC

On Mon, Mar 16, 2015 at 06:59:33PM -0400, J David wrote:
> Recently we have seen a large-scale memory leak on amd64 machines
> running FreeBSD 9.3-RELEASE-p10.
> 
> This was first observed on 9.3p2 but has since shown up all the way through p10.
> 
> Here's what the header of top shows:
> 
> last pid: 32329;  load averages:  0.00,  0.01,  0.21    up 3+15:37:29  22:34:04
> 25 processes:  2 running, 22 sleeping, 1 waiting
> CPU:     % user,     % nice,     % system,     % interrupt,     % idle
> Mem: 4072M Active, 895M Inact, 1284M Wired, 125M Cache, 826M Buf, 1521M Free
> Swap: 1024M Total, 874M Used, 149M Free, 85% Inuse
> 
> About 4G actively being used, another 895M inactive, and another 874M
> in swap.  So it seems like this is a user-space leak, rather than a
> kernel-space leak.
> 
> At the time of measurement, this machine was not doing anything and
> every possible process had been killed trying to find a culprit.  The
> entire output of "ps axlww" is:
> 
> UID   PID  PPID CPU PRI NI   VSZ  RSS MWCHAN   STAT TT        TIME COMMAND
>   0     0     0   0 -52  0     0  224 -        DLs  ??     0:00.82 [kernel]
>   0     1     0   0  20  0  6280  556 wait     SLs  ??     0:00.57 /sbin/init --
>   0     2     0   0 -16  0     0   16 pftm     DL   ??     0:00.85 [pfpurge]
>   0     3     0   0 -16  0     0   16 waiting_ DL   ??     0:00.00 [sctp_iterator]
>   0     4     0   0 -16  0     0   16 -        DL   ??     0:00.00 [xpt_thrd]
>   0     5     0   0 -16  0     0   16 psleep   DL   ??     0:28.85 [pagedaemon]
>   0     6     0   0 -16  0     0   16 psleep   DL   ??     0:45.03 [vmdaemon]
>   0     7     0   0 -16  0     0   16 pollid   DL   ??     0:00.23 [idlepoll]
>   0     8     0   0 155  0     0   16 pgzero   DL   ??     0:00.00 [pagezero]
>   0     9     0   0 -16  0     0   16 psleep   DL   ??     0:00.83 [bufdaemon]
>   0    10     0   0 -16  0     0   16 audit_wo DL   ??     0:00.00 [audit]
>   0    11     0   0 155  0     0   32 -        RL   ??  8317:13.37 [idle]
>   0    12     0   0 -76  0     0  240 -        WL   ??   301:43.54 [intr]
>   0    13     0   0  -8  0     0   48 -        DL   ??     0:09.89 [geom]
>   0    14     0   0 -16  0     0   16 -        DL   ??     2:58.88 [yarrow]
>   0    15     0   0 -68  0     0   64 -        DL   ??     0:02.32 [usb]
>   0    16     0   0 -16  0     0   16 vlruwt   DL   ??     0:06.35 [vnlru]
>   0    17     0   0  16  0     0   16 syncer   DL   ??     5:28.89 [syncer]
>   0    18     0   0 -16  0     0   16 sdflush  DL   ??     0:10.27 [softdepflush]
>   0    19     0   0 -16  0     0   16 -        DL   ??     0:55.09 [racctd]
>   0   830     1   0  20  0 45348 2396 wait     Is   u0     0:00.07 login [pam] (login)
> 500 32269   830   0  20  0 14556 2428 wait     S    u0     0:00.09 -sh (sh)
> 500 32340 32269   0  20  0 16296 1908 -        R+   u0     0:00.00 ps axlww
> 
> Since the issue doesn't seem related to kernel memory usage, vmstat -m
> and -z have been skipped, but nothing jumps out as using gigs of RAM;
> they do appear consistent with 1284M of wired memory, which is not
> unreasonable for the affected machines' tuning and workload.
> 
> The only user-space processes running are login, sh, and ps.  So where
> did 5.5G of userspace RAM go?
> 
> The only other potentially useful information is that when this
> happens, shutting down the system will hang for about ten minutes.
> 
> $ sudo halt -p
> Waiting (max 60 seconds) for system process `vnlru' to stop...done
> Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
> Waiting (max 60 seconds) for system process `syncer' to stop...
> Syncing disks, vnodes remaining...0 0 0 0 0 0 0 0 0 done
> All buffers synced.  <----- 10 MINUTE HANG AFTER PRINTING THIS
> Uptime: 3d15h56m32s
> usbus0: Controller shutdown
> uhub0: at usbus0, port 1, addr 1 (disconnected)
> usbus0: controller did not stop
> usbus0: Controller shutdown complete
> acpi0: Powering system off
> Connection closed by foreign host.
> 
> So it seems like somewhere after "All buffers synced" and printing the
> uptime, it's very slowly unwinding whatever is using up all that RAM
> and swap.
> 
> Does anyone have any idea what might be causing this or how to fix/prevent it?

There are many ways to create persistent anonymous shared memory
objects.  A non-exhaustive list: tmpfs mounts, swap-backed md disks,
SysV shared memory, and possibly POSIX shared memory (I do not remember
which implementation is used in stable/9).

I have quite possibly missed some object types.  Also note that the
active/inactive figures can be explained by cached file pages; only the
swap usage suggests that it might be something persistent from the list
above.
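
One way to walk through the candidates listed above is to run the usual inspection tools in one pass. The sketch below is only a starting point under stated assumptions: the command lines (mount -t tmpfs, mdconfig -l -v, ipcs -m -b, swapinfo -m) are my choice of standard FreeBSD utilities and should be checked against the man pages on the affected release, the names CHECKS and run_checks are made up for this example, and POSIX shared memory is not covered because I am not aware of a standard tool for listing it on stable/9.

```python
import shutil
import subprocess

# Label -> command line. These are assumed invocations, not taken from the thread.
CHECKS = {
    "tmpfs mounts": ["mount", "-t", "tmpfs"],
    "md disks (look for type 'swap')": ["mdconfig", "-l", "-v"],
    "SysV shared memory segments": ["ipcs", "-m", "-b"],
    "swap devices": ["swapinfo", "-m"],
}


def run_checks(checks=CHECKS, timeout=10):
    """Run each command and return {label: output text}.

    Missing tools and failures are reported as text instead of raising,
    so the sketch can be tried on any machine.
    """
    results = {}
    for label, argv in checks.items():
        if shutil.which(argv[0]) is None:
            results[label] = f"(skipped: {argv[0]} not found)"
            continue
        try:
            proc = subprocess.run(
                argv, capture_output=True, text=True, errors="replace", timeout=timeout
            )
            results[label] = (proc.stdout or proc.stderr).strip()
        except (OSError, subprocess.TimeoutExpired) as exc:
            results[label] = f"(failed: {exc})"
    return results


if __name__ == "__main__":
    for label, output in run_checks().items():
        print(f"== {label} ==")
        print(output or "(no output)")
```

Any tmpfs mount, swap-backed md device, or large shared memory segment that shows up here while no ordinary process is running would be a candidate for the missing memory.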
