Dear all,

I was testing a program (in C++) that would exhaust all my memory. When this happened, it would call set_new_handler() with one of my functions, which would inform the user about the lack of memory and then exit the program. Instead, the program was force-killed by the kernel (signal 9) and my main console reported: "swap zone exhausted, increase kern.maxswzone". So far so good.

In two other consoles I was running top -P and vmstat 5, and their output was:

# top -P
last pid:  1776;  load averages:  1.71,  0.72,  0.40    up 0+00:24:29  12:24:39
34 processes:  2 running, 32 sleeping
CPU 0:  2.6% user,  0.0% nice, 88.3% system,  0.0% interrupt,  9.0% idle
CPU 1:  0.0% user,  0.0% nice, 39.1% system,  0.0% interrupt, 60.9% idle
CPU 2:  0.0% user,  0.0% nice,  2.6% system,  1.5% interrupt, 95.9% idle
CPU 3:  0.4% user,  0.0% nice, 89.8% system,  0.0% interrupt,  9.8% idle
Mem: 2629M Active, 221M Inact, 966M Wired, 82M Cache, 7280K Buf, 16M Free
Swap: 4096M Total, 1821M Used, 2275M Free, 44% Inuse, 196K In, 30M Out

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
 1771 mamalos     1 117    0   573G  3566M CPU1    1   1:48 94.38% memory
      root        1  44    0 11544K  1176K select  0   0:17  0.29% hald-addon
 1535 root        1  44    0 16552K  1436K swread  2   0:03  0.20% hald-addon
 1772 mamalos     1  44    0  9364K  2032K CPU3    0   0:02  0.00% top
  790 root        1  46    0  8068K   812K select  3   0:01  0.00% moused
 1514 root        1  44    0 21696K  1596K wait    0   0:00  0.00% login
 1711 mamalos     1  45    0 10148K     0K wait    1   0:00  0.00% <bash>
 1021 root        1  44    0  7044K   864K zio->i  2   0:00  0.00% syslogd
  829 root        1  64    0  3204K    16K select  2   0:00  0.00% devd
 1568 root        1  47    0 10148K   964K ttyin   1   0:00  0.00% bash
 1530 root        1  44    0 17872K  1108K swread  2   0:00  0.00% hald-runne
 1513 root        1  44    0 21696K     0K wait    0   0:00  0.00% <login>
 1680 mamalos     1  44    0 10148K     0K wait    3   0:00  0.00% <bash>
 1512 root        1  44    0 21696K     0K wait    0   0:00  0.00% <login>
 1511 root        1  44    0 21696K     0K wait    0   0:00  0.00% <login>

and:

# vmstat 5
 procs      memory      page                    disks     faults      cpu
 r b w     avm    fre   flt  re  pi  po    fr    sr ad4 cd0   in   sy   cs us sy id
 1 0 9    565G   133M  1761   3   4 156   901 13078   0   0  173 1693 1581  0  7 92

when the program was killed. As you can see in vmstat's output, avm equals 565G of memory (?!?!?!), and top shows that the first process (it's called "memory") has a size of 573G! Moreover, on another terminal I ran:

# sysctl -a | grep -i swap
vm.swap_enabled: 1
vm.nswapdev: 1
vm.swap_async_max: 4
vm.swap_reserved: 544762376192
vm.swap_total: 4294967296
vm.swap_idle_threshold2: 10
vm.swap_idle_threshold1: 2
vm.stats.vm.v_swappgsout: 795224
vm.stats.vm.v_swappgsin: 1188
vm.stats.vm.v_swapout: 200045
vm.stats.vm.v_swapin: 606
vm.disable_swapspace_pageouts: 0
vm.defer_swapspace_pageouts: 0
vm.swap_idle_enabled: 0

where one can see that vm.swap_reserved is 507G, whereas the total swap size (vm.swap_total) is 4G (?!?!?).

As far as my system is concerned:

$ uname -a
FreeBSD example.com 8.1-STABLE FreeBSD 8.1-STABLE #1: Fri Nov 5 17:27:37 EET 2010  root@:/mnt/obj/mnt/src/sys/GENERIC  amd64

and I use ZFS on one of my filesystems. Also:

$ cat /boot/loader.conf
zfs_load="YES"
vfs.zfs.prefetch_disable=0
nvidia_load="YES"
kern.maxfiles="35000"
kern.ipc.shmall="65536"
atapicam_load=YES
snd_hda_load=YES

I had noticed in the past that vmstat would sometimes show more memory than my total memory, but never this much. Is this a bug, or am I missing something very fundamental?

Thank you all for your time in advance.

--
George Mamalakis

IT Officer
Electrical and Computer Engineer (Aristotle Un. of Thessaloniki),
MSc (Imperial College of London)

Department of Electrical and Computer Engineering
Faculty of Engineering
Aristotle University of Thessaloniki

phone number: +30 (2310) 994379
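For reference, a minimal sketch of the kind of test program described above, with a hypothetical handler name and allocation loop (this is not the original code):

#include <cstdlib>
#include <iostream>
#include <new>
#include <vector>

// Hypothetical handler: report the failure and exit, as the post describes.
void report_out_of_memory()
{
    std::cerr << "Out of memory, exiting." << std::endl;
    std::exit(EXIT_FAILURE);
}

int main()
{
    // The handler only runs if operator new actually fails.
    std::set_new_handler(report_out_of_memory);

    std::vector<char*> chunks;
    for (;;)
        chunks.push_back(new char[64 * 1024 * 1024]);  // allocate until new fails
}

With memory overcommit, such untouched allocations can keep succeeding (which is why SIZE and vm.swap_reserved can grow far beyond physical memory plus swap), so the kernel may kill the process with SIGKILL before operator new ever fails and the handler gets a chance to run, which is consistent with the behaviour reported above.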
On Wed, 15 Dec 2010 13:04+0200, George Mamalakis wrote:

> I was testing a program that would exhaust all my memory (in C++),
> and when this would happen, it would call set_new_handler() along
> with one of my functions that would inform the user about the lack
> of memory and then it would exit the program. Instead, the program
> was force-killed by the kernel (signal 9) and I was informed that:

If all your process' memory is exhausted, then there is no memory left for the runtime system for doing I/O and the other stuff you want.

Next, unless I'm on drugs, maybe you should call set_new_handler() before you actually run out of memory.

Just my $0.02.

Trond.

--
----------------------------------------------------------------------
Trond Endrestøl                  |  Trond.Endrestol@fagskolen.gjovik.no
ACM, NAS, NUUG, SAGE, USENIX     |  FreeBSD 8.1-STABLE & Alpine 2.00
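One common way to act on that advice (a sketch only, with illustrative names and sizes) is to install the handler at program start and set aside a small emergency reserve that the handler releases, so the error path still has some memory left for I/O:

#include <cstdlib>
#include <cstring>
#include <iostream>
#include <new>

static char* emergency_reserve = 0;   // hypothetical emergency pool

void low_memory_handler()
{
    delete[] emergency_reserve;       // return the reserve so the error path can still allocate
    emergency_reserve = 0;
    std::cerr << "Memory exhausted, shutting down." << std::endl;
    std::exit(EXIT_FAILURE);
}

int main()
{
    emergency_reserve = new char[1 << 20];        // 1 MB set aside up front
    std::memset(emergency_reserve, 0, 1 << 20);   // touch it so the pages are really backed
    std::set_new_handler(low_memory_handler);

    // ... rest of the program ...
    return 0;
}

Note that this only helps when operator new actually reports failure; it does not stop the kernel from killing the process outright once swap is exhausted.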
One of the problems with resource management in general is that it has traditionally been per-process, and due to the multiplicative effect (e.g. max-descriptors * limit-per-descriptor), per-process resources cannot be set such that any given user is prevented from DDOSing the system without making them so low that normal programs begin to fail for no good reason. Hence the advent of per-user and other more suitable resource limits, nominally set via sysctl. Even with these, however, it is virtually impossible to protect against a user DDOS.

The kernel itself has resource limitations which are fairly easy to blow out... mbufs are usually the easiest to blow up, followed by pipe KVM memory. Filesystems can be blown up too by creating sparse files and mmap()ing them (thus circumventing normal overcommit limitations). Paging just by itself, without running the system out of VM, can destroy a machine's performance and be just as effective a DDOS attack as resource starvation is. Virtual memory resources are similarly impacted.

Overcommit limiting features have as many downsides as they have upsides. It's an endless argument, but I've seen systems blow up with overcommit limits set even more readily than with no (overcommit) limits set. Theoretically overcommit limits make the system more manageable, but in actual practice they only work when the application base is written with such limits in mind (and most are not). So for a general-purpose unix environment, putting limits on overcommit tends to create headaches. To be sure, in a turn-key environment overcommit limiting serves a very important function. In a non-turn-key environment, however, it will likely create more problems than it solves.

The only way to realistically deal with the mess, if it is important to you, is to partition the system's real resources and run things inside their own virtualized kernels, each of which does its own independent resource management and whose I/O on the real system can be well controlled as an aggregate. Alternatively, creating very large swap partitions works very well to mitigate the more common problems.

Swap itself is changing its function. Swap is no longer just used for real memory overcommit (in fact, real memory overcommit is quite uncommon these days). It is now also used for things like tmpfs, temporary virtual disks, meta-data caching, and so forth. These days the minimum amount of swap I configure is 32G, and as efficient swap storage gets more cost-effective (e.g. SSDs), significantly more: 70G, 110G, etc.

It becomes more a matter of being able to detect and act on the DDOS/resource issue BEFORE it gets to the point of killing important processes (definition: whatever is important for the functioning of that particular machine, user-run or root-run), and less a matter of hoping the system will do the right thing when the resource limit is actually reached. Having a lot of swap gives you more time to act.

-Matt
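As a concrete illustration of the sparse-file point above (a sketch under an assumed path and size, not something to run on a machine you care about): dirty pages of a file-backed MAP_SHARED mapping are written back to the file rather than to swap, so the mapping is not charged against vm.swap_reserved the way anonymous memory is.

#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main()
{
    const char* path = "/tmp/sparse.bin";            // hypothetical path
    const off_t size = 64LL * 1024 * 1024 * 1024;    // 64 GB of address space, almost 0 bytes on disk

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0 || ftruncate(fd, size) != 0) {
        std::perror("sparse file");
        return 1;
    }

    // A shared file mapping sidesteps the anonymous-memory swap reservation.
    void* p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        std::perror("mmap");
        return 1;
    }

    std::memset(p, 1, 4096);   // dirtying pages starts consuming real filesystem space instead
    munmap(p, size);
    close(fd);
    unlink(path);
    return 0;
}

Dirty every page instead of just the first one and the mapping tries to materialize the whole 64 GB in the filesystem, which is the kind of blow-up described above.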