Dear all,

I was testing a program (in C++) that would exhaust all my memory. When this happened, it would call set_new_handler() with one of my functions, which would inform the user about the lack of memory and then exit the program. Instead, the program was force-killed by the kernel (signal 9) and my main console reported: "swap zone exhausted, increase kern.maxswzone". So far so good.

In two other consoles I was running top -P and vmstat 5, and their output was:

# top -P
last pid:  1776;  load averages:  1.71,  0.72,  0.40    up 0+00:24:29  12:24:39
34 processes:  2 running, 32 sleeping
CPU 0:  2.6% user,  0.0% nice, 88.3% system,  0.0% interrupt,  9.0% idle
CPU 1:  0.0% user,  0.0% nice, 39.1% system,  0.0% interrupt, 60.9% idle
CPU 2:  0.0% user,  0.0% nice,  2.6% system,  1.5% interrupt, 95.9% idle
CPU 3:  0.4% user,  0.0% nice, 89.8% system,  0.0% interrupt,  9.8% idle
Mem: 2629M Active, 221M Inact, 966M Wired, 82M Cache, 7280K Buf, 16M Free
Swap: 4096M Total, 1821M Used, 2275M Free, 44% Inuse, 196K In, 30M Out

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
 1771 mamalos     1 117    0   573G  3566M CPU1    1   1:48 94.38% memory
      root        1  44    0 11544K  1176K select  0   0:17  0.29% hald-addon
 1535 root        1  44    0 16552K  1436K swread  2   0:03  0.20% hald-addon
 1772 mamalos     1  44    0  9364K  2032K CPU3    0   0:02  0.00% top
  790 root        1  46    0  8068K   812K select  3   0:01  0.00% moused
 1514 root        1  44    0 21696K  1596K wait    0   0:00  0.00% login
 1711 mamalos     1  45    0 10148K     0K wait    1   0:00  0.00% <bash>
 1021 root        1  44    0  7044K   864K zio->i  2   0:00  0.00% syslogd
  829 root        1  64    0  3204K    16K select  2   0:00  0.00% devd
 1568 root        1  47    0 10148K   964K ttyin   1   0:00  0.00% bash
 1530 root        1  44    0 17872K  1108K swread  2   0:00  0.00% hald-runne
 1513 root        1  44    0 21696K     0K wait    0   0:00  0.00% <login>
 1680 mamalos     1  44    0 10148K     0K wait    3   0:00  0.00% <bash>
 1512 root        1  44    0 21696K     0K wait    0   0:00  0.00% <login>
 1511 root        1  44    0 21696K     0K wait    0   0:00  0.00% <login>

and:

# vmstat 5
 procs      memory      page                    disks     faults      cpu
 r b w     avm    fre   flt  re  pi  po    fr    sr ad4 cd0   in   sy   cs us sy id
 1 0 9    565G   133M  1761   3   4 156   901 13078   0   0  173 1693 1581  0  7 92

when the program was killed. As you can see in vmstat's output, avm equals 565G of memory (?!?!?!), and top shows that the first process (it's called "memory") has a size of 573G! Moreover, on another terminal I ran:

# sysctl -a | grep -i swap
vm.swap_enabled: 1
vm.nswapdev: 1
vm.swap_async_max: 4
vm.swap_reserved: 544762376192
vm.swap_total: 4294967296
vm.swap_idle_threshold2: 10
vm.swap_idle_threshold1: 2
vm.stats.vm.v_swappgsout: 795224
vm.stats.vm.v_swappgsin: 1188
vm.stats.vm.v_swapout: 200045
vm.stats.vm.v_swapin: 606
vm.disable_swapspace_pageouts: 0
vm.defer_swapspace_pageouts: 0
vm.swap_idle_enabled: 0

where one can see that vm.swap_reserved is 507G, whereas the total swap size (vm.swap_total) is 4G (?!?!?).

As far as my system is concerned:

$ uname -a
FreeBSD example.com 8.1-STABLE FreeBSD 8.1-STABLE #1: Fri Nov 5 17:27:37 EET 2010  root@:/mnt/obj/mnt/src/sys/GENERIC  amd64

and I use ZFS on one of my filesystems. Also:

$ cat /boot/loader.conf
zfs_load="YES"
vfs.zfs.prefetch_disable=0
nvidia_load="YES"
kern.maxfiles="35000"
kern.ipc.shmall="65536"
atapicam_load=YES
snd_hda_load=YES

I had noticed in the past that vmstat would sometimes show more memory than my total memory, but never this much. Is this a bug, or am I missing something very fundamental?

Thank you all for your time in advance.

--
George Mamalakis

IT Officer
Electrical and Computer Engineer (Aristotle Un. of Thessaloniki),
MSc (Imperial College of London)

Department of Electrical and Computer Engineering
Faculty of Engineering
Aristotle University of Thessaloniki

phone number: +30 (2310) 994379
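For reference, a minimal sketch of the kind of test program described above, with a hypothetical handler name and allocation loop (this is not the original code):

#include <cstdlib>
#include <iostream>
#include <new>
#include <vector>

// Hypothetical handler: report the failure and exit, as the post describes.
void report_out_of_memory()
{
    std::cerr << "Out of memory, exiting." << std::endl;
    std::exit(EXIT_FAILURE);
}

int main()
{
    // The handler only runs if operator new actually fails.
    std::set_new_handler(report_out_of_memory);

    std::vector<char*> chunks;
    for (;;)
        chunks.push_back(new char[64 * 1024 * 1024]);  // allocate until new fails
}

With memory overcommit, such untouched allocations can keep succeeding (which is why SIZE and vm.swap_reserved can grow far beyond physical memory plus swap), so the kernel may kill the process with SIGKILL before operator new ever fails and the handler gets a chance to run, which is consistent with the behaviour reported above.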
On Wed, 15 Dec 2010 13:04+0200, George Mamalakis wrote:

> I was testing a program that would exhaust all my memory (in C++),
> and when this would happen, it would call set_new_handler() along
> with one of my functions that would inform the user about the lack
> of memory and then it would exit the program. Instead, the program
> was force-killed by the kernel (signal 9) and I was informed that:

If all your process' memory is exhausted, then there is no memory left for the runtime system for doing I/O and the other stuff you want.

Next, unless I'm on drugs, maybe you should call set_new_handler() before you actually run out of memory.

Just my $0.02.

Trond.

--
----------------------------------------------------------------------
Trond Endrestøl                  |  Trond.Endrestol@fagskolen.gjovik.no
ACM, NAS, NUUG, SAGE, USENIX     |  FreeBSD 8.1-STABLE & Alpine 2.00
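One common way to act on that advice (a sketch only, with illustrative names and sizes) is to install the handler at program start and set aside a small emergency reserve that the handler releases, so the error path still has some memory left for I/O:

#include <cstdlib>
#include <cstring>
#include <iostream>
#include <new>

static char* emergency_reserve = 0;   // hypothetical emergency pool

void low_memory_handler()
{
    delete[] emergency_reserve;       // return the reserve so the error path can still allocate
    emergency_reserve = 0;
    std::cerr << "Memory exhausted, shutting down." << std::endl;
    std::exit(EXIT_FAILURE);
}

int main()
{
    emergency_reserve = new char[1 << 20];        // 1 MB set aside up front
    std::memset(emergency_reserve, 0, 1 << 20);   // touch it so the pages are really backed
    std::set_new_handler(low_memory_handler);

    // ... rest of the program ...
    return 0;
}

Note that this only helps when operator new actually reports failure; it does not stop the kernel from killing the process outright once swap is exhausted.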
One of the problems with resource management in general is that it has traditionally been per-process, and due to the multiplicative effect (e.g. max-descriptors * limit-per-descriptor), per-process resources cannot be set such that any given user is prevented from DDOSing the system without making them so low that normal programs begin to fail for no good reason. Hence the advent of per-user and other more suitable resource limits, nominally set via sysctl. Even with these, however, it is virtually impossible to protect against a user DDOS.

The kernel itself has resource limitations which are fairly easy to blow out... mbufs are usually the easiest to blow up, followed by pipe KVM memory. Filesystems can be blown up too by creating sparse files and mmap()ing them (thus circumventing normal overcommit limitations). Paging just by itself, without running the system out of VM, can destroy a machine's performance and be just as effective a DDOS attack as resource starvation is. Virtual memory resources are similarly impacted.

Overcommit limiting features have as many downsides as they have upsides. It's an endless argument, but I've seen systems blow up with overcommit limits set even more readily than with no (overcommit) limits set. Theoretically overcommit limits make the system more manageable, but in actual practice they only work when the application base is written with such limits in mind (and most are not). So for a general-purpose unix environment, putting limits on overcommit tends to create headaches. To be sure, in a turn-key environment overcommit limiting serves a very important function. In a non-turn-key environment, however, it will likely create more problems than it solves.

The only way to realistically deal with the mess, if it is important to you, is to partition the system's real resources and run things inside their own virtualized kernels, each of which does its own independent resource management and whose I/O on the real system can be well controlled as an aggregate. Alternatively, creating very large swap partitions works very well to mitigate the more common problems.

Swap itself is changing its function. Swap is no longer just used for real memory overcommit (in fact, real memory overcommit is quite uncommon these days). It is now also used for things like tmpfs, temporary virtual disks, meta-data caching, and so forth. These days the minimum amount of swap I configure is 32G, and as efficient swap storage gets more cost-effective (e.g. SSDs), significantly more: 70G, 110G, etc.

It becomes more a matter of being able to detect and act on the DDOS/resource issue BEFORE it gets to the point of killing important processes (definition: whatever is important for the functioning of that particular machine, user-run or root-run), and less a matter of hoping the system will do the right thing when the resource limit is actually reached. Having a lot of swap gives you more time to act.

-Matt
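As a concrete illustration of the sparse-file point above (a sketch under an assumed path and size, not something to run on a machine you care about): dirty pages of a file-backed MAP_SHARED mapping are written back to the file rather than to swap, so the mapping is not charged against vm.swap_reserved the way anonymous memory is.

#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main()
{
    const char* path = "/tmp/sparse.bin";            // hypothetical path
    const off_t size = 64LL * 1024 * 1024 * 1024;    // 64 GB of address space, almost 0 bytes on disk

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0 || ftruncate(fd, size) != 0) {
        std::perror("sparse file");
        return 1;
    }

    // A shared file mapping sidesteps the anonymous-memory swap reservation.
    void* p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        std::perror("mmap");
        return 1;
    }

    std::memset(p, 1, 4096);   // dirtying pages starts consuming real filesystem space instead
    munmap(p, size);
    close(fd);
    unlink(path);
    return 0;
}

Dirty every page instead of just the first one and the mapping tries to materialize the whole 64 GB in the filesystem, which is the kind of blow-up described above.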