Just had a server hang on me... it seems pretty clear that some process stole all the RAM (clamd?)

Jul 30 16:26:04 srv1 kernel: auditd invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=-17
Jul 30 16:26:08 srv1 kernel: [<c0457d36>] out_of_memory+0x72/0x1a4
Jul 30 16:26:08 srv1 kernel: [<c0459161>] __alloc_pages+0x201/0x282
Jul 30 16:26:08 srv1 kernel: [<c045a3bf>] __do_page_cache_readahead+0xc4/0x1c6
Jul 30 16:26:08 srv1 kernel: [<c0438ecf>] wake_futex+0x3a/0x44
Jul 30 16:26:08 srv1 kernel: [<c043a2a9>] do_futex+0x738/0xb15
Jul 30 16:26:08 srv1 kernel: [<f88d0b96>] dm_any_congested+0x2f/0x35 [dm_mod]
Jul 30 16:26:08 srv1 kernel: [<c04572d8>] filemap_nopage+0x151/0x315
Jul 30 16:26:08 srv1 kernel: [<c045fda3>] __handle_mm_fault+0x172/0x87b
Jul 30 16:26:08 srv1 kernel: [<c0606c2b>] do_page_fault+0x20a/0x4b8
Jul 30 16:26:08 srv1 kernel: [<c0606a21>] do_page_fault+0x0/0x4b8
Jul 30 16:26:08 srv1 kernel: [<c0405a71>] error_code+0x39/0x40
Jul 30 16:26:08 srv1 kernel: ======================
snip...
Jul 30 16:26:21 srv1 kernel: Free pages: 12044kB (124kB HighMem)
Jul 30 16:26:21 srv1 kernel: Active:119791 inactive:120544 dirty:0 writeback:12 unstable:0 free:3011 slab:7550 mapped-file:1746 mapped-anon:215165 pagetables:3364
Jul 30 16:26:32 srv1 kernel: DMA free:4096kB min:68kB low:84kB high:100kB active:3948kB inactive:3608kB present:16384kB pages_scanned:17135 all_unreclaimable? yes
Jul 30 16:26:32 srv1 kernel: lowmem_reserve[]: 0 0 880 1007
Jul 30 16:26:33 srv1 kernel: DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Jul 30 16:26:33 srv1 kernel: lowmem_reserve[]: 0 0 880 1007
Jul 30 16:26:33 srv1 kernel: Normal free:7824kB min:3756kB low:4692kB high:5632kB active:413832kB inactive:416972kB present:901120kB pages_scanned:2000672 all_unreclaimable? yes
Jul 30 16:26:33 srv1 kernel: lowmem_reserve[]: 0 0 0 1023
Jul 30 16:26:33 srv1 kernel: HighMem free:124kB min:128kB low:264kB high:400kB active:61512kB inactive:61412kB present:130944kB pages_scanned:271904 all_unreclaimable? yes
Jul 30 16:26:33 srv1 kernel: lowmem_reserve[]: 0 0 0 0
Jul 30 16:26:33 srv1 kernel: DMA: 0*4kB 0*8kB 24*16kB 4*32kB 0*64kB 0*128kB 2*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 4096kB
Jul 30 16:26:34 srv1 kernel: DMA32: empty
Jul 30 16:26:34 srv1 kernel: Normal: 44*4kB 4*8kB 234*16kB 7*32kB 1*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 7824kB
Jul 30 16:26:34 srv1 kernel: HighMem: 3*4kB 0*8kB 3*16kB 0*32kB 1*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 124kB
Jul 30 16:26:34 srv1 kernel: Swap cache: add 61987202, delete 61986895, find 48568667/56549915, race 0+19406
Jul 30 16:26:34 srv1 kernel: Free swap = 0kB
Jul 30 16:26:34 srv1 kernel: Total swap = 2031608kB
Jul 30 16:26:34 srv1 kernel: Free swap: 0kB
Jul 30 16:26:34 srv1 kernel: 262112 pages of RAM
Jul 30 16:26:34 srv1 kernel: 32736 pages of HIGHMEM
Jul 30 16:26:34 srv1 kernel: 3327 reserved pages
Jul 30 16:26:34 srv1 kernel: 8186 pages shared
Jul 30 16:26:34 srv1 kernel: 313 pages swap cached
Jul 30 16:26:34 srv1 kernel: 0 pages dirty
Jul 30 16:26:34 srv1 kernel: 12 pages writeback
Jul 30 16:26:34 srv1 kernel: 1746 pages mapped
Jul 30 16:26:34 srv1 kernel: 7550 pages slab
Jul 30 16:26:34 srv1 kernel: 3364 pages pagetables

How does one determine who the culprit was?

Craig
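A note on reading the per-zone lines in the report above: each `COUNT*SIZEkB` term is the number of free blocks of that size on the zone's buddy-allocator free list. Multiplying and summing the terms reproduces the total after the `=`. For example, Normal works out to 7824kB, and together with `Free swap = 0kB` out of `Total swap = 2031608kB`, that shows the box had run out of both RAM and swap. What these lines cannot tell you is which process consumed the memory. The sketch below is a hypothetical helper (`zone_free_kb` is not part of any tool) that assumes Python 3 and the exact line format shown in this log. It accepts either a bare zone line or a full syslog line and returns `None` for lines such as `DMA32: empty`.

```python
import re

# A whole free-list line: "ZONE: N*SIZEkB N*SIZEkB ... = TOTALkB".
# search() skips any syslog prefix such as "Jul 30 16:26:34 srv1 kernel: ".
LINE = re.compile(r"(?P<zone>\w+): (?P<terms>(?:\d+\*\d+kB\s+)+)= (?P<total>\d+)kB")

# One term, e.g. "234*16kB" = 234 free blocks of 16 kB each.
TERM = re.compile(r"(\d+)\*(\d+)kB")


def zone_free_kb(line):
    """Return (zone, computed_kb, reported_kb), or None if the line has no free-list data."""
    m = LINE.search(line)
    if m is None:
        return None
    computed = sum(int(count) * int(size) for count, size in TERM.findall(m.group("terms")))
    return m.group("zone"), computed, int(m.group("total"))


if __name__ == "__main__":
    line = ("Jul 30 16:26:34 srv1 kernel: Normal: 44*4kB 4*8kB 234*16kB 7*32kB 1*64kB "
            "0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 7824kB")
    print(zone_free_kb(line))  # ('Normal', 7824, 7824)
```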
On Wed, Jul 30, 2008 at 20:31, Craig White <craigwhite at azapple.com> wrote:
> how does one determine who the culprit was?

Very hard... The kernel tries to "guess" which process is causing the issue, but from what I've seen (and I see OOMs every week) it guesses wrong most of the time. In my case, the victim ends up being "nscd" most of the time, even when I'm sure it's neither using a lot of memory nor leaking.

Usually when I start having OOMs, I have them on several machines running the same programs (it's a grid), so it's fairly easy to find the culprit by looking at the jobs that were running on all the affected machines.

In any case, my policy is to always reboot a machine after an OOM, since it may be left in an inconsistent state.

HTH,
Filipe
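Because the kernel's guess is made at the moment of the OOM, it can help to record beforehand what the kernel thinks. Linux exposes each process's current OOM "badness" score as `/proc/<pid>/oom_score` (higher means more likely to be killed). Resident memory appears as the `VmRSS:` line in `/proc/<pid>/status`. The sketch below lists both side by side, so that a high score can be checked against actual memory use, in line with Filipe's point that the score alone often points at the wrong process. It is a hypothetical helper (`rank_processes` and `read_vmrss_kb` are illustrative names), assumes Python 3.6+, and only reads files. How often to run it and where to log the output are left open.

```python
import os


def read_vmrss_kb(status_text):
    """Return VmRSS in kB from the text of /proc/<pid>/status, or 0 if absent."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Format: "VmRSS:     12345 kB"
            return int(line.split()[1])
    return 0  # kernel threads have no VmRSS line


def rank_processes(proc_root="/proc", top=10):
    """Return up to `top` tuples (oom_score, rss_kb, pid, name), highest score first."""
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as "meminfo"
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "oom_score")) as f:
                score = int(f.read().strip())
            with open(os.path.join(base, "status")) as f:
                status = f.read()
        except (OSError, ValueError):
            continue  # process exited mid-scan, or file unreadable
        name = ""
        for line in status.splitlines():
            if line.startswith("Name:"):
                name = line.split(":", 1)[1].strip()
                break
        rows.append((score, read_vmrss_kb(status), int(entry), name))
    rows.sort(reverse=True)
    return rows[:top]


if __name__ == "__main__":
    for score, rss, pid, name in rank_processes():
        print(f"{score:>10} {rss:>10} kB {pid:>7} {name}")
```

Run periodically, a log of this output gives something concrete to compare against after the next OOM, rather than relying only on whichever process the killer picked.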