I'm trying to track this down - this started happening without changing the kernel in use, so probably a corrupted filesystem. The symptoms are that all memory is suddenly used by no apparent source. OOM killer is invoked on every task, still can't free up enough memory to continue. When it goes wrong, it's extremely rapid - system goes from stable to dead in less than 30 seconds. Tested 3.9.0, 3.12.0, 3.12.8. Limited testing on 3.13 shows I think the same problem but I need to double-check that it's not a different issue. Blows up the exact same way on a real kernel or in UML. All sorts of things can trigger it - defrag, random writes to files. Balance and scrub don't, readonly mount doesn't. I can reproduce this trivially, mount the filesystem read-write and perform some activity. It only takes a few minutes. The other btrfs filesystems on the same machine don't show similar problems. Unfortunately, the output of btrfs-image -c9 is 75gb, much more than I can reasonably share. I've got a reliable reproducer in UML using UML-COW to always start with the same base image, defrag a file with 33,000 extents and the system explodes within a minute. Here's the OOM report, the formatting is a bit off due to being delivered via netconsole. Swap was disabled on this run, but it makes no difference. I get insta-OOM issues out of the blue with very little memory swapped out. [ 1184.871419] parent transid verify failed on 8049834639360 wanted 1736567 found 1734749 [ 1184.879873] parent transid verify failed on 8049834639360 wanted 1736567 found 1734749 [ 1184.894932] parent transid verify failed on 8049834639360 wanted 1736567 found 1734749 [ 1184.898207] parent transid verify failed on 8049834639360 wanted 1736567 found 1734749 [ 1184.902116] parent transid verify failed on 8049834639360 wanted 1736567 found 1734749 [ 1184.902454] parent transid verify failed on 8049834639360 wanted 1736567 found 1734749 [ 1184.903333] parent transid verify failed on 8049834639360 wanted 1736567 found 1734749 [ 1184.903588] parent transid verify failed on 8049834639360 wanted 1736567 found 1734749 [ 1184.904592] parent transid verify failed on 8049834639360 wanted 1736567 found 1734749 [ 1184.904839] parent transid verify failed on 8049834639360 wanted 1736567 found 1734749 [ 1192.113082] verify_parent_transid: 16 callbacks suppressed [ 1192.113166] parent transid verify failed on 8049835315200 wanted 1736567 found 1736533 [ 1192.113269] parent transid verify failed on 8049835315200 wanted 1736567 found 1736533 [ 1192.176637] parent transid verify failed on 8049835315200 wanted 1736567 found 1736533 [ 1192.178119] parent transid verify failed on 8049835315200 wanted 1736567 found 1736533 [ 1192.203369] parent transid verify failed on 8049835315200 wanted 1736567 found 1736533 [ 1192.203503] parent transid verify failed on 8049835315200 wanted 1736567 found 1736533 [ 1192.204112] parent transid verify failed on 8049835315200 wanted 1736567 found 1736533 [ 1192.205324] parent transid verify failed on 8049835315200 wanted 1736567 found 1736533 [ 1192.814465] parent transid verify failed on 8049835315200 wanted 1736567 found 1736533 [ 1192.817226] parent transid verify failed on 8049835315200 wanted 1736567 found 1736533 [ 1219.366168] ntpd invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0 [ 1219.366270] CPU: 1 PID: 5479 Comm: ntpd Not tainted 3.12.8-00848-g97f15f1 #2 [ 1219.366324] Hardware name: Gigabyte Technology Co., Ltd. GA-MA78GPM-DS2H/GA-MA78GPM-DS2H, BIOS F1 06/03/2008 [ 1219.366402] 0000000000000000 ffff8800c02339a8 ffffffff815ccf3b 000000003f51a67e [ 1219.366632] ffff8800c557ae40 ffff8800c0233a48 ffffffff815c8551 0000000000000100 [ 1219.366861] 0000000000000001 ffff8800c02339e8 ffffffff815d4f46 00000000000ef3e4 [ 1219.367086] Call Trace: [ 1219.367155] [<ffffffff815ccf3b>] dump_stack+0x50/0x85 [ 1219.367262] [<ffffffff815c8551>] dump_header.isra.14+0x6d/0x1b5 [ 1219.367322] [<ffffffff815d4f46>] ? sub_preempt_count+0x33/0x46 [ 1219.367390] [<ffffffff815d1b9d>] ? _raw_spin_unlock_irqrestore+0x2b/0x48 [ 1219.367448] [<ffffffff8132849a>] ? ___ratelimit+0xda/0xf8 [ 1219.367514] [<ffffffff810cf773>] oom_kill_process+0x70/0x303 [ 1219.367614] [<ffffffff81041930>] ? has_capability_noaudit+0x12/0x16 [ 1219.367672] [<ffffffff810cfe91>] out_of_memory+0x314/0x347 [ 1219.367734] [<ffffffff810d3ee3>] __alloc_pages_nodemask+0x629/0x7c8 [ 1219.367798] [<ffffffff811052db>] alloc_pages_current+0xb2/0xbb [ 1219.367852] [<ffffffff810cd36e>] __page_cache_alloc+0xb/0xd [ 1219.367915] [<ffffffff810ceb9a>] filemap_fault+0x249/0x362 [ 1219.367973] [<ffffffff810eb378>] __do_fault+0xa7/0x418 [ 1219.368071] [<ffffffff815d1b9d>] ? _raw_spin_unlock_irqrestore+0x2b/0x48 [ 1219.368130] [<ffffffff810606c4>] ? get_parent_ip+0xe/0x3e [ 1219.368184] [<ffffffff810eed47>] handle_mm_fault+0x2b4/0x907 [ 1219.368239] [<ffffffff815d1a93>] ? _raw_spin_unlock_irq+0x17/0x32 [ 1219.368297] [<ffffffff815d4dc4>] __do_page_fault+0x489/0x4e6 [ 1219.368354] [<ffffffff8100b22e>] ? __restore_xstate_sig+0x30a/0x4dc [ 1219.368408] [<ffffffff810606c4>] ? get_parent_ip+0xe/0x3e [ 1219.368462] [<ffffffff815d1cdd>] ? _raw_spin_lock_irq+0x19/0x38 [ 1219.368518] [<ffffffff815d4f46>] ? sub_preempt_count+0x33/0x46 [ 1219.368575] [<ffffffff8132da3a>] ? trace_hardirqs_off_thunk+0x3a/0x6c [ 1219.368633] [<ffffffff815d4e2a>] do_page_fault+0x9/0xb [ 1219.368687] [<ffffffff815d25a2>] page_fault+0x22/0x30 [ 1219.370803] Mem-Info: [ 1219.370858] Node 0 [ 1219.371146] CPU 0: hi: 0, btch: 1 usd: 0 [ 1219.371202] CPU 1: hi: 0, btch: 1 usd: 0 [ 1219.371264] Node 0 [ 1219.371555] CPU 0: hi: 186, btch: 31 usd: 42 [ 1219.371617] CPU 1: hi: 186, btch: 31 usd: 29 [ 1219.371670] Node 0 [ 1219.371772] CPU 0: hi: 186, btch: 31 usd: 30 [ 1219.371832] CPU 1: hi: 186, btch: 31 usd: 26 [ 1219.371893] active_anon:46310 inactive_anon:940 isolated_anon:0#012 [ 1219.371893] active_file:768 inactive_file:1155 isolated_file:0#012 [ 1219.371893] unevictable:0 dirty:827 writeback:0 unstable:0#012 [ 1219.371893] free:21409 slab_reclaimable:5668 slab_unreclaimable:5883#012 [ 1219.371893] mapped:1276 shmem:1261 pagetables:2591 bounce:0#012 [ 1219.371893] free_cma:0 [ 1219.372411] Node 0 [ 1219.372628] lowmem_reserve[]: 0 3106 3811 3811 [ 1219.372970] Node 0 [ 1219.373149] lowmem_reserve[]: 0 0 705 705 [ 1219.373475] Node 0 [ 1219.373654] lowmem_reserve[]: 0 0 0 0 [ 1219.373930] Node 0 DMA: 3*4kB (M) 4*8kB (UM) 5*16kB (UM) 3*32kB (UM) 1*64kB (U) 1*128kB (U) 1*256kB (M) 1*512kB (M) 2*1024kB (UM) 2*2048kB (MR) 2*4096kB (EM) = 15516kB [ 1219.375077] Node 0 DMA32: 945*4kB (UEM) 1764*8kB (UEM) 1107*16kB (UEM) 366*32kB (UEM) 84*64kB (UEM) 6*128kB (UEM) 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB (R) = 57556kB [ 1219.376072] Node 0 Normal: 2139*4kB (UEM) 36*8kB (M) 0*16kB 0*32kB 1*64kB (R) 1*128kB (R) 1*256kB (R) 0*512kB 1*1024kB (R) 1*2048kB (R) 0*4096kB = 12364kB [ 1219.377031] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 1219.377100] 3230 total pagecache pages [ 1219.377151] 0 pages in swap cache [ 1219.377211] Swap cache stats: add 0, delete 0, find 0/0 [ 1219.377263] Free swap = 0kB [ 1219.377320] Total swap = 0kB [ 1219.428078] 1015807 pages RAM [ 1219.428154] 35867 pages reserved [ 1219.428224] 536682 pages shared [ 1219.428273] 953367 pages non-shared [ 1219.428330] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [ 1219.428414] [ 1249] 0 1249 6103 316 16 0 -1000 udevd [ 1219.428477] [ 1479] 0 1479 6092 313 15 0 -1000 udevd [ 1219.428552] [ 3285] 0 3285 2130 39 10 0 0 dhcpcd-bin [ 1219.428614] [ 3383] 0 3383 5989 208 15 0 -1000 udevd [ 1219.428681] [ 3717] 0 3717 4779 104 15 0 0 rpcbind [ 1219.428743] [ 3749] 106 3749 5872 148 17 0 0 rpc.statd [ 1219.428811] [ 3761] 0 3761 6359 72 17 0 0 rpc.idmapd [ 1219.428873] [ 3766] 0 3766 7681 108 19 0 0 rpc.gssd [ 1219.428941] [ 4227] 13 4227 1169 44 8 0 0 polipo [ 1219.429107] [ 4288] 130 4288 3165 98 12 0 0 syslogd [ 1219.430837] [ 4309] 135 4309 4774 96 14 0 0 dirmngr [ 1219.430899] [ 4367] 0 4367 17697 210 35 0 0 nmbd [ 1219.430972] [ 4371] 0 4371 24778 284 48 0 0 smbd [ 1219.431033] [ 4389] 101 4389 8094 137 19 0 0 dbus-daemon [ 1219.431101] [ 4390] 0 4390 24804 264 47 0 0 smbd [ 1219.431162] [ 4408] 0 4408 10567 120 24 0 0 krb5kdc [ 1219.431230] [ 4439] 0 4439 7094 249 17 0 0 openvpn [ 1219.431290] [ 4525] 0 4525 96084 181 31 0 0 automount [ 1219.431358] [ 4531] 0 4531 4204 56 12 0 0 atd [ 1219.431418] [ 4605] 136 4605 64452 7713 57 0 0 named [ 1219.431485] [ 4664] 0 4664 2128 49 9 0 0 dd [ 1219.431545] [ 4666] 131 4666 2073 1093 9 0 0 klogd [ 1219.431613] [ 4674] 0 4674 10589 160 26 0 0 kadmind [ 1219.431673] [ 4751] 0 4751 3297 80 11 0 0 mdadm [ 1219.431742] [ 4772] 114 4772 33045 260 28 0 0 bacula-sd [ 1219.431802] [ 4831] 0 4831 21770 629 46 0 0 apache2 [ 1219.431873] [ 4833] 33 4833 21703 544 45 0 0 apache2 [ 1219.431934] [ 4834] 0 4834 21703 545 44 0 0 apache2 [ 1219.432002] [ 4835] 33 4835 61700 1035 87 0 0 php5-cgi [ 1219.432085] [ 4837] 33 4837 94011 1123 75 0 0 apache2 [ 1219.432147] [ 4839] 33 4839 94013 1128 75 0 0 apache2 [ 1219.432215] [ 4975] 0 4975 6155 88 18 0 0 cron [ 1219.432275] [ 4977] 0 4977 2646 58 10 0 0 inetd [ 1219.432342] [ 5039] 0 5039 3379 1410 10 0 0 dhcpd [ 1219.432403] [ 5064] 0 5064 2368 98 10 0 0 mysqld_safe [ 1219.432471] [ 5383] 103 5383 138079 16199 74 0 0 mysqld [ 1219.432532] [ 5384] 0 5384 1057 49 8 0 0 logger [ 1219.432599] [ 5399] 113 5399 27465 840 44 0 0 postgres [ 1219.432660] [ 5449] 129 5449 5481 176 15 0 0 privoxy [ 1219.432727] [ 5476] 0 5476 3691 48 11 0 0 radvd [ 1219.432788] [ 5478] 119 5478 3691 65 12 0 0 radvd [ 1219.432854] [ 5479] 110 5479 9911 231 24 0 0 ntpd [ 1219.432915] [ 5660] 0 5660 5510 245 14 0 0 smartd [ 1219.432982] [ 5763] 0 5763 3791 126 13 0 0 tincd [ 1219.433043] [ 5780] 0 5780 13075 182 28 0 -1000 sshd [ 1219.433110] [ 5798] 113 5798 27465 354 41 0 0 postgres [ 1219.433172] [ 5799] 113 5799 27465 310 40 0 0 postgres [ 1219.433239] [ 5800] 113 5800 27498 348 40 0 0 postgres [ 1219.433300] [ 5801] 113 5801 20047 324 36 0 0 postgres [ 1219.433368] [ 5894] 0 5894 20888 237 43 0 0 winbindd [ 1219.433429] [ 5897] 0 5897 20887 232 41 0 0 winbindd [ 1219.433497] [ 6010] 118 6010 421074 4794 354 0 0 asterisk [ 1219.433558] [ 6367] 102 6367 11866 141 25 0 0 exim4 [ 1219.433625] [ 6399] 120 6399 1057 49 8 0 0 uml_switch [ 1219.433686] [ 6443] 0 6443 1023 30 7 0 0 minissdpd [ 1219.433754] [ 6472] 0 6472 2633 61 11 0 0 miniupnpd [ 1219.433815] [ 6522] 0 6522 4542 68 14 0 0 getty [ 1219.433882] [ 6523] 0 6523 4542 67 14 0 0 getty [ 1219.433943] [ 6524] 0 6524 4542 67 14 0 0 getty [ 1219.434010] [ 6525] 0 6525 4542 67 14 0 0 getty [ 1219.434071] [ 6526] 0 6526 4542 66 13 0 0 getty [ 1219.434137] [ 6527] 0 6527 4542 67 14 0 0 getty [ 1219.434198] [ 6528] 0 6528 23174 345 49 0 0 sshd [ 1219.434265] [ 6540] 0 6540 261278 379 44 0 0 console-kit-dae [ 1219.434327] [ 6607] 0 6607 48754 209 33 0 0 polkitd [ 1219.434395] [ 6613] 0 6613 5802 625 17 0 0 bash [ 1219.434455] [ 6850] 0 6850 23138 302 50 0 0 sshd [ 1219.434522] [ 6855] 0 6855 5801 619 17 0 0 bash [ 1219.434582] [ 7006] 0 7006 6335 245 17 0 0 top [ 1219.434649] [ 7021] 0 7021 23138 276 49 0 0 sshd [ 1219.434710] [ 7026] 0 7026 5801 621 16 0 0 bash [ 1219.434781] [ 7463] 0 7463 23138 300 48 0 0 sshd [ 1219.434841] [ 7468] 0 7468 5801 618 17 0 0 bash [ 1219.434909] [ 7623] 0 7623 14346 2321 33 0 0 iotop [ 1219.434970] [ 7646] 0 7646 4096 187 14 0 0 watch [ 1219.435037] [ 7657] 0 7657 23138 271 48 0 0 sshd [ 1219.435097] [ 7662] 0 7662 5801 622 16 0 0 bash [ 1219.435164] [ 7852] 0 7852 1948 48 9 0 0 xargs [ 1219.435224] [ 7872] 0 7872 1886 47 9 0 0 tail [ 1219.435292] [ 7892] 0 7892 20890 225 41 0 0 winbindd [ 1219.435353] [ 7893] 0 7893 20886 235 41 0 0 winbindd [ 1219.435423] [ 8666] 0 8666 3166 72 12 0 0 fixfrag [ 1219.435484] [ 8675] 0 8675 1022 28 7 0 0 time [ 1219.435551] [ 8676] 0 8676 2900 42 10 0 0 btrfs [ 1219.435612] [ 8707] 113 8707 27710 686 42 0 0 postgres [ 1219.435679] [ 8708] 0 8708 4095 104 10 0 0 watch [ 1219.435740] [ 8709] 0 8709 2391 38 7 0 0 sh [ 1219.435805] Out of memory: Kill process 5383 (mysqld) score 16 or sacrifice child [ 1219.435874] Killed process 5383 (mysqld) total-vm:552316kB, anon-rss:64576kB, file-rss:220kB [ 1220.014027] verify_parent_transid: 97 callbacks suppressed [ 1220.014109] parent transid verify failed on 8049836576768 wanted 1736567 found 1736533 [ 1220.020710] parent transid verify failed on 8049836576768 wanted 1736567 found 1736533 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html