Konrad Rzeszutek Wilk
2013-Feb-23 01:06 UTC
Re: x86: mm: Fix vmalloc_fault oops during lazy MMU updates.
On Thu, Feb 21, 2013 at 05:56:35PM +0200, Samu Kallio wrote:
> On Thu, Feb 21, 2013 at 2:33 PM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
> > On Sun, Feb 17, 2013 at 02:35:52AM -0000, Samu Kallio wrote:
> >> In paravirtualized x86_64 kernels, vmalloc_fault may cause an oops
> >> when lazy MMU updates are enabled, because set_pgd effects are being
> >> deferred.
> >>
> >> One instance of this problem is during process mm cleanup with memory
> >> cgroups enabled. The chain of events is as follows:
> >>
> >> - zap_pte_range enables lazy MMU updates
> >> - zap_pte_range eventually calls mem_cgroup_charge_statistics,
> >>   which accesses the vmalloc'd mem_cgroup per-cpu stat area
> >> - vmalloc_fault is triggered which tries to sync the corresponding
> >>   PGD entry with set_pgd, but the update is deferred
> >> - vmalloc_fault oopses due to a mismatch in the PUD entries
> >>
> >> Calling arch_flush_lazy_mmu_mode immediately after set_pgd makes the
> >> changes visible to the consistency checks.
> >
> > How do you reproduce this? Is there a BUG() or WARN() trace that
> > is triggered when this happens?
>
> In my case I've seen this triggered on an Amazon EC2 (Xen PV) instance
> under heavy load spawning many LXC containers. The best I can say at
> this point is that the frequency of this bug seems to be linked to how
> busy the machine is.
>
> The earliest report of this problem was from 3.3:
> http://comments.gmane.org/gmane.linux.kernel.cgroups/5540
> I can personally confirm the issue since 3.5.
>
> Here's a sample bug report from a 3.7 kernel (vanilla with Xen XSAVE patch
> for EC2 compatibility). The latest kernel version I have tested and seen this
> problem occur is 3.7.9.

Ingo, I am OK with this patch. Are you OK taking this in or should I take it
(and add the nice RIP below)? It should also have CC: stable@vger.kernel.org
on it.
FYI, There is also a Red Hat bug for this:
https://bugzilla.redhat.com/show_bug.cgi?id=914737

>
> [11852214.733630] ------------[ cut here ]------------
> [11852214.733642] kernel BUG at arch/x86/mm/fault.c:397!
> [11852214.733648] invalid opcode: 0000 [#1] SMP
> [11852214.733654] Modules linked in: veth xt_nat xt_comment fuse btrfs
> libcrc32c zlib_deflate ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat
> xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack
> bridge stp llc iptable_filter ip_tables x_tables ghash_clmulni_intel
> aesni_intel aes_x86_64 ablk_helper cryptd xts lrw gf128mul microcode
> ext4 crc16 jbd2 mbcache
> [11852214.733695] CPU 1
> [11852214.733700] Pid: 1617, comm: qmgr Not tainted 3.7.0-1-ec2 #1
> [11852214.733705] RIP: e030:[<ffffffff8143018d>] [<ffffffff8143018d>]
> vmalloc_fault+0x14b/0x249
> [11852214.733725] RSP: e02b:ffff88083e57d7f8 EFLAGS: 00010046
> [11852214.733730] RAX: 0000000854046000 RBX: ffffe8ffffc80d70 RCX:
> ffff880000000000
> [11852214.733736] RDX: 00003ffffffff000 RSI: ffff880854046ff8 RDI:
> 0000000000000000
> [11852214.733744] RBP: ffff88083e57d818 R08: 0000000000000000 R09:
> ffff880000000ff8
> [11852214.733750] R10: 0000000000007ff0 R11: 0000000000000001 R12:
> ffff880854686e88
> [11852214.733758] R13: ffffffff8180ce88 R14: ffff88083e57d948 R15:
> 0000000000000000
> [11852214.733768] FS: 00007ff3bf0f8740(0000)
> GS:ffff88088b480000(0000) knlGS:0000000000000000
> [11852214.733777] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> [11852214.733782] CR2: ffffe8ffffc80d70 CR3: 0000000854686000 CR4:
> 0000000000002660
> [11852214.733790] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [11852214.733796] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> 0000000000000400
> [11852214.733803] Process qmgr (pid: 1617, threadinfo
> ffff88083e57c000, task ffff88084474b3e0)
> [11852214.733810] Stack:
> [11852214.733814] 0000000000000029 0000000000000002 ffffe8ffffc80d70
> ffff88083e57d948
> [11852214.733828] ffff88083e57d928 ffffffff8103e0c7 0000000000000000
> ffff88083e57d8d0
> [11852214.733840] ffff88084474b3e0 0000000000000060 0000000000000000
> 0000000000006cf6
> [11852214.733852] Call Trace:
> [11852214.733861] [<ffffffff8103e0c7>] __do_page_fault+0x2c7/0x4a0
> [11852214.733871] [<ffffffff81004ac2>] ? xen_mc_flush+0xb2/0x1b0
> [11852214.733880] [<ffffffff810032ce>] ? xen_end_context_switch+0x1e/0x30
> [11852214.733888] [<ffffffff810043cb>] ? xen_write_msr_safe+0x9b/0xc0
> [11852214.733900] [<ffffffff810125b3>] ? __switch_to+0x163/0x4a0
> [11852214.733907] [<ffffffff8103e2de>] do_page_fault+0xe/0x10
> [11852214.733919] [<ffffffff81437f98>] page_fault+0x28/0x30
> [11852214.733930] [<ffffffff8115e873>] ?
> mem_cgroup_charge_statistics.isra.12+0x13/0x50
> [11852214.733940] [<ffffffff8116012e>] __mem_cgroup_uncharge_common+0xce/0x2d0
> [11852214.733948] [<ffffffff81007fee>] ? xen_pte_val+0xe/0x10
> [11852214.733958] [<ffffffff8116391a>] mem_cgroup_uncharge_page+0x2a/0x30
> [11852214.733966] [<ffffffff81139e78>] page_remove_rmap+0xf8/0x150
> [11852214.733976] [<ffffffff8112d78a>] ? vm_normal_page+0x1a/0x80
> [11852214.733984] [<ffffffff8112e5b3>] unmap_single_vma+0x573/0x860
> [11852214.733994] [<ffffffff81114520>] ? release_pages+0x1f0/0x230
> [11852214.734004] [<ffffffff810054aa>] ? __xen_pgd_walk+0x16a/0x260
> [11852214.734018] [<ffffffff8112f0b2>] unmap_vmas+0x52/0xa0
> [11852214.734026] [<ffffffff81136e08>] exit_mmap+0x98/0x170
> [11852214.734034] [<ffffffff8104b929>] mmput+0x59/0x110
> [11852214.734043] [<ffffffff81053d95>] exit_mm+0x105/0x130
> [11852214.734051] [<ffffffff814376e0>] ? _raw_spin_lock_irq+0x10/0x40
> [11852214.734059] [<ffffffff81053f27>] do_exit+0x167/0x900
> [11852214.734070] [<ffffffff8106093d>] ? __sigqueue_free+0x3d/0x50
> [11852214.734079] [<ffffffff81060b9e>] ? __dequeue_signal+0x10e/0x1f0
> [11852214.734087] [<ffffffff810549ff>] do_group_exit+0x3f/0xb0
> [11852214.734097] [<ffffffff81063431>] get_signal_to_deliver+0x1c1/0x5e0
> [11852214.734107] [<ffffffff8101334f>] do_signal+0x3f/0x960
> [11852214.734114] [<ffffffff811aae61>] ? ep_poll+0x2a1/0x360
> [11852214.734122] [<ffffffff81083420>] ? try_to_wake_up+0x2d0/0x2d0
> [11852214.734129] [<ffffffff81013cd8>] do_notify_resume+0x48/0x60
> [11852214.734138] [<ffffffff81438a5a>] int_signal+0x12/0x17
> [11852214.734143] Code: ff ff 3f 00 00 48 21 d0 4c 8d 0c 30 ff 14 25
> b8 f3 81 81 48 21 d0 48 01 c6 48 83 3e 00 0f 84 fa 00 00 00 49 8b 39
> 48 85 ff 75 02 <0f> 0b ff 14 25 e0 f3 81 81 49 89 c0 48 8b 3e ff 14 25
> e0 f3 81
> [11852214.734212] RIP [<ffffffff8143018d>] vmalloc_fault+0x14b/0x249
> [11852214.734222] RSP <ffff88083e57d7f8>
> [11852214.734231] ---[ end trace 81ac798210f95867 ]---
> [11852214.734237] Fixing recursive fault but reboot is needed!
>
> > Also pls next time also CC me.
>
> Will do, I originally CC'd Jeremy since made some lazy MMU related
> cleanups in arch/x86/mm/fault.c, and I thought he might have a comment
> on this.
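
For readers following the thread, the failure mode described in the patch
description can be illustrated with a small userspace toy model. This is
only a sketch under stated assumptions, not the kernel code and not the
patch itself: every toy_* name is hypothetical, the "page table entry" is
a plain integer, and the queue merely stands in for the way a
paravirtualized set_pgd may be deferred while lazy MMU mode is active.
The flush_after_set flag models the proposed change of calling
arch_flush_lazy_mmu_mode right after set_pgd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: none of these are the kernel's real definitions. */
#define TOY_QUEUE_MAX 8

struct toy_update {
    uint64_t *slot;   /* entry to be written */
    uint64_t val;     /* value to write at flush time */
};

static bool toy_lazy_mode;                        /* "lazy MMU" switch */
static struct toy_update toy_queue[TOY_QUEUE_MAX];
static size_t toy_queue_len;

/* Apply all queued writes (stands in for arch_flush_lazy_mmu_mode). */
static void toy_flush_lazy_mmu(void)
{
    for (size_t i = 0; i < toy_queue_len; i++)
        *toy_queue[i].slot = toy_queue[i].val;
    toy_queue_len = 0;
}

/* Write an entry; in lazy mode the write is only queued. */
static void toy_set_pgd(uint64_t *slot, uint64_t val)
{
    if (!toy_lazy_mode) {
        *slot = val;
        return;
    }
    if (toy_queue_len == TOY_QUEUE_MAX)
        toy_flush_lazy_mmu();
    assert(toy_queue_len < TOY_QUEUE_MAX);
    toy_queue[toy_queue_len++] = (struct toy_update){ slot, val };
}

/*
 * Sync a per-process entry from the reference entry, then run the
 * consistency check. Returns true when both entries agree, false in
 * the situation where the real code would hit its BUG().
 */
static bool toy_vmalloc_fault(uint64_t *pgd, const uint64_t *pgd_ref,
                              bool flush_after_set)
{
    if (*pgd == 0) {
        toy_set_pgd(pgd, *pgd_ref);
        if (flush_after_set)
            toy_flush_lazy_mmu();   /* make the update visible now */
    }
    return *pgd == *pgd_ref;
}
```

With toy_lazy_mode set to true, toy_vmalloc_fault(&pgd, &ref, false)
reports a mismatch because the write is still sitting in the queue, while
the same call with flush_after_set set to true sees the synced entry. The
real fault path also walks the PUD/PMD/PTE levels, which this model leaves
out.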