Christophe Saout
2009-Mar-05 01:27 UTC
[Xen-devel] pv_ops dom0 BUG in xen_flush_tlb_others
Hi,

I started compiling a larger package and after a few minutes the machine locked up with the following BUG. It's the second time I have observed this one, so it might be a race condition or something lurking here (possibly not dom0 specific at all).

The BUG() is triggered because the CPU mask is empty. So maybe it's not a Xen bug at all, just a weird occurrence in the kernel - native_flush_tlb_others doesn't have such a check.

Thanks,
Christophe

Mar 5 02:14:05 leto kernel BUG at arch/x86/xen/mmu.c:1386!
Mar 5 02:14:05 leto invalid opcode: 0000 [#1] PREEMPT SMP
Mar 5 02:14:05 leto last sysfs file: /sys/devices/LNXSYSTM:00/device:00/PNP0A08:00/device:01/PNP0C09:00/PNP0C0A:00/power_supply/BAT0/energy_full
Mar 5 02:14:05 leto CPU 1
Mar 5 02:14:05 leto Modules linked in: radeon drm iptable_mangle iptable_nat nf_nat ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_state xt_tcpudp iptable_filter ip_tables
Mar 5 02:14:05 leto Pid: 306, comm: kswapd0 Not tainted 2.6.29-rc6-xen-cs1 #1 2007ZDP
Mar 5 02:14:05 leto RIP: e030:[<ffffffff8020cd9a>] [<ffffffff8020cd9a>] xen_flush_tlb_others+0x11a/0x130
Mar 5 02:14:05 leto RSP: e02b:ffff880074141890 EFLAGS: 00010246
Mar 5 02:14:05 leto RAX: 0000000000000000 RBX: ffff880007d2cfa8 RCX: 0000000000000038
Mar 5 02:14:05 leto RDX: 0000003f84157000 RSI: ffff880007d2cd00 RDI: ffff880007d2cfa8
Mar 5 02:14:05 leto RBP: ffff8800741418e0 R08: ffff880007d2cfa8 R09: 0000000000000000
Mar 5 02:14:05 leto R10: ffff880012096100 R11: dead000000200200 R12: 0000003f84157000
Mar 5 02:14:05 leto R13: 0000003f84157000 R14: ffff880007d2cd00 R15: ffff8800741419ec
Mar 5 02:14:05 leto FS: 00002b3ce79599c0(0000) GS:ffffc20000014000(0000) knlGS:0000000000000000
Mar 5 02:14:05 leto CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
Mar 5 02:14:05 leto CR2: 00002b3cec376020 CR3: 00000000438ab000 CR4: 0000000000002660
Mar 5 02:14:05 leto DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 5 02:14:05 leto DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar 5 02:14:05 leto Process kswapd0 (pid: 306, threadinfo ffff880074140000, task ffff880074968c00)
Mar 5 02:14:05 leto Stack:
Mar 5 02:14:05 leto ffff88002e849a80 ffffe200017ce178 ffffe20001636808 ffff8800741418b8
Mar 5 02:14:05 leto ffffffff8020c052 ffff880007d2cd00 ffff880007d2cfa8 0000003f84157000
Mar 5 02:14:05 leto ffff880007d2cd00 ffff8800741419ec ffff880074141910 ffffffff80237cb2
Mar 5 02:14:05 leto Call Trace:
Mar 5 02:14:05 leto [<ffffffff8020c052>] ? xen_pte_val+0x12/0x40
Mar 5 02:14:05 leto [<ffffffff80237cb2>] flush_tlb_page+0x62/0xd0
Mar 5 02:14:05 leto [<ffffffff80236f8a>] ptep_clear_flush_young+0x4a/0x70
Mar 5 02:14:05 leto [<ffffffff802c2e16>] page_referenced_one+0xb6/0x130
Mar 5 02:14:05 leto [<ffffffff802c3181>] page_referenced_file+0x91/0xc0
Mar 5 02:14:05 leto [<ffffffff8020f772>] ? check_events+0x12/0x20
Mar 5 02:14:05 leto [<ffffffff802c3ee2>] page_referenced+0x72/0x140
Mar 5 02:14:05 leto [<ffffffff802ad9f6>] shrink_active_list+0xf6/0x490
Mar 5 02:14:05 leto [<ffffffff8022fc4f>] ? pvclock_clocksource_read+0x4f/0x90
Mar 5 02:14:05 leto [<ffffffff8020f523>] ? xen_clocksource_read+0x43/0x70
Mar 5 02:14:05 leto [<ffffffff802aebb4>] shrink_list+0x5e4/0x730
Mar 5 02:14:05 leto [<ffffffff8020f523>] ? xen_clocksource_read+0x43/0x70
Mar 5 02:14:05 leto [<ffffffff8020edbd>] ? xen_force_evtchn_callback+0xd/0x10
Mar 5 02:14:05 leto [<ffffffff8020f772>] ? check_events+0x12/0x20
Mar 5 02:14:05 leto [<ffffffff802aef55>] shrink_zone+0x255/0x360
Mar 5 02:14:05 leto [<ffffffff802af95d>] kswapd+0x73d/0x7c0
Mar 5 02:14:05 leto [<ffffffff802acb40>] ? isolate_pages_global+0x0/0x250
Mar 5 02:14:05 leto [<ffffffff80260e00>] ? autoremove_wake_function+0x0/0x40
Mar 5 02:14:05 leto [<ffffffff8069de65>] ? _spin_unlock_irqrestore+0x25/0x50
Mar 5 02:14:05 leto [<ffffffff802af220>] ? kswapd+0x0/0x7c0
Mar 5 02:14:05 leto [<ffffffff802af220>] ? kswapd+0x0/0x7c0
Mar 5 02:14:05 leto [<ffffffff80260978>] kthread+0x48/0x90
Mar 5 02:14:05 leto [<ffffffff8021407a>] child_rip+0xa/0x20
Mar 5 02:14:05 leto [<ffffffff80213297>] ? int_ret_from_sys_call+0x7/0x1b
Mar 5 02:14:05 leto [<ffffffff80213a21>] ? retint_restore_args+0x5/0x6
Mar 5 02:14:05 leto [<ffffffff80214070>] ? child_rip+0x0/0x20
Mar 5 02:14:05 leto Code: 6d e8 4c 8b 75 f0 4c 8b 7d f8 c9 c3 0f 1f 44 00 00 e8 fb eb ff ff eb cf 66 0f 1f 84 00 00 00 00 00 41 c7 06 08 00 00 00 90 eb 8f <0f> 0b 0f 1f 40
Mar 5 02:14:05 leto RIP [<ffffffff8020cd9a>] xen_flush_tlb_others+0x11a/0x130
Mar 5 02:14:05 leto RSP <ffff880074141890>
Mar 5 02:14:05 leto ---[ end trace aef6ef9765748c4c ]---
Mar 5 02:14:05 leto note: kswapd0[306] exited with preempt_count 3

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2009-Mar-05 01:37 UTC
Re: [Xen-devel] pv_ops dom0 BUG in xen_flush_tlb_others
Christophe Saout wrote:
> Hi,
>
> I started compiling a larger package and after a few minutes the machine
> locked up with the following BUG. It's the second time I observe this
> one, so it might be a race condition or something lurking here.
> (possibly not dom0 specific at all)
>
> The BUG() is triggered because the CPU mask is empty. So maybe it's not
> a xen bug at all, just a weird occurrence in the kernel -
> native_flush_tlb_others doesn't have such a check.
>

Interesting. I've never seen it trigger, but I think we can easily remove the test with no bad effects.

    J
Christophe Saout
2009-Mar-05 01:39 UTC
Re: [Xen-devel] pv_ops dom0 BUG in xen_flush_tlb_others
Hi Jeremy,

> > I started compiling a larger package and after a few minutes the machine
> > locked up with the following BUG. It's the second time I observe this
> > one, so it might be a race condition or something lurking here.
> > (possibly not dom0 specific at all)
> >
> > The BUG() is triggered because the CPU mask is empty. So maybe it's not
> > a xen bug at all, just a weird occurrence in the kernel -
> > native_flush_tlb_others doesn't have such a check.
>
> Interesting. I've never seen it trigger, but I think we can easily
> remove the test with no bad effects.

I just removed it. Let's see whether the qt build gets through. :-)

    Christophe
Christophe Saout
2009-Mar-05 11:36 UTC
Re: [Xen-devel] pv_ops dom0 BUG in xen_flush_tlb_others
Hi again,

> > > I started compiling a larger package and after a few minutes the machine
> > > locked up with the following BUG. It's the second time I observe this
> > > one, so it might be a race condition or something lurking here.
> > > (possibly not dom0 specific at all)
> > >
> > > The BUG() is triggered because the CPU mask is empty. So maybe it's not
> > > a xen bug at all, just a weird occurrence in the kernel -
> > > native_flush_tlb_others doesn't have such a check.
> >
> > Interesting. I've never seen it trigger, but I think we can easily
> > remove the test with no bad effects.
>
> I just removed it. Let's see whether the qt build gets through. :-)

Yes, it was happily compiling all night and is still up and running without anything suspicious in the logs.

    Christophe
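
For readers following the thread, here is a rough userspace sketch of the difference being discussed: a flush routine that treats an empty CPU mask as fatal versus one that simply has nothing to do. This is not the kernel code and not the change that was tested; `cpu_mask_t`, `issue_remote_flush`, and both `flush_tlb_others_*` functions are hypothetical stand-ins, and `assert` only mimics the effect of a kernel `BUG_ON`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's CPU mask: one bit per CPU. */
typedef uint64_t cpu_mask_t;

/* Counts simulated remote flush requests, so callers can observe them. */
int flushes_issued = 0;

int mask_empty(cpu_mask_t mask)
{
    return mask == 0;
}

/* Hypothetical placeholder for asking other CPUs to flush one address. */
void issue_remote_flush(cpu_mask_t cpus, unsigned long va)
{
    (void)cpus;
    (void)va;
    flushes_issued++;
}

/* Strict variant: an empty mask aborts, much like a BUG_ON would. */
void flush_tlb_others_strict(cpu_mask_t cpus, unsigned long va)
{
    assert(!mask_empty(cpus));
    issue_remote_flush(cpus, va);
}

/*
 * Tolerant variant: an empty mask means no other CPU needs flushing,
 * so return early. Returns 1 if a flush was requested, 0 otherwise.
 */
int flush_tlb_others_tolerant(cpu_mask_t cpus, unsigned long va)
{
    if (mask_empty(cpus))
        return 0; /* nothing to do */
    issue_remote_flush(cpus, va);
    return 1;
}
```

Calling the tolerant variant with a mask of 0 returns without issuing a flush, whereas the strict variant would abort on the same input, which corresponds to the lockup reported above.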