Xen: 3.2.1-rc1 (I can get the exact changeset if needed)
domU: 2.6.16.33 PAE

(XEN) ----[ Xen-3.2.1-rc1  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    3
(XEN) RIP:    e008:[<ffff828c8013dee4>] put_page_type+0x17/0x107
(XEN) RFLAGS: 0000000000210282   CONTEXT: hypervisor
(XEN) rax: 00001c9f2d2abca8   rbx: ffff9f232d2abca8   rcx: 0000000080000000
(XEN) rdx: 000000b72dedde51   rsi: 00000000002f25fd   rdi: ffff9f232d2abca8
(XEN) rbp: ffff8300cee0fcb8   rsp: ffff8300cee0fc98   r8:  0000000000000000
(XEN) r9:  00000000deadbeef   r10: ffff828c801c5bf0   r11: 0000000000000000
(XEN) r12: 0000000000000000   r13: ffff9f232d2abca8   r14: ffff8300cfc84100
(XEN) r15: ffff8300cfc84118   cr0: 000000008005003b   cr4: 00000000000026b0
(XEN) cr3: 000000062ffd7000   cr2: ffff9f232d2abcc0
(XEN) ds: 007b   es: 007b   fs: 0000   gs: 0033   ss: 0000   cs: e008
(XEN) Xen stack trace from rsp=ffff8300cee0fc98:
(XEN)    ffff8300cee0fcd8 ffff9f232d2abca8 0000000000000000 00000000002f25fd
(XEN)    ffff8300cee0fcd8 ffff828c8013b409 ffff8300cfc850f8 ffff8302f25fd000
(XEN)    ffff8300cee0fd08 ffff828c8013c06d ffff8300cfc84100 ffff8284075def88
(XEN)    0000000068000001 ffff8300cfc850f8 ffff8300cee0fd38 ffff828c8013de5a
(XEN)    0000000060000001 0000000068000000 ffff8284075def88 ffff8300cfc850f8
(XEN)    ffff8300cee0fd68 ffff828c8013df63 ffff8284075def88 ffff8284075def88
(XEN)    ffff8284075def88 ffff8300cfc84100 ffff8300cee0fdb8 ffff828c80131680
(XEN)    0000000088000000 0000000080000000 ffff8300cee0ff28 ffff8300cfc84100
(XEN)    ffff8300cfc84100 00000000b31fc868 0000000000000000 0000000000000000
(XEN)    ffff8300cee0fdd8 ffff828c80131a94 ffff8300cfc84100 0000000000000000
(XEN)    ffff8300cee0fe08 ffff828c80105638 ffff8300cee0fe18 ffff828c80114d70
(XEN)    00000000b31fc868 fffffffffffffff3 ffff8300cee0ff08 ffff828c8010479f
(XEN)    ffff8300cee0fe48 ffff8300cee34130 0000000000000003 0001b932a9ddc50a
(XEN)    0000000000200282 0000000000000000 0000000500000002 083ca594b7b50067
(XEN)    0832ab4c011fc898 b7dadc50b7b5d68c b7a733e400000001 00000001b79fccdc
(XEN)    080facafb31fc8c8 081361e008313e98 080797e7b76c1934 00000000b76c1950
(XEN)    b7da802c00000060 b761db6c00000000 0805946cb31fc8e8 b7da802cb761db6c
(XEN)    b7dab6a000000000 00000002b5d5451c a5dba1eea5dba1ee 0000001f00000000
(XEN)    ffff8300cee0fee8 ffff8300cee34100 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 00007cff311f00b7 ffff828c801bdd50
(XEN) Xen call trace:
(XEN)    [<ffff828c8013dee4>] put_page_type+0x17/0x107
(XEN)    [<ffff828c8013b409>] put_page_from_l3e+0x3f/0x4e
(XEN)    [<ffff828c8013c06d>] free_l3_table+0x78/0xc4
(XEN)    [<ffff828c8013de5a>] free_page_type+0x1d4/0x247
(XEN)    [<ffff828c8013df63>] put_page_type+0x96/0x107
(XEN)    [<ffff828c80131680>] relinquish_memory+0xce/0x262
(XEN)    [<ffff828c80131a94>] domain_relinquish_resources+0xd1/0x1b0
(XEN)    [<ffff828c80105638>] domain_kill+0x77/0x164
(XEN)    [<ffff828c8010479f>] do_domctl+0x4dd/0xc1e
(XEN)    [<ffff828c801bdd50>] compat_tracing_off+0xb/0x64
(XEN)
(XEN) Pagetable walk from ffff9f232d2abcc0:
(XEN)  L4[0x13e] = 0000000000000000 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 3:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: ffff9f232d2abcc0
(XEN) ****************************************
Keir Fraser wrote:
> On 3/4/08 14:27, "Christopher S. Aker" <caker@theshore.net> wrote:
>
>> I misspoke, dom0 was the 2.6.16.33.  domUs were a mix of 2.6.24.3
>> pv_ops, and 2.6.18.8.  We have about a dozen of these boxes deployed
>> with this version, each with 30-40 domains just doing their thing --
>> nothing crazy.
>
> That's interesting. 2.6.24 is less tested than other Linux kernels, and
> being pv_ops it is quite different. It's not unlikely to have
> corner-case bugs that crash it or, worst case, tickle dormant problems
> in the hypervisor itself.
>
>> Maybe the symbols would help just a little bit?  In any case, here are
>> the files:
>>
>> http://theshore.net/~caker/xen/BUGfatal_page_fault/
>
> I will take a look. It might help narrow down the possibilities a bit.
>
>> I guess I'll set up a thrash test environment full of nothing but
>> domains looping crashme and make -j kernel builds and the like.
>> Sounds like fun.
>
> Okay, is this a bug you've seen exactly once so far? That would be
> annoying!

So far just the one time.

We just took Xen out of (a three year) beta, and so we're gearing up for
a large deployment and need to eliminate any potential host/hypervisor
crashes.  I can deal with domain bugs, but having the whole box go down
is painful.  Needless to say, I'm anxious to get this fixed, and will
help in any way I can.

Can I provide anything else that you can think of?  In the meantime,
we'll work up a thrash-xen box.

Thanks,
-Chris
On 3/4/08 15:04, "Christopher S. Aker" <caker@theshore.net> wrote:

>>> Maybe the symbols would help just a little bit?  In any case, here
>>> are the files:
>>>
>>> http://theshore.net/~caker/xen/BUGfatal_page_fault/
>>
>> I will take a look. It might help narrow down the possibilities a bit.

My analysis is that the hypervisor crashed because one of the entries in
a dying guest's third-level page directory has the present bit (bit 0)
set, yet the physical address mapped by that entry is 0xb72dedde51000.
That is a rather large and obviously bogus number! It causes us to
access way off the end of an array indexed by physical address,
resulting in a fatal page fault.

Obviously the question is: where did the bogus address come from? That's
going to be rather hard to answer without finding a more reliable repro
of the bug, and then adding some hypervisor tracing.

 -- Keir
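The register dump in the first report bears this analysis out. The small
standalone program below reproduces the faulting pointers from the bogus
frame number in rdx. The frame_table base, page_info size, and type_info
offset used here are assumptions about a 3.2-era x86_64 debug build, not
values stated anywhere in the thread, but the arithmetic matches the
dump exactly.

#include <stdio.h>

int main(void)
{
    /* rdx in the dump: the (garbage) machine frame number taken from
     * the present L3 entry of the dying guest. */
    unsigned long mfn = 0x000000b72dedde51UL;

    /* Assumed layout (NOT from the thread): frame_table mapped at
     * 0xffff828400000000, sizeof(struct page_info) == 0x28, and the
     * type_info field at offset 0x18 within page_info. */
    unsigned long frame_table = 0xffff828400000000UL;
    unsigned long pg    = frame_table + mfn * 0x28; /* &frame_table[mfn]      */
    unsigned long fault = pg + 0x18;                /* type_info field access */

    printf("phys  = %#lx\n", mfn << 12); /* 0xb72dedde51000: Keir's number    */
    printf("page  = %#lx\n", pg);        /* 0xffff9f232d2abca8: rbx/rdi/r13   */
    printf("fault = %#lx\n", fault);     /* 0xffff9f232d2abcc0: cr2           */
    return 0;
}

Compiled with gcc, it prints the physical address Keir quotes, the
page_info pointer seen in rbx/rdi/r13, and the faulting linear address
in cr2, which is what "way off the end of an array indexed by physical
address" looks like in practice.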
Keir Fraser wrote:
> That's going to be rather hard to answer without finding a more
> reliable repro of the bug, and then adding some hypervisor tracing.

Here are two more Xen traces with this problem.  These always appear to
occur after we're forced to destroy a domain.  The first trace is a
DoubleDump<tm> and has something new in the second dump...

http://www.theshore.net/~caker/xen/build-1.11/

I still don't have a method to reproduce, but since we're hitting this
with some frequency, would it be worth it to stick in some extra
debugging now?

====== First trace =====

----[ Xen-3.2.1-rc5  x86_64  debug=y  Not tainted ]----
CPU:    1
RIP:    e008:[<ffff828c8013dee4>] put_page_type+0x17/0x107
RFLAGS: 0000000000210286   CONTEXT: hypervisor
rax: 00001da2f4162bf0   rbx: ffffa026f4162bf0   rcx: 0000000080000000
rdx: 000000bdac808de6   rsi: 0000000000402fe3   rdi: ffffa026f4162bf0
rbp: ffff8300cf13fbf8   rsp: ffff8300cf13fbd8   r8:  0000000000000000
r9:  00000000deadbeef   r10: ffff828c801c5bf0   r11: 0000000000000000
r12: 0000000000000000   r13: ffffa026f4162bf0   r14: 0000000000402fe3
r15: ffff82840a077b78   cr0: 000000008005003b   cr4: 00000000000026b0
cr3: 000000062ffdd000   cr2: ffffa026f4162c08
ds: 007b   es: 007b   fs: 0000   gs: 0033   ss: 0000   cs: e008
Xen stack trace from rsp=ffff8300cf13fbd8:
   0000000000000002 ffffa026f4162bf0 0000000000000000 ffff8300cee48100
   ffff8300cf13fc18 ffff828c8013b3bb 0000000000200202 ffff830402fe3000
   ffff8300cf13fc58 ffff828c8013bfcd 00000000cee48100 ffff8300cee48100
   ffff82840a077b78 000000004c000001 ffff8300cee48100 ffff8300cee48118
   ffff8300cf13fc88 ffff828c8013de4a 0000000044000001 000000004c000000
   ffff82840a077b78 ffff8300cee48100 ffff8300cf13fcb8 ffff828c8013df63
   00007cff30ec0337 ffff82840a077b78 0000000000000003 00000000004011a4
   ffff8300cf13fcd8 ffff828c8013b409 ffff8300cf13fd68 ffff8304011a4018
   ffff8300cf13fd08 ffff828c8013c06d ffff8300cee48100 ffff82840a02c1a0
   0000000068000001 ffff8300cee490f8 ffff8300cf13fd38 ffff828c8013de5a
   0000000060000001 0000000068000000 ffff82840a02c1a0 ffff8300cee490f8
   ffff8300cf13fd68 ffff828c8013df63 ffff82840a02c1a0 ffff82840a02c1a0
   ffff82840a02c1a0 ffff8300cee48100 ffff8300cf13fdb8 ffff828c80131680
   0000000088000000 0000000080000000 ffff8300cf13ff28 ffff8300cee48100
   ffff8300cee48100 00000000b4dfc508 0000000000000000 0000000000000000
   ffff8300cf13fdd8 ffff828c80131a94 ffff8300cee48100 0000000000000000
   ffff8300cf13fe08 ffff828c80105638 ffff82840f448b58 ffff8300cf13fe28
   00000000b4dfc508 fffffffffffffff3 ffff8300cf13ff08 ffff828c8010479f
   00000000000000fb ffff8300cee3a130 ffff8300cf13fe68 ffff828c8011c746
   0000000000200282 ffff8300ceefe118 0000000500000002 083010acb7ab000a
Xen call trace:
   [<ffff828c8013dee4>] put_page_type+0x17/0x107
   [<ffff828c8013b3bb>] put_page_from_l2e+0x3f/0x4e
   [<ffff828c8013bfcd>] free_l2_table+0xa6/0xce
   [<ffff828c8013de4a>] free_page_type+0x1c4/0x247
   [<ffff828c8013df63>] put_page_type+0x96/0x107
   [<ffff828c8013b409>] put_page_from_l3e+0x3f/0x4e
   [<ffff828c8013c06d>] free_l3_table+0x78/0xc4
   [<ffff828c8013de5a>] free_page_type+0x1d4/0x247
   [<ffff828c8013df63>] put_page_type+0x96/0x107
   [<ffff828c80131680>] relinquish_memory+0xce/0x262
   [<ffff828c80131a94>] domain_relinquish_resources+0xd1/0x1b0
   [<ffff828c80105638>] domain_kill+0x77/0x164
   [<ffff828c8010479f>] do_domctl+0x4dd/0xc1e
   [<ffff828c801bdd50>] compat_tracing_off+0xb/0x64

Pagetable walk from ffffa026f4162c08:
 L4[0x140] = 0000000000000000 ffffffffffffffff

****************************************
Panic on CPU 1:
FATAL PAGE FAULT
[error_code=0000]
Faulting linear address: ffffa026f4162c08
****************************************

Reboot in five seconds...

...3 seconds later, this occurred...

Assertion '__cpus_subset(&(cpumask), &(cpu_online_map), 32)' failed at smp.c:84
----[ Xen-3.2.1-rc5  x86_64  debug=y  Not tainted ]----
CPU:    0
RIP:    e008:[<ffff828c80145c68>] send_IPI_mask_flat+0x29/0x9c
RFLAGS: 0000000000010002   CONTEXT: hypervisor
rax: 00000000fffffffe   rbx: ffff8300cee3c100   rcx: 0000000000000003
rdx: 0000000000000040   rsi: 00000000000000fc   rdi: 0000000000000004
rbp: ffff828c80237be8   rsp: ffff828c80237bd0   r8:  ffff828c8024c780
r9:  0000000000000002   r10: 00000000deadbeef   r11: 0000000000000000
r12: 0000000000000004   r13: 00000000000000fc   r14: 0000000000000010
r15: 00001485db7a5091   cr0: 000000008005003b   cr4: 00000000000026b0
cr3: 00000003ff15a000   cr2: 00000000e3015078
ds: 007b   es: 007b   fs: 00d8   gs: 0000   ss: 0000   cs: e008
Xen stack trace from rsp=ffff828c80237bd0:
   ffff8300cee3c100 0000000000000086 0000000000000000 ffff828c80237c08
   ffff828c8014601a ffff8300cee30f00 0000000000000004 ffff828c80237c38
   ffff828c80114da0 0000000000000004 ffff828c80137fe0 0000000000000004
   ffff828c8025951c ffff828c80237c68 ffff828c80119b18 ffff828c80237c98
   ffff828c80137ac2 ffff8300cee3c100 ffff8300cfdd4100 ffff828c80237c98
   ffff828c80107409 00000000c0621300 ffff8300cfdd4100 ffff8300cee30f00
   0000000000000000 ffff828c80237ca8 ffff828c801075c9 ffff828c80237cd8
   ffff828c80137fe0 ffff828c80259500 ffff828c8025951c 0000000000000098
   ffff828c80237d38 ffff828c80237d28 ffff828c80137ac2 0000000000000082
   0000000000000000 ffff828c80237d18 0000000000000009 00000000ffffffff
   ffff828c801ebb60 ffff828c8020e100 00001485db7a5091 00007d737fdc82a7
   ffff828c801336e6 00001485db7a5091 ffff828c8020e100 ffff828c801ebb60
   00000000ffffffff ffff828c80237de8 0000000000000009 0000000000000000
   00000000deadbeef 0000000000000000 0000000000000000 000000007d9b040e
   000000007d8a4358 000000000000290c 00000000001e8480 00000000000003e8
   0000009800000000 ffff828c8012ac48 000000000000e008 0000000000000216
   ffff828c80237de8 0000000000000000 00001485db7a5091 ffff828c80237e08
   ffff828c80146257 ffff828c80237f28 ffff828c8020e534 ffff828c80237e28
   ffff828c80145b9a ffff828c80237f28 ffff828c8020e534 ffff828c80237e38
   ffff828c80146312 00007d737fdc8197 ffff828c801347a0 00001485db7a5091
Xen call trace:
   [<ffff828c80145c68>] send_IPI_mask_flat+0x29/0x9c
   [<ffff828c8014601a>] smp_send_event_check_mask+0x3e/0x40
   [<ffff828c80114da0>] csched_vcpu_wake+0x242/0x259
   [<ffff828c80119b18>] vcpu_wake+0x12d/0x248
   [<ffff828c80107409>] evtchn_set_pending+0xe5/0x15c
   [<ffff828c801075c9>] send_guest_pirq+0x61/0x63
   [<ffff828c80137fe0>] __do_IRQ_guest+0x19c/0x1b2
   [<ffff828c80137ac2>] do_IRQ+0x5a/0x1a7
   [<ffff828c801336e6>] common_interrupt+0x26/0x30
   [<ffff828c8012ac48>] __udelay+0x30/0x48
   [<ffff828c80146257>] smp_send_stop+0x39/0x67
   [<ffff828c80145b9a>] machine_restart+0x4f/0xc5
   [<ffff828c80146312>] smp_call_function_interrupt+0x79/0xa7
   [<ffff828c801347a0>] call_function_interrupt+0x30/0x40
   [<ffff828c8012c73b>] default_idle+0x2f/0x34
   [<ffff828c8012c7ff>] idle_loop+0x70/0x77

****************************************
Panic on CPU 0:
Assertion '__cpus_subset(&(cpumask), &(cpu_online_map), 32)' failed at smp.c:84
****************************************

Reboot in five seconds...
====== Second trace =====

----[ Xen-3.2.1-rc5  x86_64  debug=y  Not tainted ]----
CPU:    0
RIP:    e008:[<ffff828c8013dee4>] put_page_type+0x17/0x107
RFLAGS: 0000000000210286   CONTEXT: hypervisor
rax: 00000a51169fd050   rbx: ffff8cd5169fd050   rcx: 6765746143206568
rdx: 0000004206f73202   rsi: 00000000004041e1   rdi: ffff8cd5169fd050
rbp: ffff828c80237bf8   rsp: ffff828c80237bd8   r8:  0000000000000000
r9:  00000000deadbeef   r10: ffff828c801c5bf0   r11: 0000000000000000
r12: 0000000000000000   r13: ffff8cd5169fd050   r14: 00000000004041e1
r15: ffff82840a0a4b28   cr0: 000000008005003b   cr4: 00000000000026b0
cr3: 000000062ffd9000   cr2: ffff8cd5169fd068
ds: 007b   es: 007b   fs: 0000   gs: 0033   ss: 0000   cs: e008
Xen stack trace from rsp=ffff828c80237bd8:
   ffff828409df5d01 ffff8cd5169fd050 0000000000000000 ffff8300ceea0100
   ffff828c80237c18 ffff828c8013b3bb 0000000400000004 ffff8304041e1000
   ffff828c80237c58 ffff828c8013bfcd 00000003f2f24027 ffff8300ceea0100
   ffff82840a0a4b28 0000000048000001 ffff8300ceea0100 ffff8300ceea0118
   ffff828c80237c88 ffff828c8013de4a 0000000040000001 0000000048000000
   ffff82840a0a4b28 ffff8300ceea0100 ffff828c80237cb8 ffff828c8013df63
   0000000000000000 ffff82840a0a4b28 0000000000000000 0000000000402dd4
   ffff828c80237cd8 ffff828c8013b409 ffff8300ceea0100 ffff830402dd4000
   ffff828c80237d08 ffff828c8013c06d ffff8300ceea0100 ffff82840a072920
   0000000068000001 ffff8300ceea10f8 ffff828c80237d38 ffff828c8013de5a
   0000000060000001 0000000068000000 ffff82840a072920 ffff8300ceea10f8
   ffff828c80237d68 ffff828c8013df63 ffff82840a072920 ffff82840a072920
   ffff82840a072920 ffff8300ceea0100 ffff828c80237db8 ffff828c80131680
   0000000088000000 0000000080000000 ffff828c80237f28 ffff8300ceea0100
   ffff8300ceea0100 00000000b2cf9868 0000000000000000 0000000000000000
   ffff828c80237dd8 ffff828c80131a94 ffff8300ceea0100 0000000000000000
   ffff828c80237e08 ffff828c80105638 ffff828c80237e18 ffff828c80114da0
   00000000b2cf9868 fffffffffffffff3 ffff828c80237f08 ffff828c8010479f
   ffff828c80237e48 ffff8300cee36130 0000000000000000 000078cdfb20f27f
   0000000000200282 0000000000000000 0000000500000002 081d66ecb7af0010
Xen call trace:
   [<ffff828c8013dee4>] put_page_type+0x17/0x107
   [<ffff828c8013b3bb>] put_page_from_l2e+0x3f/0x4e
   [<ffff828c8013bfcd>] free_l2_table+0xa6/0xce
   [<ffff828c8013de4a>] free_page_type+0x1c4/0x247
   [<ffff828c8013df63>] put_page_type+0x96/0x107
   [<ffff828c8013b409>] put_page_from_l3e+0x3f/0x4e
   [<ffff828c8013c06d>] free_l3_table+0x78/0xc4
   [<ffff828c8013de5a>] free_page_type+0x1d4/0x247
   [<ffff828c8013df63>] put_page_type+0x96/0x107
   [<ffff828c80131680>] relinquish_memory+0xce/0x262
   [<ffff828c80131a94>] domain_relinquish_resources+0xd1/0x1b0
   [<ffff828c80105638>] domain_kill+0x77/0x164
   [<ffff828c8010479f>] do_domctl+0x4dd/0xc1e
   [<ffff828c801bdd50>] compat_tracing_off+0xb/0x64

Pagetable walk from ffff8cd5169fd068:
 L4[0x119] = 0000000000000000 ffffffffffffffff

****************************************
Panic on CPU 0:
FATAL PAGE FAULT
[error_code=0000]
Faulting linear address: ffff8cd5169fd068
****************************************

Reboot in five seconds...

-Chris
On 22/4/08 19:19, "Christopher S. Aker" <caker@theshore.net> wrote:

> Here are two more Xen traces with this problem.  These always appear
> to occur after we're forced to destroy a domain.  The first trace is a
> DoubleDump<tm> and has something new in the second dump...
>
> http://www.theshore.net/~caker/xen/build-1.11/
>
> I still don't have a method to reproduce, but since we're hitting this
> with some frequency, would it be worth it to stick in some extra
> debugging now?

The second crash is just some overzealous asserting. Easily fixed but
also not very interesting, unfortunately.

The two main backtraces are exactly the same bug as you saw last time,
except in this case you have bogus nonsense in a pair of L2 pagetable
entries, whereas last time the garbage was in an L3 entry.

My best guess just now, seeing as no one else has reported ever seeing
this, is that maybe you have a bad driver or hardware corrupting memory?
Obviously that's a bit of a stab in the dark though.

Have you seen this particular type of crash on multiple different
machines? If so, are they different types of machine?

 -- Keir
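For a sense of what the "extra debugging" asked about above could look
like: the sketch below adds a validity check around the L2-entry put
path, so a garbage frame number gets logged instead of indexing
frame_table and faulting. The helper names (l2e_get_flags, l2e_get_pfn,
l2e_get_intpte, mfn_valid, PRIpte) are recalled from the 3.2-era Xen
tree; this is an illustrative sketch, not a patch that was posted to the
thread.

/* Hypothetical wrapper around the real put_page_from_l2e(); names and
 * signatures assumed from the Xen 3.2-era source, not verified against
 * this exact changeset. */
static int put_page_from_l2e_checked(l2_pgentry_t l2e, unsigned long pfn,
                                     struct domain *d)
{
    /* Nothing to drop if the entry is not present. */
    if ( !(l2e_get_flags(l2e) & _PAGE_PRESENT) )
        return 1;

    /* A frame number like 0xbdac808de6 fails this check long before
     * frame_table[mfn] is dereferenced. */
    if ( unlikely(!mfn_valid(l2e_get_pfn(l2e))) )
    {
        printk("d%d: bogus L2e %" PRIpte " in pagetable mfn %lx\n",
               d->domain_id, l2e_get_intpte(l2e), pfn);
        return 0;  /* leak the reference rather than crash the host */
    }

    put_page_from_l2e(l2e, pfn);
    return 1;
}

The trade-off is deliberate: leaking a page reference on a dying domain
is recoverable, whereas a fatal page fault in the hypervisor takes down
every guest on the box.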
Keir Fraser wrote:
> The second crash is just some overzealous asserting. Easily fixed but
> also not very interesting, unfortunately.
>
> The two main backtraces are exactly the same bug as you saw last time,
> except in this case you have bogus nonsense in a pair of L2 pagetable
> entries, whereas last time the garbage was in an L3 entry.
>
> My best guess just now, seeing as no one else has reported ever seeing
> this, is that maybe you have a bad driver or hardware corrupting
> memory? Obviously that's a bit of a stab in the dark though.
>
> Have you seen this particular type of crash on multiple different
> machines? If so, are they different types of machine?

Two machines thus far, both are of identical software and hardware
configurations.

Now that it looks like the 3ware issues have been corrected in post-Xen
3.1 dom0, I'll update our boxes from 2.6.16.x to 2.6.18.8 and hope for
the best.

Thanks for your help so far.

-Chris
On 22/4/08 20:39, "Christopher S. Aker" <caker@theshore.net> wrote:

>> My best guess just now, seeing as no one else has reported ever
>> seeing this, is that maybe you have a bad driver or hardware
>> corrupting memory? Obviously that's a bit of a stab in the dark
>> though.
>>
>> Have you seen this particular type of crash on multiple different
>> machines? If so, are they different types of machine?
>
> Two machines thus far, both are of identical software and hardware
> configurations.

Have you been running this type of workload on a variety of hardware, or
are you limited in the range of types of hardware that you're testing
on? This might indicate whether it is significant that you have only
seen the crash on a single hardware type.

 -- Keir
Keir Fraser wrote:
> On 22/4/08 20:39, "Christopher S. Aker" <caker@theshore.net> wrote:
>
>>> My best guess just now, seeing as no one else has reported ever
>>> seeing this, is that maybe you have a bad driver or hardware
>>> corrupting memory? Obviously that's a bit of a stab in the dark
>>> though.
>>>
>>> Have you seen this particular type of crash on multiple different
>>> machines? If so, are they different types of machine?
>>
>> Two machines thus far, both are of identical software and hardware
>> configurations.
>
> Have you been running this type of workload on a variety of hardware,
> or are you limited in the range of types of hardware that you're
> testing on? This might indicate whether it is significant that you
> have only seen the crash on a single hardware type.

Make that three machines.  They're all of the same config.  This
identical hardware config runs fine under non-Xen.  It also only occurs
when a domain is being destroyed, so I wouldn't suspect this is a driver
issue or memory corruption given the pattern.  Xen is most suspect, in
my mind.

Will you provide me with some debugging code that'll make these
occurrences more useful in tracking down the problem the next time it
triggers?

(XEN) Pagetable walk from 00000000c16e3f30:
(XEN)  L4[0x000] = 00000002bfe8d027 00000000000258e3
(XEN)  L3[0x003] = 646c696843206120 ffffffffffffffff
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 84 (vcpu#2) crashed on cpu#1:
(XEN) ----[ Xen-3.2.1-rc1  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    1
(XEN) RIP:    0061:[<00000000c0101347>]
(XEN) RFLAGS: 0000000000010246   CONTEXT: guest
(XEN) rax: 0000000000000000   rbx: 00000000deadbeef   rcx: 00000000deadbeef
(XEN) rdx: 00000000deadbeef   rsi: 00000000deadbeef   rdi: 00000000c7006030
(XEN) rbp: 00000000c16e3fac   rsp: 00000000c16e3f38   r8:  0000000000000000
(XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000
(XEN) r12: 0000000000000000   r13: 0000000000000000   r14: 0000000000000000
(XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000026b0
(XEN) cr3: 000000060f4c8000   cr2: 00000000c0101347
(XEN) ds: 007b   es: 007b   fs: 0000   gs: 0000   ss: 0069   cs: 0061
(XEN) Guest stack trace from esp=c16e3f38:
(XEN)   Fault while accessing guest memory.
(XEN) ----[ Xen-3.2.1-rc1  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    5
(XEN) RIP:    e008:[<ffff828c8013dee4>] put_page_type+0x17/0x107
(XEN) RFLAGS: 0000000000210282   CONTEXT: hypervisor
(XEN) rax: 000006162f512f98   rbx: ffff889a2f512f98   rcx: 6765746143206568
(XEN) rdx: 00000026f4620797   rsi: 00000000002bfe8d   rdi: ffff889a2f512f98
(XEN) rbp: ffff8300cfde7cb8   rsp: ffff8300cfde7c98   r8:  0000000000000000
(XEN) r9:  00000000deadbeef   r10: ffff828c801c5bf0   r11: 0000000000000000
(XEN) r12: 0000000000000001   r13: ffff889a2f512f98   r14: ffff8300cee88100
(XEN) r15: ffff8300cee88118   cr0: 000000008005003b   cr4: 00000000000026b0
(XEN) cr3: 000000062ffdf000   cr2: ffff889a2f512fb0
(XEN) ds: 007b   es: 007b   fs: 0000   gs: 0033   ss: 0000   cs: e008
(XEN) Xen stack trace from rsp=ffff8300cfde7c98:
(XEN)    ffff8300cfde7ca8 ffff889a2f512f98 0000000000000001 00000000002bfe8d
(XEN)    ffff8300cfde7cd8 ffff828c8013b409 ffff8300cee88100 ffff8302bfe8d008
(XEN)    ffff8300cfde7d08 ffff828c8013c06d ffff8300cee88100 ffff828406dfc608
(XEN)    0000000068000001 ffff8300cee890f8 ffff8300cfde7d38 ffff828c8013de5a
(XEN)    0000000060000001 0000000068000000 ffff828406dfc608 ffff8300cee890f8
(XEN)    ffff8300cfde7d68 ffff828c8013df63 ffff828406dfc608 ffff828406dfc608
(XEN)    ffff828406dfc608 ffff8300cee88100 ffff8300cfde7db8 ffff828c80131680
(XEN)    0000000088000000 0000000080000000 ffff8300cfde7f28 ffff8300cee88100
(XEN)    ffff8300cee88100 00000000b4dfb508 0000000000000000 0000000000000000
(XEN)    ffff8300cfde7dd8 ffff828c80131a94 ffff8300cee88100 0000000000000000
(XEN)    ffff8300cfde7e08 ffff828c80105638 ffff8300cfde7e08 ffff828c8014601a
(XEN)    00000000b4dfb508 fffffffffffffff3 ffff8300cfde7f08 ffff828c8010479f
(XEN)    0000000000000001 0000000000000000 0000000000000001 0000000000000000
(XEN)    ffff8300cfde7e68 0000000000200286 0000000500000002 082ebba4b7b80054
(XEN)    0836d2a401dfb538 b7ddfc50b7b8f68c b7aa53e400000001 00000001b7a2ecdc
(XEN)    080facafb4dfb568 081361e0082f17c0 080797e7b775bf0c 00000000b775bf28
(XEN)    b7dda02c00000060 b76f084c00000000 0805946cb4dfb588 b7dda02cb76f084c
(XEN)    b7ddd6a000000000 00000002b765eeac a5dba1eea5dba1ee 0000001f00000000
(XEN)    0000000000000010 ffff8300cee3c100 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 00007cff302180b7 ffff828c801bdd50
(XEN) Xen call trace:
(XEN)    [<ffff828c8013dee4>] put_page_type+0x17/0x107
(XEN)    [<ffff828c8013b409>] put_page_from_l3e+0x3f/0x4e
(XEN)    [<ffff828c8013c06d>] free_l3_table+0x78/0xc4
(XEN)    [<ffff828c8013de5a>] free_page_type+0x1d4/0x247
(XEN)    [<ffff828c8013df63>] put_page_type+0x96/0x107
(XEN)    [<ffff828c80131680>] relinquish_memory+0xce/0x262
(XEN)    [<ffff828c80131a94>] domain_relinquish_resources+0xd1/0x1b0
(XEN)    [<ffff828c80105638>] domain_kill+0x77/0x164
(XEN)    [<ffff828c8010479f>] do_domctl+0x4dd/0xc1e
(XEN)    [<ffff828c801bdd50>] compat_tracing_off+0xb/0x64
(XEN)
(XEN) Pagetable walk from ffff889a2f512fb0:
(XEN)  L4[0x111] = 0000000000000000 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 5:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: ffff889a2f512fb0
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...

Thanks,
-Chris
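One detail worth flagging in the dumps above, offered as an observation
rather than something stated in the thread: the bogus L3 entry value
646c696843206120 from the guest's pagetable walk, and the rcx value
6765746143206568 from the hypervisor panic, are both runs of printable
ASCII when read as little-endian bytes. That would be consistent with
the memory-corruption theory raised earlier, with string data landing on
a pagetable page. A few lines of C verify the decoding:

#include <stdio.h>

int main(void)
{
    /* The two suspicious 64-bit values from the traces above. */
    unsigned long vals[] = { 0x646c696843206120UL,   /* L3[0x003] entry */
                             0x6765746143206568UL }; /* rcx at panic    */
    for (int i = 0; i < 2; i++) {
        for (int b = 0; b < 8; b++)
            putchar((vals[i] >> (8 * b)) & 0xff);    /* low byte first */
        putchar('\n');
    }
    return 0;
}

It prints " a Child" and "he Categ", i.e. fragments of ordinary text
sitting where pagetable entries should be.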
On 28/4/08 15:02, "Christopher S. Aker" <caker@theshore.net> wrote:

> Make that three machines.  They're all of the same config.  This
> identical hardware config runs fine under non-Xen.  It also only
> occurs when a domain is being destroyed, so I wouldn't suspect this is
> a driver issue or memory corruption given the pattern.  Xen is most
> suspect, in my mind.
>
> Will you provide me with some debugging code that'll make these
> occurrences more useful in tracking down the problem the next time it
> triggers?

I suggest you try reproducing on a slightly different hardware
configuration. For example, a different storage controller. Did you
repro with a 2.6.18 dom0 yet?

 -- Keir
Keir Fraser wrote:
> On 28/4/08 15:02, "Christopher S. Aker" <caker@theshore.net> wrote:
>
>> Will you provide me with some debugging code that'll make these
>> occurrences more useful in tracking down the problem the next time it
>> triggers?
>
> I suggest you try reproducing on a slightly different hardware
> configuration. For example, a different storage controller. Did you
> repro with a 2.6.18 dom0 yet?

All of our machines are using 3ware RAID cards, so trying this on
alternate hardware isn't an option.

We haven't hit this on 2.6.18 dom0 yet.  Newly deployed machines and
boxes that crash are being updated to 2.6.18 dom0.  I'll keep you
posted :)

Thanks,
-Chris