Hi Folks,

After trials and tribulations getting my serial ports in order, I'm finally able to report on a Xen crash I've been seeing.

I have two domUs that share & map grant references with each other, then both domUs crash. Shortly thereafter, Xen crashes too.

With dead domUs, the grant references they once held are left orphaned. Xen isn't provisioned to mop this up, hence the well-placed BUG(rd==NULL). Make any sense? I'd be over my head to submit a patch for this, so I'm hoping for help.

Thanks,
-steve

I did a pull this morning.

(XEN) (file=grant_table.c, line=936) Grant release (0) ref:(605) flags:(6) dom:(1)
(XEN) BUG at grant_table.c:939
(XEN) CPU:    0
(XEN) EIP:    e008:[<ff10c67e>] gnttab_release_mappings+0x110/0x34d
(XEN) EFLAGS: 00210282   CONTEXT: hypervisor
(XEN) eax: ff1922d8   ebx: ffbe7080   ecx: 00000000   edx: 00000000
(XEN) esi: 00000007   edi: ff1a3fac   ebp: ff1a3dcc   esp: ff1a3d94
(XEN) cr0: 80050033   cr3: ddd58000
(XEN) ds: e010   es: e010   fs: 0000   gs: 0033   ss: e010   cs: e008
(XEN) Xen stack trace from esp=ff1a3d94:
(XEN)    ff180795 ff18087f 000003ab 00000000 0000025d 00000006 00000001 ff12cd4c
(XEN)    ffbae080 00000000 00000000 0000025d ffbac000 ffbad700 ff1a3dec ff105a74
(XEN)    ffbae080 ffbae318 00000001 ffbae080 ffbae080 00000000 ff1a3f8c ff104474
(XEN)    ffbae080 ff1a3edc 00000000 ff117a4b ff1a3fb4 ff1a3e68 ff15d931 00000004
(XEN)    00000003 80000004 80000004 00000004 80000003 80000004 fc455838 00000000
(XEN)    ff1a3fb4 ffbae080 ff15ddf0 80000003 ff1fa690 ff1fa480 ff1fa080 00000000
(XEN)    b0000002 80000004 80000004 80000004 80000003 ff1a3ed8 ff139611 ff1fa088
(XEN)    ff1a3ea8 00000008 80000004 fd8c8298 a0000000 00003902 00000000 fec08030
(XEN)    00000008 ff1a3ea8 fec08030 00000000 fc455830 cd604202 08dff21f 80000004
(XEN)    fec08000 00000006 00003902 00003902 ff1fa080 cd60ffff 08dff21f 03902030
(XEN)    00000000 00000000 00000000 ff1a3ee4 00000009 03000000 00000002 08102424
(XEN)    00000000 081fca4c b78fc9fc 080e7f12 b7b44dfc ffffffff b7c70264 08102424
(XEN)    081d5a40 081fc8f4 b78fca8c 080a9ba7 081fc8f4 b7c72b20 b7c6257c 08074694
(XEN)    b7c773e0 b7c76a44 b78fca3c 08074be5 b7c76a44 081fb970 0000001f 00000000
(XEN)    a5dba235 a5dba1ee b7c9a420 b7f4639c b7b2feec b7b3c2ac 00000002 ff1a9d00
(XEN)    ffffffff ff1a3f88 ff144869 ffffffea ffbe7080 ff1a3fac 00e5c037 ff15d31f
(XEN)    b78fc9c4 deadbeef deadbeef deadbeef deadbeef deadbeef c02ccb6e 00000007
(XEN)    b78fc9c4 081fca50 081fca40 08102424 b7b44e18 ec448000 00000007 00070000
(XEN)    c02ccb6e 00000061 00200246 ec449d38 00000069 0000007b 0000007b 00000000
(XEN)    00000033 00000000 ffbe7080
(XEN) Xen call trace:
(XEN)    [<ff10c67e>] gnttab_release_mappings+0x110/0x34d
(XEN)    [<ff105a74>] domain_kill+0x6d/0xa9
(XEN)    [<ff104474>] do_dom0_op+0x73f/0x16d0
(XEN)    [<ff15d31f>] hypercall+0x8f/0xaf
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) CPU0 FATAL TRAP: vector = 6 (invalid operand)
(XEN) [error_code=0000]
(XEN) ****************************************

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
On 23 Jan 2006, at 21:13, King, Steven R wrote:

> After trials and tribulations getting my serial ports in order, I'm
> finally able to report on a Xen crash I've been seeing.
>
> I have two domUs that share & map grant references with each other,
> then both domUs crash. Shortly thereafter, Xen crashes too.
>
> With dead domUs, the grant references they once held are left orphaned.
> Xen isn't provisioned to mop this up, hence the well placed
> BUG(rd==NULL). Make any sense? I'd be over my head to submit a patch
> for this, so I'm hoping for help.

Let's say domain A maps pages belonging to domain B (implying that B granted access to A). If B crashes then the domain should stick around as a zombie until all mappings of its pages by other domains have gone away. This should mean that if A crashes or dies and calls gnttab_release_mappings(), it should not be possible for B to have disappeared at that point, and find_domain_by_id() should succeed.

Neither of your domains is running on shadow page tables, are they?

 -- Keir
Hi Keir, thanks for helping.

I'm not using shadow tables. Before the Xen crash, Xen kills both of my domUs via arch/x86/mm.c line 614. Apparently my domUs are misbehaving by not explicitly unmapping granted pages. Because of the domain_crash() call in mm.c, no form of put_page() is ever called for the mapping. Is this info helpful?
-steve

-----Original Message-----
From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Keir Fraser
Sent: Tuesday, January 24, 2006 3:43 AM
To: King, Steven R
Cc: xen-devel
Subject: Re: [Xen-devel] BUG grant_table.c line 939
On 24 Jan 2006, at 17:03, King, Steven R wrote:

> Hi Keir, thanks for helping.
> I'm not using shadow tables. Before the Xen crash, Xen kills both of my
> domUs via arch/x86/mm.c line 614. Apparently my domUs are misbehaving
> by not explicitly unmapping granted pages. Because of the
> domain_crash() call in mm.c, no form of put_page() is ever called for
> the mapping. Is this info helpful?
> -steve

The domain is not immediately killed when it crashes, and should still be returned by find_domain_by_id(), for example. When this crashed domain is finally killed by the control tools, it will then have all its page tables destructed, and that should find and destroy any foreign mappings (calling put_page() as appropriate). See domain_relinquish_resources().

 -- Keir
Hi Keir,

The destroyed domain does linger as a zombie, just not quite long enough to avoid the crash. ;^)

The sequence of events:
1) domA shares page with domB
2) domB shares pages with domA
3) domA maps domB's page
4) domB maps domA's page
5) DomA crashes, is torn down and lingers as a zombie. No BUG_ON(rd == NULL) occurs here, since the rd is domB, which is still healthy.
6) DomB crashes, Xen starts the tear-down.
7) Xen calls domain_destroy() on domA !!!!!
8) Xen crashes in gnttab_release_mappings() with the BUG_ON(rd == NULL)

So, why does Xen call domain_destroy() on domA? I put a BUG_ON(1) in domain_destroy(), and lo, the console produced the output below. My guess is that domA is destroyed via some callback waiting for its zombified mapping to be released.

Thanks,
-steve

(XEN) domain_destroy() ENTER
(XEN) BUG at domain.c:274
(XEN) CPU:    0
(XEN) EIP:    e008:[<ff105f48>] domain_destroy+0x72/0x197
(XEN) EFLAGS: 00210282   CONTEXT: hypervisor
(XEN) eax: ff192358   ebx: ffbe2080   ecx: 00000000   edx: 00000000
(XEN) esi: ffbe2080   edi: ff1a3fac   ebp: ff1a3aac   esp: ff1a3a84
(XEN) cr0: 80050033   cr3: d8c81000
(XEN) ds: e010   es: e010   fs: 0000   gs: 0033   ss: e010   cs: e008
(XEN) Xen stack trace from esp=ff1a3a84:
(XEN)    ff1802ed ff1802e4 00000112 ff113996 fd39f150 ff192228 fd21f9c0 ff1132e5
(XEN)    ff192224 011a3fb4 ff1a3aec ff11339d ffbe2080 ff192228 00000000 fd2b3f40
(XEN)    00000000 ffbae080 00000000 00000001 00000000 ffbe2080 00000001 00000001
(XEN)    ffbe7080 00000007 ff1a3b5c ff13b12f fd39f150 00000000 00000001 00000001
(XEN)    00000004 00000004 00000001 00000001 00000004 00000000 00000001 fd39f158
(XEN)    00000000 ff1fa588 ff1a3b8c ff15dcb3 ff1fa488 f0000001 00000001 00000001
(XEN)    00000001 00000000 00000001 00000001 00000001 00000000 ff1a3b8c ff134e51
(XEN)    fd39f150 ff1fa6b8 ff1fa480 ff1fa080 00000000 00000000 000003a4 ffbe2080
(XEN)    fd39f150 000a6a0e ff1a3bac ff13529f a6a0e861 ffbae080 00000182 fefa4000
(XEN)    000a5baf ffbae080 ff1a3bdc ff135c0c fd389868 00000000 ff1cab18 ffbe7e80
(XEN)    00000000 ffbae080 20000000 00000001 ffbe7080 ffbae080 ff1a3c4c ff135e44
(XEN)    fd389868 333c0001 ff1a3c1c 00000082 00000004 00000004 333c0001 333c0001
(XEN)    00000004 233c0001 333c0001 fd389878 ff1fa488 ff1fa514 00000364 ff1a3c30
(XEN)    ff1304f4 00000003 00000003 ff1a3c50 95ca0063 233b0001 333c0001 333c0001
(XEN)    333c0001 333c0000 ff1a3c5c ff13b25c fd389868 00000000 ff1a3c6c ff134e97
(XEN)    fd389868 00000007 ff1a3c8c ff135318 a5baf067 00095ca0 ff1a3c8c 0000033c
(XEN)    fef64000 00095ca0 ff1a3cbc ff135c19 fd20af00 00000042 00000043 ff12f2ea
(XEN)    ff1a3ca8 ffbe7080 40000000 ff1a3cc0 ff13f9a0 ffbae080 ff1a3d2c ff135e44
(XEN)    fd20af00 57ff0001 ff19350c ff1a3ce0 00000004 00000004 57ff0001 57ff0001
(XEN)    00000004 47ff0001 57ff0001 fd20af10 00000004 00000001 00000002 fd20af20
(XEN) Xen call trace:
(XEN)    [<ff105f48>] domain_destroy+0x72/0x197
(XEN)    [<ff11339d>] free_domheap_pages+0x462/0x469
(XEN)    [<ff13b12f>] put_page+0xdd/0xdf
(XEN)    [<ff134e51>] put_page_from_l1e+0x113/0x115
(XEN)    [<ff13529f>] free_l1_table+0x65/0x79
(XEN)    [<ff135c0c>] free_page_type+0x17a/0x1e1
(XEN)    [<ff135e44>] put_page_type+0x1d1/0x2b0
(XEN)    [<ff13b25c>] put_page_and_type+0x11/0x1e
(XEN)    [<ff134e97>] put_page_from_l2e+0x44/0x46
(XEN)    [<ff135318>] free_l2_table+0x65/0x79
(XEN)    [<ff135c19>] free_page_type+0x187/0x1e1
(XEN)    [<ff135e44>] put_page_type+0x1d1/0x2b0
(XEN)    [<ff12d230>] put_page_and_type+0x11/0x1e
(XEN)    [<ff12ca99>] relinquish_memory+0xa4/0x1ea
(XEN)    [<ff12cd8c>] domain_relinquish_resources+0x1ad/0x1af
(XEN)    [<ff105a69>] domain_kill+0x62/0xa9
(XEN)    [<ff104474>] do_dom0_op+0x73f/0x16d0
(XEN)    [<ff15d35f>] hypercall+0x8f/0xaf
(XEN)
On 25 Jan 2006, at 21:07, King, Steven R wrote:

> So, why does Xen call domain_destroy() on domA? I put a BUG_ON(1) in
> domain_destroy(), and lo, the console produced the output below. My
> guess is that domA is destroyed via some callback waiting for its
> zombified mapping to be released.

Ah, I think I see the problem. Can you please try switching the order of the calls to domain_relinquish_resources() and gnttab_release_mappings() in common/domain.c:domain_kill()? i.e., the new order should be:

  gnttab_release_mappings(d);
  domain_relinquish_resources(d);

I think that should fix the crash.

 -- Keir
Thanks Keir, that fixed it!!

So what was the problem? Was a callback waiting to destroy domA?
On 25 Jan 2006, at 23:54, King, Steven R wrote:

> Thanks Keir, that fixed it!!
>
> So what was the problem? Was a callback waiting to destroy domA?

Host mappings via grant tables are refcounted via the mapping PTEs, not directly via the maptrack table. So if we destroy page tables first, we can drop all refcounts before we call gnttab_release_mappings(), and the maptrack can have dangling domain ids at that point.

 -- Keir