Chris Lalancette
2007-Jan-09 21:16 UTC
[Xen-devel] Tracking "Cannot allocate memory" error in shadow_alloc_p2m_table
Hello, I''ve been working on tracking down a "Cannot allocate memory" error when trying to start FV domains. I finally found a 16GB box where I could reliably reproduce the problem. By turning on trace debugging and adding a few of my own prints, I was able to see that the error came from this code path: shadow_alloc_p2m_table() -> shadow_set_p2m_entry -> p2m_next_level -> p2m_find_entry (all in arch/x86/mm/shadow/common.c). What''s happening is that the gfn_remainder passed into p2m_find_entry is something like 0x3a3815 which, when shifted by shift (which would happen to be 18 in the case of the 3rd-level page table in i686 PAE), it would end up being larger than the max (which is 8), and hence causing the failure. Looking deeper into it, there is something I really don''t understand, though. shadow_alloc_p2m_table fetches each page in the page_list, then gets the page struct, then converts to mfn using page_to_mfn, and finally gets the gfn using get_gpfn_from_mfn. get_gpfn_from_mfn is defined in include/asm-x86/mm.h, and looks to be just pulling the mfn -> gpfn mapping out of the machine_to_phys_mapping table. The problem, as I see it, is that no one ever put a valid entry into machine_to_phys_mapping, so the data returned there is bogus. Once I commented out this section of code in shadow_alloc_p2m_table(): /* for ( entry = d->page_list.next; entry != &d->page_list; entry = entry->next ) { page = list_entry(entry, struct page_info, list); mfn = page_to_mfn(page); gfn = get_gpfn_from_mfn(mfn_x(mfn)); page_count++; if ( #ifdef __x86_64__ (gfn != 0x5555555555555555L) #else (gfn != 0x55555555L) #endif && gfn != INVALID_M2P_ENTRY && !shadow_set_p2m_entry(d, gfn, mfn) ) goto error; } */ I was able to successfully start some FV domains. 1) Am I missing something here? Is there some sort of initialization of the machine_to_phys_mapping table that I missed? 2) Why do we need the above code during the creation of a new domain? I would think there aren''t any valid pages in the page_list we would have to worry about at that point. Of course, I am by no means a shadow table expert, so I''m sure I''m missing something. Any ideas? Thanks, Chris Lalancette _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Tim Deegan
2007-Jan-10 11:32 UTC
Re: [Xen-devel] Tracking "Cannot allocate memory" error in shadow_alloc_p2m_table
Hi, At 16:16 -0500 on 09 Jan (1168359405), Chris Lalancette wrote:> What''s happening is that the gfn_remainder passed into p2m_find_entry > is something like 0x3a3815 which, when shifted by shift (which would > happen to be 18 in the case of the 3rd-level page table in i686 PAE), > it would end up being larger than the max (which is 8), and hence > causing the failure.Are you giving this domain more than 7GB of RAM on a PAE hypervisor?> 1) Am I missing something here? Is there some sort of initialization of the machine_to_phys_mapping > table that I missed?The code you cut out is start-of-day code that builds the p2m map of a domain from the m2p entries of its allocated pages. It was needed originally because the domain builder tools would set up the guest''s memory and m2p mapping and _then_ enable shadow-translate mode. It may not be necessary now that HVM domains are put in shadow mode at creation time. AFAICS, either the domain should have no memory assigned yet (in which case it does nothing), or the domain''s pages should have m2p entries that are valid, explicitly invalid, or set to the "debug pattern" of all 5s. I''ll look at a more sensible failure mode then -ENOMEM, though. Cheers, Tim. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Chris Lalancette
2007-Jan-10 15:41 UTC
Re: [Xen-devel] Tracking "Cannot allocate memory" error in shadow_alloc_p2m_table
Tim Deegan wrote:> Hi, > > At 16:16 -0500 on 09 Jan (1168359405), Chris Lalancette wrote: > >>What''s happening is that the gfn_remainder passed into p2m_find_entry >>is something like 0x3a3815 which, when shifted by shift (which would >>happen to be 18 in the case of the 3rd-level page table in i686 PAE), >>it would end up being larger than the max (which is 8), and hence >>causing the failure. > > > Are you giving this domain more than 7GB of RAM on a PAE hypervisor?Yes, the Hypervisor is i686 PAE, but no, the FV domain that I am trying to start up only has 512MB of memory.> > >>1) Am I missing something here? Is there some sort of initialization of the machine_to_phys_mapping >>table that I missed? > > > The code you cut out is start-of-day code that builds the p2m map of a > domain from the m2p entries of its allocated pages. It was needed > originally because the domain builder tools would set up the guest''s > memory and m2p mapping and _then_ enable shadow-translate mode. It may > not be necessary now that HVM domains are put in shadow mode at creation > time. > > AFAICS, either the domain should have no memory assigned yet (in which > case it does nothing), or the domain''s pages should have m2p entries > that are valid, explicitly invalid, or set to the "debug pattern" of all > 5s. I''ll look at a more sensible failure mode then -ENOMEM, though.Ah, that makes sense. Yeah, I was thinking that the other way to solve this was to initialize the whole machine_to_phys_mapping table to something sensible, but then I just didn''t even see why it needed to be done at this early stage, so I took the code out. Either way is fine with me...I would just like to see the FV domains start working on this large memory machine :). Thanks, Chris Lalancette _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Steven Rostedt
2007-Jan-12 05:13 UTC
Re: [Xen-devel] Tracking "Cannot allocate memory" error in shadow_alloc_p2m_table
Tim Deegan wrote:> The code you cut out is start-of-day code that builds the p2m map of a > domain from the m2p entries of its allocated pages. It was needed > originally because the domain builder tools would set up the guest''s > memory and m2p mapping and _then_ enable shadow-translate mode. It may > not be necessary now that HVM domains are put in shadow mode at creation > time. > > AFAICS, either the domain should have no memory assigned yet (in which > case it does nothing), or the domain''s pages should have m2p entries > that are valid, explicitly invalid, or set to the "debug pattern" of all > 5s. I''ll look at a more sensible failure mode then -ENOMEM, though. >OK, I''ve been debugging the hell out of this one. And I don''t see where the m2p mapping is updated for the guest yet! I put in prints to show me where the m2p is updated for the guest and this seems to have shown me that the m2p mapping doesn''t know about the new guest at this point. I used my internal log tracer (ring buffer) to map output, and this is what it showed me. Note: I''m using RHEL5 HV for this, but it goes the same for Xen. (XEN) [ 98.395925] cpu:0 share_xen_page_with_guest:264 guest 1 mapping pages page=f6811280 mfn=2928 from:ff106fa2 (XEN) [ 98.395927] cpu:0 share_xen_page_with_guest:264 guest 1 mapping pages page=f6811298 mfn=2929 from:ff106fa2 (XEN) [ 98.395928] cpu:0 share_xen_page_with_guest:264 guest 1 mapping pages page=f68112b0 mfn=2930 from:ff106fa2 (XEN) [ 98.395929] cpu:0 share_xen_page_with_guest:264 guest 1 mapping pages page=f68112c8 mfn=2931 from:ff106fa2 (XEN) [ 98.395936] cpu:0 share_xen_page_with_guest:264 guest 1 mapping pages page=f68112f8 mfn=2933 from:ff11db10 The above is not filtered (meaning that I didn''t put in any limit to the amount of prints it will make, or snip it from this email) so the above is the only calls to share_xen_page_with_guest (that updates m2p mapping) before . These show me that this was called by grant_table_create (0xff106fa2), and arch_domain_create (0xff11db10). (XEN) [ 99.008441] cpu:1 set_sh_allocation:1201 alloc_domheap_pages (1) pages=768 (XEN) [ 99.008444] cpu:1 set_sh_allocation:1205 arch total_pages=0 The above shows that we are going to call alloc_domheap_pages next (768 pages to allocate) from set_sh_allocation, and the total_pages currently set by the domain is zero. This only allocates the pages for the domain (puts it on its d->page_list), but it does not modify the m2p mapping here. Perhaps we should have this invalidate the m2p mapping for the page? (XEN) [ 99.054858] cpu:0 shadow_alloc_p2m_table:1040 page=f7049e70 gfn=2669a mfn=362138 cnt=0 (XEN) [ 99.054861] cpu:0 shadow_alloc_p2m_table:1040 page=f73b3ba8 gfn=752e1 mfn=511271 cnt=1 (XEN) [ 99.054867] cpu:0 shadow_alloc_p2m_table:1040 page=f7049e58 gfn=26699 mfn=362137 cnt=2 (XEN) [ 99.054868] cpu:0 shadow_alloc_p2m_table:1040 page=f7049e40 gfn=26698 mfn=362136 cnt=3 (XEN) [ 99.054869] cpu:0 shadow_alloc_p2m_table:1040 page=f73b3c98 gfn=752d7 mfn=511281 cnt=4 The above is next, but I do filter it, since it prints out for every page. mfn is in decimal (don''t ask why, it was my first default) gfn''s will be in hex, and page is the pointer. This is the code that we are having a problem with. Basically, we are updating the shadow page tables with bogus memory! Unless we wanted the 5 pages that were mapped for the grant table, and the one for the arch setup, but I don''t even think those are in this. Why do we need that code, it''s using stale data, and updating the shadow page tables with a m2p mapping that is from a older domain. See more below: [...] (XEN) [ 99.054911] cpu:0 shadow_alloc_p2m_table:1040 page=f73b2000 gfn=2000 mfn=510976 cnt=49 (XEN) [ 99.076309] cpu:0 shadow_guest_physmap_add_page:2901 guest 1 calling sh_p2m_remove_page ogfn=2669a (XEN) [ 99.076317] cpu:0 sh_p2m_remove_page:2832 guest 1 mapping pages invalid(XEN) [ 99.076319] cpu:0 shadow_guest_physmap_add_page:2912 guest 1 mapping pages gfn=0 mfn=362138 from:ff12d10f (XEN) [ 99.076321] cpu:0 shadow_guest_physmap_add_page:2901 guest 1 calling sh_p2m_remove_page ogfn=752e1 (XEN) [ 99.076322] cpu:0 sh_p2m_remove_page:2832 guest 1 mapping pages invalid(XEN) [ 99.076323] cpu:0 shadow_guest_physmap_add_page:2912 guest 1 mapping pages gfn=1 mfn=511271 from:ff12d10f (XEN) [ 99.076324] cpu:0 shadow_guest_physmap_add_page:2901 guest 1 calling sh_p2m_remove_page ogfn=26699 (XEN) [ 99.076326] cpu:0 sh_p2m_remove_page:2832 guest 1 mapping pages invalid(XEN) [ 99.076327] cpu:0 shadow_guest_physmap_add_page:2912 guest 1 mapping pages gfn=2 mfn=362137 from:ff12d10f Finally we are calling shadow_guest_physmap_add_page, and this is called from ff12d10f or do_mmu_update which I believe (correct me if I''m wrong) is being called by xend and friends. First thing that is done, is that it *removes* the m2p mapping (sh_p2m_remove_page). It actually does both, remove the shadow mapping that we set up before, and it invalidates the m2p mapping. We then set up this page to be the correct shadow mapping, with the call to shadow_guest_physmap_add_page and this also updates to the correct m2p mapping. So in conclusion, I don''t see why we need the code that does this bogus mapping, and that the patch that Chris proposed, in removing that code looks like the better approach. If you think I''m wrong, please feel free to convince me. I''ve been wrong before, but I don''t thing that we should have code that just undoes itself. Oh, I also tried this logging after removing the code, and I got this at the end: (XEN) [ 52.764736] cpu:1 shadow_guest_physmap_add_page:2914 guest 1 mapping pages gfn=0 mfn=346470 from:ff12d10f (XEN) [ 52.764739] cpu:1 shadow_guest_physmap_add_page:2914 guest 1 mapping pages gfn=1 mfn=346467 from:ff12d10f (XEN) [ 52.764741] cpu:1 shadow_guest_physmap_add_page:2914 guest 1 mapping pages gfn=2 mfn=346465 from:ff12d10f (XEN) [ 52.764742] cpu:1 shadow_guest_physmap_add_page:2914 guest 1 mapping pages gfn=3 mfn=346462 from:ff12d10f (XEN) [ 52.764744] cpu:1 shadow_guest_physmap_add_page:2914 guest 1 mapping pages gfn=4 mfn=346402 from:ff12d10f (XEN) [ 52.764749] cpu:1 shadow_guest_physmap_add_page:2914 guest 1 mapping pages gfn=5 mfn=346426 from:ff12d10f Notice that there''s no effort in removing the pages before setting them up. -- Steve _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Tim Deegan
2007-Jan-12 11:07 UTC
Re: [Xen-devel] Tracking "Cannot allocate memory" error in shadow_alloc_p2m_table
At 00:13 -0500 on 12 Jan (1168560825), Steven Rostedt wrote:> Why do we need that code, it''s using stale data, and updating the shadow > page tables with a m2p mapping that is from a older domain.The domain builder for translated PV guests still populates the physmap before enabling shadow-translate mode, so needs this init code. Now that paravirt-ops kernels use proper mmu operations that can probably be fixed up. In the meantime, cset 13326:dc0638bb4628 of xen-unstable should have stopped it from killing the guest if it finds an entry that''s too big. Cheers, Tim. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Steven Rostedt
2007-Jan-12 13:05 UTC
Re: [Xen-devel] Tracking "Cannot allocate memory" error in shadow_alloc_p2m_table
Tim Deegan wrote:> At 00:13 -0500 on 12 Jan (1168560825), Steven Rostedt wrote: >> Why do we need that code, it''s using stale data, and updating the shadow >> page tables with a m2p mapping that is from a older domain. > > The domain builder for translated PV guests still populates the physmap > before enabling shadow-translate mode, so needs this init code. Now > that paravirt-ops kernels use proper mmu operations that can probably be > fixed up. > > In the meantime, cset 13326:dc0638bb4628 of xen-unstable should have > stopped it from killing the guest if it finds an entry that''s too big. >Is there a way to tell if this is a translated PV guest? I''d rather not do this code at all if it is not needed. We are mapping pages into a guess from junk data. I hate to find out later on that something is missed, and we allow for the guest to have access to memory it should not have. -- Steve _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Steven Rostedt
2007-Jan-12 14:36 UTC
Re: [Xen-devel] Tracking "Cannot allocate memory" error in shadow_alloc_p2m_table
OK, Herbert has been looking into this too, and he''s discovered that it''s a problem with the RHEL5 snapshot. The allocation of the shadow_enable is done later on in RHEL5 and thus the d->page_list is not empty, where as in the xen-unstable/testing version, shadow_enable is called in the hypervisor, so it is not affected by this bug. Thanks, -- Steve _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Tim Deegan
2007-Jan-12 14:37 UTC
Re: [Xen-devel] Tracking "Cannot allocate memory" error in shadow_alloc_p2m_table
At 08:05 -0500 on 12 Jan (1168589116), Steven Rostedt wrote:> Is there a way to tell if this is a translated PV guest? I''d rather not > do this code at all if it is not needed.It just needs the PV loader to be tweaked, and we can drop it entirely.> We are mapping pages into a > guess from junk data. I hate to find out later on that something is > missed, and we allow for the guest to have access to memory it should > not have.That''s not a risk. The MFNs we''re putting into the p2m map are definitely correct: they''re taken from the domain''s page list. It''s just that they''re going in at the wrong guest-physical addresses. Tim. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel