Simon Graham
2010-Apr-15 18:36 UTC
[Xen-devel] Re: [Xen-users] rebased openSUSE Xen dom0 Patches
On Tue, Apr 06, 2010 at 02:37:35PM +0100, Andrew Lyon wrote:
> I've uploaded updated 2.6.31 and 2.6.32 rebased openSUSE Xen dom0
> patches and ebuilds to
> http://code.google.com/p/gentoo-xen-kernel/downloads/list
>
> Notable change is that both include the online resize feature recently
> posted to xen-devel.
>

We're currently testing these patches with the Ubuntu 10.4 kernel and Xen 3.4.2 and have encountered some problems in the Xen mm code shortly after Dom0 starts to run.

The typical failure is that Dom0 is trying to write to a page table (an L4 PT in the example below) but finds the page it is writing to is a PGT_writable_page:

...
(XEN) mm.c:2410:d0 Bad type (saw e800000000000001 != exp 8000000000000000) for mfn 0x114702 (pfn 0x702)
(XEN) mm.c:2802:d0 Error while pinning mfn 114702
[ 18.549043] HYPERVISOR_mmuext_op failed: pgd ffff880000703000 cmd0=3 mfn0=114702 cmd1=3 mfn1=114703 err=-22
[ 18.549689] ------------[ cut here ]------------
[ 18.549975] kernel BUG at /sandbox/orc-tree-10.4/orc-kernel/linux-2.6.32/arch/x86/mm/hypervisor.c:680!
[ 18.550542] invalid opcode: 0000 [#1] SMP

Another common failure is shown below. Dom0 expects a page to be PGT_l1_page_table but instead finds the page's type is PGT_writable_page:

(XEN) mm.c:2410:d0 Bad type (saw e800000000000001 != exp 2000000000000000) for mfn 0x114711 (pfn 0x711)
(XEN) mm.c:2413:d0 Writable page alloc'd from ptwr_do_page_fault:4481
(XEN) mm.c:821:d0 Attempt to create linear p.t. with write perms
[ 21.159495] HYPERVISOR_multicall_check failed on call 1 rc 4294967274
[ 21.159498] HYPERVISOR_multicall_check failed page=ffff8800020bbc00 pfn=1808
[ 21.159503] ------------[ cut here ]------------
[ 21.159505] kernel BUG at /sandbox/orc-tree-10.4/orc-kernel/linux-2.6.32/arch/x86/mm/hypervisor.c:480!

It is not clear whether Dom0 is erroneously using pages that are, and truly should be, writable, or whether pages have somehow erroneously become writable. If anyone can suggest likely places to look for problems, that would be awesome!

Simon

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
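For readers following the logs above, here is a minimal sketch of how the numbers can be read. It assumes, based only on the values in these messages (0x2000... expected for an L1 table, 0x8000... for an L4 table, 0xe800... seen for a writable page), that the page type sits in the top three bits of the 64-bit word. The helper names are hypothetical and are not Xen's. It also shows that the multicall "rc 4294967274" is the same -22 (-EINVAL on Linux) reported by the first failure, just printed as an unsigned 32-bit value.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* ASSUMPTION: the type field is the top 3 bits of the word. This is
 * inferred from the "saw"/"exp" values in the logs, not taken from Xen. */
#define TYPE_SHIFT 61

/* Extract the (assumed) 3-bit type field from a logged type word. */
unsigned type_field(uint64_t type_info)
{
    return (unsigned)(type_info >> TYPE_SHIFT);
}

/* Name only the three types that this thread identifies. */
const char *type_name(uint64_t type_info)
{
    switch (type_field(type_info)) {
    case 1:  return "PGT_l1_page_table";
    case 4:  return "PGT_l4_page_table";
    case 7:  return "PGT_writable_page";
    default: return "(not named in this thread)";
    }
}

/* Reinterpret an unsigned 32-bit return code as a signed value
 * without relying on implementation-defined conversions. */
int32_t rc_as_signed(uint32_t rc)
{
    if (rc <= (uint32_t)INT32_MAX)
        return (int32_t)rc;
    return -(int32_t)(UINT32_MAX - rc) - 1;
}

#ifdef DEMO_MAIN /* build with -DDEMO_MAIN for a standalone demo */
int main(void)
{
    const uint64_t saw = 0xe800000000000001ULL;

    assert(type_field(saw) == 7);
    printf("saw: %s\n", type_name(saw));
    printf("exp: %s\n", type_name(0x8000000000000000ULL));
    printf("rc:  %d\n", (int)rc_as_signed(4294967274u));
    return 0;
}
#endif
```

Both failures therefore look like the same refusal from the hypervisor: a page that is currently typed writable is being offered as a page table.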
Keir Fraser
2010-Apr-15 18:41 UTC
Re: [Xen-devel] Re: [Xen-users] rebased openSUSE Xen dom0 Patches
On 15/04/2010 19:36, "Simon Graham" <simon.graham@virtualcomputer.com> wrote:
> It is not clear whether Dom0 is erroneously using pages that are and truly
> should be writable, or whether somehow pages have erroneously become writable;
> if anyone can suggest likely places to look for problems that would be
> awesome!

Typically it means that the guest kernel hasn't managed to zap all of its writable mappings of a page before assigning it for use as a page table. That doesn't immediately point to the bug, unfortunately. ;-)

 -- Keir
Jan Beulich
2010-Apr-16 07:58 UTC
[Xen-devel] Re: [Xen-users] rebased openSUSE Xen dom0 Patches
>>> "Simon Graham" <simon.graham@virtualcomputer.com> 15.04.10 20:36 >>>
>On Tue, Apr 06, 2010 at 02:37:35PM +0100, Andrew Lyon wrote:
>> I've uploaded updated 2.6.31 and 2.6.32 rebased openSUSE Xen dom0
>> patches and ebuilds to
>> http://code.google.com/p/gentoo-xen-kernel/downloads/list
>>
>> Notable change is that both include the online resize feature recently
>> posted to xen-devel.
>>
>
>We're currently testing these patches with the Ubuntu 10.4 kernel and Xen 3.4.2 and have encountered some problems in the Xen mm code shortly after Dom0 starts to run.
>
>The typical failure is that Dom0 is trying to write to a page table (a L4 PT in the example below) but finds the page it is writing to is a PGT_writable_page
>
>...
>Another common failure is shown below. Dom0 expects a page to be PGT_l1_page_table but instead finds the page's type is PGT_writable_page:

Well, we certainly aren't experiencing anything like that with our "original" version of those patches, and I would suppose Andy didn't see such either. Hence perhaps a problem that got introduced by how you made use of the patches and/or some specifics of the source tree you applied them to?

Jan
Simon Graham
2010-Apr-16 13:42 UTC
RE: [Xen-devel] Re: [Xen-users] rebased openSUSE Xen dom0 Patches
Thanks Jan,

> >...
> >Another common failure is shown below. Dom0 expects a page to be
> PGT_l1_page_table but instead finds the page's type is
> PGT_writable_page:
>
> Well, we certainly aren't experiencing anything like that with our
> "original" version of those patches, and I would suppose Andy didn't
> see such either. Hence perhaps a problem that got introduced
> by how you made use of the patches and/or some specifics of the
> source tree you applied them to?
>

The patches seem to apply cleanly to the Ubuntu 10.4 kernel source tree (but I agree that this might be the problem)...

We've actually narrowed the problem down a bit -- the pages we fail on are always in the range of those freed by free_init_pages("unused kernel memory") from free_initmem(). Now, the specific problem is that a writable page can't be turned into a page table page because its page type ref count is non-zero. I see in the free_init_pages() routine that two hypercalls are made for each page, one of which sets the pte to zero (which would decrement the page type ref count, I think) and one of which does not. Doesn't this leave the page type ref count at 1, which in turn means the page can't be turned into a page table page? Or is there some other magic that occurs later on that should decrement the page type ref count before attempting to use the page as a page table page?

Here's the extract of the code I am talking about (yes, we are using a 64-bit Dom0):

        printk(KERN_INFO "Freeing %s: %luk freed\n", what, (end - begin) >> 10);

        for (; addr < end; addr += PAGE_SIZE) {
                ClearPageReserved(virt_to_page(addr));
                init_page_count(virt_to_page(addr));
                memset((void *)(addr & ~(PAGE_SIZE-1)),
                       POISON_FREE_INITMEM, PAGE_SIZE);
#ifdef CONFIG_X86_64
                if (addr >= __START_KERNEL_map) {
                        /* make_readonly() reports all kernel addresses. */
                        if (HYPERVISOR_update_va_mapping((unsigned long)__va(__pa(addr)),
                                                         pfn_pte(__pa(addr) >> PAGE_SHIFT,
                                                                 PAGE_KERNEL),
                                                         0))
                                BUG();
                        if (HYPERVISOR_update_va_mapping(addr, __pte(0), 0))
                                BUG();
                }
#endif
        ...

Simon
Jan Beulich
2010-Apr-19 08:41 UTC
RE: [Xen-devel] Re: [Xen-users] rebased openSUSE Xen dom0 Patches
>>> "Simon Graham" <simon.graham@virtualcomputer.com> 16.04.10 15:42 >>>
>We've actually narrowed the problem down a bit -- the pages we fail on
>are always in the range of those freed by free_init_pages("unused kernel
>memory") from free_initmem(). Now, the specific problem is that a
>writable page can't be turned into a page table page because its page
>type ref count is non-zero -- I see in the free_init_pages() routine
>that two hypercalls are made for each page, one of which sets the pte to
>zero (which would decrement the page type ref count I think) and one of
>which does not -- doesn't this leave the page type ref count at 1 which
>in turn means the page can't be turned into a page table page? Or is
>there some other magic that occurs later on that should decrement the
>page type ref count before attempting to use the page as a page table
>page?

Are you observing this with both the .31 and .32 patches?

>Here's the extract of the code I am talking about (yes, we are using a
>64-bit Dom0):
>...

But that code is precisely what guarantees that the pages *can* be converted to page table pages (by completely unmapping them from the kernel image part of the address space). So your explanation is rather confusing than clarifying to me...

Jan
Simon Graham
2010-Apr-19 14:52 UTC
RE: [Xen-devel] Re: [Xen-users] rebased openSUSE Xen dom0 Patches
> >in turn means the page can't be turned into a page table page? Or is
> >there some other magic that occurs later on that should decrement the
> >page type ref count before attempting to use the page as a page table
> >page?
>
> Are you observing this with both the .31 and .32 patches?
>

We're only testing the .32 patches.

> >Here's the extract of the code I am talking about (yes, we are using a
> >64-bit Dom0):
> >...
>
> But that code is precisely what guarantees that the pages *can* be
> converted to page table pages (by completely unmapping them from
> the kernel image part of the address space). So your explanation is
> rather confusing than clarifying to me...

I agree that that is the intent of this code -- what we _seem_ to observe (and this is hard to prove) is that the page type ref count is not being decremented by this code, which would imply that the unmapping is not happening for some reason. The only real evidence I have for this is that the failure always occurs on one of these pages.

Now, the first of these hypercalls creates a pte with PAGE_KERNEL as the protection flags, and I think this includes read-write access, whereas the second one completely deletes the pte for the alternate mapping. The combined effect should leave the page type ref count at one, shouldn't it (for the read-write kernel mapping)?

That being the case, I'm not sure how the page type ref count is supposed to get to zero when reusing one of these pages as a page table page later on.

Thanks for your help
Simon
Jan Beulich
2010-Apr-19 15:09 UTC
RE: [Xen-devel] Re: [Xen-users] rebased openSUSE Xen dom0 Patches
>>> "Simon Graham" <simon.graham@virtualcomputer.com> 19.04.10 16:52 >>>
>Now, the first of these hypercalls creates a pte with PAGE_KERNEL as the
>opts and
>I think this includes read-write access whereas the second one
>completely deletes
>the pte for the alternate mapping -- the combined effect should leave
>the page type
>ref count as one shouldn't it? (for the read-write kernel mapping)
>
>That being the case, I'm not sure how the page type ref count is
>supposed to get to
>zero when reusing one of these pages as a page table page later on.

Such pages get converted to read-only when they get allocated for the purpose of being a page table. Hence there being a type refcount of 1 at the point of the insertion means that there's a second mapping to the page somewhere else. That's what you want to find.

Jan
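Jan's point can be illustrated with a toy model of a per-page type count. This is only a sketch of the counting argument, not Xen's implementation: the struct, the enum, and every function name below are hypothetical. Each writable mapping holds one type reference, and the page can only take on the page-table type once all writable references are gone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy page types; hypothetical names, not Xen's. */
enum page_type { TYPE_NONE, TYPE_WRITABLE, TYPE_PAGE_TABLE };

struct toy_page {
    enum page_type type;   /* type the page is currently held as  */
    unsigned type_count;   /* number of references of that type   */
};

/* Take a type reference. Fails if the page is still referenced
 * under a different type, as in the "Bad type" errors above. */
bool toy_get_type(struct toy_page *pg, enum page_type want)
{
    if (pg->type_count != 0 && pg->type != want)
        return false;
    pg->type = want;
    pg->type_count++;
    return true;
}

/* Drop one type reference (e.g. a writable mapping was removed
 * or converted to read-only). */
void toy_put_type(struct toy_page *pg)
{
    if (pg->type_count > 0)
        pg->type_count--;
}

/* Map a fresh page writable `mapped` times, remove `dropped` of
 * those mappings, then report whether it can become a page table. */
bool toy_pin_after(unsigned mapped, unsigned dropped)
{
    struct toy_page pg = { TYPE_NONE, 0 };
    unsigned i;

    for (i = 0; i < mapped; i++)
        toy_get_type(&pg, TYPE_WRITABLE);
    for (i = 0; i < dropped; i++)
        toy_put_type(&pg);
    return toy_get_type(&pg, TYPE_PAGE_TABLE);
}

#ifdef DEMO_MAIN /* build with -DDEMO_MAIN for a standalone demo */
int main(void)
{
    /* Two writable mappings, only one removed: pinning is refused. */
    assert(!toy_pin_after(2, 1));
    /* Both writable mappings removed: pinning succeeds. */
    assert(toy_pin_after(2, 2));
    puts("toy model behaves as described");
    return 0;
}
#endif
```

In this model, a count of 1 left over at pin time means exactly what Jan describes: one writable mapping was dealt with, and another one still exists somewhere.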
Simon Graham
2010-Apr-20 16:07 UTC
RE: [Xen-devel] Re: [Xen-users] rebased openSUSE Xen dom0 Patches
> >
> > But that code is precisely what guarantees that the pages *can* be
> > converted to page table pages (by completely unmapping them from
> > the kernel image part of the address space). So your explanation is
> > rather confusing than clarifying to me...
>
> I agree that that is the intent of this code -- what we _seem_ to
> observe (and this
> is hard to prove) is that the page type ref count is not being
> decremented by this
> code which would imply that the unmapping is not happening for some
> reason. The only
> real evidence I have for this is that the failure always occurs on one
> of these pages.
>

We now think we've found the problem, which seems to be due to the following two calls in Linux within mark_rodata_ro():

        free_init_pages("unused kernel memory",
                        (unsigned long) page_address(virt_to_page(text_end)),
                        (unsigned long) page_address(virt_to_page(rodata_start)));
        free_init_pages("unused kernel memory",
                        (unsigned long) page_address(virt_to_page(rodata_end)),
                        (unsigned long) page_address(virt_to_page(data_start)));

The first of these calls is trying to free the range page_address(virt_to_page(text_end)) through page_address(virt_to_page(rodata_start)).

With text_end == 0xffffffff80610000 and rodata_start == 0xffffffff80800000, the actual values received by free_init_pages() are 0xffff880000610000 and 0xffff880000800000 (i.e. within the 64-bit direct mapping region).

In free_init_pages() there is a test of addr >= __START_KERNEL_map (which is 0xffffffff80000000). The direct-mapping addresses are below that value, so the test fails and the two calls to HYPERVISOR_update_va_mapping() are not made.

The net effect (we believe) is that this range of pages is freed from Linux's viewpoint but the pages are still marked as PGT_writable_page with a non-zero page type ref count in the hypervisor. When Linux tries to use these pages later on for page table pages, the hypervisor traps.

Note, we have traced all uses of the pages in question. Apparently they are never used by Linux prior to the trap. Our traces show them being initialized in the hypervisor by construct_dom0(), marked as readonly in Linux by mark_rodata_ro(), and then causing the hypervisor trap when Linux tries to use one of them as a page table.

Presumably the correct fix will be to change the address range test in free_init_pages...

Simon
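The address arithmetic in this diagnosis can be checked with a small user-space sketch. It assumes a physical load offset of zero and a direct-mapping base of 0xffff880000000000, both inferred from the values quoted in the message rather than taken from the kernel source. The function names are hypothetical stand-ins for what page_address(virt_to_page(addr)) and the range test in free_init_pages() do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Value quoted in the thread for __START_KERNEL_map. */
#define START_KERNEL_MAP 0xffffffff80000000ULL
/* ASSUMPTION: direct-mapping base, inferred from the addresses
 * that free_init_pages() is reported to receive. */
#define DIRECT_MAP_BASE  0xffff880000000000ULL
#define PAGE_MASK_4K     (~0xfffULL)

/* Model of page_address(virt_to_page(addr)) for a kernel-image
 * address: image address -> physical -> page-aligned direct-map
 * address. Assumes a physical load offset of zero. */
uint64_t image_to_direct(uint64_t image_addr)
{
    uint64_t phys = image_addr - START_KERNEL_MAP;

    return DIRECT_MAP_BASE + (phys & PAGE_MASK_4K);
}

/* Model of the range test that guards the two
 * HYPERVISOR_update_va_mapping() calls. */
bool takes_hypercall_path(uint64_t addr)
{
    return addr >= START_KERNEL_MAP;
}

#ifdef DEMO_MAIN /* build with -DDEMO_MAIN for a standalone demo */
int main(void)
{
    const uint64_t text_end = 0xffffffff80610000ULL;
    const uint64_t passed   = image_to_direct(text_end);

    assert(passed == 0xffff880000610000ULL);
    printf("image address  %#llx -> hypercalls: %d\n",
           (unsigned long long)text_end, takes_hypercall_path(text_end));
    printf("direct address %#llx -> hypercalls: %d\n",
           (unsigned long long)passed, takes_hypercall_path(passed));
    return 0;
}
#endif
```

Under these assumptions the kernel-image address would pass the test, but the translated direct-mapping address that is actually passed in does not, which matches the observation that the hypercalls are skipped for this range.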
Konrad Rzeszutek Wilk
2010-Apr-20 19:01 UTC
Re: [Xen-devel] Re: [Xen-users] rebased openSUSE Xen dom0 Patches
On Tue, Apr 20, 2010 at 11:07:54AM -0500, Simon Graham wrote:
> > >
> > > But that code is precisely what guarantees that the pages *can* be
> > > converted to page table pages (by completely unmapping them from
> > > the kernel image part of the address space). So your explanation is
> > > rather confusing than clarifying to me...
> >
> > I agree that that is the intent of this code -- what we _seem_ to
> > observe (and this
> > is hard to prove) is that the page type ref count is not being
> > decremented by this
> > code which would imply that the unmapping is not happening for some
> > reason. The only
> > real evidence I have for this is that the failure always occurs on one
> > of these pages.
> >
>
> We now think we've found the problem which seems to be due to the
> following two calls in Linux within mark_rodata_ro():
>
>         free_init_pages("unused kernel memory",
>                         (unsigned long) page_address(virt_to_page(text_end)),
>                         (unsigned long) page_address(virt_to_page(rodata_start)));
>         free_init_pages("unused kernel memory",
>                         (unsigned long) page_address(virt_to_page(rodata_end)),
>                         (unsigned long) page_address(virt_to_page(data_start)));
>
> The first of these calls is trying to free the range
> page_address(virt_to_page(text_end)) through
> page_address(virt_to_page(rodata_start)).
>
> With text_end == 0xffffffff80610000 and rodata_start ==
> 0xffffffff80800000 the actual values received by free_init_pages() are
> 0xffff880000610000 and 0xffff880000800000 (i.e. within the 64-bit direct
> mapping region).
>
> In free_init_pages() there is a test of addr >= __START_KERNEL_map
> (which is 0xffffffff80000000). Because of this test, the two calls to
> HYPERVISOR_update_va_mapping() are not made.
>
> The net effect (we believe) is that this range of pages is freed from
> Linux's viewpoint but the pages are still marked as PGT_writable_page
> with a non-zero page type ref count in the hypervisor. When Linux tries
> to use these pages later on for page table pages, the hypervisor traps.
>
> Note, we have traced all uses of the pages in question. Apparently they
> are never used by Linux prior to the trap. Our traces show them being
> initialized in the hypervisor by construct_dom0(), marked as readonly in
> Linux by mark_rodata_ro() and then causing the hypervisor trap when
> Linux tries to use one of them as a page table.

Oh man, I remember this one. I submitted an initial patch for this:
https://patchwork.kernel.org/patch/79086/

>
> Presumably the correct fix will be to change the address range test in
> free_init_pages...

And this was the final fix:
http://marc.info/?l=linux-kernel&m=126652277705569&w=2

The end result was that a different mechanism is used to get the kernel address, and that address is used to set _PAGE_RW on the pages, ignoring the other mapping. I think so, anyway; this was some time ago.