I'm trying to track down some strange behaviour I see when doing live
migration. Looking at the code that copies the pages across in
xen-unstable.hg/tools/libxc/xc_domain_save.c I see there's a test that
avoids doing anything with pages that are MMIO if it's an HVM domain:

    /* Skip PFNs that aren't really there */
    if ( hvm && ((n >= 0xa0 && n < 0xc0) /* VGA hole */
                 || (n >= (HVM_BELOW_4G_MMIO_START >> PAGE_SHIFT)
                     && n < (1ULL << 32) >> PAGE_SHIFT)) /* MMIO */ )
        continue;

However there's no equivalent test for a non-HVM domain. I think adding
such a test to just ignore any MMIO pages would help. The MMIO pages
will be unmapped when the live migration finally suspends the domain
(and so the migration should be possible), but accessing them in the
meantime is probably not a good idea. Non-live migration is fine because
the domain (and so all its devices) are suspended in advance, and the
MMIO pages are unmapped.

This code is running in dom0, and so I think it will need some help from
the hypervisor to get the range of mfns that the hypervisor has reserved
for MMIO. It could then compare the mfn to that range and ignore it if
there's a match.

Does this sound sensible? Is there an API to get this information
already? Any chance of a patch to fix this making it into 3.2.0?

Kieran

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
On 31/10/07 12:19, "Kieran Mansley" <kmansley@solarflare.com> wrote:

> However there's no equivalent test for a non-HVM domain. I think adding
> such a test to just ignore any MMIO pages would help. The MMIO pages
> will be unmapped when the live migration finally suspends the domain
> (and so the migration should be possible), but accessing them in the
> mean time is probably not a good idea. Non-live migration is fine
> because the domain (and so all its devices) are suspended in advance,
> and the MMIO pages are unmapped.

Is mapping the page as cacheable and then reading from it not supported
by the hardware device? E.g., does it need to be uncacheable, or do
reads have side effects?

 -- Keir
On Wed, 2007-10-31 at 13:32 +0000, Keir Fraser wrote:

> Is mapping the page as cacheable and then reading from it not supported by
> the hardware device? E.g., does it need to be uncacheable, or do reads have
> side effects?

The page is mapped with ioremap_nocache, and needs to be uncacheable.
Reads shouldn't have side effects (although "shouldn't" and "don't"
aren't always synonymous). More generally I can think of devices where
reads do have side effects, though.

The symptom of failure is that once live migration has started, trying
to write to an IO page results in it getting stuck in (or a perpetual
loop into) page_fault(). This only happens very occasionally. I've
interpreted it getting stuck in page_fault() as a result of the shadow
paging, which (as I understand it) marks the normal page table entries
as read-only so that writes to pages trap into the hypervisor and it
can update its dirty set.

So the live migration page reading doesn't quite explain the problem
(unless the read from hardware does have side effects), but as there
was a check there for HVM MMIO pages it seemed to me that adding
something equivalent for PV would be a good idea, and if that didn't
fix the problem I could try looking at the shadow page tables and how
they might interact badly.

Kieran
On 31/10/07 14:03, "Kieran Mansley" <kmansley@solarflare.com> wrote:

> The symptom of failure is that once live migration has started, trying
> to write to an IO page results in it getting stuck in (or a perpetual
> loop into) page_fault(). This only happens very occasionally. I've
> interpreted it getting stuck in page_fault() as a result of the shadow
> paging which (as I understand it) marks the normal page table entries as
> read only so that writes to pages trap into the hypervisor and it can
> update its dirty set.

Yes, but then it should mark the page writable again, so that the
access can be re-executed without faulting! So this rather points at
some problem with the live-migration shadow mode w.r.t. mmio pages.

You're going to have to fix this regardless of whether you prevent
xc_domain_save from mapping mmio pages (which isn't itself a bad idea,
for the reasons you note). This is because, regardless of whether
xc_domain_save ignores mmio pages, that isn't going to change the
shadow code's behaviour, and it's almost certainly that which is
causing the infinite fault loop.

 -- Keir
On Wed, 2007-10-31 at 14:08 +0000, Keir Fraser wrote:

> Yes, but then it should mark the page writable again, so that the access can
> be re-executed without faulting! So this rather points at some problem with
> the live-migration shadow mode w.r.t. mmio pages.

Yes. The reason it's failing is that sh_page_fault() in
xen/arch/x86/mm/shadow/multi.c thinks it's a bad gfn:

    if ( !p2m_is_valid(p2mt) || (!(p2m_is_mmio(p2mt) || mfn_valid(gmfn))) )
    {
        perfc_incr(shadow_fault_bail_bad_gfn);
        SHADOW_PRINTK("BAD gfn=%"SH_PRI_gfn" gmfn=%"PRI_mfn"\n",
                      gfn_x(gfn), mfn_x(gmfn));
        goto not_a_shadow_fault;
    }

I think the problem is that set_mmio_p2m_entry() isn't getting called
when the IO mapping is established. There are three places where
iomem_permit_access() is called:

 - XEN_DOMCTL_memory_mapping (in xen/arch/x86/domctl.c)
 - XEN_DOMCTL_iomem_permission (in xen/common/domctl.c)
 - __gnttab_map_grant_ref() (in xen/common/grant_table.c)

The last one was written by me based on the second one, and neither of
these calls set_mmio_p2m_entry(). I find this a bit suspicious, because
the first one does, and it looks necessary to me in all three cases.
There are however very few users of either of those domctl operations,
and so it's hard to tell what the difference is supposed to be, and so
why XEN_DOMCTL_iomem_permission doesn't call set_mmio_p2m_entry().

Adding calls to set_mmio_p2m_entry() in either of the cases that don't
have it might be a bit tricky too, as I'm not sure a gfn exists for
that mfn at the point that they are called.

Kieran
On 31/10/07 16:34, "Kieran Mansley" <kmansley@solarflare.com> wrote:

> I think the problem is that set_mmio_p2m_entry() isn't getting called
> when the IO mapping is established. There are three places where
> iomem_permit_access() is called:

No, basically that pagefault-handler check is nonsense for a PV guest.
We don't have a p2m table in Xen for PV guests because they are not
'translated mode'. So there is nowhere for us to store the 'mmio' p2m
type.

Perhaps Tim has a good idea what to do here. Adding a
!shadow_mode_translate() condition to the if statement would probably
work but I'm not sure it's the neatest answer.

 -- Keir
On Wed, 2007-10-31 at 16:44 +0000, Keir Fraser wrote:

> Perhaps Tim has a good idea what to do here. Adding a
> !shadow_mode_translate() condition to the if statement would probably work
> but I'm not sure it's the neatest answer.

Avoiding that if statement isn't sufficient to fix the problem - it
still doesn't seem to correctly re-enable writing to the page, even
though xenperf tells me that "calls to shadow fault" == "shadow_fault
fixed fault" (i.e. the code now thinks it's successfully done what it
should).

Kieran
At 16:44 +0000 on 31 Oct (1193849054), Keir Fraser wrote:

> No, basically that pagefault-handler check is nonsense for a PV guest. We
> don't have a p2m table in Xen for PV guests because they are not 'translated
> mode'. So there is nowhere for us to store the 'mmio' p2m type.

Hmm. The p2m_is_valid() check is OK because that always passes for PV
guests. The check for mfn_valid() is failing because MMIO frames aren't
"valid".

For HVM guests we let that pass if the GFN is known to be acceptable
MMIO (by its p2m type). There needs to be an equivalent get-out here
for PV guests -- iomem_access_permitted(d, mfn_x(gmfn), mfn_x(gmfn))
should do.

You'll need to add the same test in _sh_propagate(), where it checks
again that the target MFN is sane:

    if ( !mfn_valid(target_mfn) && (p2mt != p2m_mmio_direct) )
    {
        ASSERT((ft == ft_prefetch));
        *sp = shadow_l1e_empty();
        goto done;
    }

Cheers,

Tim.

-- 
Tim Deegan <Tim.Deegan@eu.citrix.com>
Principal Software Engineer, Citrix Systems.
[Company #5334508: XenSource UK Ltd, c/o EC2Y 5EB, UK.]
On 31/10/07 17:31, "Tim Deegan" <Tim.Deegan@eu.citrix.com> wrote:

> For HVM guests we let that pass if the GFN is known to be acceptable
> MMIO (by its p2m type). There needs to be an equivalent get-out here
> for PV guests -- iomem_access_permitted(d, mfn_x(gmfn), mfn_x(gmfn))
> should do.

We shouldn't really need to do an iomem_access_permitted() check here.
If this is a PV guest then the access check was made when the guest
pagetable was validated. We can just circumvent these checks entirely
for PV guests.

 -- Keir
On Wed, 2007-10-31 at 17:31 +0000, Tim Deegan wrote:

> You'll need to add the same test in _sh_propagate(), where it checks again
> that the target MFN is sane:
>
>     if ( !mfn_valid(target_mfn) && (p2mt != p2m_mmio_direct) )
>     {
>         ASSERT((ft == ft_prefetch));
>         *sp = shadow_l1e_empty();
>         goto done;
>     }

I'm a bit concerned about this one in _sh_propagate(). It seems it's
checking that the mfn is valid for a good reason, and so letting it
carry on in the case of MMIO frames will lead to a rather ungraceful
failure. If I modify the above if to be:

    if ( !mfn_valid(target_mfn) && (p2mt != p2m_mmio_direct) &&
         !iomem_access_permitted(d, target_mfn, target_mfn) )

so that PV iomem frames can continue rather than giving up at this
point, it gets a fatal page fault (see below).

I wonder if the comment above that if has the answer:

    /* N.B. For pass-through MMIO, either this test needs to be relaxed,
     * and shadow_set_l1e() trained to handle non-valid MFNs (ugh), or the
     * MMIO areas need to be added to the frame-table to make them "valid". */

i.e. relaxing the test is not sufficient.
(XEN) ----[ Xen-3.2-unstable  x86_32p  debug=n  Not tainted ]----
(XEN) CPU:    3
(XEN) EIP:    e008:[<ff179ed2>] l1e_propagate_from_guest+0x2e2/0x320
(XEN) EFLAGS: 00010246   CONTEXT: hypervisor
(XEN) eax: 00101000   ebx: 00000040   ecx: ff1a7080   edx: ff1a7080
(XEN) esi: 00000063   edi: 000d8fa0   ebp: ff1aa080   esp: ffbf7da0
(XEN) cr0: 8005003b   cr4: 000006f0   cr3: 001aaf40   cr2: fcb63e80
(XEN) ds: e010   es: e010   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from esp=ffbf7da0:
(XEN)    ff1a7c00 000d8fa0 000d8fa0 ff1a7080 00000000 00000000 ff1a7080 ff1a7080
(XEN)    000d8fa0 137b4001 fe688260 ff17d0b4 000d8fa0 ffbf7ef8 00000001 00000001
(XEN)    00000006 ff1a79dc 00dde090 fe7f3008 00000020 00000020 3eacc063 ff1be1e0
(XEN)    ff1a7080 000160b3 0001379a 3e3fc063 00000002 ff1a79dc 00000000 fe7f3440
(XEN)    00000020 00000003 160b3063 00000000 ffbf7ec0 fe688268 ff13b7e2 ff1aa080
(XEN)    c1212080 ffbf7ec0 00000000 ff1a7080 ff1aafa0 ff1ae000 00000000 ff1be1e0
(XEN)    00000004 ffbf7fb4 ff182b95 00000073 ffbf7e84 ffbf7e80 fe800034 00000005
(XEN)    ff1ce900 ff1ce900 0000000f ff1ce900 00000005 ff115da7 ff1ce900 00000005
(XEN)    ff1ce900 ff1ce900 ff1aa0a4 ff1ce900 ff1c1104 000f5da7 0000000f ff1c1104
(XEN)    000f5da7 00000003 ff1aa0a4 ff1be100 d104955c ff1aaf78 fee09440 fee0d248
(XEN)    160b3063 00000000 0003fa34 0003e8e7 0000000f 0000007b ffffffff ffffffff
(XEN)    d8fa0073 00000000 160b3061 00000000 00000001 00000001 00000000 ff1aa080
(XEN)    ff1a7080 ffbf7fb4 d104955c ff13b7e2 ff1aa080 d104955c ffbf7fb4 00000001
(XEN)    274038a1 68465ba4 00000084 00000000 ff1be1e0 ff10dec4 00000068 cfe89ff8
(XEN)    00000000 cfe89ec0 c01186cf ff1c1024 cfe89e9c 00000020 ffbfcf0c ffbf7fb4
(XEN)    ff1c1020 00000001 ff1aa080 0000007b 0000007b cfe89fb4 ff181f83 ff1aa080
(XEN)    0000007b 0000007b cfe89f04 ff182292 ffbf7fb4 cfe90bdc cfe89f0c 00e89f60
(XEN)    d1049540 cfe90bc0 cfe89f04 cff80000 000e0000 c026724b 00000061 00010286
(XEN)    cfe89ee4 00000069 0000007b 0000007b 00000000 00000000 00000003 ff1aa080
(XEN) Xen call trace:
(XEN)    [<ff179ed2>] l1e_propagate_from_guest+0x2e2/0x320
(XEN)    [<ff1a7c00>] boot_edid_info+0x61/0x80
(XEN)    [<ff1a7080>] edd_check_ext+0x1b/0x1c
(XEN)    [<ff1a7080>] edd_check_ext+0x1b/0x1c
(XEN)    [<ff1a7080>] edd_check_ext+0x1b/0x1c
(XEN)    [<ff17d0b4>] sh_page_fault__shadow_3_guest_3+0x4b4/0xf80
(XEN)    [<ff1a79dc>] store_edid+0x5a/0x61
(XEN)    [<ff1a7080>] edd_check_ext+0x1b/0x1c
(XEN)    [<ff1a79dc>] store_edid+0x5a/0x61
(XEN)    [<ff13b7e2>] do_page_fault+0x122/0x630
(XEN)    [<ff1aa080>] ns16550_init+0xf0/0x2b0
(XEN)    [<ff1a7080>] edd_check_ext+0x1b/0x1c
(XEN)    [<ff1aafa0>] acpi_parse_memory_affinity+0x20/0x30
(XEN)    [<ff1ae000>] init_e820+0x80/0x6b0
(XEN)    [<ff182b95>] linearise_address+0x25/0x50
(XEN)    [<ff115da7>] add_entry+0x57/0x140
(XEN)    [<ff1aa0a4>] ns16550_init+0x114/0x2b0
(XEN)    [<ff1aa0a4>] ns16550_init+0x114/0x2b0
(XEN)    [<ff1aaf78>] acpi_table_print_srat_entry+0x28/0x30
(XEN)    [<ff1aa080>] ns16550_init+0xf0/0x2b0
(XEN)    [<ff1a7080>] edd_check_ext+0x1b/0x1c
(XEN)    [<ff13b7e2>] do_page_fault+0x122/0x630
(XEN)    [<ff1aa080>] ns16550_init+0xf0/0x2b0
(XEN)    [<ff10dec4>] do_multicall+0x124/0x350
(XEN)    [<ff1aa080>] ns16550_init+0xf0/0x2b0
(XEN)    [<ff181f83>] tracing_off+0x7/0xe
(XEN)    [<ff1aa080>] ns16550_init+0xf0/0x2b0
(XEN)    [<ff182292>] handle_exception+0x82/0xb6
(XEN)    [<ff1aa080>] ns16550_init+0xf0/0x2b0
(XEN)
(XEN) Pagetable walk from fcb63e80:
(XEN)  L3[0x003] = 00000000137b4001 55555555
(XEN)  L2[0x1e5] = 0000000000000000 ffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 3:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: fcb63e80
(XEN) ****************************************
At 11:18 +0000 on 01 Nov (1193915902), Kieran Mansley wrote:

> I'm a bit concerned about this one in _sh_propagate(). It seems it's
> checking that the mfn is valid for a good reason, and so letting it
> carry on in the case of MMIO frames will lead to a rather ungraceful
> failure. If I modify the above if to be:
>
>     if ( !mfn_valid(target_mfn) && (p2mt != p2m_mmio_direct)
>          && !iomem_access_permitted(d, target_mfn, target_mfn))
>
> so that PV iomem frames can continue rather than giving up at this
> point, it gets a fatal page fault (see below).

Oh dear. Yes, you'll need to protect all the stuff further down that
tries to mark the frame dirty. Probably making sh_mfn_is_dirty test for
!mfn_valid(mfn) and return 0 will be enough.

> I wonder if the comment above that if has the answer:
>
>     /* N.B. For pass-through MMIO, either this test needs to be relaxed,
>      * and shadow_set_l1e() trained to handle non-valid MFNs (ugh), or the
>      * MMIO areas need to be added to the frame-table to make them "valid". */

That comment is gone in -unstable, since the IOMMU pass-through to HVM
domains went in. Passing through the MMIO mappings should work because
get_page_from_l1e will do the right thing later.

Another thing to worry about in _sh_propagate is that the shadows don't
retain any of the caching attribute bits from the guest entries. You
might want to add _PAGE_PCD and _PAGE_PWT to "pass_thru_flags" for PV
guests (and for l1es, _PAGE_PAT, I suppose, if PV guests are allowed to
use PAT).

Tim.

-- 
Tim Deegan <Tim.Deegan@eu.citrix.com>
Principal Software Engineer, Citrix Systems.
[Company #5334508: XenSource UK Ltd, reg'd c/o EC2Y 5EB, UK.]
On Thu, 2007-11-01 at 11:34 +0000, Tim Deegan wrote:

> Oh dear. Yes, you'll need to protect all the stuff further down that
> tries to mark the frame dirty. Probably making sh_mfn_is_dirty test for
> !mfn_valid(mfn) and return 0 will be enough.

See attached patch. I opted instead to surround the section that calls
sh_mfn_is_dirty() with

    if ( mfn_valid(target_mfn) ||
         !iomem_access_permitted(d, target_mfn, target_mfn) )

because if sh_mfn_is_dirty() returned zero on a bad mfn, it would
remove the _PAGE_RW flag.

I'm not 100% sure this patch is complete or correct or doesn't have
side effects, but it works for me. I'd appreciate it if you could give
it a once-over to make sure it makes sense. If it does, and isn't too
big a risk for 3.2.0, applying it to that would be great.

One question in my mind is whether the tests of iomem_access_permitted()
in _sh_propagate() would be better replaced with the more general
shadow_mode_translate(). The former seemed less risky to me, as
otherwise any invalid mfn could drop through when
!shadow_mode_translate(), and this test was originally designed to
filter them out.

Signed-off-by: Kieran Mansley <kmansley@solarflare.com>

Thanks

Kieran

fix iomem frame shadow propagation to allow live migration

diff -r b28fa67d3940 xen/arch/x86/mm/shadow/multi.c
--- a/xen/arch/x86/mm/shadow/multi.c	Thu Nov 01 11:52:34 2007 +0000
+++ b/xen/arch/x86/mm/shadow/multi.c	Thu Nov 01 16:02:36 2007 +0000
@@ -28,6 +28,7 @@
 #include <xen/sched.h>
 #include <xen/perfc.h>
 #include <xen/domain_page.h>
+#include <xen/iocap.h>
 #include <asm/page.h>
 #include <asm/current.h>
 #include <asm/shadow.h>
@@ -696,7 +697,8 @@ _sh_propagate(struct vcpu *v,
     /* N.B. For pass-through MMIO, either this test needs to be relaxed,
      * and shadow_set_l1e() trained to handle non-valid MFNs (ugh), or the
      * MMIO areas need to be added to the frame-table to make them "valid". */
-    if ( !mfn_valid(target_mfn) && (p2mt != p2m_mmio_direct) )
+    if ( !mfn_valid(target_mfn) && (p2mt != p2m_mmio_direct) &&
+         !iomem_access_permitted(d, target_mfn, target_mfn) )
     {
         ASSERT((ft == ft_prefetch));
         *sp = shadow_l1e_empty();
@@ -709,9 +711,12 @@ _sh_propagate(struct vcpu *v,
     // SHADOW_PRESENT bit.
     //
     pass_thru_flags = (_PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_USER |
-                       _PAGE_RW | _PAGE_PRESENT);
+                       _PAGE_RW | _PAGE_PRESENT );
     if ( guest_supports_nx(v) )
         pass_thru_flags |= _PAGE_NX_BIT;
+    if ( !mfn_valid(target_mfn) &&
+         iomem_access_permitted(d, target_mfn, target_mfn) )
+        pass_thru_flags |= _PAGE_PCD | _PAGE_PWT;
     sflags = gflags & pass_thru_flags;

     /* Only change memory caching type for pass-through domain */
@@ -758,10 +763,13 @@ _sh_propagate(struct vcpu *v,
     // p2m_ram_logdirty p2m type: only HAP uses that.)
     if ( unlikely((level == 1) && shadow_mode_log_dirty(d)) )
     {
-        if ( ft & FETCH_TYPE_WRITE )
-            paging_mark_dirty(d, mfn_x(target_mfn));
-        else if ( !sh_mfn_is_dirty(d, target_mfn) )
-            sflags &= ~_PAGE_RW;
+        if ( mfn_valid(target_mfn) ||
+             !iomem_access_permitted(d, target_mfn, target_mfn) ) {
+            if ( ft & FETCH_TYPE_WRITE )
+                paging_mark_dirty(d, mfn_x(target_mfn));
+            else if ( !sh_mfn_is_dirty(d, target_mfn) )
+                sflags &= ~_PAGE_RW;
+        }
     }

     /* Read-only memory */
@@ -2836,7 +2844,8 @@ static int sh_page_fault(struct vcpu *v,
     gfn = guest_l1e_get_gfn(gw.eff_l1e);
     gmfn = gfn_to_mfn(d, gfn, &p2mt);

-    if ( !p2m_is_valid(p2mt) || (!p2m_is_mmio(p2mt) && !mfn_valid(gmfn)) )
+    if ( !shadow_mode_translate(d) &&
+         (!p2m_is_valid(p2mt) || (!p2m_is_mmio(p2mt) && !mfn_valid(gmfn))) )
     {
         perfc_incr(shadow_fault_bail_bad_gfn);
         SHADOW_PRINTK("BAD gfn=%"SH_PRI_gfn" gmfn=%"PRI_mfn"\n",
At 16:36 +0000 on 01 Nov (1193934965), Kieran Mansley wrote:

> One question in my mind is whether the tests of iomem_access_permitted()
> in _sh_propagate() would be better replaced with the more general
> shadow_mode_translate(). The former seemed less risky to me as

You could do that. I think the correct test is shadow_mode_refcounts();
as Keir pointed out, guests whose refcounting isn't done by the shadow
code will have had their pagetables validated by the PV MMU interface.

The test around the log-dirty code can just be on mfn_valid().

Cheers,

Tim.

-- 
Tim Deegan <Tim.Deegan@eu.citrix.com>
Principal Software Engineer, Citrix Systems.
[Company #5334508: XenSource UK Ltd, reg'd c/o EC2Y 5EB, UK.]
On Thu, 2007-11-01 at 16:43 +0000, Tim Deegan wrote:

> You could do that. I think the correct test is shadow_mode_refcounts();
> as Keir pointed out, guests whose refcounting isn't done by the shadow
> code will have had their pagetables validated by the PV MMU interface.
>
> The test around the log-dirty code can just be on mfn_valid().

Attached is a new patch with those changes. Seems much more sensible to
me that way too, and it works just as well.

Signed-off-by: Kieran Mansley <kmansley@solarflare.com>

Thanks

Kieran

fix iomem frame shadow propagation to allow live migration

diff -r b28fa67d3940 xen/arch/x86/mm/shadow/multi.c
--- a/xen/arch/x86/mm/shadow/multi.c	Thu Nov 01 11:52:34 2007 +0000
+++ b/xen/arch/x86/mm/shadow/multi.c	Fri Nov 02 09:27:43 2007 +0000
@@ -696,7 +696,8 @@ _sh_propagate(struct vcpu *v,
     /* N.B. For pass-through MMIO, either this test needs to be relaxed,
      * and shadow_set_l1e() trained to handle non-valid MFNs (ugh), or the
      * MMIO areas need to be added to the frame-table to make them "valid". */
-    if ( !mfn_valid(target_mfn) && (p2mt != p2m_mmio_direct) )
+    if ( shadow_mode_refcounts(d) &&
+         !mfn_valid(target_mfn) && (p2mt != p2m_mmio_direct) )
     {
         ASSERT((ft == ft_prefetch));
         *sp = shadow_l1e_empty();
@@ -712,6 +713,8 @@ _sh_propagate(struct vcpu *v,
                        _PAGE_RW | _PAGE_PRESENT);
     if ( guest_supports_nx(v) )
         pass_thru_flags |= _PAGE_NX_BIT;
+    if ( !shadow_mode_refcounts(d) && !mfn_valid(target_mfn) )
+        pass_thru_flags |= _PAGE_PCD | _PAGE_PWT;
     sflags = gflags & pass_thru_flags;

     /* Only change memory caching type for pass-through domain */
@@ -758,10 +761,12 @@ _sh_propagate(struct vcpu *v,
     // p2m_ram_logdirty p2m type: only HAP uses that.)
     if ( unlikely((level == 1) && shadow_mode_log_dirty(d)) )
     {
-        if ( ft & FETCH_TYPE_WRITE )
-            paging_mark_dirty(d, mfn_x(target_mfn));
-        else if ( !sh_mfn_is_dirty(d, target_mfn) )
-            sflags &= ~_PAGE_RW;
+        if ( mfn_valid(target_mfn) ) {
+            if ( ft & FETCH_TYPE_WRITE )
+                paging_mark_dirty(d, mfn_x(target_mfn));
+            else if ( !sh_mfn_is_dirty(d, target_mfn) )
+                sflags &= ~_PAGE_RW;
+        }
     }

     /* Read-only memory */
@@ -2836,7 +2841,8 @@ static int sh_page_fault(struct vcpu *v,
     gfn = guest_l1e_get_gfn(gw.eff_l1e);
     gmfn = gfn_to_mfn(d, gfn, &p2mt);

-    if ( !p2m_is_valid(p2mt) || (!p2m_is_mmio(p2mt) && !mfn_valid(gmfn)) )
+    if ( !shadow_mode_translate(d) &&
+         (!p2m_is_valid(p2mt) || (!p2m_is_mmio(p2mt) && !mfn_valid(gmfn))) )
     {
         perfc_incr(shadow_fault_bail_bad_gfn);
         SHADOW_PRINTK("BAD gfn=%"SH_PRI_gfn" gmfn=%"PRI_mfn"\n",
On 2/11/07 09:31, "Kieran Mansley" <kmansley@solarflare.com> wrote:

>> You could do that. I think the correct test is shadow_mode_refcounts();
>> as Keir pointed out, guests whose refcounting isn't done by the shadow
>> code will have had their pagetables validated by the PV MMU interface.
>>
>> The test around the log-dirty code can just be on mfn_valid().
>
> Attached is a new patch with those changes. Seems much more sensible to
> me that way too, and it works just as well.

I changed the final chunk's test from !shadow_mode_translate() to
shadow_mode_refcounts(). I think refcounts is a better test than
translate here, and also I think your test was the wrong way round!

I also note that guarding the mark-dirty-or-zap-writable-bit with
mfn_valid() is not really correct. mfn_valid() only checks whether the
mfn < max_page. I bet this would not work if you migrate on a machine
with 4GB of RAM, as the MMIO hole will be below max_page. Really
mfn_valid() needs to handle such MMIO holes, or the shadow code needs
to be using a test other than mfn_valid() in many places (e.g., the
function iomem_page_test() that you added before).

 -- Keir
On 2/11/07 10:42, "Keir Fraser" <keir@xensource.com> wrote:

> I also note that guarding the mark-dirty-or-zap-writable-bit with
> mfn_valid() is not really correct. mfn_valid() only checks whether the
> mfn < max_page. I bet this would not work if you migrate on a machine
> with 4GB of RAM, as the MMIO hole will be below max_page. Really
> mfn_valid needs to handle such MMIO holes, or the shadow code needs to
> be using a test other than mfn_valid in many places (e.g., the function
> iomem_page_test() that you added before).

Actually Tim reckons that the use of mfn_valid() is okay because,
although you will lose the _PAGE_RW bit on your mmio mapping
temporarily, if the mmio mfn is less than max_page you've fixed up the
fault path now so that you should get the _PAGE_RW bit back again when
the guest attempts a write access.

However, we think that the !mfn_valid() test that gates adding
_PAGE_PAT|_PAGE_PCD|_PAGE_PWT to the passthru flags should go away.
We'll already have validated those flags even for ordinary RAM mappings
for a PV guest, and there are cases where cache attributes have to be
different for RAM pages. So probably the test should unconditionally
pass through those flags if the domain is !shadow_mode_refcounts.

 -- Keir