Konrad Rzeszutek Wilk
2010-Dec-30 19:48 UTC
[Xen-devel] [PATCH RFC v2] Consider E820 non-RAM and E820 gaps as 1-1 mappings.
Please see attached an RFC of the second set of patches that augments how the Xen MMU deals with PFNs that point to physical devices. This patchset is still under development, so sharp edges are present.

Changelog [since v1 https://lkml.org/lkml/2010/12/21/255]:
 - Diagrams of the P2M included.
 - More conservative approach used (memory that is not populated or identity is considered "missing" instead of identity).
 - Added IDENTITY_FRAME_BIT to uniquely identify 1-1 mappings.
 - Optional debugfs file (xen/mmu/p2m) to print out the levels and types in the P2M tree.
 - Lots of comments - if I missed any please prompt me.

Short summary: no need to troll through code to add VM_IO on mmap paths anymore.

Long summary:

Under the Xen MMU we used to distinguish two different types of PFNs in the P2M tree: a real MFN, or INVALID_P2M_ENTRY (a missing PFN - used for ballooning). If there was a device whose PCI BAR was within the P2M, we would look at the flags, and if _PAGE_IOMAP was passed we would just return the PFN without consulting the P2M. We have a patch (and some auxiliary ones for other subsystems) that sets this:

  x86: define arch_vm_get_page_prot to set _PAGE_IOMAP on VM_IO vmas

This patchset proposes a different way of doing this, where the patch above and the other auxiliary ones will not be necessary. This approach is the one that H. Peter Anvin, Jeremy Fitzhardinge, and Ian Campbell suggested.

The mechanism is to treat the E820 non-RAM entries and E820 gaps in the P2M tree structure as identity (1-1) mappings. In the past we used to think of those regions as "missing" and under the ownership of the balloon code. But the balloon code only operates on a specific region, which starts at the last E820 RAM page (basically, any region past nr_pages is considered a balloon-type page). Gaps in the E820 (which are usually PCI BAR spaces) would end up as void entries and point to the "missing" pages.

This patchset iterates over the E820, finds the gaps and non-RAM E820 regions, and marks them as "identity". Since the E820 gaps can cross boundaries in the P2M regions (keep in mind that the P2M structure is a 3-level tree), we go through the E820 gaps and reserved E820 regions and set those to be identity. For large regions we just hook up the top (or middle) pointer to shared "identity" pages. For smaller regions we set the MFN such that pfn_to_mfn(pfn) == pfn.

The two attached diagrams crudely explain how we are doing this. "P2M story" (https://docs.google.com/drawings/edit?id=1LQy0np2GCkFtspoucgs5Y_TILI8eceY6rikXwtPqlTI&hl=en&authkey=CO_yv_cC) shows how the P2M is constructed and set up with balloon pages. "P2M with 1-1.." (https://docs.google.com/drawings/edit?id=1smqIRPYq2mSxmvpabuk_Ap6bQAW_aaZnOnjzqlcmxKc&hl=en&authkey=CI2iwKcE) shows how we insert the identity mappings in the P2M tree.

For the balloon pages, the setting of the "missing" pages is mostly already done. The initial case of carving out the last E820 region for balloon ownership is augmented to set those PFNs to missing, and we also change the balloon code to be more aggressive.

This patchset is also available under git:

  git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git devel/p2m-identity.v4.2

Further work: also filter out _PAGE_IOMAP on entries that are System RAM (which is wrong). With this P2M lookup we can make that determination easily.

P.S.
Along with the devel/ttm.pci-api-v2, I've been able to boot Dom0 on a variety of PCIe-type graphics hardware with X working (G31M, ATI ES1000, GeForce 6150SE, HD 4350 Radeon, HD 3200 Radeon, GeForce 8600 GT). That test branch is located at devel/fix-amd-bootup if you are curious.

 arch/x86/include/asm/xen/page.h |    7 +-
 arch/x86/xen/mmu.c              |  222 +++++++++++++++++++++++++++++++++++----
 arch/x86/xen/setup.c            |   46 ++++++++-
 drivers/xen/balloon.c           |    2 +-
 4 files changed, 255 insertions(+), 22 deletions(-)
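To make the lookup rule described above concrete, here is a minimal sketch of a three-level P2M walk that short-circuits to an identity result. The p2m_top, p2m_mid_identity and p2m_identity names follow the patches below, the index helpers are assumed to exist as in arch/x86/xen/mmu.c, and this is an illustration rather than the posted code:

/* Illustration only: how a 3-level P2M lookup short-circuits for the
 * shared identity pages introduced by this series. */
unsigned long sketch_phys_to_machine(unsigned long pfn)
{
        unsigned topidx = p2m_top_index(pfn);
        unsigned mididx = p2m_mid_index(pfn);
        unsigned idx    = p2m_index(pfn);

        if (p2m_top[topidx] == p2m_mid_identity ||
            p2m_top[topidx][mididx] == p2m_identity)
                return pfn;                             /* 1-1 mapping */

        return p2m_top[topidx][mididx][idx];            /* MFN or INVALID_P2M_ENTRY */
}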
Konrad Rzeszutek Wilk
2010-Dec-30 19:48 UTC
[Xen-devel] [PATCH 1/8] xen: Mark all initial reserved pages for the balloon as INVALID_P2M_ENTRY.
With this patch, we diligently set regions that will be used by the balloon driver to be INVALID_P2M_ENTRY and under the ownership of the balloon driver. We are OK using the __set_phys_to_machine as we do not expect to be allocating any P2M middle or entries pages. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- arch/x86/include/asm/xen/page.h | 1 + arch/x86/xen/mmu.c | 12 ++++++------ arch/x86/xen/setup.c | 7 ++++++- drivers/xen/balloon.c | 2 +- 4 files changed, 14 insertions(+), 8 deletions(-) diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h index 8760cc6..be671f6 100644 --- a/arch/x86/include/asm/xen/page.h +++ b/arch/x86/include/asm/xen/page.h @@ -41,6 +41,7 @@ extern unsigned int machine_to_phys_order; extern unsigned long get_phys_to_machine(unsigned long pfn); extern bool set_phys_to_machine(unsigned long pfn, unsigned long mfn); +extern bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn); static inline unsigned long pfn_to_mfn(unsigned long pfn) { diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c index 44924e5..8e9c669 100644 --- a/arch/x86/xen/mmu.c +++ b/arch/x86/xen/mmu.c @@ -503,6 +503,11 @@ bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn) { unsigned topidx, mididx, idx; + if (unlikely(xen_feature(XENFEAT_auto_translated_physmap))) { + BUG_ON(pfn != mfn && mfn != INVALID_P2M_ENTRY); + return true; + } + if (unlikely(pfn >= MAX_P2M_PFN)) { BUG_ON(mfn != INVALID_P2M_ENTRY); return true; @@ -522,11 +527,6 @@ bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn) bool set_phys_to_machine(unsigned long pfn, unsigned long mfn) { - if (unlikely(xen_feature(XENFEAT_auto_translated_physmap))) { - BUG_ON(pfn != mfn && mfn != INVALID_P2M_ENTRY); - return true; - } - if (unlikely(!__set_phys_to_machine(pfn, mfn))) { if (!alloc_p2m(pfn)) return false; @@ -2438,7 +2438,7 @@ static void xen_zap_pfn_range(unsigned long vaddr, unsigned int order, in_frames[i] = virt_to_mfn(vaddr); MULTI_update_va_mapping(mcs.mc, vaddr, VOID_PTE, 0); - set_phys_to_machine(virt_to_pfn(vaddr), INVALID_P2M_ENTRY); + __set_phys_to_machine(virt_to_pfn(vaddr), INVALID_P2M_ENTRY); if (out_frames) out_frames[i] = virt_to_pfn(vaddr); diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c index b5a7f92..7201800 100644 --- a/arch/x86/xen/setup.c +++ b/arch/x86/xen/setup.c @@ -52,6 +52,8 @@ phys_addr_t xen_extra_mem_start, xen_extra_mem_size; static __init void xen_add_extra_mem(unsigned long pages) { + unsigned long pfn; + u64 size = (u64)pages * PAGE_SIZE; u64 extra_start = xen_extra_mem_start + xen_extra_mem_size; @@ -66,6 +68,9 @@ static __init void xen_add_extra_mem(unsigned long pages) xen_extra_mem_size += size; xen_max_p2m_pfn = PFN_DOWN(extra_start + size); + + for (pfn = PFN_DOWN(extra_start); pfn <= xen_max_p2m_pfn; pfn++) + __set_phys_to_machine(pfn, INVALID_P2M_ENTRY); } static unsigned long __init xen_release_chunk(phys_addr_t start_addr, @@ -104,7 +109,7 @@ static unsigned long __init xen_release_chunk(phys_addr_t start_addr, WARN(ret != 1, "Failed to release memory %lx-%lx err=%d\n", start, end, ret); if (ret == 1) { - set_phys_to_machine(pfn, INVALID_P2M_ENTRY); + __set_phys_to_machine(pfn, INVALID_P2M_ENTRY); len++; } } diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c index 43f9f02..b1661cd 100644 --- a/drivers/xen/balloon.c +++ b/drivers/xen/balloon.c @@ -296,7 +296,7 @@ static int decrease_reservation(unsigned long nr_pages) /* No more mappings: invalidate P2M and add to balloon. 
*/ for (i = 0; i < nr_pages; i++) { pfn = mfn_to_pfn(frame_list[i]); - set_phys_to_machine(pfn, INVALID_P2M_ENTRY); + __set_phys_to_machine(pfn, INVALID_P2M_ENTRY); balloon_append(pfn_to_page(pfn)); } -- 1.7.1 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2010-Dec-30 19:48 UTC
[Xen-devel] [PATCH 2/8] xen/mmu: Add the notion of identity (1-1) mapping.
Our P2M tree structure is a three-level. On the leaf nodes we set the Machine Frame Number (MFN) of the PFN. What this means is that when one does: pfn_to_mfn(pfn), which is used when creating PTE entries, you get the real MFN of the hardware. When Xen sets up a guest it initially populates a array which has descending MFN values, as so: idx: 0, 1, 2 [0x290F, 0x290E, 0x290D, ..] so pfn_to_mfn(2)==0x290D. If you start, restart many guests that list starts looking quite random. We graft this structure on our P2M tree structure and stick in those MFN in the leafs. But for all other leaf entries, or for the top root, or middle one, for which there is a void entry, we assume it is "missing". So pfn_to_mfn(0xc0000)=INVALID_P2M_ENTRY. We add the possibility of setting 1-1 mappings on certain regions, so that: pfn_to_mfn(0xc0000)=0xc0000 The benefit of this is, that we can assume for non-RAM regions (think PCI BARs, or ACPI spaces), we can create mappings easily b/c we get the PFN value to match the MFN. For this to work efficiently we introduce are two new pages: p2m_identity and p2m_mid_identity. All entries in p2m_identity are set to INVALID_P2M_ENTRY type, and all entries in p2m_mid_identity point to p2m_identity. Whenever we are told to set the MFN which is equal to PFN for void regions, we swap over from p2m_missing to p2m_identity or p2m_mid_missing to p2m_mid_identity. See this diagram for details: https://docs.google.com/drawings/edit?id=1smqIRPYq2mSxmvpabuk_Ap6bQAW_aaZnOnjzqlcmxKc&hl=en&authkey=CI2iwKcE Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- arch/x86/xen/mmu.c | 67 +++++++++++++++++++++++++++++++++++++++++++-------- 1 files changed, 56 insertions(+), 11 deletions(-) diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c index 8e9c669..667901f 100644 --- a/arch/x86/xen/mmu.c +++ b/arch/x86/xen/mmu.c @@ -217,6 +217,9 @@ static RESERVE_BRK_ARRAY(unsigned long **, p2m_top, P2M_TOP_PER_PAGE); static RESERVE_BRK_ARRAY(unsigned long, p2m_top_mfn, P2M_TOP_PER_PAGE); static RESERVE_BRK_ARRAY(unsigned long *, p2m_top_mfn_p, P2M_TOP_PER_PAGE); +static RESERVE_BRK_ARRAY(unsigned long, p2m_identity, P2M_PER_PAGE); +static RESERVE_BRK_ARRAY(unsigned long *, p2m_mid_identity, P2M_MID_PER_PAGE); + RESERVE_BRK(p2m_mid, PAGE_SIZE * (MAX_DOMAIN_PAGES / (P2M_PER_PAGE * P2M_MID_PER_PAGE))); RESERVE_BRK(p2m_mid_mfn, PAGE_SIZE * (MAX_DOMAIN_PAGES / (P2M_PER_PAGE * P2M_MID_PER_PAGE))); @@ -260,12 +263,12 @@ static void p2m_top_mfn_p_init(unsigned long **top) top[i] = p2m_mid_missing_mfn; } -static void p2m_mid_init(unsigned long **mid) +static void p2m_mid_init(unsigned long **mid, unsigned long *ptr) { unsigned i; for (i = 0; i < P2M_MID_PER_PAGE; i++) - mid[i] = p2m_missing; + mid[i] = ptr; } static void p2m_mid_mfn_init(unsigned long *mid) @@ -326,7 +329,7 @@ void xen_build_mfn_list_list(void) * they''re just missing, just update the stored mfn, * since all could have changed over a migrate. 
*/ - if (mid == p2m_mid_missing) { + if (mid == p2m_mid_identity || mid == p2m_mid_missing) { BUG_ON(mididx); BUG_ON(mid_mfn_p != p2m_mid_missing_mfn); p2m_top_mfn[topidx] = virt_to_mfn(p2m_mid_missing_mfn); @@ -374,11 +377,16 @@ void __init xen_build_dynamic_phys_to_machine(void) p2m_init(p2m_missing); p2m_mid_missing = extend_brk(PAGE_SIZE, PAGE_SIZE); - p2m_mid_init(p2m_mid_missing); + p2m_mid_init(p2m_mid_missing, p2m_missing); p2m_top = extend_brk(PAGE_SIZE, PAGE_SIZE); p2m_top_init(p2m_top); + p2m_identity = extend_brk(PAGE_SIZE, PAGE_SIZE); + p2m_init(p2m_identity); + + p2m_mid_identity = extend_brk(PAGE_SIZE, PAGE_SIZE); + p2m_mid_init(p2m_mid_identity, p2m_identity); /* * The domain builder gives us a pre-constructed p2m array in * mfn_list for all the pages initially given to us, so we just @@ -390,7 +398,7 @@ void __init xen_build_dynamic_phys_to_machine(void) if (p2m_top[topidx] == p2m_mid_missing) { unsigned long **mid = extend_brk(PAGE_SIZE, PAGE_SIZE); - p2m_mid_init(mid); + p2m_mid_init(mid, p2m_missing); p2m_top[topidx] = mid; } @@ -410,6 +418,17 @@ unsigned long get_phys_to_machine(unsigned long pfn) mididx = p2m_mid_index(pfn); idx = p2m_index(pfn); + /* + * The INVALID_P2M_ENTRY is filled in both p2m_*identity + * and in p2m_*missing, so returning the INVALID_P2M_ENTRY + * would be wrong. + */ + if (p2m_top[topidx] == p2m_mid_identity) + return pfn; + + if (p2m_top[topidx][mididx] == p2m_identity) + return pfn; + return p2m_top[topidx][mididx][idx]; } EXPORT_SYMBOL_GPL(get_phys_to_machine); @@ -434,7 +453,7 @@ static void free_p2m_page(void *p) static bool alloc_p2m(unsigned long pfn) { unsigned topidx, mididx; - unsigned long ***top_p, **mid; + unsigned long ***top_p, **mid, **mid_orig; unsigned long *top_mfn_p, *mid_mfn; topidx = p2m_top_index(pfn); @@ -443,15 +462,19 @@ static bool alloc_p2m(unsigned long pfn) top_p = &p2m_top[topidx]; mid = *top_p; - if (mid == p2m_mid_missing) { + if (mid == p2m_mid_identity || mid == p2m_mid_missing) { /* Mid level is missing, allocate a new one */ + mid_orig = mid; mid = alloc_p2m_page(); if (!mid) return false; - p2m_mid_init(mid); + if (mid == p2m_mid_identity) + p2m_mid_init(mid, p2m_identity); + else + p2m_mid_init(mid, p2m_missing); - if (cmpxchg(top_p, p2m_mid_missing, mid) != p2m_mid_missing) + if (cmpxchg(top_p, mid_orig, mid) != mid_orig) free_p2m_page(mid); } @@ -479,9 +502,11 @@ static bool alloc_p2m(unsigned long pfn) p2m_top_mfn_p[topidx] = mid_mfn; } - if (p2m_top[topidx][mididx] == p2m_missing) { + if (p2m_top[topidx][mididx] == p2m_identity || + p2m_top[topidx][mididx] == p2m_missing) { /* p2m leaf page is missing */ unsigned long *p2m; + unsigned long *p2m_orig = p2m_top[topidx][mididx]; p2m = alloc_p2m_page(); if (!p2m) @@ -489,7 +514,7 @@ static bool alloc_p2m(unsigned long pfn) p2m_init(p2m); - if (cmpxchg(&mid[mididx], p2m_missing, p2m) != p2m_missing) + if (cmpxchg(&mid[mididx], p2m_orig, p2m) != p2m_orig) free_p2m_page(p2m); else mid_mfn[mididx] = virt_to_mfn(p2m); @@ -517,6 +542,26 @@ bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn) mididx = p2m_mid_index(pfn); idx = p2m_index(pfn); + /* For sparse holes were the p2m leaf has real PFN along with + * PCI holes, stick in the PFN as the MFN value. + */ + if (pfn == mfn) { + if (p2m_top[topidx] == p2m_mid_identity) + return 1; + if (p2m_top[topidx][mididx] == p2m_identity) + return 1; + + /* Swap over from MISSING to IDENTITY if needed. 
*/ + if (p2m_top[topidx] == p2m_mid_missing) { + p2m_top[topidx] = p2m_mid_identity; + return 1; + } + if (p2m_top[topidx][mididx] == p2m_missing) { + p2m_top[topidx][mididx] = p2m_identity; + return 1; + } + } + if (p2m_top[topidx][mididx] == p2m_missing) return mfn == INVALID_P2M_ENTRY; -- 1.7.1 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
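To follow the index arithmetic in the discussion of this patch, here is a self-contained sketch of how a PFN splits into the three levels. The 512-entries-per-page figure assumes a 64-bit build with 4K pages (PAGE_SIZE / sizeof(unsigned long)) and is an assumption for illustration; the real constants live in mmu.c:

/* Sketch: each P2M page holds 512 unsigned longs, so one top-level slot
 * covers 512 * 512 pages (1 GB) and one mid-level slot covers 512 pages
 * (2 MB). */
#define SK_P2M_PER_PAGE         512
#define SK_P2M_MID_PER_PAGE     512

static unsigned sk_p2m_top_index(unsigned long pfn)
{
        return pfn / (SK_P2M_MID_PER_PAGE * SK_P2M_PER_PAGE);
}

static unsigned sk_p2m_mid_index(unsigned long pfn)
{
        return (pfn / SK_P2M_PER_PAGE) % SK_P2M_MID_PER_PAGE;
}

static unsigned sk_p2m_index(unsigned long pfn)
{
        return pfn % SK_P2M_PER_PAGE;
}

With those numbers, pfn 0xc0000 (the 3GB mark) lands in top-level slot 3, which is why a whole 3GB-4GB hole can be covered by a single pointer to p2m_mid_identity.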
Konrad Rzeszutek Wilk
2010-Dec-30 19:48 UTC
[Xen-devel] [PATCH 3/8] xen/setup: Set identity mapping for non-RAM E820 and E820 gaps.
We walk the E820 region and start at 0 (for PV guests we start at ISA_END_ADDRESS) and skip any E820 RAM regions. For all other regions and as well the gaps we set them to be identity mappings. The reasons we do not want to set the identity mapping from 0-> ISA_END_ADDRESS when running as PV is b/c that the kernel would try to read DMI information and fail (no permissions to read that). Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- arch/x86/xen/setup.c | 39 +++++++++++++++++++++++++++++++++++++++ 1 files changed, 39 insertions(+), 0 deletions(-) diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c index 7201800..643b3fc 100644 --- a/arch/x86/xen/setup.c +++ b/arch/x86/xen/setup.c @@ -143,6 +143,38 @@ static unsigned long __init xen_return_unused_memory(unsigned long max_pfn, return released; } +static unsigned long __init xen_set_identity(const struct e820map *e820) +{ + phys_addr_t last = xen_initial_domain() ? 0 : ISA_END_ADDRESS; + int i; + unsigned long identity = 0; + unsigned long pfn; + + for (i = 0; i < e820->nr_map; i++) { + phys_addr_t start = e820->map[i].addr; + phys_addr_t end = start + e820->map[i].size; + + if (end < start) + continue; + + /* Skip over the 1MB region. */ + if (last > end) + continue; + + if (e820->map[i].type == E820_RAM) { + /* Without saving ''last'' we would gooble RAM too. */ + last = end; + continue; + } + + for (pfn = PFN_UP(last); pfn < PFN_DOWN(end); pfn++) + __set_phys_to_machine(pfn, pfn); + identity += pfn - PFN_UP(last); + + last = end; + } + return identity; +} /** * machine_specific_memory_setup - Hook for machine specific memory setup. **/ @@ -156,6 +188,7 @@ char * __init xen_memory_setup(void) struct xen_memory_map memmap; unsigned long extra_pages = 0; unsigned long extra_limit; + unsigned long identity_pages = 0; int i; int op; @@ -251,6 +284,12 @@ char * __init xen_memory_setup(void) xen_add_extra_mem(extra_pages); + /* + * Set P2M for all non-RAM pages and E820 gaps to be identity + * type PFNs. + */ + identity_pages = xen_set_identity(&e820); + printk(KERN_INFO "Set %ld page(s) to 1-1 mapping.\n", identity_pages); return "Xen"; } -- 1.7.1 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
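A small sketch of the PFN_UP/PFN_DOWN rounding the loop above relies on, applied to a made-up PCI hole (the addresses are illustrative, not taken from a real E820; PFN_UP/PFN_DOWN come from <linux/pfn.h>):

/* Sketch only: how many page frames xen_set_identity() would mark for
 * an illustrative non-RAM region. */
static unsigned long __init sketch_identity_count(void)
{
        phys_addr_t hole_start = 0xe0000000ULL; /* end of previous RAM region */
        phys_addr_t hole_end   = 0xf0000000ULL; /* start of the next region   */

        /* PFN_UP(hole_start) == 0xe0000, PFN_DOWN(hole_end) == 0xf0000,
         * so 0x10000 page frames (256 MB) would be marked identity. */
        return PFN_DOWN(hole_end) - PFN_UP(hole_start);
}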
Konrad Rzeszutek Wilk
2010-Dec-30 19:48 UTC
[Xen-devel] [PATCH 4/8] xen/mmu: Warn against races.
The initial bootup code uses set_phys_to_machine quite a lot, and after
bootup it would be used by the balloon driver. The balloon driver does have
a mutex lock so this should not be necessary - but just in case, add a
warning if we do hit this scenario.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/mmu.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 667901f..36bdd2d 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -553,11 +553,13 @@ bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn)
 
 		/* Swap over from MISSING to IDENTITY if needed. */
 		if (p2m_top[topidx] == p2m_mid_missing) {
-			p2m_top[topidx] = p2m_mid_identity;
+			WARN_ON(cmpxchg(&p2m_top[topidx], p2m_mid_missing,
+				p2m_mid_identity) != p2m_mid_missing);
 			return 1;
 		}
 		if (p2m_top[topidx][mididx] == p2m_missing) {
-			p2m_top[topidx][mididx] = p2m_identity;
+			WARN_ON(cmpxchg(&p2m_top[topidx][mididx], p2m_missing,
+				p2m_identity) != p2m_missing);
 			return 1;
 		}
 	}
-- 
1.7.1
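In isolation, the lock-free publish idiom the patch switches to looks roughly like this (a generic sketch, not the patch itself):

/* Generic sketch of the publish-or-warn idiom: install 'new' only if the
 * slot still holds 'expected', and warn if another CPU raced us. */
static void sketch_publish(unsigned long **slot,
                           unsigned long *expected, unsigned long *new)
{
        WARN_ON(cmpxchg(slot, expected, new) != expected);
}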
Konrad Rzeszutek Wilk
2010-Dec-30 19:48 UTC
[Xen-devel] [PATCH 5/8] xen/debug: Print out all pages in the P2M.
We walk over the whole P2M and constructing a simplified view of which regions belong to what level and what type they are. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- arch/x86/xen/mmu.c | 84 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 84 insertions(+), 0 deletions(-) diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c index 36bdd2d..e9dfdd6 100644 --- a/arch/x86/xen/mmu.c +++ b/arch/x86/xen/mmu.c @@ -46,6 +46,7 @@ #include <linux/module.h> #include <linux/gfp.h> #include <linux/memblock.h> +#include <linux/seq_file.h> #include <asm/pgtable.h> #include <asm/tlbflush.h> @@ -2764,6 +2765,88 @@ EXPORT_SYMBOL_GPL(xen_remap_domain_mfn_range); #ifdef CONFIG_XEN_DEBUG_FS +static int p2m_dump_show(struct seq_file *m, void *v) +{ + static const char * const level_name[] = { "top", "middle", + "entry", "abnormal" }; + static const char * const type_name[] = { "identity", "missing", + "pfn", "abnormal"}; +#define TYPE_IDENTITY 0 +#define TYPE_MISSING 1 +#define TYPE_PFN 2 +#define TYPE_UNKN 3 + unsigned long pfn, prev_pfn_type = 0, prev_pfn_level = 0; + unsigned int prev_level = 0; + unsigned int prev_type = TYPE_UNKN; + + if (!p2m_top) + return 0; + + for (pfn = 0; pfn < MAX_DOMAIN_PAGES; pfn++) { + unsigned topidx = p2m_top_index(pfn); + unsigned mididx = p2m_mid_index(pfn); + unsigned idx = p2m_index(pfn); + unsigned lvl, type; + + lvl = 4; + type = TYPE_UNKN; + if (p2m_top[topidx] == p2m_mid_missing) { + lvl = 0; type = TYPE_MISSING; + } else if (p2m_top[topidx] == p2m_mid_identity) { + lvl = 0; type = TYPE_IDENTITY; + } else if (p2m_top[topidx][mididx] == p2m_identity) { + lvl = 1; type = TYPE_IDENTITY; + } else if (p2m_top[topidx][mididx][idx] == pfn) { + lvl = 2; type = TYPE_IDENTITY; + } else if (p2m_top[topidx] == NULL) { + lvl = 0; type = TYPE_UNKN; + } else if (p2m_top[topidx][mididx] == NULL) { + lvl = 1; type = TYPE_UNKN; + } else if (p2m_top[topidx][mididx][idx] == 0) { + lvl = 2; type = TYPE_UNKN; + } else if (p2m_top[topidx][mididx] == p2m_missing) { + lvl = 1; type = TYPE_MISSING; + } else if (p2m_top[topidx][mididx][idx] == INVALID_P2M_ENTRY) { + lvl = 2; type = TYPE_MISSING; + } else if (p2m_top[topidx][mididx][idx] != pfn) { + lvl = 2; type = TYPE_PFN; + } + if (pfn == 0) { + prev_level = lvl; + prev_type = type; + } + if (pfn == MAX_DOMAIN_PAGES-1) { + lvl = 3; + type = TYPE_UNKN; + } + if (prev_type != type) { + seq_printf(m, " [0x%lx->0x%lx] %s\n", + prev_pfn_type, pfn, type_name[prev_type]); + prev_pfn_type = pfn; + prev_type = type; + } + if (prev_level != lvl) { + seq_printf(m, " [0x%lx->0x%lx] level %s\n", + prev_pfn_level, pfn, level_name[prev_level]); + prev_pfn_level = pfn; + prev_level = lvl; + } + } + return 0; +} + +static int p2m_dump_open(struct inode *inode, struct file *filp) +{ + return single_open(filp, p2m_dump_show, NULL); +} + +static const struct file_operations p2m_dump_fops = { + .open = p2m_dump_open, + .read = seq_read, + .llseek = seq_lseek, + .release = single_release, +}; + static struct dentry *d_mmu_debug; static int __init xen_mmu_debugfs(void) @@ -2819,6 +2902,7 @@ static int __init xen_mmu_debugfs(void) debugfs_create_u32("prot_commit_batched", 0444, d_mmu_debug, &mmu_stats.prot_commit_batched); + debugfs_create_file("p2m", 0600, d_mmu_debug, NULL, &p2m_dump_fops); return 0; } fs_initcall(xen_mmu_debugfs); -- 1.7.1 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
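For readers unfamiliar with the seq_file plumbing used above, the minimal debugfs pattern looks roughly like this (a generic sketch with made-up names, not the p2m dump code):

/* Generic sketch of a read-only debugfs file built on seq_file and
 * single_open(). */
#include <linux/debugfs.h>
#include <linux/seq_file.h>

static int sketch_show(struct seq_file *m, void *v)
{
        seq_printf(m, "hello from debugfs\n");
        return 0;
}

static int sketch_open(struct inode *inode, struct file *filp)
{
        return single_open(filp, sketch_show, NULL);
}

static const struct file_operations sketch_fops = {
        .open           = sketch_open,
        .read           = seq_read,
        .llseek         = seq_lseek,
        .release        = single_release,
};

/* debugfs_create_file("sketch", 0444, parent, NULL, &sketch_fops); */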
Konrad Rzeszutek Wilk
2010-Dec-30 19:48 UTC
[Xen-devel] [PATCH 6/8] xen/debug: WARN_ON when 1-1 but no _PAGE_IOMAP flag set.
Only enabled if XEN_DEBUG_FS is enabled. We print a warning when: pfn_to_mfn(pfn) == pfn, but no VM_IO (_PAGE_IOMAP) flag pfn_to_mfn(pfn) != pfn, and VM_IO flag is set. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- arch/x86/xen/mmu.c | 37 +++++++++++++++++++++++++++++++++++++ 1 files changed, 37 insertions(+), 0 deletions(-) diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c index e9dfdd6..d98bd43 100644 --- a/arch/x86/xen/mmu.c +++ b/arch/x86/xen/mmu.c @@ -944,6 +944,40 @@ pte_t xen_make_pte(pteval_t pte) } PV_CALLEE_SAVE_REGS_THUNK(xen_make_pte); +#ifdef CONFIG_XEN_DEBUG_FS +pte_t xen_make_pte_debug(pteval_t pte) +{ + phys_addr_t addr = (pte & PTE_PFN_MASK); + phys_addr_t other_addr; + bool io_page = false; + pte_t _pte; + + if (pte & _PAGE_IOMAP) + io_page = true; + + _pte = xen_make_pte(pte); + + if (!addr) + return _pte; + + if (io_page && + (xen_initial_domain() || addr >= ISA_END_ADDRESS)) { + other_addr = pfn_to_mfn(addr >> PAGE_SHIFT) << PAGE_SHIFT; + WARN(addr != other_addr, + "0x%lx is using VM_IO, but it is 0x%lx!\n", + (unsigned long)addr, (unsigned long)other_addr); + } else { + other_addr = (_pte.pte & PTE_PFN_MASK); + WARN((addr == other_addr) && (!io_page), + "0x%lx is missing VM_IO!\n", + (unsigned long)addr); + } + + return _pte; +} +PV_CALLEE_SAVE_REGS_THUNK(xen_make_pte_debug); +#endif + pgd_t xen_make_pgd(pgdval_t pgd) { pgd = pte_pfn_to_mfn(pgd); @@ -2354,6 +2388,9 @@ __init void xen_ident_map_ISA(void) static __init void xen_post_allocator_init(void) { +#ifdef CONFIG_XEN_DEBUG_FS + pv_mmu_ops.make_pte = PV_CALLEE_SAVE(xen_make_pte_debug); +#endif pv_mmu_ops.set_pte = xen_set_pte; pv_mmu_ops.set_pmd = xen_set_pmd; pv_mmu_ops.set_pud = xen_set_pud; -- 1.7.1 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
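Spelled out as code, the two conditions described in the commit message would be checked roughly like this (a sketch; the real hook, xen_make_pte_debug() above, works on the PTE being constructed):

/* Sketch of the two checks: pfn and io_page (whether _PAGE_IOMAP was set
 * on the PTE) come from the PTE being built. */
static void sketch_check_iomap(unsigned long pfn, bool io_page)
{
        bool is_identity = (pfn_to_mfn(pfn) == pfn);

        WARN(io_page && !is_identity,
             "pfn 0x%lx uses VM_IO but is not a 1-1 mapping\n", pfn);
        WARN(!io_page && is_identity,
             "pfn 0x%lx is a 1-1 mapping but is missing VM_IO\n", pfn);
}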
Konrad Rzeszutek Wilk
2010-Dec-30 19:48 UTC
[Xen-devel] [PATCH 7/8] xen/mmu: Introduce IDENTITY_FRAME_BIT
We tack on the IDENTITY_FRAME_BIT on all PFNs which have been elected during the E820 parsing to be identity mapping. This way, in case a freak occurs such that pfn_to_mfn(pfn)==pfn for non-identity PFNs we can guard against that and not consider it a 1-1 mapping. Also this will allows us to leverage this and tack on _PAGE_IOMAP on any page that has IDENTITY_FRAME_BIT set (see: "xen/mmu: Set _PAGE_IOMAP if PFN is in identity P2M."). Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- arch/x86/include/asm/xen/page.h | 6 +++++- arch/x86/xen/mmu.c | 11 ++++++++--- arch/x86/xen/setup.c | 2 +- 3 files changed, 14 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h index be671f6..a1e4ce8 100644 --- a/arch/x86/include/asm/xen/page.h +++ b/arch/x86/include/asm/xen/page.h @@ -30,6 +30,7 @@ typedef struct xpaddr { /**** MACHINE <-> PHYSICAL CONVERSION MACROS ****/ #define INVALID_P2M_ENTRY (~0UL) #define FOREIGN_FRAME_BIT (1UL<<31) +#define IDENTITY_FRAME_BIT (1UL<<30) #define FOREIGN_FRAME(m) ((m) | FOREIGN_FRAME_BIT) /* Maximum amount of memory we can handle in a domain in pages */ @@ -52,9 +53,12 @@ static inline unsigned long pfn_to_mfn(unsigned long pfn) mfn = get_phys_to_machine(pfn); - if (mfn != INVALID_P2M_ENTRY) + if (mfn != INVALID_P2M_ENTRY) { mfn &= ~FOREIGN_FRAME_BIT; + if (mfn & IDENTITY_FRAME_BIT) + mfn &= ~IDENTITY_FRAME_BIT; + } return mfn; } diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c index d98bd43..d470435 100644 --- a/arch/x86/xen/mmu.c +++ b/arch/x86/xen/mmu.c @@ -425,10 +425,10 @@ unsigned long get_phys_to_machine(unsigned long pfn) * would be wrong. */ if (p2m_top[topidx] == p2m_mid_identity) - return pfn; + return pfn | IDENTITY_FRAME_BIT; if (p2m_top[topidx][mididx] == p2m_identity) - return pfn; + return pfn | IDENTITY_FRAME_BIT; return p2m_top[topidx][mididx][idx]; } @@ -546,7 +546,7 @@ bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn) /* For sparse holes were the p2m leaf has real PFN along with * PCI holes, stick in the PFN as the MFN value. */ - if (pfn == mfn) { + if (mfn != INVALID_P2M_ENTRY && (mfn & IDENTITY_FRAME_BIT)) { if (p2m_top[topidx] == p2m_mid_identity) return 1; if (p2m_top[topidx][mididx] == p2m_identity) @@ -2834,7 +2834,12 @@ static int p2m_dump_show(struct seq_file *m, void *v) } else if (p2m_top[topidx][mididx] == p2m_identity) { lvl = 1; type = TYPE_IDENTITY; } else if (p2m_top[topidx][mididx][idx] == pfn) { + lvl = 2; type = TYPE_PFN; + } else if (p2m_top[topidx][mididx][idx] =+ (pfn | IDENTITY_FRAME_BIT)) { lvl = 2; type = TYPE_IDENTITY; + } else if (p2m_top[topidx][mididx][idx] != pfn) { + lvl = 2; type = TYPE_PFN; } else if (p2m_top[topidx] == NULL) { lvl = 0; type = TYPE_UNKN; } else if (p2m_top[topidx][mididx] == NULL) { diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c index 643b3fc..35b3b4d 100644 --- a/arch/x86/xen/setup.c +++ b/arch/x86/xen/setup.c @@ -168,7 +168,7 @@ static unsigned long __init xen_set_identity(const struct e820map *e820) } for (pfn = PFN_UP(last); pfn < PFN_DOWN(end); pfn++) - __set_phys_to_machine(pfn, pfn); + __set_phys_to_machine(pfn, pfn | IDENTITY_FRAME_BIT); identity += pfn - PFN_UP(last); last = end; -- 1.7.1 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
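A sketch of how a raw P2M entry is meant to be decoded once the new bit is in place; note that INVALID_P2M_ENTRY is ~0UL, so the identity bit may only be tested after the INVALID check (patch 8 relies on this ordering). This mirrors the pfn_to_mfn() change above but is an illustration, not the posted code:

/* Sketch: decode a raw entry returned by get_phys_to_machine(). */
static unsigned long sketch_decode_p2m(unsigned long pfn)
{
        unsigned long entry = get_phys_to_machine(pfn);

        if (entry == INVALID_P2M_ENTRY)
                return INVALID_P2M_ENTRY;               /* missing / ballooned out */

        if (entry & IDENTITY_FRAME_BIT)
                return entry & ~IDENTITY_FRAME_BIT;     /* 1-1: result == pfn */

        return entry & ~FOREIGN_FRAME_BIT;              /* real MFN for RAM */
}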
Konrad Rzeszutek Wilk
2010-Dec-30 19:48 UTC
[Xen-devel] [PATCH 8/8] xen/mmu: Set _PAGE_IOMAP if PFN is in identity P2M.
If we find that the PFN is within the P2M as an identity make sure to tack on the _PAGE_IOMAP flag. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- arch/x86/xen/mmu.c | 19 +++++++++++++++---- 1 files changed, 15 insertions(+), 4 deletions(-) diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c index d470435..158d07a 100644 --- a/arch/x86/xen/mmu.c +++ b/arch/x86/xen/mmu.c @@ -828,7 +828,7 @@ static pteval_t pte_pfn_to_mfn(pteval_t val) if (val & _PAGE_PRESENT) { unsigned long pfn = (val & PTE_PFN_MASK) >> PAGE_SHIFT; pteval_t flags = val & PTE_FLAGS_MASK; - unsigned long mfn = pfn_to_mfn(pfn); + unsigned long mfn = get_phys_to_machine(pfn); /* * If there''s no mfn for the pfn, then just create an @@ -839,8 +839,18 @@ static pteval_t pte_pfn_to_mfn(pteval_t val) if (unlikely(mfn == INVALID_P2M_ENTRY)) { mfn = 0; flags = 0; + } else { + /* + * Paramount to do this test _after_ the + * INVALID_P2M_ENTRY as INVALID_P2M_ENTRY & + * IDENTITY_FRAME_BIT resolves to true. + */ + mfn &= ~FOREIGN_FRAME_BIT; + if (mfn & IDENTITY_FRAME_BIT) { + mfn &= ~IDENTITY_FRAME_BIT; + flags |= _PAGE_IOMAP; + } } - val = ((pteval_t)mfn << PAGE_SHIFT) | flags; } @@ -968,8 +978,9 @@ pte_t xen_make_pte_debug(pteval_t pte) (unsigned long)addr, (unsigned long)other_addr); } else { other_addr = (_pte.pte & PTE_PFN_MASK); - WARN((addr == other_addr) && (!io_page), - "0x%lx is missing VM_IO!\n", + pteval_t iomap_set = (_pte.pte & PTE_FLAGS_MASK) & _PAGE_IOMAP; + WARN((addr == other_addr) && (!io_page) && (!iomap_set), + "0x%lx is missing VM_IO (and wasn''t fixed)!\n", (unsigned long)addr); } -- 1.7.1 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Jan-04 16:26 UTC
[Xen-devel] Re: [PATCH 7/8] xen/mmu: Introduce IDENTITY_FRAME_BIT
On Thu, 2010-12-30 at 19:48 +0000, Konrad Rzeszutek Wilk wrote:
> @@ -52,9 +53,12 @@ static inline unsigned long pfn_to_mfn(unsigned long pfn)
> 
>  	mfn = get_phys_to_machine(pfn);
> 
> -	if (mfn != INVALID_P2M_ENTRY)
> +	if (mfn != INVALID_P2M_ENTRY) {
>  		mfn &= ~FOREIGN_FRAME_BIT;
> 
> +		if (mfn & IDENTITY_FRAME_BIT)
> +			mfn &= ~IDENTITY_FRAME_BIT;
> +	}

I don't think the inner-if buys us anything here and the whole thing is
equivalent to:

	if (mfn != INVALID_P2M_ENTRY)
		mfn &= ~(FOREIGN_FRAME_BIT|IDENTITY_FRAME_BIT);

Not sure if the FOREIGN_FRAME_BIT|IDENTITY_FRAME_BIT construct gets
enough use to be worthy of a #define FRAME_TYPE_MASK etc.

> 	return mfn;
>  }
> 
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index d98bd43..d470435 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -425,10 +425,10 @@ unsigned long get_phys_to_machine(unsigned long pfn)
>  	 * would be wrong.
>  	 */
>  	if (p2m_top[topidx] == p2m_mid_identity)
> -		return pfn;
> +		return pfn | IDENTITY_FRAME_BIT;

It's probably worth defining IDENTITY_FRAME(m) in the pattern of
FOREIGN_FRAME(m).

Ian.
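In sketch form, the two simplifications suggested above might look like the following; FRAME_TYPE_MASK and IDENTITY_FRAME() are hypothetical names following the FOREIGN_FRAME() pattern and are not in the posted series:

/* Hypothetical sketch of the suggestions above; only FOREIGN_FRAME_BIT,
 * IDENTITY_FRAME_BIT and FOREIGN_FRAME() exist in the posted patches. */
#define FRAME_TYPE_MASK         (FOREIGN_FRAME_BIT | IDENTITY_FRAME_BIT)
#define IDENTITY_FRAME(m)       ((m) | IDENTITY_FRAME_BIT)

static inline unsigned long sketch_pfn_to_mfn(unsigned long pfn)
{
        unsigned long mfn = get_phys_to_machine(pfn);

        if (mfn != INVALID_P2M_ENTRY)
                mfn &= ~FRAME_TYPE_MASK;

        return mfn;
}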
Ian Campbell
2011-Jan-04 16:34 UTC
[Xen-devel] Re: [PATCH 1/8] xen: Mark all initial reserved pages for the balloon as INVALID_P2M_ENTRY.
On Thu, 2010-12-30 at 19:48 +0000, Konrad Rzeszutek Wilk wrote:
> With this patch, we diligently set regions that will be used by the
> balloon driver to be INVALID_P2M_ENTRY and under the ownership
> of the balloon driver. We are OK using the __set_phys_to_machine
> as we do not expect to be allocating any P2M middle or entries pages.

... because xen_build_mfn_list_list will have already allocated all such
pages up to xen_max_p2m_pfn. (right?)

Ian.
Konrad Rzeszutek Wilk
2011-Jan-04 16:45 UTC
[Xen-devel] Re: [PATCH 1/8] xen: Mark all initial reserved pages for the balloon as INVALID_P2M_ENTRY.
On Tue, Jan 04, 2011 at 04:34:48PM +0000, Ian Campbell wrote:
> On Thu, 2010-12-30 at 19:48 +0000, Konrad Rzeszutek Wilk wrote:
> > With this patch, we diligently set regions that will be used by the
> > balloon driver to be INVALID_P2M_ENTRY and under the ownership
> > of the balloon driver. We are OK using the __set_phys_to_machine
> > as we do not expect to be allocating any P2M middle or entries pages.
> 
> ... because xen_build_mfn_list_list will have already allocated all such
> pages up to xen_max_p2m_pfn.

Correct. Let me update the description accordingly.
Konrad Rzeszutek Wilk
2011-Jan-04 16:45 UTC
[Xen-devel] Re: [PATCH 7/8] xen/mmu: Introduce IDENTITY_FRAME_BIT
On Tue, Jan 04, 2011 at 04:26:15PM +0000, Ian Campbell wrote:> On Thu, 2010-12-30 at 19:48 +0000, Konrad Rzeszutek Wilk wrote: > > @@ -52,9 +53,12 @@ static inline unsigned long pfn_to_mfn(unsigned long pfn) > > > > mfn = get_phys_to_machine(pfn); > > > > - if (mfn != INVALID_P2M_ENTRY) > > + if (mfn != INVALID_P2M_ENTRY) { > > mfn &= ~FOREIGN_FRAME_BIT; > > > > + if (mfn & IDENTITY_FRAME_BIT) > > + mfn &= ~IDENTITY_FRAME_BIT; > > + } > > I don''t think the inner-if buys us anything here and the whole thing is > equivalent to: > if (mfn != INVALID_P2M_ENTRY) > mfn &= ~(FOREIGN_FRAME_BIT|IDENTITY_FRAME_BIT); > > Not sure if the FOREIGN_FRAME_BIT|IDENTITY_FRAME_BIT construct gets > enough use to be worthy of a #define FRAME_TYPE_MASK etc.Yeah, that is much better.> > > return mfn; > > } > > > > diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c > > index d98bd43..d470435 100644 > > --- a/arch/x86/xen/mmu.c > > +++ b/arch/x86/xen/mmu.c > > @@ -425,10 +425,10 @@ unsigned long get_phys_to_machine(unsigned long pfn) > > * would be wrong. > > */ > > if (p2m_top[topidx] == p2m_mid_identity) > > - return pfn; > > + return pfn | IDENTITY_FRAME_BIT; > > It''s probably worth defining IDENTITY_FRAME(m) in the pattern of > FOREIGN_FRAME(m).<nods> _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Jan-04 16:53 UTC
[Xen-devel] Re: [PATCH 2/8] xen/mmu: Add the notion of identity (1-1) mapping.
Should patch 7/8 (add the IDENTITY_FRAME_BIT) be folded back into this patch? On Thu, 2010-12-30 at 19:48 +0000, Konrad Rzeszutek Wilk wrote:> Our P2M tree structure is a three-level. On the leaf nodes > we set the Machine Frame Number (MFN) of the PFN. What this means > is that when one does: pfn_to_mfn(pfn), which is used when creating > PTE entries, you get the real MFN of the hardware. When Xen sets > up a guest it initially populates a array which has descending MFN > values, as so: > > idx: 0, 1, 2 > [0x290F, 0x290E, 0x290D, ..] > > so pfn_to_mfn(2)==0x290D. If you start, restart many guests that list > starts looking quite random. > > We graft this structure on our P2M tree structure and stick in > those MFN in the leafs. But for all other leaf entries, or for the top > root, or middle one, for which there is a void entry, we assume it is > "missing". So > pfn_to_mfn(0xc0000)=INVALID_P2M_ENTRY. > > We add the possibility of setting 1-1 mappings on certain regions, so > that: > pfn_to_mfn(0xc0000)=0xc0000 > > The benefit of this is, that we can assume for non-RAM regions (think > PCI BARs, or ACPI spaces), we can create mappings easily b/c we > get the PFN value to match the MFN. > > For this to work efficiently we introduce are two new pages: p2m_identity > and p2m_mid_identity. All entries in p2m_identity are set to INVALID_P2M_ENTRY > type,This statement confused me at first. I think the piece of information which is missing in the rest of the paragraph is that on lookup we spot that the entry points to p2m_identity and return the identity value instead of dereferencing and returning INVALID_P2M_ENTRY. If the value is never actually (deliberately) completely dereferecned then perhaps for sanity/debugging the page should contain some other invalid/poison value so we can spot in stack traces etc cases where it has been erroneously dereferenced? (e.g. 0xFFFFFFFE?) Or does that confuse the tools/migration or something similar?> @@ -410,6 +418,17 @@ unsigned long get_phys_to_machine(unsigned long pfn) > mididx = p2m_mid_index(pfn); > idx = p2m_index(pfn); > > + /* > + * The INVALID_P2M_ENTRY is filled in both p2m_*identity > + * and in p2m_*missing, so returning the INVALID_P2M_ENTRY > + * would be wrong. > + */ > + if (p2m_top[topidx] == p2m_mid_identity) > + return pfn; > + > + if (p2m_top[topidx][mididx] == p2m_identity) > + return pfn; > +If p2m_top[topidx] == p2m_mid_identity the also p2m_top[topidx][mididx] == p2m_identity. Therefore I''m not sure if the first check is worthwhile or not.> @@ -517,6 +542,26 @@ bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn) > mididx = p2m_mid_index(pfn); > idx = p2m_index(pfn); > > + /* For sparse holes were the p2m leaf has real PFN along with > + * PCI holes, stick in the PFN as the MFN value. > + */ > + if (pfn == mfn) { > + if (p2m_top[topidx] == p2m_mid_identity) > + return 1; > + if (p2m_top[topidx][mididx] == p2m_identity) > + return 1;Should be "return true" throughout for a function returning bool. I think it can also be simplified as per my comment above.> + /* Swap over from MISSING to IDENTITY if needed. */ > + if (p2m_top[topidx] == p2m_mid_missing) { > + p2m_top[topidx] = p2m_mid_identity; > + return 1; > + } > + if (p2m_top[topidx][mididx] == p2m_missing) { > + p2m_top[topidx][mididx] = p2m_identity; > + return 1; > + }I don''t think I quite understand this bit. 
If I do "__set_phys_to_machine(X, X)" where X happens to currently correspond to p2m_mid_missing won''t that cause all pfn entries in the range covered by p2m_top[topidx] (i.e. quite a few, like 512^2 pages or something) to switch from missing to identity? Similarly for p2m_top[topidx][mididx]? Perhaps ranges of identity bits are often well aligned with the boundaries in the p2m 3-level tree but wouldn''t that just be coincidence? Don''t we need to completely populate the tree at this point and setup the leaf nodes as appropriate? Which we can''t do since this is __set_phys_to_machine so we need to fail and let set_phys_to_machine to its thing? Or maybe this hole bit needs to be hoisted up to set_phys_to_machine? Ian.> + } > + > if (p2m_top[topidx][mididx] == p2m_missing) > return mfn == INVALID_P2M_ENTRY; >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Jan-04 16:59 UTC
[Xen-devel] Re: [PATCH 2/8] xen/mmu: Add the notion of identity (1-1) mapping.
On Tue, 2011-01-04 at 16:53 +0000, Ian Campbell wrote:
> 
> If I do "__set_phys_to_machine(X, X)" where X happens to currently
> correspond to p2m_mid_missing won't that cause all pfn entries in the
> range covered by p2m_top[topidx] (i.e. quite a few, like 512^2 pages
> or something) to switch from missing to identity? Similarly for
> p2m_top[topidx][mididx]?
> 
> Perhaps ranges of identity bits are often well aligned with the
> boundaries in the p2m 3-level tree but wouldn't that just be
> coincidence?

I wonder if it would make sense to have a debugfs file which exports the
p2m, for the purposes of eye-balling it for this sort of issue?
comparing to the e820 etc...

Maybe the whole thing in raw form would be overkill but spitting out a
list of ranges of identity, invalid, normal pages, plus annotation
regarding whether those come from a p2m_mid_identity style page or a
leaf node, might be occasionally useful.

Ian.

-- 
Ian Campbell
Current Noise: Place of Skulls - The Watchers

Call toll free number before digging.
Ian Campbell
2011-Jan-04 17:18 UTC
[Xen-devel] Re: [PATCH 3/8] xen/setup: Set identity mapping for non-RAM E820 and E820 gaps.
On Thu, 2010-12-30 at 19:48 +0000, Konrad Rzeszutek Wilk wrote:
> We walk the E820 region and start at 0 (for PV guests we start
> at ISA_END_ADDRESS)

I was trying to figure out what any of this had to do with HVM guests,
but you mean as opposed to dom0, which with my pedant hat on is also a
guest ;-).

> and skip any E820 RAM regions. For all other
> regions and as well the gaps we set them to be identity mappings.
> 
> The reasons we do not want to set the identity mapping from 0->
> ISA_END_ADDRESS when running as PV is b/c that the kernel would
> try to read DMI information and fail (no permissions to read that).

The reason for this special case is that in domU we have already punched
a hole from 640k-1M into the e820 which the hypervisor gave us.

Should we perhaps be doing this identity mapping before we punch that
extra hole? i.e. setup ID mappings based on the hypervisor's idea of the
guest e820 not the munged one we subsequently magicked up? Only the
original e820 is going to bear any possible resemblance to the identity
pages which the guest can actually see.

Ian.
Ian Campbell
2011-Jan-04 17:20 UTC
Re: [Xen-devel] Re: [PATCH 2/8] xen/mmu: Add the notion of identity (1-1) mapping.
On Tue, 2011-01-04 at 16:59 +0000, Ian Campbell wrote:> On Tue, 2011-01-04 at 16:53 +0000, Ian Campbell wrote: > > > > If I do "__set_phys_to_machine(X, X)" where X happens to currently > > correspond to p2m_mid_missing won''t that cause all pfn entries in the > > range covered by p2m_top[topidx] (i.e. quite a few, like 512^2 pages > > or > > something) to switch from missing to identity? Similarly for > > p2m_top[topidx][mididx]? > > > > Perhaps ranges of identity bits are often well aligned with the > > boundaries in the p2m 3-level tree but wouldn''t that just be > > coincidence? > > I wonder if it would make sense to have a debugfs file which exports the > p2m, for the purposes of eye-balling it for this sort of issue? > comparing to the e820 etc... > > Maybe the whole thing in raw form would be overkill but spitting out a > list of ranges of identity, invalid, normal pages, plus annotation > regarding whether those come from a p2m_mid_identity style page or a > leaf node, might be occasionally useful.I should have read patch 5/8 first... _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Jan-04 17:24 UTC
[Xen-devel] Re: [PATCH 6/8] xen/debug: WARN_ON when 1-1 but no _PAGE_IOMAP flag set.
On Thu, 2010-12-30 at 19:48 +0000, Konrad Rzeszutek Wilk wrote:
> Only enabled if XEN_DEBUG_FS is enabled.

Bit of an odd configuration option to use. Perhaps co-opt
CONFIG_PARAVIRT_DEBUG instead?

Or maybe we need a new XEN_DEBUG option? or just make it a developer
only EXTRA_CFLAGS += -DDEBUG thing?

Is this only temporary until the need for _PAGE_IOMAP is removed anyway?

Ian.
Konrad Rzeszutek Wilk
2011-Jan-04 18:38 UTC
[Xen-devel] Re: [PATCH 3/8] xen/setup: Set identity mapping for non-RAM E820 and E820 gaps.
On Tue, Jan 04, 2011 at 05:18:58PM +0000, Ian Campbell wrote:
> On Thu, 2010-12-30 at 19:48 +0000, Konrad Rzeszutek Wilk wrote:
> > We walk the E820 region and start at 0 (for PV guests we start
> > at ISA_END_ADDRESS)
> 
> I was trying to figure out what any of this had to do with HVM guests,
> but you mean as opposed to dom0, which with my pedant hat on is also a
> guest ;-).
> 
> > and skip any E820 RAM regions. For all other
> > regions and as well the gaps we set them to be identity mappings.
> > 
> > The reasons we do not want to set the identity mapping from 0->
> > ISA_END_ADDRESS when running as PV is b/c that the kernel would
> > try to read DMI information and fail (no permissions to read that).
> 
> The reason for this special case is that in domU we have already punched
> a hole from 640k-1M into the e820 which the hypervisor gave us.

For the privileged guest - yes. But for the non-privileged one it does not
have such a range and would end up failing.

> 
> Should we perhaps be doing this identity mapping before we punch that
> extra hole? i.e. setup ID mappings based on the hypervisor's idea of the
> guest e820 not the munged one we subsequently magicked up? Only the

You mean the ISA_START_ADDRESS->ISA_END_ADDRESS we mark as reserved?
It sure would be easier (and it would mean we can return that memory back
to the hypervisor).

> original e820 is going to bear any possible resemblance to the identity
> pages which the guest can actually see.
> 
> Ian.
Konrad Rzeszutek Wilk
2011-Jan-04 18:46 UTC
[Xen-devel] Re: [PATCH 6/8] xen/debug: WARN_ON when 1-1 but no _PAGE_IOMAP flag set.
On Tue, Jan 04, 2011 at 05:24:38PM +0000, Ian Campbell wrote:
> On Thu, 2010-12-30 at 19:48 +0000, Konrad Rzeszutek Wilk wrote:
> > Only enabled if XEN_DEBUG_FS is enabled.
> 
> Bit of an odd configuration option to use. Perhaps co-opt
> CONFIG_PARAVIRT_DEBUG instead?
> 
> Or maybe we need a new XEN_DEBUG option? or just make it a developer
> only EXTRA_CFLAGS += -DDEBUG thing?

I think 'XEN_DEBUG' works better.

> 
> Is this only temporary until the need for _PAGE_IOMAP is removed anyway?

I was thinking to leave it as a way to weed out bugs, but I could as well
just leave it in my "debug" branch and not propose it upstream.

I am not sure how to remove _PAGE_IOMAP fully. We need some _flag_ to
signal 'xen_pte_val' that the PTE should not be looked up in the M2P.

Otherwise, for identity mappings, the value is 0xfffff.. (or 0x55555..
sometimes) and the PTE ends up being messed up. Instead there is a check
to see if _PAGE_IOMAP is part of the PTE, and if so, no M2P lookup is done.

I guess we could do the M2P lookup regardless and just see if it is
0xfffff... and if so just return the PTE without any changes.

> 
> Ian.
Ian Campbell
2011-Jan-04 19:20 UTC
[Xen-devel] Re: [PATCH 6/8] xen/debug: WARN_ON when 1-1 but no _PAGE_IOMAP flag set.
On Tue, 2011-01-04 at 18:46 +0000, Konrad Rzeszutek Wilk wrote:
> On Tue, Jan 04, 2011 at 05:24:38PM +0000, Ian Campbell wrote:
> > On Thu, 2010-12-30 at 19:48 +0000, Konrad Rzeszutek Wilk wrote:
> > > Only enabled if XEN_DEBUG_FS is enabled.
> > 
> > Bit of an odd configuration option to use. Perhaps co-opt
> > CONFIG_PARAVIRT_DEBUG instead?
> > 
> > Or maybe we need a new XEN_DEBUG option? or just make it a developer
> > only EXTRA_CFLAGS += -DDEBUG thing?
> 
> I think 'XEN_DEBUG' works better.
> 
> > Is this only temporary until the need for _PAGE_IOMAP is removed anyway?
> 
> I was thinking to leave it as a way to weed out bugs, but I could as well
> just leave it in my "debug" branch and not propose it upstream.
> 
> I am not sure how to remove _PAGE_IOMAP fully. We need some _flag_ to
> signal 'xen_pte_val' that the PTE should not be looked up in the M2P.
> 
> Otherwise, for identity mappings, the value is 0xfffff.. (or 0x55555..
> sometimes) and the PTE ends up being messed up. Instead there is a check
> to see if _PAGE_IOMAP is part of the PTE, and if so, no M2P lookup is done.
> 
> I guess we could do the M2P lookup regardless and just see if it is
> 0xfffff... and if so just return the PTE without any changes.

Perhaps this ties in with the m2p overlay which Stefano and Jeremy have
been working on to deal with granted foreign pages? I/O pages are a bit
like foreign memory (if you squint enough)...

Ian.
Konrad Rzeszutek Wilk
2011-Jan-04 19:24 UTC
[Xen-devel] Re: [PATCH 2/8] xen/mmu: Add the notion of identity (1-1) mapping.
On Tue, Jan 04, 2011 at 04:53:48PM +0000, Ian Campbell wrote:> Should patch 7/8 (add the IDENTITY_FRAME_BIT) be folded back into this > patch?Does not have too. This can work independently (it does not introduce any regressions).> > On Thu, 2010-12-30 at 19:48 +0000, Konrad Rzeszutek Wilk wrote: > > Our P2M tree structure is a three-level. On the leaf nodes > > we set the Machine Frame Number (MFN) of the PFN. What this means > > is that when one does: pfn_to_mfn(pfn), which is used when creating > > PTE entries, you get the real MFN of the hardware. When Xen sets > > up a guest it initially populates a array which has descending MFN > > values, as so: > > > > idx: 0, 1, 2 > > [0x290F, 0x290E, 0x290D, ..] > > > > so pfn_to_mfn(2)==0x290D. If you start, restart many guests that list > > starts looking quite random. > > > > We graft this structure on our P2M tree structure and stick in > > those MFN in the leafs. But for all other leaf entries, or for the top > > root, or middle one, for which there is a void entry, we assume it is > > "missing". So > > pfn_to_mfn(0xc0000)=INVALID_P2M_ENTRY. > > > > We add the possibility of setting 1-1 mappings on certain regions, so > > that: > > pfn_to_mfn(0xc0000)=0xc0000 > > > > The benefit of this is, that we can assume for non-RAM regions (think > > PCI BARs, or ACPI spaces), we can create mappings easily b/c we > > get the PFN value to match the MFN. > > > > For this to work efficiently we introduce are two new pages: p2m_identity > > and p2m_mid_identity. All entries in p2m_identity are set to INVALID_P2M_ENTRY > > type, > > This statement confused me at first. I think the piece of information > which is missing in the rest of the paragraph is that on lookup we spot > that the entry points to p2m_identity and return the identity value > instead of dereferencing and returning INVALID_P2M_ENTRY. > > If the value is never actually (deliberately) completely dereferecned > then perhaps for sanity/debugging the page should contain some other > invalid/poison value so we can spot in stack traces etc cases where it > has been erroneously dereferenced? (e.g. 0xFFFFFFFE?) Or does that > confuse the tools/migration or something similar?Yup. The tools have checks for either valid PFNs or 0xFFFFFFF, but nothing else. They flip out if you stick other values in .. oh and sticking in the same MFN (in case you were thinking of them pointing to a dummy page) makes the toolstack think you are having a race.> > > @@ -410,6 +418,17 @@ unsigned long get_phys_to_machine(unsigned long pfn) > > mididx = p2m_mid_index(pfn); > > idx = p2m_index(pfn); > > > > + /* > > + * The INVALID_P2M_ENTRY is filled in both p2m_*identity > > + * and in p2m_*missing, so returning the INVALID_P2M_ENTRY > > + * would be wrong. > > + */ > > + if (p2m_top[topidx] == p2m_mid_identity) > > + return pfn; > > + > > + if (p2m_top[topidx][mididx] == p2m_identity) > > + return pfn; > > + > > If p2m_top[topidx] == p2m_mid_identity the also p2m_top[topidx][mididx] > == p2m_identity. Therefore I''m not sure if the first check is worthwhile > or not.Good eye.> > > @@ -517,6 +542,26 @@ bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn) > > mididx = p2m_mid_index(pfn); > > idx = p2m_index(pfn); > > > > + /* For sparse holes were the p2m leaf has real PFN along with > > + * PCI holes, stick in the PFN as the MFN value. 
> > + */ > > + if (pfn == mfn) { > > + if (p2m_top[topidx] == p2m_mid_identity) > > + return 1; > > + if (p2m_top[topidx][mididx] == p2m_identity) > > + return 1; > > Should be "return true" throughout for a function returning bool. I > think it can also be simplified as per my comment above.<nods>> > > + /* Swap over from MISSING to IDENTITY if needed. */ > > + if (p2m_top[topidx] == p2m_mid_missing) { > > + p2m_top[topidx] = p2m_mid_identity; > > + return 1; > > + } > > + if (p2m_top[topidx][mididx] == p2m_missing) { > > + p2m_top[topidx][mididx] = p2m_identity; > > + return 1; > > + } > > I don''t think I quite understand this bit. > > If I do "__set_phys_to_machine(X, X)" where X happens to currently > correspond to p2m_mid_missing won''t that cause all pfn entries in the > range covered by p2m_top[topidx] (i.e. quite a few, like 512^2 pages or > something) to switch from missing to identity? Similarly for > p2m_top[topidx][mididx]?Yup. Keep in mind that this patchset explores the more "conservative" approach. All ''void'' entries are pointing to the p2m_*MISSING instead of p2m_*IDENTITY pages. Hence the swap over from missing to identity is correct.> > Perhaps ranges of identity bits are often well aligned with the > boundaries in the p2m 3-level tree but wouldn''t that just be > coincidence?I thought so, but for all the machines I''ve run this on, those boundaries are there.> > Don''t we need to completely populate the tree at this point and setup > the leaf nodes as appropriate? Which we can''t do since this isNot exactly. At this point (xen/setup, parsing the E820), the P2M has been populated up to nr_pages. So, say we have 2G to the guest, that is p2m[0, 1] point to two pages "created" by extend_brk. The contents of the pages points to mfn_list[..]. The rest of the p2m (so, p2m[2]..p2m[511]) point to p2m_mid_missing. When we start the identity mapping, we end up right (for example) from 3GB to 4GB with identity mapping, so p2m[3] is now p2m_mid_identity. The p2m[4] and later are still in p2m_mid_missing. Thought I''ve just thought of nasty condition. Say those PCI regions start at 2.5GB, and the user provided dom0_mem=max:3GB. Oh, and actually that still works. The p2m[1] will be allocated by reserv_brk, and we just end up sticking from p2m[1][473]..p2m[1][511] with the PFN | IDENTITY_FRAME_BIT, which is OK as that contents used to have INVALID_P2M_IDENTITY (it would have been created by extend_brk in xen_build_dynamic_phys_to_machine and filled with ~0 value).> __set_phys_to_machine so we need to fail and let set_phys_to_machine to > its thing? Or maybe this hole bit needs to be hoisted up to > set_phys_to_machine?It can, but set_phys_to_machine would have to use reserved_brk. The danger I am seeing is if the non-RAM regions (and gaps) in the E820 are not 1GB (or 512MB under 32-bit) boundary aligned (2.36GB?)_and_ the dom0_mem is employed as to make the top-level P2M right up to the non-boundary aligned E820 void (so, dom0_mem=max:2GB for example). That would create a proper p2m[0] and p2m[1], but p2m[3] would point to p2m_mid_missing, and when we try to make p2m[3][231] identity (so ~2.36GB area) we would discover it is p2m_missing and would over-write it to p2m_identity .. and it would be OK as we cannot use the balloon driver (max:2GB). 
However, if we used ''dom0_mem=max:2364MB,1GB'' I think we would hit a failure when the balloon driver would try to balloon up and end up hitting this BUG: phys_to_machine_mapping_valid(pfn) (as it should have returned INVALID_P2M_ENTRY but instead returns the MFN). A simple solution could be to make ''xen_build_dynamic_phys_to_machine'' cover E820 regions between "System RAM" in case the nr_pages does not extend that far. Or introduce a ''set_phys_to_machine_early'' that would do exactly the same thing as ''set_phys_to_machine'' but use extend_brk instead of __get_free_pages. Ideas? _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
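One possible shape for the 'set_phys_to_machine_early' idea floated above, purely as a sketch: the early_alloc_p2m() helper is hypothetical and would have to hand out middle/leaf pages from extend_brk, since this would run before the page allocator is up. None of this is in the posted series:

/* Hypothetical sketch of set_phys_to_machine_early(): same shape as
 * set_phys_to_machine(), but the slow path allocates from the brk pool. */
static bool __init set_phys_to_machine_early(unsigned long pfn,
                                             unsigned long mfn)
{
        if (__set_phys_to_machine(pfn, mfn))
                return true;

        if (!early_alloc_p2m(pfn))      /* extend_brk-backed, hypothetical */
                return false;

        return __set_phys_to_machine(pfn, mfn);
}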
Ian Campbell
2011-Jan-04 19:27 UTC
[Xen-devel] Re: [PATCH 3/8] xen/setup: Set identity mapping for non-RAM E820 and E820 gaps.
On Tue, 2011-01-04 at 18:38 +0000, Konrad Rzeszutek Wilk wrote:
> On Tue, Jan 04, 2011 at 05:18:58PM +0000, Ian Campbell wrote:
> > On Thu, 2010-12-30 at 19:48 +0000, Konrad Rzeszutek Wilk wrote:
> > > We walk the E820 region and start at 0 (for PV guests we start
> > > at ISA_END_ADDRESS)
> > 
> > I was trying to figure out what any of this had to do with HVM guests,
> > but you mean as opposed to dom0, which with my pedant hat on is also a
> > guest ;-).
> > 
> > > and skip any E820 RAM regions. For all other
> > > regions and as well the gaps we set them to be identity mappings.
> > > 
> > > The reasons we do not want to set the identity mapping from 0->
> > > ISA_END_ADDRESS when running as PV is b/c that the kernel would
> > > try to read DMI information and fail (no permissions to read that).
> > 
> > The reason for this special case is that in domU we have already punched
> > a hole from 640k-1M into the e820 which the hypervisor gave us.
> 
> For the privileged guest - yes. But for the non-privileged one it does not
> have such a range and would end up failing.

xen_memory_setup has:

	e820_add_region(ISA_START_ADDRESS, ISA_END_ADDRESS -
			ISA_START_ADDRESS, E820_RESERVED);

which is unconditional but is actually more for domU's benefit than
dom0's, which already sees the host e820 presumably with the right hole
already in place, which we simply shadow, or maybe slightly extend, here.

In a domU we do this because if you let these pages into the general
allocation pool then they will potentially get used as page table pages
(hence be R/O) but e.g. the DMI code tries to map them to probe for
signatures and tries to do so R/W, which fails. We could try and find
everywhere in the kernel which does this or we can simply reserve the
region, which stops it getting used for page tables or other special
things, and is somewhat less surprising for non-Xen code.

> > Should we perhaps be doing this identity mapping before we punch that
> > extra hole? i.e. setup ID mappings based on the hypervisor's idea of the
> > guest e820 not the munged one we subsequently magicked up? Only the
> 
> You mean the ISA_START_ADDRESS->ISA_END_ADDRESS we mark as reserved?

Yep.

> It sure would be easier (and it would mean we can return that memory back
> to the hypervisor).

I don't think you can return it, since something like the DMI code which
wants to probe it expects to be able to map that PFN; if you've given
the MFN back then that will fail.

I suppose we could alias all such PFNs to the same scratch MFN but I'd
be concerned about some piece of code which expects to interact with
firmware scribbling over it and surprising some other piece of code
which interacts with the firmware...

Ian.
Konrad Rzeszutek Wilk
2011-Jan-04 21:28 UTC
[Xen-devel] Re: [PATCH 3/8] xen/setup: Set identity mapping for non-RAM E820 and E820 gaps.
> > For the privileged guest - yes. But the non-privileged guest does not
> > have such a range and would end up failing.
>
> xen_memory_setup has:
>
> 	e820_add_region(ISA_START_ADDRESS, ISA_END_ADDRESS - ISA_START_ADDRESS,
> 			E820_RESERVED);
>
> which is unconditional but is actually more for domU's benefit than
> dom0's, which already sees the host e820, presumably with the right hole
> already in place, which we simply shadow, or maybe slightly extend, here.

Actually we don't do anything with that region in the dom0 case. We just
return the PFN without consulting the P2M for 0->0x100, while for domU _we_
do consult the P2M and set those in the PTE (look in xen_make_pte).

> In a domU we do this because if you let these pages into the general
> allocation pool then they will potentially get used as page table pages
> (hence be R/O), but e.g. the DMI code tries to map them to probe for
> signatures and tries to do so R/W, which fails. We could try to find
> everywhere in the kernel which does this, or we can simply reserve the
> region, which stops it getting used for page tables or other special
> things and is somewhat less surprising for non-Xen code.

Yeah, went down that hole once.. too many generic pieces of code.

.. snip..

> > You mean the ISA_START_ADDRESS->ISA_END_ADDRESS we mark as reserved?
>
> Yep.
>
> > It sure would be easier
> > (and it would mean we can return that memory back to the hypervisor).
>
> I don't think you can return it, since something like the DMI code which
> wants to probe it expects to be able to map that PFN; if you've given
> the MFN back then that will fail.

Correct (for a non-privileged PV domain).

> I suppose we could alias all such PFNs to the same scratch MFN but I'd

It actually works. I set up 0x1->0x100 to point to whatever the MFN was at
0x0, and released the pages from 0x1->0x100, and it worked for domU PV guests
(and dom0, since I ended up stomping those regions with PFN | IDENTITY_FRAME_BIT).
However, the tools weren't happy ('xm save'). They did not like seeing the
same MFN across a couple of entries in the P2M table and complained about a
potential race.

But there is another way, and that is to special-case 'xen_make_pte' when we
want to create a PTE for 0->ISA_END_ADDRESS and just give it the MFN from
P2M[0x0] (for !xen_initial_domain()), while having the PFNs from 0x1->0x100
freed and set to identity (PFN | IDENTITY_FRAME_BIT)... But that all just
smacks of weird corner cases. Though the code that is there is already
special-casing access to that region, so maybe it would clear things up a bit.

> be concerned about some piece of code which expects to interact with
> firmware scribbling over it and surprising some other piece of code
> which interacts with the firmware...

Fortunately they all look for a signature first before trying to scribble.
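Purely as an illustration of the xen_make_pte special case floated above (not
something from the posted series), the pfn-to-mfn step could redirect every
ISA-range PFN of a non-initial domain to the scratch MFN kept at P2M[0]:

static pteval_t isa_pte_pfn_to_mfn(pteval_t val)
{
	if (val & _PAGE_PRESENT) {
		unsigned long pfn = (val & PTE_PFN_MASK) >> PAGE_SHIFT;

		if (!xen_initial_domain() && pfn && pfn < PFN_DOWN(ISA_END_ADDRESS)) {
			/* All of 0x1->0xff share the MFN backing PFN 0. */
			unsigned long mfn = get_phys_to_machine(0);

			val = (val & ~PTE_PFN_MASK) |
			      ((pteval_t)mfn << PAGE_SHIFT);
		}
	}
	return val;
}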
Ian Campbell
2011-Jan-05 14:03 UTC
[Xen-devel] Re: [PATCH 2/8] xen/mmu: Add the notion of identity (1-1) mapping.
On Tue, 2011-01-04 at 19:24 +0000, Konrad Rzeszutek Wilk wrote:
> On Tue, Jan 04, 2011 at 04:53:48PM +0000, Ian Campbell wrote:
> > Should patch 7/8 (add the IDENTITY_FRAME_BIT) be folded back into this
> > patch?
>
> Does not have to. This can work independently (it does not introduce
> any regressions).

Without 7/8 don't you have the issue of confusing identity-mapped PFNs with
ones which just happen to be identity?

> > > +	/* Swap over from MISSING to IDENTITY if needed. */
> > > +	if (p2m_top[topidx] == p2m_mid_missing) {
> > > +		p2m_top[topidx] = p2m_mid_identity;
> > > +		return 1;
> > > +	}
> > > +	if (p2m_top[topidx][mididx] == p2m_missing) {
> > > +		p2m_top[topidx][mididx] = p2m_identity;
> > > +		return 1;
> > > +	}
> >
> > I don't think I quite understand this bit.
> >
> > If I do "__set_phys_to_machine(X, X)" where X happens to currently
> > correspond to p2m_mid_missing won't that cause all pfn entries in the
> > range covered by p2m_top[topidx] (i.e. quite a few, like 512^2 pages or
> > something) to switch from missing to identity? Similarly for
> > p2m_top[topidx][mididx]?
>
> Yup. Keep in mind that this patchset explores the more "conservative"
> approach.
>
> All 'void' entries are pointing to the p2m_*MISSING instead of the
> p2m_*IDENTITY pages.
>
> Hence the swap over from missing to identity is correct.

How? If I have a P2M such that P2M[X] == missing and P2M[X+1] == missing
(because they both happen to be in a region backed by p2m_mid_missing) and I
do __set_phys_to_machine(X, X), then this makes P2M[X] == identity, which is
fine. But it also makes P2M[X+1] == identity, which is at best
counter-intuitive and at worst just totally wrong.

I also suspect we shouldn't be special-casing __set_phys_to_machine(X, X) in
this way (since it is prone to false positives). We should probably have a
separate __set_phys_to_machine_identity(X) or something.

> > Perhaps ranges of identity bits are often well aligned with the
> > boundaries in the p2m 3-level tree but wouldn't that just be
> > coincidence?
>
> I thought so, but for all the machines I've run this on, those boundaries
> are there.

I really don't think we should rely on that, it's very likely just a
coincidence...

> > Don't we need to completely populate the tree at this point and setup
> > the leaf nodes as appropriate? Which we can't do since this is
>
> Not exactly. At this point (xen/setup, parsing the E820), the P2M has
> been populated up to nr_pages. So, say we give 2G to the guest; then
> p2m[0] and p2m[1] point to two pages "created" by extend_brk. The contents
> of those pages point to mfn_list[..]. The rest of the p2m (so,
> p2m[2]..p2m[511]) points to p2m_mid_missing.
>
> When we start the identity mapping, we end up (for example) with 3GB to
> 4GB identity mapped, so p2m[3] is now p2m_mid_identity. p2m[4] and later
> are still p2m_mid_missing.
>
> Though I've just thought of a nasty condition. Say those PCI regions start
> at 2.5GB, and the user provided dom0_mem=max:3GB. Oh, and actually that
> still works. The p2m[1] will be allocated by reserve_brk, and we just end
> up sticking PFN | IDENTITY_FRAME_BIT into p2m[1][473]..p2m[1][511], which
> is OK as those contents used to hold INVALID_P2M_ENTRY (the page would
> have been created by extend_brk in xen_build_dynamic_phys_to_machine and
> filled with the ~0 value).

> > __set_phys_to_machine, so we need to fail and let set_phys_to_machine do
> > its thing? Or maybe this whole bit needs to be hoisted up to
> > set_phys_to_machine?
>
> It can, but set_phys_to_machine would have to use reserve_brk.
>
> The danger I am seeing is if the non-RAM regions (and gaps) in the E820 are
> not 1GB (or 512MB under 32-bit) boundary aligned (2.36GB?) _and_ dom0_mem
> is employed so as to make the top-level P2M extend right up to the
> non-boundary-aligned E820 void (so, dom0_mem=max:2GB for example). That
> would create a proper p2m[0] and p2m[1], but p2m[3] would point to
> p2m_mid_missing, and when we try to make p2m[3][231] identity (so the
> ~2.36GB area) we would discover it is p2m_missing and would overwrite it to
> p2m_identity .. and it would be OK as we cannot use the balloon driver
> (max:2GB). However, if we used 'dom0_mem=max:2364MB,1GB' I think we would
> hit a failure when the balloon driver tried to balloon up and ended up
> hitting this BUG: phys_to_machine_mapping_valid(pfn) (as it should have
> returned INVALID_P2M_ENTRY but instead returns the MFN).

Yes, this is exactly the problem case I was thinking of.

> A simple solution could be to make 'xen_build_dynamic_phys_to_machine'
> cover E820 regions between "System RAM" in case nr_pages does not extend
> that far.

I think the limit used by xen_build_dynamic_phys_to_machine should likely be
the last page of the last entry in the e820, even if the e820 ends with RAM,
hole, IO memory (or the RAM has been clamped such that it effectively looks
like that). I guess it would be possible/wise to leave gaps in the case that
the hole is massive, i.e. only populate the RAM + IO memory bits.

Ian.
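For reference, a minimal sketch of the separate identity interface suggested
above, in a range-based form that avoids special-casing
__set_phys_to_machine(X, X); IDENTITY_FRAME() is assumed to be
'pfn | IDENTITY_FRAME_BIT' from patch 7/8, and this is illustrative rather
than the code that was posted:

/* Mark pfn_s..pfn_e (exclusive) as 1-1; returns how many PFNs were set. */
unsigned long __init set_phys_range_identity(unsigned long pfn_s,
					     unsigned long pfn_e)
{
	unsigned long pfn;

	for (pfn = pfn_s; pfn < pfn_e; pfn++) {
		/*
		 * A real implementation would swap p2m_mid_missing /
		 * p2m_missing for their identity counterparts here, but only
		 * when the whole aligned 1GB/2MB chunk lies inside the range,
		 * so that neighbouring PFNs stay "missing".
		 */
		if (!__set_phys_to_machine(pfn, IDENTITY_FRAME(pfn)))
			break;
	}

	return pfn - pfn_s;
}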
Stefano Stabellini
2011-Jan-06 19:50 UTC
[Xen-devel] Re: [PATCH 6/8] xen/debug: WARN_ON when 1-1 but no _PAGE_IOMAP flag set.
On Tue, 4 Jan 2011, Ian Campbell wrote:
> On Tue, 2011-01-04 at 18:46 +0000, Konrad Rzeszutek Wilk wrote:
> > On Tue, Jan 04, 2011 at 05:24:38PM +0000, Ian Campbell wrote:
> > > On Thu, 2010-12-30 at 19:48 +0000, Konrad Rzeszutek Wilk wrote:
> > > > Only enabled if XEN_DEBUG_FS is enabled.
> > >
> > > Bit of an odd configuration option to use. Perhaps co-opt
> > > CONFIG_PARAVIRT_DEBUG instead?
> > >
> > > Or maybe we need a new XEN_DEBUG option? Or just make it a
> > > developer-only EXTRA_CFLAGS += -DDEBUG thing?
> >
> > I think 'XEN_DEBUG' works better.
> >
> > > Is this only temporary until the need for _PAGE_IOMAP is removed anyway?
> >
> > I was thinking to leave it in as a way to weed out bugs, but I could just
> > as well leave it in my "debug" branch and not propose it upstream.
> >
> > I am not sure how to remove _PAGE_IOMAP fully. We need some _flag_ to
> > signal 'xen_pte_val' that the PTE should not be looked up in the M2P.
> >
> > Otherwise, for identity mappings, the value is 0xfffff.. (or 0x55555..
> > sometimes) and the PTE ends up being messed up. Instead there is a check
> > to see if _PAGE_IOMAP is part of the PTE, and if so, no M2P lookup is done.
> >
> > I guess we could do the M2P lookup regardless and just see if it is
> > 0xfffff... and if so just return the PTE without any changes.
>
> Perhaps this ties in with the m2p overlay which Stefano+Jeremy have been
> working on to deal with granted foreign pages? I/O pages are a bit like
> foreign memory (if you squint enough)...

In theory the m2p overlay could be used for this purpose, but in practice the
current m2p overlay API needs a struct page, and it might also end up
stressing the hashtable too much.

Besides, I think Konrad's solution might be simpler: if the m2p returns one
of the two special values we just return the mfn from pte_mfn_to_pfn.

Keir, could you confirm that the m2p entries of DOM_IO pages are always
0xffffff or 0x55555?
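A minimal sketch of the fallback described here, assuming the two special M2P
values discussed in this thread (~0 from Linux's fault fixup, or the 0x5555...
DOM_IO pattern); this is illustrative, not the change that was ultimately
merged:

static pteval_t pte_mfn_to_pfn_fallback(pteval_t val)
{
	if (val & _PAGE_PRESENT) {
		unsigned long mfn = (val & PTE_PFN_MASK) >> PAGE_SHIFT;
		pteval_t flags = val & PTE_FLAGS_MASK;
		unsigned long pfn = mfn_to_pfn(mfn);
		unsigned long dom_io_val = ~0UL / 3;	/* 0x5555... pattern */

		/*
		 * No usable M2P entry (DOM_IO page or faulting read): treat
		 * the frame as identity mapped instead of relying on
		 * _PAGE_IOMAP being set in the PTE.
		 */
		if (unlikely(pfn == ~0UL || pfn == dom_io_val))
			pfn = mfn;

		val = ((pteval_t)pfn << PAGE_SHIFT) | flags;
	}
	return val;
}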
Keir Fraser
2011-Jan-06 20:17 UTC
[Xen-devel] Re: [PATCH 6/8] xen/debug: WARN_ON when 1-1 but no _PAGE_IOMAP flag set.
On 06/01/2011 19:50, "Stefano Stabellini" <stefano.stabellini@eu.citrix.com> wrote:

>> Perhaps this ties in with the m2p overlay which Stefano+Jeremy have been
>> working on to deal with granted foreign pages? I/O pages are a bit like
>> foreign memory (if you squint enough)...
>
> In theory the m2p overlay could be used for this purpose but in practice
> the current m2p overlay API needs a struct page, also it might end up
> stressing the hashtable too much.
>
> Besides I think Konrad's solution might be simpler: if the m2p returns
> one of the two special values we just return mfn from pte_mfn_to_pfn.
>
> Keir, could you confirm that the m2p entries of DOM_IO pages are always
> 0xffffff or 0x55555?

Always 0x55...55 (for m2p entries that exist), else page fault on access to
the non-existent m2p entry (m2p entries only guaranteed to exist for RAM).
Perhaps the 0xff...ff values come from Linux's own fixup code handling a
faulting read access of the m2p array? If so you could return 0x55...55
instead and avoid checking for 0xff...ff. I really don't know how you could
get 0xff...ff for non-RAM pages from Xen itself.

 -- Keir
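To illustrate the fixup path being guessed at here, a sketch of a fault-safe
m2p read, assuming the machine_to_phys_mapping array from the Xen headers; on
a missing m2p page the exception-table fixup makes the read fail cleanly and
~0 is returned instead of crashing:

static inline unsigned long mfn_to_pfn_safe(unsigned long mfn)
{
	unsigned long pfn;

	/*
	 * __get_user() carries an exception-table entry, so a fault on a
	 * non-existent m2p page returns an error instead of oopsing.
	 */
	if (__get_user(pfn, &machine_to_phys_mapping[mfn]))
		pfn = ~0UL;

	return pfn;
}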
Konrad Rzeszutek Wilk
2011-Jan-06 21:59 UTC
[Xen-devel] Re: [PATCH 6/8] xen/debug: WARN_ON when 1-1 but no _PAGE_IOMAP flag set.
On Thu, Jan 06, 2011 at 08:17:38PM +0000, Keir Fraser wrote:
> On 06/01/2011 19:50, "Stefano Stabellini" <stefano.stabellini@eu.citrix.com>
> wrote:
>
> >> Perhaps this ties in with the m2p overlay which Stefano+Jeremy have been
> >> working on to deal with granted foreign pages? I/O pages are a bit like
> >> foreign memory (if you squint enough)...
> >
> > In theory the m2p overlay could be used for this purpose but in practice
> > the current m2p overlay API needs a struct page, also it might end up
> > stressing the hashtable too much.
> >
> > Besides I think Konrad's solution might be simpler: if the m2p returns
> > one of the two special values we just return mfn from pte_mfn_to_pfn.
> >
> > Keir, could you confirm that the m2p entries of DOM_IO pages are always
> > 0xffffff or 0x55555?
>
> Always 0x55...55 (for m2p entries that exist), else page fault on access to
> the non-existent m2p entry (m2p entries only guaranteed to exist for ram).
> Perhaps the 0xff...ff values come from Linux's own fixup code handling a
> faulting read access of the m2p array? If so you could return 0x55...55
> instead and avoid checking for 0xff...ff. I really don't know how you could
> get 0xff...ff for non-RAM pages from Xen itself.

The non-RAM pages are assigned to DOMID_IO (arch_init_memory), for example:

	/* First 1MB of RAM is historically marked as I/O. */
	for ( i = 0; i < 0x100; i++ )
		share_xen_page_with_guest(mfn_to_page(i), dom_io,
					  XENSHARE_writable);

and share_xen_page_with_guest() sets that page's m2p entry to
INVALID_M2P_ENTRY.

But I could also be reading the code wrongly?

> -- Keir
Keir Fraser
2011-Jan-06 22:17 UTC
[Xen-devel] Re: [PATCH 6/8] xen/debug: WARN_ON when 1-1 but no _PAGE_IOMAP flag set.
On 06/01/2011 21:59, "Konrad Rzeszutek Wilk" <konrad.wilk@oracle.com> wrote:

>> Always 0x55...55 (for m2p entries that exist), else page fault on access to
>> the non-existent m2p entry (m2p entries only guaranteed to exist for ram).
>> Perhaps the 0xff...ff values come from Linux's own fixup code handling a
>> faulting read access of the m2p array? If so you could return 0x55...55
>> instead and avoid checking for 0xff...ff. I really don't know how you could
>> get 0xff...ff for non-RAM pages from Xen itself.
>
> The non-RAM pages are assigned to DOMID_IO (arch_init_memory), for example:
>
> 	/* First 1MB of RAM is historically marked as I/O. */
> 	for ( i = 0; i < 0x100; i++ )
> 		share_xen_page_with_guest(mfn_to_page(i), dom_io,
> 					  XENSHARE_writable);
>
> and share_xen_page_with_guest() sets that page's m2p entry to
> INVALID_M2P_ENTRY.
>
> But I could also be reading the code wrongly?

You're right, I missed that.

 -- Keir