Hi,

V3:
  - New patch #7 for xsm changes to add_to_physmap.
  - Patches 3, 4, 5, 6 and 8 are unchanged from V2 (just a new comment
    in #8).

These patches implement PVH dom0. Please note there is one fixme in this
series that is being discussed under a different thread and will be
worked on next. PVH dom0 creation is disabled until then.

Patches 1 thru 4 implement changes in and around construct_dom0.
Patch 5 implements xsm changes for physmap update.
Patches 6 thru 8 are to support the tool stack on PVH dom0, and have
mostly been looked at in the past. Finally, patch 9 adds an option to
boot a dom0 in PVH mode.

These patches are based on c/s: e439e0b

They can also be found in the public git tree at:

    git://oss.oracle.com/git/mrathor/xen.git  branch: dom0pvh-v3

Thanks for all the help,
Mukesh
  - For now, iommu is required for PVH dom0. Check for that.
  - For pvh, we need to do mfn_to_gmfn before calling mapping function
    intel_iommu_map_page/amd_iommu_map_page which expects a gfn.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
---
 xen/drivers/passthrough/iommu.c |   17 ++++++++++++++++-
 1 files changed, 16 insertions(+), 1 deletions(-)

diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 93ad122..f6c7ad6 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -125,10 +125,24 @@ int iommu_domain_init(struct domain *d)
     return hd->platform_ops->init(d);
 }
 
+static __init void check_dom0_pvh_reqs(struct domain *d)
+{
+    if ( !iommu_enabled )
+        panic("Presently, iommu must be enabled for pvh dom0\n");
+
+    if ( iommu_passthrough )
+        panic("For pvh dom0, dom0-passthrough must not be enabled\n");
+
+    iommu_dom0_strict = 1;
+}
+
 void __init iommu_dom0_init(struct domain *d)
 {
     struct hvm_iommu *hd = domain_hvm_iommu(d);
 
+    if ( is_pvh_domain(d) )
+        check_dom0_pvh_reqs(d);
+
     if ( !iommu_enabled )
         return;
 
@@ -141,12 +155,13 @@ void __init iommu_dom0_init(struct domain *d)
         page_list_for_each ( page, &d->page_list )
         {
             unsigned long mfn = page_to_mfn(page);
+            unsigned long gfn = mfn_to_gmfn(d, mfn);
             unsigned int mapping = IOMMUF_readable;
             if ( ((page->u.inuse.type_info & PGT_count_mask) == 0) ||
                  ((page->u.inuse.type_info & PGT_type_mask)
                   == PGT_writable_page) )
                 mapping |= IOMMUF_writable;
-            hd->platform_ops->map_page(d, mfn, mfn, mapping);
+            hd->platform_ops->map_page(d, gfn, mfn, mapping);
             if ( !(i++ & 0xfffff) )
                 process_pending_softirqs();
         }
-- 
1.7.2.3
Mukesh Rathor
2013-Nov-27 02:27 UTC
[V3 PATCH 2/9] PVH dom0: create add_mem_mapping_for_xlate() function
In this preparatory patch, the add portion of the
XEN_DOMCTL_memory_mapping code is put into a function so it can be
called later for PVH from construct_dom0. There is no change in its
functionality. The function is made non-static in the construct_dom0
patch.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/arch/x86/domctl.c |   48 ++++++++++++++++++++++++++++++------------------
 1 files changed, 30 insertions(+), 18 deletions(-)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 8644218..e3f544a 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -46,6 +46,35 @@ static int gdbsx_guest_mem_io(
     return (iop->remain ? -EFAULT : 0);
 }
 
+static int add_mem_mapping_for_xlate(struct domain *d, unsigned long gfn,
+                                     unsigned long mfn, unsigned long nr_mfns)
+{
+    unsigned long i;
+    int ret = 0;
+
+    for ( i = 0; i < nr_mfns; i++ )
+        if ( !set_mmio_p2m_entry(d, gfn + i, _mfn(mfn + i)) )
+            ret = -EIO;
+    if ( ret )
+    {
+        if ( is_hardware_domain(d) )
+            panic("Failed setting p2m. ret:%d gfn:%lx mfn:%lx i:%ld\n",
+                  ret, gfn, mfn, i);
+
+        printk(XENLOG_G_WARNING
+               "memory_map:fail: dom%d gfn=%lx mfn=%lx\n",
+               d->domain_id, gfn + i, mfn + i);
+        while ( i-- )
+            clear_mmio_p2m_entry(d, gfn + i);
+        if ( iomem_deny_access(d, mfn, mfn + nr_mfns - 1) &&
+             is_hardware_domain(current->domain) )
+            printk(XENLOG_ERR
+                   "memory_map: failed to deny dom%d access to [%lx,%lx]\n",
+                   d->domain_id, mfn, mfn + nr_mfns - 1);
+    }
+    return ret;
+}
+
 long arch_do_domctl(
     struct xen_domctl *domctl, struct domain *d,
     XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
@@ -649,24 +678,7 @@ long arch_do_domctl(
 
             ret = iomem_permit_access(d, mfn, mfn + nr_mfns - 1);
             if ( !ret && paging_mode_translate(d) )
-            {
-                for ( i = 0; !ret && i < nr_mfns; i++ )
-                    if ( !set_mmio_p2m_entry(d, gfn + i, _mfn(mfn + i)) )
-                        ret = -EIO;
-                if ( ret )
-                {
-                    printk(XENLOG_G_WARNING
-                           "memory_map:fail: dom%d gfn=%lx mfn=%lx\n",
-                           d->domain_id, gfn + i, mfn + i);
-                    while ( i-- )
-                        clear_mmio_p2m_entry(d, gfn + i);
-                    if ( iomem_deny_access(d, mfn, mfn + nr_mfns - 1) &&
-                         is_hardware_domain(current->domain) )
-                        printk(XENLOG_ERR
-                               "memory_map: failed to deny dom%d access to [%lx,%lx]\n",
-                               d->domain_id, mfn, mfn + nr_mfns - 1);
-                }
-            }
+                ret = add_mem_mapping_for_xlate(d, gfn, mfn, nr_mfns);
         }
         else
         {
-- 
1.7.2.3
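
The code moved by this patch follows a map-then-roll-back pattern: set
each p2m entry in turn and, if one fails, clear the entries already set.
A minimal, self-contained sketch of that pattern is below. It is not Xen
code: slot_mapped, fail_at, set_entry(), clear_entry() and map_range()
are hypothetical stand-ins for the p2m and for set_mmio_p2m_entry() and
clear_mmio_p2m_entry(). The sketch stops at the first failure, as the
loop removed from arch_do_domctl() does with its "!ret &&" condition.

```c
#include <errno.h>
#include <stdbool.h>

#define NR_SLOTS 16UL

/* Hypothetical stand-in for a p2m: one flag per guest frame number. */
static bool slot_mapped[NR_SLOTS];

/* Hypothetical fault injection: set_entry() fails for this gfn. */
static unsigned long fail_at = NR_SLOTS;

static bool set_entry(unsigned long gfn)
{
    if ( gfn >= NR_SLOTS || gfn == fail_at )
        return false;
    slot_mapped[gfn] = true;
    return true;
}

static void clear_entry(unsigned long gfn)
{
    slot_mapped[gfn] = false;
}

/* Map [gfn, gfn + nr); on failure, undo the entries already set. */
static int map_range(unsigned long gfn, unsigned long nr)
{
    unsigned long i;

    for ( i = 0; i < nr; i++ )
        if ( !set_entry(gfn + i) )
            break;              /* i now indexes the failing entry */

    if ( i == nr )
        return 0;

    while ( i-- )               /* roll back entries [0, i) */
        clear_entry(gfn + i);

    return -EIO;
}
```

With fail_at set to 3, map_range(0, 5) returns -EIO and leaves slots 0
to 2 unmapped again; with no injected failure it returns 0.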
Mukesh Rathor
2013-Nov-27 02:27 UTC
[V3 PATCH 3/9] PVH dom0: move some pv specific code to static functions
In this preparatory patch also, some pv specific code is carved out into
static functions. No functionality change.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/arch/x86/domain_build.c |  353 +++++++++++++++++++++++--------------------
 1 files changed, 192 insertions(+), 161 deletions(-)

diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
index 232adf8..c9ff680 100644
--- a/xen/arch/x86/domain_build.c
+++ b/xen/arch/x86/domain_build.c
@@ -307,6 +307,191 @@ static void __init process_dom0_ioports_disable(void)
     }
 }
 
+/* Pages that are part of page tables must be read only. */
+static __init void mark_pv_pt_pages_rdonly(struct domain *d,
+                                           l4_pgentry_t *l4start,
+                                           unsigned long vpt_start,
+                                           unsigned long nr_pt_pages)
+{
+    unsigned long count;
+    struct page_info *page;
+    l4_pgentry_t *pl4e;
+    l3_pgentry_t *pl3e;
+    l2_pgentry_t *pl2e;
+    l1_pgentry_t *pl1e;
+
+    pl4e = l4start + l4_table_offset(vpt_start);
+    pl3e = l4e_to_l3e(*pl4e);
+    pl3e += l3_table_offset(vpt_start);
+    pl2e = l3e_to_l2e(*pl3e);
+    pl2e += l2_table_offset(vpt_start);
+    pl1e = l2e_to_l1e(*pl2e);
+    pl1e += l1_table_offset(vpt_start);
+    for ( count = 0; count < nr_pt_pages; count++ )
+    {
+        l1e_remove_flags(*pl1e, _PAGE_RW);
+        page = mfn_to_page(l1e_get_pfn(*pl1e));
+
+        /* Read-only mapping + PGC_allocated + page-table page. */
+        page->count_info = PGC_allocated | 3;
+        page->u.inuse.type_info |= PGT_validated | 1;
+
+        /* Top-level p.t. is pinned. */
+        if ( (page->u.inuse.type_info & PGT_type_mask) ==
+             (!is_pv_32on64_domain(d) ?
+              PGT_l4_page_table : PGT_l3_page_table) )
+        {
+            page->count_info += 1;
+            page->u.inuse.type_info += 1 | PGT_pinned;
+        }
+
+        /* Iterate. */
+        if ( !((unsigned long)++pl1e & (PAGE_SIZE - 1)) )
+        {
+            if ( !((unsigned long)++pl2e & (PAGE_SIZE - 1)) )
+            {
+                if ( !((unsigned long)++pl3e & (PAGE_SIZE - 1)) )
+                    pl3e = l4e_to_l3e(*++pl4e);
+                pl2e = l3e_to_l2e(*pl3e);
+            }
+            pl1e = l2e_to_l1e(*pl2e);
+        }
+    }
+}
+
+/* Set up the phys->machine table if not part of the initial mapping. */
+static __init void setup_pv_physmap(struct domain *d, unsigned long pgtbl_pfn,
+                                    unsigned long v_start, unsigned long v_end,
+                                    unsigned long vphysmap_start,
+                                    unsigned long vphysmap_end,
+                                    unsigned long nr_pages)
+{
+    struct page_info *page = NULL;
+    l4_pgentry_t *pl4e = NULL, *l4start;
+    l3_pgentry_t *pl3e = NULL;
+    l2_pgentry_t *pl2e = NULL;
+    l1_pgentry_t *pl1e = NULL;
+
+    l4start = map_domain_page(pgtbl_pfn);
+
+    if ( v_start <= vphysmap_end && vphysmap_start <= v_end )
+        panic("DOM0 P->M table overlaps initial mapping");
+
+    while ( vphysmap_start < vphysmap_end )
+    {
+        if ( d->tot_pages + ((round_pgup(vphysmap_end) - vphysmap_start)
+                             >> PAGE_SHIFT) + 3 > nr_pages )
+            panic("Dom0 allocation too small for initial P->M table.\n");
+
+        if ( pl1e )
+        {
+            unmap_domain_page(pl1e);
+            pl1e = NULL;
+        }
+        if ( pl2e )
+        {
+            unmap_domain_page(pl2e);
+            pl2e = NULL;
+        }
+        if ( pl3e )
+        {
+            unmap_domain_page(pl3e);
+            pl3e = NULL;
+        }
+        pl4e = l4start + l4_table_offset(vphysmap_start);
+        if ( !l4e_get_intpte(*pl4e) )
+        {
+            page = alloc_domheap_page(d, 0);
+            if ( !page )
+                break;
+
+            /* No mapping, PGC_allocated + page-table page. */
+            page->count_info = PGC_allocated | 2;
+            page->u.inuse.type_info = PGT_l3_page_table | PGT_validated | 1;
+            pl3e = __map_domain_page(page);
+            clear_page(pl3e);
+            *pl4e = l4e_from_page(page, L4_PROT);
+        } else
+            pl3e = map_domain_page(l4e_get_pfn(*pl4e));
+
+        pl3e += l3_table_offset(vphysmap_start);
+        if ( !l3e_get_intpte(*pl3e) )
+        {
+            if ( cpu_has_page1gb &&
+                 !(vphysmap_start & ((1UL << L3_PAGETABLE_SHIFT) - 1)) &&
+                 vphysmap_end >= vphysmap_start + (1UL << L3_PAGETABLE_SHIFT) &&
+                 (page = alloc_domheap_pages(d,
+                                             L3_PAGETABLE_SHIFT - PAGE_SHIFT,
+                                             0)) != NULL )
+            {
+                *pl3e = l3e_from_page(page, L1_PROT|_PAGE_DIRTY|_PAGE_PSE);
+                vphysmap_start += 1UL << L3_PAGETABLE_SHIFT;
+                continue;
+            }
+            if ( (page = alloc_domheap_page(d, 0)) == NULL )
+                break;
+
+            /* No mapping, PGC_allocated + page-table page. */
+            page->count_info = PGC_allocated | 2;
+            page->u.inuse.type_info = PGT_l2_page_table | PGT_validated | 1;
+            pl2e = __map_domain_page(page);
+            clear_page(pl2e);
+            *pl3e = l3e_from_page(page, L3_PROT);
+        }
+        else
+            pl2e = map_domain_page(l3e_get_pfn(*pl3e));
+
+        pl2e += l2_table_offset(vphysmap_start);
+        if ( !l2e_get_intpte(*pl2e) )
+        {
+            if ( !(vphysmap_start & ((1UL << L2_PAGETABLE_SHIFT) - 1)) &&
+                 vphysmap_end >= vphysmap_start + (1UL << L2_PAGETABLE_SHIFT) &&
+                 (page = alloc_domheap_pages(d,
+                                             L2_PAGETABLE_SHIFT - PAGE_SHIFT,
+                                             0)) != NULL )
+            {
+                *pl2e = l2e_from_page(page, L1_PROT|_PAGE_DIRTY|_PAGE_PSE);
+                if ( opt_allow_superpage )
+                    get_superpage(page_to_mfn(page), d);
+                vphysmap_start += 1UL << L2_PAGETABLE_SHIFT;
+                continue;
+            }
+            if ( (page = alloc_domheap_page(d, 0)) == NULL )
+                break;
+
+            /* No mapping, PGC_allocated + page-table page. */
+            page->count_info = PGC_allocated | 2;
+            page->u.inuse.type_info = PGT_l1_page_table | PGT_validated | 1;
+            pl1e = __map_domain_page(page);
+            clear_page(pl1e);
+            *pl2e = l2e_from_page(page, L2_PROT);
+        }
+        else
+            pl1e = map_domain_page(l2e_get_pfn(*pl2e));
+
+        pl1e += l1_table_offset(vphysmap_start);
+        BUG_ON(l1e_get_intpte(*pl1e));
+        page = alloc_domheap_page(d, 0);
+        if ( !page )
+            break;
+
+        *pl1e = l1e_from_page(page, L1_PROT|_PAGE_DIRTY);
+        vphysmap_start += PAGE_SIZE;
+        vphysmap_start &= PAGE_MASK;
+    }
+    if ( !page )
+        panic("Not enough RAM for DOM0 P->M table.\n");
+
+    if ( pl1e )
+        unmap_domain_page(pl1e);
+    if ( pl2e )
+        unmap_domain_page(pl2e);
+    if ( pl3e )
+        unmap_domain_page(pl3e);
+
+    unmap_domain_page(l4start);
+}
+
 int __init construct_dom0(
     struct domain *d,
     const module_t *image, unsigned long image_headroom,
@@ -705,44 +890,8 @@ int __init construct_dom0(
                COMPAT_L2_PAGETABLE_XEN_SLOTS(d) * sizeof(*l2tab));
     }
 
-    /* Pages that are part of page tables must be read only. */
-    l4tab = l4start + l4_table_offset(vpt_start);
-    l3start = l3tab = l4e_to_l3e(*l4tab);
-    l3tab += l3_table_offset(vpt_start);
-    l2start = l2tab = l3e_to_l2e(*l3tab);
-    l2tab += l2_table_offset(vpt_start);
-    l1start = l1tab = l2e_to_l1e(*l2tab);
-    l1tab += l1_table_offset(vpt_start);
-    for ( count = 0; count < nr_pt_pages; count++ )
-    {
-        l1e_remove_flags(*l1tab, _PAGE_RW);
-        page = mfn_to_page(l1e_get_pfn(*l1tab));
-
-        /* Read-only mapping + PGC_allocated + page-table page. */
-        page->count_info = PGC_allocated | 3;
-        page->u.inuse.type_info |= PGT_validated | 1;
-
-        /* Top-level p.t. is pinned. */
-        if ( (page->u.inuse.type_info & PGT_type_mask) ==
-             (!is_pv_32on64_domain(d) ?
-              PGT_l4_page_table : PGT_l3_page_table) )
-        {
-            page->count_info += 1;
-            page->u.inuse.type_info += 1 | PGT_pinned;
-        }
-
-        /* Iterate. */
-        if ( !((unsigned long)++l1tab & (PAGE_SIZE - 1)) )
-        {
-            if ( !((unsigned long)++l2tab & (PAGE_SIZE - 1)) )
-            {
-                if ( !((unsigned long)++l3tab & (PAGE_SIZE - 1)) )
-                    l3start = l3tab = l4e_to_l3e(*++l4tab);
-                l2start = l2tab = l3e_to_l2e(*l3tab);
-            }
-            l1start = l1tab = l2e_to_l1e(*l2tab);
-        }
-    }
+    if ( is_pv_domain(d) )
+        mark_pv_pt_pages_rdonly(d, l4start, vpt_start, nr_pt_pages);
 
     /* Mask all upcalls... */
     for ( i = 0; i < XEN_LEGACY_MAX_VCPUS; i++ )
@@ -814,132 +963,14 @@ int __init construct_dom0(
              elf_64bit(&elf) ? 64 : 32, parms.pae ? "p" : "");
 
     count = d->tot_pages;
-    l4start = map_domain_page(pagetable_get_pfn(v->arch.guest_table));
-    l3tab = NULL;
-    l2tab = NULL;
-    l1tab = NULL;
-
     /* Set up the phys->machine table if not part of the initial mapping. */
-    if ( parms.p2m_base != UNSET_ADDR )
-    {
-        unsigned long va = vphysmap_start;
-        if ( v_start <= vphysmap_end && vphysmap_start <= v_end )
-            panic("DOM0 P->M table overlaps initial mapping");
-
-        while ( va < vphysmap_end )
-        {
-            if ( d->tot_pages + ((round_pgup(vphysmap_end) - va)
-                                 >> PAGE_SHIFT) + 3 > nr_pages )
-                panic("Dom0 allocation too small for initial P->M table.\n");
-
-            if ( l1tab )
-            {
-                unmap_domain_page(l1tab);
-                l1tab = NULL;
-            }
-            if ( l2tab )
-            {
-                unmap_domain_page(l2tab);
-                l2tab = NULL;
-            }
-            if ( l3tab )
-            {
-                unmap_domain_page(l3tab);
-                l3tab = NULL;
-            }
-            l4tab = l4start + l4_table_offset(va);
-            if ( !l4e_get_intpte(*l4tab) )
-            {
-                page = alloc_domheap_page(d, 0);
-                if ( !page )
-                    break;
-                /* No mapping, PGC_allocated + page-table page. */
-                page->count_info = PGC_allocated | 2;
-                page->u.inuse.type_info =
-                    PGT_l3_page_table | PGT_validated | 1;
-                l3tab = __map_domain_page(page);
-                clear_page(l3tab);
-                *l4tab = l4e_from_page(page, L4_PROT);
-            } else
-                l3tab = map_domain_page(l4e_get_pfn(*l4tab));
-            l3tab += l3_table_offset(va);
-            if ( !l3e_get_intpte(*l3tab) )
-            {
-                if ( cpu_has_page1gb &&
-                     !(va & ((1UL << L3_PAGETABLE_SHIFT) - 1)) &&
-                     vphysmap_end >= va + (1UL << L3_PAGETABLE_SHIFT) &&
-                     (page = alloc_domheap_pages(d,
-                                                 L3_PAGETABLE_SHIFT -
-                                                     PAGE_SHIFT,
-                                                 0)) != NULL )
-                {
-                    *l3tab = l3e_from_page(page,
-                                           L1_PROT|_PAGE_DIRTY|_PAGE_PSE);
-                    va += 1UL << L3_PAGETABLE_SHIFT;
-                    continue;
-                }
-                if ( (page = alloc_domheap_page(d, 0)) == NULL )
-                    break;
-                /* No mapping, PGC_allocated + page-table page. */
-                page->count_info = PGC_allocated | 2;
-                page->u.inuse.type_info =
-                    PGT_l2_page_table | PGT_validated | 1;
-                l2tab = __map_domain_page(page);
-                clear_page(l2tab);
-                *l3tab = l3e_from_page(page, L3_PROT);
-            }
-            else
-                l2tab = map_domain_page(l3e_get_pfn(*l3tab));
-            l2tab += l2_table_offset(va);
-            if ( !l2e_get_intpte(*l2tab) )
-            {
-                if ( !(va & ((1UL << L2_PAGETABLE_SHIFT) - 1)) &&
-                     vphysmap_end >= va + (1UL << L2_PAGETABLE_SHIFT) &&
-                     (page = alloc_domheap_pages(d,
-                                                 L2_PAGETABLE_SHIFT -
-                                                     PAGE_SHIFT,
-                                                 0)) != NULL )
-                {
-                    *l2tab = l2e_from_page(page,
-                                           L1_PROT|_PAGE_DIRTY|_PAGE_PSE);
-                    if ( opt_allow_superpage )
-                        get_superpage(page_to_mfn(page), d);
-                    va += 1UL << L2_PAGETABLE_SHIFT;
-                    continue;
-                }
-                if ( (page = alloc_domheap_page(d, 0)) == NULL )
-                    break;
-                /* No mapping, PGC_allocated + page-table page. */
-                page->count_info = PGC_allocated | 2;
-                page->u.inuse.type_info =
-                    PGT_l1_page_table | PGT_validated | 1;
-                l1tab = __map_domain_page(page);
-                clear_page(l1tab);
-                *l2tab = l2e_from_page(page, L2_PROT);
-            }
-            else
-                l1tab = map_domain_page(l2e_get_pfn(*l2tab));
-            l1tab += l1_table_offset(va);
-            BUG_ON(l1e_get_intpte(*l1tab));
-            page = alloc_domheap_page(d, 0);
-            if ( !page )
-                break;
-            *l1tab = l1e_from_page(page, L1_PROT|_PAGE_DIRTY);
-            va += PAGE_SIZE;
-            va &= PAGE_MASK;
-        }
-        if ( !page )
-            panic("Not enough RAM for DOM0 P->M table.\n");
+    if ( is_pv_domain(d) && parms.p2m_base != UNSET_ADDR )
+    {
+        pfn = pagetable_get_pfn(v->arch.guest_table);
+        setup_pv_physmap(d, pfn, v_start, v_end, vphysmap_start, vphysmap_end,
+                         nr_pages);
     }
 
-    if ( l1tab )
-        unmap_domain_page(l1tab);
-    if ( l2tab )
-        unmap_domain_page(l2tab);
-    if ( l3tab )
-        unmap_domain_page(l3tab);
-    unmap_domain_page(l4start);
-
     /* Write the phys->machine and machine->phys table entries. */
     for ( pfn = 0; pfn < count; pfn++ )
     {
-- 
1.7.2.3
This patch changes construct_dom0 to boot in PVH mode. Changes needed to
support it are also included here.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/arch/x86/domain_build.c |  229 +++++++++++++++++++++++++++++++++++++++---
 xen/arch/x86/domctl.c       |    2 +-
 xen/arch/x86/mm/hap/hap.c   |   15 +++
 xen/include/asm-x86/hap.h   |    1 +
 xen/include/xen/domain.h    |    3 +
 5 files changed, 232 insertions(+), 18 deletions(-)

diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
index c9ff680..f4a32df 100644
--- a/xen/arch/x86/domain_build.c
+++ b/xen/arch/x86/domain_build.c
@@ -35,6 +35,7 @@
 #include <asm/setup.h>
 #include <asm/bzimage.h> /* for bzimage_parse */
 #include <asm/io_apic.h>
+#include <asm/hap.h>
 
 #include <public/version.h>
 
@@ -307,6 +308,145 @@ static void __init process_dom0_ioports_disable(void)
     }
 }
 
+/*
+ * Set the 1:1 map for all non-RAM regions for dom 0. Thus, dom0 will have
+ * the entire io region mapped in the EPT/NPT.
+ *
+ * pvh fixme: The following doesn't map MMIO ranges when they sit above the
+ *            highest E820 covered address.
+ */
+static __init void pvh_map_all_iomem(struct domain *d)
+{
+    unsigned long start_pfn, end_pfn, end = 0, start = 0;
+    const struct e820entry *entry;
+    unsigned int i, nump;
+    int rc;
+
+    for ( i = 0, entry = e820.map; i < e820.nr_map; i++, entry++ )
+    {
+        end = entry->addr + entry->size;
+
+        if ( entry->type == E820_RAM || entry->type == E820_UNUSABLE ||
+             i == e820.nr_map - 1 )
+        {
+            start_pfn = PFN_DOWN(start);
+
+            /* Unused RAM areas are marked UNUSABLE, so skip it too */
+            if ( entry->type == E820_RAM || entry->type == E820_UNUSABLE )
+                end_pfn = PFN_UP(entry->addr);
+            else
+                end_pfn = PFN_UP(end);
+
+            if ( start_pfn < end_pfn )
+            {
+                nump = end_pfn - start_pfn;
+                /* Add pages to the mapping */
+                rc = add_mem_mapping_for_xlate(d, start_pfn, start_pfn, nump);
+                BUG_ON(rc);
+            }
+            start = end;
+        }
+    }
+
+    /* If the e820 ended under 4GB, we must map the remaining space upto 4GB */
+    if ( end < GB(4) )
+    {
+        start_pfn = PFN_UP(end);
+        end_pfn = (GB(4)) >> PAGE_SHIFT;
+        nump = end_pfn - start_pfn;
+        rc = add_mem_mapping_for_xlate(d, start_pfn, start_pfn, nump);
+        BUG_ON(rc);
+    }
+}
+
+static __init void dom0_update_physmap(struct domain *d, unsigned long pfn,
+                                   unsigned long mfn, unsigned long vphysmap_s)
+{
+    if ( is_pvh_domain(d) )
+    {
+        int rc = guest_physmap_add_page(d, pfn, mfn, 0);
+        BUG_ON(rc);
+        return;
+    }
+    if ( !is_pv_32on64_domain(d) )
+        ((unsigned long *)vphysmap_s)[pfn] = mfn;
+    else
+        ((unsigned int *)vphysmap_s)[pfn] = mfn;
+
+    set_gpfn_from_mfn(mfn, pfn);
+}
+
+static __init void pvh_fixup_page_tables_for_hap(struct vcpu *v,
+                                                 unsigned long v_start,
+                                                 unsigned long v_end)
+{
+    int i, j, k;
+    l4_pgentry_t *pl4e, *l4start;
+    l3_pgentry_t *pl3e;
+    l2_pgentry_t *pl2e;
+    l1_pgentry_t *pl1e;
+    unsigned long cr3_pfn;
+
+    ASSERT(paging_mode_enabled(v->domain));
+
+    l4start = map_domain_page(pagetable_get_pfn(v->arch.guest_table));
+
+    /* Clear entries prior to guest L4 start */
+    pl4e = l4start + l4_table_offset(v_start);
+    memset(l4start, 0, (unsigned long)pl4e - (unsigned long)l4start);
+
+    for ( ; pl4e <= l4start + l4_table_offset(v_end - 1); pl4e++ )
+    {
+        pl3e = map_l3t_from_l4e(*pl4e);
+        for ( i = 0; i < PAGE_SIZE / sizeof(*pl3e); i++, pl3e++ )
+        {
+            if ( !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) )
+                continue;
+
+            pl2e = map_l2t_from_l3e(*pl3e);
+            for ( j = 0; j < PAGE_SIZE / sizeof(*pl2e); j++, pl2e++ )
+            {
+                if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) )
+                    continue;
+
+                pl1e = map_l1t_from_l2e(*pl2e);
+                for ( k = 0; k < PAGE_SIZE / sizeof(*pl1e); k++, pl1e++ )
+                {
+                    if ( !(l1e_get_flags(*pl1e) & _PAGE_PRESENT) )
+                        continue;
+
+                    *pl1e = l1e_from_pfn(get_gpfn_from_mfn(l1e_get_pfn(*pl1e)),
+                                         l1e_get_flags(*pl1e));
+                }
+                unmap_domain_page(pl1e);
+                *pl2e = l2e_from_pfn(get_gpfn_from_mfn(l2e_get_pfn(*pl2e)),
+                                     l2e_get_flags(*pl2e));
+            }
+            unmap_domain_page(pl2e);
+            *pl3e = l3e_from_pfn(get_gpfn_from_mfn(l3e_get_pfn(*pl3e)),
+                                 l3e_get_flags(*pl3e));
+        }
+        unmap_domain_page(pl3e);
+        *pl4e = l4e_from_pfn(get_gpfn_from_mfn(l4e_get_pfn(*pl4e)),
+                             l4e_get_flags(*pl4e));
+    }
+
+    /* Clear entries post guest L4. */
+    if ( (unsigned long)pl4e & (PAGE_SIZE - 1) )
+        memset(pl4e, 0, PAGE_SIZE - ((unsigned long)pl4e & (PAGE_SIZE - 1)));
+
+    unmap_domain_page(l4start);
+
+    cr3_pfn = get_gpfn_from_mfn(paddr_to_pfn(v->arch.cr3));
+    v->arch.hvm_vcpu.guest_cr[3] = pfn_to_paddr(cr3_pfn);
+
+    /*
+     * Finally, we update the paging modes (hap_update_paging_modes). This will
+     * create monitor_table for us, update v->arch.cr3, and update vmcs.cr3.
+     */
+    paging_update_paging_modes(v);
+}
+
 /* Pages that are part of page tables must be read only. */
 static __init void mark_pv_pt_pages_rdonly(struct domain *d,
                                            l4_pgentry_t *l4start,
@@ -520,6 +660,8 @@ int __init construct_dom0(
     l3_pgentry_t *l3tab = NULL, *l3start = NULL;
     l2_pgentry_t *l2tab = NULL, *l2start = NULL;
     l1_pgentry_t *l1tab = NULL, *l1start = NULL;
+    paddr_t shared_info_paddr = 0;
+    u32 save_pvh_pg_mode = 0;
 
     /*
      * This fully describes the memory layout of the initial domain. All
@@ -597,12 +739,21 @@ int __init construct_dom0(
         goto out;
     }
 
-    if ( parms.elf_notes[XEN_ELFNOTE_SUPPORTED_FEATURES].type != XEN_ENT_NONE &&
-         !test_bit(XENFEAT_dom0, parms.f_supported) )
+    if ( parms.elf_notes[XEN_ELFNOTE_SUPPORTED_FEATURES].type != XEN_ENT_NONE )
     {
-        printk("Kernel does not support Dom0 operation\n");
-        rc = -EINVAL;
-        goto out;
+        if ( !test_bit(XENFEAT_dom0, parms.f_supported) )
+        {
+            printk("Kernel does not support Dom0 operation\n");
+            rc = -EINVAL;
+            goto out;
+        }
+        if ( is_pvh_domain(d) &&
+             !test_bit(XENFEAT_hvm_callback_vector, parms.f_supported) )
+        {
+            printk("Kernel does not support PVH mode\n");
+            rc = -EINVAL;
+            goto out;
+        }
     }
 
     if ( compat32 )
@@ -667,6 +818,13 @@ int __init construct_dom0(
     vstartinfo_end   = (vstartinfo_start +
                         sizeof(struct start_info) +
                         sizeof(struct dom0_vga_console_info));
+
+    if ( is_pvh_domain(d) )
+    {
+        shared_info_paddr = round_pgup(vstartinfo_end) - v_start;
+        vstartinfo_end += PAGE_SIZE;
+    }
+
     vpt_start        = round_pgup(vstartinfo_end);
     for ( nr_pt_pages = 2; ; nr_pt_pages++ )
     {
@@ -906,6 +1064,13 @@ int __init construct_dom0(
         (void)alloc_vcpu(d, i, cpu);
     }
 
+    /*
+     * pvh: we temporarily disable paging mode so that we can build cr3 needed
+     * to run on dom0's page tables.
+     */
+    save_pvh_pg_mode = d->arch.paging.mode;
+    d->arch.paging.mode = 0;
+
     /* Set up CR3 value for write_ptbase */
     if ( paging_mode_enabled(d) )
         paging_update_paging_modes(v);
@@ -971,6 +1136,15 @@ int __init construct_dom0(
                          nr_pages);
     }
 
+    if ( is_pvh_domain(d) )
+        hap_set_pvh_alloc_for_dom0(d, nr_pages);
+
+    /*
+     * We enable paging mode again so guest_physmap_add_page will do the
+     * right thing for us.
+     */
+    d->arch.paging.mode = save_pvh_pg_mode;
+
     /* Write the phys->machine and machine->phys table entries. */
     for ( pfn = 0; pfn < count; pfn++ )
     {
@@ -987,11 +1161,7 @@ int __init construct_dom0(
         if ( pfn > REVERSE_START && (vinitrd_start || pfn < initrd_pfn) )
             mfn = alloc_epfn - (pfn - REVERSE_START);
 #endif
-        if ( !is_pv_32on64_domain(d) )
-            ((unsigned long *)vphysmap_start)[pfn] = mfn;
-        else
-            ((unsigned int *)vphysmap_start)[pfn] = mfn;
-        set_gpfn_from_mfn(mfn, pfn);
+        dom0_update_physmap(d, pfn, mfn, vphysmap_start);
         if (!(pfn & 0xfffff))
             process_pending_softirqs();
     }
@@ -1007,8 +1177,8 @@ int __init construct_dom0(
             if ( !page->u.inuse.type_info &&
                  !get_page_and_type(page, d, PGT_writable_page) )
                 BUG();
-            ((unsigned long *)vphysmap_start)[pfn] = mfn;
-            set_gpfn_from_mfn(mfn, pfn);
+
+            dom0_update_physmap(d, pfn, mfn, vphysmap_start);
             ++pfn;
             if (!(pfn & 0xfffff))
                 process_pending_softirqs();
@@ -1028,11 +1198,7 @@ int __init construct_dom0(
 #ifndef NDEBUG
 #define pfn (nr_pages - 1 - (pfn - (alloc_epfn - alloc_spfn)))
 #endif
-            if ( !is_pv_32on64_domain(d) )
-                ((unsigned long *)vphysmap_start)[pfn] = mfn;
-            else
-                ((unsigned int *)vphysmap_start)[pfn] = mfn;
-            set_gpfn_from_mfn(mfn, pfn);
+            dom0_update_physmap(d, pfn, mfn, vphysmap_start);
 #undef pfn
             page++; pfn++;
             if (!(pfn & 0xfffff))
@@ -1056,6 +1222,15 @@ int __init construct_dom0(
         si->console.dom0.info_size = sizeof(struct dom0_vga_console_info);
     }
 
+    /*
+     * PVH: We need to update si->shared_info while we are on dom0 page tables,
+     * but need to defer the p2m update until after we have fixed up the
+     * page tables for PVH so that the m2p for the si pte entry returns
+     * correct pfn.
+     */
+    if ( is_pvh_domain(d) )
+        si->shared_info = shared_info_paddr;
+
     if ( is_pv_32on64_domain(d) )
         xlat_start_info(si, XLAT_start_info_console_dom0);
 
@@ -1089,8 +1264,15 @@ int __init construct_dom0(
     regs->eflags = X86_EFLAGS_IF;
 
     if ( opt_dom0_shadow )
+    {
+        if ( is_pvh_domain(d) )
+        {
+            printk("Invalid option dom0_shadow for PVH\n");
+            return -EINVAL;
+        }
         if ( paging_enable(d, PG_SH_enable) == 0 )
             paging_update_paging_modes(v);
+    }
 
     if ( supervisor_mode_kernel )
     {
@@ -1180,6 +1362,19 @@ int __init construct_dom0(
         printk(" Xen warning: dom0 kernel broken ELF: %s\n",
                elf_check_broken(&elf));
 
+    if ( is_pvh_domain(d) )
+    {
+        /* finally, fixup the page table, replacing mfns with pfns */
+        pvh_fixup_page_tables_for_hap(v, v_start, v_end);
+
+        /* the pt has correct pfn for si, now update the mfn in the p2m */
+        mfn = virt_to_mfn(d->shared_info);
+        pfn = shared_info_paddr >> PAGE_SHIFT;
+        dom0_update_physmap(d, pfn, mfn, 0);
+
+        pvh_map_all_iomem(d);
+    }
+
     iommu_dom0_init(dom0);
 
     return 0;
diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index e3f544a..ec18771 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -46,7 +46,7 @@ static int gdbsx_guest_mem_io(
     return (iop->remain ? -EFAULT : 0);
 }
 
-static int add_mem_mapping_for_xlate(struct domain *d, unsigned long gfn,
+int add_mem_mapping_for_xlate(struct domain *d, unsigned long gfn,
                                      unsigned long mfn, unsigned long nr_mfns)
 {
     unsigned long i;
diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index d3f64bd..4accab6 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -579,6 +579,21 @@ int hap_domctl(struct domain *d, xen_domctl_shadow_op_t *sc,
     }
 }
 
+void __init hap_set_pvh_alloc_for_dom0(struct domain *d,
+                                       unsigned long num_pages)
+{
+    int rc;
+    unsigned long memkb = num_pages * (PAGE_SIZE / 1024);
+
+    /* Copied from: libxl_get_required_shadow_memory() */
+    memkb = 4 * (256 * d->max_vcpus + 2 * (memkb / 1024));
+    num_pages = ((memkb+1023)/1024) << (20 - PAGE_SHIFT);
+    paging_lock(d);
+    rc = hap_set_allocation(d, num_pages, NULL);
+    paging_unlock(d);
+    BUG_ON(rc);
+}
+
 static const struct paging_mode hap_paging_real_mode;
 static const struct paging_mode hap_paging_protected_mode;
 static const struct paging_mode hap_paging_pae_mode;
diff --git a/xen/include/asm-x86/hap.h b/xen/include/asm-x86/hap.h
index e03f983..aab8558 100644
--- a/xen/include/asm-x86/hap.h
+++ b/xen/include/asm-x86/hap.h
@@ -63,6 +63,7 @@ int hap_track_dirty_vram(struct domain *d,
                          XEN_GUEST_HANDLE_64(uint8) dirty_bitmap);
 
 extern const struct paging_mode *hap_paging_get_mode(struct vcpu *);
+void hap_set_pvh_alloc_for_dom0(struct domain *d, unsigned long num_pages);
 
 #endif /* XEN_HAP_H */
 
diff --git a/xen/include/xen/domain.h b/xen/include/xen/domain.h
index a057069..fd6fc1a 100644
--- a/xen/include/xen/domain.h
+++ b/xen/include/xen/domain.h
@@ -89,4 +89,7 @@ extern unsigned int xen_processor_pmbits;
 
 extern bool_t opt_dom0_vcpus_pin;
 
+extern int add_mem_mapping_for_xlate(struct domain *d, unsigned long gfn,
+                                     unsigned long mfn, unsigned long nr_mfns);
+
 #endif /* __XEN_DOMAIN_H__ */
-- 
1.7.2.3
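
The E820 walk in pvh_map_all_iomem() above can be hard to follow from the
diff alone, so here is a small stand-alone sketch of the same hole-finding
logic. It is an illustration under assumptions, not the hypervisor code:
struct region, struct pfn_range and find_io_holes() are hypothetical
names, the is_ram flag stands in for the E820_RAM / E820_UNUSABLE test,
and the ranges are collected into an array instead of being passed to
add_mem_mapping_for_xlate().

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT   12
#define PFN_DOWN(x)  ((uint64_t)(x) >> PAGE_SHIFT)
#define PFN_UP(x)    (((uint64_t)(x) + (1ULL << PAGE_SHIFT) - 1) >> PAGE_SHIFT)
#define FOUR_GB      (4ULL << 30)

/* Hypothetical, simplified E820 entry. */
struct region {
    uint64_t addr, size;
    int is_ram;              /* stands in for E820_RAM / E820_UNUSABLE */
};

struct pfn_range {
    uint64_t start_pfn, end_pfn;   /* half-open: [start_pfn, end_pfn) */
};

/*
 * Collect the non-RAM ranges that would be mapped 1:1.
 * Returns the number of ranges written to out (at most max).
 */
static size_t find_io_holes(const struct region *map, size_t nr,
                            struct pfn_range *out, size_t max)
{
    uint64_t start = 0, end = 0;
    size_t i, n = 0;

    for ( i = 0; i < nr; i++ )
    {
        end = map[i].addr + map[i].size;

        /* A hole is closed by the next RAM entry or by the last entry. */
        if ( map[i].is_ram || i == nr - 1 )
        {
            uint64_t s = PFN_DOWN(start);
            uint64_t e = map[i].is_ram ? PFN_UP(map[i].addr) : PFN_UP(end);

            if ( s < e && n < max )
                out[n++] = (struct pfn_range){ s, e };
            start = end;
        }
    }

    /* If the map ended below 4GB, the rest up to 4GB is also a hole. */
    if ( end < FOUR_GB && n < max )
        out[n++] = (struct pfn_range){ PFN_UP(end), FOUR_GB >> PAGE_SHIFT };

    return n;
}
```

For a map of RAM [0, 0x9f000), reserved [0x9f000, 0x100000) and RAM
[0x100000, 0x80000000), the sketch reports two holes: pfns [0x9f, 0x100)
and [0x80000, 0x100000).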
Mukesh Rathor
2013-Nov-27 02:27 UTC
[V3 PATCH 5/9] PVH dom0: implement XENMEM_add_to_physmap_range for x86
This preparatory patch adds support for XENMEM_add_to_physmap_range on
x86 so it can be used to create a guest on PVH dom0. To this end, we add
a new function xenmem_add_to_physmap_range(), and change
xenmem_add_to_physmap_once parameters so it can be called from
xenmem_add_to_physmap_range.

Please note, compat will continue to return -ENOSYS.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/arch/x86/mm.c |   73 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 73 insertions(+), 0 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 6c26026..fc8dded 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4675,6 +4675,42 @@ static int xenmem_add_to_physmap(struct domain *d,
     return xenmem_add_to_physmap_once(d, xatp);
 }
 
+static int xenmem_add_to_physmap_range(struct domain *d,
+                                       struct xen_add_to_physmap_range *xatpr)
+{
+    int rc;
+
+    /* Process entries in reverse order to allow continuations */
+    while ( xatpr->size > 0 )
+    {
+        xen_ulong_t idx;
+        xen_pfn_t gpfn;
+        struct xen_add_to_physmap xatp;
+
+        if ( copy_from_guest_offset(&idx, xatpr->idxs, xatpr->size-1, 1) ||
+             copy_from_guest_offset(&gpfn, xatpr->gpfns, xatpr->size-1, 1) )
+        {
+            return -EFAULT;
+        }
+
+        xatp.space = xatpr->space;
+        xatp.idx = idx;
+        xatp.gpfn = gpfn;
+        rc = xenmem_add_to_physmap_once(d, &xatp);
+
+        if ( copy_to_guest_offset(xatpr->errs, xatpr->size-1, &rc, 1) )
+            return -EFAULT;
+
+        xatpr->size--;
+
+        /* Check for continuation if it's not the last interation */
+        if ( xatpr->size > 0 && hypercall_preempt_check() )
+            return -EAGAIN;
+    }
+
+    return 0;
+}
+
 long arch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     int rc;
@@ -4689,6 +4725,10 @@ long arch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( copy_from_guest(&xatp, arg, 1) )
             return -EFAULT;
 
+        /* This one is only supported for add_to_physmap_range */
+        if ( xatp.space == XENMAPSPACE_gmfn_foreign )
+            return -EINVAL;
+
         d = rcu_lock_domain_by_any_id(xatp.domid);
         if ( d == NULL )
             return -ESRCH;
@@ -4716,6 +4756,39 @@ long arch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg)
         return rc;
     }
 
+    case XENMEM_add_to_physmap_range:
+    {
+        struct xen_add_to_physmap_range xatpr;
+        struct domain *d;
+
+        if ( copy_from_guest(&xatpr, arg, 1) )
+            return -EFAULT;
+
+        /* This mapspace is redundant for this hypercall */
+        if ( xatpr.space == XENMAPSPACE_gmfn_range )
+            return -EINVAL;
+
+        d = rcu_lock_domain_by_any_id(xatpr.domid);
+        if ( d == NULL )
+            return -ESRCH;
+
+        if ( (rc = xsm_add_to_physmap(XSM_TARGET, current->domain, d)) )
+        {
+            rcu_unlock_domain(d);
+            return rc;
+        }
+
+        rc = xenmem_add_to_physmap_range(d, &xatpr);
+
+        rcu_unlock_domain(d);
+
+        if ( rc == -EAGAIN )
+            rc = hypercall_create_continuation(
+                __HYPERVISOR_memory_op, "ih", op, arg);
+
+        return rc;
+    }
+
     case XENMEM_set_memory_map:
     {
         struct xen_foreign_memory_map fmap;
-- 
1.7.2.3
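
The reverse-order loop in xenmem_add_to_physmap_range() above makes the
remaining count itself the continuation state: after a preemption,
"size" already names exactly the entries still to be done. A minimal
sketch of that idea follows, under stated assumptions: struct batch,
budget, preempt_check(), process_one() and process_batch() are
hypothetical names, and the guest-copy and hypercall machinery is
replaced by plain arrays.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical batch descriptor; size doubles as the resume point. */
struct batch {
    size_t size;                /* entries still to process */
    const unsigned long *idxs;  /* per-entry input */
    int *errs;                  /* per-entry result */
};

/* Hypothetical stand-in for hypercall_preempt_check(). */
static int budget;
static int preempt_check(void)
{
    return --budget <= 0;
}

/* Hypothetical per-entry operation: odd indexes fail. */
static int process_one(unsigned long idx)
{
    return (idx % 2) ? -1 : 0;
}

/* Returns 0 when done, -EAGAIN when the caller should call again. */
static int process_batch(struct batch *b)
{
    while ( b->size > 0 )
    {
        size_t last = b->size - 1;

        /* Work from the tail so that shrinking size records progress. */
        b->errs[last] = process_one(b->idxs[last]);
        b->size--;

        /* No need to ask for a restart after the final entry. */
        if ( b->size > 0 && preempt_check() )
            return -EAGAIN;
    }

    return 0;
}
```

Calling process_batch() again with the same struct after -EAGAIN resumes
where the previous call stopped, without any separate cursor field.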
In this patch, a new type p2m_map_foreign is introduced for pages
that the toolstack on PVH dom0 maps from foreign domains that it is
creating or supporting during its run time.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/arch/x86/mm/p2m-ept.c |    1 +
 xen/arch/x86/mm/p2m-pt.c  |    1 +
 xen/arch/x86/mm/p2m.c     |   28 ++++++++++++++++++++--------
 xen/include/asm-x86/p2m.h |    4 ++++
 4 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index 92d9e2d..08d1d72 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -75,6 +75,7 @@ static void ept_p2m_type_to_flags(ept_entry_t *entry, p2m_type_t type, p2m_acces
             entry->w = 0;
             break;
         case p2m_grant_map_rw:
+        case p2m_map_foreign:
             entry->r = entry->w = 1;
             entry->x = 0;
             break;
diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
index a1d5650..09b60ce 100644
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -89,6 +89,7 @@ static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn)
     case p2m_ram_rw:
         return flags | P2M_BASE_FLAGS | _PAGE_RW;
     case p2m_grant_map_rw:
+    case p2m_map_foreign:
         return flags | P2M_BASE_FLAGS | _PAGE_RW | _PAGE_NX_BIT;
     case p2m_mmio_direct:
         if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn_x(mfn)) )
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 8f380ed..0659ef1 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -525,7 +525,7 @@ p2m_remove_page(struct p2m_domain *p2m, unsigned long gfn, unsigned long mfn,
         for ( i = 0; i < (1UL << page_order); i++ )
         {
             mfn_return = p2m->get_entry(p2m, gfn + i, &t, &a, 0, NULL);
-            if ( !p2m_is_grant(t) && !p2m_is_shared(t) )
+            if ( !p2m_is_grant(t) && !p2m_is_shared(t) && !p2m_is_foreign(t) )
                 set_gpfn_from_mfn(mfn+i, INVALID_M2P_ENTRY);
             ASSERT( !p2m_is_valid(t) || mfn + i == mfn_x(mfn_return) );
         }
@@ -756,10 +756,9 @@ void p2m_change_type_range(struct domain *d,
     p2m_unlock(p2m);
 }
 
-
-
-int
-set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
+static int
+set_typed_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
+                    p2m_type_t gfn_p2mt)
 {
     int rc = 0;
     p2m_access_t a;
@@ -784,16 +783,29 @@ set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
         set_gpfn_from_mfn(mfn_x(omfn), INVALID_M2P_ENTRY);
     }
 
-    P2M_DEBUG("set mmio %lx %lx\n", gfn, mfn_x(mfn));
-    rc = set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K, p2m_mmio_direct, p2m->default_access);
+    P2M_DEBUG("set %d %lx %lx\n", gfn_p2mt, gfn, mfn_x(mfn));
+    rc = set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K, gfn_p2mt,
+                       p2m->default_access);
     gfn_unlock(p2m, gfn, 0);
     if ( 0 == rc )
         gdprintk(XENLOG_ERR,
-            "set_mmio_p2m_entry: set_p2m_entry failed! mfn=%08lx\n",
+            "%s: set_p2m_entry failed! mfn=%08lx\n", __func__,
             mfn_x(get_gfn_query_unlocked(p2m->domain, gfn, &ot)));
     return rc;
 }
 
+/* Returns: True for success. */
+int set_foreign_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
+{
+    return set_typed_p2m_entry(d, gfn, mfn, p2m_map_foreign);
+}
+
+int
+set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
+{
+    return set_typed_p2m_entry(d, gfn, mfn, p2m_mmio_direct);
+}
+
 int
 clear_mmio_p2m_entry(struct domain *d, unsigned long gfn)
 {
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 43583b2..6fc71a1 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -70,6 +70,7 @@ typedef enum {
     p2m_ram_paging_in = 11, /* Memory that is being paged in */
     p2m_ram_shared = 12, /* Shared or sharable memory */
     p2m_ram_broken = 13, /* Broken page, access cause domain crash */
+    p2m_map_foreign = 14, /* ram pages from foreign domain */
 } p2m_type_t;
 
 /*
@@ -180,6 +181,7 @@ typedef unsigned int p2m_query_t;
 #define p2m_is_sharable(_t) (p2m_to_mask(_t) & P2M_SHARABLE_TYPES)
 #define p2m_is_shared(_t) (p2m_to_mask(_t) & P2M_SHARED_TYPES)
 #define p2m_is_broken(_t) (p2m_to_mask(_t) & P2M_BROKEN_TYPES)
+#define p2m_is_foreign(_t) (p2m_to_mask(_t) & p2m_to_mask(p2m_map_foreign))
 
 /* Per-p2m-table state */
 struct p2m_domain {
@@ -510,6 +512,8 @@ p2m_type_t p2m_change_type(struct domain *d, unsigned long gfn,
 int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn);
 int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn);
 
+/* Set foreign mfn in the current guest's p2m table. */
+int set_foreign_p2m_entry(struct domain *domp, unsigned long gfn, mfn_t mfn);
 
 /*
  * Populate-on-demand
-- 
1.7.2.3
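[Editorial sketch, not part of the posted patch.] The p2m_is_foreign() test added above follows the same mask idiom as the neighbouring p2m_is_*() macros. The standalone fragment below illustrates that idiom and the extended condition in p2m_remove_page(). It assumes p2m_to_mask() is a plain one-bit-per-type shift; all demo_* names are hypothetical, the value of demo_ram_rw is assumed, and only the values 12 to 14 are taken from the patch.

```c
#include <assert.h>

/* Trimmed stand-in for p2m_type_t; only 12..14 come from the patch. */
typedef enum {
    demo_ram_rw      = 1,   /* assumed value, for illustration only */
    demo_ram_shared  = 12,
    demo_ram_broken  = 13,
    demo_map_foreign = 14,
} demo_p2m_type_t;

/* Assumed to mirror p2m_to_mask(): one bit per type. */
#define demo_to_mask(t)     (1UL << (t))
#define demo_is_shared(t)   (demo_to_mask(t) & demo_to_mask(demo_ram_shared))
#define demo_is_foreign(t)  (demo_to_mask(t) & demo_to_mask(demo_map_foreign))

/*
 * Returns 1 when the m2p entry would be invalidated on removal, following
 * the extended test in p2m_remove_page(). The grant check is left out to
 * keep the sketch short.
 */
static int demo_should_clear_m2p(demo_p2m_type_t t)
{
    /* The shift is only defined for types that fit in the mask word. */
    assert((unsigned int)t < 8 * sizeof(unsigned long));
    return !demo_is_shared(t) && !demo_is_foreign(t);
}
```

With these definitions, demo_should_clear_m2p(demo_map_foreign) yields 0: the m2p entry for a foreign frame still belongs to the foreign domain, so removal from the mapping domain leaves it alone.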
In preparation for the next patch, we update xsm_add_to_physmap to
allow for checking of foreign domain. Thus, the current domain must
have the right to update the mappings of target domain with pages from
foreign domain.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
CC: dgdegra@tycho.nsa.gov
---
 xen/arch/x86/mm.c       |   16 +++++++++++++---
 xen/include/xsm/dummy.h |   10 ++++++++--
 xen/include/xsm/xsm.h   |    6 +++---
 xen/xsm/flask/hooks.c   |   10 ++++++++--
 4 files changed, 32 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index fc8dded..11c9a89 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4733,7 +4733,7 @@ long arch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( d == NULL )
             return -ESRCH;
 
-        if ( xsm_add_to_physmap(XSM_TARGET, current->domain, d) )
+        if ( xsm_add_to_physmap(XSM_TARGET, current->domain, d, NULL) )
         {
             rcu_unlock_domain(d);
             return -EPERM;
@@ -4759,7 +4759,7 @@ long arch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg)
     case XENMEM_add_to_physmap_range:
     {
         struct xen_add_to_physmap_range xatpr;
-        struct domain *d;
+        struct domain *d, *fd = NULL;
 
         if ( copy_from_guest(&xatpr, arg, 1) )
             return -EFAULT;
@@ -4772,7 +4772,17 @@ long arch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( d == NULL )
             return -ESRCH;
 
-        if ( (rc = xsm_add_to_physmap(XSM_TARGET, current->domain, d)) )
+        if ( xatpr.foreign_domid )
+        {
+            if ( (fd = rcu_lock_domain_by_any_id(xatpr.foreign_domid)) == NULL )
+            {
+                rcu_unlock_domain(d);
+                return -ESRCH;
+            }
+            rcu_unlock_domain(fd);
+        }
+
+        if ( (rc = xsm_add_to_physmap(XSM_TARGET, current->domain, d, fd)) )
         {
             rcu_unlock_domain(d);
             return rc;
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index eb9e1a1..34c097d 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -467,10 +467,16 @@ static XSM_INLINE int xsm_pci_config_permission(XSM_DEFAULT_ARG struct domain *d
     return xsm_default_action(action, current->domain, d);
 }
 
-static XSM_INLINE int xsm_add_to_physmap(XSM_DEFAULT_ARG struct domain *d1, struct domain *d2)
+static XSM_INLINE int xsm_add_to_physmap(XSM_DEFAULT_ARG struct domain *d1, struct domain *d2, struct domain *d3)
 {
+    int rc;
+
     XSM_ASSERT_ACTION(XSM_TARGET);
-    return xsm_default_action(action, d1, d2);
+    rc = xsm_default_action(action, d1, d2);
+    if ( d3 && !rc )
+        rc = xsm_default_action(action, d1, d3);
+
+    return rc;
 }
 
 static XSM_INLINE int xsm_remove_from_physmap(XSM_DEFAULT_ARG struct domain *d1, struct domain *d2)
diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index 1939453..2d29a2f 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -90,7 +90,7 @@ struct xsm_operations {
     int (*memory_adjust_reservation) (struct domain *d1, struct domain *d2);
     int (*memory_stat_reservation) (struct domain *d1, struct domain *d2);
     int (*memory_pin_page) (struct domain *d1, struct domain *d2, struct page_info *page);
-    int (*add_to_physmap) (struct domain *d1, struct domain *d2);
+    int (*add_to_physmap) (struct domain *d1, struct domain *d2, struct domain *d3);
     int (*remove_from_physmap) (struct domain *d1, struct domain *d2);
     int (*claim_pages) (struct domain *d);
 
@@ -344,9 +344,9 @@ static inline int xsm_memory_pin_page(xsm_default_t def, struct domain *d1, stru
     return xsm_ops->memory_pin_page(d1, d2, page);
 }
 
-static inline int xsm_add_to_physmap(xsm_default_t def, struct domain *d1, struct domain *d2)
+static inline int xsm_add_to_physmap(xsm_default_t def, struct domain *d1, struct domain *d2, struct domain *d3)
 {
-    return xsm_ops->add_to_physmap(d1, d2);
+    return xsm_ops->add_to_physmap(d1, d2, d3);
 }
 
 static inline int xsm_remove_from_physmap(xsm_default_t def, struct domain *d1, struct domain *d2)
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index b1e2593..e541dd3 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -1068,9 +1068,15 @@ static inline int flask_tmem_control(void)
     return domain_has_xen(current->domain, XEN__TMEM_CONTROL);
 }
 
-static int flask_add_to_physmap(struct domain *d1, struct domain *d2)
+static int flask_add_to_physmap(struct domain *d1, struct domain *d2, struct domain *d3)
 {
-    return domain_has_perm(d1, d2, SECCLASS_MMU, MMU__PHYSMAP);
+    int rc;
+
+    rc = domain_has_perm(d1, d2, SECCLASS_MMU, MMU__PHYSMAP);
+    if ( d3 && !rc )
+        rc = domain_has_perm(d1, d3, SECCLASS_MMU,
+                             MMU__MAP_READ|MMU__MAP_WRITE);
+    return rc;
 }
 
 static int flask_remove_from_physmap(struct domain *d1, struct domain *d2)
-- 
1.7.2.3
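[Editorial sketch, not part of the posted patch.] Both the dummy and the FLASK hook above use the same pattern: check the caller against the target domain first, and only when that passes and a foreign domain was supplied, check the caller against the foreign domain too. The fragment below models that chaining in isolation. demo_default_action() is a deliberately simplified, hypothetical stand-in for xsm_default_action(XSM_TARGET, ...) and does not reproduce the real policy.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, minimal domain descriptor for the sketch. */
struct demo_domain {
    int id;
    int is_privileged;
};

/*
 * Simplified stand-in for the XSM_TARGET default action: a domain may act
 * on itself, and a privileged domain may act on anyone.
 */
static int demo_default_action(const struct demo_domain *src,
                               const struct demo_domain *target)
{
    assert(src != NULL && target != NULL);
    if ( src == target || src->is_privileged )
        return 0;
    return -EPERM;
}

/* Chained check: target first, then the optional foreign domain. */
static int demo_add_to_physmap(const struct demo_domain *src,
                               const struct demo_domain *target,
                               const struct demo_domain *foreign)
{
    int rc = demo_default_action(src, target);

    if ( foreign && !rc )
        rc = demo_default_action(src, foreign);

    return rc;
}
```

Passing NULL for the foreign domain reduces the function to the old two-domain check, which is how the XENMEM_add_to_physmap caller in the patch uses it.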
In this patch, a new function, xenmem_add_foreign_to_pmap(), is added
to map pages from foreign guest into current dom0 for domU creation.
Such pages are typed p2m_map_foreign. Also, support is added here to
XENMEM_remove_from_physmap to remove such pages. Note, in the remove
path, we must release the refcount that was taken during the map phase.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/arch/x86/mm.c   |   93 ++++++++++++++++++++++++++++++++++++++++++++++++---
 xen/common/memory.c |   38 ++++++++++++++++++---
 2 files changed, 121 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 11c9a89..797fbc7 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -2810,7 +2810,7 @@ static struct domain *get_pg_owner(domid_t domid)
         goto out;
     }
 
-    if ( unlikely(paging_mode_translate(curr)) )
+    if ( !is_pvh_domain(curr) && unlikely(paging_mode_translate(curr)) )
     {
         MEM_LOG("Cannot mix foreign mappings with translated domains");
         goto out;
@@ -4520,9 +4520,84 @@ static int handle_iomem_range(unsigned long s, unsigned long e, void *p)
     return 0;
 }
 
+/*
+ * Add frames from foreign domain to current domain's physmap. Similar to
+ * XENMAPSPACE_gmfn but the frame is foreign being mapped into current,
+ * and is not removed from foreign domain.
+ * Usage: libxl on pvh dom0 creating a guest and doing privcmd_ioctl_mmap.
+ * Side Effect: the mfn for fgfn will be refcounted so it is not lost
+ *              while mapped here. The refcnt is released in do_memory_op()
+ *              via XENMEM_remove_from_physmap.
+ * Returns: 0 ==> success
+ */
+static int xenmem_add_foreign_to_pmap(unsigned long fgfn, unsigned long gpfn,
+                                      domid_t foreign_domid)
+{
+    p2m_type_t p2mt, p2mt_prev;
+    int rc = 0;
+    unsigned long prev_mfn, mfn = 0;
+    struct domain *fdom, *currd = current->domain;
+    struct page_info *page = NULL;
+
+    if ( currd->domain_id == foreign_domid || foreign_domid == DOMID_SELF ||
+         !is_pvh_domain(currd) )
+        return -EINVAL;
+
+    /* Note, access check is done in the caller via xsm_add_to_physmap */
+    if ( !is_control_domain(currd) ||
+         (fdom = get_pg_owner(foreign_domid)) == NULL )
+        return -EPERM;
+
+    /* following will take a refcnt on the mfn */
+    page = get_page_from_gfn(fdom, fgfn, &p2mt, P2M_ALLOC);
+    if ( !page || !p2m_is_valid(p2mt) )
+    {
+        if ( page )
+            put_page(page);
+        put_pg_owner(fdom);
+        return -EINVAL;
+    }
+    mfn = page_to_mfn(page);
+
+    /* Remove previously mapped page if it is present. */
+    prev_mfn = mfn_x(get_gfn(currd, gpfn, &p2mt_prev));
+    if ( mfn_valid(prev_mfn) )
+    {
+        if ( is_xen_heap_mfn(prev_mfn) )
+            /* Xen heap frames are simply unhooked from this phys slot */
+            guest_physmap_remove_page(currd, gpfn, prev_mfn, 0);
+        else
+            /* Normal domain memory is freed, to avoid leaking memory. */
+            guest_remove_page(currd, gpfn);
+    }
+    /*
+     * Create the new mapping. Can't use guest_physmap_add_page() because it
+     * will update the m2p table which will result in mfn -> gpfn of dom0
+     * and not fgfn of domU.
+     */
+    if ( set_foreign_p2m_entry(currd, gpfn, _mfn(mfn)) == 0 )
+    {
+        gdprintk(XENLOG_WARNING, "set_foreign_p2m_entry failed. "
+                 "gpfn:%lx mfn:%lx fgfn:%lx fd:%d\n",
+                 gpfn, mfn, fgfn, foreign_domid);
+        put_page(page);
+        rc = -EINVAL;
+    }
+
+    /*
+     * We must do this put_gfn after set_foreign_p2m_entry so another cpu
+     * doesn't populate the gpfn before us.
+     */
+    put_gfn(currd, gpfn);
+
+    put_pg_owner(fdom);
+    return rc;
+}
+
 static int xenmem_add_to_physmap_once(
     struct domain *d,
-    const struct xen_add_to_physmap *xatp)
+    const struct xen_add_to_physmap *xatp,
+    domid_t foreign_domid)
 {
     struct page_info *page = NULL;
     unsigned long gfn = 0; /* gcc ... */
@@ -4581,6 +4656,14 @@ static int xenmem_add_to_physmap_once(
             page = mfn_to_page(mfn);
             break;
         }
+
+        case XENMAPSPACE_gmfn_foreign:
+        {
+            rc = xenmem_add_foreign_to_pmap(xatp->idx, xatp->gpfn,
+                                            foreign_domid);
+            return rc;
+        }
+
         default:
             break;
     }
@@ -4646,7 +4729,7 @@ static int xenmem_add_to_physmap(struct domain *d,
         start_xatp = *xatp;
         while ( xatp->size > 0 )
         {
-            rc = xenmem_add_to_physmap_once(d, xatp);
+            rc = xenmem_add_to_physmap_once(d, xatp, DOMID_INVALID);
             if ( rc < 0 )
                 return rc;
 
@@ -4672,7 +4755,7 @@ static int xenmem_add_to_physmap(struct domain *d,
         return rc;
     }
 
-    return xenmem_add_to_physmap_once(d, xatp);
+    return xenmem_add_to_physmap_once(d, xatp, DOMID_INVALID);
 }
 
 static int xenmem_add_to_physmap_range(struct domain *d,
@@ -4696,7 +4779,7 @@ static int xenmem_add_to_physmap_range(struct domain *d,
         xatp.space = xatpr->space;
         xatp.idx = idx;
         xatp.gpfn = gpfn;
-        rc = xenmem_add_to_physmap_once(d, &xatp);
+        rc = xenmem_add_to_physmap_once(d, &xatp, xatpr->foreign_domid);
 
         if ( copy_to_guest_offset(xatpr->errs, xatpr->size-1, &rc, 1) )
             return -EFAULT;
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 50b740f..d81df18 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -675,9 +675,11 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     case XENMEM_remove_from_physmap:
     {
+        unsigned long mfn;
         struct xen_remove_from_physmap xrfp;
         struct page_info *page;
-        struct domain *d;
+        struct domain *d, *foreign_dom = NULL;
+        p2m_type_t p2mt, tp;
 
         if ( copy_from_guest(&xrfp, arg, 1) )
             return -EFAULT;
@@ -693,11 +695,37 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
             return rc;
         }
 
-        page = get_page_from_gfn(d, xrfp.gpfn, NULL, P2M_ALLOC);
-        if ( page )
+        /*
+         * if PVH, the gfn could be mapped to a mfn from foreign domain by the
+         * user space tool during domain creation. We need to check for that,
+         * free it up from the p2m, and release refcnt on it. In such a case,
+         * page would be NULL and the following call would not have refcnt'd
+         * the page. See also xenmem_add_foreign_to_pmap().
+         */
+        page = get_page_from_gfn(d, xrfp.gpfn, &p2mt, P2M_ALLOC);
+
+        if ( page || p2m_is_foreign(p2mt) )
         {
-            guest_physmap_remove_page(d, xrfp.gpfn, page_to_mfn(page), 0);
-            put_page(page);
+            if ( page )
+                mfn = page_to_mfn(page);
+            else
+            {
+                mfn = mfn_x(get_gfn_query(d, xrfp.gpfn, &tp));
+                foreign_dom = page_get_owner(mfn_to_page(mfn));
+                ASSERT(is_pvh_domain(d));
+                ASSERT(d != foreign_dom);
+                ASSERT(p2m_is_foreign(tp));
+            }
+
+            guest_physmap_remove_page(d, xrfp.gpfn, mfn, 0);
+            if (page)
+                put_page(page);
+
+            if ( p2m_is_foreign(p2mt) )
+            {
+                put_page(mfn_to_page(mfn));
+                put_gfn(d, xrfp.gpfn);
+            }
         }
         else
             rc = -ENOENT;
-- 
1.7.2.3
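[Editorial sketch, not part of the posted patch.] The commit message above stresses that the reference taken on the foreign page when it is mapped must be dropped again in the remove path. The toy model below shows only that pairing. Every name in it is hypothetical, and unlike the real xenmem_add_foreign_to_pmap() it refuses an occupied slot instead of first removing the previous mapping.

```c
#include <assert.h>
#include <stddef.h>

/* Toy page with a bare reference count. */
struct demo_page {
    int refcount;
};

/* Toy physmap slot: at most one page, flagged when it is a foreign map. */
struct demo_slot {
    struct demo_page *page;
    int is_foreign;
};

/* Map a foreign page: take a reference that lives as long as the mapping. */
static int demo_map_foreign(struct demo_slot *slot, struct demo_page *pg)
{
    assert(slot != NULL && pg != NULL);
    if ( slot->page != NULL )
        return -1;              /* simplification: occupied slots are refused */
    pg->refcount++;
    slot->page = pg;
    slot->is_foreign = 1;
    return 0;
}

/* Remove a mapping: drop the map-time reference for foreign pages. */
static int demo_remove(struct demo_slot *slot)
{
    assert(slot != NULL);
    if ( slot->page == NULL )
        return -1;              /* nothing mapped, comparable to -ENOENT */
    if ( slot->is_foreign )
        slot->page->refcount--;
    slot->page = NULL;
    slot->is_foreign = 0;
    return 0;
}
```

A map followed by a remove leaves the page's count where it started; skipping the decrement in demo_remove() would leak one reference per mapping, which is the leak the remove-path change in the patch is there to prevent.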
Add opt_dom0pvh. Note, pvh dom0 is disabled until the fixme in
domain_build.c is resolved. The fixme is added by patch title:
"PVH dom0: construct_dom0 changes"

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/arch/x86/setup.c |   19 ++++++++++++++++---
 1 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index e33c34b..de30ef6 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -61,6 +61,10 @@ integer_param("maxcpus", max_cpus);
 static bool_t __initdata disable_smep;
 invbool_param("smep", disable_smep);
 
+/* Boot dom0 in pvh mode */
+bool_t __initdata opt_dom0pvh;
+boolean_param("dom0pvh", opt_dom0pvh);
+
 /* **** Linux config option: propagated to domain0. */
 /* "acpi=off": Sisables both ACPI table parsing and interpreter. */
 /* "acpi=force": Override the disable blacklist. */
@@ -545,7 +549,7 @@ void __init __start_xen(unsigned long mbi_p)
 {
     char *memmap_type = NULL;
     char *cmdline, *kextra, *loader;
-    unsigned int initrdidx;
+    unsigned int initrdidx, domcr_flags = 0;
     multiboot_info_t *mbi = __va(mbi_p);
     module_t *mod = (module_t *)__va(mbi->mods_addr);
     unsigned long nr_pages, raw_max_page, modules_headroom, *module_map;
@@ -1332,8 +1336,17 @@ void __init __start_xen(unsigned long mbi_p)
     if ( !tboot_protect_mem_regions() )
         panic("Could not protect TXT memory regions\n");
 
-    /* Create initial domain 0. */
-    dom0 = domain_create(0, DOMCRF_s3_integrity, 0);
+    /*
+     * Following removed when "pvh fixme" in domain_build.c is resolved.
+     * The fixme is added by patch "PVH dom0: construct_dom0 changes".
+     */
+    if ( opt_dom0pvh )
+        panic("You do not have the correct xen version for dom0 PVH\n");
+
+    /* Create initial domain 0. */
+    domcr_flags = (opt_dom0pvh ? DOMCRF_pvh | DOMCRF_hap : 0);
+    domcr_flags |= DOMCRF_s3_integrity;
+    dom0 = domain_create(0, domcr_flags, 0);
     if ( IS_ERR(dom0) || (alloc_dom0_vcpu0() == NULL) )
         panic("Error creating domain 0\n");
 
-- 
1.7.2.3
George Dunlap
2013-Nov-27 15:00 UTC
Re: [V3 PATCH 9/9] pvh dom0: add opt_dom0pvh to setup.c
On 11/27/2013 02:27 AM, Mukesh Rathor wrote:
> Add opt_dom0pvh. Note, pvh dom0 is disabled until the fixme in
> domain_build.c is resolved. The fixme is added by patch title:
> "PVH dom0: construct_dom0 changes"

So it's been asked before but I haven't seen an answer yet: What
exactly is the problem here, and when might it be fixed?

 -George
On 11/26/2013 09:27 PM, Mukesh Rathor wrote:
> In preparation for the next patch, we update xsm_add_to_physmap to
> allow for checking of foreign domain. Thus, the current domain must
> have the right to update the mappings of target domain with pages from
> foreign domain.
>
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> CC: dgdegra@tycho.nsa.gov
> ---
>  xen/arch/x86/mm.c       |   16 +++++++++++++---
>  xen/include/xsm/dummy.h |   10 ++++++++--
>  xen/include/xsm/xsm.h   |    6 +++---
>  xen/xsm/flask/hooks.c   |   10 ++++++++--
>  4 files changed, 32 insertions(+), 10 deletions(-)

The XSM changes look good; however, the calling code needs a bit of
tweaking. Currently, if domain 0 is specified as the foreign domain,
the check is skipped, and the check is also run unnecessarily when
foreign_domid is nonzero but the operation is not
XENMAPSPACE_gmfn_foreign. The locking in this version also implies a
potential TOCTOU bug, but which in reality is impossible to trigger
due to the existing RCU lock held on (d). I would suggest passing
the foreign struct domain instead of the domid, as below.

An unrelated question about XENMAPSPACE_gmfn_foreign that came up
while looking at this: is the domain parameter (d) supposed to be
ignored here, with maps always modifying the current domain? I would
have expected this call to manipulate d's physmap, with the common
case being (d == current->domain).

---
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 797fbc7..9afbcb9 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4531,23 +4531,17 @@ static int handle_iomem_range(unsigned long s, unsigned long e, void *p)
  * Returns: 0 ==> success
  */
 static int xenmem_add_foreign_to_pmap(unsigned long fgfn, unsigned long gpfn,
-                                      domid_t foreign_domid)
+                                      struct domain *fdom)
 {
     p2m_type_t p2mt, p2mt_prev;
     int rc = 0;
     unsigned long prev_mfn, mfn = 0;
-    struct domain *fdom, *currd = current->domain;
+    struct domain *currd = current->domain;
     struct page_info *page = NULL;
 
-    if ( currd->domain_id == foreign_domid || foreign_domid == DOMID_SELF ||
-         !is_pvh_domain(currd) )
+    if ( currd == fdom || !fdom || !is_pvh_domain(currd) )
         return -EINVAL;
 
-    /* Note, access check is done in the caller via xsm_add_to_physmap */
-    if ( !is_control_domain(currd) ||
-         (fdom = get_pg_owner(foreign_domid)) == NULL )
-        return -EPERM;
-
     /* following will take a refcnt on the mfn */
     page = get_page_from_gfn(fdom, fgfn, &p2mt, P2M_ALLOC);
     if ( !page || !p2m_is_valid(p2mt) )
@@ -4579,7 +4573,7 @@ static int xenmem_add_foreign_to_pmap(unsigned long fgfn, unsigned long gpfn,
     {
         gdprintk(XENLOG_WARNING, "set_foreign_p2m_entry failed. "
                  "gpfn:%lx mfn:%lx fgfn:%lx fd:%d\n",
-                 gpfn, mfn, fgfn, foreign_domid);
+                 gpfn, mfn, fgfn, fdom->domain_id);
         put_page(page);
         rc = -EINVAL;
     }
@@ -4590,14 +4584,13 @@ static int xenmem_add_foreign_to_pmap(unsigned long fgfn, unsigned long gpfn,
      */
     put_gfn(currd, gpfn);
 
-    put_pg_owner(fdom);
     return rc;
 }
 
 static int xenmem_add_to_physmap_once(
     struct domain *d,
     const struct xen_add_to_physmap *xatp,
-    domid_t foreign_domid)
+    struct domain *foreign_dom)
 {
     struct page_info *page = NULL;
     unsigned long gfn = 0; /* gcc ... */
@@ -4660,7 +4653,7 @@ static int xenmem_add_to_physmap_once(
         case XENMAPSPACE_gmfn_foreign:
         {
             rc = xenmem_add_foreign_to_pmap(xatp->idx, xatp->gpfn,
-                                            foreign_domid);
+                                            foreign_dom);
             return rc;
         }
 
@@ -4729,7 +4722,7 @@ static int xenmem_add_to_physmap(struct domain *d,
         start_xatp = *xatp;
         while ( xatp->size > 0 )
         {
-            rc = xenmem_add_to_physmap_once(d, xatp, DOMID_INVALID);
+            rc = xenmem_add_to_physmap_once(d, xatp, NULL);
             if ( rc < 0 )
                 return rc;
 
@@ -4755,11 +4748,12 @@ static int xenmem_add_to_physmap(struct domain *d,
         return rc;
     }
 
-    return xenmem_add_to_physmap_once(d, xatp, DOMID_INVALID);
+    return xenmem_add_to_physmap_once(d, xatp, NULL);
 }
 
 static int xenmem_add_to_physmap_range(struct domain *d,
-                                       struct xen_add_to_physmap_range *xatpr)
+                                       struct xen_add_to_physmap_range *xatpr,
+                                       struct domain *foreign_dom)
 {
     int rc;
 
@@ -4779,7 +4773,7 @@ static int xenmem_add_to_physmap_range(struct domain *d,
         xatp.space = xatpr->space;
         xatp.idx = idx;
         xatp.gpfn = gpfn;
-        rc = xenmem_add_to_physmap_once(d, &xatp, xatpr->foreign_domid);
+        rc = xenmem_add_to_physmap_once(d, &xatp, foreign_dom);
 
         if ( copy_to_guest_offset(xatpr->errs, xatpr->size-1, &rc, 1) )
             return -EFAULT;
@@ -4855,25 +4849,29 @@ long arch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( d == NULL )
             return -ESRCH;
 
-        if ( xatpr.foreign_domid )
+        if ( xatpr.space == XENMAPSPACE_gmfn_foreign )
         {
-            if ( (fd = rcu_lock_domain_by_any_id(xatpr.foreign_domid)) == NULL )
+            fd = get_pg_owner(xatpr.foreign_domid);
+            if ( fd == NULL )
             {
                 rcu_unlock_domain(d);
                 return -ESRCH;
             }
-            rcu_unlock_domain(fd);
         }
 
         if ( (rc = xsm_add_to_physmap(XSM_TARGET, current->domain, d, fd)) )
         {
             rcu_unlock_domain(d);
+            if (fd)
+                put_pg_owner(fd);
             return rc;
         }
 
-        rc = xenmem_add_to_physmap_range(d, &xatpr);
+        rc = xenmem_add_to_physmap_range(d, &xatpr, fd);
 
         rcu_unlock_domain(d);
+        if (fd)
+            put_pg_owner(fd);
 
         if ( rc == -EAGAIN )
             rc = hypercall_create_continuation(
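[Editorial sketch, not part of either patch.] The first review point in the message above is that V3 decides whether to look up and check the foreign domain from "foreign_domid != 0", whereas the suggested change keys it on the map space. The two predicates below make the difference concrete. The demo_* names are hypothetical and the numeric space values are illustrative stand-ins for the XENMAPSPACE_* constants.

```c
#include <assert.h>

/* Illustrative stand-ins for XENMAPSPACE_gmfn and XENMAPSPACE_gmfn_foreign. */
#define DEMO_SPACE_GMFN          2u
#define DEMO_SPACE_GMFN_FOREIGN  4u

/* V3 as posted: the foreign-domain check runs whenever the domid is nonzero. */
static int demo_foreign_check_v3(unsigned int space, unsigned short foreign_domid)
{
    (void)space;                /* the map space is not consulted */
    return foreign_domid != 0;
}

/* Suggested: the check runs exactly when the request is a foreign map. */
static int demo_foreign_check_suggested(unsigned int space,
                                        unsigned short foreign_domid)
{
    (void)foreign_domid;        /* domid 0 is a valid foreign domain */
    return space == DEMO_SPACE_GMFN_FOREIGN;
}
```

For a gmfn_foreign request naming domain 0, the V3 predicate returns 0 (check skipped) and the suggested one returns 1. For a plain gmfn request carrying a stray nonzero domid, V3 returns 1 (unneeded check) and the suggested one returns 0. These are the two cases described in the review.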
Mukesh Rathor
2013-Nov-27 20:12 UTC
Re: [V3 PATCH 9/9] pvh dom0: add opt_dom0pvh to setup.c
On Wed, 27 Nov 2013 15:00:43 +0000
George Dunlap <george.dunlap@eu.citrix.com> wrote:

> On 11/27/2013 02:27 AM, Mukesh Rathor wrote:
> > Add opt_dom0pvh. Note, pvh dom0 is disabled until the fixme in
> > domain_build.c is resolved. The fixme is added by patch title:
> > "PVH dom0: construct_dom0 changes"
>
> So it's been asked before but I haven't seen an answer yet: What
> exactly is the problem here, and when might it be fixed?

The problem would happen if an mmio space sits above the highest
e820 address. Ie, the e820 didn't report it. FWIW, PV linux would not
boot as dom0 in those cases anyways, as it doesn't support anything
not in the e820 at present. Konrad is looking into it.

As for the hardware, I've not found any where it would happen. Here's
Jan's response where this was discussed previously:

-------
> For testing purposes, do you have reference for hardware? I don't see
> any here with such configuration.

Nothing specific, but I know that SR-IOV virtual functions easily
cause kernels to run out of MMIO space below 4G (namely when
the hole is only around 1Gb or even less), and Intel must have
knowledge of graphics cards having so huge a frame buffer that
it can only be mapped above 4G.

Jan
-------

IMO, this is a pre-existing condition, that is not specific to
PVH only, as such a case would not work for linux PV dom0 either.

thanks,
Mukesh
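[Editorial sketch, not from the thread.] The condition described above, an MMIO range that starts beyond the highest address the e820 map reports, can be expressed as a small check. The structure and function names are hypothetical, and the entry layout is a simplified (address, size) pair rather than the real e820 record.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified e820-style entry: start address and length in bytes. */
struct demo_e820_entry {
    uint64_t addr;
    uint64_t size;
};

/* Highest end address (exclusive) covered by any entry; 0 for an empty map. */
static uint64_t demo_e820_highest_end(const struct demo_e820_entry *map,
                                      size_t nr)
{
    uint64_t end = 0;
    size_t i;

    for ( i = 0; i < nr; i++ )
        if ( map[i].addr + map[i].size > end )
            end = map[i].addr + map[i].size;

    return end;
}

/* 1 when an MMIO range starting at mmio_start lies above everything in the map. */
static int demo_mmio_above_e820(const struct demo_e820_entry *map, size_t nr,
                                uint64_t mmio_start)
{
    return mmio_start >= demo_e820_highest_end(map, nr);
}
```

With a map that ends at 3 GiB, a BAR placed at 4 GiB is flagged while one at 2 GiB is not. As the message notes, the hard part in practice is that such placements are decided later by the kernel, so the input to a check like this is not available at boot.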
On Wed, 27 Nov 2013 11:46:27 -0500
Daniel De Graaf <dgdegra@tycho.nsa.gov> wrote:

> On 11/26/2013 09:27 PM, Mukesh Rathor wrote:
> > In preparation for the next patch, we update xsm_add_to_physmap to
> > allow for checking of foreign domain. Thus, the current domain must
> > have the right to update the mappings of target domain with pages
> > from foreign domain.
> >
> > Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> > CC: dgdegra@tycho.nsa.gov
> > ---
> >  xen/arch/x86/mm.c       |   16 +++++++++++++---
> >  xen/include/xsm/dummy.h |   10 ++++++++--
> >  xen/include/xsm/xsm.h   |    6 +++---
> >  xen/xsm/flask/hooks.c   |   10 ++++++++--
> >  4 files changed, 32 insertions(+), 10 deletions(-)
>
> The XSM changes look good; however, the calling code needs a bit of
> tweaking. Currently, if domain 0 is specified as the foreign domain,
> the check is skipped, and the check is also run unnecessarily when
> foreign_domid is nonzero but the operation is not
> XENMAPSPACE_gmfn_foreign. The locking in this version also implies a
> potential TOCTOU bug, but which in reality is impossible to trigger
> due to the existing RCU lock held on (d). I would suggest passing
> the foreign struct domain instead of the domid, as below.
>
> An unrelated question about XENMAPSPACE_gmfn_foreign that came up
> while looking at this: is the domain parameter (d) supposed to be
> ignored here, with maps always modifying the current domain? I would
> have expected this call to manipulate d's physmap, with the common
> case being (d == current->domain).

Right, that is the assumption at present that PVH dom0 is creating
user domains. Perhaps a debug assert would be a good idea. I can add
that. It's probably straightforward to just manipulate d's physmap,
but I'd rather do it when it's really time for it, than to anticipate
something I can't test right now.

thanks for your input.
Mukesh
George Dunlap
2013-Nov-28 11:54 UTC
Re: [V3 PATCH 9/9] pvh dom0: add opt_dom0pvh to setup.c
On Wed, Nov 27, 2013 at 8:12 PM, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> On Wed, 27 Nov 2013 15:00:43 +0000
> George Dunlap <george.dunlap@eu.citrix.com> wrote:
>
>> On 11/27/2013 02:27 AM, Mukesh Rathor wrote:
>> > Add opt_dom0pvh. Note, pvh dom0 is disabled until the fixme in
>> > domain_build.c is resolved. The fixme is added by patch title:
>> > "PVH dom0: construct_dom0 changes"
>>
>> So it's been asked before but I haven't seen an answer yet: What
>> exactly is the problem here, and when might it be fixed?
>
> The problem would happen if an mmio space sits above the highest
> e820 address. Ie, the e820 didn't report it. FWIW, PV linux would not
> boot as dom0 in those cases anyways, as it doesn't support anything
> not in the e820 at present. Konrad is looking into it.
>
> As for the hardware, I've not found any where it would happen. Here's
> Jan's response where this was discussed previously:
>
> -------
>> For testing purposes, do you have reference for hardware? I don't see
>> any here with such configuration.
>
> Nothing specific, but I know that SR-IOV virtual functions easily
> cause kernels to run out of MMIO space below 4G (namely when
> the hole is only around 1Gb or even less), and Intel must have
> knowledge of graphics cards having so huge a frame buffer that
> it can only be mapped above 4G.
>
> Jan
> -------
>
> IMO, this is a pre-existing condition, that is not specific to
> PVH only, as such a case would not work for linux PV dom0 either.

Why are you disabling PVH for all systems then, if it's only a small
number of systems that may have this problem, and if PV wouldn't work
on those systems anyway?

 -George
On 11/27/2013 02:27 AM, Mukesh Rathor wrote:
> Hi,
>
> V3:
>   - New patch #7 for xsm changes to add_to_physmap.
>   - Patches 3, 4 5 6 and 8 are unchanged from V2. (just a new comment in #8).
>
> These patches implement PVH dom0. Please note there is 1 fixme in
> this entire series that is being discussed under a different thread,
> and will be worked on next. PVH dom0 creation is disabled
> until then.

So a couple of thoughts from a release perspective.

Releasing *code* as "experimental" means, "it may work or it may not;
use at your own risk". If people use it and it works, then great; they
can expect that the code will only get better.

However, releasing an *interface* as "experimental" means, "it may work
for you now, but it may not work later when we change the interface".
While this is nice in theory, in practice, once something works, people
may begin to rely on it and we may end up having to support it anyway.
So the Linux interface cannot really be labelled "experimental"; we have
to be reasonably certain that we can support it going forward.

Benefits:

We have a fairly solid precedent for releasing features as
"experimental" or "tech preview". This allows a much wider testing and
feedback. If it turns out to be robust enough, people may even be able
to use it, gaining the potential performance advantages.

Someone could make an argument the other way: that the best thing to do
would be to check it in at the beginning of the release cycle, get it
well tested, and then release it as "ready" for 4.5, without going
through an "experimental" phase. Both arguments have their merits; but
since current way we do things hasn't caused any problems and seems to
be working OK, it seems best to follow precedent, and assume that a tech
preview will be beneficial.

Risks, bugs:

All of the actual functional changes in this series are predicated on
"if(is_pvh_domain())", so in theory they should only have an effect on
PVH guests. (There is, of course, a small risk that there will be a
mistake here.) It introduces a new p2m type, but since it is the only
one that uses it, bugs should only affect PVH, and not other
functionality.

Risks, interface:

This patch series only adds two things to the interface with Linux:
XENMAPSPACE_gmfn_foreign and XENMEM_add_to_physmap_range. These are
already used and available in the ARM side.

Normally I'd be afraid of accepting new interfaces at this stage in the
game, as I'd be afraid that we hadn't had enough time to make sure it's
something we want to support going forward. However, since this is just
duplicating an interface already in use on the ARM side, I think the
interface *has* been thought of for some time. This makes it much more
likely to be worth the risk; if the ARM side has used it for 6 months
without finding a problem with it, it seems unlikely that the x86 side
will be particularly different.

So on the whole, there is a benefit (if a bit nebulous) to having it in,
and a reasonably low risk; and it's not clear that the risk will be
significantly mitigated by waiting another 6 months. I'm therefore
inclined to give it a release ack.

Any thoughts?

 -George
>>> On 28.11.13 at 13:07, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> So on the whole, there is a benefit (if a bit nebulous) to having it in,
> and a reasonably low risk; and it's not clear that the risk will be
> significantly mitigated by waiting another 6 months. I'm therefore
> inclined to give it a release ack.
>
> Any thoughts?

You worded it quite nicely, and much better than I would ever have
been able to, but in the end it all boils down to the same reasoning
that I have been following when suggesting to take the changes if
they're ready.

Jan
>>> On 27.11.13 at 21:29, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> On Wed, 27 Nov 2013 11:46:27 -0500
> Daniel De Graaf <dgdegra@tycho.nsa.gov> wrote:
>
>> On 11/26/2013 09:27 PM, Mukesh Rathor wrote:
>> > In preparation for the next patch, we update xsm_add_to_physmap to
>> > allow for checking of foreign domain. Thus, the current domain must
>> > have the right to update the mappings of target domain with pages
>> > from foreign domain.
>> >
>> > Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
>> > CC: dgdegra@tycho.nsa.gov
>> > ---
>> >  xen/arch/x86/mm.c       |   16 +++++++++++++---
>> >  xen/include/xsm/dummy.h |   10 ++++++++--
>> >  xen/include/xsm/xsm.h   |    6 +++---
>> >  xen/xsm/flask/hooks.c   |   10 ++++++++--
>> >  4 files changed, 32 insertions(+), 10 deletions(-)
>>
>> The XSM changes look good; however, the calling code needs a bit of
>> tweaking. Currently, if domain 0 is specified as the foreign domain,
>> the check is skipped, and the check is also run unnecessarily when
>> foreign_domid is nonzero but the operation is not
>> XENMAPSPACE_gmfn_foreign. The locking in this version also implies a
>> potential TOCTOU bug, but which in reality is impossible to trigger
>> due to the existing RCU lock held on (d). I would suggest passing
>> the foreign struct domain instead of the domid, as below.
>>
>> An unrelated question about XENMAPSPACE_gmfn_foreign that came up
>> while looking at this: is the domain parameter (d) supposed to be
>> ignored here, with maps always modifying the current domain? I would
>> have expected this call to manipulate d's physmap, with the common
>> case being (d == current->domain).
>
> Right, that is the assumption at present that PVH dom0 is creating
> user domains. Perhaps a debug assert would be a good idea. I can add
> that. It's probably straightforward to just manipulate d's physmap,
> but I'd rather do it when it's really time for it, than to anticipate
> something I can't test right now.

I'm afraid that's not the right approach: Close to none of us would
ever test in a heavily disaggregated environment, yet nevertheless
the fundamentals for it are being put in place. So the PVH code
shouldn't be an exception here.

Jan
>>> On 28.11.13 at 12:54, George Dunlap <George.Dunlap@eu.citrix.com> wrote:
> On Wed, Nov 27, 2013 at 8:12 PM, Mukesh Rathor <mukesh.rathor@oracle.com>
> wrote:
>> On Wed, 27 Nov 2013 15:00:43 +0000
>> George Dunlap <george.dunlap@eu.citrix.com> wrote:
>>
>>> On 11/27/2013 02:27 AM, Mukesh Rathor wrote:
>>> > Add opt_dom0pvh. Note, pvh dom0 is disabled until the fixme in
>>> > domain_build.c is resolved. The fixme is added by patch title:
>>> > "PVH dom0: construct_dom0 changes"
>>>
>>> So it's been asked before but I haven't seen an answer yet: What
>>> exactly is the problem here, and when might it be fixed?
>>
>> The problem would happen if an mmio space sits above the highest
>> e820 address. Ie, the e820 didn't report it. FWIW, PV linux would not
>> boot as dom0 in those cases anyways, as it doesn't support anything
>> not in the e820 at present. Konrad is looking into it.
>>
>> As for the hardware, I've not found any where it would happen. Here's
>> Jan's response where this was discussed previously:
>>
>> -------
>>> For testing purposes, do you have reference for hardware? I don't see
>>> any here with such configuration.
>>
>> Nothing specific, but I know that SR-IOV virtual functions easily
>> cause kernels to run out of MMIO space below 4G (namely when
>> the hole is only around 1Gb or even less), and Intel must have
>> knowledge of graphics cards having so huge a frame buffer that
>> it can only be mapped above 4G.
>>
>> Jan
>> -------
>>
>> IMO, this is a pre-existing condition, that is not specific to
>> PVH only, as such a case would not work for linux PV dom0 either.
>
> Why are you disabling PVH for all systems then, if it's only a small
> number of systems that may have this problem, and if PV wouldn't work
> on those systems anyway?

The thing is - you can't tell from looking at a system at boot time
whether its kernel would assign MMIO regions above 4Gb. Hence there's
nothing you can qualify a conditional disable on; you could make a
reasonable guess (e.g. summing up all MMIO region sizes from the PCI
device BARs, including not-yet-enabled SR-IOV ones, and seeing whether
that would fit in the MMIO hole below 4Gb), but I think that's not
really reasonable.

Otoh the feature is experimental, so I don't see why we shouldn't
allow people to try this out on systems they know the kernel has no
need to assign high MMIO regions.

The fact that "normal" pv-ops has problems with this is irrelevant
here - it's a genuine bug if it can't cope with such a scenario (and
btw. one more of the many reasons making us hesitant to switch our
distros to it).

Jan
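For illustration, the "reasonable guess" described above (sum the MMIO sizes of all BARs and compare against the hole below 4Gb) might be sketched as follows. This is a standalone approximation, not Xen code: the function name bars_fit_in_hole() and its parameters are hypothetical, the caller is assumed to have already collected the BAR sizes (including SR-IOV ones), and alignment of individual BARs is ignored, which is one reason such an estimate can only be a guess.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of Xen): returns true if the sum of all
 * BAR sizes is no larger than the MMIO hole below 4Gb.  Alignment and
 * fragmentation are deliberately ignored, so a "true" result is only an
 * optimistic estimate.
 */
bool bars_fit_in_hole(const uint64_t *bar_sizes, size_t nr_bars,
                      uint64_t hole_size)
{
    uint64_t total = 0;
    size_t i;

    for ( i = 0; i < nr_bars; i++ )
    {
        /* total <= hole_size holds here, so the subtraction cannot wrap. */
        if ( bar_sizes[i] > hole_size - total )
            return false;
        total += bar_sizes[i];
    }

    return true;
}
```

With a 1Gb hole, a 256Mb BAR plus a 512Mb BAR would be reported as fitting, while adding a further 512Mb of SR-IOV BAR space would not.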
On Thu, 28 Nov 2013 12:07:02 +0000
George Dunlap <george.dunlap@eu.citrix.com> wrote:

> On 11/27/2013 02:27 AM, Mukesh Rathor wrote:
.......
> So a couple of thoughts from a release perspective.
>
> Releasing *code* as "experimental" means, "it may work or it may not;
> use at your own risk". If people use it and it works, then great;
> they can expect that the code will only get better.
>
> However, releasing an *interface* as "experimental" means, "it may
> work for you now, but it may not work later when we change the
> interface". While this is nice in theory, in practice, once something
> works, people may begin to rely on it and we may end up having to
> support it anyway. So the Linux interface cannot really be labelled
> "experimental"; we have to be reasonably certain that we can support
> it going forward.
>
> Benefits:
>
> We have a fairly solid precedent for releasing features as
> "experimental" or "tech preview". This allows much wider testing
> and feedback. If it turns out to be robust enough, people may even
> be able to use it, gaining the potential performance advantages.
>
> Someone could make an argument the other way: that the best thing to
> do would be to check it in at the beginning of the release cycle, get
> it well tested, and then release it as "ready" for 4.5, without going
> through an "experimental" phase. Both arguments have their merits;
> but since the current way we do things hasn't caused any problems and
> seems to be working OK, it seems best to follow precedent, and assume
> that a tech preview will be beneficial.
>
> Risks, bugs:
>
> All of the actual functional changes in this series are predicated on
> "if(is_pvh_domain())", so in theory they should only have an effect
> on PVH guests. (There is, of course, a small risk that there will be
> a mistake here.) It introduces a new p2m type, but since it is the
> only one that uses it, bugs should only affect PVH, and not other
> functionality.
>
> Risks, interface:
>
> This patch series only adds two things to the interface with Linux:
> XENMAPSPACE_gmfn_foreign and XENMEM_add_to_physmap_range.
> These are already used and available in the ARM side.
>
> Normally I'd be afraid of accepting new interfaces at this stage in
> the game, as I'd be afraid that we hadn't had enough time to make
> sure it's something we want to support going forward. However, since
> this is just duplicating an interface already in use on the ARM side,
> I think the interface *has* been thought of for some time. This
> makes it much more likely to be worth the risk; if the ARM side has
> used it for 6 months without finding a problem with it, it seems
> unlikely that the x86 side will be particularly different.
>
> So on the whole, there is a benefit (if a bit nebulous) to having it
> in, and a reasonably low risk; and it's not clear that the risk will
> be significantly mitigated by waiting another 6 months. I'm
> therefore inclined to give it a release ack.
>
> Any thoughts?
>

Normally, I'd be uncomfortable myself, but the fact that the feature
is marked experimental, and that the changes are hidden behind
is_pvh_domain(), thereby leaving the normal PV/HVM paths as before,
gives me comfort. But it's ultimately your call, and I'd be OK either
way.

thanks
mukesh
On 11/29/2013 09:17 AM, Jan Beulich wrote:
>>>> On 28.11.13 at 13:07, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>> So on the whole, there is a benefit (if a bit nebulous) to having it in,
>> and a reasonably low risk; and it's not clear that the risk will be
>> significantly mitigated by waiting another 6 months. I'm therefore
>> inclined to give it a release ack.
>>
>> Any thoughts?
> You worded it quite nicely, and much better than I would ever have
> been able to, but in the end it all boils down to the same reasoning
> that I have been following when suggesting to take the changes if
> they're ready.

Sure, and your reasoning was ultimately the basis for my conclusion.
But I think it's helpful to go through the exercise anyway just to be
sure you've got it right; and it's also helpful I think to have all the
information in one place so that it's easier for someone else to follow
the reasoning, so they can either learn from it, or argue with it. :-)

 -George
Jan Beulich
2013-Dec-02 12:16 UTC
Re: [V3 PATCH 2/9] PVH dom0: create add_mem_mapping_for_xlate() function
>>> On 27.11.13 at 03:27, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> In this preparatory patch, add portion of XEN_DOMCTL_memory_mapping
> code is put into a function so it can be called later for PVH from
> construct_dom0. There is no change in its functionality.
> The function is made non-static in the construct_dom0 patch.

At this point I start questioning the purpose of the whole patch
(and hence I'm glad I requested the scope of the broken out code
to be further restricted):

> +static int add_mem_mapping_for_xlate(struct domain *d, unsigned long gfn,
> +                                     unsigned long mfn, unsigned long nr_mfns)
> +{
> +    unsigned long i;
> +    int ret = 0;
> +
> +    for ( i = 0; i < nr_mfns; i++ )
> +        if ( !set_mmio_p2m_entry(d, gfn + i, _mfn(mfn + i)) )
> +            ret = -EIO;
> +    if ( ret )
> +    {
> +        if ( is_hardware_domain(d) )
> +            panic("Failed setting p2m. ret:%d gfn:%lx mfn:%lx i:%ld\n",
> +                  ret, gfn, mfn, i);

In effect for Dom0 all you need is the code up to here, so the
code re-used from the domctl is _only_ the loop at the beginning
of the function. That doesn't look like a worthwhile refactoring -
just add the loop to domain_build.c verbatim.

Jan
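As a rough illustration of what open-coding that loop could look like, here is a standalone sketch. It is not the code from the patch: mock_set_mmio_p2m_entry() and map_mmio_range() are hypothetical stand-ins (the mock only imitates the "nonzero means success" convention visible in the quoted loop), and the caller is assumed to decide whether to panic. One deliberate difference is that this version stops at the first failure, so the reported index is the failing one; in the quoted loop, i always equals nr_mfns by the time the panic message prints it.

```c
#include <errno.h>

/* Test knob for the mock below: the gfn whose mapping should fail. */
static unsigned long fail_at_gfn = ~0UL;

/*
 * Hypothetical stand-in for set_mmio_p2m_entry(): returns nonzero on
 * success, zero on failure, mirroring how the quoted loop tests it.
 */
static int mock_set_mmio_p2m_entry(unsigned long gfn, unsigned long mfn)
{
    (void)mfn;
    return gfn != fail_at_gfn;
}

/*
 * Map nr_mfns frames starting at gfn/mfn.  Returns 0 on success, or
 * -EIO with *failed_idx set to the offset of the first failing frame.
 */
int map_mmio_range(unsigned long gfn, unsigned long mfn,
                   unsigned long nr_mfns, unsigned long *failed_idx)
{
    unsigned long i;

    for ( i = 0; i < nr_mfns; i++ )
        if ( !mock_set_mmio_p2m_entry(gfn + i, mfn + i) )
        {
            *failed_idx = i;
            return -EIO;
        }

    return 0;
}
```

A Dom0 builder would call this once per MMIO range and treat a nonzero return as fatal.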
Jan Beulich
2013-Dec-02 12:30 UTC
Re: [V3 PATCH 3/9] PVH dom0: move some pv specific code to static functions
>>> On 27.11.13 at 03:27, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> In this preparatory patch also, some pv specific code is
> carved out into static functions. No functionality change.
>
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

albeit with a few minor comments:

> +/* Pages that are part of page tables must be read only. */

This and the other function's comment seem to be a better fit
if they remained at the call sites.

> +static __init void mark_pv_pt_pages_rdonly(struct domain *d,
> +                                           l4_pgentry_t *l4start,
> +                                           unsigned long vpt_start,
> +                                           unsigned long nr_pt_pages)
> +{
> +    unsigned long count;
> +    struct page_info *page;
> +    l4_pgentry_t *pl4e;
> +    l3_pgentry_t *pl3e;
> +    l2_pgentry_t *pl2e;
> +    l1_pgentry_t *pl1e;
> +
> +    pl4e = l4start + l4_table_offset(vpt_start);
> +    pl3e = l4e_to_l3e(*pl4e);
> +    pl3e += l3_table_offset(vpt_start);
> +    pl2e = l3e_to_l2e(*pl3e);
> +    pl2e += l2_table_offset(vpt_start);
> +    pl1e = l2e_to_l1e(*pl2e);
> +    pl1e += l1_table_offset(vpt_start);
> +    for ( count = 0; count < nr_pt_pages; count++ )
> +    {
> +        l1e_remove_flags(*pl1e, _PAGE_RW);
> +        page = mfn_to_page(l1e_get_pfn(*pl1e));
> +
> +        /* Read-only mapping + PGC_allocated + page-table page. */
> +        page->count_info = PGC_allocated | 3;
> +        page->u.inuse.type_info |= PGT_validated | 1;
> +
> +        /* Top-level p.t. is pinned. */
> +        if ( (page->u.inuse.type_info & PGT_type_mask) ==
> +             (!is_pv_32on64_domain(d) ?
> +              PGT_l4_page_table : PGT_l3_page_table) )
> +        {
> +            page->count_info += 1;
> +            page->u.inuse.type_info += 1 | PGT_pinned;
> +        }
> +
> +        /* Iterate. */
> +        if ( !((unsigned long)++pl1e & (PAGE_SIZE - 1)) )
> +        {
> +            if ( !((unsigned long)++pl2e & (PAGE_SIZE - 1)) )
> +            {
> +                if ( !((unsigned long)++pl3e & (PAGE_SIZE - 1)) )
> +                    pl3e = l4e_to_l3e(*++pl4e);
> +                pl2e = l3e_to_l2e(*pl3e);
> +            }
> +            pl1e = l2e_to_l1e(*pl2e);
> +        }
> +    }
> +}
> +
> +/* Set up the phys->machine table if not part of the initial mapping. */

Even more so here, since the second half of the comment is actually
referring to code that you left at the call site.

> +static __init void setup_pv_physmap(struct domain *d, unsigned long pgtbl_pfn,
> +                                    unsigned long v_start, unsigned long v_end,
> +                                    unsigned long vphysmap_start,
> +                                    unsigned long vphysmap_end,
> +                                    unsigned long nr_pages)
> +{
> +    struct page_info *page = NULL;
> +    l4_pgentry_t *pl4e = NULL, *l4start;

Pointless initializer.

> +    l3_pgentry_t *pl3e = NULL;
> +    l2_pgentry_t *pl2e = NULL;
> +    l1_pgentry_t *pl1e = NULL;
> +
> +    l4start = map_domain_page(pgtbl_pfn);

Instead, this one could become the initializer of l4start.

Jan
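The "Iterate" step in the quoted function advances an entry pointer and checks whether the result is page aligned, which signals that the walk has run off the end of the current table. The standalone sketch below illustrates just that test. It is not Xen code: count_table_wraps() is a hypothetical name, addresses are modelled as plain integers rather than real page-table pointers, and 8-byte entries in 4 KiB pages (512 entries per table) are assumed.

```c
#include <stdint.h>

#define PAGE_SIZE 4096UL

/* Assumed entry size: 8 bytes, i.e. 512 entries per 4 KiB table. */
typedef uint64_t pte_t;

/*
 * Advance an entry address nr times and count how often the new address
 * lands on a page boundary, i.e. how often a walk like the quoted one
 * would have to move on to the next table.
 */
unsigned long count_table_wraps(uintptr_t start, unsigned long nr)
{
    unsigned long wraps = 0;
    unsigned long i;
    uintptr_t p = start;

    for ( i = 0; i < nr; i++ )
    {
        p += sizeof(pte_t);
        if ( !(p & (PAGE_SIZE - 1)) )   /* same test as in "Iterate" */
            wraps++;
    }

    return wraps;
}
```

Starting from a page-aligned address, 511 steps stay inside the table and the 512th step reports the first wrap. The test only works because each table occupies exactly one naturally aligned page.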
>>> On 27.11.13 at 03:27, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> This patch changes construct_dom0 to boot in PVH mode. Changes
> need to support it are also included here.
>
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>

Looks mostly okay, except ...

> @@ -1089,8 +1264,15 @@ int __init construct_dom0(
>      regs->eflags = X86_EFLAGS_IF;
>
>      if ( opt_dom0_shadow )
> +    {
> +        if ( is_pvh_domain(d) )
> +        {
> +            printk("Invalid option dom0_shadow for PVH\n");

... this option isn't invalid, it's merely unsupported iiuc.

> +void __init hap_set_pvh_alloc_for_dom0(struct domain *d,
> +                                       unsigned long num_pages)
> +{
> +    int rc;
> +    unsigned long memkb = num_pages * (PAGE_SIZE / 1024);
> +
> +    /* Copied from: libxl_get_required_shadow_memory() */

Indentation.

> +    memkb = 4 * (256 * d->max_vcpus + 2 * (memkb / 1024));
> +    num_pages = ((memkb+1023)/1024) << (20 - PAGE_SHIFT);

Missing blanks around operators.

Jan
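To make the quoted sizing arithmetic easier to follow, here it is as a standalone function with the blanks around operators restored. The formula is taken from the quoted lines; the function name pvh_dom0_hap_pages() and the 4 KiB page size (PAGE_SHIFT of 12) are assumptions made for this sketch only.

```c
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/*
 * Hypothetical wrapper around the arithmetic quoted above: given the
 * number of Dom0 pages and vCPUs, return the number of pages to set
 * aside, rounded up to a whole number of megabytes.
 */
unsigned long pvh_dom0_hap_pages(unsigned long num_pages,
                                 unsigned int max_vcpus)
{
    /* Dom0 memory in kB. */
    unsigned long memkb = num_pages * (PAGE_SIZE / 1024);

    /* 256 pages per vCPU plus 2 pages per MB of memory, 4 kB each. */
    memkb = 4 * (256 * max_vcpus + 2 * (memkb / 1024));

    /* Round up to whole MB, then convert MB to pages. */
    return ((memkb + 1023) / 1024) << (20 - PAGE_SHIFT);
}
```

For a 1 GB Dom0 (262144 pages) with one vCPU this gives 9216 kB, i.e. 9 MB or 2304 pages. For 1483129 pages and 8 vCPUs the intermediate value is 54536 kB, which rounds to 54 MB or 13824 pages.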
Jan Beulich
2013-Dec-02 12:47 UTC
Re: [V3 PATCH 5/9] PVH dom0: implement XENMEM_add_to_physmap_range for x86
>>> On 27.11.13 at 03:27, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> This preparatory patch adds support for XENMEM_add_to_physmap_range
> on x86 so it can be used to create a guest on PVH dom0. To this end, we
> add a new function xenmem_add_to_physmap_range(), and change
> xenmem_add_to_physmap_once parameters so it can be called from
> xenmem_add_to_physmap_range.
>
> Please note, compat will continue to return -ENOSYS.

And as noted a number of times before - I don't think that's
appropriate. There's nothing keeping non-PVH guests from using
this interface, and hence it should either be uniformly available to
all of them, or uniformly unavailable. I'm not intending to apply
such a half-baked thing.

> +static int xenmem_add_to_physmap_range(struct domain *d,
> +                                       struct xen_add_to_physmap_range *xatpr)
> +{
> +    int rc;

Why is this being left here ...

> +
> +    /* Process entries in reverse order to allow continuations */
> +    while ( xatpr->size > 0 )
> +    {
> +        xen_ulong_t idx;
> +        xen_pfn_t gpfn;
> +        struct xen_add_to_physmap xatp;

... when all the other ones are properly scope restricted?

> +
> +        if ( copy_from_guest_offset(&idx, xatpr->idxs, xatpr->size-1, 1) ||
> +             copy_from_guest_offset(&gpfn, xatpr->gpfns, xatpr->size-1, 1) )
> +        {
> +            return -EFAULT;
> +        }

Pointless braces (and inconsistent with code further down).

> @@ -4689,6 +4725,10 @@ long arch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg)
>          if ( copy_from_guest(&xatp, arg, 1) )
>              return -EFAULT;
>
> +        /* This one is only supported for add_to_physmap_range */
> +        if ( xatp.space == XENMAPSPACE_gmfn_foreign )
> +            return -EINVAL;

Kind of an odd restriction - as this isn't by design, adding "for now"
would seem desirable.

> +        if ( (rc = xsm_add_to_physmap(XSM_TARGET, current->domain, d)) )

By inverting the condition here ...

> +        {
> +            rcu_unlock_domain(d);
> +            return rc;
> +        }
> +
> +        rc = xenmem_add_to_physmap_range(d, &xatpr);

... and putting this into the body of the if(), you could avoid
having two distinct exit paths (under the premise that a XSM call
wouldn't return -EAGAIN).

Jan
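The single-exit shape suggested above can be sketched in isolation as follows. Everything here is a mock written for illustration: struct domain is reduced to a few counters, and the mock_* helpers merely stand in for the XSM check, the range operation and the domain unlock seen in the quoted hunk. The point is only the control flow: the operation runs when the permission check returns 0, and the unlock happens exactly once on every path.

```c
#include <errno.h>

/* Minimal mock of a domain, tracking what the sketch did to it. */
struct domain {
    int locked;    /* lock depth; must be back to 0 on return */
    int xsm_rc;    /* result the mock permission check returns */
    int op_rc;     /* result the mock range operation returns */
    int ops_run;   /* how many times the operation actually ran */
};

static int mock_xsm_check(struct domain *d)
{
    return d->xsm_rc;
}

static int mock_add_to_physmap_range(struct domain *d)
{
    d->ops_run++;
    return d->op_rc;
}

static void mock_unlock_domain(struct domain *d)
{
    d->locked--;
}

int do_add_to_physmap_range(struct domain *d)
{
    int rc;

    d->locked++;                 /* stands in for taking the domain lock */

    rc = mock_xsm_check(d);
    if ( !rc )                   /* inverted condition: run only if allowed */
        rc = mock_add_to_physmap_range(d);

    mock_unlock_domain(d);       /* single exit path */
    return rc;
}
```

If the permission check fails, the operation is skipped and the error is returned through the same unlock-and-return path as the success case.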
>>> On 27.11.13 at 03:27, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> -static XSM_INLINE int xsm_add_to_physmap(XSM_DEFAULT_ARG struct domain *d1, struct domain *d2)
> +static XSM_INLINE int xsm_add_to_physmap(XSM_DEFAULT_ARG struct domain *d1, struct domain *d2, struct domain *d3)
>  {
> +    int rc;
> +
>      XSM_ASSERT_ACTION(XSM_TARGET);
> -    return xsm_default_action(action, d1, d2);
> +    rc = xsm_default_action(action, d1, d2);
> +    if ( d3 && !rc )
> +        rc = xsm_default_action(action, d1, d3);

Is this really making sense? It means that d1 has rights over both
d2 and d3. Yet there's only a single ->target field in struct domain.
I see that this is in line with xsm_mmu_update(), but rather than
accepting it on that basis I wonder whether that one's making sense
either.

In any event, function parameters should be renamed following the
ones of xsm_mmu_update().

Jan
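The concern about the single ->target field can be illustrated with a deliberately simplified model. This is not Xen's xsm_default_action(): the rule used here (a caller may act on itself, on its one target, or on anything if privileged) is an assumption made purely for the sketch, as are the mock_* names. Under that assumption, an unprivileged caller can only pass the two-step check when the second and third domains are drawn from itself and its single target.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Reduced model of a domain: a privilege flag and a single target. */
struct domain {
    bool is_privileged;
    struct domain *target;
};

/* Assumed, simplified permission rule; NOT Xen's xsm_default_action(). */
static int mock_target_action(const struct domain *src,
                              const struct domain *tgt)
{
    if ( src == tgt || src->is_privileged ||
         (src->target && src->target == tgt) )
        return 0;

    return -EPERM;
}

/* Same two-step structure as the quoted hunk: d2 first, then d3 if given. */
int mock_xsm_add_to_physmap(const struct domain *d1,
                            const struct domain *d2,
                            const struct domain *d3)
{
    int rc = mock_target_action(d1, d2);

    if ( d3 && !rc )
        rc = mock_target_action(d1, d3);

    return rc;
}
```

In this model a privileged caller passes for any pair, while a domain whose target is one guest is refused as soon as the third domain is an unrelated one.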
Jan Beulich
2013-Dec-02 12:57 UTC
Re: [V3 PATCH 8/9] pvh dom0: Add and remove foreign pages
>>> On 27.11.13 at 03:27, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> +static int xenmem_add_foreign_to_pmap(unsigned long fgfn, unsigned long gpfn,
> +                                      domid_t foreign_domid)

Please not a third abbreviation: This should say "p2m" or "physmap".

Jan
>>> On 27.11.13 at 03:27, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> +    /*
> +     * Following removed when "pvh fixme" in domain_build.c is resolved.
> +     * The fixme is added by patch "PVH dom0: construct_dom0 changes".
> +     */
> +    if ( opt_dom0pvh )
> +        panic("You do not have the correct xen version for dom0 PVH\n");

As said before - either we go without this patch, or you drop this
unconditional panic() when the option is being made use of (on the
basis that the feature is documented to be experimental anyway).

Jan
Roger Pau Monné
2013-Dec-02 15:09 UTC
Re: [V3 PATCH 9/9] pvh dom0: add opt_dom0pvh to setup.c
On 27/11/13 03:27, Mukesh Rathor wrote:
> Add opt_dom0pvh. Note, pvh dom0 is disabled until the fixme in
> domain_build.c is resolved. The fixme is added by patch title:
> "PVH dom0: construct_dom0 changes"
>
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> ---
>  xen/arch/x86/setup.c | 19 ++++++++++++++++---
>  1 files changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
> index e33c34b..de30ef6 100644
> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -61,6 +61,10 @@ integer_param("maxcpus", max_cpus);
>  static bool_t __initdata disable_smep;
>  invbool_param("smep", disable_smep);
>
> +/* Boot dom0 in pvh mode */
> +bool_t __initdata opt_dom0pvh;
> +boolean_param("dom0pvh", opt_dom0pvh);
> +
>  /* **** Linux config option: propagated to domain0. */
>  /* "acpi=off": Sisables both ACPI table parsing and interpreter. */
>  /* "acpi=force": Override the disable blacklist. */
> @@ -545,7 +549,7 @@ void __init __start_xen(unsigned long mbi_p)
>  {
>      char *memmap_type = NULL;
>      char *cmdline, *kextra, *loader;
> -    unsigned int initrdidx;
> +    unsigned int initrdidx, domcr_flags = 0;
>      multiboot_info_t *mbi = __va(mbi_p);
>      module_t *mod = (module_t *)__va(mbi->mods_addr);
>      unsigned long nr_pages, raw_max_page, modules_headroom, *module_map;
> @@ -1332,8 +1336,17 @@ void __init __start_xen(unsigned long mbi_p)
>      if ( !tboot_protect_mem_regions() )
>          panic("Could not protect TXT memory regions\n");
>
> -    /* Create initial domain 0. */
> -    dom0 = domain_create(0, DOMCRF_s3_integrity, 0);
> +    /*
> +     * Following removed when "pvh fixme" in domain_build.c is resolved.
> +     * The fixme is added by patch "PVH dom0: construct_dom0 changes".
> +     */
> +    if ( opt_dom0pvh )
> +        panic("You do not have the correct xen version for dom0 PVH\n");

I've removed this from my local copy and passed dom0pvh=1 on the
command line in order to try to boot with PVH Dom0.
As Dom0 kernel I'm using the tmp2 branch from your repo at
git://oss.oracle.com/git/mrathor/linux.git (which seems to work fine
as a DomU), but as Dom0 the kernel panics with the following message:

 __  __            _  _   _  _                      _        _     _
 \ \/ /___ _ __   | || | | || |     _   _ _ __  ___| |_ __ _| |__ | | ___
  \  // _ \ '_ \  | || |_| || |_ __| | | | '_ \/ __| __/ _` | '_ \| |/ _ \
  /  \  __/ | | | |__   _|__   _|__| |_| | | | \__ \ || (_| | |_) | |  __/
 /_/\_\___|_| |_|    |_|(_) |_|     \__,_|_| |_|___/\__\__,_|_.__/|_|\___|

(XEN) Xen version 4.4-unstable (root@) (gcc (Debian 4.4.5-8) 4.4.5) debug=y Mon Dec 2 14:13:26 CET 2013
(XEN) Latest ChangeSet: Thu Nov 21 18:11:15 2013 -0800 git:b535866-dirty
(XEN) Bootloader: PXELINUX 4.02 debian-20101014
(XEN) Command line: dom0pvh=1 com1=115200,8n1 guest_loglvl=all loglvl=all console=com1
(XEN) Video information:
(XEN) VGA is text mode 80x25, font 8x16
(XEN) VBE/DDC methods: V2; EDID transfer time: 1 seconds
(XEN) EDID info not retrieved because of reasons unknown
(XEN) Disc information:
(XEN) Found 2 MBR signatures
(XEN) Found 2 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN) 0000000000000000 - 0000000000092400 (usable)
(XEN) 00000000000f0000 - 0000000000100000 (reserved)
(XEN) 0000000000100000 - 00000000dfdf9c00 (usable)
(XEN) 00000000dfdf9c00 - 00000000dfe4bc00 (ACPI NVS)
(XEN) 00000000dfe4bc00 - 00000000dfe4dc00 (ACPI data)
(XEN) 00000000dfe4dc00 - 00000000e0000000 (reserved)
(XEN) 00000000f8000000 - 00000000fd000000 (reserved)
(XEN) 00000000fe000000 - 00000000fed00400 (reserved)
(XEN) 00000000fee00000 - 00000000fef00000 (reserved)
(XEN) 00000000ffb00000 - 0000000100000000 (reserved)
(XEN) 0000000100000000 - 00000001a0000000 (usable)
(XEN) ACPI: RSDP 000FEC30, 0024 (r2 DELL )
(XEN) ACPI: XSDT 000FCCC7, 007C (r1 DELL B10K 15 ASL 61)
(XEN) ACPI: FACP 000FCDB7, 00F4 (r3 DELL B10K 15 ASL 61)
(XEN) ACPI: DSDT FFE9E951, 4A74 (r1 DELL dt_ex 1000 INTL 20050624)
(XEN) ACPI: FACS DFDF9C00, 0040
(XEN) ACPI: SSDT FFEA34D6, 009C (r1 DELL st_ex 1000 INTL
20050624) (XEN) ACPI: APIC 000FCEAB, 015E (r1 DELL B10K 15 ASL 61) (XEN) ACPI: BOOT 000FD009, 0028 (r1 DELL B10K 15 ASL 61) (XEN) ACPI: ASF! 000FD031, 0096 (r32 DELL B10K 15 ASL 61) (XEN) ACPI: MCFG 000FD0C7, 003C (r1 DELL B10K 15 ASL 61) (XEN) ACPI: HPET 000FD103, 0038 (r1 DELL B10K 15 ASL 61) (XEN) ACPI: TCPA 000FD35F, 0032 (r1 DELL B10K 15 ASL 61) (XEN) ACPI: DMAR 000FD391, 00C8 (r1 DELL B10K 15 ASL 61) (XEN) ACPI: SLIC 000FD13B, 0176 (r1 DELL B10K 15 ASL 61) (XEN) ACPI: SSDT DFE4DC00, 15C4 (r1 INTEL PPM RCM 80000001 INTL 20061109) (XEN) System RAM: 6141MB (6288940kB) (XEN) No NUMA configuration found (XEN) Faking a node at 0000000000000000-00000001a0000000 (XEN) Domain heap initialised (XEN) DMI 2.5 present. (XEN) Using APIC driver default (XEN) ACPI: PM-Timer IO Port: 0x808 (XEN) ACPI: SLEEP INFO: pm1x_cnt[804,0], pm1x_evt[800,0] (XEN) ACPI: wakeup_vec[dfdf9c0c], vec_size[20] (XEN) ACPI: Local APIC address 0xfee00000 (XEN) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled) (XEN) Processor #0 7:10 APIC version 21 (XEN) ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled) (XEN) Processor #2 7:10 APIC version 21 (XEN) ACPI: LAPIC (acpi_id[0x03] lapic_id[0x04] enabled) (XEN) Processor #4 7:10 APIC version 21 (XEN) ACPI: LAPIC (acpi_id[0x04] lapic_id[0x06] enabled) (XEN) Processor #6 7:10 APIC version 21 (XEN) ACPI: LAPIC (acpi_id[0x05] lapic_id[0x01] enabled) (XEN) Processor #1 7:10 APIC version 21 (XEN) ACPI: LAPIC (acpi_id[0x06] lapic_id[0x03] enabled) (XEN) Processor #3 7:10 APIC version 21 (XEN) ACPI: LAPIC (acpi_id[0x07] lapic_id[0x05] enabled) (XEN) Processor #5 7:10 APIC version 21 (XEN) ACPI: LAPIC (acpi_id[0x08] lapic_id[0x07] enabled) (XEN) Processor #7 7:10 APIC version 21 (XEN) ACPI: LAPIC (acpi_id[0x09] lapic_id[0x00] disabled) (XEN) ACPI: LAPIC (acpi_id[0x0a] lapic_id[0x00] disabled) (XEN) ACPI: LAPIC (acpi_id[0x0b] lapic_id[0x00] disabled) (XEN) ACPI: LAPIC (acpi_id[0x0c] lapic_id[0x00] disabled) (XEN) ACPI: LAPIC (acpi_id[0x0d] lapic_id[0x00] 
disabled) (XEN) ACPI: LAPIC (acpi_id[0x0e] lapic_id[0x00] disabled) (XEN) ACPI: LAPIC (acpi_id[0x0f] lapic_id[0x00] disabled) (XEN) ACPI: LAPIC (acpi_id[0x10] lapic_id[0x00] disabled) (XEN) ACPI: LAPIC (acpi_id[0x11] lapic_id[0x00] disabled) (XEN) ACPI: LAPIC (acpi_id[0x12] lapic_id[0x00] disabled) (XEN) ACPI: LAPIC (acpi_id[0x13] lapic_id[0x00] disabled) (XEN) ACPI: LAPIC (acpi_id[0x14] lapic_id[0x00] disabled) (XEN) ACPI: LAPIC (acpi_id[0x15] lapic_id[0x00] disabled) (XEN) ACPI: LAPIC (acpi_id[0x16] lapic_id[0x00] disabled) (XEN) ACPI: LAPIC (acpi_id[0x17] lapic_id[0x00] disabled) (XEN) ACPI: LAPIC (acpi_id[0x18] lapic_id[0x00] disabled) (XEN) ACPI: LAPIC (acpi_id[0x19] lapic_id[0x00] disabled) (XEN) ACPI: LAPIC (acpi_id[0x1a] lapic_id[0x00] disabled) (XEN) ACPI: LAPIC (acpi_id[0x1b] lapic_id[0x00] disabled) (XEN) ACPI: LAPIC (acpi_id[0x1c] lapic_id[0x00] disabled) (XEN) ACPI: LAPIC (acpi_id[0x1d] lapic_id[0x00] disabled) (XEN) ACPI: LAPIC (acpi_id[0x1e] lapic_id[0x00] disabled) (XEN) ACPI: LAPIC (acpi_id[0x1f] lapic_id[0x00] disabled) (XEN) ACPI: LAPIC (acpi_id[0x20] lapic_id[0x00] disabled) (XEN) ACPI: LAPIC_NMI (acpi_id[0xff] high level lint[0x1]) (XEN) ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0]) (XEN) IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23 (XEN) ACPI: IOAPIC (id[0x09] address[0xfec80000] gsi_base[24]) (XEN) IOAPIC[1]: apic_id 9, version 32, address 0xfec80000, GSI 24-47 (XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) (XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) (XEN) ACPI: IRQ0 used by override. (XEN) ACPI: IRQ2 used by override. (XEN) ACPI: IRQ9 used by override. (XEN) Enabling APIC mode: Flat. 
Using 2 I/O APICs (XEN) ACPI: HPET id: 0x8086a301 base: 0xfed00000 (XEN) ERST table was not found (XEN) Using ACPI (MADT) for SMP configuration information (XEN) SMP: Allowing 32 CPUs (24 hotplug CPUs) (XEN) IRQ limits: 48 GSI, 1504 MSI/MSI-X (XEN) Using scheduler: SMP Credit Scheduler (credit) (XEN) Detected 3066.818 MHz processor. (XEN) Initing memory sharing. (XEN) mce_intel.c:717: MCA Capability: BCAST 1 SER 0 CMCI 1 firstbank 0 extended MCE MSR 0 (XEN) Intel machine check reporting enabled (XEN) PCI: MCFG configuration 0: base f8000000 segment 0000 buses 00 - 3f (XEN) PCI: MCFG area at f8000000 reserved in E820 (XEN) PCI: Using MCFG for segment 0000 bus 00-3f (XEN) Intel VT-d iommu 0 supported page sizes: 4kB. (XEN) Intel VT-d Snoop Control enabled. (XEN) Intel VT-d Dom0 DMA Passthrough not enabled. (XEN) Intel VT-d Queued Invalidation enabled. (XEN) Intel VT-d Interrupt Remapping enabled. (XEN) Intel VT-d Shared EPT tables not enabled. (XEN) I/O virtualisation enabled (XEN) - Dom0 mode: Relaxed (XEN) Interrupt remapping enabled (XEN) ENABLING IO-APIC IRQs (XEN) -> Using new ACK method (XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1 (XEN) Platform timer is 14.318MHz HPET (XEN) Allocated console ring of 64 KiB. (XEN) mwait-idle: MWAIT substates: 0x1120 (XEN) mwait-idle: v0.4 model 0x1a (XEN) mwait-idle: lapic_timer_reliable_states 0x2 (XEN) HPET: 0 timers usable for broadcast (4 total) (XEN) VMX: Supported advanced features: (XEN) - APIC MMIO access virtualisation (XEN) - APIC TPR shadow (XEN) - Extended Page Tables (EPT) (XEN) - Virtual-Processor Identifiers (VPID) (XEN) - Virtual NMI (XEN) - MSR direct-access bitmap (XEN) HVM: ASIDs enabled. (XEN) HVM: VMX enabled (XEN) HVM: Hardware Assisted Paging (HAP) detected (XEN) HVM: HAP page sizes: 4kB, 2MB (XEN) Brought up 8 CPUs (XEN) ACPI sleep modes: S3 (XEN) mcheck_poll: Machine check polling timer started. 
(XEN) *** LOADING DOMAIN 0 *** (XEN) elf_parse_binary: phdr: paddr=0x1000000 memsz=0xa12000 (XEN) elf_parse_binary: phdr: paddr=0x1c00000 memsz=0xb60f0 (XEN) elf_parse_binary: phdr: paddr=0x1cb7000 memsz=0x14d80 (XEN) elf_parse_binary: phdr: paddr=0x1ccc000 memsz=0x716000 (XEN) elf_parse_binary: memory: 0x1000000 -> 0x23e2000 (XEN) elf_xen_parse_note: GUEST_OS = "linux" (XEN) elf_xen_parse_note: GUEST_VERSION = "2.6" (XEN) elf_xen_parse_note: XEN_VERSION = "xen-3.0" (XEN) elf_xen_parse_note: VIRT_BASE = 0xffffffff80000000 (XEN) elf_xen_parse_note: ENTRY = 0xffffffff81ccc1e0 (XEN) elf_xen_parse_note: HYPERCALL_PAGE = 0xffffffff81001000 (XEN) elf_xen_parse_note: FEATURES = "!writable_page_tables|pae_pgdir_above_4gb|writable_descriptor_tables|auto_translated_physmap|supervisor_mode_kernel|hvm_callback_vector" (XEN) elf_xen_parse_note: PAE_MODE = "yes" (XEN) elf_xen_parse_note: LOADER = "generic" (XEN) elf_xen_parse_note: unknown xen elf note (0xd) (XEN) elf_xen_parse_note: SUSPEND_CANCEL = 0x1 (XEN) elf_xen_parse_note: HV_START_LOW = 0xffff800000000000 (XEN) elf_xen_parse_note: PADDR_OFFSET = 0x0 (XEN) elf_xen_addr_calc_check: addresses: (XEN) virt_base = 0xffffffff80000000 (XEN) elf_paddr_offset = 0x0 (XEN) virt_offset = 0xffffffff80000000 (XEN) virt_kstart = 0xffffffff81000000 (XEN) virt_kend = 0xffffffff823e2000 (XEN) virt_entry = 0xffffffff81ccc1e0 (XEN) p2m_base = 0xffffffffffffffff (XEN) Xen kernel: 64-bit, lsb, compat32 (XEN) Dom0 kernel: 64-bit, PAE, lsb, paddr 0x1000000 -> 0x23e2000 (XEN) PHYSICAL MEMORY ARRANGEMENT: (XEN) Dom0 alloc.: 0000000190000000->0000000194000000 (1483129 pages to be allocated) (XEN) Init. ramdisk: 000000019bce7000->000000019ffffa00 (XEN) VIRTUAL MEMORY ARRANGEMENT: (XEN) Loaded kernel: ffffffff81000000->ffffffff823e2000 (XEN) Init. 
ramdisk: ffffffff823e2000->ffffffff866faa00 (XEN) Phys-Mach map: ffffffff866fb000->ffffffff8728d490 (XEN) Start info: ffffffff8728e000->ffffffff8728f4b4 (XEN) Page tables: ffffffff87290000->ffffffff872cd000 (XEN) Boot stack: ffffffff872cd000->ffffffff872ce000 (XEN) TOTAL: ffffffff80000000->ffffffff87400000 (XEN) ENTRY ADDRESS: ffffffff81ccc1e0 (XEN) Dom0 has maximum 8 VCPUs (XEN) elf_load_binary: phdr 0 at 0xffffffff81000000 -> 0xffffffff81a12000 (XEN) elf_load_binary: phdr 1 at 0xffffffff81c00000 -> 0xffffffff81cb60f0 (XEN) elf_load_binary: phdr 2 at 0xffffffff81cb7000 -> 0xffffffff81ccbd80 (XEN) elf_load_binary: phdr 3 at 0xffffffff81ccc000 -> 0xffffffff81e6a000 (XEN) Scrubbing Free RAM: done. (XEN) Initial low memory virq threshold set at 0x4000 pages. (XEN) Std. Loglevel: All (XEN) Guest Loglevel: All (XEN) *** Serial input -> DOM0 (type ''CTRL-a'' three times to switch input to Xen) (XEN) Freed 256kB init memory. mapping kernel into physical memory about to get started... (XEN) memory.c:132:d0 Could not allocate order=0 extent: id=0 memflags=0 (0 of 1) [ 0.000000] Initializing cgroup subsys cpuset [ 0.000000] Initializing cgroup subsys cpu [ 0.000000] Initializing cgroup subsys cpuacct [ 0.000000] Linux version 3.12.0-rc6upstream-g62c68d3 (root@loki) (gcc version 4.4.5 (Debian 4.4.5-8) ) #5 SMP Mon Dec 2 13:00:04 CET 2013 [ 0.000000] Command line: root=/dev/sda1 ro ramdisk_size=1024000 earlyprintk=xenboot loglevel=9 console=hvc0 debug [ 0.000000] Released 18446744073708661104 pages of unused memory [ 0.000000] Set 131701 page(s) to 1-1 mapping [ 0.000000] ------------[ cut here ]------------ [ 0.000000] WARNING: CPU: 0 PID: 0 at arch/x86/xen/setup.c:134 xen_do_chunk+0x1a0/0x247() [ 0.000000] Failed to populate pfn 1770b9 err=0 [ 0.000000] Modules linked in: [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 3.12.0-rc6upstream-g62c68d3 #5 [ 0.000000] 0000000000000086 ffffffff81c01c48 ffffffff816a748b 0000000000000086 [ 0.000000] ffffffff81c01c98 
ffffffff81c01c88 ffffffff8109b617 ffffffff81c01c98 [ 0.000000] 00000000001770b9 0000000000000000 00000000001a0000 0000000000004c27 [ 0.000000] Call Trace: [ 0.000000] [<ffffffff816a748b>] dump_stack+0x59/0x7b [ 0.000000] [<ffffffff8109b617>] warn_slowpath_common+0x87/0xb0 [ 0.000000] [<ffffffff8109b6e1>] warn_slowpath_fmt+0x41/0x50 [ 0.000000] [<ffffffff81cd733a>] ? e820_all_mapped+0x65/0x65 [ 0.000000] [<ffffffff81cd1839>] xen_do_chunk+0x1a0/0x247 [ 0.000000] [<ffffffff81cd1cd2>] xen_memory_setup+0x3f2/0x702 [ 0.000000] [<ffffffff816acab0>] ? _raw_spin_unlock_irqrestore+0x20/0x70 [ 0.000000] [<ffffffff81cd7600>] setup_memory_map+0xf/0x41 [ 0.000000] [<ffffffff81cd5958>] setup_arch+0x1cc/0xd24 [ 0.000000] [<ffffffff816a72f9>] ? printk+0x72/0x74 [ 0.000000] [<ffffffff81cccd64>] start_kernel+0x8b/0x3fa [ 0.000000] [<ffffffff81ccc5f3>] x86_64_start_reservations+0x2a/0x2c [ 0.000000] [<ffffffff81cd1329>] xen_start_kernel+0x60f/0x611 [ 0.000000] ---[ end trace 4eaa2a86a8e2da22 ]--- [ 0.000000] Populating 172492-1a0000 pfn range: 19495 pages added [ 0.000000] e820: BIOS-provided physical RAM map: [ 0.000000] Xen: [mem 0x0000000000000000-0x0000000000091fff] usable [ 0.000000] Xen: [mem 0x00000000000a0000-0x00000000000fffff] reserved [ 0.000000] Xen: [mem 0x0000000000100000-0x00000000dfdf8fff] usable [ 0.000000] Xen: [mem 0x00000000dfdf9c00-0x00000000dfe4bbff] ACPI NVS [ 0.000000] Xen: [mem 0x00000000dfe4bc00-0x00000000dfe4dbff] ACPI data [ 0.000000] Xen: [mem 0x00000000dfe4dc00-0x00000000dfffffff] reserved [ 0.000000] Xen: [mem 0x00000000f8000000-0x00000000fcffffff] reserved [ 0.000000] Xen: [mem 0x00000000fe000000-0x00000000fed003ff] reserved [ 0.000000] Xen: [mem 0x00000000fee00000-0x00000000feefffff] reserved [ 0.000000] Xen: [mem 0x00000000ffb00000-0x00000000ffffffff] reserved [ 0.000000] Xen: [mem 0x0000000100000000-0x000000019fffffff] usable [ 0.000000] bootconsole [xenboot0] enabled [ 0.000000] NX (Execute Disable) protection: active [ 0.000000] SMBIOS 2.5 present. 
[ 0.000000] DMI: Dell Inc. Precision WorkStation T3500 /09KPNV, BIOS A15 03/28/2012 [ 0.000000] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved [ 0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable [ 0.000000] No AGP bridge found [ 0.000000] e820: last_pfn = 0x1a0000 max_arch_pfn = 0x400000000 [ 0.000000] e820: last_pfn = 0xdfdf9 max_arch_pfn = 0x400000000 [ 0.000000] Scanning 1 areas for low memory corruption [ 0.000000] Base memory trampoline at [ffff88000008c000] 8c000 size 24576 [ 0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff] [ 0.000000] [mem 0x00000000-0x000fffff] page 4k [ 0.000000] init_memory_mapping: [mem 0x176e00000-0x176ffffff] [ 0.000000] [mem 0x176e00000-0x176ffffff] page 4k [ 0.000000] BRK [0x01fc3000, 0x01fc3fff] PGTABLE [ 0.000000] BRK [0x01fc4000, 0x01fc4fff] PGTABLE [ 0.000000] init_memory_mapping: [mem 0x174000000-0x176dfffff] [ 0.000000] [mem 0x174000000-0x176dfffff] page 4k [ 0.000000] BRK [0x01fc5000, 0x01fc5fff] PGTABLE [ 0.000000] BRK [0x01fc6000, 0x01fc6fff] PGTABLE [ 0.000000] BRK [0x01fc7000, 0x01fc7fff] PGTABLE [ 0.000000] BRK [0x01fc8000, 0x01fc8fff] PGTABLE [ 0.000000] init_memory_mapping: [mem 0x100000000-0x173ffffff] [ 0.000000] [mem 0x100000000-0x173ffffff] page 4k [ 0.000000] init_memory_mapping: [mem 0x00100000-0xdfdf8fff] [ 0.000000] [mem 0x00100000-0xdfdf8fff] page 4k [ 0.000000] init_memory_mapping: [mem 0x177000000-0x19fffffff] [ 0.000000] [mem 0x177000000-0x19fffffff] page 4k [ 0.000000] RAMDISK: [mem 0x023e2000-0x066fafff] [ 0.000000] ACPI: RSDP 00000000000fec30 00024 (v02 DELL ) [ 0.000000] ACPI: XSDT 00000000000fccc7 0007C (v01 DELL B10K 00000015 ASL 00000061) [ 0.000000] ACPI: FACP 00000000000fcdb7 000F4 (v03 DELL B10K 00000015 ASL 00000061) [ 0.000000] ACPI: DSDT 00000000ffe9e951 04A74 (v01 DELL dt_ex 00001000 INTL 20050624) [ 0.000000] ACPI: FACS 00000000dfdf9c00 00040 [ 0.000000] ACPI: SSDT 00000000ffea34d6 0009C (v01 DELL st_ex 00001000 INTL 20050624) [ 0.000000] ACPI: APIC 
00000000000fceab 0015E (v01 DELL B10K 00000015 ASL 00000061) [ 0.000000] ACPI: BOOT 00000000000fd009 00028 (v01 DELL B10K 00000015 ASL 00000061) [ 0.000000] ACPI: ASF! 00000000000fd031 00096 (v32 DELL B10K 00000015 ASL 00000061) [ 0.000000] ACPI: MCFG 00000000000fd0c7 0003C (v01 DELL B10K 00000015 ASL 00000061) [ 0.000000] ACPI: HPET 00000000000fd103 00038 (v01 DELL B10K 00000015 ASL 00000061) [ 0.000000] ACPI: TCPA 00000000000fd35f 00032 (v01 DELL B10K 00000015 ASL 00000061) [ 0.000000] ACPI: DMAR 00000000000fd391 000C8 (v01 DELL B10K 00000015 ASL 00000061) [ 0.000000] ACPI: SLIC 00000000000fd13b 00176 (v01 DELL B10K 00000015 ASL 00000061) [ 0.000000] ACPI: SSDT 00000000dfe4dc00 015C4 (v01 INTEL PPM RCM 80000001 INTL 20061109) [ 0.000000] ACPI: Local APIC address 0xfee00000 [ 0.000000] NUMA turned off [ 0.000000] Faking a node at [mem 0x0000000000000000-0x000000019fffffff] [ 0.000000] Initmem setup node 0 [mem 0x00000000-0x19fffffff] [ 0.000000] NODE_DATA [mem 0x1770b5000-0x1770b8fff] [ 0.000000] Zone ranges: [ 0.000000] DMA [mem 0x00001000-0x00ffffff] [ 0.000000] DMA32 [mem 0x01000000-0xffffffff] [ 0.000000] Normal [mem 0x100000000-0x19fffffff] [ 0.000000] Movable zone start for each node [ 0.000000] Early memory node ranges [ 0.000000] node 0: [mem 0x00001000-0x00091fff] [ 0.000000] node 0: [mem 0x00100000-0xdfdf8fff] [ 0.000000] node 0: [mem 0x100000000-0x19fffffff] [ 0.000000] On node 0 totalpages: 1572234 [ 0.000000] DMA zone: 56 pages used for memmap [ 0.000000] DMA zone: 21 pages reserved [ 0.000000] DMA zone: 3985 pages, LIFO batch:0 [ 0.000000] DMA32 zone: 12481 pages used for memmap [ 0.000000] DMA32 zone: 912889 pages, LIFO batch:31 [ 0.000000] Normal zone: 8960 pages used for memmap [ 0.000000] Normal zone: 655360 pages, LIFO batch:31 [ 0.000000] ACPI: PM-Timer IO Port: 0x808 [ 0.000000] ACPI: Local APIC address 0xfee00000 [ 0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled) [ 0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled) [ 
0.000000] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x04] enabled) [ 0.000000] ACPI: LAPIC (acpi_id[0x04] lapic_id[0x06] enabled) [ 0.000000] ACPI: LAPIC (acpi_id[0x05] lapic_id[0x01] enabled) [ 0.000000] ACPI: LAPIC (acpi_id[0x06] lapic_id[0x03] enabled) [ 0.000000] ACPI: LAPIC (acpi_id[0x07] lapic_id[0x05] enabled) [ 0.000000] ACPI: LAPIC (acpi_id[0x08] lapic_id[0x07] enabled) [ 0.000000] ACPI: LAPIC (acpi_id[0x09] lapic_id[0x00] disabled) [ 0.000000] ACPI: LAPIC (acpi_id[0x0a] lapic_id[0x00] disabled) [ 0.000000] ACPI: LAPIC (acpi_id[0x0b] lapic_id[0x00] disabled) [ 0.000000] ACPI: LAPIC (acpi_id[0x0c] lapic_id[0x00] disabled) [ 0.000000] ACPI: LAPIC (acpi_id[0x0d] lapic_id[0x00] disabled) [ 0.000000] ACPI: LAPIC (acpi_id[0x0e] lapic_id[0x00] disabled) [ 0.000000] ACPI: LAPIC (acpi_id[0x0f] lapic_id[0x00] disabled) [ 0.000000] ACPI: LAPIC (acpi_id[0x10] lapic_id[0x00] disabled) [ 0.000000] ACPI: LAPIC (acpi_id[0x11] lapic_id[0x00] disabled) [ 0.000000] ACPI: LAPIC (acpi_id[0x12] lapic_id[0x00] disabled) [ 0.000000] ACPI: LAPIC (acpi_id[0x13] lapic_id[0x00] disabled) [ 0.000000] ACPI: LAPIC (acpi_id[0x14] lapic_id[0x00] disabled) [ 0.000000] ACPI: LAPIC (acpi_id[0x15] lapic_id[0x00] disabled) [ 0.000000] ACPI: LAPIC (acpi_id[0x16] lapic_id[0x00] disabled) [ 0.000000] ACPI: LAPIC (acpi_id[0x17] lapic_id[0x00] disabled) [ 0.000000] ACPI: LAPIC (acpi_id[0x18] lapic_id[0x00] disabled) [ 0.000000] ACPI: LAPIC (acpi_id[0x19] lapic_id[0x00] disabled) [ 0.000000] ACPI: LAPIC (acpi_id[0x1a] lapic_id[0x00] disabled) [ 0.000000] ACPI: LAPIC (acpi_id[0x1b] lapic_id[0x00] disabled) [ 0.000000] ACPI: LAPIC (acpi_id[0x1c] lapic_id[0x00] disabled) [ 0.000000] ACPI: LAPIC (acpi_id[0x1d] lapic_id[0x00] disabled) [ 0.000000] ACPI: LAPIC (acpi_id[0x1e] lapic_id[0x00] disabled) [ 0.000000] ACPI: LAPIC (acpi_id[0x1f] lapic_id[0x00] disabled) [ 0.000000] ACPI: LAPIC (acpi_id[0x20] lapic_id[0x00] disabled) [ 0.000000] ACPI: LAPIC_NMI (acpi_id[0xff] high level lint[0x1]) [ 0.000000] ACPI: 
IOAPIC (id[0x08] address[0xfec00000] gsi_base[0]) [ 0.000000] IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23 [ 0.000000] ACPI: IOAPIC (id[0x09] address[0xfec80000] gsi_base[24]) [ 0.000000] IOAPIC[1]: apic_id 9, version 32, address 0xfec80000, GSI 24-47 [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) [ 0.000000] ACPI: IRQ0 used by override. [ 0.000000] ACPI: IRQ2 used by override. [ 0.000000] ACPI: IRQ9 used by override. [ 0.000000] Using ACPI (MADT) for SMP configuration information [ 0.000000] ACPI: HPET id: 0x8086a301 base: 0xfed00000 [ 0.000000] smpboot: Allowing 32 CPUs, 24 hotplug CPUs [ 0.000000] nr_irqs_gsi: 64 [ 0.000000] PM: Registered nosave memory: [mem 0x00092000-0x0009ffff] [ 0.000000] PM: Registered nosave memory: [mem 0x000a0000-0x000fffff] [ 0.000000] PM: Registered nosave memory: [mem 0xdfdf9000-0xdfdf9fff] [ 0.000000] PM: Registered nosave memory: [mem 0xdfdfa000-0xdfe4afff] [ 0.000000] PM: Registered nosave memory: [mem 0xdfe4b000-0xdfe4bfff] [ 0.000000] PM: Registered nosave memory: [mem 0xdfe4c000-0xdfe4cfff] [ 0.000000] PM: Registered nosave memory: [mem 0xdfe4d000-0xdfe4dfff] [ 0.000000] PM: Registered nosave memory: [mem 0xdfe4e000-0xdfffffff] [ 0.000000] PM: Registered nosave memory: [mem 0xe0000000-0xf7ffffff] [ 0.000000] PM: Registered nosave memory: [mem 0xf8000000-0xfcffffff] [ 0.000000] PM: Registered nosave memory: [mem 0xfd000000-0xfdffffff] [ 0.000000] PM: Registered nosave memory: [mem 0xfe000000-0xfecfffff] [ 0.000000] PM: Registered nosave memory: [mem 0xfed00000-0xfedfffff] [ 0.000000] PM: Registered nosave memory: [mem 0xfee00000-0xfeefffff] [ 0.000000] PM: Registered nosave memory: [mem 0xfef00000-0xffafffff] [ 0.000000] PM: Registered nosave memory: [mem 0xffb00000-0xffffffff] [ 0.000000] e820: [mem 0xe0000000-0xf7ffffff] available for PCI devices [ 0.000000] Booting paravirtualized kernel with PVH extensions on Xen [ 
0.000000] Xen version: 4.4-unstable [ 0.000000] setup_percpu: NR_CPUS:512 nr_cpumask_bits:512 nr_cpu_ids:32 nr_node_ids:1 [ 0.000000] PERCPU: Embedded 28 pages/cpu @ffff880176000000 s85376 r8192 d21120 u131072 [ 0.000000] pcpu-alloc: s85376 r8192 d21120 u131072 alloc=1*2097152 [ 0.000000] pcpu-alloc: [0] 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 [ 0.000000] pcpu-alloc: [0] 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 [ 2.585853] Built 1 zonelists in Node order, mobility grouping on. Total pages: 1550716 [ 2.585857] Policy zone: Normal [ 2.585861] Kernel command line: root=/dev/sda1 ro ramdisk_size=1024000 earlyprintk=xenboot loglevel=9 console=hvc0 debug [ 2.585931] PID hash table entries: 4096 (order: 3, 32768 bytes) [ 2.592600] software IO TLB [mem 0x16bc00000-0x16fc00000] (64MB) mapped at [ffff88016bc00000-ffff88016fbfffff] [ 2.608541] Memory: 5352736K/6288936K available (6885K kernel code, 724K rwdata, 2120K rodata, 1708K init, 1364K bss, 936200K reserved) [ 2.608927] Hierarchical RCU implementation. [ 2.608932] RCU restricting CPUs from NR_CPUS=512 to nr_cpu_ids=8. 
[ 2.608949] NR_IRQS:33024 nr_irqs:1152 16 [ 2.609067] xen: sci override: global_irq=9 trigger=0 polarity=0 [ 2.609072] xen: registering gsi 9 triggering 0 polarity 0 [ 2.609095] xen: --> pirq=9 -> irq=9 (gsi=9) [ 2.609166] xen: acpi sci 9 [ 2.609181] xen: --> pirq=1 -> irq=1 (gsi=1) [ 2.609196] xen: --> pirq=2 -> irq=2 (gsi=2) [ 2.609210] xen: --> pirq=3 -> irq=3 (gsi=3) [ 2.609224] xen: --> pirq=4 -> irq=4 (gsi=4) [ 2.609238] xen: --> pirq=5 -> irq=5 (gsi=5) [ 2.609253] xen: --> pirq=6 -> irq=6 (gsi=6) [ 2.609267] xen: --> pirq=7 -> irq=7 (gsi=7) [ 2.609281] xen: --> pirq=8 -> irq=8 (gsi=8) [ 2.609289] xen map irq failed -12 [ 2.609296] xen map irq failed -12 [ 2.609302] xen map irq failed -12 [ 2.609308] xen map irq failed -12 [ 2.609314] xen map irq failed -12 [ 2.609320] xen map irq failed -12 (XEN) irq.c:375: Dom0 callback via changed to Direct Vector 0xf3 [ 2.609334] xen:events: Xen HVM callback vector for event delivery is enabled [ 2.612758] Console: colour VGA+ 80x25 [ 2.612764] console [hvc0] enabled, bootconsole disabled [ 2.612764] console [hvc0] enabled, bootconsole disabled [ 2.614018] Xen: using vcpuop timer interface [ 2.614033] installing Xen timer for CPU 0 [ 2.614115] tsc: Detected 3066.818 MHz processor [ 2.614127] Calibrating delay loop (skipped), value calculated using timer frequency.. 6133.63 BogoMIPS (lpj=3066818) [ 2.614143] pid_max: default: 32768 minimum: 301 [ 2.614469] Security Framework initialized [ 2.614479] SELinux: Initializing. 
[ 2.614531] SELinux: Starting in permissive mode [ 2.615098] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes) [ 2.616621] Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes) [ 2.617334] Mount-cache hash table entries: 256 [ 2.618654] Initializing cgroup subsys freezer [ 2.618784] CPU: Physical Processor ID: 0 [ 2.618792] CPU: Processor Core ID: 0 [ 2.618800] mce: CPU supports 2 MCE banks [ 2.618855] Last level iTLB entries: 4KB 512, 2MB 7, 4MB 7 [ 2.618855] Last level dTLB entries: 4KB 512, 2MB 32, 4MB 32 [ 2.618855] tlb_flushall_shift: 6 [ 2.619005] Freeing SMP alternatives memory: 28K (ffffffff81e62000 - ffffffff81e69000) [ 2.622343] ACPI: Core revision 20130725 [ 2.784580] ACPI: All ACPI Tables successfully acquired [ 2.785078] cpu 0 spinlock event irq 65 [ 2.785268] Performance Events: unsupported p6 CPU model 26 no PMU driver, software events only. [ 2.786181] NMI watchdog: disabled (cpu0): hardware events not enabled [ 2.786797] installing Xen timer for CPU 1 [ 2.786883] cpu 1 spinlock event irq 72 [ 2.786906] ------------[ cut here ]------------ [ 2.786915] kernel BUG at arch/x86/xen/smp.c:437! [ 2.786923] invalid opcode: 0000 [#1] SMP [ 2.786934] Modules linked in: [ 2.786943] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 3.12.0-rc6upstream-g62c68d3 #5 [ 2.786956] Hardware name: Dell Inc. 
Precision WorkStation T3500 /09KPNV, BIOS A15 03/28/2012 [ 2.786969] task: ffff88016b73b080 ti: ffff88016b73c000 task.ti: ffff88016b73c000 [ 2.786981] RIP: 0010:[<ffffffff8104af28>] [<ffffffff8104af28>] xen_cpu_up+0x1f8/0x420 [ 2.786997] RSP: 0000:ffff88016b73ddf8 EFLAGS: 00010282 [ 2.787006] RAX: fffffffffffffff4 RBX: 0000000000000001 RCX: ffff88016b787f58 [ 2.787017] RDX: ffff88016b7f6000 RSI: 0000000000000001 RDI: 0000000000000000 [ 2.787028] RBP: ffff88016b73de38 R08: ffff88016b73dd38 R09: ffff88016b7f6000 [ 2.787039] R10: ffff880175802668 R11: ffff88016b7a0cf0 R12: 0000000000000001 [ 2.787050] R13: ffff88016b782920 R14: ffff88016b7f6000 R15: 0000000000000000 [ 2.787061] FS: 0000000000000000(0000) GS:ffff880176000000(0000) knlGS:0000000000000000 [ 2.787072] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080010039 [ 2.787082] CR2: 0000000000000000 CR3: 0000000001c0c000 CR4: 0000000000000660 [ 2.787093] Stack: [ 2.787097] 0000000000000001 0000000000000000 0000000000000001 0000000000000000 [ 2.787115] 0000000000000001 0000000000000000 0000000000000001 ffffffff81ca8520 [ 2.787133] ffff88016b73de98 ffffffff8109bd44 000000000000e1e0 ffff88016b782920 [ 2.787151] Call Trace: [ 2.787159] [<ffffffff8109bd44>] _cpu_up+0xd4/0x160 [ 2.787168] [<ffffffff8109be91>] cpu_up+0xc1/0x130 [ 2.787179] [<ffffffff81cf0524>] smp_init+0x4e/0xa6 [ 2.787189] [<ffffffff81ccc98c>] kernel_init_freeable+0xc9/0x1d9 [ 2.787201] [<ffffffff8169dcb0>] ? rest_init+0xa0/0xa0 [ 2.787210] [<ffffffff8169dcb9>] kernel_init+0x9/0xf0 [ 2.787221] [<ffffffff816b4e0c>] ret_from_fork+0x7c/0xb0 [ 2.787230] [<ffffffff8169dcb0>] ? 
rest_init+0xa0/0xa0 [ 2.787238] Code: c1 ef 0c 80 3d 9b 18 c6 00 00 74 30 48 89 fa 48 c1 e2 0c 49 89 96 90 13 00 00 31 ff 49 63 f4 4c 89 f2 e8 dc 63 fb ff 85 c0 74 04 <0f> 0b eb fe 4c 89 f7 e8 0c 91 16 00 e9 a1 fe ff ff e8 e2 a3 ff [ 2.787381] RIP [<ffffffff8104af28>] xen_cpu_up+0x1f8/0x420 [ 2.787392] RSP <ffff88016b73ddf8> [ 2.787412] ---[ end trace 4eaa2a86a8e2da23 ]--- [ 2.787426] swapper/0 (1) used greatest stack depth: 5192 bytes left [ 2.787439] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b [ 2.787439] (XEN) Domain 0 crashed: rebooting machine in 5 seconds.
Mukesh Rathor
2013-Dec-02 19:30 UTC
Re: [V3 PATCH 9/9] pvh dom0: add opt_dom0pvh to setup.c
On Mon, 2 Dec 2013 16:09:50 +0100 Roger Pau Monné <roger.pau@citrix.com> wrote:> On 27/11/13 03:27, Mukesh Rathor wrote: > > Add opt_dom0pvh. Note, pvh dom0 is disabled until the fixme in > > domain_build.c is resolved. The fixme is added by patch title: > > "PVH dom0: construct_dom0 changes" > > > > Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> > > --- > > xen/arch/x86/setup.c | 19 ++++++++++++++++--- > > 1 files changed, 16 insertions(+), 3 deletions(-) > > > > diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c > > index e33c34b..de30ef6 100644 > > --- a/xen/arch/x86/setup.c > > +++ b/xen/arch/x86/setup.c > > @@ -61,6 +61,10 @@ integer_param("maxcpus", max_cpus); > > static bool_t __initdata disable_smep; > > invbool_param("smep", disable_smep); > > > > +/* Boot dom0 in pvh mode */ > > +bool_t __initdata opt_dom0pvh; > > +boolean_param("dom0pvh", opt_dom0pvh); > > + > > /* **** Linux config option: propagated to domain0. */ > > /* "acpi=off": Sisables both ACPI table parsing and > > interpreter. */ /* "acpi=force": Override the disable > > blacklist. */ @@ -545,7 +549,7 @@ void __init > > __start_xen(unsigned long mbi_p) { > > char *memmap_type = NULL; > > char *cmdline, *kextra, *loader; > > - unsigned int initrdidx; > > + unsigned int initrdidx, domcr_flags = 0; > > multiboot_info_t *mbi = __va(mbi_p); > > module_t *mod = (module_t *)__va(mbi->mods_addr); > > unsigned long nr_pages, raw_max_page, modules_headroom, > > *module_map; @@ -1332,8 +1336,17 @@ void __init > > __start_xen(unsigned long mbi_p) if ( !tboot_protect_mem_regions() ) > > panic("Could not protect TXT memory regions\n"); > > > > - /* Create initial domain 0. */ > > - dom0 = domain_create(0, DOMCRF_s3_integrity, 0); > > + /* > > + * Following removed when "pvh fixme" in domain_build.c is > > resolved. > > + * The fixme is added by patch "PVH dom0: construct_dom0 > > changes". 
> > + */
> > + if ( opt_dom0pvh )
> > + panic("You do not have the correct xen version for dom0
> > PVH\n");
>
> I've removed this from my local copy and passed dom0pvh=1 on the
> command line in order to try to boot with PVH Dom0. As Dom0 kernel
> I'm using the tmp2 branch from your repo at
> git://oss.oracle.com/git/mrathor/linux.git (which seems to work fine
> as a DomU), but as Dom0 the kernel panics with the following message:

Well, the guest failed VCPUOP_initialise for the secondary vcpu. Do you have all the xen patches, specifically the ones you had submitted for it? I'm on latest xen with dom0 patches on e439e0b289e3590f84836e4f9bbdfa560c7af6ef. If yes, then I wonder why xen failed VCPUOP_initialise! Maybe you can figure out where it's failing.

BTW, I also noticed:

(XEN)memory.c:132:d0 Could not allocate order=0 extent: id=0 memflags=0 (0 of 1)

and

Released 18446744073708661104 pages of unused memory <====================
[ 0.000000] Set 131701 page(s) to 1-1 mapping
[ 0.000000] ------------[ cut here ]------------
[ 0.000000] WARNING: CPU: 0 PID: 0 at arch/x86/xen/setup.c:134 xen_do_chunk+0x1a0/0x247()
[ 0.000000] Failed to populate pfn 1770b9 err=0 <====================

Wonder what the e820 to dom0 looks like.

Thanks for trying it out.
Mukesh

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Roger Pau Monné
2013-Dec-02 19:38 UTC
Re: [V3 PATCH 9/9] pvh dom0: add opt_dom0pvh to setup.c
On 02/12/13 20:30, Mukesh Rathor wrote:> On Mon, 2 Dec 2013 16:09:50 +0100 > Roger Pau Monné <roger.pau@citrix.com> wrote: > >> On 27/11/13 03:27, Mukesh Rathor wrote: >>> Add opt_dom0pvh. Note, pvh dom0 is disabled until the fixme in >>> domain_build.c is resolved. The fixme is added by patch title: >>> "PVH dom0: construct_dom0 changes" >>> >>> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> >>> --- >>> xen/arch/x86/setup.c | 19 ++++++++++++++++--- >>> 1 files changed, 16 insertions(+), 3 deletions(-) >>> >>> diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c >>> index e33c34b..de30ef6 100644 >>> --- a/xen/arch/x86/setup.c >>> +++ b/xen/arch/x86/setup.c >>> @@ -61,6 +61,10 @@ integer_param("maxcpus", max_cpus); >>> static bool_t __initdata disable_smep; >>> invbool_param("smep", disable_smep); >>> >>> +/* Boot dom0 in pvh mode */ >>> +bool_t __initdata opt_dom0pvh; >>> +boolean_param("dom0pvh", opt_dom0pvh); >>> + >>> /* **** Linux config option: propagated to domain0. */ >>> /* "acpi=off": Sisables both ACPI table parsing and >>> interpreter. */ /* "acpi=force": Override the disable >>> blacklist. */ @@ -545,7 +549,7 @@ void __init >>> __start_xen(unsigned long mbi_p) { >>> char *memmap_type = NULL; >>> char *cmdline, *kextra, *loader; >>> - unsigned int initrdidx; >>> + unsigned int initrdidx, domcr_flags = 0; >>> multiboot_info_t *mbi = __va(mbi_p); >>> module_t *mod = (module_t *)__va(mbi->mods_addr); >>> unsigned long nr_pages, raw_max_page, modules_headroom, >>> *module_map; @@ -1332,8 +1336,17 @@ void __init >>> __start_xen(unsigned long mbi_p) if ( !tboot_protect_mem_regions() ) >>> panic("Could not protect TXT memory regions\n"); >>> >>> - /* Create initial domain 0. */ >>> - dom0 = domain_create(0, DOMCRF_s3_integrity, 0); >>> + /* >>> + * Following removed when "pvh fixme" in domain_build.c is >>> resolved. >>> + * The fixme is added by patch "PVH dom0: construct_dom0 >>> changes". 
>>> + */
>>> + if ( opt_dom0pvh )
>>> + panic("You do not have the correct xen version for dom0
>>> PVH\n");
>>
>> I've removed this from my local copy and passed dom0pvh=1 on the
>> command line in order to try to boot with PVH Dom0. As Dom0 kernel
>> I'm using the tmp2 branch from your repo at
>> git://oss.oracle.com/git/mrathor/linux.git (which seems to work fine
>> as a DomU), but as Dom0 the kernel panics with the following message:
>
> Well, the guest failed VCPUOP_initialise for secondary vcpu. Do you
> have all the xen patches, specifically the one's you had submitted for
> it? I'm on latest xen with dom0 patches on
> e439e0b289e3590f84836e4f9bbdfa560c7af6ef. If yes, then wonder why
> xen failed VCPUOP_initialise! May be you can figure where it's failing.

Thanks for the input, I haven't been able to do much debugging, but AFAICT the problem comes from alloc_vcpu_guest_context returning NULL, because alloc_domheap_page(NULL, 0) inside of that function also returned NULL (not able to allocate a page).

I'm currently using your tree for both Linux (branch tmp2) and Xen (branch dom0pvh-v3), no added or removed patches (only the line mentioned above in order to boot Dom0 in PVH mode).

Can you confirm you are able to boot a PVH Dom0 from your Xen and Linux trees?
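As a side note on reading the oops: RAX in the register dump holds fffffffffffffff4, and the log also shows "xen map irq failed -12". The sketch below, which is only an illustration and not code from either tree, reinterprets such a raw register value as a signed return code. The helper names are hypothetical, and the -4095 lower bound is the usual Linux convention for "looks like an errno", used here as an assumption. Read this way, the RAX value is -12 (-ENOMEM on Linux), which would be consistent with the page allocation failure described above, though that link is an inference from the dump rather than something the thread confirms.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Reinterpret a raw 64-bit register value as the signed number it encodes. */
int64_t reg_to_signed(uint64_t raw)
{
    int64_t v;
    memcpy(&v, &raw, sizeof v);   /* bit-for-bit copy, no conversion rules involved */
    return v;
}

/*
 * Return the positive errno if the raw value looks like a -errno return
 * code (assumed range -4095..-1), otherwise 0.
 */
int reg_to_errno(uint64_t raw)
{
    int64_t v = reg_to_signed(raw);
    return (v < 0 && v >= -4095) ? (int)-v : 0;
}

/* Hypothetical demo using the RAX value printed in the oops. */
void demo_oops_rax(void)
{
    const uint64_t rax = UINT64_C(0xfffffffffffffff4);
    int err = reg_to_errno(rax);

    assert(err == ENOMEM);
    printf("RAX=%#" PRIx64 " -> %" PRId64 " (%s)\n",
           rax, reg_to_signed(rax), strerror(err));
}
```

Calling demo_oops_rax() from a small main() is enough to try it out.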
Mukesh Rathor
2013-Dec-02 20:38 UTC
Re: [V3 PATCH 9/9] pvh dom0: add opt_dom0pvh to setup.c
On Mon, 2 Dec 2013 20:38:26 +0100 Roger Pau Monné <roger.pau@citrix.com> wrote:

> On 02/12/13 20:30, Mukesh Rathor wrote:
> > On Mon, 2 Dec 2013 16:09:50 +0100
> > Roger Pau Monné <roger.pau@citrix.com> wrote:
> >
......
> Thanks for the input, I haven't been able to do much debugging, but
> AFAICT the problem comes from alloc_vcpu_guest_context returning NULL,
> because alloc_domheap_page(NULL, 0) inside of that function also
> returned NULL (not able to allocate a page).
>
> I'm currently using your tree for both Linux (branch tmp2) and Xen
> (branch dom0pvh-v3), no added or removed patches (only the line
> mentioned above in order to boot Dom0 in PVH mode).
>
> Can you confirm you are able to boot a PVH Dom0 from you Xen and Linux
> trees?

yes, I'd not be submitting the patches otherwise ;)..

Name        ID   Mem VCPUs  State   Time(s)
Domain-0     0  1200     3  r-----  6250.6

thanks
Mukesh
Mukesh Rathor
2013-Dec-02 20:46 UTC
Re: [V3 PATCH 9/9] pvh dom0: add opt_dom0pvh to setup.c
On Mon, 2 Dec 2013 12:38:12 -0800 Mukesh Rathor <mukesh.rathor@oracle.com> wrote:

> On Mon, 2 Dec 2013 20:38:26 +0100
> Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> > On 02/12/13 20:30, Mukesh Rathor wrote:
> > > On Mon, 2 Dec 2013 16:09:50 +0100
> > > Roger Pau Monné <roger.pau@citrix.com> wrote:
> > >
> ......
> > Thanks for the input, I haven't been able to do much debugging, but
> > AFAICT the problem comes from alloc_vcpu_guest_context returning
> > NULL, because alloc_domheap_page(NULL, 0) inside of that function
> > also returned NULL (not able to allocate a page).
> >
> > I'm currently using your tree for both Linux (branch tmp2) and Xen
> > (branch dom0pvh-v3), no added or removed patches (only the line
> > mentioned above in order to boot Dom0 in PVH mode).
> >
> > Can you confirm you are able to boot a PVH Dom0 from you Xen and
> > Linux trees?
>
> yes, I'd not be submitting the patches otherwise ;)..
>
> Name        ID   Mem VCPUs  State   Time(s)
> Domain-0     0  1200     3  r-----  6250.6

FWIW, here's my command line:

kernel /xen.hybrid.kdb console=com1,vga com1=57600,8n1 dom0_mem=1200M maxcpus=4 dom0_max_vcpus=3 noreboot sync_console dom0pvh
module /bzImage.hyb ro root=/dev/sda10 console=tty console=hvc0,57600n8

Something is going on with memory on your system, it looks like. The line

Released 18446744073708661104 pages of unused memory

is worrisome. Maybe start with dom0_mem and see if that makes a difference?

OTOH, after I am done cranking out another dom0 version with Jan's comments, I'll try it out on my system without dom0_mem specified. That may take a day or two.

thanks,
Mukesh
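To make concrete why that "Released ..." figure looks wrong: it is just below 2^64, which is what a small negative number looks like when printed as an unsigned 64-bit value. The following is a minimal sketch of that arithmetic only; the function names are made up for illustration, and it says nothing about where in the Linux setup code the count actually went negative. The 4 KiB page size is an assumption (standard x86 pages).

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * If `count` is really a negative number that wrapped around, return its
 * magnitude. Unsigned arithmetic is modulo 2^64, so 0 - count is well defined.
 */
uint64_t wrapped_shortfall(uint64_t count)
{
    return UINT64_C(0) - count;
}

/* Hypothetical demo using the page count printed in the boot log. */
void demo_released_pages(void)
{
    const uint64_t logged = UINT64_C(18446744073708661104);
    const uint64_t page_size = 4096;            /* assumed 4 KiB pages */
    uint64_t shortfall = wrapped_shortfall(logged);

    assert(shortfall == 890512);
    printf("logged value is -%" PRIu64 " pages, i.e. about %" PRIu64 " MiB\n",
           shortfall, shortfall * page_size / (1024 * 1024));
}
```

So the log line most likely reflects a count of roughly minus 890512 pages (around 3.4 GiB under the 4 KiB assumption) rather than a genuinely huge release.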
Mukesh Rathor
2013-Dec-03 00:05 UTC
Re: [V3 PATCH 5/9] PVH dom0: implement XENMEM_add_to_physmap_range for x86
On Mon, 02 Dec 2013 12:47:25 +0000 "Jan Beulich" <JBeulich@suse.com> wrote:

> >>> On 27.11.13 at 03:27, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> > This preparatory patch adds support for XENMEM_add_to_physmap_range
> > on x86 so it can be used to create a guest on PVH dom0. To this
> > end, we add a new function xenmem_add_to_physmap_range(), and change
> > xenmem_add_to_physmap_once parameters so it can be called from
> > xenmem_add_to_physmap_range.
> >
> > Please note, compat will continue to return -ENOSYS.
>
> And as noted a number of times before - I don't think that's
> appropriate. There's nothing keeping non-PVH guests from using
> this interface, and hence it should either be uniformly available to
> all of them, or uniformly unavailable. I'm not intending to apply
> such a half baked thing.

I am sorry for your frustration, but as I said before, doing it is not straightforward as it involves creating a new version of XLAT_add_to_physmap which unfortunately is generated via python/shell scripts. At this point, I want to focus on the 64bit. I understand while it's not a regression, it's a partial implementation, but so is the entire PVH and many other features that are "work in progress". Your final word on this was:

quote:
Then leave it out, and I'll waste my time on getting it implemented once the patch set is in. But please add a clear note of this state to the patch description.

Jan
end quote

All your other comments are taken care of. Thanks for your help.

mukesh
Mukesh Rathor
2013-Dec-03 02:33 UTC
Re: [V3 PATCH 9/9] pvh dom0: add opt_dom0pvh to setup.c
On Mon, 2 Dec 2013 12:46:49 -0800 Mukesh Rathor <mukesh.rathor@oracle.com> wrote:

> On Mon, 2 Dec 2013 12:38:12 -0800
> Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
>
> > On Mon, 2 Dec 2013 20:38:26 +0100
> > Roger Pau Monné <roger.pau@citrix.com> wrote:
> >
> > > On 02/12/13 20:30, Mukesh Rathor wrote:
> > > > On Mon, 2 Dec 2013 16:09:50 +0100
> > > > Roger Pau Monné <roger.pau@citrix.com> wrote:
> > > >
> > ......
> > > Thanks for the input, I haven't been able to do much debugging,
> > > but AFAICT the problem comes from alloc_vcpu_guest_context
> > > returning NULL, because alloc_domheap_page(NULL, 0) inside of
> > > that function also returned NULL (not able to allocate a page).
> > >
> > > I'm currently using your tree for both Linux (branch tmp2) and Xen
> > > (branch dom0pvh-v3), no added or removed patches (only the line
> > > mentioned above in order to boot Dom0 in PVH mode).
> > >
> > > Can you confirm you are able to boot a PVH Dom0 from you Xen and
> > > Linux trees?
> >
> > yes, I'd not be submitting the patches otherwise ;)..
> >
> > Name        ID   Mem VCPUs  State   Time(s)
> > Domain-0     0  1200     3  r-----  6250.6
>
> FWIW, here's my command line:
>
> kernel /xen.hybrid.kdb console=com1,vga com1=57600,8n1
> dom0_mem=1200M maxcpus=4 dom0_max_vcpus=3 noreboot sync_console
> dom0pvh module /bzImage.hyb ro root=/dev/sda10 console=tty
> console=hvc0,57600n8
>
>
> Something is going with memory on your system looks like, the line
>
> Released 18446744073708661104 pages of unused memory
>
> is worrysome. May be start with dom0_mem and see if that makes a
> difference?
>
> OTOH, after I am done cranking out another dom0 version with Jan's
> comments, I'll try out on my system without dom0_mem specified. That
> make take a day or two.

Roger,

I'm able to reproduce by not specifying dom0_mem option. Will debug linux tomorrow to see what's going on. JFYI.

thanks
Mukesh
Jan Beulich
2013-Dec-03 07:48 UTC
Re: [V3 PATCH 5/9] PVH dom0: implement XENMEM_add_to_physmap_range for x86
>>> On 03.12.13 at 01:05, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> On Mon, 02 Dec 2013 12:47:25 +0000
> "Jan Beulich" <JBeulich@suse.com> wrote:
>> And as noted a number of times before - I don't think that's
>> appropriate. There's nothing keeping non-PVH guests from using
>> this interface, and hence it should either be uniformly available to
>> all of them, or uniformly unavailable. I'm not intending to apply
>> such a half baked thing.
>
> I am sorry for your frustration, but as I said before, doing it is
> not straightforward as it involves creating a new version of
> XLAT_add_to_physmap which unfortunately is generated via python/shell
> scripts.

Exactly - all of the basics are being taken care of for you. You don't even need to worry about how these XLAT_* macros are being generated. You only need to use them (taking other similar code as reference if needed).

> At this point, I want to focus on the 64bit. I understand while
> it's not a regression, it's a partial implementation, but so is the entire
> PVH

Right, but here we're not talking about PVH alone.

> and many other features that are "work in progress". Your final word
> on this was:
>
> quote:
> Then leave it out, and I'll waste my time on getting it implemented
> once the patch set is in. But please add a clear note of this state
> to the patch description.
>
> Jan
> end quote

I now recall, and so be it then <sigh>.

Jan
Roger Pau Monné
2013-Dec-03 10:30 UTC
Re: [V3 PATCH 9/9] pvh dom0: add opt_dom0pvh to setup.c
On 03/12/13 03:33, Mukesh Rathor wrote:> On Mon, 2 Dec 2013 12:46:49 -0800 > Mukesh Rathor <mukesh.rathor@oracle.com> wrote: > >> On Mon, 2 Dec 2013 12:38:12 -0800 >> Mukesh Rathor <mukesh.rathor@oracle.com> wrote: >> >>> On Mon, 2 Dec 2013 20:38:26 +0100 >>> Roger Pau Monné <roger.pau@citrix.com> wrote: >>> >>>> On 02/12/13 20:30, Mukesh Rathor wrote: >>>>> On Mon, 2 Dec 2013 16:09:50 +0100 >>>>> Roger Pau Monné <roger.pau@citrix.com> wrote: >>>>> >>> ...... >>>> Thanks for the input, I haven't been able to do much debugging, >>>> but AFAICT the problem comes from alloc_vcpu_guest_context >>>> returning NULL, because alloc_domheap_page(NULL, 0) inside of >>>> that function also returned NULL (not able to allocate a page). >>>> >>>> I'm currently using your tree for both Linux (branch tmp2) and Xen >>>> (branch dom0pvh-v3), no added or removed patches (only the line >>>> mentioned above in order to boot Dom0 in PVH mode). >>>> >>>> Can you confirm you are able to boot a PVH Dom0 from you Xen and >>>> Linux trees? >>> >>> yes, I'd not be submitting the patches otherwise ;).. >>> >>> Name ID Mem VCPUs >>> State Time(s) Domain-0 0 >>> 1200 3 r----- 6250.6 >> >> FWIW, here's my command line: >> >> kernel /xen.hybrid.kdb console=com1,vga com1=57600,8n1 >> dom0_mem=1200M maxcpus=4 dom0_max_vcpus=3 noreboot sync_console >> dom0pvh module /bzImage.hyb ro root=/dev/sda10 console=tty >> console=hvc0,57600n8 >> >> >> Something is going with memory on your system looks like, the line >> >> Released 18446744073708661104 pages of unused memory >> >> is worrysome. May be start with dom0_mem and see if that makes a >> difference? >> >> OTOH, after I am done cranking out another dom0 version with Jan's >> comments, I'll try out on my system without dom0_mem specified. That >> make take a day or two. > > > Roger, > > I'm able to reproduce by not specifying dom0_mem option. Will debug > linux tomorrow to see what's going on. 
JFYI.Thanks, I've tried with dom0_mem and it seems to work, but I've also found that rebooting a PVH Dom0 triggers the following assert (this also happens with your new dom0pvh-v4 branch): (XEN) Domain 0 shutdown: rebooting machine. (XEN) Assertion 'read_cr0() & X86_CR0_TS' failed at vmx.c:595 (XEN) ----[ Xen-4.4-unstable x86_64 debug=y Not tainted ]---- (XEN) CPU: 0 (XEN) RIP: e008:[<ffff82d0801d8992>] vmx_ctxt_switch_from+0x20/0x155 (XEN) RFLAGS: 0000000000010046 CONTEXT: hypervisor (XEN) rax: 0000000080050033 rbx: ffff8300df8fb000 rcx: 0000000000000000 (XEN) rdx: 00000000ffffffff rsi: ffff82d0802cffc0 rdi: ffff8300df8fb000 (XEN) rbp: ffff82d0802cfa58 rsp: ffff82d0802cfa48 r8: 0000000000000000 (XEN) r9: ffff82cffffff000 r10: ffff83019a5c1f50 r11: 00000016eba95224 (XEN) r12: ffff8300df8fb000 r13: 0000000000000000 r14: ffff82d0802cff18 (XEN) r15: ffff83019a5c75f8 cr0: 0000000080050033 cr4: 00000000000026f0 (XEN) cr3: 000000019a765000 cr2: 00007f9f2861b000 (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008 (XEN) Xen stack trace from rsp=ffff82d0802cfa48: (XEN) ffff8300dfdf2000 ffff8300dfdf2000 ffff82d0802cfaa8 ffff82d08015f8a0 (XEN) ffff82d0802cfac8 ffff82d0801730c2 ffff82d0802cfac8 0000000000000001 (XEN) 0000000000000046 000000000019a400 0000000000000012 ffff83019a5c75f8 (XEN) ffff82d0802cfac8 ffff82d08015fbd1 ffff8300dfdf2000 0000000000000000 (XEN) ffff82d0802cfad8 ffff82d08015fbf6 ffff82d0802cfb28 ffff82d08016239b (XEN) ffff82d0802cfb18 ffff82d080140ad3 0000000000000002 ffff83019a5c7530 (XEN) 0000000000000000 0000000000000000 0000000000000012 ffff83019a5c75f8 (XEN) ffff82d0802cfb38 ffff82d08014f1f3 ffff82d0802cfb98 ffff82d08014e304 (XEN) 0000000000000093 ffff83019a5c7604 0001000000000021 0000000400000000 (XEN) 000000000000006b 0000000000000000 0000000000000001 0000000000000000 (XEN) ffff82d080165e21 00000000ffffffff ffff82d0802cfba8 ffff82d080144e1a (XEN) ffff82d0802cfbb8 ffff82d080165e38 ffff82d0802cfbe8 ffff82d080165920 (XEN) 0000000000000000 
0000000000000001 0000000000000000 0000000000000000 (XEN) ffff82d0802cfc18 ffff82d080166c76 0000000000010000 0000000000000000 (XEN) 0000000000000001 ffff82d08026fd20 ffff82d0802cfc48 ffff82d080166d82 (XEN) 0000000000000009 ffff82d080274080 00000000000000fb ffff83019a577ad0 (XEN) ffff82d0802cfc68 ffff82d0801670a6 ffff82d0802cfc68 ffff82d08018268d (XEN) ffff82d0802cfc88 ffff82d080182715 0000000000000000 ffff82d0802cfdb8 (XEN) ffff82d0802cfcd8 ffff82d08018239c ffff82d0801405e6 0000000000000061 (XEN) ffff82d0802cfce8 0000000000000000 ffff82d0802cfdb8 00000000000000fb (XEN) Xen call trace: (XEN) [<ffff82d0801d8992>] vmx_ctxt_switch_from+0x20/0x155 (XEN) [<ffff82d08015f8a0>] __context_switch+0xa0/0x36c (XEN) [<ffff82d08015fbd1>] __sync_local_execstate+0x65/0x81 (XEN) [<ffff82d08015fbf6>] sync_local_execstate+0x9/0xb (XEN) [<ffff82d08016239b>] map_domain_page+0x86/0x51a (XEN) [<ffff82d08014f1f3>] map_vtd_domain_page+0xd/0x1a (XEN) [<ffff82d08014e304>] io_apic_read_remap_rte+0x156/0x2a2 (XEN) [<ffff82d080144e1a>] iommu_read_apic_from_ire+0x29/0x30 (XEN) [<ffff82d080165e38>] io_apic_read+0x17/0x65 (XEN) [<ffff82d080165920>] __ioapic_read_entry+0x3d/0x69 (XEN) [<ffff82d080166c76>] clear_IO_APIC_pin+0x1a/0xf3 (XEN) [<ffff82d080166d82>] clear_IO_APIC+0x33/0x62 (XEN) [<ffff82d0801670a6>] disable_IO_APIC+0xd/0x89 (XEN) [<ffff82d080182715>] smp_send_stop+0x5b/0x66 (XEN) [<ffff82d08018239c>] machine_restart+0x84/0x225 (XEN) [<ffff82d080182548>] __machine_restart+0xb/0xd (XEN) [<ffff82d080128dc9>] smp_call_function_interrupt+0xa5/0xca (XEN) [<ffff82d080182b08>] call_function_interrupt+0x33/0x35 (XEN) [<ffff82d08016b988>] do_IRQ+0x9e/0x660 (XEN) [<ffff82d0801645df>] common_interrupt+0x5f/0x70 (XEN) [<ffff82d0801a7981>] mwait_idle+0x2ad/0x30e (XEN) [<ffff82d08015ed42>] idle_loop+0x64/0x7e (XEN) (XEN) (XEN) **************************************** (XEN) Panic on CPU 0: (XEN) Assertion 'read_cr0() & X86_CR0_TS' failed at vmx.c:595 (XEN) **************************************** 
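For readers checking the assertion against the register dump: the failed check is 'read_cr0() & X86_CR0_TS', and the dump prints cr0: 0000000080050033. CR0.TS (Task Switched) is architecturally bit 3, so the test can be redone by hand. The sketch below is only that bit test, with a hypothetical helper name; it does not explain why TS was clear on this path, and it assumes the cr0 value in the dump is the one the assertion saw.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* CR0.TS (Task Switched) is bit 3 of CR0 on x86. */
#define CR0_TS_BIT (UINT64_C(1) << 3)

/* Return 1 if the TS bit is set in the given CR0 value, 0 otherwise. */
int cr0_ts_set(uint64_t cr0)
{
    return (cr0 & CR0_TS_BIT) != 0;
}

/* Hypothetical demo using the cr0 value from the Xen register dump. */
void demo_cr0_from_dump(void)
{
    const uint64_t cr0 = UINT64_C(0x80050033);

    assert(!cr0_ts_set(cr0));   /* low byte 0x33 = 0011 0011b, bit 3 is clear */
    printf("cr0=%#" PRIx64 " TS=%d\n", cr0, cr0_ts_set(cr0));
}
```

With TS clear in that value, the dump agrees with the assertion text: the condition evaluated to zero at the point vmx_ctxt_switch_from ran.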
>>> On 02.12.13 at 20:38, Roger Pau Monné<roger.pau@citrix.com> wrote:
> Thanks for the input, I haven't been able to do much debugging, but
> AFAICT the problem comes from alloc_vcpu_guest_context returning NULL,
> because alloc_domheap_page(NULL, 0) inside of that function also
> returned NULL (not able to allocate a page).

How about making sure Xen has some free memory then?

Jan
Mukesh Rathor
2013-Dec-03 19:49 UTC
Re: [V3 PATCH 5/9] PVH dom0: implement XENMEM_add_to_physmap_range for x86
On Tue, 03 Dec 2013 07:48:11 +0000 "Jan Beulich" <JBeulich@suse.com> wrote:

> >>> On 03.12.13 at 01:05, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> > On Mon, 02 Dec 2013 12:47:25 +0000
> > "Jan Beulich" <JBeulich@suse.com> wrote:
> >> And as noted a number of times before - I don't think that's
> >> appropriate. There's nothing keeping non-PVH guests from using
> >> this interface, and hence it should either be uniformly available
> >> to all of them, or uniformly unavailable. I'm not intending to
> >> apply such a half baked thing.
> >
> > I am sorry for your frustration, but as I said before, doing it is
> > not straightforward as it involves creating a new version of
> > XLAT_add_to_physmap which unfortunately is generated via
> > python/shell scripts.
>
> Exactly - all of the basics are being taken care of for you. You
> don't even need to worry about how these XLAT_* macros
> are being generated. You only need to use them (taking other
> similar code as reference if needed).

No, the existing macro can't be used because of the extra field foreign_domid that needs to be copied also, not to mention the gpfns array that needs to be handled too. We need to create/generate a new macro because foreign_domid doesn't exist in struct xen_add_to_physmap, and the gpfns and errs arrays need to be considered.

thanks
mukesh
Mukesh Rathor
2013-Dec-03 19:51 UTC
Re: [V3 PATCH 9/9] pvh dom0: add opt_dom0pvh to setup.c
On Tue, 3 Dec 2013 11:30:16 +0100 Roger Pau Monné <roger.pau@citrix.com> wrote:

> On 03/12/13 03:33, Mukesh Rathor wrote:
> > On Mon, 2 Dec 2013 12:46:49 -0800
> > Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> >
> >> On Mon, 2 Dec 2013 12:38:12 -0800
> >> Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> >>
> >>> On Mon, 2 Dec 2013 20:38:26 +0100
> >>> Roger Pau Monné <roger.pau@citrix.com> wrote:
> >>>
> >>>> On 02/12/13 20:30, Mukesh Rathor wrote:
> >>>>> On Mon, 2 Dec 2013 16:09:50 +0100
> >>>>> Roger Pau Monné <roger.pau@citrix.com> wrote:
> >>>>>
> >>> ......
> >>>> Thanks for the input, I haven't been able to do much debugging,
> >>>> but AFAICT the problem comes from alloc_vcpu_guest_context
> >>>> returning NULL, because alloc_domheap_page(NULL, 0) inside of
> >>>> that function also returned NULL (not able to allocate a page).
> >>>>
> >>>> I'm currently using your tree for both Linux (branch tmp2) and
> >>>> Xen (branch dom0pvh-v3), no added or removed patches (only the
> >>>> line mentioned above in order to boot Dom0 in PVH mode).
> >>>>
> >>>> Can you confirm you are able to boot a PVH Dom0 from your Xen and
> >>>> Linux trees?
> >>>
> >>> yes, I'd not be submitting the patches otherwise ;)..
> >>>
> >>> Name          ID   Mem VCPUs   State   Time(s)
> >>> Domain-0       0  1200     3   r-----   6250.6
> >>
> >> FWIW, here's my command line:
> >>
> >> kernel /xen.hybrid.kdb console=com1,vga com1=57600,8n1
> >> dom0_mem=1200M maxcpus=4 dom0_max_vcpus=3 noreboot sync_console
> >> dom0pvh module /bzImage.hyb ro root=/dev/sda10 console=tty
> >> console=hvc0,57600n8
> >>
> >>
> >> Something is going on with memory on your system, it looks like; the line
> >>
> >> Released 18446744073708661104 pages of unused memory
> >>
> >> is worrisome. Maybe start with dom0_mem and see if that makes a
> >> difference?
> >>
> >> OTOH, after I am done cranking out another dom0 version with Jan's
> >> comments, I'll try out on my system without dom0_mem specified.
> >> That may take a day or two.
> >
> >
> > Roger,
> >
> > I'm able to reproduce by not specifying dom0_mem option. Will debug
> > linux tomorrow to see what's going on. JFYI.
>
> Thanks, I've tried with dom0_mem and it seems to work, but I've also
> found that rebooting a PVH Dom0 triggers the following assert (this
> also happens with your new dom0pvh-v4 branch):
>
> (XEN) Domain 0 shutdown: rebooting machine.
> (XEN) Assertion 'read_cr0() & X86_CR0_TS' failed at vmx.c:595
> (XEN) ----[ Xen-4.4-unstable x86_64 debug=y Not tainted ]----
> (XEN) CPU: 0
> (XEN) RIP: e008:[<ffff82d0801d8992>]

Hmm... I am able to reboot, no problem. I'll have to see how the modified domu patch handles cr0 again. LMK if you find anything in the meantime.

thanks
Mukesh
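A note on the "Released 18446744073708661104 pages of unused memory" line quoted above: that figure is what a small negative count looks like when it is stored or printed as an unsigned 64-bit number. Below is a minimal sketch of the arithmetic, assuming 64-bit page counters. The function names are made up for illustration and are not taken from the Linux or Xen sources, and the thread itself does not say which subtraction went wrong.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pages released for the range [start_pfn, end_pfn).
 * If end_pfn is smaller than start_pfn, the unsigned subtraction wraps
 * around modulo 2^64 instead of producing a negative number. */
static uint64_t pages_released(uint64_t start_pfn, uint64_t end_pfn)
{
    return end_pfn - start_pfn;
}

/* Hypothetical helper: recover the signed value that an unsigned 64-bit
 * count represents under two's complement, without relying on an
 * implementation-defined cast. */
static int64_t as_signed_delta(uint64_t count)
{
    if (count > (uint64_t)INT64_MAX)
        return -(int64_t)(~count) - 1;   /* count - 2^64 */
    return (int64_t)count;
}

/* Self-check tying the sketch to the number in the log line. */
static void check_log_value(void)
{
    /* The logged value is 2^64 - 890512 ... */
    assert(as_signed_delta(18446744073708661104ULL) == -890512);
    /* ... which is exactly what an underflow by 890512 pages yields. */
    assert(pages_released(890512, 0) == 18446744073708661104ULL);
}
```

Read this way, the log line corresponds to a count of -890512 pages (roughly 3.4 GiB if pages are 4 KiB), which fits the observation that the problem goes away once dom0_mem is specified, though it does not by itself identify the faulty calculation.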
Jan Beulich
2013-Dec-04 08:03 UTC
Re: [V3 PATCH 5/9] PVH dom0: implement XENMEM_add_to_physmap_range for x86
>>> On 03.12.13 at 20:49, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> On Tue, 03 Dec 2013 07:48:11 +0000
> "Jan Beulich" <JBeulich@suse.com> wrote:
>
>> >>> On 03.12.13 at 01:05, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
>> > On Mon, 02 Dec 2013 12:47:25 +0000
>> > "Jan Beulich" <JBeulich@suse.com> wrote:
>> >> And as noted a number of times before - I don't think that's
>> >> appropriate. There's nothing keeping non-PVH guests from using
>> >> this interface, and hence it should either be uniformly available
>> >> to all of them, or uniformly unavailable. I'm not intending to
>> >> apply such a half baked thing.
>> >
>> > I am sorry for your frustration, but as I said before, doing it is
>> > not straightforward as it involves creating a new version of
>> > XLAT_add_to_physmap which unfortunately is generated via
>> > python/shell scripts.
>>
>> Exactly - all of the basics are being taken care of for you. You
>> don't even need to worry about how these XLAT_* macros
>> are being generated. You only need to use them (taking other
>> similar code as reference if needed).
>
> No, the existing macro can't be used because of extra field foreign_domid
> that needs to be copied also, not to mention the gpfns array that need
> to be handled too. We need to create/generate a new macro because the
> foreign_domid doesn't exist in struct xen_add_to_physmap, and the gpfns
> and errs arrays need to be considered.

I didn't talk of any existing macro. What I said is that once the necessary addition to xen/include/xlat.lst was made, the macro _would be generated_ for you. But yes, you'd need to write the code wrapping the macro and dealing with the array.

Anyway - as said, I'll take care of this if you feel unable to, but this requires that your code goes in rather soon (I didn't get around yet to looking at v4) so I'll have time to also deal with that fallout.

Jan
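To make the point of disagreement concrete: the work under discussion is translating a 32-bit (compat) guest's view of the add_to_physmap_range argument into the native layout, which means copying foreign_domid along with the other scalar fields and handling the gpfns and errs arrays. The following is a simplified, self-contained sketch of that kind of translation. It does not use Xen's generated XLAT_* macros or its guest-handle accessors; the struct layouts, the type names and translate_range() are hypothetical stand-ins, with plain pointers in place of guest handles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical compat (32-bit guest) layout: indexes and frame numbers
 * are 32 bits wide. */
struct compat_range {
    uint16_t domid;
    uint16_t space;
    uint16_t size;            /* number of entries in each array */
    uint16_t foreign_domid;   /* the extra field that must also be copied */
    const uint32_t *idxs;
    const uint32_t *gpfns;
    int *errs;                /* OUT: one status per entry */
};

/* Hypothetical native (64-bit) layout of the same argument. */
struct native_range {
    uint16_t domid;
    uint16_t space;
    uint16_t size;
    uint16_t foreign_domid;
    uint64_t *idxs;
    uint64_t *gpfns;
    int *errs;
};

/* Copy the scalar fields one by one and widen the array elements into
 * caller-supplied scratch buffers of capacity 'cap'.
 * Returns 0 on success, -1 if the request does not fit. */
static int translate_range(const struct compat_range *c,
                           struct native_range *n,
                           uint64_t *idx_buf, uint64_t *gpfn_buf,
                           size_t cap)
{
    if (c->size > cap)
        return -1;

    n->domid = c->domid;
    n->space = c->space;
    n->size = c->size;
    n->foreign_domid = c->foreign_domid;

    for (size_t i = 0; i < c->size; i++) {
        idx_buf[i] = c->idxs[i];     /* zero-extend 32 -> 64 bits */
        gpfn_buf[i] = c->gpfns[i];
    }
    n->idxs = idx_buf;
    n->gpfns = gpfn_buf;

    /* int has the same width in both layouts here, so the error array
     * can be shared rather than translated. */
    n->errs = c->errs;
    return 0;
}
```

In this sketch the scalar copy is the part a generated macro would cover, while the loop over the arrays is the hand-written wrapping code; how the real hypercall path batches or bounds that work is not shown here.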