thr3ads.net - Xen devel - [V1 PATCH 0/11]: PVH dom0.... [Nov 2013]

If this information is useful, please help other people find it:
Share via:

Mukesh Rathor

2013-Nov-09 01:23 UTC

[V1 PATCH 0/11]: PVH dom0....

These patches implement PVH dom0. Please note there is 1 fixme in
this entire series that is being addressed under a different patch 
series.  PVH dom0 creation is disabled until then.

Patches 1 thru 6 implement changes in and around construct_dom0.
Patches 7 thru 11 are to support tool stack on PVH dom0, and have
been mostly looked at in the past.

Thanks,
Mukesh

Mukesh Rathor

2013-Nov-09 01:23 UTC

head link

[V1 PATCH 01/11] PVH dom0: set eflags resvd bit #1

In this patch the eflags resv bit #1 is set in vmx_vmenter_helper. If
the bit is not set, the vmlaunch/resume will fail with guest state
invalid.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/arch/x86/domain.c           |    2 --
 xen/arch/x86/hvm/hvm.c          |    7 ++-----
 xen/arch/x86/hvm/vmx/vmx.c      |    2 +-
 xen/include/asm-x86/processor.h |    1 +
 4 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 0736a4b..e9ab6c8 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -732,8 +732,6 @@ int arch_set_info_guest(
     for ( i = 0; i < ARRAY_SIZE(v->arch.debugreg); ++i )
         v->arch.debugreg[i] = c(debugreg[i]);
 
-    v->arch.user_regs.eflags |= 2;
-
     if ( !is_pv_vcpu(v) )
     {
         hvm_set_info_guest(v);
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 61b5d8e..0d7ab44 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -952,7 +952,7 @@ static int hvm_load_cpu_ctxt(struct domain *d,
hvm_domain_context_t *h)
     v->arch.user_regs.edi = ctxt.rdi;
     v->arch.user_regs.esp = ctxt.rsp;
     v->arch.user_regs.eip = ctxt.rip;
-    v->arch.user_regs.eflags = ctxt.rflags | 2;
+    v->arch.user_regs.eflags = ctxt.rflags | X86_EFLAGS_MBS;
     v->arch.user_regs.r8  = ctxt.r8;
     v->arch.user_regs.r9  = ctxt.r9;
     v->arch.user_regs.r10 = ctxt.r10;
@@ -1133,7 +1133,6 @@ static int pvh_vcpu_initialise(struct vcpu *v)
                          (unsigned long)v);
 
     v->arch.hvm_vcpu.hcall_64bit = 1;    /* PVH 32bitfixme. */
-    v->arch.user_regs.eflags = 2;
     v->arch.hvm_vcpu.inject_trap.vector = -1;
 
     if ( (rc = hvm_vcpu_cacheattr_init(v)) != 0 )
@@ -1218,8 +1217,6 @@ int hvm_vcpu_initialise(struct vcpu *v)
         (void(*)(unsigned long))hvm_assert_evtchn_irq,
         (unsigned long)v);
 
-    v->arch.user_regs.eflags = 2;
-
     if ( v->vcpu_id == 0 )
     {
         /* NB. All these really belong in hvm_domain_initialise(). */
@@ -3619,7 +3616,7 @@ void hvm_vcpu_reset_state(struct vcpu *v, uint16_t cs,
uint16_t ip)
 
     v->arch.vgc_flags = VGCF_online;
     memset(&v->arch.user_regs, 0, sizeof(v->arch.user_regs));
-    v->arch.user_regs.eflags = 2;
+    v->arch.user_regs.eflags = X86_EFLAGS_MBS;
     v->arch.user_regs.edx = 0x00000f00;
     v->arch.user_regs.eip = ip;
     memset(&v->arch.debugreg, 0, sizeof(v->arch.debugreg));
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index e93fd98..1f39220 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -3033,7 +3033,7 @@ void vmx_vmenter_helper(const struct cpu_user_regs *regs)
 
     __vmwrite(GUEST_RIP,    regs->rip);
     __vmwrite(GUEST_RSP,    regs->rsp);
-    __vmwrite(GUEST_RFLAGS, regs->rflags);
+    __vmwrite(GUEST_RFLAGS, regs->rflags | X86_EFLAGS_MBS);
 }
 
 /*
diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index 551036d..73a3202 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -35,6 +35,7 @@
  * EFLAGS bits
  */
 #define X86_EFLAGS_CF	0x00000001 /* Carry Flag */
+#define X86_EFLAGS_MBS	0x00000002 /* Resvd bit */
 #define X86_EFLAGS_PF	0x00000004 /* Parity Flag */
 #define X86_EFLAGS_AF	0x00000010 /* Auxillary carry Flag */
 #define X86_EFLAGS_ZF	0x00000040 /* Zero Flag */
-- 
1.7.2.3

Mukesh Rathor

2013-Nov-09 01:23 UTC

head link

[V1 PATCH 02/11] PVH dom0: Allow physdevops for PVH dom0

Allow a PVH dom0 access to all PHYSDEVOP_* ops.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/arch/x86/hvm/hvm.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 0d7ab44..cd1dfcd 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3256,6 +3256,8 @@ static long hvm_physdev_op(int cmd,
XEN_GUEST_HANDLE_PARAM(void) arg)
         case PHYSDEVOP_get_free_pirq:
             return do_physdev_op(cmd, arg);
         default:
+            if ( is_pvh_vcpu(current) &&
is_hardware_domain(current->domain) )
+                return do_physdev_op(cmd, arg);
             return -ENOSYS;
     }
 }
-- 
1.7.2.3

Mukesh Rathor

2013-Nov-09 01:23 UTC

head link

[V1 PATCH 03/11] PVH dom0: iommu related changes

- For now, iommu is required for PVH dom0. Check for that.
- For PVH, we need to do mfn_to_gfn before calling mapping function
  intel_iommu_map_page/amd_iommu_map_page which expects a gfn.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/drivers/passthrough/iommu.c |   17 +++++++++++++++--
 1 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 93ad122..c3d31a5 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -125,15 +125,27 @@ int iommu_domain_init(struct domain *d)
     return hd->platform_ops->init(d);
 }
 
+static inline void check_dom0_pvh_reqs(struct domain *d)
+{
+    if ( !iommu_enabled )
+        panic("Presently, iommu must be enabled for pvh dom0\n");
+
+    if ( iommu_passthrough )
+        panic("For pvh dom0, dom0-passthrough must not be
enabled\n");
+}
+
 void __init iommu_dom0_init(struct domain *d)
 {
     struct hvm_iommu *hd = domain_hvm_iommu(d);
 
+    if ( is_pvh_domain(d) )
+        check_dom0_pvh_reqs(d);
+
     if ( !iommu_enabled )
         return;
 
     register_keyhandler(''o'', &iommu_p2m_table);
-    d->need_iommu = !!iommu_dom0_strict;
+    d->need_iommu = is_pvh_domain(d) || !!iommu_dom0_strict;
     if ( need_iommu(d) )
     {
         struct page_info *page;
@@ -141,12 +153,13 @@ void __init iommu_dom0_init(struct domain *d)
         page_list_for_each ( page, &d->page_list )
         {
             unsigned long mfn = page_to_mfn(page);
+            unsigned long gfn = mfn_to_gfn(d, _mfn(mfn));
             unsigned int mapping = IOMMUF_readable;
             if ( ((page->u.inuse.type_info & PGT_count_mask) == 0) ||
                  ((page->u.inuse.type_info & PGT_type_mask)
                   == PGT_writable_page) )
                 mapping |= IOMMUF_writable;
-            hd->platform_ops->map_page(d, mfn, mfn, mapping);
+            hd->platform_ops->map_page(d, gfn, mfn, mapping);
             if ( !(i++ & 0xfffff) )
                 process_pending_softirqs();
         }
-- 
1.7.2.3

Mukesh Rathor

2013-Nov-09 01:23 UTC

head link

[V1 PATCH 04/11] PVH dom0: create update_memory_mapping() function

In this preparatory patch XEN_DOMCTL_memory_mapping code is
put into a function so it can be called later for PVH from
construct_dom0. There is no change in it''s functionality.
The function is made non-static in the construct_dom0 patch.

Note, iomem_access_permitted can''t be moved to the function
because current doesn''t point to dom0 in construct_dom0.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/arch/x86/domctl.c |  111 +++++++++++++++++++++++++++----------------------
 1 files changed, 61 insertions(+), 50 deletions(-)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index e75918a..31e1aed 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -46,6 +46,66 @@ static int gdbsx_guest_mem_io(
     return (iop->remain ? -EFAULT : 0);
 }
 
+static long update_memory_mapping(struct domain *d, unsigned long gfn,
+                           unsigned long mfn, unsigned long nr_mfns,
+                           bool_t add)
+{
+    unsigned long i;
+    long ret;
+
+    ret = xsm_iomem_mapping(XSM_HOOK, d, mfn, mfn + nr_mfns - 1, add);
+    if ( ret )
+        return ret;
+
+    if ( add )
+    {
+        printk(XENLOG_G_INFO
+               "memory_map:add: dom%d gfn=%lx mfn=%lx nr=%lx\n",
+               d->domain_id, gfn, mfn, nr_mfns);
+
+        ret = iomem_permit_access(d, mfn, mfn + nr_mfns - 1);
+        if ( !ret && paging_mode_translate(d) )
+        {
+            for ( i = 0; !ret && i < nr_mfns; i++ )
+                if ( !set_mmio_p2m_entry(d, gfn + i, _mfn(mfn + i)) )
+                    ret = -EIO;
+            if ( ret )
+            {
+                printk(XENLOG_G_WARNING
+                       "memory_map:fail: dom%d gfn=%lx mfn=%lx\n",
+                       d->domain_id, gfn + i, mfn + i);
+                while ( i-- )
+                    clear_mmio_p2m_entry(d, gfn + i);
+                if ( iomem_deny_access(d, mfn, mfn + nr_mfns - 1) &&
+                     is_hardware_domain(current->domain) )
+                    printk(XENLOG_ERR
+                           "memory_map: failed to deny dom%d access to
[%lx,%lx]\n",
+                           d->domain_id, mfn, mfn + nr_mfns - 1);
+            }
+        }
+    }
+    else
+    {
+        printk(XENLOG_G_INFO
+               "memory_map:remove: dom%d gfn=%lx mfn=%lx nr=%lx\n",
+               d->domain_id, gfn, mfn, nr_mfns);
+
+        if ( paging_mode_translate(d) )
+            for ( i = 0; i < nr_mfns; i++ )
+                add |= !clear_mmio_p2m_entry(d, gfn + i);
+        ret = iomem_deny_access(d, mfn, mfn + nr_mfns - 1);
+        if ( !ret && add )
+            ret = -EIO;
+        if ( ret && is_hardware_domain(current->domain) )
+            printk(XENLOG_ERR
+                   "memory_map: error %ld %s dom%d access to
[%lx,%lx]\n",
+                   ret, add ? "removing" : "denying",
d->domain_id,
+                   mfn, mfn + nr_mfns - 1);
+    }
+
+    return ret;
+}
+
 long arch_do_domctl(
     struct xen_domctl *domctl, struct domain *d,
     XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
@@ -625,7 +685,6 @@ long arch_do_domctl(
         unsigned long mfn = domctl->u.memory_mapping.first_mfn;
         unsigned long nr_mfns = domctl->u.memory_mapping.nr_mfns;
         int add = domctl->u.memory_mapping.add_mapping;
-        unsigned long i;
 
         ret = -EINVAL;
         if ( (mfn + nr_mfns - 1) < mfn || /* wrap? */
@@ -637,55 +696,7 @@ long arch_do_domctl(
         if ( !iomem_access_permitted(current->domain, mfn, mfn + nr_mfns -
1) )
             break;
 
-        ret = xsm_iomem_mapping(XSM_HOOK, d, mfn, mfn + nr_mfns - 1, add);
-        if ( ret )
-            break;
-
-        if ( add )
-        {
-            printk(XENLOG_G_INFO
-                   "memory_map:add: dom%d gfn=%lx mfn=%lx nr=%lx\n",
-                   d->domain_id, gfn, mfn, nr_mfns);
-
-            ret = iomem_permit_access(d, mfn, mfn + nr_mfns - 1);
-            if ( !ret && paging_mode_translate(d) )
-            {
-                for ( i = 0; !ret && i < nr_mfns; i++ )
-                    if ( !set_mmio_p2m_entry(d, gfn + i, _mfn(mfn + i)) )
-                        ret = -EIO;
-                if ( ret )
-                {
-                    printk(XENLOG_G_WARNING
-                           "memory_map:fail: dom%d gfn=%lx
mfn=%lx\n",
-                           d->domain_id, gfn + i, mfn + i);
-                    while ( i-- )
-                        clear_mmio_p2m_entry(d, gfn + i);
-                    if ( iomem_deny_access(d, mfn, mfn + nr_mfns - 1)
&&
-                         is_hardware_domain(current->domain) )
-                        printk(XENLOG_ERR
-                               "memory_map: failed to deny dom%d access to
[%lx,%lx]\n",
-                               d->domain_id, mfn, mfn + nr_mfns - 1);
-                }
-            }
-        }
-        else
-        {
-            printk(XENLOG_G_INFO
-                   "memory_map:remove: dom%d gfn=%lx mfn=%lx
nr=%lx\n",
-                   d->domain_id, gfn, mfn, nr_mfns);
-
-            if ( paging_mode_translate(d) )
-                for ( i = 0; i < nr_mfns; i++ )
-                    add |= !clear_mmio_p2m_entry(d, gfn + i);
-            ret = iomem_deny_access(d, mfn, mfn + nr_mfns - 1);
-            if ( !ret && add )
-                ret = -EIO;
-            if ( ret && is_hardware_domain(current->domain) )
-                printk(XENLOG_ERR
-                       "memory_map: error %ld %s dom%d access to
[%lx,%lx]\n",
-                       ret, add ? "removing" : "denying",
d->domain_id,
-                       mfn, mfn + nr_mfns - 1);
-        }
+        ret = update_memory_mapping(d, gfn, mfn, nr_mfns, add);
     }
     break;
 
-- 
1.7.2.3

Mukesh Rathor

2013-Nov-09 01:23 UTC

head link

[V1 PATCH 05/11] PVH dom0: move some pv specific code to static functions

In this preparatory patch also, some pv specific code is
carved out into static functions. No functionality change.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/arch/x86/domain_build.c |  353 +++++++++++++++++++++++--------------------
 1 files changed, 192 insertions(+), 161 deletions(-)

diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
index 232adf8..c9ff680 100644
--- a/xen/arch/x86/domain_build.c
+++ b/xen/arch/x86/domain_build.c
@@ -307,6 +307,191 @@ static void __init process_dom0_ioports_disable(void)
     }
 }
 
+/* Pages that are part of page tables must be read only. */
+static __init void mark_pv_pt_pages_rdonly(struct domain *d,
+                                           l4_pgentry_t *l4start,
+                                           unsigned long vpt_start,
+                                           unsigned long nr_pt_pages)
+{
+    unsigned long count;
+    struct page_info *page;
+    l4_pgentry_t *pl4e;
+    l3_pgentry_t *pl3e;
+    l2_pgentry_t *pl2e;
+    l1_pgentry_t *pl1e;
+
+    pl4e = l4start + l4_table_offset(vpt_start);
+    pl3e = l4e_to_l3e(*pl4e);
+    pl3e += l3_table_offset(vpt_start);
+    pl2e = l3e_to_l2e(*pl3e);
+    pl2e += l2_table_offset(vpt_start);
+    pl1e = l2e_to_l1e(*pl2e);
+    pl1e += l1_table_offset(vpt_start);
+    for ( count = 0; count < nr_pt_pages; count++ )
+    {
+        l1e_remove_flags(*pl1e, _PAGE_RW);
+        page = mfn_to_page(l1e_get_pfn(*pl1e));
+
+        /* Read-only mapping + PGC_allocated + page-table page. */
+        page->count_info         = PGC_allocated | 3;
+        page->u.inuse.type_info |= PGT_validated | 1;
+
+        /* Top-level p.t. is pinned. */
+        if ( (page->u.inuse.type_info & PGT_type_mask) =+            
(!is_pv_32on64_domain(d) ?
+              PGT_l4_page_table : PGT_l3_page_table) )
+        {
+            page->count_info        += 1;
+            page->u.inuse.type_info += 1 | PGT_pinned;
+        }
+
+        /* Iterate. */
+        if ( !((unsigned long)++pl1e & (PAGE_SIZE - 1)) )
+        {
+            if ( !((unsigned long)++pl2e & (PAGE_SIZE - 1)) )
+            {
+                if ( !((unsigned long)++pl3e & (PAGE_SIZE - 1)) )
+                    pl3e = l4e_to_l3e(*++pl4e);
+                pl2e = l3e_to_l2e(*pl3e);
+            }
+            pl1e = l2e_to_l1e(*pl2e);
+        }
+    }
+}
+
+/* Set up the phys->machine table if not part of the initial mapping. */
+static __init void setup_pv_physmap(struct domain *d, unsigned long pgtbl_pfn,
+                                    unsigned long v_start, unsigned long v_end,
+                                    unsigned long vphysmap_start,
+                                    unsigned long vphysmap_end,
+                                    unsigned long nr_pages)
+{
+    struct page_info *page = NULL;
+    l4_pgentry_t *pl4e = NULL, *l4start;
+    l3_pgentry_t *pl3e = NULL;
+    l2_pgentry_t *pl2e = NULL;
+    l1_pgentry_t *pl1e = NULL;
+
+    l4start = map_domain_page(pgtbl_pfn);
+
+    if ( v_start <= vphysmap_end && vphysmap_start <= v_end )
+        panic("DOM0 P->M table overlaps initial mapping");
+
+    while ( vphysmap_start < vphysmap_end )
+    {
+        if ( d->tot_pages + ((round_pgup(vphysmap_end) - vphysmap_start)
+                             >> PAGE_SHIFT) + 3 > nr_pages )
+            panic("Dom0 allocation too small for initial P->M
table.\n");
+
+        if ( pl1e )
+        {
+            unmap_domain_page(pl1e);
+            pl1e = NULL;
+        }
+        if ( pl2e )
+        {
+            unmap_domain_page(pl2e);
+            pl2e = NULL;
+        }
+        if ( pl3e )
+        {
+            unmap_domain_page(pl3e);
+            pl3e = NULL;
+        }
+        pl4e = l4start + l4_table_offset(vphysmap_start);
+        if ( !l4e_get_intpte(*pl4e) )
+        {
+            page = alloc_domheap_page(d, 0);
+            if ( !page )
+                break;
+
+            /* No mapping, PGC_allocated + page-table page. */
+            page->count_info = PGC_allocated | 2;
+            page->u.inuse.type_info = PGT_l3_page_table | PGT_validated | 1;
+            pl3e = __map_domain_page(page);
+            clear_page(pl3e);
+            *pl4e = l4e_from_page(page, L4_PROT);
+        } else
+            pl3e = map_domain_page(l4e_get_pfn(*pl4e));
+
+        pl3e += l3_table_offset(vphysmap_start);
+        if ( !l3e_get_intpte(*pl3e) )
+        {
+            if ( cpu_has_page1gb &&
+                 !(vphysmap_start & ((1UL << L3_PAGETABLE_SHIFT) -
1)) &&
+                 vphysmap_end >= vphysmap_start + (1UL <<
L3_PAGETABLE_SHIFT) &&
+                 (page = alloc_domheap_pages(d,
+                                             L3_PAGETABLE_SHIFT - PAGE_SHIFT,
+                                             0)) != NULL )
+            {
+                *pl3e = l3e_from_page(page, L1_PROT|_PAGE_DIRTY|_PAGE_PSE);
+                vphysmap_start += 1UL << L3_PAGETABLE_SHIFT;
+                continue;
+            }
+            if ( (page = alloc_domheap_page(d, 0)) == NULL )
+                break;
+
+            /* No mapping, PGC_allocated + page-table page. */
+            page->count_info = PGC_allocated | 2;
+            page->u.inuse.type_info = PGT_l2_page_table | PGT_validated | 1;
+            pl2e = __map_domain_page(page);
+            clear_page(pl2e);
+            *pl3e = l3e_from_page(page, L3_PROT);
+        }
+        else
+           pl2e = map_domain_page(l3e_get_pfn(*pl3e));
+
+        pl2e += l2_table_offset(vphysmap_start);
+        if ( !l2e_get_intpte(*pl2e) )
+        {
+            if ( !(vphysmap_start & ((1UL << L2_PAGETABLE_SHIFT) -
1)) &&
+                 vphysmap_end >= vphysmap_start + (1UL <<
L2_PAGETABLE_SHIFT) &&
+                 (page = alloc_domheap_pages(d,
+                                             L2_PAGETABLE_SHIFT - PAGE_SHIFT,
+                                             0)) != NULL )
+            {
+                *pl2e = l2e_from_page(page, L1_PROT|_PAGE_DIRTY|_PAGE_PSE);
+                if ( opt_allow_superpage )
+                    get_superpage(page_to_mfn(page), d);
+                vphysmap_start += 1UL << L2_PAGETABLE_SHIFT;
+                continue;
+            }
+            if ( (page = alloc_domheap_page(d, 0)) == NULL )
+                break;
+
+            /* No mapping, PGC_allocated + page-table page. */
+            page->count_info = PGC_allocated | 2;
+            page->u.inuse.type_info = PGT_l1_page_table | PGT_validated | 1;
+            pl1e = __map_domain_page(page);
+            clear_page(pl1e);
+            *pl2e = l2e_from_page(page, L2_PROT);
+        }
+        else
+            pl1e = map_domain_page(l2e_get_pfn(*pl2e));
+
+        pl1e += l1_table_offset(vphysmap_start);
+        BUG_ON(l1e_get_intpte(*pl1e));
+        page = alloc_domheap_page(d, 0);
+        if ( !page )
+            break;
+
+        *pl1e = l1e_from_page(page, L1_PROT|_PAGE_DIRTY);
+        vphysmap_start += PAGE_SIZE;
+        vphysmap_start &= PAGE_MASK;
+    }
+    if ( !page )
+        panic("Not enough RAM for DOM0 P->M table.\n");
+
+    if ( pl1e )
+        unmap_domain_page(pl1e);
+    if ( pl2e )
+        unmap_domain_page(pl2e);
+    if ( pl3e )
+        unmap_domain_page(pl3e);
+
+    unmap_domain_page(l4start);
+}
+
 int __init construct_dom0(
     struct domain *d,
     const module_t *image, unsigned long image_headroom,
@@ -705,44 +890,8 @@ int __init construct_dom0(
                COMPAT_L2_PAGETABLE_XEN_SLOTS(d) * sizeof(*l2tab));
     }
 
-    /* Pages that are part of page tables must be read only. */
-    l4tab = l4start + l4_table_offset(vpt_start);
-    l3start = l3tab = l4e_to_l3e(*l4tab);
-    l3tab += l3_table_offset(vpt_start);
-    l2start = l2tab = l3e_to_l2e(*l3tab);
-    l2tab += l2_table_offset(vpt_start);
-    l1start = l1tab = l2e_to_l1e(*l2tab);
-    l1tab += l1_table_offset(vpt_start);
-    for ( count = 0; count < nr_pt_pages; count++ ) 
-    {
-        l1e_remove_flags(*l1tab, _PAGE_RW);
-        page = mfn_to_page(l1e_get_pfn(*l1tab));
-
-        /* Read-only mapping + PGC_allocated + page-table page. */
-        page->count_info         = PGC_allocated | 3;
-        page->u.inuse.type_info |= PGT_validated | 1;
-
-        /* Top-level p.t. is pinned. */
-        if ( (page->u.inuse.type_info & PGT_type_mask) =-            
(!is_pv_32on64_domain(d) ?
-              PGT_l4_page_table : PGT_l3_page_table) )
-        {
-            page->count_info        += 1;
-            page->u.inuse.type_info += 1 | PGT_pinned;
-        }
-
-        /* Iterate. */
-        if ( !((unsigned long)++l1tab & (PAGE_SIZE - 1)) )
-        {
-            if ( !((unsigned long)++l2tab & (PAGE_SIZE - 1)) )
-            {
-                if ( !((unsigned long)++l3tab & (PAGE_SIZE - 1)) )
-                    l3start = l3tab = l4e_to_l3e(*++l4tab);
-                l2start = l2tab = l3e_to_l2e(*l3tab);
-            }
-            l1start = l1tab = l2e_to_l1e(*l2tab);
-        }
-    }
+    if  ( is_pv_domain(d) )
+        mark_pv_pt_pages_rdonly(d, l4start, vpt_start, nr_pt_pages);
 
     /* Mask all upcalls... */
     for ( i = 0; i < XEN_LEGACY_MAX_VCPUS; i++ )
@@ -814,132 +963,14 @@ int __init construct_dom0(
              elf_64bit(&elf) ? 64 : 32, parms.pae ? "p" :
"");
 
     count = d->tot_pages;
-    l4start = map_domain_page(pagetable_get_pfn(v->arch.guest_table));
-    l3tab = NULL;
-    l2tab = NULL;
-    l1tab = NULL;
-    /* Set up the phys->machine table if not part of the initial mapping. */
-    if ( parms.p2m_base != UNSET_ADDR )
-    {
-        unsigned long va = vphysmap_start;
 
-        if ( v_start <= vphysmap_end && vphysmap_start <= v_end )
-            panic("DOM0 P->M table overlaps initial mapping");
-
-        while ( va < vphysmap_end )
-        {
-            if ( d->tot_pages + ((round_pgup(vphysmap_end) - va)
-                                 >> PAGE_SHIFT) + 3 > nr_pages )
-                panic("Dom0 allocation too small for initial P->M
table.\n");
-
-            if ( l1tab )
-            {
-                unmap_domain_page(l1tab);
-                l1tab = NULL;
-            }
-            if ( l2tab )
-            {
-                unmap_domain_page(l2tab);
-                l2tab = NULL;
-            }
-            if ( l3tab )
-            {
-                unmap_domain_page(l3tab);
-                l3tab = NULL;
-            }
-            l4tab = l4start + l4_table_offset(va);
-            if ( !l4e_get_intpte(*l4tab) )
-            {
-                page = alloc_domheap_page(d, 0);
-                if ( !page )
-                    break;
-                /* No mapping, PGC_allocated + page-table page. */
-                page->count_info = PGC_allocated | 2;
-                page->u.inuse.type_info -                   
PGT_l3_page_table | PGT_validated | 1;
-                l3tab = __map_domain_page(page);
-                clear_page(l3tab);
-                *l4tab = l4e_from_page(page, L4_PROT);
-            } else
-                l3tab = map_domain_page(l4e_get_pfn(*l4tab));
-            l3tab += l3_table_offset(va);
-            if ( !l3e_get_intpte(*l3tab) )
-            {
-                if ( cpu_has_page1gb &&
-                     !(va & ((1UL << L3_PAGETABLE_SHIFT) - 1))
&&
-                     vphysmap_end >= va + (1UL << L3_PAGETABLE_SHIFT)
&&
-                     (page = alloc_domheap_pages(d,
-                                                 L3_PAGETABLE_SHIFT -
-                                                     PAGE_SHIFT,
-                                                 0)) != NULL )
-                {
-                    *l3tab = l3e_from_page(page,
-                                           L1_PROT|_PAGE_DIRTY|_PAGE_PSE);
-                    va += 1UL << L3_PAGETABLE_SHIFT;
-                    continue;
-                }
-                if ( (page = alloc_domheap_page(d, 0)) == NULL )
-                    break;
-                /* No mapping, PGC_allocated + page-table page. */
-                page->count_info = PGC_allocated | 2;
-                page->u.inuse.type_info -                   
PGT_l2_page_table | PGT_validated | 1;
-                l2tab = __map_domain_page(page);
-                clear_page(l2tab);
-                *l3tab = l3e_from_page(page, L3_PROT);
-            }
-            else
-               l2tab = map_domain_page(l3e_get_pfn(*l3tab));
-            l2tab += l2_table_offset(va);
-            if ( !l2e_get_intpte(*l2tab) )
-            {
-                if ( !(va & ((1UL << L2_PAGETABLE_SHIFT) - 1))
&&
-                     vphysmap_end >= va + (1UL << L2_PAGETABLE_SHIFT)
&&
-                     (page = alloc_domheap_pages(d,
-                                                 L2_PAGETABLE_SHIFT -
-                                                     PAGE_SHIFT,
-                                                 0)) != NULL )
-                {
-                    *l2tab = l2e_from_page(page,
-                                           L1_PROT|_PAGE_DIRTY|_PAGE_PSE);
-                    if ( opt_allow_superpage )
-                        get_superpage(page_to_mfn(page), d);
-                    va += 1UL << L2_PAGETABLE_SHIFT;
-                    continue;
-                }
-                if ( (page = alloc_domheap_page(d, 0)) == NULL )
-                    break;
-                /* No mapping, PGC_allocated + page-table page. */
-                page->count_info = PGC_allocated | 2;
-                page->u.inuse.type_info -                   
PGT_l1_page_table | PGT_validated | 1;
-                l1tab = __map_domain_page(page);
-                clear_page(l1tab);
-                *l2tab = l2e_from_page(page, L2_PROT);
-            }
-            else
-                l1tab = map_domain_page(l2e_get_pfn(*l2tab));
-            l1tab += l1_table_offset(va);
-            BUG_ON(l1e_get_intpte(*l1tab));
-            page = alloc_domheap_page(d, 0);
-            if ( !page )
-                break;
-            *l1tab = l1e_from_page(page, L1_PROT|_PAGE_DIRTY);
-            va += PAGE_SIZE;
-            va &= PAGE_MASK;
-        }
-        if ( !page )
-            panic("Not enough RAM for DOM0 P->M table.\n");
+    if ( is_pv_domain(d) && parms.p2m_base != UNSET_ADDR )
+    {
+        pfn = pagetable_get_pfn(v->arch.guest_table);
+        setup_pv_physmap(d, pfn, v_start, v_end, vphysmap_start, vphysmap_end,
+                         nr_pages);
     }
 
-    if ( l1tab )
-        unmap_domain_page(l1tab);
-    if ( l2tab )
-        unmap_domain_page(l2tab);
-    if ( l3tab )
-        unmap_domain_page(l3tab);
-    unmap_domain_page(l4start);
-
     /* Write the phys->machine and machine->phys table entries. */
     for ( pfn = 0; pfn < count; pfn++ )
     {
-- 
1.7.2.3

Mukesh Rathor

2013-Nov-09 01:23 UTC

head link

[V1 PATCH 06/11] PVH dom0: construct_dom0 changes

This patch changes construct_dom0 to boot in PVH mode. Changes
need to support it are also included here.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/arch/x86/domain_build.c |  224 +++++++++++++++++++++++++++++++++++++++----
 xen/arch/x86/domctl.c       |    2 +-
 xen/arch/x86/mm/hap/hap.c   |   15 +++
 xen/include/asm-x86/hap.h   |    1 +
 xen/include/xen/domain.h    |    3 +
 5 files changed, 227 insertions(+), 18 deletions(-)

diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
index c9ff680..d7c62d1 100644
--- a/xen/arch/x86/domain_build.c
+++ b/xen/arch/x86/domain_build.c
@@ -35,6 +35,7 @@
 #include <asm/setup.h>
 #include <asm/bzimage.h> /* for bzimage_parse */
 #include <asm/io_apic.h>
+#include <asm/hap.h>
 
 #include <public/version.h>
 
@@ -307,6 +308,140 @@ static void __init process_dom0_ioports_disable(void)
     }
 }
 
+/*
+ * Set the 1:1 map for all non-RAM regions for dom 0. Thus, dom0 will have
+ * the entire io region mapped in the EPT/NPT.
+ *
+ * PVH FIXME: The following doesn''t map MMIO ranges when they sit
above the
+ *            highest E820 covered address.
+ */
+static __init void pvh_map_all_iomem(struct domain *d)
+{
+    unsigned long start_pfn, end_pfn, end = 0, start = 0;
+    const struct e820entry *entry;
+    unsigned int i, nump;
+    int rc;
+
+    for ( i = 0, entry = e820.map; i < e820.nr_map; i++, entry++ )
+    {
+        end = entry->addr + entry->size;
+
+        if ( entry->type == E820_RAM || entry->type == E820_UNUSABLE ||
+             i == e820.nr_map - 1 )
+        {
+            start_pfn = PFN_DOWN(start);
+
+            /* Unused RAM areas are marked UNUSABLE, so skip it too */
+            if ( entry->type != E820_RAM && entry->type !=
E820_UNUSABLE )
+                end_pfn = PFN_UP(end);
+            else
+                end_pfn = PFN_UP(entry->addr);
+
+            if ( start_pfn < end_pfn )
+            {
+                nump = end_pfn - start_pfn;
+                /* Add pages to the mapping */
+                rc = update_memory_mapping(d, start_pfn, start_pfn, nump, 1);
+                BUG_ON(rc);
+            }
+            start = end;
+        }
+    }
+
+    /* If the e820 ended under 4GB, we must map the remaining space upto 4GB */
+    if ( end < GB(4) )
+    {
+        start_pfn = PFN_UP(end);
+        end_pfn = (GB(4)) >> PAGE_SHIFT;
+        nump = end_pfn - start_pfn;
+        rc = update_memory_mapping(d, start_pfn, start_pfn, nump, 1);
+        BUG_ON(rc);
+    }
+}
+
+static __init void dom0_update_physmap(struct domain *d, unsigned long pfn,
+                                   unsigned long mfn, unsigned long vphysmap_s)
+{
+    if ( is_pvh_domain(d) )
+    {
+        int rc = guest_physmap_add_page(d, pfn, mfn, 0);
+        BUG_ON(rc);
+        return;
+    }
+    if ( !is_pv_32on64_domain(d) )
+        ((unsigned long *)vphysmap_s)[pfn] = mfn;
+    else
+        ((unsigned int *)vphysmap_s)[pfn] = mfn;
+
+    set_gpfn_from_mfn(mfn, pfn);
+}
+
+static __init void pvh_fixup_page_tables_for_hap(struct vcpu *v,
+                                                 unsigned long v_start,
+                                                 unsigned long v_end)
+{
+    int i, j, k;
+    l4_pgentry_t *pl4e, *l4start;
+    l3_pgentry_t *pl3e;
+    l2_pgentry_t *pl2e;
+    l1_pgentry_t *pl1e;
+    unsigned long cr3_pfn;
+
+    ASSERT(paging_mode_enabled(v->domain));
+
+    l4start = map_domain_page(pagetable_get_pfn(v->arch.guest_table));
+
+    /* Give the guest a clean slate to start with */
+    pl4e = l4start + l4_table_offset(v_start);
+    memset(l4start, 0, (unsigned long)pl4e - (unsigned long)l4start);
+
+    pl4e = l4start + l4_table_offset(v_end) + 1;
+    if ( (unsigned long)pl4e & (PAGE_SIZE - 1) )
+        memset(pl4e, 0, PAGE_SIZE - ((unsigned long)pl4e & (PAGE_SIZE -
1)));
+
+    pl4e = l4start + l4_table_offset(v_start);
+    pl3e = map_l3t_from_l4e(*pl4e);
+    for ( i = 0; i < PAGE_SIZE / sizeof(*pl3e); i++, pl3e++ )
+    {
+        if ( !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) )
+            continue;
+
+        pl2e = map_l2t_from_l3e(*pl3e);
+        for ( j = 0; j < PAGE_SIZE / sizeof(*pl2e); j++, pl2e++ )
+        {
+            if ( !(l2e_get_flags(*pl2e)  & _PAGE_PRESENT) )
+                continue;
+
+            pl1e = map_l1t_from_l2e(*pl2e);
+
+            for ( k = 0; k < PAGE_SIZE / sizeof(*pl1e); k++, pl1e++ )
+            {
+                if ( !(l1e_get_flags(*pl1e) & _PAGE_PRESENT) )
+                    continue;
+
+                *pl1e = l1e_from_pfn(get_gpfn_from_mfn(l1e_get_pfn(*pl1e)),
+                                     l1e_get_flags(*pl1e));
+            }
+            *pl2e = l2e_from_pfn(get_gpfn_from_mfn(l2e_get_pfn(*pl2e)),
+                                 l2e_get_flags(*pl2e));
+        }
+        *pl3e = l3e_from_pfn(get_gpfn_from_mfn(l3e_get_pfn(*pl3e)),
+                             l3e_get_flags(*pl3e));
+    }
+    *pl4e = l4e_from_pfn(get_gpfn_from_mfn(l4e_get_pfn(*pl4e)),
+                         l4e_get_flags(*pl4e));
+
+    cr3_pfn = get_gpfn_from_mfn(paddr_to_pfn(v->arch.cr3));
+    v->arch.hvm_vcpu.guest_cr[3] = pfn_to_paddr(cr3_pfn);
+
+    /*
+     * Finally, we update the paging modes (hap_update_paging_modes). This will
+     * create monitor_table for us, update v->arch.cr3, and update vmcs.cr3.
+     */
+    paging_update_paging_modes(v);
+
+}
+
 /* Pages that are part of page tables must be read only. */
 static __init void mark_pv_pt_pages_rdonly(struct domain *d,
                                            l4_pgentry_t *l4start,
@@ -520,6 +655,8 @@ int __init construct_dom0(
     l3_pgentry_t *l3tab = NULL, *l3start = NULL;
     l2_pgentry_t *l2tab = NULL, *l2start = NULL;
     l1_pgentry_t *l1tab = NULL, *l1start = NULL;
+    paddr_t shared_info_paddr = 0;
+    u32 save_pvh_pg_mode = 0;
 
     /*
      * This fully describes the memory layout of the initial domain. All 
@@ -597,12 +734,21 @@ int __init construct_dom0(
         goto out;
     }
 
-    if ( parms.elf_notes[XEN_ELFNOTE_SUPPORTED_FEATURES].type != XEN_ENT_NONE
&&
-         !test_bit(XENFEAT_dom0, parms.f_supported) )
+    if ( parms.elf_notes[XEN_ELFNOTE_SUPPORTED_FEATURES].type != XEN_ENT_NONE )
     {
-        printk("Kernel does not support Dom0 operation\n");
-        rc = -EINVAL;
-        goto out;
+        if ( !test_bit(XENFEAT_dom0, parms.f_supported) )
+        {
+            printk("Kernel does not support Dom0 operation\n");
+            rc = -EINVAL;
+            goto out;
+        }
+        if ( is_pvh_domain(d) &&
+             !test_bit(XENFEAT_hvm_callback_vector, parms.f_supported) )
+        {
+            printk("Kernel does not support PVH mode\n");
+            rc = -EINVAL;
+            goto out;
+        }
     }
 
     if ( compat32 )
@@ -667,6 +813,13 @@ int __init construct_dom0(
     vstartinfo_end   = (vstartinfo_start +
                         sizeof(struct start_info) +
                         sizeof(struct dom0_vga_console_info));
+
+    if ( is_pvh_domain(d) )
+    {
+        shared_info_paddr = round_pgup(vstartinfo_end) - v_start;
+        vstartinfo_end   += PAGE_SIZE;
+    }
+
     vpt_start        = round_pgup(vstartinfo_end);
     for ( nr_pt_pages = 2; ; nr_pt_pages++ )
     {
@@ -906,6 +1059,13 @@ int __init construct_dom0(
         (void)alloc_vcpu(d, i, cpu);
     }
 
+    /*
+     * pvh: we temporarily disable paging mode so that we can build cr3 needed
+     * to run on dom0''s page tables.
+     */
+    save_pvh_pg_mode = d->arch.paging.mode;
+    d->arch.paging.mode = 0;
+
     /* Set up CR3 value for write_ptbase */
     if ( paging_mode_enabled(d) )
         paging_update_paging_modes(v);
@@ -971,6 +1131,15 @@ int __init construct_dom0(
                          nr_pages);
     }
 
+    if ( is_pvh_domain(d) )
+        hap_set_pvh_alloc_for_dom0(d, nr_pages);
+
+    /*
+     * We enable paging mode again so guest_physmap_add_page will do the
+     * right thing for us.
+     */
+    d->arch.paging.mode = save_pvh_pg_mode;
+
     /* Write the phys->machine and machine->phys table entries. */
     for ( pfn = 0; pfn < count; pfn++ )
     {
@@ -987,11 +1156,7 @@ int __init construct_dom0(
         if ( pfn > REVERSE_START && (vinitrd_start || pfn <
initrd_pfn) )
             mfn = alloc_epfn - (pfn - REVERSE_START);
 #endif
-        if ( !is_pv_32on64_domain(d) )
-            ((unsigned long *)vphysmap_start)[pfn] = mfn;
-        else
-            ((unsigned int *)vphysmap_start)[pfn] = mfn;
-        set_gpfn_from_mfn(mfn, pfn);
+        dom0_update_physmap(d, pfn, mfn, vphysmap_start);
         if (!(pfn & 0xfffff))
             process_pending_softirqs();
     }
@@ -1007,8 +1172,8 @@ int __init construct_dom0(
             if ( !page->u.inuse.type_info &&
                  !get_page_and_type(page, d, PGT_writable_page) )
                 BUG();
-            ((unsigned long *)vphysmap_start)[pfn] = mfn;
-            set_gpfn_from_mfn(mfn, pfn);
+
+            dom0_update_physmap(d, pfn, mfn, vphysmap_start);
             ++pfn;
             if (!(pfn & 0xfffff))
                 process_pending_softirqs();
@@ -1028,11 +1193,7 @@ int __init construct_dom0(
 #ifndef NDEBUG
 #define pfn (nr_pages - 1 - (pfn - (alloc_epfn - alloc_spfn)))
 #endif
-            if ( !is_pv_32on64_domain(d) )
-                ((unsigned long *)vphysmap_start)[pfn] = mfn;
-            else
-                ((unsigned int *)vphysmap_start)[pfn] = mfn;
-            set_gpfn_from_mfn(mfn, pfn);
+            dom0_update_physmap(d, pfn, mfn, vphysmap_start);
 #undef pfn
             page++; pfn++;
             if (!(pfn & 0xfffff))
@@ -1056,6 +1217,15 @@ int __init construct_dom0(
         si->console.dom0.info_size = sizeof(struct dom0_vga_console_info);
     }
 
+    /*
+     * PVH: We need to update si->shared_info while we are on dom0 page
tables,
+     * but need to defer the p2m update until after we have fixed up the
+     * page tables for PVH so that the m2p for the si pte entry returns
+     * correct pfn.
+     */
+    if ( is_pvh_domain(d) )
+        si->shared_info = shared_info_paddr;
+
     if ( is_pv_32on64_domain(d) )
         xlat_start_info(si, XLAT_start_info_console_dom0);
 
@@ -1089,8 +1259,15 @@ int __init construct_dom0(
     regs->eflags = X86_EFLAGS_IF;
 
     if ( opt_dom0_shadow )
+    {
+        if ( is_pvh_domain(d) )
+        {
+            printk("Invalid option dom0_shadow for PVH\n");
+            return -EINVAL;
+        }
         if ( paging_enable(d, PG_SH_enable) == 0 ) 
             paging_update_paging_modes(v);
+    }
 
     if ( supervisor_mode_kernel )
     {
@@ -1180,6 +1357,19 @@ int __init construct_dom0(
         printk(" Xen warning: dom0 kernel broken ELF: %s\n",
                elf_check_broken(&elf));
 
+    if ( is_pvh_domain(d) )
+    {
+        /* finally, fixup the page table, replacing mfns with pfns */
+        pvh_fixup_page_tables_for_hap(v, v_start, v_end);
+
+        /* the pt has correct pfn for si, now update the mfn in the p2m */
+        mfn = virt_to_mfn(d->shared_info);
+        pfn = shared_info_paddr >> PAGE_SHIFT;
+        dom0_update_physmap(d, pfn, mfn, 0);
+
+        pvh_map_all_iomem(d);
+    }
+
     iommu_dom0_init(dom0);
     return 0;
 
diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 31e1aed..9cf4467 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -46,7 +46,7 @@ static int gdbsx_guest_mem_io(
     return (iop->remain ? -EFAULT : 0);
 }
 
-static long update_memory_mapping(struct domain *d, unsigned long gfn,
+long update_memory_mapping(struct domain *d, unsigned long gfn,
                            unsigned long mfn, unsigned long nr_mfns,
                            bool_t add)
 {
diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index d3f64bd..4accab6 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -579,6 +579,21 @@ int hap_domctl(struct domain *d, xen_domctl_shadow_op_t
*sc,
     }
 }
 
+void __init hap_set_pvh_alloc_for_dom0(struct domain *d,
+                                       unsigned long num_pages)
+{
+    int rc;
+    unsigned long memkb = num_pages * (PAGE_SIZE / 1024);
+
+   /* Copied from: libxl_get_required_shadow_memory() */
+    memkb = 4 * (256 * d->max_vcpus + 2 * (memkb / 1024));
+    num_pages = ((memkb+1023)/1024) << (20 - PAGE_SHIFT);
+    paging_lock(d);
+    rc = hap_set_allocation(d, num_pages, NULL);
+    paging_unlock(d);
+    BUG_ON(rc);
+}
+
 static const struct paging_mode hap_paging_real_mode;
 static const struct paging_mode hap_paging_protected_mode;
 static const struct paging_mode hap_paging_pae_mode;
diff --git a/xen/include/asm-x86/hap.h b/xen/include/asm-x86/hap.h
index e03f983..aab8558 100644
--- a/xen/include/asm-x86/hap.h
+++ b/xen/include/asm-x86/hap.h
@@ -63,6 +63,7 @@ int   hap_track_dirty_vram(struct domain *d,
                            XEN_GUEST_HANDLE_64(uint8) dirty_bitmap);
 
 extern const struct paging_mode *hap_paging_get_mode(struct vcpu *);
+void hap_set_pvh_alloc_for_dom0(struct domain *d, unsigned long num_pages);
 
 #endif /* XEN_HAP_H */
 
diff --git a/xen/include/xen/domain.h b/xen/include/xen/domain.h
index a057069..bf44ad2 100644
--- a/xen/include/xen/domain.h
+++ b/xen/include/xen/domain.h
@@ -89,4 +89,7 @@ extern unsigned int xen_processor_pmbits;
 
 extern bool_t opt_dom0_vcpus_pin;
 
+extern long update_memory_mapping(struct domain *d, unsigned long gfn,
+                    unsigned long mfn, unsigned long nr_mfns, bool_t add_map);
+
 #endif /* __XEN_DOMAIN_H__ */
-- 
1.7.2.3

Mukesh Rathor

2013-Nov-09 01:23 UTC

head link

[V1 PATCH 07/11] PVH dom0: implement XENMEM_add_to_physmap_range for x86

This preparatory patch adds support for XENMEM_add_to_physmap_range
on x86 so it can be used to create a guest on PVH dom0. To this end, we
add a new function xenmem_add_to_physmap_range(), and change
xenmem_add_to_physmap_once parameters so it can be called from
xenmem_add_to_physmap_range.

Please note, compat will continue to return -ENOSYS.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/arch/x86/mm.c |   82 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 78 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index e1f47ab..2a6339a 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -2810,7 +2810,7 @@ static struct domain *get_pg_owner(domid_t domid)
         goto out;
     }
 
-    if ( unlikely(paging_mode_translate(curr)) )
+    if ( !is_pvh_domain(curr) && unlikely(paging_mode_translate(curr))
)
     {
         MEM_LOG("Cannot mix foreign mappings with translated
domains");
         goto out;
@@ -4522,7 +4522,8 @@ static int handle_iomem_range(unsigned long s, unsigned
long e, void *p)
 
 static int xenmem_add_to_physmap_once(
     struct domain *d,
-    const struct xen_add_to_physmap *xatp)
+    const struct xen_add_to_physmap *xatp,
+    domid_t foreign_domid)
 {
     struct page_info *page = NULL;
     unsigned long gfn = 0; /* gcc ... */
@@ -4646,7 +4647,7 @@ static int xenmem_add_to_physmap(struct domain *d,
         start_xatp = *xatp;
         while ( xatp->size > 0 )
         {
-            rc = xenmem_add_to_physmap_once(d, xatp);
+            rc = xenmem_add_to_physmap_once(d, xatp, DOMID_INVALID);
             if ( rc < 0 )
                 return rc;
 
@@ -4672,7 +4673,43 @@ static int xenmem_add_to_physmap(struct domain *d,
         return rc;
     }
 
-    return xenmem_add_to_physmap_once(d, xatp);
+    return xenmem_add_to_physmap_once(d, xatp, DOMID_INVALID);
+}
+
+static int xenmem_add_to_physmap_range(struct domain *d,
+                                       struct xen_add_to_physmap_range *xatpr)
+{
+    int rc;
+
+    /* Process entries in reverse order to allow continuations */
+    while ( xatpr->size > 0 )
+    {
+        xen_ulong_t idx;
+        xen_pfn_t gpfn;
+        struct xen_add_to_physmap xatp;
+
+        if ( copy_from_guest_offset(&idx, xatpr->idxs, xatpr->size-1,
1)  ||
+             copy_from_guest_offset(&gpfn, xatpr->gpfns,
xatpr->size-1, 1) )
+        {
+            return -EFAULT;
+        }
+
+        xatp.space = xatpr->space;
+        xatp.idx = idx;
+        xatp.gpfn = gpfn;
+        rc = xenmem_add_to_physmap_once(d, &xatp, xatpr->foreign_domid);
+
+        if ( copy_to_guest_offset(xatpr->errs, xatpr->size-1, &rc, 1)
)
+            return -EFAULT;
+
+        xatpr->size--;
+
+        /* Check for continuation if it''s not the last interation */
+        if ( xatpr->size > 0 && hypercall_preempt_check() )
+            return -EAGAIN;
+    }
+
+    return 0;
 }
 
 long arch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg)
@@ -4689,6 +4726,10 @@ long arch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void)
arg)
         if ( copy_from_guest(&xatp, arg, 1) )
             return -EFAULT;
 
+        /* This one is only supported for add_to_physmap_range */
+        if ( xatp.space == XENMAPSPACE_gmfn_foreign )
+            return -EINVAL;
+
         d = rcu_lock_domain_by_any_id(xatp.domid);
         if ( d == NULL )
             return -ESRCH;
@@ -4716,6 +4757,39 @@ long arch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void)
arg)
         return rc;
     }
 
+    case XENMEM_add_to_physmap_range:
+    {
+        struct xen_add_to_physmap_range xatpr;
+        struct domain *d;
+
+        if ( copy_from_guest(&xatpr, arg, 1) )
+            return -EFAULT;
+
+        /* This mapspace is redundant for this hypercall */
+        if ( xatpr.space == XENMAPSPACE_gmfn_range )
+            return -EINVAL;
+
+        d = rcu_lock_domain_by_any_id(xatpr.domid);
+        if ( d == NULL )
+            return -ESRCH;
+
+        if ( (rc = xsm_add_to_physmap(XSM_TARGET, current->domain, d)) )
+        {
+            rcu_unlock_domain(d);
+            return rc;
+        }
+
+        rc = xenmem_add_to_physmap_range(d, &xatpr);
+
+        rcu_unlock_domain(d);
+
+        if ( rc == -EAGAIN )
+            rc = hypercall_create_continuation(
+                __HYPERVISOR_memory_op, "ih", op, arg);
+
+        return rc;
+    }
+
     case XENMEM_set_memory_map:
     {
         struct xen_foreign_memory_map fmap;
-- 
1.7.2.3

Mukesh Rathor

2013-Nov-09 01:23 UTC

head link

[V1 PATCH 08/11] PVH dom0: Introduce p2m_map_foreign

In this patch, a new type p2m_map_foreign is introduced for pages
that toolstack on PVH dom0 maps from foreign domains that its creating.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/arch/x86/mm/p2m-ept.c |    1 +
 xen/arch/x86/mm/p2m-pt.c  |    1 +
 xen/arch/x86/mm/p2m.c     |   28 ++++++++++++++++++++--------
 xen/include/asm-x86/p2m.h |    4 ++++
 4 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index 595c6e7..67c200c 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -75,6 +75,7 @@ static void ept_p2m_type_to_flags(ept_entry_t *entry,
p2m_type_t type, p2m_acces
             entry->w = 0;
             break;
         case p2m_grant_map_rw:
+        case p2m_map_foreign:
             entry->r = entry->w = 1;
             entry->x = 0;
             break;
diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
index a1d5650..09b60ce 100644
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -89,6 +89,7 @@ static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t
mfn)
     case p2m_ram_rw:
         return flags | P2M_BASE_FLAGS | _PAGE_RW;
     case p2m_grant_map_rw:
+    case p2m_map_foreign:
         return flags | P2M_BASE_FLAGS | _PAGE_RW | _PAGE_NX_BIT;
     case p2m_mmio_direct:
         if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn_x(mfn)) )
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 8f380ed..b9755e4 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -525,7 +525,7 @@ p2m_remove_page(struct p2m_domain *p2m, unsigned long gfn,
unsigned long mfn,
         for ( i = 0; i < (1UL << page_order); i++ )
         {
             mfn_return = p2m->get_entry(p2m, gfn + i, &t, &a, 0,
NULL);
-            if ( !p2m_is_grant(t) && !p2m_is_shared(t) )
+            if ( !p2m_is_grant(t) && !p2m_is_shared(t) &&
!p2m_is_foreign(t) )
                 set_gpfn_from_mfn(mfn+i, INVALID_M2P_ENTRY);
             ASSERT( !p2m_is_valid(t) || mfn + i == mfn_x(mfn_return) );
         }
@@ -756,10 +756,9 @@ void p2m_change_type_range(struct domain *d,
     p2m_unlock(p2m);
 }
 
-
-
-int
-set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
+static int
+set_typed_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
+                    p2m_type_t gfn_p2mt)
 {
     int rc = 0;
     p2m_access_t a;
@@ -784,16 +783,29 @@ set_mmio_p2m_entry(struct domain *d, unsigned long gfn,
mfn_t mfn)
         set_gpfn_from_mfn(mfn_x(omfn), INVALID_M2P_ENTRY);
     }
 
-    P2M_DEBUG("set mmio %lx %lx\n", gfn, mfn_x(mfn));
-    rc = set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K, p2m_mmio_direct,
p2m->default_access);
+    P2M_DEBUG("set %d %lx %lx\n", gfn_p2mt, gfn, mfn_x(mfn));
+    rc = set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K, gfn_p2mt,
+                       p2m->default_access);
     gfn_unlock(p2m, gfn, 0);
     if ( 0 == rc )
         gdprintk(XENLOG_ERR,
-            "set_mmio_p2m_entry: set_p2m_entry failed! mfn=%08lx\n",
+            "%s: set_p2m_entry failed! mfn=%08lx\n", __func__,
             mfn_x(get_gfn_query_unlocked(p2m->domain, gfn, &ot)));
     return rc;
 }
 
+/* Returns: True for success. 0 for failure. */
+int set_foreign_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
+{
+    return set_typed_p2m_entry(d, gfn, mfn, p2m_map_foreign);
+}
+
+int
+set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
+{
+    return set_typed_p2m_entry(d, gfn, mfn, p2m_mmio_direct);
+}
+
 int
 clear_mmio_p2m_entry(struct domain *d, unsigned long gfn)
 {
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 43583b2..6fc71a1 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -70,6 +70,7 @@ typedef enum {
     p2m_ram_paging_in = 11,       /* Memory that is being paged in */
     p2m_ram_shared = 12,          /* Shared or sharable memory */
     p2m_ram_broken = 13,          /* Broken page, access cause domain crash */
+    p2m_map_foreign  = 14,        /* ram pages from foreign domain */
 } p2m_type_t;
 
 /*
@@ -180,6 +181,7 @@ typedef unsigned int p2m_query_t;
 #define p2m_is_sharable(_t) (p2m_to_mask(_t) & P2M_SHARABLE_TYPES)
 #define p2m_is_shared(_t)   (p2m_to_mask(_t) & P2M_SHARED_TYPES)
 #define p2m_is_broken(_t)   (p2m_to_mask(_t) & P2M_BROKEN_TYPES)
+#define p2m_is_foreign(_t)  (p2m_to_mask(_t) &
p2m_to_mask(p2m_map_foreign))
 
 /* Per-p2m-table state */
 struct p2m_domain {
@@ -510,6 +512,8 @@ p2m_type_t p2m_change_type(struct domain *d, unsigned long
gfn,
 int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn);
 int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn);
 
+/* Set foreign mfn in the current guest''s p2m table. */
+int set_foreign_p2m_entry(struct domain *domp, unsigned long gfn, mfn_t mfn);
 
 /* 
  * Populate-on-demand
-- 
1.7.2.3

Mukesh Rathor

2013-Nov-09 01:23 UTC

head link

[V1 PATCH 09/11] PVH dom0: Add and remove foreign pages

In this patch, a new function, xenmem_add_foreign_to_pmap(), is added
to map pages from foreign guest into current dom0 for domU creation.
Such pages are typed p2m_map_foreign. Also, support is added here to
XENMEM_remove_from_physmap to remove such pages. Note, in the remove
path, we must release the refcount that was taken during the map phase.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/arch/x86/mm.c           |   81 +++++++++++++++++++++++++++++++++++++++++++
 xen/common/memory.c         |   38 +++++++++++++++++---
 xen/include/public/memory.h |    2 +-
 3 files changed, 115 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 2a6339a..d6927a9 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4520,6 +4520,79 @@ static int handle_iomem_range(unsigned long s, unsigned
long e, void *p)
     return 0;
 }
 
+/*
+ * Add frames from foreign domain to current domain''s physmap. Similar
to
+ * XENMAPSPACE_gmfn but the frame is foreign being mapped into current,
+ * and is not removed from foreign domain.
+ * Usage: libxl on pvh dom0 creating a guest and doing privcmd_ioctl_mmap.
+ * Side Effect: the mfn for fgfn will be refcounted so it is not lost
+ *              while mapped here. The refcnt is released in do_memory_op()
+ *              via XENMEM_remove_from_physmap.
+ * Returns: 0 ==> success
+ */
+static int xenmem_add_foreign_to_pmap(domid_t foreign_domid,
+                                      unsigned long fgfn, unsigned long gpfn)
+{
+    p2m_type_t p2mt, p2mt_prev;
+    int rc = 0;
+    unsigned long prev_mfn, mfn = 0;
+    struct domain *fdom, *currd = current->domain;
+    struct page_info *page = NULL;
+
+    if ( currd->domain_id == foreign_domid || foreign_domid == DOMID_SELF ||
+         !is_pvh_domain(currd) )
+        return -EINVAL;
+
+    if ( !is_hardware_domain(currd) ||
+         (fdom = get_pg_owner(foreign_domid)) == NULL )
+        return -EPERM;
+
+    /* following will take a refcnt on the mfn */
+    page = get_page_from_gfn(fdom, fgfn, &p2mt, P2M_ALLOC);
+    if ( !page || !p2m_is_valid(p2mt) )
+    {
+        if ( page )
+            put_page(page);
+        put_pg_owner(fdom);
+        return -EINVAL;
+    }
+    mfn = page_to_mfn(page);
+
+    /* Remove previously mapped page if it is present. */
+    prev_mfn = mfn_x(get_gfn(currd, gpfn, &p2mt_prev));
+    if ( mfn_valid(prev_mfn) )
+    {
+        if ( is_xen_heap_mfn(prev_mfn) )
+            /* Xen heap frames are simply unhooked from this phys slot */
+            guest_physmap_remove_page(currd, gpfn, prev_mfn, 0);
+        else
+            /* Normal domain memory is freed, to avoid leaking memory. */
+            guest_remove_page(currd, gpfn);
+    }
+    /*
+     * Create the new mapping. Can''t use guest_physmap_add_page()
because it
+     * will update the m2p table which will result in  mfn -> gpfn of dom0
+     * and not fgfn of domU.
+     */
+    if ( set_foreign_p2m_entry(currd, gpfn, _mfn(mfn)) == 0 )
+    {
+        dprintk(XENLOG_WARNING,
+                "guest_physmap_add_page failed. gpfn:%lx mfn:%lx
fgfn:%lx\n",
+                gpfn, mfn, fgfn);
+        put_page(page);
+        rc = -EINVAL;
+    }
+
+    /*
+     * We must do this put_gfn after set_foreign_p2m_entry so another cpu
+     * doesn''t populate the gpfn before us.
+     */
+    put_gfn(currd, gpfn);
+
+    put_pg_owner(fdom);
+    return rc;
+}
+
 static int xenmem_add_to_physmap_once(
     struct domain *d,
     const struct xen_add_to_physmap *xatp,
@@ -4582,6 +4655,14 @@ static int xenmem_add_to_physmap_once(
             page = mfn_to_page(mfn);
             break;
         }
+
+        case XENMAPSPACE_gmfn_foreign:
+        {
+            rc = xenmem_add_foreign_to_pmap(foreign_domid, xatp->idx,
+                                            xatp->gpfn);
+            return rc;
+        }
+
         default:
             break;
     }
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 50b740f..d81df18 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -675,9 +675,11 @@ long do_memory_op(unsigned long cmd,
XEN_GUEST_HANDLE_PARAM(void) arg)
 
     case XENMEM_remove_from_physmap:
     {
+        unsigned long mfn;
         struct xen_remove_from_physmap xrfp;
         struct page_info *page;
-        struct domain *d;
+        struct domain *d, *foreign_dom = NULL;
+        p2m_type_t p2mt, tp;
 
         if ( copy_from_guest(&xrfp, arg, 1) )
             return -EFAULT;
@@ -693,11 +695,37 @@ long do_memory_op(unsigned long cmd,
XEN_GUEST_HANDLE_PARAM(void) arg)
             return rc;
         }
 
-        page = get_page_from_gfn(d, xrfp.gpfn, NULL, P2M_ALLOC);
-        if ( page )
+        /*
+         * if PVH, the gfn could be mapped to a mfn from foreign domain by the
+         * user space tool during domain creation. We need to check for that,
+         * free it up from the p2m, and release refcnt on it. In such a case,
+         * page would be NULL and the following call would not have
refcnt''d
+         * the page. See also xenmem_add_foreign_to_pmap().
+         */
+        page = get_page_from_gfn(d, xrfp.gpfn, &p2mt, P2M_ALLOC);
+
+        if ( page || p2m_is_foreign(p2mt) )
         {
-            guest_physmap_remove_page(d, xrfp.gpfn, page_to_mfn(page), 0);
-            put_page(page);
+            if ( page )
+                mfn = page_to_mfn(page);
+            else
+            {
+                mfn = mfn_x(get_gfn_query(d, xrfp.gpfn, &tp));
+                foreign_dom = page_get_owner(mfn_to_page(mfn));
+                ASSERT(is_pvh_domain(d));
+                ASSERT(d != foreign_dom);
+                ASSERT(p2m_is_foreign(tp));
+            }
+
+            guest_physmap_remove_page(d, xrfp.gpfn, mfn, 0);
+            if (page)
+                put_page(page);
+
+            if ( p2m_is_foreign(p2mt) )
+            {
+                put_page(mfn_to_page(mfn));
+                put_gfn(d, xrfp.gpfn);
+            }
         }
         else
             rc = -ENOENT;
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 7a26dee..e319b5d 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -208,7 +208,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_machphys_mapping_t);
 #define XENMAPSPACE_gmfn_range   3 /* GMFN range, XENMEM_add_to_physmap only.
*/
 #define XENMAPSPACE_gmfn_foreign 4 /* GMFN from another dom,
                                     * XENMEM_add_to_physmap_range only.
-                                    */
+                                    * (PVH x86 only) */
 /* ` } */
 
 /*
-- 
1.7.2.3

Mukesh Rathor

2013-Nov-09 01:23 UTC

head link

[V1 PATCH 10/11] PVH dom0: add opt_dom0pvh to setup.c

Add opt_dom0pvh. Note, PVH dom0 is disabled until the FIXME in 
domain_build.c is resolved.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/arch/x86/setup.c |   14 ++++++++++++--
 1 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 32b23fc..0ad1d73 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -56,6 +56,10 @@ int opt_earlykdb=0;
 boolean_param("earlykdb", opt_earlykdb);
 #endif
 
+/* Boot dom0 in PVH mode */
+bool_t __initdata opt_dom0pvh;
+boolean_param("dom0pvh", opt_dom0pvh);
+
 /* opt_nosmp: If true, secondary processors are ignored. */
 static bool_t __initdata opt_nosmp;
 boolean_param("nosmp", opt_nosmp);
@@ -552,7 +556,7 @@ void __init __start_xen(unsigned long mbi_p)
 {
     char *memmap_type = NULL;
     char *cmdline, *kextra, *loader;
-    unsigned int initrdidx;
+    unsigned int initrdidx, domcr_flags = 0;
     multiboot_info_t *mbi = __va(mbi_p);
     module_t *mod = (module_t *)__va(mbi->mods_addr);
     unsigned long nr_pages, raw_max_page, modules_headroom, *module_map;
@@ -1334,8 +1338,14 @@ void __init __start_xen(unsigned long mbi_p)
     if ( !tboot_protect_mem_regions() )
         panic("Could not protect TXT memory regions\n");
 
+    /* Following removed when "PVH FIXME" in domain_build.c is
resolved */
+    if ( opt_dom0pvh )
+        panic("You do NOT have the correct xen version for dom0
PVH\n");
+
     /* Create initial domain 0. */
-    dom0 = domain_create(0, DOMCRF_s3_integrity, 0);
+    domcr_flags = (opt_dom0pvh ? DOMCRF_pvh | DOMCRF_hap : 0);
+    domcr_flags |= DOMCRF_s3_integrity;
+    dom0 = domain_create(0, domcr_flags, 0);
     if ( IS_ERR(dom0) || (alloc_dom0_vcpu0() == NULL) )
         panic("Error creating domain 0\n");
 
-- 
1.7.2.3

Jan Beulich

2013-Nov-12 16:08 UTC

head link

Re: [V1 PATCH 03/11] PVH dom0: iommu related changes

>>> On 09.11.13 at 02:23, Mukesh Rathor
<mukesh.rathor@oracle.com> wrote:
> --- a/xen/drivers/passthrough/iommu.c
> +++ b/xen/drivers/passthrough/iommu.c
> @@ -125,15 +125,27 @@ int iommu_domain_init(struct domain *d)
>      return hd->platform_ops->init(d);
>  }
>  
> +static inline void check_dom0_pvh_reqs(struct domain *d)
No point for this to be an inline function. Even more, irrespective
of whether it is, it should also be marked __init.
> @@ -141,12 +153,13 @@ void __init iommu_dom0_init(struct domain *d)
>          page_list_for_each ( page, &d->page_list )
>          {
>              unsigned long mfn = page_to_mfn(page);
> +            unsigned long gfn = mfn_to_gfn(d, _mfn(mfn));
So the other place in this file uses mfn_to_gmfn(), and I''d really like
two to remain in sync, so that things don''t get even more confusing
(i.e. it''s bad enough already that we have two almost identical
constructs).

Jan

Jan Beulich

2013-Nov-12 16:12 UTC

head link

Re: [V1 PATCH 04/11] PVH dom0: create update_memory_mapping() function

>>> On 09.11.13 at 02:23, Mukesh Rathor
<mukesh.rathor@oracle.com> wrote:
> Note, iomem_access_permitted can''t be moved to the function
> because current doesn''t point to dom0 in construct_dom0.
And it would be rather pointless to try to check this. For Dom0 you
just need to make sure you don''t call the function for inappropriate
ranges.
> +static long update_memory_mapping(struct domain *d, unsigned long gfn,
Is there any code path really returning something that doesn''t
fit in an int?
> +                           unsigned long mfn, unsigned long nr_mfns,
> +                           bool_t add)
> +{
> +    unsigned long i;
> +    long ret;
Same question here.

Jan

Konrad Rzeszutek Wilk

2013-Nov-12 16:13 UTC

head link

Re: [V1 PATCH 06/11] PVH dom0: construct_dom0 changes

On Fri, Nov 08, 2013 at 05:23:31PM -0800, Mukesh Rathor
wrote:> This patch changes construct_dom0 to boot in PVH mode. Changes
> need to support it are also included here.
> 
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> ---
>  xen/arch/x86/domain_build.c |  224
+++++++++++++++++++++++++++++++++++++++----
>  xen/arch/x86/domctl.c       |    2 +-
>  xen/arch/x86/mm/hap/hap.c   |   15 +++
>  xen/include/asm-x86/hap.h   |    1 +
>  xen/include/xen/domain.h    |    3 +
>  5 files changed, 227 insertions(+), 18 deletions(-)
> 
> diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
> index c9ff680..d7c62d1 100644
> --- a/xen/arch/x86/domain_build.c
> +++ b/xen/arch/x86/domain_build.c
> @@ -35,6 +35,7 @@
>  #include <asm/setup.h>
>  #include <asm/bzimage.h> /* for bzimage_parse */
>  #include <asm/io_apic.h>
> +#include <asm/hap.h>
>  
>  #include <public/version.h>
>  
> @@ -307,6 +308,140 @@ static void __init process_dom0_ioports_disable(void)
>      }
>  }
>  
> +/*
> + * Set the 1:1 map for all non-RAM regions for dom 0. Thus, dom0 will have
> + * the entire io region mapped in the EPT/NPT.
> + *
> + * PVH FIXME: The following doesn''t map MMIO ranges when they sit
above the
> + *            highest E820 covered address.
> + */
> +static __init void pvh_map_all_iomem(struct domain *d)
> +{
> +    unsigned long start_pfn, end_pfn, end = 0, start = 0;
> +    const struct e820entry *entry;
> +    unsigned int i, nump;
> +    int rc;
> +
> +    for ( i = 0, entry = e820.map; i < e820.nr_map; i++, entry++ )
> +    {
> +        end = entry->addr + entry->size;
> +
> +        if ( entry->type == E820_RAM || entry->type == E820_UNUSABLE
||
> +             i == e820.nr_map - 1 )
> +        {
> +            start_pfn = PFN_DOWN(start);
> +
> +            /* Unused RAM areas are marked UNUSABLE, so skip it too */
> +            if ( entry->type != E820_RAM && entry->type !=
E820_UNUSABLE )
The conditional above is of: if ( .. == E820_RAM || .. E820_UNUSABLE )

why not replicate it here as well? This way it follows the same logic:

if ( entry->type == E820_RAM || entry->type == E820_UNUSABLE)
	end_pfn = PFN_UP(entry->addr);
else
	end_pfn = PFN_UP(end);

?
> +                end_pfn = PFN_UP(end);
> +            else
> +                end_pfn = PFN_UP(entry->addr);
> +
> +            if ( start_pfn < end_pfn )
> +            {
> +                nump = end_pfn - start_pfn;
> +                /* Add pages to the mapping */
> +                rc = update_memory_mapping(d, start_pfn, start_pfn, nump,
1);
> +                BUG_ON(rc);
> +            }
> +            start = end;
> +        }
> +    }
> +
> +    /* If the e820 ended under 4GB, we must map the remaining space upto
4GB */
Could you explain a bit more of ''why''? What if the machine
only has 3GB and
we want to boot dom0 with 512MB.
> +    if ( end < GB(4) )
> +    {
> +        start_pfn = PFN_UP(end);
> +        end_pfn = (GB(4)) >> PAGE_SHIFT;
> +        nump = end_pfn - start_pfn;
> +        rc = update_memory_mapping(d, start_pfn, start_pfn, nump, 1);
> +        BUG_ON(rc);
> +    }
> +}
Thank you!

Konrad Rzeszutek Wilk

2013-Nov-12 16:16 UTC

head link

Re: [V1 PATCH 08/11] PVH dom0: Introduce p2m_map_foreign

On Fri, Nov 08, 2013 at 05:23:33PM -0800, Mukesh Rathor
wrote:> In this patch, a new type p2m_map_foreign is introduced for pages
> that toolstack on PVH dom0 maps from foreign domains that its creating.
Is it only for creating? I thought it was used during run-time as well?
Say you need to use xenctx or anything else to map a foreign domain?
> 
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> ---
>  xen/arch/x86/mm/p2m-ept.c |    1 +
>  xen/arch/x86/mm/p2m-pt.c  |    1 +
>  xen/arch/x86/mm/p2m.c     |   28 ++++++++++++++++++++--------
>  xen/include/asm-x86/p2m.h |    4 ++++
>  4 files changed, 26 insertions(+), 8 deletions(-)
> 
> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
> index 595c6e7..67c200c 100644
> --- a/xen/arch/x86/mm/p2m-ept.c
> +++ b/xen/arch/x86/mm/p2m-ept.c
> @@ -75,6 +75,7 @@ static void ept_p2m_type_to_flags(ept_entry_t *entry,
p2m_type_t type, p2m_acces
>              entry->w = 0;
>              break;
>          case p2m_grant_map_rw:
> +        case p2m_map_foreign:
>              entry->r = entry->w = 1;
>              entry->x = 0;
>              break;
> diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
> index a1d5650..09b60ce 100644
> --- a/xen/arch/x86/mm/p2m-pt.c
> +++ b/xen/arch/x86/mm/p2m-pt.c
> @@ -89,6 +89,7 @@ static unsigned long p2m_type_to_flags(p2m_type_t t,
mfn_t mfn)
>      case p2m_ram_rw:
>          return flags | P2M_BASE_FLAGS | _PAGE_RW;
>      case p2m_grant_map_rw:
> +    case p2m_map_foreign:
>          return flags | P2M_BASE_FLAGS | _PAGE_RW | _PAGE_NX_BIT;
>      case p2m_mmio_direct:
>          if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn_x(mfn)) )
> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> index 8f380ed..b9755e4 100644
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -525,7 +525,7 @@ p2m_remove_page(struct p2m_domain *p2m, unsigned long
gfn, unsigned long mfn,
>          for ( i = 0; i < (1UL << page_order); i++ )
>          {
>              mfn_return = p2m->get_entry(p2m, gfn + i, &t, &a,
0, NULL);
> -            if ( !p2m_is_grant(t) && !p2m_is_shared(t) )
> +            if ( !p2m_is_grant(t) && !p2m_is_shared(t) &&
!p2m_is_foreign(t) )
>                  set_gpfn_from_mfn(mfn+i, INVALID_M2P_ENTRY);
>              ASSERT( !p2m_is_valid(t) || mfn + i == mfn_x(mfn_return) );
>          }
> @@ -756,10 +756,9 @@ void p2m_change_type_range(struct domain *d,
>      p2m_unlock(p2m);
>  }
>  
> -
> -
> -int
> -set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
> +static int
> +set_typed_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
> +                    p2m_type_t gfn_p2mt)
>  {
>      int rc = 0;
>      p2m_access_t a;
> @@ -784,16 +783,29 @@ set_mmio_p2m_entry(struct domain *d, unsigned long
gfn, mfn_t mfn)
>          set_gpfn_from_mfn(mfn_x(omfn), INVALID_M2P_ENTRY);
>      }
>  
> -    P2M_DEBUG("set mmio %lx %lx\n", gfn, mfn_x(mfn));
> -    rc = set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K, p2m_mmio_direct,
p2m->default_access);
> +    P2M_DEBUG("set %d %lx %lx\n", gfn_p2mt, gfn, mfn_x(mfn));
> +    rc = set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K, gfn_p2mt,
> +                       p2m->default_access);
>      gfn_unlock(p2m, gfn, 0);
>      if ( 0 == rc )
>          gdprintk(XENLOG_ERR,
> -            "set_mmio_p2m_entry: set_p2m_entry failed!
mfn=%08lx\n",
> +            "%s: set_p2m_entry failed! mfn=%08lx\n", __func__,
>              mfn_x(get_gfn_query_unlocked(p2m->domain, gfn, &ot)));
>      return rc;
>  }
>  
> +/* Returns: True for success. 0 for failure. */
Um, ''true'' and ''0'' ? Shouldn''t it
say ''True for success, false for failure?''

Or ''1 for success, 0 for failure?''

> +int set_foreign_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
> +{
> +    return set_typed_p2m_entry(d, gfn, mfn, p2m_map_foreign);
> +}
> +
> +int
> +set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
> +{
> +    return set_typed_p2m_entry(d, gfn, mfn, p2m_mmio_direct);
> +}
> +
>  int
>  clear_mmio_p2m_entry(struct domain *d, unsigned long gfn)
>  {
> diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
> index 43583b2..6fc71a1 100644
> --- a/xen/include/asm-x86/p2m.h
> +++ b/xen/include/asm-x86/p2m.h
> @@ -70,6 +70,7 @@ typedef enum {
>      p2m_ram_paging_in = 11,       /* Memory that is being paged in */
>      p2m_ram_shared = 12,          /* Shared or sharable memory */
>      p2m_ram_broken = 13,          /* Broken page, access cause domain
crash */
> +    p2m_map_foreign  = 14,        /* ram pages from foreign domain */
>  } p2m_type_t;
>  
>  /*
> @@ -180,6 +181,7 @@ typedef unsigned int p2m_query_t;
>  #define p2m_is_sharable(_t) (p2m_to_mask(_t) & P2M_SHARABLE_TYPES)
>  #define p2m_is_shared(_t)   (p2m_to_mask(_t) & P2M_SHARED_TYPES)
>  #define p2m_is_broken(_t)   (p2m_to_mask(_t) & P2M_BROKEN_TYPES)
> +#define p2m_is_foreign(_t)  (p2m_to_mask(_t) &
p2m_to_mask(p2m_map_foreign))
>  
>  /* Per-p2m-table state */
>  struct p2m_domain {
> @@ -510,6 +512,8 @@ p2m_type_t p2m_change_type(struct domain *d, unsigned
long gfn,
>  int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn);
>  int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn);
>  
> +/* Set foreign mfn in the current guest''s p2m table. */
> +int set_foreign_p2m_entry(struct domain *domp, unsigned long gfn, mfn_t
mfn);
>  
>  /* 
>   * Populate-on-demand
> -- 
> 1.7.2.3
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

Konrad Rzeszutek Wilk

2013-Nov-12 16:18 UTC

head link

Re: [V1 PATCH 10/11] PVH dom0: add opt_dom0pvh to setup.c

On Fri, Nov 08, 2013 at 05:23:35PM -0800, Mukesh Rathor
wrote:> Add opt_dom0pvh. Note, PVH dom0 is disabled until the FIXME in 
> domain_build.c is resolved.
Could you refer to in this description to which commit that is
please?> 
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> ---
>  xen/arch/x86/setup.c |   14 ++++++++++++--
>  1 files changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
> index 32b23fc..0ad1d73 100644
> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -56,6 +56,10 @@ int opt_earlykdb=0;
>  boolean_param("earlykdb", opt_earlykdb);
>  #endif
>  
> +/* Boot dom0 in PVH mode */
> +bool_t __initdata opt_dom0pvh;
> +boolean_param("dom0pvh", opt_dom0pvh);
> +
>  /* opt_nosmp: If true, secondary processors are ignored. */
>  static bool_t __initdata opt_nosmp;
>  boolean_param("nosmp", opt_nosmp);
> @@ -552,7 +556,7 @@ void __init __start_xen(unsigned long mbi_p)
>  {
>      char *memmap_type = NULL;
>      char *cmdline, *kextra, *loader;
> -    unsigned int initrdidx;
> +    unsigned int initrdidx, domcr_flags = 0;
>      multiboot_info_t *mbi = __va(mbi_p);
>      module_t *mod = (module_t *)__va(mbi->mods_addr);
>      unsigned long nr_pages, raw_max_page, modules_headroom, *module_map;
> @@ -1334,8 +1338,14 @@ void __init __start_xen(unsigned long mbi_p)
>      if ( !tboot_protect_mem_regions() )
>          panic("Could not protect TXT memory regions\n");
>  
> +    /* Following removed when "PVH FIXME" in domain_build.c is
resolved */
Please also mention the title of the patch that added the FIXME.
> +    if ( opt_dom0pvh )
> +        panic("You do NOT have the correct xen version for dom0
PVH\n");                                              ^-X
> +
>      /* Create initial domain 0. */
> -    dom0 = domain_create(0, DOMCRF_s3_integrity, 0);
> +    domcr_flags = (opt_dom0pvh ? DOMCRF_pvh | DOMCRF_hap : 0);
> +    domcr_flags |= DOMCRF_s3_integrity;
> +    dom0 = domain_create(0, domcr_flags, 0);
>      if ( IS_ERR(dom0) || (alloc_dom0_vcpu0() == NULL) )
>          panic("Error creating domain 0\n");
>  
> -- 
> 1.7.2.3
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

Konrad Rzeszutek Wilk

2013-Nov-12 16:19 UTC

head link

Re: [V1 PATCH 01/11] PVH dom0: set eflags resvd bit #1

On Fri, Nov 08, 2013 at 05:23:26PM -0800, Mukesh Rathor
wrote:> In this patch the eflags resv bit #1 is set in vmx_vmenter_helper. If
> the bit is not set, the vmlaunch/resume will fail with guest state
> invalid.
What does ''MBS'' stand for? Somehow I assumed you would call
it RSV????> 
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> ---
>  xen/arch/x86/domain.c           |    2 --
>  xen/arch/x86/hvm/hvm.c          |    7 ++-----
>  xen/arch/x86/hvm/vmx/vmx.c      |    2 +-
>  xen/include/asm-x86/processor.h |    1 +
>  4 files changed, 4 insertions(+), 8 deletions(-)
> 
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index 0736a4b..e9ab6c8 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -732,8 +732,6 @@ int arch_set_info_guest(
>      for ( i = 0; i < ARRAY_SIZE(v->arch.debugreg); ++i )
>          v->arch.debugreg[i] = c(debugreg[i]);
>  
> -    v->arch.user_regs.eflags |= 2;
> -
>      if ( !is_pv_vcpu(v) )
>      {
>          hvm_set_info_guest(v);
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 61b5d8e..0d7ab44 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -952,7 +952,7 @@ static int hvm_load_cpu_ctxt(struct domain *d,
hvm_domain_context_t *h)
>      v->arch.user_regs.edi = ctxt.rdi;
>      v->arch.user_regs.esp = ctxt.rsp;
>      v->arch.user_regs.eip = ctxt.rip;
> -    v->arch.user_regs.eflags = ctxt.rflags | 2;
> +    v->arch.user_regs.eflags = ctxt.rflags | X86_EFLAGS_MBS;
>      v->arch.user_regs.r8  = ctxt.r8;
>      v->arch.user_regs.r9  = ctxt.r9;
>      v->arch.user_regs.r10 = ctxt.r10;
> @@ -1133,7 +1133,6 @@ static int pvh_vcpu_initialise(struct vcpu *v)
>                           (unsigned long)v);
>  
>      v->arch.hvm_vcpu.hcall_64bit = 1;    /* PVH 32bitfixme. */
> -    v->arch.user_regs.eflags = 2;
>      v->arch.hvm_vcpu.inject_trap.vector = -1;
>  
>      if ( (rc = hvm_vcpu_cacheattr_init(v)) != 0 )
> @@ -1218,8 +1217,6 @@ int hvm_vcpu_initialise(struct vcpu *v)
>          (void(*)(unsigned long))hvm_assert_evtchn_irq,
>          (unsigned long)v);
>  
> -    v->arch.user_regs.eflags = 2;
> -
>      if ( v->vcpu_id == 0 )
>      {
>          /* NB. All these really belong in hvm_domain_initialise(). */
> @@ -3619,7 +3616,7 @@ void hvm_vcpu_reset_state(struct vcpu *v, uint16_t
cs, uint16_t ip)
>  
>      v->arch.vgc_flags = VGCF_online;
>      memset(&v->arch.user_regs, 0, sizeof(v->arch.user_regs));
> -    v->arch.user_regs.eflags = 2;
> +    v->arch.user_regs.eflags = X86_EFLAGS_MBS;
>      v->arch.user_regs.edx = 0x00000f00;
>      v->arch.user_regs.eip = ip;
>      memset(&v->arch.debugreg, 0, sizeof(v->arch.debugreg));
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index e93fd98..1f39220 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -3033,7 +3033,7 @@ void vmx_vmenter_helper(const struct cpu_user_regs
*regs)
>  
>      __vmwrite(GUEST_RIP,    regs->rip);
>      __vmwrite(GUEST_RSP,    regs->rsp);
> -    __vmwrite(GUEST_RFLAGS, regs->rflags);
> +    __vmwrite(GUEST_RFLAGS, regs->rflags | X86_EFLAGS_MBS);
>  }
>  
>  /*
> diff --git a/xen/include/asm-x86/processor.h
b/xen/include/asm-x86/processor.h
> index 551036d..73a3202 100644
> --- a/xen/include/asm-x86/processor.h
> +++ b/xen/include/asm-x86/processor.h
> @@ -35,6 +35,7 @@
>   * EFLAGS bits
>   */
>  #define X86_EFLAGS_CF	0x00000001 /* Carry Flag */
> +#define X86_EFLAGS_MBS	0x00000002 /* Resvd bit */
>  #define X86_EFLAGS_PF	0x00000004 /* Parity Flag */
>  #define X86_EFLAGS_AF	0x00000010 /* Auxillary carry Flag */
>  #define X86_EFLAGS_ZF	0x00000040 /* Zero Flag */
> -- 
> 1.7.2.3
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

Jan Beulich

2013-Nov-12 16:24 UTC

head link

Re: [V1 PATCH 01/11] PVH dom0: set eflags resvd bit #1

>>> On 12.11.13 at 17:19, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> On Fri, Nov 08, 2013 at 05:23:26PM -0800, Mukesh Rathor wrote:
>> In this patch the eflags resv bit #1 is set in vmx_vmenter_helper. If
>> the bit is not set, the vmlaunch/resume will fail with guest state
>> invalid.
> 
> What does ''MBS'' stand for? Somehow I assumed you would
call
> it RSV????
With MBZ = "must be zero", MBS is "must be set". And
I''m fine
with that - RSV doesn''t tell you which way it is reserved.

Jan

Konrad Rzeszutek Wilk

2013-Nov-12 16:31 UTC

head link

Re: [V1 PATCH 01/11] PVH dom0: set eflags resvd bit #1

On Tue, Nov 12, 2013 at 04:24:50PM +0000, Jan Beulich
wrote:> >>> On 12.11.13 at 17:19, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> > On Fri, Nov 08, 2013 at 05:23:26PM -0800, Mukesh Rathor wrote:
> >> In this patch the eflags resv bit #1 is set in vmx_vmenter_helper.
If
> >> the bit is not set, the vmlaunch/resume will fail with guest state
> >> invalid.
> > 
> > What does ''MBS'' stand for? Somehow I assumed you
would call
> > it RSV????
> 
> With MBZ = "must be zero", MBS is "must be set". And
I''m fine
> with that - RSV doesn''t tell you which way it is reserved.
Ah, thank you for the explanation!> 
> Jan
>

Jan Beulich

2013-Nov-12 16:35 UTC

head link

Re: [V1 PATCH 06/11] PVH dom0: construct_dom0 changes

>>> On 09.11.13 at 02:23, Mukesh Rathor
<mukesh.rathor@oracle.com> wrote:
> +/*
> + * Set the 1:1 map for all non-RAM regions for dom 0. Thus, dom0 will have
> + * the entire io region mapped in the EPT/NPT.
> + *
> + * PVH FIXME: The following doesn''t map MMIO ranges when they sit
above the
All other fixme-s of yours so far used lower case. Please - be
consistent.
> +static __init void pvh_fixup_page_tables_for_hap(struct vcpu *v,
> +                                                 unsigned long v_start,
> +                                                 unsigned long v_end)
> +{
> +    int i, j, k;
> +    l4_pgentry_t *pl4e, *l4start;
> +    l3_pgentry_t *pl3e;
> +    l2_pgentry_t *pl2e;
> +    l1_pgentry_t *pl1e;
> +    unsigned long cr3_pfn;
> +
> +    ASSERT(paging_mode_enabled(v->domain));
> +
> +    l4start = map_domain_page(pagetable_get_pfn(v->arch.guest_table));
> +
> +    /* Give the guest a clean slate to start with */
> +    pl4e = l4start + l4_table_offset(v_start);
> +    memset(l4start, 0, (unsigned long)pl4e - (unsigned long)l4start);
> +
> +    pl4e = l4start + l4_table_offset(v_end) + 1;
> +    if ( (unsigned long)pl4e & (PAGE_SIZE - 1) )
> +        memset(pl4e, 0, PAGE_SIZE - ((unsigned long)pl4e & (PAGE_SIZE
- 1)));
These look more complicated than necessary (largely due to the
casts), but be it that way.
> +    pl4e = l4start + l4_table_offset(v_start);
>...
> +    *pl4e = l4e_from_pfn(get_gpfn_from_mfn(l4e_get_pfn(*pl4e)),
> +                         l4e_get_flags(*pl4e));
But this one I told before needs to be in a loop. You must not make
assumptions on guest virtual address space layout, and hence you
must not assume none of the initial mapping crosses an L4 boundary.

And once these are in a loop, getting the earlier two loops simplified
(using l4e_empty() rather than plain memset()) and ordered properly
(the head part before the main loop, the tail part after) will be almost
obvious legibility cleanups.

Jan

Jan Beulich

2013-Nov-12 16:46 UTC

head link

Re: [V1 PATCH 07/11] PVH dom0: implement XENMEM_add_to_physmap_range for x86

>>> On 09.11.13 at 02:23, Mukesh Rathor
<mukesh.rathor@oracle.com> wrote:
> @@ -4522,7 +4522,8 @@ static int handle_iomem_range(unsigned long s,
unsigned long e, void *p)
>  
>  static int xenmem_add_to_physmap_once(
>      struct domain *d,
> -    const struct xen_add_to_physmap *xatp)
> +    const struct xen_add_to_physmap *xatp,
> +    domid_t foreign_domid)
I continue to be confused by this parameter being added, but not
being used. I can only repeat - logically consistent pieces please.
At least from looking at the patch alone, a user of
XENMEM_add_to_physmap_range will, after this patch, not get
what it is supposed to get.

Jan

Jan Beulich

2013-Nov-12 16:58 UTC

head link

Re: [V1 PATCH 09/11] PVH dom0: Add and remove foreign pages

>>> On 09.11.13 at 02:23, Mukesh Rathor
<mukesh.rathor@oracle.com> wrote:
> +static int xenmem_add_foreign_to_pmap(domid_t foreign_domid,
> +                                      unsigned long fgfn, unsigned long
gpfn)
> +{
> +    p2m_type_t p2mt, p2mt_prev;
> +    int rc = 0;
> +    unsigned long prev_mfn, mfn = 0;
> +    struct domain *fdom, *currd = current->domain;
> +    struct page_info *page = NULL;
> +
> +    if ( currd->domain_id == foreign_domid || foreign_domid ==
DOMID_SELF ||
> +         !is_pvh_domain(currd) )
> +        return -EINVAL;
> +
> +    if ( !is_hardware_domain(currd) ||
is_control_domain()

Jan

Roger Pau Monné

2013-Nov-14 11:38 UTC

head link

Re: [V1 PATCH 0/11]: PVH dom0....

On 09/11/13 02:23, Mukesh Rathor wrote:> These patches implement PVH dom0. Please note there is 1 fixme in
> this entire series that is being addressed under a different patch 
> series.  PVH dom0 creation is disabled until then.
> 
> Patches 1 thru 6 implement changes in and around construct_dom0.
> Patches 7 thru 11 are to support tool stack on PVH dom0, and have
> been mostly looked at in the past.
Thanks for the patches :)

I guess they are based on top of George''s latest PVH DomU series (v15)
that were already committed. Which patch is needed in order to run a PVH
Dom0? Also, is the series available on a git repository that I pull from?

Roger.

Mukesh Rathor

2013-Nov-14 22:42 UTC

head link

Re: [V1 PATCH 0/11]: PVH dom0....

On Thu, 14 Nov 2013 12:38:10 +0100
Roger Pau Monné <roger.pau@citrix.com> wrote:
> On 09/11/13 02:23, Mukesh Rathor wrote:
> > These patches implement PVH dom0. Please note there is 1 fixme in
> > this entire series that is being addressed under a different patch 
> > series.  PVH dom0 creation is disabled until then.
> > 
> > Patches 1 thru 6 implement changes in and around construct_dom0.
> > Patches 7 thru 11 are to support tool stack on PVH dom0, and have
> > been mostly looked at in the past.
> 
> Thanks for the patches :)
> 
> I guess they are based on top of George's latest PVH DomU series (v15)
> that were already committed. Which patch is needed in order to run a
> PVH Dom0? Also, is the series available on a git repository that I
> pull from?
> 
> Roger.
Hey Roger,

No, I was going to base them on v14, but then there was talk of making
changes again, and looked like it had unresolved issues. These are based
on my latest pathes rebased to: c/s 170fa99 

If v15 is checked in very soon, I would be glad to rebase the next
version. 

Let me get back to you on the external repo, may be next
version I'll create one, unless you need sooner.

thanks,
Mukesh


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Mukesh Rathor

2013-Nov-15 01:43 UTC

head link

Re: [V1 PATCH 03/11] PVH dom0: iommu related changes

On Tue, 12 Nov 2013 16:08:51 +0000
"Jan Beulich" <JBeulich@suse.com> wrote:
> >>> On 09.11.13 at 02:23, Mukesh Rathor
<mukesh.rathor@oracle.com>
> >>> wrote:
> > --- a/xen/drivers/passthrough/iommu.c
> > +++ b/xen/drivers/passthrough/iommu.c
> > @@ -125,15 +125,27 @@ int iommu_domain_init(struct domain *d)
> >      return hd->platform_ops->init(d);
> >  }
> >  
> > +static inline void check_dom0_pvh_reqs(struct domain *d)
> 
> No point for this to be an inline function. Even more, irrespective
> of whether it is, it should also be marked __init.
Ok, marked __init, removed inline. Or did you mean get
rid of the function and move checks to iommu_dom0_init()?

thanks,
m

Mukesh Rathor

2013-Nov-15 01:59 UTC

head link

Re: [V1 PATCH 04/11] PVH dom0: create update_memory_mapping() function

On Tue, 12 Nov 2013 16:12:13 +0000
"Jan Beulich" <JBeulich@suse.com> wrote:
> >>> On 09.11.13 at 02:23, Mukesh Rathor
<mukesh.rathor@oracle.com>
> >>> wrote:
> > Note, iomem_access_permitted can''t be moved to the function
> > because current doesn''t point to dom0 in construct_dom0.
> 
> And it would be rather pointless to try to check this. For Dom0 you
> just need to make sure you don''t call the function for
inappropriate
> ranges.
> 
> > +static long update_memory_mapping(struct domain *d, unsigned long
> > gfn,
> 
> Is there any code path really returning something that doesn''t
> fit in an int?
Well, the caller arch_do_domctl() returns long and has it declared
long too. Want me to change it to int and it will implicitly get
cast''d?

long arch_do_domctl(
    struct xen_domctl *domctl, struct domain *d,
    XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
{
    long ret = 0;
>> +                           unsigned long mfn, unsigned long nr_mfns,
>> +                           bool_t add)
>> +{
>> +    unsigned long i;
>> +    long ret;  
>Same question here.
Original code has loop count as ulong, prob because nr_mfns is long.

    case XEN_DOMCTL_memory_mapping:
    {
        unsigned long gfn = domctl->u.memory_mapping.first_gfn;
        ..
        unsigned long i;

Want me to change it to int also?

thanks
Mukesh

Mukesh Rathor

2013-Nov-15 02:21 UTC

head link

Re: [V1 PATCH 06/11] PVH dom0: construct_dom0 changes

On Tue, 12 Nov 2013 11:13:31 -0500
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Fri, Nov 08, 2013 at 05:23:31PM -0800, Mukesh Rathor wrote:
> > This patch changes construct_dom0 to boot in PVH mode. Changes
> > need to support it are also included here.
> > 
> > Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> > ---
> >  xen/arch/x86/domain_build.c |  224
> > +++++++++++++++++++++++++++++++++++++++----
> > xen/arch/x86/domctl.c       |    2 +- xen/arch/x86/mm/hap/hap.c
> > |   15 +++ xen/include/asm-x86/hap.h   |    1 +
> >  xen/include/xen/domain.h    |    3 +
> >  5 files changed, 227 insertions(+), 18 deletions(-)
> > 
> > diff --git a/xen/arch/x86/domain_build.c
> > b/xen/arch/x86/domain_build.c index c9ff680..d7c62d1 100644
> > --- a/xen/arch/x86/domain_build.c
> > +++ b/xen/arch/x86/domain_build.c
> > @@ -35,6 +35,7 @@
> >  #include <asm/setup.h>
> >  #include <asm/bzimage.h> /* for bzimage_parse */
> >  #include <asm/io_apic.h>
> > +#include <asm/hap.h>
> >  
> >  #include <public/version.h>
> >  
> > @@ -307,6 +308,140 @@ static void __init
> > process_dom0_ioports_disable(void) }
> >  }
> >  
> > +/*
> > + * Set the 1:1 map for all non-RAM regions for dom 0. Thus, dom0
> > will have
> > + * the entire io region mapped in the EPT/NPT.
> > + *
> > + * PVH FIXME: The following doesn''t map MMIO ranges when
they sit
> > above the
> > + *            highest E820 covered address.
> > + */
> > +static __init void pvh_map_all_iomem(struct domain *d)
> > +{
> > +    unsigned long start_pfn, end_pfn, end = 0, start = 0;
> > +    const struct e820entry *entry;
> > +    unsigned int i, nump;
> > +    int rc;
> > +
> > +    for ( i = 0, entry = e820.map; i < e820.nr_map; i++, entry++ )
> > +    {
> > +        end = entry->addr + entry->size;
> > +
> > +        if ( entry->type == E820_RAM || entry->type => >
E820_UNUSABLE ||
> > +             i == e820.nr_map - 1 )
> > +        {
> > +            start_pfn = PFN_DOWN(start);
> > +
> > +            /* Unused RAM areas are marked UNUSABLE, so skip it
> > too */
> > +            if ( entry->type != E820_RAM && entry->type
!> > E820_UNUSABLE )
> 
> The conditional above is of: if ( .. == E820_RAM || .. E820_UNUSABLE )
> 
> why not replicate it here as well? This way it follows the same logic:
> 
> if ( entry->type == E820_RAM || entry->type == E820_UNUSABLE)
> 	end_pfn = PFN_UP(entry->addr);
> else
> 	end_pfn = PFN_UP(end);
Yeah, OK.
> > +                end_pfn = PFN_UP(end);
> > +            else
> > +                end_pfn = PFN_UP(entry->addr);
> > +
> > +            if ( start_pfn < end_pfn )
> > +            {
> > +                nump = end_pfn - start_pfn;
> > +                /* Add pages to the mapping */
> > +                rc = update_memory_mapping(d, start_pfn,
> > start_pfn, nump, 1);
> > +                BUG_ON(rc);
> > +            }
> > +            start = end;
> > +        }
> > +    }
> > +
> > +    /* If the e820 ended under 4GB, we must map the remaining
> > space upto 4GB */
> 
> Could you explain a bit more of ''why''? What if the
machine only has
> 3GB and we want to boot dom0 with 512MB.
That''s fine. We are mapping the non-ram region, beyond last e820 entry.

Jan, you had pointed this out in earlier review. With the other patch
addressing mmio space above last e820 mapping, this will not be needed
anymore right? can I just remove it from the next patch version then?

thanks
Mukesh

Mukesh Rathor

2013-Nov-15 02:34 UTC

head link

Re: [V1 PATCH 06/11] PVH dom0: construct_dom0 changes

On Tue, 12 Nov 2013 16:35:05 +0000
"Jan Beulich" <JBeulich@suse.com> wrote:
> >>> On 09.11.13 at 02:23, Mukesh Rathor
<mukesh.rathor@oracle.com>
> >>> wrote:
......> These look more complicated than necessary (largely due to the
> casts), but be it that way.
> 
> > +    pl4e = l4start + l4_table_offset(v_start);
> >...
> > +    *pl4e = l4e_from_pfn(get_gpfn_from_mfn(l4e_get_pfn(*pl4e)),
> > +                         l4e_get_flags(*pl4e));
> 
> But this one I told before needs to be in a loop. You must not make
> assumptions on guest virtual address space layout, and hence you
> must not assume none of the initial mapping crosses an L4 boundary.
Ah right, my bad.
> And once these are in a loop, getting the earlier two loops simplified
> (using l4e_empty() rather than plain memset()) and ordered properly
> (the head part before the main loop, the tail part after) will be
> almost obvious legibility cleanups.
Actually, the loop will go from v_start->L4 to v_end->L4. The memsets
are clearing the entries before v_start and after v_end. So, I''m
thinking something like:

clear entries before v_start using memset
clear entries after v_end using memset

for ( pl4e = l4start + l4_table_offset(v_start);
      pl4e <= l4start + l4_table_offset(v_end); 
      pl4e++)
{
    pl3e = map_l3t_from_l4e(*pl4e);
    for ( i = 0; i < PAGE_SIZE / sizeof(*pl3e); i++, pl3e++ )
    {
    ...
}

Look ok?

thanks,
Mukesh

Jan Beulich

2013-Nov-15 07:36 UTC

head link

Re: [V1 PATCH 03/11] PVH dom0: iommu related changes

>>> On 15.11.13 at 02:43, Mukesh Rathor
<mukesh.rathor@oracle.com> wrote:
> On Tue, 12 Nov 2013 16:08:51 +0000
> "Jan Beulich" <JBeulich@suse.com> wrote:
> 
>> >>> On 09.11.13 at 02:23, Mukesh Rathor
<mukesh.rathor@oracle.com>
>> >>> wrote:
>> > --- a/xen/drivers/passthrough/iommu.c
>> > +++ b/xen/drivers/passthrough/iommu.c
>> > @@ -125,15 +125,27 @@ int iommu_domain_init(struct domain *d)
>> >      return hd->platform_ops->init(d);
>> >  }
>> >  
>> > +static inline void check_dom0_pvh_reqs(struct domain *d)
>> 
>> No point for this to be an inline function. Even more, irrespective
>> of whether it is, it should also be marked __init.
> 
> Ok, marked __init, removed inline. Or did you mean get
> rid of the function and move checks to iommu_dom0_init()?
I would have preferred so, but I''m fine with what you did.

Jan

Jan Beulich

2013-Nov-15 07:38 UTC

head link

Re: [V1 PATCH 04/11] PVH dom0: create update_memory_mapping() function

>>> On 15.11.13 at 02:59, Mukesh Rathor
<mukesh.rathor@oracle.com> wrote:
> On Tue, 12 Nov 2013 16:12:13 +0000
> "Jan Beulich" <JBeulich@suse.com> wrote:
> 
>> >>> On 09.11.13 at 02:23, Mukesh Rathor
<mukesh.rathor@oracle.com>
>> >>> wrote:
>> > Note, iomem_access_permitted can''t be moved to the
function
>> > because current doesn''t point to dom0 in construct_dom0.
>> 
>> And it would be rather pointless to try to check this. For Dom0 you
>> just need to make sure you don''t call the function for
inappropriate
>> ranges.
>> 
>> > +static long update_memory_mapping(struct domain *d, unsigned long
>> > gfn,
>> 
>> Is there any code path really returning something that doesn''t
>> fit in an int?
> 
> Well, the caller arch_do_domctl() returns long and has it declared
> long too. Want me to change it to int and it will implicitly get
cast''d?
Yes.
> long arch_do_domctl(
>     struct xen_domctl *domctl, struct domain *d,
>     XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
> {
>     long ret = 0;
> 
>>> +                           unsigned long mfn, unsigned long
nr_mfns,
>>> +                           bool_t add)
>>> +{
>>> +    unsigned long i;
>>> +    long ret;  
> 
>>Same question here.
> 
> Original code has loop count as ulong, prob because nr_mfns is long.
> 
>     case XEN_DOMCTL_memory_mapping:
>     {
>         unsigned long gfn = domctl->u.memory_mapping.first_gfn;
>         ..
>         unsigned long i;
> 
> Want me to change it to int also?
The remark was about "ret", not "i" or some other count.

Jan

Jan Beulich

2013-Nov-15 07:40 UTC

head link

Re: [V1 PATCH 06/11] PVH dom0: construct_dom0 changes

>>> On 15.11.13 at 03:34, Mukesh Rathor
<mukesh.rathor@oracle.com> wrote:
> On Tue, 12 Nov 2013 16:35:05 +0000
> "Jan Beulich" <JBeulich@suse.com> wrote:
> 
>> >>> On 09.11.13 at 02:23, Mukesh Rathor
<mukesh.rathor@oracle.com>
>> >>> wrote:
> ......
>> These look more complicated than necessary (largely due to the
>> casts), but be it that way.
>> 
>> > +    pl4e = l4start + l4_table_offset(v_start);
>> >...
>> > +    *pl4e = l4e_from_pfn(get_gpfn_from_mfn(l4e_get_pfn(*pl4e)),
>> > +                         l4e_get_flags(*pl4e));
>> 
>> But this one I told before needs to be in a loop. You must not make
>> assumptions on guest virtual address space layout, and hence you
>> must not assume none of the initial mapping crosses an L4 boundary.
> 
> Ah right, my bad.
> 
>> And once these are in a loop, getting the earlier two loops simplified
>> (using l4e_empty() rather than plain memset()) and ordered properly
>> (the head part before the main loop, the tail part after) will be
>> almost obvious legibility cleanups.
> 
> Actually, the loop will go from v_start->L4 to v_end->L4. The memsets
> are clearing the entries before v_start and after v_end. So, I''m
> thinking something like:
> 
> clear entries before v_start using memset
> clear entries after v_end using memset
> 
> for ( pl4e = l4start + l4_table_offset(v_start);
>       pl4e <= l4start + l4_table_offset(v_end); 
>       pl4e++)
> {
>     pl3e = map_l3t_from_l4e(*pl4e);
>     for ( i = 0; i < PAGE_SIZE / sizeof(*pl3e); i++, pl3e++ )
>     {
>     ...
> }
> 
> Look ok?
Yes, with the leading part getting cleared prior to the loop, and
the trailing part after (thus allowing to not re-initialized pl4e in
each loop''s for()).

And the ending condition of course if off by one above: You
really want pl4e <= l4start + l4_table_offset(v_end - 1).

Jan

Jan Beulich

2013-Nov-15 07:55 UTC

head link

Re: [V1 PATCH 0/11]: PVH dom0....

>>> On 14.11.13 at 23:42, Mukesh Rathor
<mukesh.rathor@oracle.com> wrote:
> On Thu, 14 Nov 2013 12:38:10 +0100
> Roger Pau Monné <roger.pau@citrix.com> wrote:
> 
>> On 09/11/13 02:23, Mukesh Rathor wrote:
>> > These patches implement PVH dom0. Please note there is 1 fixme in
>> > this entire series that is being addressed under a different patch
>> > series.  PVH dom0 creation is disabled until then.
>> > 
>> > Patches 1 thru 6 implement changes in and around construct_dom0.
>> > Patches 7 thru 11 are to support tool stack on PVH dom0, and have
>> > been mostly looked at in the past.
>> 
>> Thanks for the patches :)
>> 
>> I guess they are based on top of George's latest PVH DomU series
(v15)
>> that were already committed. Which patch is needed in order to run a
>> PVH Dom0? Also, is the series available on a git repository that I
>> pull from?
>> 
>> Roger.
> 
> Hey Roger,
> 
> No, I was going to base them on v14, but then there was talk of making
> changes again, and looked like it had unresolved issues. These are based
> on my latest pathes rebased to: c/s 170fa99 
> 
> If v15 is checked in very soon, I would be glad to rebase the next
> version. 
As you may have noticed, v15 did go in before you wrote this reply.
As did the first two patches of your Dom0 series (suitably fixed up
to apply on top of v15).

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Jan Beulich

2013-Nov-15 07:59 UTC

head link

Re: [V1 PATCH 06/11] PVH dom0: construct_dom0 changes

>>> On 15.11.13 at 03:21, Mukesh Rathor
<mukesh.rathor@oracle.com> wrote:
> On Tue, 12 Nov 2013 11:13:31 -0500
> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>> On Fri, Nov 08, 2013 at 05:23:31PM -0800, Mukesh Rathor wrote:
>> > +                end_pfn = PFN_UP(end);
>> > +            else
>> > +                end_pfn = PFN_UP(entry->addr);
>> > +
>> > +            if ( start_pfn < end_pfn )
>> > +            {
>> > +                nump = end_pfn - start_pfn;
>> > +                /* Add pages to the mapping */
>> > +                rc = update_memory_mapping(d, start_pfn,
>> > start_pfn, nump, 1);
>> > +                BUG_ON(rc);
>> > +            }
>> > +            start = end;
>> > +        }
>> > +    }
>> > +
>> > +    /* If the e820 ended under 4GB, we must map the remaining
>> > space upto 4GB */
>> 
>> Could you explain a bit more of ''why''? What if the
machine only has
>> 3GB and we want to boot dom0 with 512MB.
> 
> That''s fine. We are mapping the non-ram region, beyond last e820
entry.
> 
> Jan, you had pointed this out in earlier review. With the other patch
> addressing mmio space above last e820 mapping, this will not be needed
> anymore right? can I just remove it from the next patch version then?
Not sure what needs explaining there: The address range right
below 4G is used for various sorts of non-RAM on _all_ systems
I''ve ever seen, so not mapping the (top-of-RAM, 4Gb) range
would seem rather awkward - I doubt you could bring up Dom0
there. (And just to make this explicit, a good part of the data
structures here is _not_ enumerable via PCI device BARs.)

Jan

Xen devel - Nov 2013 - [V1 PATCH 0/11]: PVH dom0....

[V1 PATCH 0/11]: PVH dom0....

[V1 PATCH 01/11] PVH dom0: set eflags resvd bit #1

[V1 PATCH 02/11] PVH dom0: Allow physdevops for PVH dom0

[V1 PATCH 03/11] PVH dom0: iommu related changes

[V1 PATCH 04/11] PVH dom0: create update_memory_mapping() function

[V1 PATCH 05/11] PVH dom0: move some pv specific code to static functions

[V1 PATCH 06/11] PVH dom0: construct_dom0 changes

[V1 PATCH 07/11] PVH dom0: implement XENMEM_add_to_physmap_range for x86

[V1 PATCH 08/11] PVH dom0: Introduce p2m_map_foreign

[V1 PATCH 09/11] PVH dom0: Add and remove foreign pages

[V1 PATCH 10/11] PVH dom0: add opt_dom0pvh to setup.c

Re: [V1 PATCH 03/11] PVH dom0: iommu related changes

Re: [V1 PATCH 04/11] PVH dom0: create update_memory_mapping() function

Re: [V1 PATCH 06/11] PVH dom0: construct_dom0 changes

Re: [V1 PATCH 08/11] PVH dom0: Introduce p2m_map_foreign

Re: [V1 PATCH 10/11] PVH dom0: add opt_dom0pvh to setup.c

Re: [V1 PATCH 01/11] PVH dom0: set eflags resvd bit #1

Re: [V1 PATCH 01/11] PVH dom0: set eflags resvd bit #1

Re: [V1 PATCH 01/11] PVH dom0: set eflags resvd bit #1

Re: [V1 PATCH 06/11] PVH dom0: construct_dom0 changes

Re: [V1 PATCH 07/11] PVH dom0: implement XENMEM_add_to_physmap_range for x86

Re: [V1 PATCH 09/11] PVH dom0: Add and remove foreign pages

Re: [V1 PATCH 0/11]: PVH dom0....

Re: [V1 PATCH 0/11]: PVH dom0....

Re: [V1 PATCH 03/11] PVH dom0: iommu related changes

Re: [V1 PATCH 04/11] PVH dom0: create update_memory_mapping() function

Re: [V1 PATCH 06/11] PVH dom0: construct_dom0 changes

Re: [V1 PATCH 06/11] PVH dom0: construct_dom0 changes

Re: [V1 PATCH 03/11] PVH dom0: iommu related changes

Re: [V1 PATCH 04/11] PVH dom0: create update_memory_mapping() function

Re: [V1 PATCH 06/11] PVH dom0: construct_dom0 changes

Re: [V1 PATCH 0/11]: PVH dom0....

Re: [V1 PATCH 06/11] PVH dom0: construct_dom0 changes