Jaeyong Yoo
2013-Nov-08 07:50 UTC
[PATCH RESEND v5 0/6] xen/arm: live migration support in arndale board
Note: resending the patch series since the last one was whitespace-corrupted.

Hello! Here goes the v5 patch series. The major changes in this version are the following:

1) Save and restore the vcpu-related registers in the hvm context (patch 1).
2) Rather than using a set_memory_map hypercall, use the GUEST_RAM_BASE macro as the start of the guest RAM physical address and just let the hypervisor know the maximum memory of the guest (patches 2 and 6).
3) Use a bitmap, rather than a linked list, for temporarily recording dirty pages (patch 5).

Patch 1 implements hvm save/restore (including vcpu).
Patch 2 implements the 'get_maximum_gpfn' hypercall.
Patch 3 implements 'modify_returncode' for switching the return value of the suspend hypercall from domU.
Patch 4 implements the base functions for VLPT.
Patch 5 implements dirty-page tracing using VLPT.
Patch 6 implements the toolstack part for live migration of ARM guests.

NOTE: In this version, I do not use the p2m walker infrastructure for p2m_change_entry_type_global. I think I can apply it after Stefano's patch is committed.

Best,
Jaeyong

Alexey Sokolov (2):
  xen/arm: Implement modify_returncode
  xen/arm: Implement toolstack for xl restore/save and migrate

Evgeny Fedotov (2):
  xen/arm: Implement hvm save and restore
  xen/arm: Implement get_maximum_gpfn hypercall for arm

Jaeyong Yoo (2):
  xen/arm: Implement virtual-linear page table for guest p2m mapping in
    live migration
  xen/arm: Implement hypercall for dirty page tracing

 config/arm32.mk                        |   1 +
 tools/libxc/Makefile                   |   6 +-
 tools/libxc/xc_arm_migrate.c           | 712 +++++++++++++++++++++++++++++++++
 tools/libxc/xc_dom_arm.c               |   4 +-
 tools/libxc/xc_resume.c                |  25 ++
 tools/misc/Makefile                    |   4 +-
 xen/arch/arm/Makefile                  |   1 +
 xen/arch/arm/domain.c                  |  19 +
 xen/arch/arm/domctl.c                  |  98 ++++-
 xen/arch/arm/hvm.c                     | 464 ++++++++++++++++++++-
 xen/arch/arm/mm.c                      | 237 ++++++++++-
 xen/arch/arm/p2m.c                     | 206 ++++++++++
 xen/arch/arm/save.c                    |  66 +++
 xen/arch/arm/traps.c                   |   9 +
 xen/common/Makefile                    |   2 +
 xen/include/asm-arm/arm32/page.h       |  41 +-
 xen/include/asm-arm/config.h           |   5 +
 xen/include/asm-arm/domain.h           |  14 +
 xen/include/asm-arm/hvm/support.h      |  29 ++
 xen/include/asm-arm/mm.h               |  24 ++
 xen/include/asm-arm/p2m.h              |   4 +
 xen/include/asm-arm/processor.h        |   2 +
 xen/include/public/arch-arm/hvm/save.h | 130 ++++++
 23 files changed, 2084 insertions(+), 19 deletions(-)
 create mode 100644 tools/libxc/xc_arm_migrate.c
 create mode 100644 xen/arch/arm/save.c
 create mode 100644 xen/include/asm-arm/hvm/support.h

-- 
1.8.1.2
Jaeyong Yoo
2013-Nov-08 07:50 UTC
[PATCH RESEND v5 1/6] xen/arm: Implement hvm save and restore
From: Evgeny Fedotov <e.fedotov@samsung.com> Implement save/restore of hvm context hypercall. In hvm context save/restore, we save gic, timer and vfp registers. Changes from v4: Save vcpu registers within hvm context, and purge the save-vcpu-register patch. Singed-off-by: Evgeny Fedotov <e.fedotov@samsung.com> --- xen/arch/arm/Makefile | 1 + xen/arch/arm/domctl.c | 89 ++++++- xen/arch/arm/hvm.c | 464 ++++++++++++++++++++++++++++++++- xen/arch/arm/save.c | 66 +++++ xen/common/Makefile | 2 + xen/include/asm-arm/hvm/support.h | 29 +++ xen/include/public/arch-arm/hvm/save.h | 130 +++++++++ 7 files changed, 779 insertions(+), 2 deletions(-) create mode 100644 xen/arch/arm/save.c create mode 100644 xen/include/asm-arm/hvm/support.h diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile index 003ac84..8910a6c 100644 --- a/xen/arch/arm/Makefile +++ b/xen/arch/arm/Makefile @@ -31,6 +31,7 @@ obj-y += vuart.o obj-y += hvm.o obj-y += device.o obj-y += decode.o +obj-y += save.o #obj-bin-y += ....o diff --git a/xen/arch/arm/domctl.c b/xen/arch/arm/domctl.c index 851ee40..cb38e59 100644 --- a/xen/arch/arm/domctl.c +++ b/xen/arch/arm/domctl.c @@ -9,12 +9,99 @@ #include <xen/lib.h> #include <xen/errno.h> #include <xen/sched.h> +#include <xen/hvm/save.h> +#include <xen/guest_access.h> #include <public/domctl.h> long arch_do_domctl(struct xen_domctl *domctl, struct domain *d, XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl) { - return -ENOSYS; + long ret = 0; + bool_t copyback = 0; + + switch ( domctl->cmd ) + { + case XEN_DOMCTL_sethvmcontext: + { + struct hvm_domain_context c = { .size = domctl->u.hvmcontext.size }; + + ret = -ENOMEM; + if ( (c.data = xmalloc_bytes(c.size)) == NULL ) + goto sethvmcontext_out; + + ret = -EFAULT; + if ( copy_from_guest(c.data, domctl->u.hvmcontext.buffer, c.size) != 0 ) + goto sethvmcontext_out; + + domain_pause(d); + ret = hvm_load(d, &c); + domain_unpause(d); + + sethvmcontext_out: + if ( c.data != NULL ) + xfree(c.data); + } + break; + case XEN_DOMCTL_gethvmcontext: + { + struct hvm_domain_context c = { 0 }; + + ret = -EINVAL; + + c.size = hvm_save_size(d); + + if ( guest_handle_is_null(domctl->u.hvmcontext.buffer) ) + { + /* Client is querying for the correct buffer size */ + domctl->u.hvmcontext.size = c.size; + ret = 0; + goto gethvmcontext_out; + } + + /* Check that the client has a big enough buffer */ + ret = -ENOSPC; + if ( domctl->u.hvmcontext.size < c.size ) + { + printk("(gethvmcontext) size error: %d and %d\n", + domctl->u.hvmcontext.size, c.size ); + goto gethvmcontext_out; + } + + /* Allocate our own marshalling buffer */ + ret = -ENOMEM; + if ( (c.data = xmalloc_bytes(c.size)) == NULL ) + { + printk("(gethvmcontext) xmalloc_bytes failed: %d\n", c.size ); + goto gethvmcontext_out; + } + + domain_pause(d); + ret = hvm_save(d, &c); + domain_unpause(d); + + domctl->u.hvmcontext.size = c.cur; + if ( copy_to_guest(domctl->u.hvmcontext.buffer, c.data, c.size) != 0 ) + { + printk("(gethvmcontext) copy to guest failed\n"); + ret = -EFAULT; + } + + gethvmcontext_out: + copyback = 1; + + if ( c.data != NULL ) + xfree(c.data); + } + break; + + default: + return -EINVAL; + } + + if ( copyback && __copy_to_guest(u_domctl, domctl, 1) ) + ret = -EFAULT; + + return ret; } void arch_get_info_guest(struct vcpu *v, vcpu_guest_context_u c) diff --git a/xen/arch/arm/hvm.c b/xen/arch/arm/hvm.c index 471c4cd..01ce2e7 100644 --- a/xen/arch/arm/hvm.c +++ b/xen/arch/arm/hvm.c @@ -7,14 +7,15 @@ #include <xsm/xsm.h> +#include <xen/hvm/save.h> #include <public/xen.h> #include 
<public/hvm/params.h> #include <public/hvm/hvm_op.h> #include <asm/hypercall.h> +#include <asm/gic.h> long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg) - { long rc = 0; @@ -65,3 +66,464 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg) return rc; } + +static int vgic_irq_rank_save(struct vgic_rank *ext, + struct vgic_irq_rank *rank) +{ + spin_lock(&rank->lock); + /* Some of VGIC registers are not used yet, it is for a future usage */ + /* IENABLE, IACTIVE, IPEND, PENDSGI registers */ + ext->ienable = rank->ienable; + ext->iactive = rank->iactive; + ext->ipend = rank->ipend; + ext->pendsgi = rank->pendsgi; + /* ICFG */ + ext->icfg[0] = rank->icfg[0]; + ext->icfg[1] = rank->icfg[1]; + /* IPRIORITY */ + if ( sizeof(rank->ipriority) != sizeof (ext->ipriority) ) + { + dprintk(XENLOG_G_ERR, "hvm_hw_gic: check ipriority dumping space\n"); + return -EINVAL; + } + memcpy(ext->ipriority, rank->ipriority, sizeof(rank->ipriority)); + /* ITARGETS */ + if ( sizeof(rank->itargets) != sizeof (ext->itargets) ) + { + dprintk(XENLOG_G_ERR, "hvm_hw_gic: check itargets dumping space\n"); + return -EINVAL; + } + memcpy(ext->itargets, rank->itargets, sizeof(rank->itargets)); + spin_unlock(&rank->lock); + return 0; +} + +static int vgic_irq_rank_restore(struct vgic_irq_rank *rank, + struct vgic_rank *ext) +{ + spin_lock(&rank->lock); + /* IENABLE, IACTIVE, IPEND, PENDSGI registers */ + rank->ienable = ext->ienable; + rank->iactive = ext->iactive; + rank->ipend = ext->ipend; + rank->pendsgi = ext->pendsgi; + /* ICFG */ + rank->icfg[0] = ext->icfg[0]; + rank->icfg[1] = ext->icfg[1]; + /* IPRIORITY */ + if ( sizeof(rank->ipriority) != sizeof (ext->ipriority) ) + { + dprintk(XENLOG_G_ERR, "hvm_hw_gic: check ipriority dumping space\n"); + return -EINVAL; + } + memcpy(rank->ipriority, ext->ipriority, sizeof(rank->ipriority)); + /* ITARGETS */ + if ( sizeof(rank->itargets) != sizeof (ext->itargets) ) + { + dprintk(XENLOG_G_ERR, "hvm_hw_gic: check itargets dumping space\n"); + return -EINVAL; + } + memcpy(rank->itargets, ext->itargets, sizeof(rank->itargets)); + spin_unlock(&rank->lock); + return 0; +} + + +static int gic_save(struct domain *d, hvm_domain_context_t *h) +{ + struct hvm_hw_gic ctxt; + struct vcpu *v; + + /* Save the state of GICs */ + for_each_vcpu( d, v ) + { + ctxt.gic_hcr = v->arch.gic_hcr; + ctxt.gic_vmcr = v->arch.gic_vmcr; + ctxt.gic_apr = v->arch.gic_apr; + + /* Save list registers and masks */ + /* (it is not necessary to save/restore them, but LR state can have + * influence on downtime after Live Migration (to be tested) + */ + if ( sizeof(v->arch.gic_lr) > sizeof (ctxt.gic_lr) ) + { + dprintk(XENLOG_G_ERR, "hvm_hw_gic: increase LR dumping space\n"); + return -EINVAL; + } + memcpy(ctxt.gic_lr, v->arch.gic_lr, sizeof(v->arch.gic_lr)); + ctxt.lr_mask = v->arch.lr_mask; + ctxt.event_mask = v->arch.event_mask; + + /* Save PPI states (per-CPU) */ + /* It is necessary if SMP enabled */ + if ( vgic_irq_rank_save(&ctxt.ppi_state, &v->arch.vgic.private_irqs) ) + return 1; + + if ( hvm_save_entry(GIC, v->vcpu_id, h, &ctxt) != 0 ) + return 1; + } + return 0; +} + +static int gic_load(struct domain *d, hvm_domain_context_t *h) +{ + int vcpuid; + struct hvm_hw_gic ctxt; + struct vcpu *v; + + /* Which vcpu is this? 
*/ + vcpuid = hvm_load_instance(h); + if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL ) + { + dprintk(XENLOG_G_ERR, "HVM restore: dom%u has no vcpu%u\n", + d->domain_id, vcpuid); + return -EINVAL; + } + + if ( hvm_load_entry(GIC, h, &ctxt) != 0 ) + return -EINVAL; + + v->arch.gic_hcr = ctxt.gic_hcr; + v->arch.gic_vmcr = ctxt.gic_vmcr; + v->arch.gic_apr = ctxt.gic_apr; + + /* Restore list registers and masks */ + if ( sizeof(v->arch.gic_lr) > sizeof (ctxt.gic_lr) ) + { + dprintk(XENLOG_G_ERR, "hvm_hw_gic: increase LR dumping space\n"); + return -EINVAL; + } + memcpy(v->arch.gic_lr, ctxt.gic_lr, sizeof(v->arch.gic_lr)); + v->arch.lr_mask = ctxt.lr_mask; + v->arch.event_mask = ctxt.event_mask; + + /* Restore PPI states */ + if ( vgic_irq_rank_restore(&v->arch.vgic.private_irqs, &ctxt.ppi_state) ) + return 1; + + return 0; +} + +HVM_REGISTER_SAVE_RESTORE(GIC, gic_save, gic_load, 1, HVMSR_PER_VCPU); + +static int timer_save(struct domain *d, hvm_domain_context_t *h) +{ + struct hvm_hw_timer ctxt; + struct vcpu *v; + struct vtimer *t; + int i; + + /* Save the state of vtimer and ptimer */ + for_each_vcpu( d, v ) + { + t = &v->arch.virt_timer; + for ( i = 0; i < 2; i++ ) + { + ctxt.cval = t->cval; + ctxt.ctl = t->ctl; + ctxt.vtb_offset = i ? d->arch.phys_timer_base.offset : + d->arch.virt_timer_base.offset; + ctxt.type = i ? TIMER_TYPE_PHYS : TIMER_TYPE_VIRT; + if ( hvm_save_entry(A15_TIMER, v->vcpu_id, h, &ctxt) != 0 ) + return 1; + t = &v->arch.phys_timer; + } + } + + return 0; +} + +static int timer_load(struct domain *d, hvm_domain_context_t *h) +{ + int vcpuid; + struct hvm_hw_timer ctxt; + struct vcpu *v; + struct vtimer *t = NULL; + + /* Which vcpu is this? */ + vcpuid = hvm_load_instance(h); + + if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL ) + { + dprintk(XENLOG_G_ERR, "HVM restore: dom%u has no vcpu%u\n", + d->domain_id, vcpuid); + return -EINVAL; + } + + if ( hvm_load_entry(A15_TIMER, h, &ctxt) != 0 ) + return -EINVAL; + + + if ( ctxt.type == TIMER_TYPE_VIRT ) + { + t = &v->arch.virt_timer; + d->arch.virt_timer_base.offset = ctxt.vtb_offset; + + } + else + { + t = &v->arch.phys_timer; + d->arch.phys_timer_base.offset = ctxt.vtb_offset; + } + + t->cval = ctxt.cval; + t->ctl = ctxt.ctl; + t->v = v; + + return 0; +} + +HVM_REGISTER_SAVE_RESTORE(A15_TIMER, timer_save, timer_load, 2, HVMSR_PER_VCPU); + +static int cpu_save(struct domain *d, hvm_domain_context_t *h) +{ + struct hvm_hw_cpu ctxt; + struct vcpu_guest_core_regs c; + struct vcpu *v; + + /* Save the state of CPU */ + for_each_vcpu( d, v ) + { + memset(&ctxt, 0, sizeof(ctxt)); + + ctxt.sctlr = v->arch.sctlr; + ctxt.ttbr0 = v->arch.ttbr0; + ctxt.ttbr1 = v->arch.ttbr1; + ctxt.ttbcr = v->arch.ttbcr; + + ctxt.dacr = v->arch.dacr; + ctxt.ifar = v->arch.ifar; + ctxt.ifsr = v->arch.ifsr; + ctxt.dfar = v->arch.dfar; + ctxt.dfsr = v->arch.dfsr; + +#ifdef CONFIG_ARM_32 + ctxt.mair0 = v->arch.mair0; + ctxt.mair1 = v->arch.mair1; +#else + ctxt.mair0 = v->arch.mair; +#endif + /* Control Registers */ + ctxt.actlr = v->arch.actlr; + ctxt.sctlr = v->arch.sctlr; + ctxt.cpacr = v->arch.cpacr; + + ctxt.contextidr = v->arch.contextidr; + ctxt.tpidr_el0 = v->arch.tpidr_el0; + ctxt.tpidr_el1 = v->arch.tpidr_el1; + ctxt.tpidrro_el0 = v->arch.tpidrro_el0; + + /* CP 15 */ + ctxt.csselr = v->arch.csselr; + + ctxt.afsr0 = v->arch.afsr0; + ctxt.afsr1 = v->arch.afsr1; + ctxt.vbar = v->arch.vbar; + ctxt.par = v->arch.par; + ctxt.teecr = v->arch.teecr; + ctxt.teehbr = v->arch.teehbr; + ctxt.joscr = v->arch.joscr; + ctxt.jmcr = 
v->arch.jmcr; + + memset(&c, 0, sizeof(c)); + + /* get guest core registers */ + vcpu_regs_hyp_to_user(v, &c); + + ctxt.x0 = c.x0; + ctxt.x1 = c.x1; + ctxt.x2 = c.x2; + ctxt.x3 = c.x3; + ctxt.x4 = c.x4; + ctxt.x5 = c.x5; + ctxt.x6 = c.x6; + ctxt.x7 = c.x7; + ctxt.x8 = c.x8; + ctxt.x9 = c.x9; + ctxt.x10 = c.x10; + ctxt.x11 = c.x11; + ctxt.x12 = c.x12; + ctxt.x13 = c.x13; + ctxt.x14 = c.x14; + ctxt.x15 = c.x15; + ctxt.x16 = c.x16; + ctxt.x17 = c.x17; + ctxt.x18 = c.x18; + ctxt.x19 = c.x19; + ctxt.x20 = c.x20; + ctxt.x21 = c.x21; + ctxt.x22 = c.x22; + ctxt.x23 = c.x23; + ctxt.x24 = c.x24; + ctxt.x25 = c.x25; + ctxt.x26 = c.x26; + ctxt.x27 = c.x27; + ctxt.x28 = c.x28; + ctxt.x29 = c.x29; + ctxt.x30 = c.x30; + ctxt.pc64 = c.pc64; + ctxt.cpsr = c.cpsr; + ctxt.spsr_el1 = c.spsr_el1; /* spsr_svc */ + + #ifdef CONFIG_ARM_32 + ctxt.spsr_fiq = c.spsr_fiq; + ctxt.spsr_irq = c.spsr_irq; + ctxt.spsr_und = c.spsr_und; + ctxt.spsr_abt = c.spsr_abt; + #endif + #ifdef CONFIG_ARM_64 + ctxt.sp_el0 = c.sp_el0; + ctxt.sp_el1 = c.sp_el1; + ctxt.elr_el1 = c.elr_el1; + #endif + + /* check VFP state size before dumping */ + if ( sizeof(v->arch.vfp) > sizeof (ctxt.vfp) ) + { + dprintk(XENLOG_G_ERR, "hvm_hw_cpu: increase VFP dumping space\n"); + return -EINVAL; + } + memcpy((void*) &ctxt.vfp, (void*) &v->arch.vfp, sizeof(v->arch.vfp)); + + ctxt.pause_flags = v->pause_flags; + + if ( hvm_save_entry(VCPU, v->vcpu_id, h, &ctxt) != 0 ) + return 1; + } + return 0; +} + +static int cpu_load(struct domain *d, hvm_domain_context_t *h) +{ + int vcpuid; + struct hvm_hw_cpu ctxt; + struct vcpu *v; + struct vcpu_guest_core_regs c; + + /* Which vcpu is this? */ + vcpuid = hvm_load_instance(h); + if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL ) + { + dprintk(XENLOG_G_ERR, "HVM restore: dom%u has no vcpu%u\n", + d->domain_id, vcpuid); + return -EINVAL; + } + + if ( hvm_load_entry(VCPU, h, &ctxt) != 0 ) + return -EINVAL; + + v->arch.sctlr = ctxt.sctlr; + v->arch.ttbr0 = ctxt.ttbr0; + v->arch.ttbr1 = ctxt.ttbr1; + v->arch.ttbcr = ctxt.ttbcr; + + v->arch.dacr = ctxt.dacr; + v->arch.ifar = ctxt.ifar; + v->arch.ifsr = ctxt.ifsr; + v->arch.dfar = ctxt.dfar; + v->arch.dfsr = ctxt.dfsr; + +#ifdef CONFIG_ARM_32 + v->arch.mair0 = ctxt.mair0; + v->arch.mair1 = ctxt.mair1; +#else + v->arch.mair = ctxt.mair0; +#endif + + /* Control Registers */ + v->arch.actlr = ctxt.actlr; + v->arch.cpacr = ctxt.cpacr; + v->arch.contextidr = ctxt.contextidr; + v->arch.tpidr_el0 = ctxt.tpidr_el0; + v->arch.tpidr_el1 = ctxt.tpidr_el1; + v->arch.tpidrro_el0 = ctxt.tpidrro_el0; + + /* CP 15 */ + v->arch.csselr = ctxt.csselr; + + v->arch.afsr0 = ctxt.afsr0; + v->arch.afsr1 = ctxt.afsr1; + v->arch.vbar = ctxt.vbar; + v->arch.par = ctxt.par; + v->arch.teecr = ctxt.teecr; + v->arch.teehbr = ctxt.teehbr; + v->arch.joscr = ctxt.joscr; + v->arch.jmcr = ctxt.jmcr; + + /* fill guest core registers */ + memset(&c, 0, sizeof(c)); + c.x0 = ctxt.x0; + c.x1 = ctxt.x1; + c.x2 = ctxt.x2; + c.x3 = ctxt.x3; + c.x4 = ctxt.x4; + c.x5 = ctxt.x5; + c.x6 = ctxt.x6; + c.x7 = ctxt.x7; + c.x8 = ctxt.x8; + c.x9 = ctxt.x9; + c.x10 = ctxt.x10; + c.x11 = ctxt.x11; + c.x12 = ctxt.x12; + c.x13 = ctxt.x13; + c.x14 = ctxt.x14; + c.x15 = ctxt.x15; + c.x16 = ctxt.x16; + c.x17 = ctxt.x17; + c.x18 = ctxt.x18; + c.x19 = ctxt.x19; + c.x20 = ctxt.x20; + c.x21 = ctxt.x21; + c.x22 = ctxt.x22; + c.x23 = ctxt.x23; + c.x24 = ctxt.x24; + c.x25 = ctxt.x25; + c.x26 = ctxt.x26; + c.x27 = ctxt.x27; + c.x28 = ctxt.x28; + c.x29 = ctxt.x29; + c.x30 = ctxt.x30; + c.pc64 = ctxt.pc64; + c.cpsr = ctxt.cpsr; 
+ c.spsr_el1 = ctxt.spsr_el1; /* spsr_svc */ + + #ifdef CONFIG_ARM_32 + c.spsr_fiq = ctxt.spsr_fiq; + c.spsr_irq = ctxt.spsr_irq; + c.spsr_und = ctxt.spsr_und; + c.spsr_abt = ctxt.spsr_abt; + #endif + #ifdef CONFIG_ARM_64 + c.sp_el0 = ctxt.sp_el0; + c.sp_el1 = ctxt.sp_el1; + c.elr_el1 = ctxt.elr_el1; + #endif + + /* set guest core registers */ + vcpu_regs_user_to_hyp(v, &c); + + if ( sizeof(v->arch.vfp) > sizeof (ctxt.vfp) ) + { + dprintk(XENLOG_G_ERR, "hvm_hw_cpu: increase VFP dumping space\n"); + return -EINVAL; + } + + memcpy(&v->arch.vfp, &ctxt, sizeof(v->arch.vfp)); + + v->is_initialised = 1; + v->pause_flags = ctxt.pause_flags; + + return 0; +} + +HVM_REGISTER_SAVE_RESTORE(VCPU, cpu_save, cpu_load, 1, HVMSR_PER_VCPU); + +/* + * Local variables: + * mode: C + * c-file-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/arch/arm/save.c b/xen/arch/arm/save.c new file mode 100644 index 0000000..c923910 --- /dev/null +++ b/xen/arch/arm/save.c @@ -0,0 +1,66 @@ +/* + * hvm/save.c: Save and restore HVM guest''s emulated hardware state for ARM. + * + * Copyright (c) 2013, Samsung Electronics. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple + * Place - Suite 330, Boston, MA 02111-1307 USA. + */ + +#include <asm/hvm/support.h> +#include <public/hvm/save.h> + +void arch_hvm_save(struct domain *d, struct hvm_save_header *hdr) +{ + hdr->cpuid = READ_SYSREG32(MIDR_EL1); +} + +int arch_hvm_load(struct domain *d, struct hvm_save_header *hdr) +{ + uint32_t cpuid; + + if ( hdr->magic != HVM_FILE_MAGIC ) + { + printk(XENLOG_G_ERR "HVM%d restore: bad magic number %#"PRIx32"\n", + d->domain_id, hdr->magic); + return -1; + } + + if ( hdr->version != HVM_FILE_VERSION ) + { + printk(XENLOG_G_ERR "HVM%d restore: unsupported version %u\n", + d->domain_id, hdr->version); + return -1; + } + + cpuid = READ_SYSREG32(MIDR_EL1); + if ( hdr->cpuid != cpuid ) + { + printk(XENLOG_G_INFO "HVM%d restore: VM saved on one CPU " + "(%#"PRIx32") and restored on another (%#"PRIx32").\n", + d->domain_id, hdr->cpuid, cpuid); + return -1; + } + + return 0; +} + +/* + * Local variables: + * mode: C + * c-file-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/common/Makefile b/xen/common/Makefile index 686f7a1..f943302 100644 --- a/xen/common/Makefile +++ b/xen/common/Makefile @@ -63,6 +63,8 @@ subdir-$(CONFIG_COMPAT) += compat subdir-$(x86_64) += hvm +subdir-$(CONFIG_ARM) += hvm + subdir-$(coverage) += gcov subdir-y += libelf diff --git a/xen/include/asm-arm/hvm/support.h b/xen/include/asm-arm/hvm/support.h new file mode 100644 index 0000000..8311f2f --- /dev/null +++ b/xen/include/asm-arm/hvm/support.h @@ -0,0 +1,29 @@ +/* + * support.h: HVM support routines used by ARMv7 VE. 
+ * + * Copyright (c) 2012, Citrix Systems + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple + * Place - Suite 330, Boston, MA 02111-1307 USA. + */ + +#ifndef __ASM_ARM_HVM_SUPPORT_H__ +#define __ASM_ARM_HVM_SUPPORT_H__ + +#include <xen/types.h> +#include <public/hvm/ioreq.h> +#include <xen/sched.h> +#include <xen/hvm/save.h> +#include <asm/processor.h> + +#endif /* __ASM_ARM_HVM_SUPPORT_H__ */ diff --git a/xen/include/public/arch-arm/hvm/save.h b/xen/include/public/arch-arm/hvm/save.h index 75b8e65..1f71c41 100644 --- a/xen/include/public/arch-arm/hvm/save.h +++ b/xen/include/public/arch-arm/hvm/save.h @@ -26,6 +26,136 @@ #ifndef __XEN_PUBLIC_HVM_SAVE_ARM_H__ #define __XEN_PUBLIC_HVM_SAVE_ARM_H__ +#define HVM_FILE_MAGIC 0x92385520 +#define HVM_FILE_VERSION 0x00000001 + + +struct hvm_save_header +{ + uint32_t magic; /* Must be HVM_FILE_MAGIC */ + uint32_t version; /* File format version */ + uint64_t changeset; /* Version of Xen that saved this file */ + uint32_t cpuid; /* MIDR_EL1 on the saving machine */ +}; + +DECLARE_HVM_SAVE_TYPE(HEADER, 1, struct hvm_save_header); + +struct vgic_rank +{ + uint32_t ienable, iactive, ipend, pendsgi; + uint32_t icfg[2]; + uint32_t ipriority[8]; + uint32_t itargets[8]; +}; + +struct hvm_hw_gic +{ + uint32_t gic_hcr; + uint32_t gic_vmcr; + uint32_t gic_apr; + uint32_t gic_lr[64]; + uint64_t event_mask; + uint64_t lr_mask; + struct vgic_rank ppi_state; +}; + +DECLARE_HVM_SAVE_TYPE(GIC, 2, struct hvm_hw_gic); + +#define TIMER_TYPE_VIRT 0 +#define TIMER_TYPE_PHYS 1 + +struct hvm_hw_timer +{ + uint64_t vtb_offset; + uint32_t ctl; + uint64_t cval; + uint32_t type; +}; + +DECLARE_HVM_SAVE_TYPE(A15_TIMER, 3, struct hvm_hw_timer); + + +struct hvm_hw_cpu +{ + uint64_t vfp[34]; /* Vector floating pointer */ + /* VFP v3 state is 34x64 bit, VFP v4 is not yet supported */ + + /* Guest core registers */ + uint64_t x0; /* r0_usr */ + uint64_t x1; /* r1_usr */ + uint64_t x2; /* r2_usr */ + uint64_t x3; /* r3_usr */ + uint64_t x4; /* r4_usr */ + uint64_t x5; /* r5_usr */ + uint64_t x6; /* r6_usr */ + uint64_t x7; /* r7_usr */ + uint64_t x8; /* r8_usr */ + uint64_t x9; /* r9_usr */ + uint64_t x10; /* r10_usr */ + uint64_t x11; /* r11_usr */ + uint64_t x12; /* r12_usr */ + uint64_t x13; /* sp_usr */ + uint64_t x14; /* lr_usr; */ + uint64_t x15; /* __unused_sp_hyp */ + uint64_t x16; /* lr_irq */ + uint64_t x17; /* sp_irq */ + uint64_t x18; /* lr_svc */ + uint64_t x19; /* sp_svc */ + uint64_t x20; /* lr_abt */ + uint64_t x21; /* sp_abt */ + uint64_t x22; /* lr_und */ + uint64_t x23; /* sp_und */ + uint64_t x24; /* r8_fiq */ + uint64_t x25; /* r9_fiq */ + uint64_t x26; /* r10_fiq */ + uint64_t x27; /* r11_fiq */ + uint64_t x28; /* r12_fiq */ + uint64_t x29; /* fp,sp_fiq */ + uint64_t x30; /* lr_fiq */ + uint64_t pc64; /* ELR_EL2 */ + uint32_t cpsr; /* SPSR_EL2 */ + uint32_t spsr_el1; /*spsr_svc */ + /* AArch32 guests only */ + uint32_t spsr_fiq, spsr_irq, spsr_und, spsr_abt; + /* AArch64 guests only */ + uint64_t 
sp_el0; + uint64_t sp_el1, elr_el1; + + uint32_t sctlr, ttbcr; + uint64_t ttbr0, ttbr1; + + uint32_t ifar, dfar; + uint32_t ifsr, dfsr; + uint32_t dacr; + uint64_t par; + + uint64_t mair0, mair1; + uint64_t tpidr_el0; + uint64_t tpidr_el1; + uint64_t tpidrro_el0; + uint64_t vbar; + + /* Control Registers */ + uint32_t actlr; + uint32_t cpacr; + uint32_t afsr0, afsr1; + uint32_t contextidr; + uint32_t teecr, teehbr; /* ThumbEE, 32-bit guests only */ + uint32_t joscr, jmcr; + /* CP 15 */ + uint32_t csselr; + + unsigned long pause_flags; + +}; + +DECLARE_HVM_SAVE_TYPE(VCPU, 4, struct hvm_hw_cpu); + +/* + * Largest type-code in use + */ +#define HVM_SAVE_CODE_MAX 4 + #endif /* -- 1.8.1.2
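The gethvmcontext domctl added above follows the usual two-call protocol: a first call with a null buffer only reports the required size, and a second call fills the buffer marshalled by hvm_save(). The sketch below shows how a toolstack consumer could drive it; it assumes the existing libxc wrapper xc_domain_hvm_getcontext() is reused unchanged on ARM (this series does not touch that wrapper, so treat the sketch as illustrative only).

/* Minimal sketch: fetch a domain's HVM context from the toolstack,
 * assuming xc_domain_hvm_getcontext() is reused as on x86. */
#include <stdio.h>
#include <stdlib.h>
#include <xenctrl.h>

static int dump_hvm_context(xc_interface *xch, uint32_t domid)
{
    uint8_t *buf;
    int size, written;

    /* First call with a NULL buffer: the domctl only reports the size. */
    size = xc_domain_hvm_getcontext(xch, domid, NULL, 0);
    if ( size <= 0 )
        return -1;

    buf = calloc(1, size);
    if ( buf == NULL )
        return -1;

    /* Second call fills the marshalling buffer built by hvm_save(). */
    written = xc_domain_hvm_getcontext(xch, domid, buf, size);
    if ( written <= 0 )
    {
        free(buf);
        return -1;
    }

    printf("dom%u HVM context: %d bytes\n", domid, written);
    free(buf);
    return 0;
}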
Jaeyong Yoo
2013-Nov-08 07:50 UTC
[PATCH RESEND v5 2/6] xen/arm: Implement get_maximum_gpfn hypercall for arm
From: Evgeny Fedotov <e.fedotov@samsung.com>

Implement the get_maximum_gpfn hypercall for ARM: the guest RAM region
starts at GUEST_RAM_BASE and its size is the domain's maximum memory, so
the maximum GPFN follows directly from those two values.

Changes from v4: Use GUEST_RAM_BASE as the start physical address of guest
RAM, and purge the set-memory-map patch.

Signed-off-by: Evgeny Fedotov <e.fedotov@samsung.com>
---
 xen/arch/arm/mm.c        | 22 +++++++++++++++++++++-
 xen/include/asm-arm/mm.h |  2 ++
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 123280e..3801f07 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -927,7 +927,11 @@ int page_is_ram_type(unsigned long mfn, unsigned long mem_type)
 
 unsigned long domain_get_maximum_gpfn(struct domain *d)
 {
-    return -ENOSYS;
+    paddr_t end;
+
+    get_gma_start_end(d, NULL, &end);
+
+    return (unsigned long) (end >> PAGE_SHIFT);
 }
 
 void share_xen_page_with_guest(struct page_info *page,
@@ -1308,6 +1312,22 @@ int is_iomem_page(unsigned long mfn)
         return 1;
     return 0;
 }
+
+/* Guest RAM base */
+#define GUEST_RAM_BASE 0x80000000
+/*
+ * XXX: Use correct definition for RAM base when the following patch
+ * xen: arm: 64-bit guest support and domU FDT autogeneration
+ * will be upstreamed.
+ */
+void get_gma_start_end(struct domain *d, paddr_t *start, paddr_t *end)
+{
+    if ( start )
+        *start = GUEST_RAM_BASE;
+    if ( end )
+        *end = GUEST_RAM_BASE + ((paddr_t) d->max_pages << PAGE_SHIFT);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
index ce66099..c5cb3af 100644
--- a/xen/include/asm-arm/mm.h
+++ b/xen/include/asm-arm/mm.h
@@ -341,6 +341,8 @@ static inline void put_page_and_type(struct page_info *page)
     put_page(page);
 }
 
+void get_gma_start_end(struct domain *d, paddr_t *start, paddr_t *end);
+
 #endif /* __ARCH_ARM_MM__ */
 /*
  * Local variables:
-- 
1.8.1.2
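For reference, the arithmetic behind get_gma_start_end() is a single shift and add. The standalone sketch below works through it in userspace; the 512MB domain size is an illustrative example, not a value from the patch.

/* Illustrative only: the guest RAM window implied by this patch. */
#include <stdint.h>
#include <stdio.h>

#define GUEST_RAM_BASE 0x80000000ULL
#define PAGE_SHIFT     12

int main(void)
{
    uint64_t max_pages = 131072;              /* example: a 512MB guest */
    uint64_t start = GUEST_RAM_BASE;
    uint64_t end   = GUEST_RAM_BASE + (max_pages << PAGE_SHIFT);

    /* domain_get_maximum_gpfn() returns end >> PAGE_SHIFT */
    printf("gma: [0x%llx, 0x%llx), max gpfn 0x%llx\n",
           (unsigned long long)start, (unsigned long long)end,
           (unsigned long long)(end >> PAGE_SHIFT));
    return 0;
}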
Jaeyong Yoo
2013-Nov-08 07:50 UTC
[PATCH RESEND v5 3/6] xen/arm: Implement modify_returncode
From: Alexey Sokolov <sokolov.a@samsung.com>

Make the sched_op call in do_suspend (drivers/xen/manage.c) return 0 on
the success of suspend.

Signed-off-by: Alexey Sokolov <sokolov.a@samsung.com>
---
 tools/libxc/xc_resume.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/tools/libxc/xc_resume.c b/tools/libxc/xc_resume.c
index 18b4818..9315eb8 100644
--- a/tools/libxc/xc_resume.c
+++ b/tools/libxc/xc_resume.c
@@ -73,6 +73,31 @@ static int modify_returncode(xc_interface *xch, uint32_t domid)
     return 0;
 }
 
+#elif defined(__arm__)
+
+static int modify_returncode(xc_interface *xch, uint32_t domid)
+{
+    vcpu_guest_context_any_t ctxt;
+    xc_dominfo_t info;
+    int rc;
+
+    if ( xc_domain_getinfo(xch, domid, 1, &info) != 1 )
+    {
+        PERROR("Could not get domain info");
+        return -1;
+    }
+
+    if ( (rc = xc_vcpu_getcontext(xch, domid, 0, &ctxt)) != 0 )
+        return rc;
+
+    ctxt.c.user_regs.r0_usr = 1;
+
+    if ( (rc = xc_vcpu_setcontext(xch, domid, 0, &ctxt)) != 0 )
+        return rc;
+
+    return 0;
+}
+
 #else
 
 static int modify_returncode(xc_interface *xch, uint32_t domid)
-- 
1.8.1.2
Jaeyong Yoo
2013-Nov-08 07:50 UTC
[PATCH RESEND v5 4/6] xen/arm: Implement virtual-linear page table for guest p2m mapping in live migration
Implement vlpt (virtual-linear page table) for fast accessing of 3rd PTE of guest p2m. When creating a mapping for vlpt, just copy the 1st level PTE of guest p2m to the xen''s 2nd level PTE. Then the mapping becomes the following: xen''s 1st PTE --> xen''s 2nd PTE (which is the same as 1st PTE of guest p2m) --> guest p2m''s 2nd PTE --> guest p2m''s 3rd PTE (the memory contents where the vlpt points) For more info about vlpt, see: http://www.technovelty.org/linux/virtual-linear-page-table.html This function is used in dirty-page tracing: when domU write-fault is trapped by xen, xen can immediately locate the 3rd PTE of guest p2m. The following link shows the performance comparison for handling a dirty-page between vlpt and typical page table walking. http://lists.xen.org/archives/html/xen-devel/2013-08/msg01503.html Changes from v4: 1. In the restoring vlpt, use __foo variant without barriers for write_pte and flush_xen_data_tlb_range_va. 2. Support two consecutive pages for guest''s first level page table. Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com> --- xen/arch/arm/domain.c | 5 ++ xen/arch/arm/mm.c | 112 +++++++++++++++++++++++++++++++++++++++ xen/include/asm-arm/arm32/page.h | 41 ++++++++++---- xen/include/asm-arm/config.h | 5 ++ xen/include/asm-arm/domain.h | 7 +++ xen/include/asm-arm/mm.h | 15 ++++++ 6 files changed, 174 insertions(+), 11 deletions(-) diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index cb0424d..c0b5dd8 100644 --- a/xen/arch/arm/domain.c +++ b/xen/arch/arm/domain.c @@ -509,6 +509,11 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags) /* Default the virtual ID to match the physical */ d->arch.vpidr = boot_cpu_data.midr.bits; + d->arch.dirty.second_lvl_start = 0; + d->arch.dirty.second_lvl_end = 0; + d->arch.dirty.second_lvl[0] = NULL; + d->arch.dirty.second_lvl[1] = NULL; + clear_page(d->shared_info); share_xen_page_with_guest( virt_to_page(d->shared_info), d, XENSHARE_writable); diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c index 3801f07..bf13993 100644 --- a/xen/arch/arm/mm.c +++ b/xen/arch/arm/mm.c @@ -1328,6 +1328,118 @@ void get_gma_start_end(struct domain *d, paddr_t *start, paddr_t *end) *end = GUEST_RAM_BASE + ((paddr_t) d->max_pages << PAGE_SHIFT); } +/* flush the vlpt area */ +void flush_vlpt(struct domain *d) +{ + int flush_size; + flush_size = (d->arch.dirty.second_lvl_end - + d->arch.dirty.second_lvl_start) << SECOND_SHIFT; + /* flushing the 3rd level mapping */ + flush_xen_data_tlb_range_va(d->arch.dirty.second_lvl_start << SECOND_SHIFT, + flush_size); +} + +/* restore the xen page table for vlpt mapping for domain d */ +void restore_vlpt(struct domain *d) +{ + int i; + dsb(); + for ( i = d->arch.dirty.second_lvl_start; + i < d->arch.dirty.second_lvl_end; + ++i ) + { + int k = i % LPAE_ENTRIES; + int l = i / LPAE_ENTRIES; + + if ( xen_second[i].bits != d->arch.dirty.second_lvl[l][k].bits ) + { + __write_pte(&xen_second[i], d->arch.dirty.second_lvl[l][k]); + __flush_xen_data_tlb_range_va(i << SECOND_SHIFT, + 1 << SECOND_SHIFT); + } + } + dsb(); + isb(); +} + +/* setting up the xen page table for vlpt mapping for domain d */ +int prepare_vlpt(struct domain *d) +{ + int xen_second_linear_base; + int gp2m_start_index, gp2m_end_index; + struct p2m_domain *p2m = &d->arch.p2m; + struct page_info *second_lvl_page; + paddr_t gma_start = 0; + paddr_t gma_end = 0; + lpae_t *first[2]; + int i; + uint64_t required, avail = VIRT_LIN_P2M_END - VIRT_LIN_P2M_START; + + get_gma_start_end(d, &gma_start, &gma_end); + required = 
(gma_end - gma_start) >> LPAE_SHIFT; + + + if ( required > avail ) + { + dprintk(XENLOG_ERR, "Available VLPT is small for domU guest" + "(avail: %llx, required: %llx)\n", + avail, required); + return -ENOMEM; + } + + xen_second_linear_base = second_linear_offset(VIRT_LIN_P2M_START); + + gp2m_start_index = gma_start >> FIRST_SHIFT; + gp2m_end_index = (gma_end >> FIRST_SHIFT) + 1; + + if ( xen_second_linear_base + gp2m_end_index >= LPAE_ENTRIES * 2 ) + { + dprintk(XENLOG_ERR, "xen second page is small for VLPT for domU"); + return -ENOMEM; + } + + second_lvl_page = alloc_domheap_pages(NULL, 1, 0); + if ( second_lvl_page == NULL ) + return -ENOMEM; + + /* First level p2m is 2 consecutive pages */ + d->arch.dirty.second_lvl[0] = map_domain_page_global( + page_to_mfn(second_lvl_page) ); + d->arch.dirty.second_lvl[1] = map_domain_page_global( + page_to_mfn(second_lvl_page+1) ); + + first[0] = __map_domain_page(p2m->first_level); + first[1] = __map_domain_page(p2m->first_level+1); + for ( i = gp2m_start_index; i < gp2m_end_index; ++i ) + { + int k = i % LPAE_ENTRIES; + int l = i / LPAE_ENTRIES; + int k2 = (xen_second_linear_base + i) % LPAE_ENTRIES; + int l2 = (xen_second_linear_base + i) / LPAE_ENTRIES; + + write_pte(&xen_second[xen_second_linear_base+i], first[l][k]); + + /* we copy the mapping into domain''s structure as a reference + * in case of the context switch (used in restore_vlpt) */ + d->arch.dirty.second_lvl[l2][k2] = first[l][k]; + } + unmap_domain_page(first[0]); + unmap_domain_page(first[1]); + + /* storing the start and end index */ + d->arch.dirty.second_lvl_start = xen_second_linear_base + gp2m_start_index; + d->arch.dirty.second_lvl_end = xen_second_linear_base + gp2m_end_index; + + flush_vlpt(d); + return 0; +} + +void cleanup_vlpt(struct domain *d) +{ + /* First level p2m is 2 consecutive pages */ + unmap_domain_page_global(d->arch.dirty.second_lvl[0]); + unmap_domain_page_global(d->arch.dirty.second_lvl[1]); +} /* * Local variables: * mode: C diff --git a/xen/include/asm-arm/arm32/page.h b/xen/include/asm-arm/arm32/page.h index cf12a89..0a4e115 100644 --- a/xen/include/asm-arm/arm32/page.h +++ b/xen/include/asm-arm/arm32/page.h @@ -5,20 +5,28 @@ /* Write a pagetable entry. * - * If the table entry is changing a text mapping, it is responsibility - * of the caller to issue an ISB after write_pte. + * All necessary barriers are responsibility of the caller */ -static inline void write_pte(lpae_t *p, lpae_t pte) +static inline void __write_pte(lpae_t *p, lpae_t pte) { asm volatile ( - /* Ensure any writes have completed with the old mappings. */ - "dsb;" /* Safely write the entry (STRD is atomic on CPUs that support LPAE) */ "strd %0, %H0, [%1];" - "dsb;" : : "r" (pte.bits), "r" (p) : "memory"); } +/* Write a pagetable entry with dsb barriers. + * + * If the table entry is changing a text mapping, it is responsibility + * of the caller to issue an ISB after write_pte. + */ +static inline void write_pte(lpae_t *p, lpae_t pte) +{ + dsb(); + __write_pte(p, pte); + dsb(); +} + /* Inline ASM to flush dcache on register R (may be an inline asm operand) */ #define __flush_xen_dcache_one(R) STORE_CP32(R, DCCMVAC) @@ -57,18 +65,28 @@ static inline void flush_xen_data_tlb(void) } /* - * Flush a range of VA''s hypervisor mappings from the data TLB. This is not - * sufficient when changing code mappings or for self modifying code. + * Flush a range of VA''s hypervisor mappings from the data TLB. 
+ * All necessary barriers are responsibility of the caller */ -static inline void flush_xen_data_tlb_range_va(unsigned long va, unsigned long size) +static inline void __flush_xen_data_tlb_range_va(unsigned long va, unsigned long size) { unsigned long end = va + size; - dsb(); /* Ensure preceding are visible */ while ( va < end ) { asm volatile(STORE_CP32(0, TLBIMVAH) : : "r" (va) : "memory"); va += PAGE_SIZE; } +} + +/* + * Flush a range of VA''s hypervisor mappings from the data TLB with barriers. + * This barrier is not sufficient when changing code mappings or for self + * modifying code. + */ +static inline void flush_xen_data_tlb_range_va(unsigned long va, unsigned long size) +{ + dsb(); /* Ensure preceding are visible */ + __flush_xen_data_tlb_range_va(va, size); dsb(); /* Ensure completion of the TLB flush */ isb(); } diff --git a/xen/include/asm-arm/config.h b/xen/include/asm-arm/config.h index 5b7b1a8..15ad56d 100644 --- a/xen/include/asm-arm/config.h +++ b/xen/include/asm-arm/config.h @@ -87,6 +87,7 @@ * 0 - 8M <COMMON> * * 32M - 128M Frametable: 24 bytes per page for 16GB of RAM + * 128M - 256M Virtual-linear mapping to P2M table * 256M - 1G VMAP: ioremap and early_ioremap use this virtual address * space * @@ -124,7 +125,9 @@ #define CONFIG_SEPARATE_XENHEAP 1 #define FRAMETABLE_VIRT_START _AT(vaddr_t,0x02000000) +#define VIRT_LIN_P2M_START _AT(vaddr_t,0x08000000) #define VMAP_VIRT_START _AT(vaddr_t,0x10000000) +#define VIRT_LIN_P2M_END VMAP_VIRT_START #define XENHEAP_VIRT_START _AT(vaddr_t,0x40000000) #define XENHEAP_VIRT_END _AT(vaddr_t,0x7fffffff) #define DOMHEAP_VIRT_START _AT(vaddr_t,0x80000000) @@ -157,6 +160,8 @@ #define HYPERVISOR_VIRT_END DIRECTMAP_VIRT_END +/*TODO (ARM_64): define VIRT_LIN_P2M_START VIRT_LIN_P2M_END */ + #endif /* Fixmap slots */ diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h index 67bfbbc..4f366f1 100644 --- a/xen/include/asm-arm/domain.h +++ b/xen/include/asm-arm/domain.h @@ -112,6 +112,13 @@ struct arch_domain spinlock_t lock; } vuart; + /* dirty-page tracing */ + struct { + volatile int second_lvl_start; /* for context switch */ + volatile int second_lvl_end; + lpae_t *second_lvl[2]; /* copy of guest p2m''s first */ + } dirty; + } __cacheline_aligned; struct arch_vcpu diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h index c5cb3af..a74e135 100644 --- a/xen/include/asm-arm/mm.h +++ b/xen/include/asm-arm/mm.h @@ -342,6 +342,21 @@ static inline void put_page_and_type(struct page_info *page) } void get_gma_start_end(struct domain *d, paddr_t *start, paddr_t *end); +int prepare_vlpt(struct domain *d); +void cleanup_vlpt(struct domain *d); +void restore_vlpt(struct domain *d); + +/* calculate the xen''s virtual address for accessing the leaf PTE of + * a given address (GPA) */ +static inline lpae_t * get_vlpt_3lvl_pte(paddr_t addr) +{ + lpae_t *table = (lpae_t *)VIRT_LIN_P2M_START; + + /* Since we slotted the guest''s first p2m page table to xen''s + * second page table, one shift is enough for calculating the + * index of guest p2m table entry */ + return &table[addr >> PAGE_SHIFT]; +} #endif /* __ARCH_ARM_MM__ */ /* -- 1.8.1.2
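The payoff of copying the guest's first-level p2m entries into xen's second-level slots is that a leaf PTE can then be located with a single shift, exactly as get_vlpt_3lvl_pte() does. The standalone sketch below mirrors that arithmetic; the guest physical address used is an arbitrary example.

/* Illustrative sketch of the virtual-linear lookup set up by this patch. */
#include <stdint.h>
#include <stdio.h>

#define VIRT_LIN_P2M_START 0x08000000UL
#define PAGE_SHIFT         12
#define LPAE_ENTRY_SIZE    8              /* sizeof(lpae_t) */

int main(void)
{
    uint64_t gpa = 0x80123456ULL;          /* some guest physical address */

    /* get_vlpt_3lvl_pte(): &((lpae_t *)VIRT_LIN_P2M_START)[gpa >> PAGE_SHIFT] */
    uint64_t pte_va = VIRT_LIN_P2M_START +
                      (gpa >> PAGE_SHIFT) * LPAE_ENTRY_SIZE;

    printf("leaf PTE of gpa 0x%llx is mapped at VA 0x%llx\n",
           (unsigned long long)gpa, (unsigned long long)pte_va);
    return 0;
}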
Jaeyong Yoo
2013-Nov-08 07:50 UTC
[PATCH RESEND v5 5/6] xen/arm: Implement hypercall for dirty page tracing
Add hypercall (shadow op: enable/disable and clean/peek dirtied page bitmap). It consists of two parts: dirty page detecting and saving. For detecting, we setup the guest p2m''s leaf PTE read-only and whenever the guest tries to write something, permission fault happens and traps into xen. The permission-faulted GPA should be saved for the toolstack (when it wants to see which pages are dirtied). For this purpose, we temporarily save the GPAs into bitmap. Changes from v4: 1. For temporary saving dirty pages, use bitmap rather than linked list. 2. Setup the p2m''s second level page as read-write in the view of xen''s memory access. It happens in p2m_create_table function. Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com> --- xen/arch/arm/domain.c | 14 +++ xen/arch/arm/domctl.c | 9 ++ xen/arch/arm/mm.c | 103 +++++++++++++++++++- xen/arch/arm/p2m.c | 206 ++++++++++++++++++++++++++++++++++++++++ xen/arch/arm/traps.c | 9 ++ xen/include/asm-arm/domain.h | 7 ++ xen/include/asm-arm/mm.h | 7 ++ xen/include/asm-arm/p2m.h | 4 + xen/include/asm-arm/processor.h | 2 + 9 files changed, 360 insertions(+), 1 deletion(-) diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index c0b5dd8..0a32301 100644 --- a/xen/arch/arm/domain.c +++ b/xen/arch/arm/domain.c @@ -215,6 +215,12 @@ static void ctxt_switch_to(struct vcpu *n) WRITE_SYSREG(hcr, HCR_EL2); isb(); + /* for dirty-page tracing + * XXX: how do we consider SMP case? + */ + if ( n->domain->arch.dirty.mode ) + restore_vlpt(n->domain); + /* This is could trigger an hardware interrupt from the virtual * timer. The interrupt needs to be injected into the guest. */ virt_timer_restore(n); @@ -509,11 +515,19 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags) /* Default the virtual ID to match the physical */ d->arch.vpidr = boot_cpu_data.midr.bits; + /* init for dirty-page tracing */ + d->arch.dirty.count = 0; + d->arch.dirty.mode = 0; + spin_lock_init(&d->arch.dirty.lock); + d->arch.dirty.second_lvl_start = 0; d->arch.dirty.second_lvl_end = 0; d->arch.dirty.second_lvl[0] = NULL; d->arch.dirty.second_lvl[1] = NULL; + memset(d->arch.dirty.bitmap, 0, sizeof(d->arch.dirty.bitmap)); + d->arch.dirty.bitmap_pages = 0; + clear_page(d->shared_info); share_xen_page_with_guest( virt_to_page(d->shared_info), d, XENSHARE_writable); diff --git a/xen/arch/arm/domctl.c b/xen/arch/arm/domctl.c index cb38e59..eb74225 100644 --- a/xen/arch/arm/domctl.c +++ b/xen/arch/arm/domctl.c @@ -93,6 +93,15 @@ long arch_do_domctl(struct xen_domctl *domctl, struct domain *d, xfree(c.data); } break; + case XEN_DOMCTL_shadow_op: + { + domain_pause(d); + ret = dirty_mode_op(d, &domctl->u.shadow_op); + domain_unpause(d); + + copyback = 1; + } + break; default: return -EINVAL; diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c index bf13993..d5a0a11 100644 --- a/xen/arch/arm/mm.c +++ b/xen/arch/arm/mm.c @@ -845,7 +845,6 @@ void destroy_xen_mappings(unsigned long v, unsigned long e) create_xen_entries(REMOVE, v, 0, (e - v) >> PAGE_SHIFT, 0); } -enum mg { mg_clear, mg_ro, mg_rw, mg_rx }; static void set_pte_flags_on_range(const char *p, unsigned long l, enum mg mg) { lpae_t pte; @@ -1320,6 +1319,60 @@ int is_iomem_page(unsigned long mfn) * xen: arm: 64-bit guest support and domU FDT autogeneration * will be upstreamed. 
*/ + +static inline void mark_dirty_bitmap(struct domain *d, paddr_t addr) +{ + paddr_t ram_base = (paddr_t) GUEST_RAM_BASE; + int bit_index = PFN_DOWN(addr - ram_base); + int page_index = bit_index >> (PAGE_SHIFT + 3); + int bit_index_residual = bit_index & ((1ul << (PAGE_SHIFT + 3)) - 1); + + set_bit(bit_index_residual, d->arch.dirty.bitmap[page_index]); +} + +/* routine for dirty-page tracing + * + * On first write, it page faults, its entry is changed to read-write, + * and on retry the write succeeds. + * + * for locating p2m of the faulting entry, we use virtual-linear page table. + * returns zero if addr is not valid or dirty mode is not set + */ +int handle_page_fault(struct domain *d, paddr_t addr) +{ + + lpae_t *vlp2m_pte = 0; + paddr_t gma_start = 0; + paddr_t gma_end = 0; + + if ( !d->arch.dirty.mode ) return 0; + get_gma_start_end(d, &gma_start, &gma_end); + + /* Ensure that addr is inside guest''s RAM */ + if ( addr < gma_start || + addr > gma_end ) return 0; + + vlp2m_pte = get_vlpt_3lvl_pte(addr); + if ( vlp2m_pte->p2m.valid && vlp2m_pte->p2m.write == 0 && + vlp2m_pte->p2m.avail == 0 /* reuse avail bit as read-only */ ) + { + lpae_t pte = *vlp2m_pte; + pte.p2m.write = 1; + write_pte(vlp2m_pte, pte); + flush_tlb_local(); + + /* only necessary to lock between get-dirty bitmap and mark dirty + * bitmap. If get-dirty bitmap happens immediately before this + * lock, the corresponding dirty-page would be marked at the next + * round of get-dirty bitmap */ + spin_lock(&d->arch.dirty.lock); + mark_dirty_bitmap(d, addr); + spin_unlock(&d->arch.dirty.lock); + } + + return 1; +} + void get_gma_start_end(struct domain *d, paddr_t *start, paddr_t *end) { if ( start ) @@ -1440,6 +1493,54 @@ void cleanup_vlpt(struct domain *d) unmap_domain_page_global(d->arch.dirty.second_lvl[0]); unmap_domain_page_global(d->arch.dirty.second_lvl[1]); } + +int prepare_bitmap(struct domain *d) +{ + paddr_t gma_start = 0; + paddr_t gma_end = 0; + int nr_bytes; + int nr_pages; + int i; + + get_gma_start_end(d, &gma_start, &gma_end); + + nr_bytes = (PFN_DOWN(gma_end - gma_start) + 7) / 8; + nr_pages = (nr_bytes + PAGE_SIZE - 1) / PAGE_SIZE; + + BUG_ON( nr_pages > MAX_DIRTY_BITMAP_PAGES ); + + for ( i = 0; i < nr_pages; ++i ) + { + struct page_info *page; + page = alloc_domheap_page(NULL, 0); + if ( page == NULL ) + goto cleanup_on_failure; + + d->arch.dirty.bitmap[i] = map_domain_page_global(__page_to_mfn(page)); + clear_page(d->arch.dirty.bitmap[i]); + } + + d->arch.dirty.bitmap_pages = nr_pages; + return 0; + +cleanup_on_failure: + nr_pages = i; + for ( i = 0; i < nr_pages; ++i ) + { + unmap_domain_page_global(d->arch.dirty.bitmap[i]); + } + return -ENOMEM; +} + +void cleanup_bitmap(struct domain *d) +{ + int i; + for ( i = 0; i < d->arch.dirty.bitmap_pages; ++i ) + { + unmap_domain_page_global(d->arch.dirty.bitmap[i]); + } +} + /* * Local variables: * mode: C diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c index 2d09fef..b7dbf7d 100644 --- a/xen/arch/arm/p2m.c +++ b/xen/arch/arm/p2m.c @@ -6,6 +6,8 @@ #include <xen/bitops.h> #include <asm/flushtlb.h> #include <asm/gic.h> +#include <xen/guest_access.h> +#include <xen/pfn.h> void dump_p2m_lookup(struct domain *d, paddr_t addr) { @@ -113,6 +115,10 @@ static int p2m_create_table(struct domain *d, pte = mfn_to_p2m_entry(page_to_mfn(page), MATTR_MEM); + /* mark the write bit (page table''s case, ro bit) as 0 + * so, it is writable in case of vlpt access */ + pte.pt.ro = 0; + write_pte(entry, pte); return 0; @@ -408,6 +414,206 @@ unsigned long 
gmfn_to_mfn(struct domain *d, unsigned long gpfn) return p >> PAGE_SHIFT; } +/* Change types across all p2m entries in a domain */ +void p2m_change_entry_type_global(struct domain *d, enum mg nt) +{ + struct p2m_domain *p2m = &d->arch.p2m; + paddr_t ram_base; + int i1, i2, i3; + int first_index, second_index, third_index; + lpae_t *first = __map_domain_page(p2m->first_level); + lpae_t pte, *second = NULL, *third = NULL; + + get_gma_start_end(d, &ram_base, NULL); + + first_index = first_table_offset((uint64_t)ram_base); + second_index = second_table_offset((uint64_t)ram_base); + third_index = third_table_offset((uint64_t)ram_base); + + BUG_ON( !first && "Can''t map first level p2m." ); + + spin_lock(&p2m->lock); + + for ( i1 = first_index; i1 < LPAE_ENTRIES*2; ++i1 ) + { + lpae_walk_t first_pte = first[i1].walk; + if ( !first_pte.valid || !first_pte.table ) + goto out; + + second = map_domain_page(first_pte.base); + BUG_ON( !second && "Can''t map second level p2m."); + for ( i2 = second_index; i2 < LPAE_ENTRIES; ++i2 ) + { + lpae_walk_t second_pte = second[i2].walk; + + if ( !second_pte.valid || !second_pte.table ) + goto out; + + third = map_domain_page(second_pte.base); + BUG_ON( !third && "Can''t map third level p2m."); + + for ( i3 = third_index; i3 < LPAE_ENTRIES; ++i3 ) + { + lpae_walk_t third_pte = third[i3].walk; + if ( !third_pte.valid ) + goto out; + + pte = third[i3]; + if ( nt == mg_ro ) + { + if ( pte.p2m.write == 1 ) + { + pte.p2m.write = 0; + pte.p2m.avail = 0; + } + else + { + /* reuse avail bit as an indicator + * of ''actual'' read-only */ + pte.p2m.avail = 1; + } + } + else if ( nt == mg_rw ) + { + if ( pte.p2m.write == 0 && pte.p2m.avail == 0 ) + { + pte.p2m.write = 1; + } + } + write_pte(&third[i3], pte); + } + unmap_domain_page(third); + + third = NULL; + third_index = 0; + } + unmap_domain_page(second); + + second = NULL; + second_index = 0; + third_index = 0; + } + +out: + flush_tlb_all_local(); + if ( third ) unmap_domain_page(third); + if ( second ) unmap_domain_page(second); + if ( first ) unmap_domain_page(first); + + spin_unlock(&p2m->lock); +} + +/* Read a domain''s log-dirty bitmap and stats. + * If the operation is a CLEAN, clear the bitmap and stats. */ +int log_dirty_op(struct domain *d, xen_domctl_shadow_op_t *sc) +{ + int peek = 1; + int i; + int bitmap_size; + paddr_t gma_start, gma_end; + + /* this hypercall is called from domain 0, and we don''t know which guest''s + * vlpt is mapped in xen_second, so, to be sure, we restore vlpt here */ + restore_vlpt(d); + + get_gma_start_end(d, &gma_start, &gma_end); + bitmap_size = (gma_end - gma_start) / 8; + + if ( guest_handle_is_null(sc->dirty_bitmap) ) + { + peek = 0; + } + else + { + spin_lock(&d->arch.dirty.lock); + for ( i = 0; i < d->arch.dirty.bitmap_pages; ++i ) + { + int j = 0; + uint8_t *bitmap; + copy_to_guest_offset(sc->dirty_bitmap, i * PAGE_SIZE, + d->arch.dirty.bitmap[i], + bitmap_size < PAGE_SIZE ? 
bitmap_size : + PAGE_SIZE); + bitmap_size -= PAGE_SIZE; + + /* set p2m page table read-only */ + bitmap = d->arch.dirty.bitmap[i]; + while ((j = find_next_bit((const long unsigned int *)bitmap, + PAGE_SIZE*8, j)) < PAGE_SIZE*8) + { + lpae_t *vlpt; + paddr_t addr = gma_start + + (i << (2*PAGE_SHIFT+3)) + + (j << PAGE_SHIFT); + vlpt = get_vlpt_3lvl_pte(addr); + vlpt->p2m.write = 0; + j++; + } + } + + if ( sc->op == XEN_DOMCTL_SHADOW_OP_CLEAN ) + { + for ( i = 0; i < d->arch.dirty.bitmap_pages; ++i ) + { + clear_page(d->arch.dirty.bitmap[i]); + } + } + + spin_unlock(&d->arch.dirty.lock); + flush_tlb_local(); + } + + sc->stats.dirty_count = d->arch.dirty.count; + + return 0; +} + +long dirty_mode_op(struct domain *d, xen_domctl_shadow_op_t *sc) +{ + long ret = 0; + switch (sc->op) + { + case XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY: + case XEN_DOMCTL_SHADOW_OP_OFF: + { + enum mg nt = sc->op == XEN_DOMCTL_SHADOW_OP_OFF ? mg_rw : mg_ro; + + d->arch.dirty.mode = sc->op == XEN_DOMCTL_SHADOW_OP_OFF ? 0 : 1; + p2m_change_entry_type_global(d, nt); + + if ( sc->op == XEN_DOMCTL_SHADOW_OP_OFF ) + { + cleanup_vlpt(d); + cleanup_bitmap(d); + } + else + { + if ( (ret = prepare_vlpt(d)) ) + return ret; + + if ( (ret = prepare_bitmap(d)) ) + { + /* in case of failure, we have to cleanup vlpt */ + cleanup_vlpt(d); + return ret; + } + } + } + break; + + case XEN_DOMCTL_SHADOW_OP_CLEAN: + case XEN_DOMCTL_SHADOW_OP_PEEK: + { + ret = log_dirty_op(d, sc); + } + break; + + default: + return -ENOSYS; + } + return ret; +} + /* * Local variables: * mode: C diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c index 287dd7b..1a7ed11 100644 --- a/xen/arch/arm/traps.c +++ b/xen/arch/arm/traps.c @@ -1321,6 +1321,8 @@ static void do_trap_data_abort_guest(struct cpu_user_regs *regs, const char *msg; int rc, level = -1; mmio_info_t info; + int page_fault = ( (dabt.dfsc & FSC_MASK) =+ (FSC_FLT_PERM | FSC_3D_LEVEL) && dabt.write ); if ( !check_conditional_instr(regs, hsr) ) { @@ -1342,6 +1344,13 @@ static void do_trap_data_abort_guest(struct cpu_user_regs *regs, if ( rc == -EFAULT ) goto bad_data_abort; + /* domU page fault handling for guest live migration */ + /* dabt.valid can be 0 here */ + if ( page_fault && handle_page_fault(current->domain, info.gpa) ) + { + /* Do not modify pc after page fault to repeat memory operation */ + return; + } /* XXX: Decode the instruction if ISS is not valid */ if ( !dabt.valid ) goto bad_data_abort; diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h index 4f366f1..180d924 100644 --- a/xen/include/asm-arm/domain.h +++ b/xen/include/asm-arm/domain.h @@ -114,9 +114,16 @@ struct arch_domain /* dirty-page tracing */ struct { +#define MAX_DIRTY_BITMAP_PAGES 64 /* support upto 8GB guest memory */ + spinlock_t lock; /* protect list: head, mvn_head */ + volatile int mode; /* 1 if dirty pages tracing enabled */ + volatile unsigned int count; /* dirty pages counter */ volatile int second_lvl_start; /* for context switch */ volatile int second_lvl_end; lpae_t *second_lvl[2]; /* copy of guest p2m''s first */ + /* dirty bitmap */ + uint8_t *bitmap[MAX_DIRTY_BITMAP_PAGES]; + int bitmap_pages; /* number of dirty bitmap pages */ } dirty; } __cacheline_aligned; diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h index a74e135..1ce7a4b 100644 --- a/xen/include/asm-arm/mm.h +++ b/xen/include/asm-arm/mm.h @@ -341,11 +341,18 @@ static inline void put_page_and_type(struct page_info *page) put_page(page); } +enum mg { mg_clear, mg_ro, mg_rw, mg_rx }; + +/* routine for dirty-page 
tracing */ +int handle_page_fault(struct domain *d, paddr_t addr); void get_gma_start_end(struct domain *d, paddr_t *start, paddr_t *end); int prepare_vlpt(struct domain *d); void cleanup_vlpt(struct domain *d); void restore_vlpt(struct domain *d); +int prepare_bitmap(struct domain *d); +void cleanup_bitmap(struct domain *d); + /* calculate the xen''s virtual address for accessing the leaf PTE of * a given address (GPA) */ static inline lpae_t * get_vlpt_3lvl_pte(paddr_t addr) diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h index c660820..dba9a7b 100644 --- a/xen/include/asm-arm/p2m.h +++ b/xen/include/asm-arm/p2m.h @@ -2,6 +2,7 @@ #define _XEN_P2M_H #include <xen/mm.h> +#include <public/domctl.h> struct domain; @@ -110,6 +111,9 @@ static inline int get_page_and_type(struct page_info *page, return rc; } +void p2m_change_entry_type_global(struct domain *d, enum mg nt); +long dirty_mode_op(struct domain *d, xen_domctl_shadow_op_t *sc); + #endif /* _XEN_P2M_H */ /* diff --git a/xen/include/asm-arm/processor.h b/xen/include/asm-arm/processor.h index 5294421..fced6ad 100644 --- a/xen/include/asm-arm/processor.h +++ b/xen/include/asm-arm/processor.h @@ -399,6 +399,8 @@ union hsr { #define FSC_CPR (0x3a) /* Coprocossor Abort */ #define FSC_LL_MASK (0x03<<0) +#define FSC_MASK (0x3f) /* Fault status mask */ +#define FSC_3D_LEVEL (0x03) /* Third level fault*/ /* Time counter hypervisor control register */ #define CNTHCTL_PA (1u<<0) /* Kernel/user access to physical counter */ -- 1.8.1.2
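The dirty bitmap introduced here is indexed as in mark_dirty_bitmap(): the faulting address is converted to a bit number relative to GUEST_RAM_BASE, then split into a bitmap-page index and a bit offset within that page, so each 4KB bitmap page covers 32768 guest pages (128MB of RAM). A small standalone sketch of that computation follows; the faulting address is an arbitrary example.

/* Illustrative sketch of the bitmap indexing used by mark_dirty_bitmap(). */
#include <stdint.h>
#include <stdio.h>

#define GUEST_RAM_BASE 0x80000000ULL
#define PAGE_SHIFT     12

int main(void)
{
    uint64_t addr = 0x9ab41000ULL;                 /* faulting guest address */
    uint64_t bit_index  = (addr - GUEST_RAM_BASE) >> PAGE_SHIFT;
    uint64_t page_index = bit_index >> (PAGE_SHIFT + 3);
    uint64_t residual   = bit_index & ((1ULL << (PAGE_SHIFT + 3)) - 1);

    /* the patch then does set_bit(residual, d->arch.dirty.bitmap[page_index]) */
    printf("pfn offset %llu -> bit %llu of bitmap page %llu\n",
           (unsigned long long)bit_index, (unsigned long long)residual,
           (unsigned long long)page_index);
    return 0;
}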
Jaeyong Yoo
2013-Nov-08 07:50 UTC
[PATCH RESEND v5 6/6] xen/arm: Implement toolstack for xl restore/save and migrate
From: Alexey Sokolov <sokolov.a@samsung.com> Implement for xl restore/save (which are also used for migrate) operation in xc_arm_migrate.c and make it compilable. The overall process of save is the following: 1) save guest parameters (i.e., memory map, console and store pfn, etc) 2) save memory (if it is live, perform dirty-page tracing) 3) save hvm states (i.e., gic, timer, vcpu etc) Singed-off-by: Alexey Sokolov <sokolov.a@samsung.com> --- config/arm32.mk | 1 + tools/libxc/Makefile | 6 +- tools/libxc/xc_arm_migrate.c | 712 +++++++++++++++++++++++++++++++++++++++++++ tools/libxc/xc_dom_arm.c | 4 +- tools/misc/Makefile | 4 +- 5 files changed, 723 insertions(+), 4 deletions(-) create mode 100644 tools/libxc/xc_arm_migrate.c diff --git a/config/arm32.mk b/config/arm32.mk index aa79d22..01374c9 100644 --- a/config/arm32.mk +++ b/config/arm32.mk @@ -1,6 +1,7 @@ CONFIG_ARM := y CONFIG_ARM_32 := y CONFIG_ARM_$(XEN_OS) := y +CONFIG_MIGRATE := y CONFIG_XEN_INSTALL_SUFFIX : diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile index 4c64c15..05dfef4 100644 --- a/tools/libxc/Makefile +++ b/tools/libxc/Makefile @@ -42,8 +42,13 @@ CTRL_SRCS-$(CONFIG_MiniOS) += xc_minios.c GUEST_SRCS-y : GUEST_SRCS-y += xg_private.c xc_suspend.c ifeq ($(CONFIG_MIGRATE),y) +ifeq ($(CONFIG_X86),y) GUEST_SRCS-y += xc_domain_restore.c xc_domain_save.c GUEST_SRCS-y += xc_offline_page.c xc_compression.c +endif +ifeq ($(CONFIG_ARM),y) +GUEST_SRCS-y += xc_arm_migrate.c +endif else GUEST_SRCS-y += xc_nomigrate.c endif @@ -63,7 +68,6 @@ $(patsubst %.c,%.opic,$(ELF_SRCS-y)): CFLAGS += -Wno-pointer-sign GUEST_SRCS-y += xc_dom_core.c xc_dom_boot.c GUEST_SRCS-y += xc_dom_elfloader.c GUEST_SRCS-$(CONFIG_X86) += xc_dom_bzimageloader.c -GUEST_SRCS-$(CONFIG_X86) += xc_dom_decompress_lz4.c GUEST_SRCS-$(CONFIG_ARM) += xc_dom_armzimageloader.c GUEST_SRCS-y += xc_dom_binloader.c GUEST_SRCS-y += xc_dom_compat_linux.c diff --git a/tools/libxc/xc_arm_migrate.c b/tools/libxc/xc_arm_migrate.c new file mode 100644 index 0000000..461e339 --- /dev/null +++ b/tools/libxc/xc_arm_migrate.c @@ -0,0 +1,712 @@ +/****************************************************************************** + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; + * version 2.1 of the License. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + * + * Copyright (c) 2013, Samsung Electronics + */ + +#include <inttypes.h> +#include <errno.h> +#include <xenctrl.h> +#include <xenguest.h> + +#include <unistd.h> +#include <xc_private.h> +#include <xc_dom.h> +#include "xc_bitops.h" +#include "xg_private.h" + +/* Guest RAM base */ +#define GUEST_RAM_BASE 0x80000000 +/* + * XXX: Use correct definition for RAM base when the following patch + * xen: arm: 64-bit guest support and domU FDT autogeneration + * will be upstreamed. 
+ */ + +#define DEF_MAX_ITERS 29 /* limit us to 30 times round loop */ +#define DEF_MAX_FACTOR 3 /* never send more than 3x p2m_size */ +#define DEF_MIN_DIRTY_PER_ITER 50 /* dirty page count to define last iter */ +#define DEF_PROGRESS_RATE 50 /* progress bar update rate */ + +/* Enable this macro for debug only: "static" migration instead of live */ +/* +#define DISABLE_LIVE_MIGRATION +*/ + +/* Enable this macro for debug only: additional debug info */ +/* +#define ARM_MIGRATE_VERBOSE +*/ + +/* + * Guest params to save: used HVM params, save flags, memory map + */ +typedef struct guest_params +{ + unsigned long console_pfn; + unsigned long store_pfn; + uint32_t flags; + xen_pfn_t start_gpfn; + xen_pfn_t max_gpfn; + uint32_t max_vcpu_id; +} guest_params_t; + +static int suspend_and_state(int (*suspend)(void*), void *data, + xc_interface *xch, int dom) +{ + xc_dominfo_t info; + if ( !(*suspend)(data) ) + { + ERROR("Suspend request failed"); + return -1; + } + + if ( (xc_domain_getinfo(xch, dom, 1, &info) != 1) || + !info.shutdown || (info.shutdown_reason != SHUTDOWN_suspend) ) + { + ERROR("Domain is not in suspended state after suspend attempt"); + return -1; + } + + return 0; +} + +static int write_exact_handled(xc_interface *xch, int fd, const void *data, + size_t size) +{ + if ( write_exact(fd, data, size) ) + { + ERROR("Write failed, check space"); + return -1; + } + return 0; +} + +/* ============ Memory ============= */ +static int save_memory(xc_interface *xch, int io_fd, uint32_t dom, + struct save_callbacks *callbacks, + uint32_t max_iters, uint32_t max_factor, + guest_params_t *params) +{ + int live = !!(params->flags & XCFLAGS_LIVE); + int debug = !!(params->flags & XCFLAGS_DEBUG); + xen_pfn_t i; + char reportbuf[80]; + int iter = 0; + int last_iter = !live; + int total_dirty_pages_num = 0; + int dirty_pages_on_prev_iter_num = 0; + int count = 0; + char *page = 0; + xen_pfn_t *busy_pages = 0; + int busy_pages_count = 0; + int busy_pages_max = 256; + + DECLARE_HYPERCALL_BUFFER(unsigned long, to_send); + + xen_pfn_t start = params->start_gpfn; + const xen_pfn_t end = params->max_gpfn; + const xen_pfn_t mem_size = end - start; + + if ( debug ) + { + IPRINTF("(save mem) start=%llx end=%llx!\n", start, end); + } + + if ( live ) + { + if ( xc_shadow_control(xch, dom, XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY, + NULL, 0, NULL, 0, NULL) < 0 ) + { + ERROR("Couldn''t enable log-dirty mode !\n"); + return -1; + } + + max_iters = max_iters ? : DEF_MAX_ITERS; + max_factor = max_factor ? 
: DEF_MAX_FACTOR; + + if ( debug ) + IPRINTF("Log-dirty mode enabled, max_iters=%d, max_factor=%d!\n", + max_iters, max_factor); + } + + to_send = xc_hypercall_buffer_alloc_pages(xch, to_send, + NRPAGES(bitmap_size(mem_size))); + if ( !to_send ) + { + ERROR("Couldn''t allocate to_send array!\n"); + return -1; + } + + /* send all pages on first iter */ + memset(to_send, 0xff, bitmap_size(mem_size)); + + for ( ; ; ) + { + int dirty_pages_on_current_iter_num = 0; + int frc; + iter++; + + snprintf(reportbuf, sizeof(reportbuf), + "Saving memory: iter %d (last sent %u)", + iter, dirty_pages_on_prev_iter_num); + + xc_report_progress_start(xch, reportbuf, mem_size); + + if ( (iter > 1 && + dirty_pages_on_prev_iter_num < DEF_MIN_DIRTY_PER_ITER) || + (iter == max_iters) || + (total_dirty_pages_num >= mem_size*max_factor) ) + { + if ( debug ) + IPRINTF("Last iteration"); + last_iter = 1; + } + + if ( last_iter ) + { + if ( suspend_and_state(callbacks->suspend, callbacks->data, + xch, dom) ) + { + ERROR("Domain appears not to have suspended"); + return -1; + } + } + if ( live && iter > 1 ) + { + frc = xc_shadow_control(xch, dom, XEN_DOMCTL_SHADOW_OP_CLEAN, + HYPERCALL_BUFFER(to_send), mem_size, + NULL, 0, NULL); + if ( frc != mem_size ) + { + ERROR("Error peeking shadow bitmap"); + xc_hypercall_buffer_free_pages(xch, to_send, + NRPAGES(bitmap_size(mem_size))); + return -1; + } + } + + busy_pages = malloc(sizeof(xen_pfn_t) * busy_pages_max); + + for ( i = start; i < end; ++i ) + { + if ( test_bit(i - start, to_send) ) + { + page = xc_map_foreign_range(xch, dom, PAGE_SIZE, PROT_READ, i); + if ( !page ) + { + /* This page is mapped elsewhere, should be resent later */ + busy_pages[busy_pages_count] = i; + busy_pages_count++; + if ( busy_pages_count >= busy_pages_max ) + { + busy_pages_max += 256; + busy_pages = realloc(busy_pages, sizeof(xen_pfn_t) * + busy_pages_max); + } + continue; + } + + if ( write_exact_handled(xch, io_fd, &i, sizeof(i)) || + write_exact_handled(xch, io_fd, page, PAGE_SIZE) ) + { + munmap(page, PAGE_SIZE); + free(busy_pages); + return -1; + } + count++; + munmap(page, PAGE_SIZE); + + if ( (i % DEF_PROGRESS_RATE) == 0 ) + xc_report_progress_step(xch, i - start, mem_size); + dirty_pages_on_current_iter_num++; + } + } + + while ( busy_pages_count ) + { + /* Send busy pages */ + busy_pages_count--; + i = busy_pages[busy_pages_count]; + if ( test_bit(i - start, to_send) ) + { + page = xc_map_foreign_range(xch, dom, PAGE_SIZE,PROT_READ, i); + if ( !page ) + { + IPRINTF("WARNING: 2nd attempt to save page " + "busy failed pfn=%llx", i); + continue; + } + + if ( debug ) + { + IPRINTF("save mem: resend busy page %llx\n", i); + } + + if ( write_exact_handled(xch, io_fd, &i, sizeof(i)) || + write_exact_handled(xch, io_fd, page, PAGE_SIZE) ) + { + munmap(page, PAGE_SIZE); + free(busy_pages); + return -1; + } + count++; + munmap(page, PAGE_SIZE); + dirty_pages_on_current_iter_num++; + } + } + free(busy_pages); + + if ( debug ) + IPRINTF("Dirty pages=%d", dirty_pages_on_current_iter_num); + + xc_report_progress_step(xch, mem_size, mem_size); + + dirty_pages_on_prev_iter_num = dirty_pages_on_current_iter_num; + total_dirty_pages_num += dirty_pages_on_current_iter_num; + + if ( last_iter ) + { + xc_hypercall_buffer_free_pages(xch, to_send, + NRPAGES(bitmap_size(mem_size))); + if ( live ) + { + if ( xc_shadow_control(xch, dom, XEN_DOMCTL_SHADOW_OP_OFF, + NULL, 0, NULL, 0, NULL) < 0 ) + ERROR("Couldn''t disable log-dirty mode"); + } + break; + } + } + if ( debug ) + { + IPRINTF("save mem: pages count 
= %d\n", count); + } + + i = (xen_pfn_t) -1; /* end page marker */ + return write_exact_handled(xch, io_fd, &i, sizeof(i)); +} + +static int restore_memory(xc_interface *xch, int io_fd, uint32_t dom, + guest_params_t *params) +{ + xen_pfn_t end = params->max_gpfn; + xen_pfn_t gpfn; + int debug = !!(params->flags & XCFLAGS_DEBUG); + int count = 0; + char *page; + xen_pfn_t start = params->start_gpfn; + + /* TODO allocate several pages per call */ + for ( gpfn = start; gpfn < end; ++gpfn ) + { + if ( xc_domain_populate_physmap_exact(xch, dom, 1, 0, 0, &gpfn) ) + { + PERROR("Memory allocation for a new domain failed"); + return -1; + } + } + while ( 1 ) + { + + if ( read_exact(io_fd, &gpfn, sizeof(gpfn)) ) + { + PERROR("GPFN read failed during memory transfer, count=%d", count); + return -1; + } + if ( gpfn == (xen_pfn_t) -1 ) break; /* end page marker */ + + if ( gpfn < start || gpfn >= end ) + { + ERROR("GPFN %llx doesn''t belong to RAM address space, count=%d", + gpfn, count); + return -1; + } + page = xc_map_foreign_range(xch, dom, PAGE_SIZE, + PROT_READ | PROT_WRITE, gpfn); + if ( !page ) + { + PERROR("xc_map_foreign_range failed, pfn=%llx", gpfn); + return -1; + } + if ( read_exact(io_fd, page, PAGE_SIZE) ) + { + PERROR("Page data read failed during memory transfer, pfn=%llx", + gpfn); + return -1; + } + munmap(page, PAGE_SIZE); + count++; + } + + if ( debug ) + { + IPRINTF("Memory restored, pages count=%d", count); + } + return 0; +} + +/* ============ HVM context =========== */ +static int save_armhvm(xc_interface *xch, int io_fd, uint32_t dom, int debug) +{ + /* HVM: a buffer for holding HVM context */ + uint32_t hvm_buf_size = 0; + uint8_t *hvm_buf = NULL; + uint32_t rec_size; + int retval = -1; + + /* Need another buffer for HVM context */ + hvm_buf_size = xc_domain_hvm_getcontext(xch, dom, 0, 0); + if ( hvm_buf_size == -1 ) + { + ERROR("Couldn''t get HVM context size from Xen"); + goto out; + } + hvm_buf = malloc(hvm_buf_size); + + if ( !hvm_buf ) + { + ERROR("Couldn''t allocate memory for hvm buffer"); + goto out; + } + + /* Get HVM context from Xen and save it too */ + if ( (rec_size = xc_domain_hvm_getcontext(xch, dom, hvm_buf, + hvm_buf_size)) == -1 ) + { + ERROR("HVM:Could not get hvm buffer"); + goto out; + } + + if ( debug ) + IPRINTF("HVM save size %d %d", hvm_buf_size, rec_size); + + if ( write_exact_handled(xch, io_fd, &rec_size, sizeof(uint32_t)) ) + goto out; + + if ( write_exact_handled(xch, io_fd, hvm_buf, rec_size) ) + { + goto out; + } + retval = 0; + +out: + if ( hvm_buf ) + free (hvm_buf); + return retval; +} + +static int restore_armhvm(xc_interface *xch, int io_fd, + uint32_t dom, int debug) +{ + uint32_t rec_size; + uint32_t hvm_buf_size = 0; + uint8_t *hvm_buf = NULL; + int frc = 0; + int retval = -1; + + if ( read_exact(io_fd, &rec_size, sizeof(uint32_t)) ) + { + PERROR("Could not read HVM size"); + goto out; + } + + if ( !rec_size ) + { + ERROR("Zero HVM size"); + goto out; + } + + hvm_buf_size = xc_domain_hvm_getcontext(xch, dom, 0, 0); + if ( hvm_buf_size != rec_size ) + { + ERROR("HVM size for this domain is not the same as stored"); + } + + hvm_buf = malloc(hvm_buf_size); + if ( !hvm_buf ) + { + ERROR("Couldn''t allocate memory"); + goto out; + } + + if ( read_exact(io_fd, hvm_buf, hvm_buf_size) ) + { + PERROR("Could not read HVM context"); + goto out; + } + + frc = xc_domain_hvm_setcontext(xch, dom, hvm_buf, hvm_buf_size); + if ( frc ) + { + ERROR("error setting the HVM context"); + goto out; + } + retval = 0; + + if ( debug ) + { + IPRINTF("HVM 
restore size %d %d", hvm_buf_size, rec_size); + } +out: + if ( hvm_buf ) + free (hvm_buf); + return retval; +} + +/* ================= Console & Xenstore & Memory map =========== */ +static int save_guest_params(xc_interface *xch, int io_fd, + uint32_t dom, uint32_t flags, + guest_params_t *params) +{ + size_t sz = sizeof(guest_params_t); + xc_dominfo_t dom_info; + + params->max_gpfn = xc_domain_maximum_gpfn(xch, dom); + params->start_gpfn = (GUEST_RAM_BASE >> PAGE_SHIFT); + + if ( flags & XCFLAGS_DEBUG ) + { + IPRINTF("Guest param save size: %d ", sz); + } + + if ( xc_get_hvm_param(xch, dom, HVM_PARAM_CONSOLE_PFN, + ¶ms->console_pfn) ) + { + ERROR("Can''t get console gpfn"); + return -1; + } + + if ( xc_get_hvm_param(xch, dom, HVM_PARAM_STORE_PFN, ¶ms->store_pfn) ) + { + ERROR("Can''t get store gpfn"); + return -1; + } + + if ( xc_domain_getinfo(xch, dom, 1, &dom_info ) < 0) + { + ERROR("Can''t get domain info for dom %d", dom); + return -1; + } + params->max_vcpu_id = dom_info.max_vcpu_id; + + params->flags = flags; + + if ( write_exact_handled(xch, io_fd, params, sz) ) + { + return -1; + } + + return 0; +} + +static int restore_guest_params(xc_interface *xch, int io_fd, + uint32_t dom, guest_params_t *params) +{ + size_t sz = sizeof(guest_params_t); + xen_pfn_t nr_pfns; + unsigned int maxmemkb; + + if ( read_exact(io_fd, params, sizeof(guest_params_t)) ) + { + PERROR("Can''t read guest params"); + return -1; + } + + nr_pfns = params->max_gpfn - params->start_gpfn; + maxmemkb = (unsigned int) nr_pfns << (PAGE_SHIFT - 10); + + if ( params->flags & XCFLAGS_DEBUG ) + { + IPRINTF("Guest param restore size: %d ", sz); + IPRINTF("Guest memory size: %d MB", maxmemkb >> 10); + } + + if ( xc_domain_setmaxmem(xch, dom, maxmemkb) ) + { + ERROR("Can''t set memory map"); + return -1; + } + + /* Set max. 
number of vcpus as max_vcpu_id + 1 */ + if ( xc_domain_max_vcpus(xch, dom, params->max_vcpu_id + 1) ) + { + ERROR("Can''t set max vcpu number for domain"); + return -1; + } + + return 0; +} + +static int set_guest_params(xc_interface *xch, int io_fd, uint32_t dom, + guest_params_t *params, unsigned int console_evtchn, + domid_t console_domid, unsigned int store_evtchn, + domid_t store_domid) +{ + int rc = 0; + + if ( (rc = xc_clear_domain_page(xch, dom, params->console_pfn)) ) + { + ERROR("Can''t clear console page"); + return rc; + } + + if ( (rc = xc_clear_domain_page(xch, dom, params->store_pfn)) ) + { + ERROR("Can''t clear xenstore page"); + return rc; + } + + if ( (rc = xc_dom_gnttab_hvm_seed(xch, dom, params->console_pfn, + params->store_pfn, console_domid, + store_domid)) ) + { + ERROR("Can''t grant console and xenstore pages"); + return rc; + } + + if ( (rc = xc_set_hvm_param(xch, dom, HVM_PARAM_CONSOLE_PFN, + params->console_pfn)) ) + { + ERROR("Can''t set console gpfn"); + return rc; + } + + if ( (rc = xc_set_hvm_param(xch, dom, HVM_PARAM_STORE_PFN, + params->store_pfn)) ) + { + ERROR("Can''t set xenstore gpfn"); + return rc; + } + + if ( (rc = xc_set_hvm_param(xch, dom, HVM_PARAM_CONSOLE_EVTCHN, + console_evtchn)) ) + { + ERROR("Can''t set console event channel"); + return rc; + } + + if ( (rc = xc_set_hvm_param(xch, dom, HVM_PARAM_STORE_EVTCHN, + store_evtchn)) ) + { + ERROR("Can''t set xenstore event channel"); + return rc; + } + return 0; +} + +/* ================== Main ============== */ +int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, + uint32_t max_iters, uint32_t max_factor, uint32_t flags, + struct save_callbacks *callbacks, int hvm, + unsigned long vm_generationid_addr) +{ + int debug; + guest_params_t params; + +#ifdef ARM_MIGRATE_VERBOSE + flags |= XCFLAGS_DEBUG; +#endif + +#ifdef DISABLE_LIVE_MIGRATION + flags &= ~(XCFLAGS_LIVE); +#endif + + debug = !!(flags & XCFLAGS_DEBUG); + if ( save_guest_params(xch, io_fd, dom, flags, ¶ms) ) + { + ERROR("Can''t save guest params"); + return -1; + } + + if ( save_memory(xch, io_fd, dom, callbacks, max_iters, + max_factor, ¶ms) ) + { + ERROR("Memory not saved"); + return -1; + } + + if ( save_armhvm(xch, io_fd, dom, debug) ) + { + ERROR("HVM not saved"); + return -1; + } + + if ( debug ) + { + IPRINTF("Domain %d saved", dom); + } + return 0; +} + +int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom, + unsigned int store_evtchn, unsigned long *store_gpfn, + domid_t store_domid, unsigned int console_evtchn, + unsigned long *console_gpfn, domid_t console_domid, + unsigned int hvm, unsigned int pae, int superpages, + int no_incr_generationid, int checkpointed_stream, + unsigned long *vm_generationid_addr, + struct restore_callbacks *callbacks) +{ + guest_params_t params; + int debug = 1; + + if ( restore_guest_params(xch, io_fd, dom, ¶ms) ) + { + ERROR("Can''t restore guest params"); + return -1; + } + debug = !!(params.flags & XCFLAGS_DEBUG); + + if ( restore_memory(xch, io_fd, dom, ¶ms) ) + { + ERROR("Can''t restore memory"); + return -1; + } + if ( set_guest_params(xch, io_fd, dom, ¶ms, + console_evtchn, console_domid, + store_evtchn, store_domid) ) + { + ERROR("Can''t setup guest params"); + return -1; + } + + /* Setup console and store PFNs to caller */ + *console_gpfn = params.console_pfn; + *store_gpfn = params.store_pfn; + + if ( restore_armhvm(xch, io_fd, dom, debug) ) + { + ERROR("HVM not restored"); + return -1; + } + + if ( debug ) + { + IPRINTF("Domain %d restored", dom); + } + + return 0; +} 
+ +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/tools/libxc/xc_dom_arm.c b/tools/libxc/xc_dom_arm.c index df59ffb..ebeeb41 100644 --- a/tools/libxc/xc_dom_arm.c +++ b/tools/libxc/xc_dom_arm.c @@ -191,7 +191,9 @@ int arch_setup_meminit(struct xc_dom_image *dom) 0, 0, &dom->p2m_host[i]); } - return 0; + return xc_domain_setmaxmem(dom->xch, dom->guest_domid, + (dom->total_pages + NR_MAGIC_PAGES) + << (PAGE_SHIFT - 10)); } int arch_setup_bootearly(struct xc_dom_image *dom) diff --git a/tools/misc/Makefile b/tools/misc/Makefile index 17aeda5..0824100 100644 --- a/tools/misc/Makefile +++ b/tools/misc/Makefile @@ -11,7 +11,7 @@ HDRS = $(wildcard *.h) TARGETS-y := xenperf xenpm xen-tmem-list-parse gtraceview gtracestat xenlockprof xenwatchdogd xencov TARGETS-$(CONFIG_X86) += xen-detect xen-hvmctx xen-hvmcrash xen-lowmemd xen-mfndump -TARGETS-$(CONFIG_MIGRATE) += xen-hptool +TARGETS-$(CONFIG_X86) += xen-hptool TARGETS := $(TARGETS-y) SUBDIRS := $(SUBDIRS-y) @@ -23,7 +23,7 @@ INSTALL_BIN := $(INSTALL_BIN-y) INSTALL_SBIN-y := xen-bugtool xen-python-path xenperf xenpm xen-tmem-list-parse gtraceview \ gtracestat xenlockprof xenwatchdogd xen-ringwatch xencov INSTALL_SBIN-$(CONFIG_X86) += xen-hvmctx xen-hvmcrash xen-lowmemd xen-mfndump -INSTALL_SBIN-$(CONFIG_MIGRATE) += xen-hptool +INSTALL_SBIN-$(CONFIG_X86) += xen-hptool INSTALL_SBIN := $(INSTALL_SBIN-y) INSTALL_PRIVBIN-y := xenpvnetboot -- 1.8.1.2
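For readers trying to follow the stream handling in xc_arm_migrate.c, the layout that xc_domain_save() ends up writing (and xc_domain_restore() expects back) can be sketched as below. The struct and comment names are purely descriptive and do not appear in the patch; the code writes the pfn and the page data with two separate write_exact() calls, so nothing here is a literal wire struct, and the 64-bit width of xen_pfn_t on ARM is assumed.

#include <stdint.h>

/* Descriptive sketch only -- these identifiers are not from the patch. */
struct page_record {
    uint64_t gpfn;        /* xen_pfn_t; a value of (xen_pfn_t)-1
                           * terminates the page stream */
    uint8_t  data[4096];  /* PAGE_SIZE bytes of guest memory */
};

/*
 * Stream order, as implemented above:
 *   1. guest_params_t         (console/store pfn, flags, gpfn range, vcpus)
 *   2. page records           (pages may be re-sent on later live iterations)
 *   3. xen_pfn_t end marker   ((xen_pfn_t)-1)
 *   4. uint32_t HVM context size, then that many bytes as returned by
 *      xc_domain_hvm_getcontext()
 */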
Ian Campbell
2013-Nov-12 15:15 UTC
Re: [PATCH RESEND v5 1/6] xen/arm: Implement hvm save and restore
On Fri, 2013-11-08 at 16:50 +0900, Jaeyong Yoo wrote: If these get resent again please can you CC the ARM maintainers. $ git grep -A5 ARM MAINTAINERS MAINTAINERS:ARM (W/ VIRTUALISATION EXTENSIONS) ARCHITECTURE MAINTAINERS-M: Ian Campbell <ian.campbell@citrix.com> MAINTAINERS-M: Stefano Stabellini <stefano.stabellini@citrix.com> MAINTAINERS-M: Tim Deegan <tim@xen.org> MAINTAINERS-S: Supported MAINTAINERS-L: xen-devel@lists.xen.org> From: Evgeny Fedotov <e.fedotov@samsung.com> > > Implement save/restore of hvm context hypercall. > In hvm context save/restore, we save gic, timer and vfp registers. > > Changes from v4: Save vcpu registers within hvm context, and purge > the save-vcpu-register patch.The inter-version changelog should go after the main commit message and the S-o-b separated by a "---" on a line by itself. This way it is not included in the final commit. [...]> +static int vgic_irq_rank_save(struct vgic_rank *ext, > + struct vgic_irq_rank *rank) > +{ > + spin_lock(&rank->lock);This is probably wise, but the domain must be paused at this point, right?> + /* Some of VGIC registers are not used yet, it is for a future usage */ > + /* IENABLE, IACTIVE, IPEND, PENDSGI registers */ > + ext->ienable = rank->ienable; > + ext->iactive = rank->iactive; > + ext->ipend = rank->ipend; > + ext->pendsgi = rank->pendsgi; > + /* ICFG */ > + ext->icfg[0] = rank->icfg[0]; > + ext->icfg[1] = rank->icfg[1]; > + /* IPRIORITY */ > + if ( sizeof(rank->ipriority) != sizeof (ext->ipriority) ) > + { > + dprintk(XENLOG_G_ERR, "hvm_hw_gic: check ipriority dumping space\n");What makes ipriority (and itargets below) special enough to warrant such a check compared with all the other fields? Could these checks be BUILD_BUG_ON?> + return -EINVAL; > + } > + memcpy(ext->ipriority, rank->ipriority, sizeof(rank->ipriority)); > + /* ITARGETS */ > + if ( sizeof(rank->itargets) != sizeof (ext->itargets) ) > + { > + dprintk(XENLOG_G_ERR, "hvm_hw_gic: check itargets dumping space\n"); > + return -EINVAL; > + } > + memcpy(ext->itargets, rank->itargets, sizeof(rank->itargets)); > + spin_unlock(&rank->lock); > + return 0; > +} > + > +static int gic_save(struct domain *d, hvm_domain_context_t *h) > +{ > + struct hvm_hw_gic ctxt; > + struct vcpu *v; > + > + /* Save the state of GICs */ > + for_each_vcpu( d, v )Where is the GICD state saved then?> + { > + ctxt.gic_hcr = v->arch.gic_hcr; > + ctxt.gic_vmcr = v->arch.gic_vmcr; > + ctxt.gic_apr = v->arch.gic_apr; > + > + /* Save list registers and masks */ > + /* (it is not necessary to save/restore them, but LR state can have > + * influence on downtime after Live Migration (to be tested)What does this mean? I''m in two minds about whether LR counts as architectural state, since it is hypervisor side and not guest facing. 
I''d have thought that the LR content could be reconstructed from the pending and active bitmasks.> + */ > + if ( sizeof(v->arch.gic_lr) > sizeof (ctxt.gic_lr) ) > + { > + dprintk(XENLOG_G_ERR, "hvm_hw_gic: increase LR dumping space\n"); > + return -EINVAL; > + } > + memcpy(ctxt.gic_lr, v->arch.gic_lr, sizeof(v->arch.gic_lr)); > + ctxt.lr_mask = v->arch.lr_mask; > + ctxt.event_mask = v->arch.event_mask;These last two in particular make me think that saving the LRs is wrong.> + /* Save PPI states (per-CPU) */ > + /* It is necessary if SMP enabled */ > + if ( vgic_irq_rank_save(&ctxt.ppi_state, &v->arch.vgic.private_irqs) ) > + return 1; > + > + if ( hvm_save_entry(GIC, v->vcpu_id, h, &ctxt) != 0 ) > + return 1; > + } > + return 0; > +} > + > +static int gic_load(struct domain *d, hvm_domain_context_t *h) > +{ > + int vcpuid; > + struct hvm_hw_gic ctxt; > + struct vcpu *v; > [...] > + return 0; > +} > + > +HVM_REGISTER_SAVE_RESTORE(GIC, gic_save, gic_load, 1, HVMSR_PER_VCPU); > + > [...]> +#ifdef CONFIG_ARM_32 > + ctxt.mair0 = v->arch.mair0; > + ctxt.mair1 = v->arch.mair1; > +#else > + ctxt.mair0 = v->arch.mair; > +#endif > + /* Control Registers */ > + ctxt.actlr = v->arch.actlr; > + ctxt.sctlr = v->arch.sctlr; > + ctxt.cpacr = v->arch.cpacr; > [...]> + ctxt.x0 = c.x0;Would it make sense to include a field of type vcpu_guest_core_regs in the save struct instead of this big list? [...]> + #ifdef CONFIG_ARM_32#ifdef in the first columns please.> + ctxt.spsr_fiq = c.spsr_fiq; > + ctxt.spsr_irq = c.spsr_irq; > + ctxt.spsr_und = c.spsr_und; > + ctxt.spsr_abt = c.spsr_abt;and don''t indent these as if they were inside a block You had this right when you saved the MAIR registers above.> + #endif > + #ifdef CONFIG_ARM_64or #else if you prefer.> + ctxt.sp_el0 = c.sp_el0; > + ctxt.sp_el1 = c.sp_el1; > + ctxt.elr_el1 = c.elr_el1; > + #endif > + > + /* check VFP state size before dumping */ > + if ( sizeof(v->arch.vfp) > sizeof (ctxt.vfp) ) > + { > + dprintk(XENLOG_G_ERR, "hvm_hw_cpu: increase VFP dumping space\n"); > + return -EINVAL; > + } > + memcpy((void*) &ctxt.vfp, (void*) &v->arch.vfp, sizeof(v->arch.vfp)); > + > + ctxt.pause_flags = v->pause_flags;x86 has at the top of the loop: /* We don''t need to save state for a vcpu that is down; the restore * code will leave it down if there is nothing saved. */ if ( test_bit(_VPF_down, &v->pause_flags) ) continue; which I think is preferable to this.> diff --git a/xen/include/asm-arm/hvm/support.h b/xen/include/asm-arm/hvm/support.h > new file mode 100644 > index 0000000..8311f2f > --- /dev/null > +++ b/xen/include/asm-arm/hvm/support.h > @@ -0,0 +1,29 @@ > +/* > + * support.h: HVM support routines used by ARMv7 VE.Not just on Vexpress nor just on v7 I expect?> + * > + * Copyright (c) 2012, Citrix SystemsReally?> + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms and conditions of the GNU General Public License, > + * version 2, as published by the Free Software Foundation. > + * > + * This program is distributed in the hope it will be useful, but WITHOUT > + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or > + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for > + * more details. > + * > + * You should have received a copy of the GNU General Public License along with > + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple > + * Place - Suite 330, Boston, MA 02111-1307 USA. 
> + */ > + > +#ifndef __ASM_ARM_HVM_SUPPORT_H__ > +#define __ASM_ARM_HVM_SUPPORT_H__ > + > +#include <xen/types.h> > +#include <public/hvm/ioreq.h> > +#include <xen/sched.h> > +#include <xen/hvm/save.h> > +#include <asm/processor.h>Just some #includes here? Ian,
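To make the BUILD_BUG_ON suggestion concrete: with compile-time checks, the ipriority/itargets handling in vgic_irq_rank_save() could shrink to something like the sketch below. The helper name is invented here; the struct and field names are the patch's. A layout mismatch between the save format and the in-hypervisor vGIC state would then break the build instead of returning -EINVAL at save time.

static void vgic_rank_copy_arrays(struct vgic_rank *ext,
                                  const struct vgic_irq_rank *rank)
{
    /* sizeof() of the members is a compile-time constant, so these can
     * be build-time assertions rather than runtime checks. */
    BUILD_BUG_ON(sizeof(rank->ipriority) != sizeof(ext->ipriority));
    BUILD_BUG_ON(sizeof(rank->itargets)  != sizeof(ext->itargets));

    memcpy(ext->ipriority, rank->ipriority, sizeof(rank->ipriority));
    memcpy(ext->itargets,  rank->itargets,  sizeof(rank->itargets));
}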
Ian Campbell
2013-Nov-12 15:21 UTC
Re: [PATCH RESEND v5 2/6] xen/arm: Implement get_maximum_gpfn hypercall for arm
On Fri, 2013-11-08 at 16:50 +0900, Jaeyong Yoo wrote:> From: Evgeny Fedotov <e.fedotov@samsung.com> > > By using the memory map info in arch_domain (from set_memory_map hypercall) > implement get_maximum_gpfn hypercall. > > Changes from v4: Use GUEST_RAM_BASE as the start physical address of guest > RAM. And, purge set-memory-map patch > > Singed-off-by: Evgeny Fedotov <e.fedotov@samsung.com> > --- > xen/arch/arm/mm.c | 22 +++++++++++++++++++++- > xen/include/asm-arm/mm.h | 2 ++ > 2 files changed, 23 insertions(+), 1 deletion(-) > > diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c > index 123280e..3801f07 100644 > --- a/xen/arch/arm/mm.c > +++ b/xen/arch/arm/mm.c > @@ -927,7 +927,11 @@ int page_is_ram_type(unsigned long mfn, unsigned long mem_type) > > unsigned long domain_get_maximum_gpfn(struct domain *d)s/unsigned long/xen_pfn_t/ I think. Urk, which will break the ABI for this hypercall. That''s something of a conundrum :-/ It is a domctl so we are at liberty to change it, probably to taking a xen_pfn_t * to fill in instead of returning the value. I''m in two minds about whether we should do so now or postpone it (which risks forgetting and having an obscure bug somewhere down the line).> { > - return -ENOSYS; > + paddr_t end; > + > + get_gma_start_end(d, NULL, &end);I suppose this function is useful in a future patch too, otherwise return (GUEST_RAM_BASE>>PAGE_SHIFT)+d->max_pages would suffice. Ian.
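For reference, the "take a xen_pfn_t to fill in" variant mentioned above might look like the sketch below. The interface change, the domctl plumbing (not shown), and the final shift are assumptions -- the quoted hunk ends before the function's return statement -- and only the get_gma_start_end() call is taken from the patch.

/* Sketch: avoid truncating the GPFN through the unsigned long return
 * value by letting the caller pass in a xen_pfn_t to fill. */
void domain_get_maximum_gpfn(struct domain *d, xen_pfn_t *gpfn)
{
    paddr_t end;

    get_gma_start_end(d, NULL, &end);
    *gpfn = (xen_pfn_t)(end >> PAGE_SHIFT);
}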
Ian Campbell
2013-Nov-12 15:24 UTC
Re: [PATCH RESEND v5 3/6] xen/arm: Implement modify_returncode
On Fri, 2013-11-08 at 16:50 +0900, Jaeyong Yoo wrote:> From: Alexey Sokolov <sokolov.a@samsung.com> > > Making sched_op in do_suspend (driver/xen/manage.c) returns 0 on the success of suspend. > > Singed-off-by: Alexey Sokolov <sokolov.a@samsung.com>Acked-by: Ian Campbell <ian.campbell@citrix.com> Although if you wanted to...> +#elif defined(__arm__)make this "... || defined(__aarch64__)" and ...> + ctxt.c.user_regs.r0_usr = 1;Use x0 here instead (which is a 64-bit alias of r0_usr, so good for both 32 and 64 bit). Then it would be ready made for 64-bit too in the future. Ian.
Ian Campbell
2013-Nov-12 15:58 UTC
Re: [PATCH RESEND v5 4/6] xen/arm: Implement virtual-linear page table for guest p2m mapping in live migration
On Fri, 2013-11-08 at 16:50 +0900, Jaeyong Yoo wrote:> Implement vlpt (virtual-linear page table) for fast accessing of 3rd PTE of guest p2m. > When creating a mapping for vlpt, just copy the 1st level PTE of guest p2m to the xen''s > 2nd level PTE. Then the mapping becomes the following: > xen''s 1st PTE --> > xen''s 2nd PTE (which is the same as 1st PTE of guest p2m) --> > guest p2m''s 2nd PTE --> > guest p2m''s 3rd PTE (the memory contents where the vlpt points) > For more info about vlpt, see: > http://www.technovelty.org/linux/virtual-linear-page-table.html > > This function is used in dirty-page tracing: when domU write-fault is trapped by xen, > xen can immediately locate the 3rd PTE of guest p2m. > The following link shows the performance comparison for handling a dirty-page between > vlpt and typical page table walking. > http://lists.xen.org/archives/html/xen-devel/2013-08/msg01503.html > > Changes from v4: > 1. In the restoring vlpt, use __foo variant without barriers for write_pte and > flush_xen_data_tlb_range_va. > 2. Support two consecutive pages for guest''s first level page table. > > Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com> > --- > xen/arch/arm/domain.c | 5 ++ > xen/arch/arm/mm.c | 112 +++++++++++++++++++++++++++++++++++++++ > xen/include/asm-arm/arm32/page.h | 41 ++++++++++---- > xen/include/asm-arm/config.h | 5 ++ > xen/include/asm-arm/domain.h | 7 +++ > xen/include/asm-arm/mm.h | 15 ++++++ > 6 files changed, 174 insertions(+), 11 deletions(-) > > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c > index cb0424d..c0b5dd8 100644 > --- a/xen/arch/arm/domain.c > +++ b/xen/arch/arm/domain.c > @@ -509,6 +509,11 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags) > /* Default the virtual ID to match the physical */ > d->arch.vpidr = boot_cpu_data.midr.bits; > > + d->arch.dirty.second_lvl_start = 0; > + d->arch.dirty.second_lvl_end = 0; > + d->arch.dirty.second_lvl[0] = NULL; > + d->arch.dirty.second_lvl[1] = NULL; > + > clear_page(d->shared_info); > share_xen_page_with_guest( > virt_to_page(d->shared_info), d, XENSHARE_writable); > diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c > index 3801f07..bf13993 100644 > --- a/xen/arch/arm/mm.c > +++ b/xen/arch/arm/mm.c > @@ -1328,6 +1328,118 @@ void get_gma_start_end(struct domain *d, paddr_t *start, paddr_t *end) > *end = GUEST_RAM_BASE + ((paddr_t) d->max_pages << PAGE_SHIFT); > } > > +/* flush the vlpt area */ > +void flush_vlpt(struct domain *d) > +{ > + int flush_size; > + flush_size = (d->arch.dirty.second_lvl_end - > + d->arch.dirty.second_lvl_start) << SECOND_SHIFT; > + /* flushing the 3rd level mapping */ > + flush_xen_data_tlb_range_va(d->arch.dirty.second_lvl_start << SECOND_SHIFT, > + flush_size);All these << SECOND_SHIFT, can they be put into a macro VLPT_OFFSET(guest_vaddr) perhaps (assuming that''s the right semantics)> +} > + > +/* restore the xen page table for vlpt mapping for domain d */ > +void restore_vlpt(struct domain *d)I guess this is called on context switch? 
Why not in this patch?> +{ > + int i; > + dsb(); > + for ( i = d->arch.dirty.second_lvl_start; > + i < d->arch.dirty.second_lvl_end; > + ++i ) > + { > + int k = i % LPAE_ENTRIES; > + int l = i / LPAE_ENTRIES; > + > + if ( xen_second[i].bits != d->arch.dirty.second_lvl[l][k].bits ) > + { > + __write_pte(&xen_second[i], d->arch.dirty.second_lvl[l][k]); > + __flush_xen_data_tlb_range_va(i << SECOND_SHIFT, > + 1 << SECOND_SHIFT); > + } > + } > + dsb(); > + isb(); > +} > + > +/* setting up the xen page table for vlpt mapping for domain d */ > +int prepare_vlpt(struct domain *d) > +{ > + int xen_second_linear_base; > + int gp2m_start_index, gp2m_end_index; > + struct p2m_domain *p2m = &d->arch.p2m; > + struct page_info *second_lvl_page; > + paddr_t gma_start = 0; > + paddr_t gma_end = 0; > + lpae_t *first[2]; > + int i; > + uint64_t required, avail = VIRT_LIN_P2M_END - VIRT_LIN_P2M_START; > + > + get_gma_start_end(d, &gma_start, &gma_end); > + required = (gma_end - gma_start) >> LPAE_SHIFT; > + > + > + if ( required > avail )avail is the number of bytes of virtual address space in the linear p2m area. required is the number of pages which the guest has. Does this comparison correctly account for each page being represented by an 8-byte lpae_t entry?> + { > + dprintk(XENLOG_ERR, "Available VLPT is small for domU guest" > + "(avail: %llx, required: %llx)\n", > + avail, required); > + return -ENOMEM; > + } > + > + xen_second_linear_base = second_linear_offset(VIRT_LIN_P2M_START); > + > + gp2m_start_index = gma_start >> FIRST_SHIFT; > + gp2m_end_index = (gma_end >> FIRST_SHIFT) + 1; > + > + if ( xen_second_linear_base + gp2m_end_index >= LPAE_ENTRIES * 2 ) > + { > + dprintk(XENLOG_ERR, "xen second page is small for VLPT for domU"); > + return -ENOMEM; > + } > + > + second_lvl_page = alloc_domheap_pages(NULL, 1, 0);There''d be no harm in allocating the two pages separately I think, and avoiding the possiblilty of an order 1 allocation failing. But if you do allocate them contiguously then d->arch.dirty.second_lvl can be a simple lpae_t * and you can just access offsets 0..1023 without worrying about managing the array of two pointers. As it stands I think you have the worst of both worlds. We could also just consider allocating all firstlevel p2m pages from the xenheap instead of the domheap.> + if ( second_lvl_page == NULL ) > + return -ENOMEM; > + > + /* First level p2m is 2 consecutive pages */ > + d->arch.dirty.second_lvl[0] = map_domain_page_global( > + page_to_mfn(second_lvl_page) ); > + d->arch.dirty.second_lvl[1] = map_domain_page_global( > + page_to_mfn(second_lvl_page+1) ); > + > + first[0] = __map_domain_page(p2m->first_level); > + first[1] = __map_domain_page(p2m->first_level+1); > + for ( i = gp2m_start_index; i < gp2m_end_index; ++i )Can''t this just be a loop over 0..1023 and avoid all this modular arithmetic:> + { > + int k = i % LPAE_ENTRIES; > + int l = i / LPAE_ENTRIES; > + int k2 = (xen_second_linear_base + i) % LPAE_ENTRIES; > + int l2 = (xen_second_linear_base + i) / LPAE_ENTRIES; > + > + write_pte(&xen_second[xen_second_linear_base+i], first[l][k]); > + > + /* we copy the mapping into domain''s structure as a reference > + * in case of the context switch (used in restore_vlpt) */Do you keep this up to date when changing the p2m? In fact, why aren''t you just mapping the actual first level pages? 
Why does it have to be a copy?> + d->arch.dirty.second_lvl[l2][k2] = first[l][k]; > + } > + unmap_domain_page(first[0]); > + unmap_domain_page(first[1]); > + > + /* storing the start and end index */ > + d->arch.dirty.second_lvl_start = xen_second_linear_base + gp2m_start_index; > + d->arch.dirty.second_lvl_end = xen_second_linear_base + gp2m_end_index; > + > + flush_vlpt(d); > + return 0; > +} > + > +void cleanup_vlpt(struct domain *d) > +{ > + /* First level p2m is 2 consecutive pages */ > + unmap_domain_page_global(d->arch.dirty.second_lvl[0]); > + unmap_domain_page_global(d->arch.dirty.second_lvl[1]); > +} > /* > * Local variables: > * mode: C > diff --git a/xen/include/asm-arm/arm32/page.h b/xen/include/asm-arm/arm32/page.h > index cf12a89..0a4e115 100644 > --- a/xen/include/asm-arm/arm32/page.h > +++ b/xen/include/asm-arm/arm32/page.h > @@ -5,20 +5,28 @@ > > /* Write a pagetable entry. > * > - * If the table entry is changing a text mapping, it is responsibility > - * of the caller to issue an ISB after write_pte. > + * All necessary barriers are responsibility of the caller > */ > -static inline void write_pte(lpae_t *p, lpae_t pte) > +static inline void __write_pte(lpae_t *p, lpae_t pte) > { > asm volatile ( > - /* Ensure any writes have completed with the old mappings. */ > - "dsb;" > /* Safely write the entry (STRD is atomic on CPUs that support LPAE) */ > "strd %0, %H0, [%1];" > - "dsb;" > : : "r" (pte.bits), "r" (p) : "memory"); > } > > +/* Write a pagetable entry with dsb barriers. > + * > + * If the table entry is changing a text mapping, it is responsibility > + * of the caller to issue an ISB after write_pte. > + */ > +static inline void write_pte(lpae_t *p, lpae_t pte) > +{ > + dsb();Please retain the comments from the original version.> + __write_pte(p, pte); > + dsb(); > +} > + > /* Inline ASM to flush dcache on register R (may be an inline asm operand) */ > #define __flush_xen_dcache_one(R) STORE_CP32(R, DCCMVAC) > > @@ -57,18 +65,28 @@ static inline void flush_xen_data_tlb(void) > } > > /* > - * Flush a range of VA''s hypervisor mappings from the data TLB. This is not > - * sufficient when changing code mappings or for self modifying code. > + * Flush a range of VA''s hypervisor mappings from the data TLB. > + * All necessary barriers are responsibility of the caller > */ > -static inline void flush_xen_data_tlb_range_va(unsigned long va, unsigned long size) > +static inline void __flush_xen_data_tlb_range_va(unsigned long va, unsigned long size) > { > unsigned long end = va + size; > - dsb(); /* Ensure preceding are visible */ > while ( va < end ) { > asm volatile(STORE_CP32(0, TLBIMVAH) > : : "r" (va) : "memory"); > va += PAGE_SIZE; > } > +} > + > +/* > + * Flush a range of VA''s hypervisor mappings from the data TLB with barriers. > + * This barrier is not sufficient when changing code mappings or for self > + * modifying code. 
> + */ > +static inline void flush_xen_data_tlb_range_va(unsigned long va, unsigned long size) > +{ > + dsb(); /* Ensure preceding are visible */ > + __flush_xen_data_tlb_range_va(va, size); > dsb(); /* Ensure completion of the TLB flush */ > isb(); > } > diff --git a/xen/include/asm-arm/config.h b/xen/include/asm-arm/config.h > index 5b7b1a8..15ad56d 100644 > --- a/xen/include/asm-arm/config.h > +++ b/xen/include/asm-arm/config.h > @@ -87,6 +87,7 @@ > * 0 - 8M <COMMON> > * > * 32M - 128M Frametable: 24 bytes per page for 16GB of RAM > + * 128M - 256M Virtual-linear mapping to P2M table > * 256M - 1G VMAP: ioremap and early_ioremap use this virtual address > * space > * > @@ -124,7 +125,9 @@ > #define CONFIG_SEPARATE_XENHEAP 1 > > #define FRAMETABLE_VIRT_START _AT(vaddr_t,0x02000000) > +#define VIRT_LIN_P2M_START _AT(vaddr_t,0x08000000) > #define VMAP_VIRT_START _AT(vaddr_t,0x10000000) > +#define VIRT_LIN_P2M_END VMAP_VIRT_START > #define XENHEAP_VIRT_START _AT(vaddr_t,0x40000000) > #define XENHEAP_VIRT_END _AT(vaddr_t,0x7fffffff) > #define DOMHEAP_VIRT_START _AT(vaddr_t,0x80000000) > @@ -157,6 +160,8 @@ > > #define HYPERVISOR_VIRT_END DIRECTMAP_VIRT_END > > +/*TODO (ARM_64): define VIRT_LIN_P2M_START VIRT_LIN_P2M_END */ > + > #endif > > /* Fixmap slots */ > diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h > index 67bfbbc..4f366f1 100644 > --- a/xen/include/asm-arm/domain.h > +++ b/xen/include/asm-arm/domain.h > @@ -112,6 +112,13 @@ struct arch_domain > spinlock_t lock; > } vuart; > > + /* dirty-page tracing */ > + struct { > + volatile int second_lvl_start; /* for context switch */ > + volatile int second_lvl_end;Are these virtual addresses, or pages or something else? We mostly hav specific types (e.g. vaddr_t) for such things. volatile is almost always wrong to use. Why do you want it here? If there is an actual coherency issue you will need proper barriers, not volatile. Ian.
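On the "<< SECOND_SHIFT" point: one possible shape for the helper is below. The macro names are invented, and they take xen_second slot indexes rather than the guest_vaddr argument suggested above, because that is what the patch's dirty.second_lvl_start/end fields currently hold.

#define VLPT_VA(second_idx)     ((vaddr_t)(second_idx) << SECOND_SHIFT)
#define VLPT_SIZE(start, end)   (((vaddr_t)(end) - (start)) << SECOND_SHIFT)

/* flush_vlpt() would then read:
 *   flush_xen_data_tlb_range_va(VLPT_VA(d->arch.dirty.second_lvl_start),
 *                               VLPT_SIZE(d->arch.dirty.second_lvl_start,
 *                                         d->arch.dirty.second_lvl_end));
 */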
Ian Campbell
2013-Nov-12 16:56 UTC
Re: [PATCH RESEND v5 5/6] xen/arm: Implement hypercall for dirty page tracing
On Fri, 2013-11-08 at 16:50 +0900, Jaeyong Yoo wrote:> Add hypercall (shadow op: enable/disable and clean/peek dirtied page bitmap). > It consists of two parts: dirty page detecting and saving. > For detecting, we setup the guest p2m''s leaf PTE read-only and whenever the guest > tries to write something, permission fault happens and traps into xen. > The permission-faulted GPA should be saved for the toolstack (when it wants to see > which pages are dirtied). For this purpose, we temporarily save the GPAs into bitmap. > > Changes from v4: > 1. For temporary saving dirty pages, use bitmap rather than linked list. > 2. Setup the p2m''s second level page as read-write in the view of xen''s memory access. > It happens in p2m_create_table function. > > Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com> > --- > xen/arch/arm/domain.c | 14 +++ > xen/arch/arm/domctl.c | 9 ++ > xen/arch/arm/mm.c | 103 +++++++++++++++++++- > xen/arch/arm/p2m.c | 206 ++++++++++++++++++++++++++++++++++++++++ > xen/arch/arm/traps.c | 9 ++ > xen/include/asm-arm/domain.h | 7 ++ > xen/include/asm-arm/mm.h | 7 ++ > xen/include/asm-arm/p2m.h | 4 + > xen/include/asm-arm/processor.h | 2 + > 9 files changed, 360 insertions(+), 1 deletion(-) > > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c > index c0b5dd8..0a32301 100644 > --- a/xen/arch/arm/domain.c > +++ b/xen/arch/arm/domain.c > @@ -215,6 +215,12 @@ static void ctxt_switch_to(struct vcpu *n) > WRITE_SYSREG(hcr, HCR_EL2); > isb(); > > + /* for dirty-page tracing > + * XXX: how do we consider SMP case? > + */ > + if ( n->domain->arch.dirty.mode ) > + restore_vlpt(n->domain);This is an interesting question. xen_second is shared between all pcpus, which means that the vlpt is currently only usable from a single physical CPU at a time. Currently the only per-cpu area is the domheap region from 2G-4G. We could steal some address space from the top or bottom of there?> /* This is could trigger an hardware interrupt from the virtual > * timer. The interrupt needs to be injected into the guest. 
*/ > virt_timer_restore(n); > @@ -509,11 +515,19 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags) > /* Default the virtual ID to match the physical */ > d->arch.vpidr = boot_cpu_data.midr.bits; > > + /* init for dirty-page tracing */ > + d->arch.dirty.count = 0; > + d->arch.dirty.mode = 0; > + spin_lock_init(&d->arch.dirty.lock); > + > d->arch.dirty.second_lvl_start = 0; > d->arch.dirty.second_lvl_end = 0; > d->arch.dirty.second_lvl[0] = NULL; > d->arch.dirty.second_lvl[1] = NULL; > > + memset(d->arch.dirty.bitmap, 0, sizeof(d->arch.dirty.bitmap)); > + d->arch.dirty.bitmap_pages = 0; > + > clear_page(d->shared_info); > share_xen_page_with_guest( > virt_to_page(d->shared_info), d, XENSHARE_writable); > diff --git a/xen/arch/arm/domctl.c b/xen/arch/arm/domctl.c > index cb38e59..eb74225 100644 > --- a/xen/arch/arm/domctl.c > +++ b/xen/arch/arm/domctl.c > @@ -93,6 +93,15 @@ long arch_do_domctl(struct xen_domctl *domctl, struct domain *d, > xfree(c.data); > } > break; > + case XEN_DOMCTL_shadow_op: > + { > + domain_pause(d); > + ret = dirty_mode_op(d, &domctl->u.shadow_op); > + domain_unpause(d); > + > + copyback = 1; > + } > + break; > > default: > return -EINVAL; > diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c > index bf13993..d5a0a11 100644 > --- a/xen/arch/arm/mm.c > +++ b/xen/arch/arm/mm.c > @@ -845,7 +845,6 @@ void destroy_xen_mappings(unsigned long v, unsigned long e) > create_xen_entries(REMOVE, v, 0, (e - v) >> PAGE_SHIFT, 0); > } > > -enum mg { mg_clear, mg_ro, mg_rw, mg_rx }; > static void set_pte_flags_on_range(const char *p, unsigned long l, enum mg mg) > { > lpae_t pte; > @@ -1320,6 +1319,60 @@ int is_iomem_page(unsigned long mfn) > * xen: arm: 64-bit guest support and domU FDT autogeneration > * will be upstreamed. > */ > + > +static inline void mark_dirty_bitmap(struct domain *d, paddr_t addr) > +{ > + paddr_t ram_base = (paddr_t) GUEST_RAM_BASE; > + int bit_index = PFN_DOWN(addr - ram_base); > + int page_index = bit_index >> (PAGE_SHIFT + 3);+3?> + int bit_index_residual = bit_index & ((1ul << (PAGE_SHIFT + 3)) - 1); > + > + set_bit(bit_index_residual, d->arch.dirty.bitmap[page_index]); > +} > + > +/* routine for dirty-page tracing > + * > + * On first write, it page faults, its entry is changed to read-write, > + * and on retry the write succeeds. > + * > + * for locating p2m of the faulting entry, we use virtual-linear page table. > + * returns zero if addr is not valid or dirty mode is not set > + */ > +int handle_page_fault(struct domain *d, paddr_t addr) > +{ > + > + lpae_t *vlp2m_pte = 0; > + paddr_t gma_start = 0; > + paddr_t gma_end = 0; > + > + if ( !d->arch.dirty.mode ) return 0; > + get_gma_start_end(d, &gma_start, &gma_end); > + > + /* Ensure that addr is inside guest''s RAM */ > + if ( addr < gma_start || > + addr > gma_end ) return 0; > + > + vlp2m_pte = get_vlpt_3lvl_pte(addr); > + if ( vlp2m_pte->p2m.valid && vlp2m_pte->p2m.write == 0 && > + vlp2m_pte->p2m.avail == 0 /* reuse avail bit as read-only */ ) > + { > + lpae_t pte = *vlp2m_pte; > + pte.p2m.write = 1; > + write_pte(vlp2m_pte, pte);Do we not need the p2m lock around here somewhere?> + flush_tlb_local();If SMP was working this would need to be inner shareable to cope with vcpus running on other pcpus.> + /* only necessary to lock between get-dirty bitmap and mark dirty > + * bitmap. 
If get-dirty bitmap happens immediately before this > + * lock, the corresponding dirty-page would be marked at the next > + * round of get-dirty bitmap */ > + spin_lock(&d->arch.dirty.lock); > + mark_dirty_bitmap(d, addr); > + spin_unlock(&d->arch.dirty.lock);Might an atomic set_bit suffice rather than the lock?> + } > + > + return 1; > +} > + > void get_gma_start_end(struct domain *d, paddr_t *start, paddr_t *end) > { > if ( start ) > @@ -1440,6 +1493,54 @@ void cleanup_vlpt(struct domain *d) > unmap_domain_page_global(d->arch.dirty.second_lvl[0]); > unmap_domain_page_global(d->arch.dirty.second_lvl[1]); > } > + > +int prepare_bitmap(struct domain *d) > +{ > + paddr_t gma_start = 0; > + paddr_t gma_end = 0; > + int nr_bytes; > + int nr_pages; > + int i; > + > + get_gma_start_end(d, &gma_start, &gma_end); > + > + nr_bytes = (PFN_DOWN(gma_end - gma_start) + 7) / 8; > + nr_pages = (nr_bytes + PAGE_SIZE - 1) / PAGE_SIZE; > + > + BUG_ON( nr_pages > MAX_DIRTY_BITMAP_PAGES ); > + > + for ( i = 0; i < nr_pages; ++i ) > + { > + struct page_info *page; > + page = alloc_domheap_page(NULL, 0); > + if ( page == NULL ) > + goto cleanup_on_failure; > + > + d->arch.dirty.bitmap[i] = map_domain_page_global(__page_to_mfn(page));I think you may as well just use the xenheap here.> +/* Change types across all p2m entries in a domain */ > +void p2m_change_entry_type_global(struct domain *d, enum mg nt) > +{ > + struct p2m_domain *p2m = &d->arch.p2m; > + paddr_t ram_base; > + int i1, i2, i3; > + int first_index, second_index, third_index; > + lpae_t *first = __map_domain_page(p2m->first_level); > + lpae_t pte, *second = NULL, *third = NULL; > + > + get_gma_start_end(d, &ram_base, NULL); > + > + first_index = first_table_offset((uint64_t)ram_base); > + second_index = second_table_offset((uint64_t)ram_base); > + third_index = third_table_offset((uint64_t)ram_base);Are those casts needed?> + > + BUG_ON( !first && "Can''t map first level p2m." ); > + > + spin_lock(&p2m->lock); > + > + for ( i1 = first_index; i1 < LPAE_ENTRIES*2; ++i1 )You avoid iterating over stuff below ram_base but you aren''t worried about ram end?> + { > + lpae_walk_t first_pte = first[i1].walk; > + if ( !first_pte.valid || !first_pte.table ) > + goto out; > + > + second = map_domain_page(first_pte.base); > + BUG_ON( !second && "Can''t map second level p2m."); > + for ( i2 = second_index; i2 < LPAE_ENTRIES; ++i2 ) > + { > + lpae_walk_t second_pte = second[i2].walk; > + > + if ( !second_pte.valid || !second_pte.table ) > + goto out;goto out is a bit strong. 
Don''t you want to move on to the next second level page?> + > + third = map_domain_page(second_pte.base); > + BUG_ON( !third && "Can''t map third level p2m."); > + > + for ( i3 = third_index; i3 < LPAE_ENTRIES; ++i3 ) > + { > + lpae_walk_t third_pte = third[i3].walk; > + if ( !third_pte.valid ) > + goto out; > + > + pte = third[i3]; > + if ( nt == mg_ro ) > + { > + if ( pte.p2m.write == 1 ) > + { > + pte.p2m.write = 0; > + pte.p2m.avail = 0; > + } > + else > + { > + /* reuse avail bit as an indicator > + * of ''actual'' read-only */ > + pte.p2m.avail = 1; > + }This inner if is equivalent to pte.p2m.avail = pte.p2m.write; pte.p2m.write = 1;> + } > + else if ( nt == mg_rw ) > + { > + if ( pte.p2m.write == 0 && pte.p2m.avail == 0 ) > + { > + pte.p2m.write = 1; > + }and this one is: pte.p2m.write = pte.p2m.avail;> + } > + write_pte(&third[i3], pte); > + } > + unmap_domain_page(third); > + > + third = NULL; > + third_index = 0; > + } > + unmap_domain_page(second); > + > + second = NULL; > + second_index = 0; > + third_index = 0; > + } > + > +out: > + flush_tlb_all_local(); > + if ( third ) unmap_domain_page(third); > + if ( second ) unmap_domain_page(second); > + if ( first ) unmap_domain_page(first); > + > + spin_unlock(&p2m->lock); > +} > + > +/* Read a domain''s log-dirty bitmap and stats. > + * If the operation is a CLEAN, clear the bitmap and stats. */ > +int log_dirty_op(struct domain *d, xen_domctl_shadow_op_t *sc) > +{ > + int peek = 1; > + int i; > + int bitmap_size; > + paddr_t gma_start, gma_end; > + > + /* this hypercall is called from domain 0, and we don''t know which guest''s > + * vlpt is mapped in xen_second, so, to be sure, we restore vlpt here */ > + restore_vlpt(d); > + > + get_gma_start_end(d, &gma_start, &gma_end); > + bitmap_size = (gma_end - gma_start) / 8; > + > + if ( guest_handle_is_null(sc->dirty_bitmap) ) > + { > + peek = 0;What do you do with this?> + } > + else > + { > + spin_lock(&d->arch.dirty.lock); > + for ( i = 0; i < d->arch.dirty.bitmap_pages; ++i ) > + { > + int j = 0; > + uint8_t *bitmap; > + copy_to_guest_offset(sc->dirty_bitmap, i * PAGE_SIZE, > + d->arch.dirty.bitmap[i], > + bitmap_size < PAGE_SIZE ? bitmap_size : > + PAGE_SIZE);The bitmap is always a mutliple of page_size, isn''t it?> + bitmap_size -= PAGE_SIZE; > + > + /* set p2m page table read-only */ > + bitmap = d->arch.dirty.bitmap[i]; > + while ((j = find_next_bit((const long unsigned int *)bitmap, > + PAGE_SIZE*8, j)) < PAGE_SIZE*8) > + { > + lpae_t *vlpt; > + paddr_t addr = gma_start + > + (i << (2*PAGE_SHIFT+3)) + > + (j << PAGE_SHIFT); > + vlpt = get_vlpt_3lvl_pte(addr); > + vlpt->p2m.write = 0;Needs to be a write_pte type construct I think.> + j++; > + } > + } > + > + if ( sc->op == XEN_DOMCTL_SHADOW_OP_CLEAN ) > + { > + for ( i = 0; i < d->arch.dirty.bitmap_pages; ++i ) > + { > + clear_page(d->arch.dirty.bitmap[i]); > + } > + } > + > + spin_unlock(&d->arch.dirty.lock);Ah, I see why the lock in the handler is needed. This holds the lock over quite a long period. 
Could it be made more granular by copying and clearing each page in sequence, dropping the lock between pages?> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c > index 287dd7b..1a7ed11 100644 > --- a/xen/arch/arm/traps.c > +++ b/xen/arch/arm/traps.c > @@ -1321,6 +1321,8 @@ static void do_trap_data_abort_guest(struct cpu_user_regs *regs, > const char *msg; > int rc, level = -1; > mmio_info_t info; > + int page_fault = ( (dabt.dfsc & FSC_MASK) => + (FSC_FLT_PERM | FSC_3D_LEVEL) && dabt.write );I think you can use FSC_TYPE_MASK and FSC_LL_MASK here, can''t you? I think this would be better off refactored into a dabt_is_page_fault(dabt), used in the test below. That would allow you to more easily comment on why these particular conditions are the ones we care about etc.> if ( !check_conditional_instr(regs, hsr) ) > { > @@ -1342,6 +1344,13 @@ static void do_trap_data_abort_guest(struct cpu_user_regs *regs, > if ( rc == -EFAULT ) > goto bad_data_abort; > > + /* domU page fault handling for guest live migration */ > + /* dabt.valid can be 0 here */ > + if ( page_fault && handle_page_fault(current->domain, info.gpa) ) > + { > + /* Do not modify pc after page fault to repeat memory operation */ > + return; > + } > /* XXX: Decode the instruction if ISS is not valid */ > if ( !dabt.valid ) > goto bad_data_abort; > diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h > index 4f366f1..180d924 100644 > --- a/xen/include/asm-arm/domain.h > +++ b/xen/include/asm-arm/domain.h > @@ -114,9 +114,16 @@ struct arch_domain > > /* dirty-page tracing */ > struct { > +#define MAX_DIRTY_BITMAP_PAGES 64 /* support upto 8GB guest memory */ > + spinlock_t lock; /* protect list: head, mvn_head */ > + volatile int mode; /* 1 if dirty pages tracing enabled */ > + volatile unsigned int count; /* dirty pages counter */More unnecessary volatiles i think.> volatile int second_lvl_start; /* for context switch */ > volatile int second_lvl_end; > lpae_t *second_lvl[2]; /* copy of guest p2m''s first */ > + /* dirty bitmap */ > + uint8_t *bitmap[MAX_DIRTY_BITMAP_PAGES]; > + int bitmap_pages; /* number of dirty bitmap pages */ > } dirty; > > } __cacheline_aligned; > diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h > index a74e135..1ce7a4b 100644 > --- a/xen/include/asm-arm/mm.h > +++ b/xen/include/asm-arm/mm.h > @@ -341,11 +341,18 @@ static inline void put_page_and_type(struct page_info *page) > put_page(page); > } > > +enum mg { mg_clear, mg_ro, mg_rw, mg_rx }; > + > +/* routine for dirty-page tracing */ > +int handle_page_fault(struct domain *d, paddr_t addr); > void get_gma_start_end(struct domain *d, paddr_t *start, paddr_t *end); > int prepare_vlpt(struct domain *d); > void cleanup_vlpt(struct domain *d); > void restore_vlpt(struct domain *d); > > +int prepare_bitmap(struct domain *d); > +void cleanup_bitmap(struct domain *d); > + > /* calculate the xen''s virtual address for accessing the leaf PTE of > * a given address (GPA) */ > static inline lpae_t * get_vlpt_3lvl_pte(paddr_t addr) > diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h > index c660820..dba9a7b 100644 > --- a/xen/include/asm-arm/p2m.h > +++ b/xen/include/asm-arm/p2m.h > @@ -2,6 +2,7 @@ > #define _XEN_P2M_H > > #include <xen/mm.h> > +#include <public/domctl.h> > > struct domain; > > @@ -110,6 +111,9 @@ static inline int get_page_and_type(struct page_info *page, > return rc; > } > > +void p2m_change_entry_type_global(struct domain *d, enum mg nt); > +long dirty_mode_op(struct domain *d, 
xen_domctl_shadow_op_t *sc); > + > #endif /* _XEN_P2M_H */ > > /* > diff --git a/xen/include/asm-arm/processor.h b/xen/include/asm-arm/processor.h > index 5294421..fced6ad 100644 > --- a/xen/include/asm-arm/processor.h > +++ b/xen/include/asm-arm/processor.h > @@ -399,6 +399,8 @@ union hsr { > #define FSC_CPR (0x3a) /* Coprocossor Abort */ > > #define FSC_LL_MASK (0x03<<0) > +#define FSC_MASK (0x3f) /* Fault status mask */ > +#define FSC_3D_LEVEL (0x03) /* Third level fault*/ > > /* Time counter hypervisor control register */ > #define CNTHCTL_PA (1u<<0) /* Kernel/user access to physical counter */
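To put the suggested refactoring into code: the condition the dirty tracer cares about (a guest write hitting a permission fault at the third lookup level) could be pulled out roughly as below. The helper name and its placement in traps.c are suggestions only, and the test could equally be written with FSC_TYPE_MASK/FSC_LL_MASK as noted above; the expression itself is the one from the patch.

static inline int dabt_is_page_fault(struct hsr_dabt dabt)
{
    /* Only a write that faulted on permissions at the 3rd level is a
     * candidate for dirty-page tracking. */
    return dabt.write &&
           ((dabt.dfsc & FSC_MASK) == (FSC_FLT_PERM | FSC_3D_LEVEL));
}

do_trap_data_abort_guest() would then test dabt_is_page_fault(dabt) instead of open-coding the mask, which also gives a natural place to comment on why exactly these faults are handled.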
Ian Campbell
2013-Nov-12 17:22 UTC
Re: [PATCH RESEND v5 6/6] xen/arm: Implement toolstack for xl restore/save and migrate
On Fri, 2013-11-08 at 16:50 +0900, Jaeyong Yoo wrote:> From: Alexey Sokolov <sokolov.a@samsung.com> > > Implement for xl restore/save (which are also used for migrate) operation in xc_arm_migrate.c and make it compilable. > The overall process of save is the following: > 1) save guest parameters (i.e., memory map, console and store pfn, etc) > 2) save memory (if it is live, perform dirty-page tracing) > 3) save hvm states (i.e., gic, timer, vcpu etc) > > Singed-off-by: Alexey Sokolov <sokolov.a@samsung.com> > --- > config/arm32.mk | 1 + > tools/libxc/Makefile | 6 +- > tools/libxc/xc_arm_migrate.c | 712 +++++++++++++++++++++++++++++++++++++++++++ > tools/libxc/xc_dom_arm.c | 4 +- > tools/misc/Makefile | 4 +- > 5 files changed, 723 insertions(+), 4 deletions(-) > create mode 100644 tools/libxc/xc_arm_migrate.c > > diff --git a/config/arm32.mk b/config/arm32.mk > index aa79d22..01374c9 100644 > --- a/config/arm32.mk > +++ b/config/arm32.mk > @@ -1,6 +1,7 @@ > CONFIG_ARM := y > CONFIG_ARM_32 := y > CONFIG_ARM_$(XEN_OS) := y > +CONFIG_MIGRATE := y > > CONFIG_XEN_INSTALL_SUFFIX :> > diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile > index 4c64c15..05dfef4 100644 > --- a/tools/libxc/Makefile > +++ b/tools/libxc/Makefile > @@ -42,8 +42,13 @@ CTRL_SRCS-$(CONFIG_MiniOS) += xc_minios.c > GUEST_SRCS-y :> GUEST_SRCS-y += xg_private.c xc_suspend.c > ifeq ($(CONFIG_MIGRATE),y) > +ifeq ($(CONFIG_X86),y) > GUEST_SRCS-y += xc_domain_restore.c xc_domain_save.c > GUEST_SRCS-y += xc_offline_page.c xc_compression.c > +endif > +ifeq ($(CONFIG_ARM),y) > +GUEST_SRCS-y += xc_arm_migrate.cI know you are just following the example above but I think this can be GUEST_SRCS-$(CONFIG_ARM) += xc_arm...> +endif > else > GUEST_SRCS-y += xc_nomigrate.c > endif > @@ -63,7 +68,6 @@ $(patsubst %.c,%.opic,$(ELF_SRCS-y)): CFLAGS += -Wno-pointer-sign > GUEST_SRCS-y += xc_dom_core.c xc_dom_boot.c > GUEST_SRCS-y += xc_dom_elfloader.c > GUEST_SRCS-$(CONFIG_X86) += xc_dom_bzimageloader.c > -GUEST_SRCS-$(CONFIG_X86) += xc_dom_decompress_lz4.cI don''t think this was intentional, was it?> GUEST_SRCS-$(CONFIG_ARM) += xc_dom_armzimageloader.c > GUEST_SRCS-y += xc_dom_binloader.c > GUEST_SRCS-y += xc_dom_compat_linux.c > diff --git a/tools/libxc/xc_arm_migrate.c b/tools/libxc/xc_arm_migrate.c > new file mode 100644 > index 0000000..461e339 > --- /dev/null > +++ b/tools/libxc/xc_arm_migrate.c > @@ -0,0 +1,712 @@Is this implementing the exact protocol as described in tools/libxc/xg_save_restore.h or is it a variant? Are there any docs of the specifics of the ARM protocol? We will eventually need to make a statement about the stability of the protocol, i.e on x86 we support X->X+1 migrations across Xen versions. I think we''d need to make similar guarantees on ARM before we would remove the "tech preview" label from the migration feature. So the docs are useful so we can review the intended protocol for forward compatibility problems etc. We needn''t necessarily implement the x86 one from xg_save_restore.h. In particular it would be nice if the protocol and each of the "chunks" in it were explicitly versioned etc. 
For example the code assumes that the HVM context implicitly follows the last iteration -- this caused untold pain on x86 when remus was added...> +/****************************************************************************** > + * This library is free software; you can redistribute it and/or > + * modify it under the terms of the GNU Lesser General Public > + * License as published by the Free Software Foundation; > + * version 2.1 of the License. > + * > + * This library is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + * Lesser General Public License for more details. > + * > + * You should have received a copy of the GNU Lesser General Public > + * License along with this library; if not, write to the Free Software > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA > + * > + * Copyright (c) 2013, Samsung Electronics > + */ > + > +#include <inttypes.h> > +#include <errno.h> > +#include <xenctrl.h> > +#include <xenguest.h> > + > +#include <unistd.h> > +#include <xc_private.h> > +#include <xc_dom.h> > +#include "xc_bitops.h" > +#include "xg_private.h" > + > +/* Guest RAM base */ > +#define GUEST_RAM_BASE 0x80000000 > +/* > + * XXX: Use correct definition for RAM base when the following patch > + * xen: arm: 64-bit guest support and domU FDT autogeneration > + * will be upstreamed. > + */ > + > +#define DEF_MAX_ITERS 29 /* limit us to 30 times round loop */ > +#define DEF_MAX_FACTOR 3 /* never send more than 3x p2m_size */ > +#define DEF_MIN_DIRTY_PER_ITER 50 /* dirty page count to define last iter */ > +#define DEF_PROGRESS_RATE 50 /* progress bar update rate */ > + > +/* Enable this macro for debug only: "static" migration instead of live */ > +/* > +#define DISABLE_LIVE_MIGRATION > +*/I don''t think this is needed, the caller can be hacked if necessary.> + > +/* Enable this macro for debug only: additional debug info */ > +/* > +#define ARM_MIGRATE_VERBOSE > +*/Likewise.> +/* ============ Memory ============= */ > +static int save_memory(xc_interface *xch, int io_fd, uint32_t dom, > + struct save_callbacks *callbacks, > + uint32_t max_iters, uint32_t max_factor, > + guest_params_t *params) > +{ > + int live = !!(params->flags & XCFLAGS_LIVE); > + int debug = !!(params->flags & XCFLAGS_DEBUG); > + xen_pfn_t i; > + char reportbuf[80]; > + int iter = 0; > + int last_iter = !live; > + int total_dirty_pages_num = 0; > + int dirty_pages_on_prev_iter_num = 0; > + int count = 0; > + char *page = 0; > + xen_pfn_t *busy_pages = 0; > + int busy_pages_count = 0; > + int busy_pages_max = 256; > + > + DECLARE_HYPERCALL_BUFFER(unsigned long, to_send); > + > + xen_pfn_t start = params->start_gpfn; > + const xen_pfn_t end = params->max_gpfn; > + const xen_pfn_t mem_size = end - start; > + > + if ( debug ) > + { > + IPRINTF("(save mem) start=%llx end=%llx!\n", start, end); > + }FYI you don''t need the {}''s for cases like this. is if ( debug ) IPRINTF(...) not the equivalent of DPRINTF?> + > + if ( live ) > + { > + if ( xc_shadow_control(xch, dom, XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY, > + NULL, 0, NULL, 0, NULL) < 0 ) > + { > + ERROR("Couldn''t enable log-dirty mode !\n"); > + return -1; > + } > + > + max_iters = max_iters ? : DEF_MAX_ITERS; > + max_factor = max_factor ? 
: DEF_MAX_FACTOR; > + > + if ( debug ) > + IPRINTF("Log-dirty mode enabled, max_iters=%d, max_factor=%d!\n", > + max_iters, max_factor); > + } > + > + to_send = xc_hypercall_buffer_alloc_pages(xch, to_send, > + NRPAGES(bitmap_size(mem_size))); > + if ( !to_send ) > + { > + ERROR("Couldn''t allocate to_send array!\n"); > + return -1; > + } > + > + /* send all pages on first iter */ > + memset(to_send, 0xff, bitmap_size(mem_size)); > + > + for ( ; ; ) > + { > + int dirty_pages_on_current_iter_num = 0; > + int frc; > + iter++; > + > + snprintf(reportbuf, sizeof(reportbuf), > + "Saving memory: iter %d (last sent %u)", > + iter, dirty_pages_on_prev_iter_num); > + > + xc_report_progress_start(xch, reportbuf, mem_size); > + > + if ( (iter > 1 && > + dirty_pages_on_prev_iter_num < DEF_MIN_DIRTY_PER_ITER) || > + (iter == max_iters) || > + (total_dirty_pages_num >= mem_size*max_factor) ) > + { > + if ( debug ) > + IPRINTF("Last iteration"); > + last_iter = 1; > + } > + > + if ( last_iter ) > + { > + if ( suspend_and_state(callbacks->suspend, callbacks->data, > + xch, dom) ) > + { > + ERROR("Domain appears not to have suspended"); > + return -1; > + } > + } > + if ( live && iter > 1 ) > + { > + frc = xc_shadow_control(xch, dom, XEN_DOMCTL_SHADOW_OP_CLEAN, > + HYPERCALL_BUFFER(to_send), mem_size, > + NULL, 0, NULL); > + if ( frc != mem_size ) > + { > + ERROR("Error peeking shadow bitmap"); > + xc_hypercall_buffer_free_pages(xch, to_send, > + NRPAGES(bitmap_size(mem_size))); > + return -1; > + } > + } > + > + busy_pages = malloc(sizeof(xen_pfn_t) * busy_pages_max); > + > + for ( i = start; i < end; ++i ) > + { > + if ( test_bit(i - start, to_send) ) > + { > + page = xc_map_foreign_range(xch, dom, PAGE_SIZE, PROT_READ, i);On x86 we try to do this in batches to reduce the overheads. I suppose that could be a future enhancement.> > + if ( !page ) > + { > + /* This page is mapped elsewhere, should be resent later */What does this ("busy") mean? When does this happen? [...]> + > +static int restore_guest_params(xc_interface *xch, int io_fd, > + uint32_t dom, guest_params_t *params) > +{ > [...]> + if ( xc_domain_setmaxmem(xch, dom, maxmemkb) ) > + { > + ERROR("Can''t set memory map"); > + return -1; > + } > + > + /* Set max. number of vcpus as max_vcpu_id + 1 */ > + if ( xc_domain_max_vcpus(xch, dom, params->max_vcpu_id + 1) )Does the higher level toolstack not take care of vcpus and maxmem? I thought so. 
I think this is how it shoud be.> + { > + ERROR("Can''t set max vcpu number for domain"); > + return -1; > + } > + > + return 0; > +} > +[...]> diff --git a/tools/misc/Makefile b/tools/misc/Makefile > index 17aeda5..0824100 100644 > --- a/tools/misc/Makefile > +++ b/tools/misc/Makefile > @@ -11,7 +11,7 @@ HDRS = $(wildcard *.h) > > TARGETS-y := xenperf xenpm xen-tmem-list-parse gtraceview gtracestat xenlockprof xenwatchdogd xencov > TARGETS-$(CONFIG_X86) += xen-detect xen-hvmctx xen-hvmcrash xen-lowmemd xen-mfndump > -TARGETS-$(CONFIG_MIGRATE) += xen-hptool > +TARGETS-$(CONFIG_X86) += xen-hptool > TARGETS := $(TARGETS-y) > > SUBDIRS := $(SUBDIRS-y) > @@ -23,7 +23,7 @@ INSTALL_BIN := $(INSTALL_BIN-y) > INSTALL_SBIN-y := xen-bugtool xen-python-path xenperf xenpm xen-tmem-list-parse gtraceview \ > gtracestat xenlockprof xenwatchdogd xen-ringwatch xencov > INSTALL_SBIN-$(CONFIG_X86) += xen-hvmctx xen-hvmcrash xen-lowmemd xen-mfndump > -INSTALL_SBIN-$(CONFIG_MIGRATE) += xen-hptool > +INSTALL_SBIN-$(CONFIG_X86) += xen-hptool > INSTALL_SBIN := $(INSTALL_SBIN-y) > > INSTALL_PRIVBIN-y := xenpvnetbootYou could resend these last two separately and they could probably go straight in. Ian.
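As an illustration of the batching Ian mentions above, the per-page xc_map_foreign_range() call could be replaced by something along the following lines. This is only a sketch: the helper name and batch size are invented, error handling is minimal, and it assumes the usual xc_arm_migrate.c includes (xc_private.h provides write_exact() and PAGE_SIZE); xc_map_foreign_pages() is the existing libxc interface that maps an array of pfns in one call.

    /* Hypothetical helper: map up to 'count' dirty pfns with a single libxc
     * call and stream them out, instead of one mapping per page. */
    #define SAVE_BATCH_SIZE 256

    static int send_dirty_batch(xc_interface *xch, int io_fd, uint32_t dom,
                                const xen_pfn_t *pfns, int count)
    {
        int i, rc = -1;
        char *pages = xc_map_foreign_pages(xch, dom, PROT_READ, pfns, count);

        if ( pages == NULL )
            return -1;

        for ( i = 0; i < count; i++ )
        {
            /* Each page of the batch sits at a fixed offset in the mapping. */
            if ( write_exact(io_fd, pages + i * PAGE_SIZE, PAGE_SIZE) )
                goto out;
        }
        rc = 0;
    out:
        munmap(pages, count * PAGE_SIZE);
        return rc;
    }

The caller would accumulate pfns from the to_send bitmap into a SAVE_BATCH_SIZE array and flush it whenever it fills up or the scan of the bitmap finishes.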
Eugene Fedotov
2013-Nov-13 08:00 UTC
Re: [PATCH RESEND v5 1/6] xen/arm: Implement hvm save and restore
12.11.2013 19:15, Ian Campbell wrote:> On Fri, 2013-11-08 at 16:50 +0900, Jaeyong Yoo wrote: > > If these get resent again please can you CC the ARM maintainers. > $ git grep -A5 ARM MAINTAINERS > MAINTAINERS:ARM (W/ VIRTUALISATION EXTENSIONS) ARCHITECTURE > MAINTAINERS-M: Ian Campbell <ian.campbell@citrix.com> > MAINTAINERS-M: Stefano Stabellini <stefano.stabellini@citrix.com> > MAINTAINERS-M: Tim Deegan <tim@xen.org> > MAINTAINERS-S: Supported > MAINTAINERS-L: xen-devel@lists.xen.org > >> From: Evgeny Fedotov <e.fedotov@samsung.com> >> >> Implement save/restore of hvm context hypercall. >> In hvm context save/restore, we save gic, timer and vfp registers. >> >> Changes from v4: Save vcpu registers within hvm context, and purge >> the save-vcpu-register patch. > The inter-version changelog should go after the main commit message and > the S-o-b separated by a "---" on a line by itself. This way it is not > included in the final commit.OK.> [...] >> +static int vgic_irq_rank_save(struct vgic_rank *ext, >> + struct vgic_irq_rank *rank) >> +{ >> + spin_lock(&rank->lock); > This is probably wise, but the domain must be paused at this point, > right?I think this lock can be removed, thanks.> >> + /* Some of VGIC registers are not used yet, it is for a future usage */ >> + /* IENABLE, IACTIVE, IPEND, PENDSGI registers */ >> + ext->ienable = rank->ienable; >> + ext->iactive = rank->iactive; >> + ext->ipend = rank->ipend; >> + ext->pendsgi = rank->pendsgi; >> + /* ICFG */ >> + ext->icfg[0] = rank->icfg[0]; >> + ext->icfg[1] = rank->icfg[1]; >> + /* IPRIORITY */ >> + if ( sizeof(rank->ipriority) != sizeof (ext->ipriority) ) >> + { >> + dprintk(XENLOG_G_ERR, "hvm_hw_gic: check ipriority dumping space\n"); > What makes ipriority (and itargets below) special enough to warrant such > a check compared with all the other fields? > > Could these checks be BUILD_BUG_ON?If we use memcpy I think we should always check if the size of source and destination is equal. Of course, BUILD_BUG_ON is safe so I can use it. Thanks.> >> + return -EINVAL; >> + } >> + memcpy(ext->ipriority, rank->ipriority, sizeof(rank->ipriority)); >> + /* ITARGETS */ >> + if ( sizeof(rank->itargets) != sizeof (ext->itargets) ) >> + { >> + dprintk(XENLOG_G_ERR, "hvm_hw_gic: check itargets dumping space\n"); >> + return -EINVAL; >> + } >> + memcpy(ext->itargets, rank->itargets, sizeof(rank->itargets)); >> + spin_unlock(&rank->lock); >> + return 0; >> +} >> + >> +static int gic_save(struct domain *d, hvm_domain_context_t *h) >> +{ >> + struct hvm_hw_gic ctxt; >> + struct vcpu *v; >> + >> + /* Save the state of GICs */ >> + for_each_vcpu( d, v ) > Where is the GICD state saved then?The only GICD structure we save for guest domain is struct vgic_irq_rank, it includes: IENABLE, IACTIVE, IPEND, PENDSGI, ICFG, IPRIORITY, ITARGETS registers. We create the same structure inside hvm : vgic_rank (that is no guaranteed to be the same as struct vgic_irq_rank) and save it calling vgic_irq_rank_save routine below in gic_save.> >> + { >> + ctxt.gic_hcr = v->arch.gic_hcr; >> + ctxt.gic_vmcr = v->arch.gic_vmcr; >> + ctxt.gic_apr = v->arch.gic_apr; >> + >> + /* Save list registers and masks */ >> + /* (it is not necessary to save/restore them, but LR state can have >> + * influence on downtime after Live Migration (to be tested) > What does this mean? > > I''m in two minds about whether LR counts as architectural state, since > it is hypervisor side and not guest facing. 
I''d have thought that the LR > content could be reconstructed from the pending and active bitmasks.Well, I tried to save LR to restore all interrupt queues (pending and active) in future to minimize the quantity of lost IRQs during the migration. But in this stage it is quite wrong to save LR so I remove this.> >> + */ >> + if ( sizeof(v->arch.gic_lr) > sizeof (ctxt.gic_lr) ) >> + { >> + dprintk(XENLOG_G_ERR, "hvm_hw_gic: increase LR dumping space\n"); >> + return -EINVAL; >> + } >> + memcpy(ctxt.gic_lr, v->arch.gic_lr, sizeof(v->arch.gic_lr)); >> + ctxt.lr_mask = v->arch.lr_mask; >> + ctxt.event_mask = v->arch.event_mask; > These last two in particular make me think that saving the LRs is wrong.OK> >> + /* Save PPI states (per-CPU) */ >> + /* It is necessary if SMP enabled */ >> + if ( vgic_irq_rank_save(&ctxt.ppi_state, &v->arch.vgic.private_irqs) ) >> + return 1; >> + >> + if ( hvm_save_entry(GIC, v->vcpu_id, h, &ctxt) != 0 ) >> + return 1; >> + } >> + return 0; >> +} >> + >> +static int gic_load(struct domain *d, hvm_domain_context_t *h) >> +{ >> + int vcpuid; >> + struct hvm_hw_gic ctxt; >> + struct vcpu *v; >> [...] >> + return 0; >> +} >> + >> +HVM_REGISTER_SAVE_RESTORE(GIC, gic_save, gic_load, 1, HVMSR_PER_VCPU); >> + >> [...] >> +#ifdef CONFIG_ARM_32 >> + ctxt.mair0 = v->arch.mair0; >> + ctxt.mair1 = v->arch.mair1; >> +#else >> + ctxt.mair0 = v->arch.mair; >> +#endif >> + /* Control Registers */ >> + ctxt.actlr = v->arch.actlr; >> + ctxt.sctlr = v->arch.sctlr; >> + ctxt.cpacr = v->arch.cpacr; >> [...] >> + ctxt.x0 = c.x0; > Would it make sense to include a field of type vcpu_guest_core_regs in > the save struct instead of this big list?Yes, I think vcpu_guest_core_regs is already common definition for ARM32 and ARM64. [...]>> + #ifdef CONFIG_ARM_32 > #ifdef in the first columns please.OK> >> + ctxt.spsr_fiq = c.spsr_fiq; >> + ctxt.spsr_irq = c.spsr_irq; >> + ctxt.spsr_und = c.spsr_und; >> + ctxt.spsr_abt = c.spsr_abt; > and don''t indent these as if they were inside a blockSure> > You had this right when you saved the MAIR registers above. > >> + #endif >> + #ifdef CONFIG_ARM_64 > or #else if you prefer.OK> >> + ctxt.sp_el0 = c.sp_el0; >> + ctxt.sp_el1 = c.sp_el1; >> + ctxt.elr_el1 = c.elr_el1; >> + #endif >> + >> + /* check VFP state size before dumping */ >> + if ( sizeof(v->arch.vfp) > sizeof (ctxt.vfp) ) >> + { >> + dprintk(XENLOG_G_ERR, "hvm_hw_cpu: increase VFP dumping space\n"); >> + return -EINVAL; >> + } >> + memcpy((void*) &ctxt.vfp, (void*) &v->arch.vfp, sizeof(v->arch.vfp)); >> + >> + ctxt.pause_flags = v->pause_flags; > x86 has at the top of the loop: > /* We don''t need to save state for a vcpu that is down; the restore > * code will leave it down if there is nothing saved. */ > if ( test_bit(_VPF_down, &v->pause_flags) ) > continue; > > which I think is preferable to this.OK> >> diff --git a/xen/include/asm-arm/hvm/support.h b/xen/include/asm-arm/hvm/support.h >> new file mode 100644 >> index 0000000..8311f2f >> --- /dev/null >> +++ b/xen/include/asm-arm/hvm/support.h >> @@ -0,0 +1,29 @@ >> +/* >> + * support.h: HVM support routines used by ARMv7 VE. > Not just on Vexpress nor just on v7 I expect?VE means Virtualization Extentions here. 
OK, we should support ARM v8 in future,so can be "HVM support routines used by ARMv7/v8 with Virtualization Extensions" a good comment?>> + * >> + * Copyright (c) 2012, Citrix Systems > Really?OK>> + * >> + * This program is free software; you can redistribute it and/or modify it >> + * under the terms and conditions of the GNU General Public License, >> + * version 2, as published by the Free Software Foundation. >> + * >> + * This program is distributed in the hope it will be useful, but WITHOUT >> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or >> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for >> + * more details. >> + * >> + * You should have received a copy of the GNU General Public License along with >> + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple >> + * Place - Suite 330, Boston, MA 02111-1307 USA. >> + */ >> + >> +#ifndef __ASM_ARM_HVM_SUPPORT_H__ >> +#define __ASM_ARM_HVM_SUPPORT_H__ >> + >> +#include <xen/types.h> >> +#include <public/hvm/ioreq.h> >> +#include <xen/sched.h> >> +#include <xen/hvm/save.h> >> +#include <asm/processor.h> > Just some #includes here?Yes, I remove unnecessary includes.> > Ian, > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel >
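Picking up the BUILD_BUG_ON suggestion above, a minimal sketch of the save routine with the run-time sizeof checks turned into compile-time ones (field names follow the quoted patch; the lock is kept as belt-and-braces):

    static int vgic_irq_rank_save(struct vgic_rank *ext,
                                  struct vgic_irq_rank *rank)
    {
        /* A layout mismatch between the live and saved structures is a build
         * bug, not something to discover while saving a domain. */
        BUILD_BUG_ON( sizeof(rank->ipriority) != sizeof(ext->ipriority) );
        BUILD_BUG_ON( sizeof(rank->itargets)  != sizeof(ext->itargets) );

        spin_lock(&rank->lock);
        ext->ienable = rank->ienable;
        ext->iactive = rank->iactive;
        ext->ipend   = rank->ipend;
        ext->pendsgi = rank->pendsgi;
        ext->icfg[0] = rank->icfg[0];
        ext->icfg[1] = rank->icfg[1];
        memcpy(ext->ipriority, rank->ipriority, sizeof(rank->ipriority));
        memcpy(ext->itargets,  rank->itargets,  sizeof(rank->itargets));
        spin_unlock(&rank->lock);

        return 0;
    }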
Eugene Fedotov
2013-Nov-13 08:28 UTC
Re: [PATCH RESEND v5 2/6] xen/arm: Implement get_maximum_gpfn hypercall for arm
12.11.2013 19:21, Ian Campbell wrote:
> On Fri, 2013-11-08 at 16:50 +0900, Jaeyong Yoo wrote:
>> From: Evgeny Fedotov <e.fedotov@samsung.com>
>>
>> By using the memory map info in arch_domain (from set_memory_map hypercall)
>> implement get_maximum_gpfn hypercall.
>>
>> Changes from v4: Use GUEST_RAM_BASE as the start physical address of guest
>> RAM. And, purge set-memory-map patch
>>
>> Singed-off-by: Evgeny Fedotov <e.fedotov@samsung.com>
>> ---
>>   xen/arch/arm/mm.c        | 22 +++++++++++++++++++++-
>>   xen/include/asm-arm/mm.h |  2 ++
>>   2 files changed, 23 insertions(+), 1 deletion(-)
>>
>> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
>> index 123280e..3801f07 100644
>> --- a/xen/arch/arm/mm.c
>> +++ b/xen/arch/arm/mm.c
>> @@ -927,7 +927,11 @@ int page_is_ram_type(unsigned long mfn, unsigned long mem_type)
>>
>>   unsigned long domain_get_maximum_gpfn(struct domain *d)
> s/unsigned long/xen_pfn_t/ I think.
>
> Urk, which will break the ABI for this hypercall. That's something of a
> conundrum :-/
>
> It is a domctl so we are at liberty to change it, probably to taking a
> xen_pfn_t * to fill in instead of returning the value.
>
> I'm in two minds about whether we should do so now or postpone it (which
> risks forgetting and having an obscure bug somewhere down the line).
OK. Either that, or another solution is to drop this hypercall from the
migration code entirely and use the max_memkb field from DOMCTL_getdomaininfo
to obtain the maximum PFN.
>
>>   {
>> -    return -ENOSYS;
>> +    paddr_t end;
>> +
>> +    get_gma_start_end(d, NULL, &end);
> I suppose this function is useful in a future patch too, otherwise
>      return (GUEST_RAM_BASE>>PAGE_SHIFT)+d->max_pages
> would suffice.
OK
> Ian.
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
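For reference, the simpler form Ian suggests amounts to the sketch below, assuming guest RAM is one contiguous block starting at GUEST_RAM_BASE; the xen_pfn_t-vs-unsigned-long concern raised above applies to the return type either way.

    unsigned long domain_get_maximum_gpfn(struct domain *d)
    {
        /* Guest RAM is assumed to be a single contiguous block of
         * d->max_pages pages starting at GUEST_RAM_BASE. */
        return (GUEST_RAM_BASE >> PAGE_SHIFT) + d->max_pages;
    }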
Eugene Fedotov
2013-Nov-13 08:40 UTC
Re: [PATCH RESEND v5 3/6] xen/arm: Implement modify_returncode
12.11.2013 19:24, Ian Campbell wrote:
> On Fri, 2013-11-08 at 16:50 +0900, Jaeyong Yoo wrote:
>> From: Alexey Sokolov <sokolov.a@samsung.com>
>>
>> Making sched_op in do_suspend (driver/xen/manage.c) returns 0 on the success of suspend.
>>
>> Singed-off-by: Alexey Sokolov <sokolov.a@samsung.com>
> Acked-by: Ian Campbell <ian.campbell@citrix.com>
>
> Although if you wanted to...
>> +#elif defined(__arm__)
> make this "... || defined(__aarch64__)" and ...
>> +        ctxt.c.user_regs.r0_usr = 1;
> Use x0 here instead (which is a 64-bit alias of r0_usr, so good for both
> 32 and 64 bit).
>
> Then it would be ready made for 64-bit too in the future.
>
> Ian.
Good point, OK.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
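With both of Ian's suggestions applied, the ARM side of modify_returncode() in xc_resume.c would look roughly like the sketch below; the x86 branch and error reporting are elided, so treat it as an outline rather than the final patch.

    static int modify_returncode(xc_interface *xch, uint32_t domid)
    {
        vcpu_guest_context_any_t ctxt;

        if ( xc_vcpu_getcontext(xch, domid, 0, &ctxt) != 0 )
            return -1;

    #if defined(__i386__) || defined(__x86_64__)
        /* Existing x86 PV/HVM handling. */
    #elif defined(__arm__) || defined(__aarch64__)
        /* x0 is the 64-bit alias of r0_usr, so this covers both 32- and
         * 64-bit guests; the value becomes the guest-visible return code
         * of the suspend hypercall. */
        ctxt.c.user_regs.x0 = 1;
    #endif

        if ( xc_vcpu_setcontext(xch, domid, 0, &ctxt) != 0 )
            return -1;

        return 0;
    }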
Eugene Fedotov
2013-Nov-13 09:57 UTC
Re: [PATCH RESEND v5 6/6] xen/arm: Implement toolstack for xl restore/save and migrate
12.11.2013 21:22, Ian Campbell wrote:> On Fri, 2013-11-08 at 16:50 +0900, Jaeyong Yoo wrote: >> From: Alexey Sokolov <sokolov.a@samsung.com> >> >> Implement for xl restore/save (which are also used for migrate) operation in xc_arm_migrate.c and make it compilable. >> The overall process of save is the following: >> 1) save guest parameters (i.e., memory map, console and store pfn, etc) >> 2) save memory (if it is live, perform dirty-page tracing) >> 3) save hvm states (i.e., gic, timer, vcpu etc) >> >> Singed-off-by: Alexey Sokolov <sokolov.a@samsung.com> >> --- >> config/arm32.mk | 1 + >> tools/libxc/Makefile | 6 +- >> tools/libxc/xc_arm_migrate.c | 712 +++++++++++++++++++++++++++++++++++++++++++ >> tools/libxc/xc_dom_arm.c | 4 +- >> tools/misc/Makefile | 4 +- >> 5 files changed, 723 insertions(+), 4 deletions(-) >> create mode 100644 tools/libxc/xc_arm_migrate.c >> >> diff --git a/config/arm32.mk b/config/arm32.mk >> index aa79d22..01374c9 100644 >> --- a/config/arm32.mk >> +++ b/config/arm32.mk >> @@ -1,6 +1,7 @@ >> CONFIG_ARM := y >> CONFIG_ARM_32 := y >> CONFIG_ARM_$(XEN_OS) := y >> +CONFIG_MIGRATE := y >> >> CONFIG_XEN_INSTALL_SUFFIX :>> >> diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile >> index 4c64c15..05dfef4 100644 >> --- a/tools/libxc/Makefile >> +++ b/tools/libxc/Makefile >> @@ -42,8 +42,13 @@ CTRL_SRCS-$(CONFIG_MiniOS) += xc_minios.c >> GUEST_SRCS-y :>> GUEST_SRCS-y += xg_private.c xc_suspend.c >> ifeq ($(CONFIG_MIGRATE),y) >> +ifeq ($(CONFIG_X86),y) >> GUEST_SRCS-y += xc_domain_restore.c xc_domain_save.c >> GUEST_SRCS-y += xc_offline_page.c xc_compression.c >> +endif >> +ifeq ($(CONFIG_ARM),y) >> +GUEST_SRCS-y += xc_arm_migrate.c > I know you are just following the example above but I think this can be > GUEST_SRCS-$(CONFIG_ARM) += xc_arm...OK> >> +endif >> else >> GUEST_SRCS-y += xc_nomigrate.c >> endif >> @@ -63,7 +68,6 @@ $(patsubst %.c,%.opic,$(ELF_SRCS-y)): CFLAGS += -Wno-pointer-sign >> GUEST_SRCS-y += xc_dom_core.c xc_dom_boot.c >> GUEST_SRCS-y += xc_dom_elfloader.c >> GUEST_SRCS-$(CONFIG_X86) += xc_dom_bzimageloader.c >> -GUEST_SRCS-$(CONFIG_X86) += xc_dom_decompress_lz4.c > I don''t think this was intentional, was it?Oops, sure.>> GUEST_SRCS-$(CONFIG_ARM) += xc_dom_armzimageloader.c >> GUEST_SRCS-y += xc_dom_binloader.c >> GUEST_SRCS-y += xc_dom_compat_linux.c >> diff --git a/tools/libxc/xc_arm_migrate.c b/tools/libxc/xc_arm_migrate.c >> new file mode 100644 >> index 0000000..461e339 >> --- /dev/null >> +++ b/tools/libxc/xc_arm_migrate.c >> @@ -0,0 +1,712 @@ > Is this implementing the exact protocol as described in > tools/libxc/xg_save_restore.h or is it a variant? Are there any docs of > the specifics of the ARM protocol?This implements a quite different from tools/libxc/xg_save_restore.h protocol, it is much more simplified because we do not need some things that implemented for x86. So you''re right, it has to be documented. Should we use a different header to place documentation to this (and place some definitions), for example xc_arm_migrate.h?> We will eventually need to make a statement about the stability of the > protocol, i.e on x86 we support X->X+1 migrations across Xen versions. I > think we''d need to make similar guarantees on ARM before we would remove > the "tech preview" label from the migration feature.So, should you believe our results (and where should we place this statement) or should you make tests from your side?> So the docs are useful so we can review the intended protocol for > forward compatibility problems etc. 
We needn''t necessarily implement the > x86 one from xg_save_restore.h. > > In particular it would be nice if the protocol and each of the "chunks" > in it were explicitly versioned etc. For example the code assumes that > the HVM context implicitly follows the last iteration -- this caused > untold pain on x86 when remus was added...OK, such documentation will be soon.> >> +/****************************************************************************** >> + * This library is free software; you can redistribute it and/or >> + * modify it under the terms of the GNU Lesser General Public >> + * License as published by the Free Software Foundation; >> + * version 2.1 of the License. >> + * >> + * This library is distributed in the hope that it will be useful, >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU >> + * Lesser General Public License for more details. >> + * >> + * You should have received a copy of the GNU Lesser General Public >> + * License along with this library; if not, write to the Free Software >> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA >> + * >> + * Copyright (c) 2013, Samsung Electronics >> + */ >> + >> +#include <inttypes.h> >> +#include <errno.h> >> +#include <xenctrl.h> >> +#include <xenguest.h> >> + >> +#include <unistd.h> >> +#include <xc_private.h> >> +#include <xc_dom.h> >> +#include "xc_bitops.h" >> +#include "xg_private.h" >> + >> +/* Guest RAM base */ >> +#define GUEST_RAM_BASE 0x80000000 >> +/* >> + * XXX: Use correct definition for RAM base when the following patch >> + * xen: arm: 64-bit guest support and domU FDT autogeneration >> + * will be upstreamed. >> + */ >> + >> +#define DEF_MAX_ITERS 29 /* limit us to 30 times round loop */ >> +#define DEF_MAX_FACTOR 3 /* never send more than 3x p2m_size */ >> +#define DEF_MIN_DIRTY_PER_ITER 50 /* dirty page count to define last iter */ >> +#define DEF_PROGRESS_RATE 50 /* progress bar update rate */ >> + >> +/* Enable this macro for debug only: "static" migration instead of live */ >> +/* >> +#define DISABLE_LIVE_MIGRATION >> +*/ > I don''t think this is needed, the caller can be hacked if necessary.OK, I remove. It is internal hack.> >> + >> +/* Enable this macro for debug only: additional debug info */ >> +/* >> +#define ARM_MIGRATE_VERBOSE >> +*/ > Likewise.OK> >> +/* ============ Memory ============= */ >> +static int save_memory(xc_interface *xch, int io_fd, uint32_t dom, >> + struct save_callbacks *callbacks, >> + uint32_t max_iters, uint32_t max_factor, >> + guest_params_t *params) >> +{ >> + int live = !!(params->flags & XCFLAGS_LIVE); >> + int debug = !!(params->flags & XCFLAGS_DEBUG); >> + xen_pfn_t i; >> + char reportbuf[80]; >> + int iter = 0; >> + int last_iter = !live; >> + int total_dirty_pages_num = 0; >> + int dirty_pages_on_prev_iter_num = 0; >> + int count = 0; >> + char *page = 0; >> + xen_pfn_t *busy_pages = 0; >> + int busy_pages_count = 0; >> + int busy_pages_max = 256; >> + >> + DECLARE_HYPERCALL_BUFFER(unsigned long, to_send); >> + >> + xen_pfn_t start = params->start_gpfn; >> + const xen_pfn_t end = params->max_gpfn; >> + const xen_pfn_t mem_size = end - start; >> + >> + if ( debug ) >> + { >> + IPRINTF("(save mem) start=%llx end=%llx!\n", start, end); >> + } > FYI you don''t need the {}''s for cases like this.Actually we don''t need {}, this has been done because we was not sure if this macro can be empty-substituted.> > is if ( debug ) IPRINTF(...) 
not the equivalent of DPRINTF?This equivalence is not obvious for me, because in current code we obtain debug flag with XCFLAGS_DEBUG mask (when --debug option passed). If it is equivalent I''ll use DPRINTF.> >> + >> + if ( live ) >> + { >> + if ( xc_shadow_control(xch, dom, XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY, >> + NULL, 0, NULL, 0, NULL) < 0 ) >> + { >> + ERROR("Couldn''t enable log-dirty mode !\n"); >> + return -1; >> + } >> + >> + max_iters = max_iters ? : DEF_MAX_ITERS; >> + max_factor = max_factor ? : DEF_MAX_FACTOR; >> + >> + if ( debug ) >> + IPRINTF("Log-dirty mode enabled, max_iters=%d, max_factor=%d!\n", >> + max_iters, max_factor); >> + } >> + >> + to_send = xc_hypercall_buffer_alloc_pages(xch, to_send, >> + NRPAGES(bitmap_size(mem_size))); >> + if ( !to_send ) >> + { >> + ERROR("Couldn''t allocate to_send array!\n"); >> + return -1; >> + } >> + >> + /* send all pages on first iter */ >> + memset(to_send, 0xff, bitmap_size(mem_size)); >> + >> + for ( ; ; ) >> + { >> + int dirty_pages_on_current_iter_num = 0; >> + int frc; >> + iter++; >> + >> + snprintf(reportbuf, sizeof(reportbuf), >> + "Saving memory: iter %d (last sent %u)", >> + iter, dirty_pages_on_prev_iter_num); >> + >> + xc_report_progress_start(xch, reportbuf, mem_size); >> + >> + if ( (iter > 1 && >> + dirty_pages_on_prev_iter_num < DEF_MIN_DIRTY_PER_ITER) || >> + (iter == max_iters) || >> + (total_dirty_pages_num >= mem_size*max_factor) ) >> + { >> + if ( debug ) >> + IPRINTF("Last iteration"); >> + last_iter = 1; >> + } >> + >> + if ( last_iter ) >> + { >> + if ( suspend_and_state(callbacks->suspend, callbacks->data, >> + xch, dom) ) >> + { >> + ERROR("Domain appears not to have suspended"); >> + return -1; >> + } >> + } >> + if ( live && iter > 1 ) >> + { >> + frc = xc_shadow_control(xch, dom, XEN_DOMCTL_SHADOW_OP_CLEAN, >> + HYPERCALL_BUFFER(to_send), mem_size, >> + NULL, 0, NULL); >> + if ( frc != mem_size ) >> + { >> + ERROR("Error peeking shadow bitmap"); >> + xc_hypercall_buffer_free_pages(xch, to_send, >> + NRPAGES(bitmap_size(mem_size))); >> + return -1; >> + } >> + } >> + >> + busy_pages = malloc(sizeof(xen_pfn_t) * busy_pages_max); >> + >> + for ( i = start; i < end; ++i ) >> + { >> + if ( test_bit(i - start, to_send) ) >> + { >> + page = xc_map_foreign_range(xch, dom, PAGE_SIZE, PROT_READ, i); > On x86 we try to do this in batches to reduce the overheads. I suppose > that could be a future enhancement.OK. I''ll make TODO comment.> >> + if ( !page ) >> + { >> + /* This page is mapped elsewhere, should be resent later */ > What does this ("busy") mean? When does this happen?Oh, it looks like workaround of problem that maybe don''t need now. In case of dom0 with 2 CPU xc_map_foriegn range can return NULL, but guest page is exist. The later call xc_map_foriegn_range on the same page will return valid pointer. I''ll remove this.> > [...] >> + >> +static int restore_guest_params(xc_interface *xch, int io_fd, >> + uint32_t dom, guest_params_t *params) >> +{ >> [...] >> + if ( xc_domain_setmaxmem(xch, dom, maxmemkb) ) >> + { >> + ERROR("Can''t set memory map"); >> + return -1; >> + } >> + >> + /* Set max. number of vcpus as max_vcpu_id + 1 */ >> + if ( xc_domain_max_vcpus(xch, dom, params->max_vcpu_id + 1) ) > Does the higher level toolstack not take care of vcpus and maxmem? I > thought so. I think this is how it shoud be.For my tests guest config information is not transferred for ARM case from high-level stack. At the migration receiver side toolstack always create a new domain with vcpus=1 and default max. 
mem. So we have to send guest information as our local guest_params structure (at the beginning of migration). It is easy way to work "xl save" or "xl migrate" without modification of libxl level, but you may have another idea? Also, toolstack_restore callback is not set (NULL) for ARM case.> >> + { >> + ERROR("Can''t set max vcpu number for domain"); >> + return -1; >> + } >> + >> + return 0; >> +} >> +[...] >> diff --git a/tools/misc/Makefile b/tools/misc/Makefile >> index 17aeda5..0824100 100644 >> --- a/tools/misc/Makefile >> +++ b/tools/misc/Makefile >> @@ -11,7 +11,7 @@ HDRS = $(wildcard *.h) >> >> TARGETS-y := xenperf xenpm xen-tmem-list-parse gtraceview gtracestat xenlockprof xenwatchdogd xencov >> TARGETS-$(CONFIG_X86) += xen-detect xen-hvmctx xen-hvmcrash xen-lowmemd xen-mfndump >> -TARGETS-$(CONFIG_MIGRATE) += xen-hptool >> +TARGETS-$(CONFIG_X86) += xen-hptool >> TARGETS := $(TARGETS-y) >> >> SUBDIRS := $(SUBDIRS-y) >> @@ -23,7 +23,7 @@ INSTALL_BIN := $(INSTALL_BIN-y) >> INSTALL_SBIN-y := xen-bugtool xen-python-path xenperf xenpm xen-tmem-list-parse gtraceview \ >> gtracestat xenlockprof xenwatchdogd xen-ringwatch xencov >> INSTALL_SBIN-$(CONFIG_X86) += xen-hvmctx xen-hvmcrash xen-lowmemd xen-mfndump >> -INSTALL_SBIN-$(CONFIG_MIGRATE) += xen-hptool >> +INSTALL_SBIN-$(CONFIG_X86) += xen-hptool >> INSTALL_SBIN := $(INSTALL_SBIN-y) >> >> INSTALL_PRIVBIN-y := xenpvnetboot > You could resend these last two separately and they could probably go > straight in.OK> > Ian. > > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel >
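For readers following along, the guest_params_t block sent at the head of the stream presumably carries roughly the fields below. This is inferred from the code quoted above (start_gpfn, max_gpfn, flags, max_vcpu_id); the real layout lives in xc_arm_migrate.c and may differ.

    /* Hypothetical layout of the parameter block sent before the memory. */
    typedef struct guest_params {
        xen_pfn_t start_gpfn;   /* GUEST_RAM_BASE >> PAGE_SHIFT */
        xen_pfn_t max_gpfn;     /* one past the last guest RAM pfn */
        uint32_t  max_vcpu_id;  /* receiver calls xc_domain_max_vcpus(id + 1) */
        uint32_t  flags;        /* XCFLAGS_LIVE, XCFLAGS_DEBUG, ... */
    } guest_params_t;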
Ian Campbell
2013-Nov-13 10:56 UTC
Re: [PATCH RESEND v5 1/6] xen/arm: Implement hvm save and restore
On Wed, 2013-11-13 at 12:00 +0400, Eugene Fedotov wrote:> > [...] > >> +static int vgic_irq_rank_save(struct vgic_rank *ext, > >> + struct vgic_irq_rank *rank) > >> +{ > >> + spin_lock(&rank->lock); > > This is probably wise, but the domain must be paused at this point, > > right? > I think this lock can be removed, thanks.I''d be happy for it to stay as a belt-and-braces/good practice thing.> >> + return -EINVAL; > >> + } > >> + memcpy(ext->ipriority, rank->ipriority, sizeof(rank->ipriority)); > >> + /* ITARGETS */ > >> + if ( sizeof(rank->itargets) != sizeof (ext->itargets) ) > >> + { > >> + dprintk(XENLOG_G_ERR, "hvm_hw_gic: check itargets dumping space\n"); > >> + return -EINVAL; > >> + } > >> + memcpy(ext->itargets, rank->itargets, sizeof(rank->itargets)); > >> + spin_unlock(&rank->lock); > >> + return 0; > >> +} > >> + > >> +static int gic_save(struct domain *d, hvm_domain_context_t *h) > >> +{ > >> + struct hvm_hw_gic ctxt; > >> + struct vcpu *v; > >> + > >> + /* Save the state of GICs */ > >> + for_each_vcpu( d, v ) > > Where is the GICD state saved then? > The only GICD structure we save for guest domain is struct > vgic_irq_rank, it includes: IENABLE, IACTIVE, IPEND, PENDSGI, ICFG, > IPRIORITY, ITARGETS registers. We create the same structure inside hvm : > vgic_rank (that is no guaranteed to be the same as struct vgic_irq_rank) > and save it calling vgic_irq_rank_save routine below in gic_save.I can only see one call to vgic_irq_rank_save which is the one to save PPI state within the vcpu loop. What about the (per-cpu) SGI and (global) SPIs? I can''t see where either of those are saved.> >> + ctxt.x0 = c.x0; > > Would it make sense to include a field of type vcpu_guest_core_regs in > > the save struct instead of this big list? > Yes, I think vcpu_guest_core_regs is already common definition for ARM32 > and ARM64.Correct.> >> diff --git a/xen/include/asm-arm/hvm/support.h b/xen/include/asm-arm/hvm/support.h > >> new file mode 100644 > >> index 0000000..8311f2f > >> --- /dev/null > >> +++ b/xen/include/asm-arm/hvm/support.h > >> @@ -0,0 +1,29 @@ > >> +/* > >> + * support.h: HVM support routines used by ARMv7 VE. > > Not just on Vexpress nor just on v7 I expect? > VE means Virtualization Extentions here. > OK, we should support ARM v8 in future,so can be "HVM support routines > used by ARMv7/v8 with Virtualization Extensions" a good comment?I think just "...used by ARM" would be sufficient.> >> + * > >> + * Copyright (c) 2012, Citrix Systems > > Really? > OK > >> + * > >> + * This program is free software; you can redistribute it and/or modify it > >> + * under the terms and conditions of the GNU General Public License, > >> + * version 2, as published by the Free Software Foundation. > >> + * > >> + * This program is distributed in the hope it will be useful, but WITHOUT > >> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or > >> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for > >> + * more details. > >> + * > >> + * You should have received a copy of the GNU General Public License along with > >> + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple > >> + * Place - Suite 330, Boston, MA 02111-1307 USA. > >> + */ > >> + > >> +#ifndef __ASM_ARM_HVM_SUPPORT_H__ > >> +#define __ASM_ARM_HVM_SUPPORT_H__ > >> + > >> +#include <xen/types.h> > >> +#include <public/hvm/ioreq.h> > >> +#include <xen/sched.h> > >> +#include <xen/hvm/save.h> > >> +#include <asm/processor.h> > > Just some #includes here? 
> Yes, I remove unnecessary includes.
Oh, I see now this header is required by common code, that's OK then. But
please do minimise what is needed here.

Ian.
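A sketch of what the reworked save record could look like with the vcpu_guest_core_regs field discussed earlier in this message; the field selection and the VFP sizing are illustrative only, not the final public layout.

    struct hvm_hw_cpu {
        /* Covers x0..x30/sp/pc/cpsr for AArch64 and, via the shared union
         * in the public headers, the AArch32 view (r*_usr, banked sp/lr,
         * spsr_*), replacing the long hand-written register list. */
        struct vcpu_guest_core_regs core_regs;

        /* Control registers, as in the quoted patch. */
        uint64_t actlr;
        uint64_t sctlr;
        uint64_t cpacr;

        /* VFP/SIMD state; the save path should BUILD_BUG_ON() against
         * sizeof(v->arch.vfp) rather than checking at run time. */
        uint64_t vfp[64];
    };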
Ian Campbell
2013-Nov-13 10:58 UTC
Re: [PATCH RESEND v5 2/6] xen/arm: Implement get_maximum_gpfn hypercall for arm
On Wed, 2013-11-13 at 12:28 +0400, Eugene Fedotov wrote:
> >> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> >> index 123280e..3801f07 100644
> >> --- a/xen/arch/arm/mm.c
> >> +++ b/xen/arch/arm/mm.c
> >> @@ -927,7 +927,11 @@ int page_is_ram_type(unsigned long mfn, unsigned long mem_type)
> >>
> >>   unsigned long domain_get_maximum_gpfn(struct domain *d)
> > s/unsigned long/xen_pfn_t/ I think.
> >
> > Urk, which will break the ABI for this hypercall. That's something of a
> > conundrum :-/
> >
> > It is a domctl so we are at liberty to change it, probably to taking a
> > xen_pfn_t * to fill in instead of returning the value.
> >
> > I'm in two minds about whether we should do so now or postpone it (which
> > risks forgetting and having an obscure bug somewhere down the line).
> OK. Either that, or another solution is to drop this hypercall from the
> migration code entirely and use the max_memkb field from
> DOMCTL_getdomaininfo to obtain the maximum PFN.

Is there an interface to retrieve that? I don't see it :-(

Ian.
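On the tools side, libxc does expose a max_memkb field through xc_domain_getinfo() (filled from XEN_DOMCTL_getdomaininfo); whether that is an acceptable substitute for the hypercall is exactly the open question here, but the lookup itself would be along these lines, with the contiguous-RAM assumption made explicit.

    static int get_max_gpfn_from_dominfo(xc_interface *xch, uint32_t domid,
                                         xen_pfn_t *max_gpfn)
    {
        xc_dominfo_t info;

        if ( xc_domain_getinfo(xch, domid, 1, &info) != 1 ||
             info.domid != domid )
            return -1;

        /* Assumes guest RAM is contiguous from GUEST_RAM_BASE upwards. */
        *max_gpfn = (GUEST_RAM_BASE >> XC_PAGE_SHIFT) +
                    (info.max_memkb >> (XC_PAGE_SHIFT - 10));
        return 0;
    }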
Ian Campbell
2013-Nov-13 11:09 UTC
Re: [PATCH RESEND v5 6/6] xen/arm: Implement toolstack for xl restore/save and migrate
On Wed, 2013-11-13 at 13:57 +0400, Eugene Fedotov wrote:> >> diff --git a/tools/libxc/xc_arm_migrate.c b/tools/libxc/xc_arm_migrate.c > >> new file mode 100644 > >> index 0000000..461e339 > >> --- /dev/null > >> +++ b/tools/libxc/xc_arm_migrate.c > >> @@ -0,0 +1,712 @@ > > Is this implementing the exact protocol as described in > > tools/libxc/xg_save_restore.h or is it a variant? Are there any docs of > > the specifics of the ARM protocol?> This implements a quite different from tools/libxc/xg_save_restore.h > protocol, it is much more simplified because we do not need some things > that implemented for x86. So you''re right, it has to be documented. > Should we use a different header to place documentation to this (and > place some definitions), for example xc_arm_migrate.h?I think xg_arm_save_restore.h would be the consistent name. At some point we should rename some of the x86 specific stuff, but you don''t need to worry about that.> > We will eventually need to make a statement about the stability of the > > protocol, i.e on x86 we support X->X+1 migrations across Xen versions. I > > think we''d need to make similar guarantees on ARM before we would remove > > the "tech preview" label from the migration feature. > So, should you believe our results (and where should we place this > statement) or should you make tests from your side?By stability I mean the stability of the migration ABI across version changes, not stability as in the bugginess or otherwise of the code. IOW a commitment to supporting migration from version X to version X+1 in the future. WRT "tech preview" I don''t think we are going to be able to honestly call this production ready for the general case in 4.4, I expect it''ll need a bit longer to "bed in" before we are happy with that statement. (this has no bearing on whether you want to fully support it as production ready in your particular application).> > > >> +/* ============ Memory ============= */ > >> +static int save_memory(xc_interface *xch, int io_fd, uint32_t dom, > >> + struct save_callbacks *callbacks, > >> + uint32_t max_iters, uint32_t max_factor, > >> + guest_params_t *params) > >> +{ > >> + int live = !!(params->flags & XCFLAGS_LIVE); > >> + int debug = !!(params->flags & XCFLAGS_DEBUG); > >> + xen_pfn_t i; > >> + char reportbuf[80]; > >> + int iter = 0; > >> + int last_iter = !live; > >> + int total_dirty_pages_num = 0; > >> + int dirty_pages_on_prev_iter_num = 0; > >> + int count = 0; > >> + char *page = 0; > >> + xen_pfn_t *busy_pages = 0; > >> + int busy_pages_count = 0; > >> + int busy_pages_max = 256; > >> + > >> + DECLARE_HYPERCALL_BUFFER(unsigned long, to_send); > >> + > >> + xen_pfn_t start = params->start_gpfn; > >> + const xen_pfn_t end = params->max_gpfn; > >> + const xen_pfn_t mem_size = end - start; > >> + > >> + if ( debug ) > >> + { > >> + IPRINTF("(save mem) start=%llx end=%llx!\n", start, end); > >> + } > > FYI you don''t need the {}''s for cases like this. > Actually we don''t need {}, this has been done because we was not sure if > this macro can be empty-substituted. > > > > is if ( debug ) IPRINTF(...) not the equivalent of DPRINTF? > This equivalence is not obvious for me, because in current code we > obtain debug flag with XCFLAGS_DEBUG mask (when --debug option passed). > If it is equivalent I''ll use DPRINTF.No you are right, these are different. Actually DPRINTF is "detail level", there is a DBGPRINTF which is "debug level" (levels here being in terms of XTL_FOO from xentoollog.h). 
So you probably do want to use if (debug), although you may separately want to consider which level the resulting messages are printed at when they are printed...> > > > [...] > >> + > >> +static int restore_guest_params(xc_interface *xch, int io_fd, > >> + uint32_t dom, guest_params_t *params) > >> +{ > >> [...] > >> + if ( xc_domain_setmaxmem(xch, dom, maxmemkb) ) > >> + { > >> + ERROR("Can''t set memory map"); > >> + return -1; > >> + } > >> + > >> + /* Set max. number of vcpus as max_vcpu_id + 1 */ > >> + if ( xc_domain_max_vcpus(xch, dom, params->max_vcpu_id + 1) ) > > Does the higher level toolstack not take care of vcpus and maxmem? I > > thought so. I think this is how it shoud be. > > For my tests guest config information is not transferred for ARM case > from high-level stack. At the migration receiver side toolstack always > create a new domain with vcpus=1 and default max. mem. So we have to > send guest information as our local guest_params structure (at the > beginning of migration). > It is easy way to work "xl save" or "xl migrate" without modification of > libxl level, but you may have another idea? > Also, toolstack_restore callback is not set (NULL) for ARM case.So you aren''t using xl to do the migration? This is what we should ultimately be aiming for. It is almost certainly going to require fixes at the libxl level though. If you need to send additional information for your usecase then it should be done at the layer above libxc. Ian.
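Purely as an illustration of the explicit chunk versioning requested earlier in the thread (this is not what the posted patch implements), a hypothetical xg_arm_save_restore.h could make every chunk self-describing, so new chunk types can be added later, e.g. for Remus, and unknown ones skipped by older receivers.

    /* Hypothetical stream framing, for discussion only. */
    #define XC_ARM_SAVE_CHUNK_END           0
    #define XC_ARM_SAVE_CHUNK_GUEST_PARAMS  1
    #define XC_ARM_SAVE_CHUNK_PAGE_DATA     2
    #define XC_ARM_SAVE_CHUNK_HVM_CONTEXT   3

    struct xc_arm_save_chunk_hdr {
        uint32_t type;      /* one of the XC_ARM_SAVE_CHUNK_* values    */
        uint32_t version;   /* bumped whenever the chunk layout changes */
        uint64_t length;    /* bytes of payload following this header   */
    };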
Ian Campbell
2013-Nov-13 12:21 UTC
Re: [PATCH RESEND v5 6/6] xen/arm: Implement toolstack for xl restore/save and migrate
Readding the list... On Wed, 2013-11-13 at 12:19 +0000, Ian Campbell wrote:> On Wed, 2013-11-13 at 16:17 +0400, Eugene Fedotov wrote: > > 13.11.2013 15:09, Ian Campbell wrote: > > > For my tests guest config information is not transferred for ARM case > > > from high-level stack. At the migration receiver side toolstack always > > > create a new domain with vcpus=1 and default max. mem. So we have to > > > send guest information as our local guest_params structure (at the > > > beginning of migration). > > > It is easy way to work "xl save" or "xl migrate" without modification of > > > libxl level, but you may have another idea? > > > Also, toolstack_restore callback is not set (NULL) for ARM case. > > > So you aren''t using xl to do the migration? This is what we should > > > ultimately be aiming for. It is almost certainly going to require fixes > > > at the libxl level though. > > Some misunderstanding. We are using xl for migration. I mean that libxl > > doesn''t correctly transfer guest parameters: number of vcpus, memory. > > Huh, I''d have thought that would all be taken care of by generic libxl > code and not require x86 and arm specifics. Maybe I''m wrong. > > In any case it should be fixed. > > > At the proof-of-concept stage there was easier to transfer it inside > > xc_domain_save and xc_domain_restore rather then patching libxl. > > Understood. > > > For example, we should correctly set maximum memory for destination > > domain by using xc_setmaxmem hypercall. Otherwise, toolstack set it by > > default calling > > xc_domain_setmaxmem(ctx->xch, domid, info->target_memkb + > > LIBXL_MAXMEM_CONSTANT); > > (see libxl_dom.c:241). But we don''t need to add LIBXL_MAXMEM_CONSTANT on > > ARM, so we set it manually. > > This is added on create, so why not on restore? Whether or not it is > necessary on ARM is a separate question, it should be consistent. > > Ian.
Ian Campbell
2013-Nov-13 12:22 UTC
Re: [PATCH RESEND v5 1/6] xen/arm: Implement hvm save and restore
To the list this time as well. On Wed, 2013-11-13 at 11:47 +0000, Ian Campbell wrote:> On Wed, 2013-11-13 at 15:50 +0400, Eugene Fedotov wrote: > > 13.11.2013 14:56, Ian Campbell пишет: > > > On Wed, 2013-11-13 at 12:00 +0400, Eugene Fedotov wrote: > > > > > >>> [...] > > >>>> +static int vgic_irq_rank_save(struct vgic_rank *ext, > > >>>> + struct vgic_irq_rank *rank) > > >>>> +{ > > >>>> + spin_lock(&rank->lock); > > >>> This is probably wise, but the domain must be paused at this point, > > >>> right? > > >> I think this lock can be removed, thanks. > > > I'd be happy for it to stay as a belt-and-braces/good practice thing. > > > > > >>>> + return -EINVAL; > > >>>> + } > > >>>> + memcpy(ext->ipriority, rank->ipriority, sizeof(rank->ipriority)); > > >>>> + /* ITARGETS */ > > >>>> + if ( sizeof(rank->itargets) != sizeof (ext->itargets) ) > > >>>> + { > > >>>> + dprintk(XENLOG_G_ERR, "hvm_hw_gic: check itargets dumping space\n"); > > >>>> + return -EINVAL; > > >>>> + } > > >>>> + memcpy(ext->itargets, rank->itargets, sizeof(rank->itargets)); > > >>>> + spin_unlock(&rank->lock); > > >>>> + return 0; > > >>>> +} > > >>>> + > > >>>> +static int gic_save(struct domain *d, hvm_domain_context_t *h) > > >>>> +{ > > >>>> + struct hvm_hw_gic ctxt; > > >>>> + struct vcpu *v; > > >>>> + > > >>>> + /* Save the state of GICs */ > > >>>> + for_each_vcpu( d, v ) > > >>> Where is the GICD state saved then? > > >> The only GICD structure we save for guest domain is struct > > >> vgic_irq_rank, it includes: IENABLE, IACTIVE, IPEND, PENDSGI, ICFG, > > >> IPRIORITY, ITARGETS registers. We create the same structure inside hvm : > > >> vgic_rank (that is no guaranteed to be the same as struct vgic_irq_rank) > > >> and save it calling vgic_irq_rank_save routine below in gic_save. > > > I can only see one call to vgic_irq_rank_save which is the one to save > > > PPI state within the vcpu loop. What about the (per-cpu) SGI and > > > (global) SPIs? I can't see where either of those are saved. > > > 1) For the guest domain vgic.nr_lines was set to 0 (arch_domain > > structure, see "domain_vgic_init" function in vgic.c): > > d->arch.vgic.nr_lines = 0; /* We don't need SPIs for the guest */ > > Should we not rely on this, assuming that SPIs will be enabled for guest? > > Aha! I'd forgotten about that. I think we can safely ignore SPIs until > device passthrough is added. And even then I don't think we would want > to migrate with an active passthrough device attached. > > > 2) Could you point me where is SGI states are situated in arch_vcpu? I > > thought they are inside those 32 IRQs (namely 0-15 IRQs) that we have > > already saved, because vgic_irq_rank in arch_vcpu is the context for > > those 32 IRQs. > > SGIs are "interrupts" 0..15 which PPIs are "interrupts" 16..31, so if > you are saving the first rank which contains all 32 of those then I > think you have probably covered them already. I think you just need to > update the comment(s) to mention SGI as well as PPI. > > Thanks, > > Ian >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Eugene Fedotov
2013-Nov-13 12:31 UTC
Re: [PATCH RESEND v5 1/6] xen/arm: Implement hvm save and restore
13.11.2013 15:47, Ian Campbell пишет:> On Wed, 2013-11-13 at 15:50 +0400, Eugene Fedotov wrote: >> 13.11.2013 14:56, Ian Campbell пишет: >>> On Wed, 2013-11-13 at 12:00 +0400, Eugene Fedotov wrote: >>> >>>>> [...] >>>>>> +static int vgic_irq_rank_save(struct vgic_rank *ext, >>>>>> + struct vgic_irq_rank *rank) >>>>>> +{ >>>>>> + spin_lock(&rank->lock); >>>>> This is probably wise, but the domain must be paused at this point, >>>>> right? >>>> I think this lock can be removed, thanks. >>> I'd be happy for it to stay as a belt-and-braces/good practice thing. >>> >>>>>> + return -EINVAL; >>>>>> + } >>>>>> + memcpy(ext->ipriority, rank->ipriority, sizeof(rank->ipriority)); >>>>>> + /* ITARGETS */ >>>>>> + if ( sizeof(rank->itargets) != sizeof (ext->itargets) ) >>>>>> + { >>>>>> + dprintk(XENLOG_G_ERR, "hvm_hw_gic: check itargets dumping space\n"); >>>>>> + return -EINVAL; >>>>>> + } >>>>>> + memcpy(ext->itargets, rank->itargets, sizeof(rank->itargets)); >>>>>> + spin_unlock(&rank->lock); >>>>>> + return 0; >>>>>> +} >>>>>> + >>>>>> +static int gic_save(struct domain *d, hvm_domain_context_t *h) >>>>>> +{ >>>>>> + struct hvm_hw_gic ctxt; >>>>>> + struct vcpu *v; >>>>>> + >>>>>> + /* Save the state of GICs */ >>>>>> + for_each_vcpu( d, v ) >>>>> Where is the GICD state saved then? >>>> The only GICD structure we save for guest domain is struct >>>> vgic_irq_rank, it includes: IENABLE, IACTIVE, IPEND, PENDSGI, ICFG, >>>> IPRIORITY, ITARGETS registers. We create the same structure inside hvm : >>>> vgic_rank (that is no guaranteed to be the same as struct vgic_irq_rank) >>>> and save it calling vgic_irq_rank_save routine below in gic_save. >>> I can only see one call to vgic_irq_rank_save which is the one to save >>> PPI state within the vcpu loop. What about the (per-cpu) SGI and >>> (global) SPIs? I can't see where either of those are saved. >> 1) For the guest domain vgic.nr_lines was set to 0 (arch_domain >> structure, see "domain_vgic_init" function in vgic.c): >> d->arch.vgic.nr_lines = 0; /* We don't need SPIs for the guest */ >> Should we not rely on this, assuming that SPIs will be enabled for guest? > Aha! I'd forgotten about that. I think we can safely ignore SPIs until > device passthrough is added. And even then I don't think we would want > to migrate with an active passthrough device attached. > >> 2) Could you point me where is SGI states are situated in arch_vcpu? I >> thought they are inside those 32 IRQs (namely 0-15 IRQs) that we have >> already saved, because vgic_irq_rank in arch_vcpu is the context for >> those 32 IRQs. > SGIs are "interrupts" 0..15 which PPIs are "interrupts" 16..31, so if > you are saving the first rank which contains all 32 of those then I > think you have probably covered them already. I think you just need to > update the comment(s) to mention SGI as well as PPI.OK, thanks. Best regards, Evgeny. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Jaeyong Yoo
2013-Nov-14 23:58 UTC
Re: [PATCH RESEND v5 4/6] xen/arm: Implement virtual-linear page table for guest p2m mapping in live migration
> -----Original Message----- > From: Ian Campbell [mailto:Ian.Campbell@citrix.com] > Sent: Wednesday, November 13, 2013 12:59 AM > To: Jaeyong Yoo > Cc: xen-devel@lists.xen.org > Subject: Re: [Xen-devel] [PATCH RESEND v5 4/6] xen/arm: Implement virtual- > linear page table for guest p2m mapping in live migration > > On Fri, 2013-11-08 at 16:50 +0900, Jaeyong Yoo wrote: > > Implement vlpt (virtual-linear page table) for fast accessing of 3rd PTE > of guest p2m. > > When creating a mapping for vlpt, just copy the 1st level PTE of guest > > p2m to the xen''s 2nd level PTE. Then the mapping becomes the following: > > xen''s 1st PTE --> > > xen''s 2nd PTE (which is the same as 1st PTE of guest p2m) --> > > guest p2m''s 2nd PTE --> > > guest p2m''s 3rd PTE (the memory contents where the vlpt points) > > For more info about vlpt, see: > > http://www.technovelty.org/linux/virtual-linear-page-table.html > > > > This function is used in dirty-page tracing: when domU write-fault is > > trapped by xen, xen can immediately locate the 3rd PTE of guest p2m. > > The following link shows the performance comparison for handling a > > dirty-page between vlpt and typical page table walking. > > http://lists.xen.org/archives/html/xen-devel/2013-08/msg01503.html > > > > Changes from v4: > > 1. In the restoring vlpt, use __foo variant without barriers for > write_pte and > > flush_xen_data_tlb_range_va. > > 2. Support two consecutive pages for guest''s first level page table. > > > > Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com> > > --- > > xen/arch/arm/domain.c | 5 ++ > > xen/arch/arm/mm.c | 112 > +++++++++++++++++++++++++++++++++++++++ > > xen/include/asm-arm/arm32/page.h | 41 ++++++++++---- > > xen/include/asm-arm/config.h | 5 ++ > > xen/include/asm-arm/domain.h | 7 +++ > > xen/include/asm-arm/mm.h | 15 ++++++ > > 6 files changed, 174 insertions(+), 11 deletions(-) > > > > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index > > cb0424d..c0b5dd8 100644 > > --- a/xen/arch/arm/domain.c > > +++ b/xen/arch/arm/domain.c > > @@ -509,6 +509,11 @@ int arch_domain_create(struct domain *d, unsigned > int domcr_flags) > > /* Default the virtual ID to match the physical */ > > d->arch.vpidr = boot_cpu_data.midr.bits; > > > > + d->arch.dirty.second_lvl_start = 0; > > + d->arch.dirty.second_lvl_end = 0; > > + d->arch.dirty.second_lvl[0] = NULL; > > + d->arch.dirty.second_lvl[1] = NULL; > > + > > clear_page(d->shared_info); > > share_xen_page_with_guest( > > virt_to_page(d->shared_info), d, XENSHARE_writable); diff > > --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c index 3801f07..bf13993 > > 100644 > > --- a/xen/arch/arm/mm.c > > +++ b/xen/arch/arm/mm.c > > @@ -1328,6 +1328,118 @@ void get_gma_start_end(struct domain *d, paddr_t > *start, paddr_t *end) > > *end = GUEST_RAM_BASE + ((paddr_t) d->max_pages << > > PAGE_SHIFT); } > > > > +/* flush the vlpt area */ > > +void flush_vlpt(struct domain *d) > > +{ > > + int flush_size; > > + flush_size = (d->arch.dirty.second_lvl_end - > > + d->arch.dirty.second_lvl_start) << SECOND_SHIFT; > > + /* flushing the 3rd level mapping */ > > + flush_xen_data_tlb_range_va(d->arch.dirty.second_lvl_start << > SECOND_SHIFT, > > + flush_size); > > All these << SECOND_SHIFT, can they be put into a macro > VLPT_OFFSET(guest_vaddr) perhaps (assuming that''s the right semantics) >VLPT_OFFSET sounds fine.> > +} > > + > > +/* restore the xen page table for vlpt mapping for domain d */ void > > +restore_vlpt(struct domain *d) > > I guess this is called on context switch? 
Why not in this patch?OK.> > > +{ > > + int i; > > + dsb(); > > + for ( i = d->arch.dirty.second_lvl_start; > > + i < d->arch.dirty.second_lvl_end; > > + ++i ) > > + { > > + int k = i % LPAE_ENTRIES; > > + int l = i / LPAE_ENTRIES; > > + > > + if ( xen_second[i].bits != d->arch.dirty.second_lvl[l][k].bits ) > > + { > > + __write_pte(&xen_second[i], d->arch.dirty.second_lvl[l][k]); > > + __flush_xen_data_tlb_range_va(i << SECOND_SHIFT, > > + 1 << SECOND_SHIFT); > > + } > > + } > > + dsb(); > > + isb(); > > +} > > + > > +/* setting up the xen page table for vlpt mapping for domain d */ int > > +prepare_vlpt(struct domain *d) { > > + int xen_second_linear_base; > > + int gp2m_start_index, gp2m_end_index; > > + struct p2m_domain *p2m = &d->arch.p2m; > > + struct page_info *second_lvl_page; > > + paddr_t gma_start = 0; > > + paddr_t gma_end = 0; > > + lpae_t *first[2]; > > + int i; > > + uint64_t required, avail = VIRT_LIN_P2M_END - VIRT_LIN_P2M_START; > > + > > + get_gma_start_end(d, &gma_start, &gma_end); > > + required = (gma_end - gma_start) >> LPAE_SHIFT; > > + > > + > > + if ( required > avail ) > > avail is the number of bytes of virtual address space in the linear p2m > area. > > required is the number of pages which the guest has. Does this comparison > correctly account for each page being represented by an 8-byte lpae_t > entry?Yes, it does. See, LPAE_SHIFT only shifts 9 bits. Shifting PAGE_SHIFT (12 bits) gives the number of pages (every bit), and 9-bit shift gives the required memory for storing third-level PTE. Since this one-shifting of LPAE_SHIFT is confusing enough, we can make it explicit by using something like required = ((gma_end - gma_start) >> PAGE_SHIFT ) * (sizeof lpae_t).> > > + { > > + dprintk(XENLOG_ERR, "Available VLPT is small for domU guest" > > + "(avail: %llx, required: %llx)\n", > > + avail, required); > > + return -ENOMEM; > > + } > > + > > + xen_second_linear_base > > + second_linear_offset(VIRT_LIN_P2M_START); > > + > > + gp2m_start_index = gma_start >> FIRST_SHIFT; > > + gp2m_end_index = (gma_end >> FIRST_SHIFT) + 1; > > + > > + if ( xen_second_linear_base + gp2m_end_index >= LPAE_ENTRIES * 2 ) > > + { > > + dprintk(XENLOG_ERR, "xen second page is small for VLPT for domU"); > > + return -ENOMEM; > > + } > > + > > + second_lvl_page = alloc_domheap_pages(NULL, 1, 0); > > There''d be no harm in allocating the two pages separately I think, and > avoiding the possiblilty of an order 1 allocation failing. > > But if you do allocate them contiguously then d->arch.dirty.second_lvl can > be a simple lpae_t * and you can just access offsets 0..1023 without > worrying about managing the array of two pointers. > > As it stands I think you have the worst of both worlds. > > We could also just consider allocating all firstlevel p2m pages from the > xenheap instead of the domheap.OK. xenheap allocation sounds good. I was not sure if I can use xenheap for the one that looks like ''domain-specific'' purpose. 
If it''s OK, it would be much better code readability.> > > + if ( second_lvl_page == NULL ) > > + return -ENOMEM; > > + > > + /* First level p2m is 2 consecutive pages */ > > + d->arch.dirty.second_lvl[0] = map_domain_page_global( > > + page_to_mfn(second_lvl_page) ); > > + d->arch.dirty.second_lvl[1] = map_domain_page_global( > > + > > + page_to_mfn(second_lvl_page+1) ); > > + > > + first[0] = __map_domain_page(p2m->first_level); > > + first[1] = __map_domain_page(p2m->first_level+1); > > + for ( i = gp2m_start_index; i < gp2m_end_index; ++i ) > > Can''t this just be a loop over 0..1023 and avoid all this modular > arithmetic:Sure.> > + { > > + int k = i % LPAE_ENTRIES; > > + int l = i / LPAE_ENTRIES; > > + int k2 = (xen_second_linear_base + i) % LPAE_ENTRIES; > > + int l2 = (xen_second_linear_base + i) / LPAE_ENTRIES; > > + > > + write_pte(&xen_second[xen_second_linear_base+i], > > + first[l][k]); > > + > > + /* we copy the mapping into domain''s structure as a reference > > + * in case of the context switch (used in restore_vlpt) */ > > Do you keep this up to date when changing the p2m? > > In fact, why aren''t you just mapping the actual first level pages? Why > does it have to be a copy?Oh right. That is much better! Then, no need to maintain the copy. Thanks!> > > + d->arch.dirty.second_lvl[l2][k2] = first[l][k]; > > + } > > + unmap_domain_page(first[0]); > > + unmap_domain_page(first[1]); > > + > > + /* storing the start and end index */ > > + d->arch.dirty.second_lvl_start = xen_second_linear_base + > gp2m_start_index; > > + d->arch.dirty.second_lvl_end = xen_second_linear_base + > > + gp2m_end_index; > > + > > + flush_vlpt(d); > > + return 0; > > +} > > + > > +void cleanup_vlpt(struct domain *d) > > +{ > > + /* First level p2m is 2 consecutive pages */ > > + unmap_domain_page_global(d->arch.dirty.second_lvl[0]); > > + unmap_domain_page_global(d->arch.dirty.second_lvl[1]); > > +} > > /* > > * Local variables: > > * mode: C > > diff --git a/xen/include/asm-arm/arm32/page.h > > b/xen/include/asm-arm/arm32/page.h > > index cf12a89..0a4e115 100644 > > --- a/xen/include/asm-arm/arm32/page.h > > +++ b/xen/include/asm-arm/arm32/page.h > > @@ -5,20 +5,28 @@ > > > > /* Write a pagetable entry. > > * > > - * If the table entry is changing a text mapping, it is > > responsibility > > - * of the caller to issue an ISB after write_pte. > > + * All necessary barriers are responsibility of the caller > > */ > > -static inline void write_pte(lpae_t *p, lpae_t pte) > > +static inline void __write_pte(lpae_t *p, lpae_t pte) > > { > > asm volatile ( > > - /* Ensure any writes have completed with the old mappings. */ > > - "dsb;" > > /* Safely write the entry (STRD is atomic on CPUs that support > LPAE) */ > > "strd %0, %H0, [%1];" > > - "dsb;" > > : : "r" (pte.bits), "r" (p) : "memory"); } > > > > +/* Write a pagetable entry with dsb barriers. > > + * > > + * If the table entry is changing a text mapping, it is > > +responsibility > > + * of the caller to issue an ISB after write_pte. > > + */ > > +static inline void write_pte(lpae_t *p, lpae_t pte) { > > + dsb(); > > Please retain the comments from the original version.OK.> > > + __write_pte(p, pte); > > + dsb(); > > +} > > + > > /* Inline ASM to flush dcache on register R (may be an inline asm > > operand) */ #define __flush_xen_dcache_one(R) STORE_CP32(R, DCCMVAC) > > > > @@ -57,18 +65,28 @@ static inline void flush_xen_data_tlb(void) } > > > > /* > > - * Flush a range of VA''s hypervisor mappings from the data TLB. 
This > > is not > > - * sufficient when changing code mappings or for self modifying code. > > + * Flush a range of VA''s hypervisor mappings from the data TLB. > > + * All necessary barriers are responsibility of the caller > > */ > > -static inline void flush_xen_data_tlb_range_va(unsigned long va, > > unsigned long size) > > +static inline void __flush_xen_data_tlb_range_va(unsigned long va, > > +unsigned long size) > > { > > unsigned long end = va + size; > > - dsb(); /* Ensure preceding are visible */ > > while ( va < end ) { > > asm volatile(STORE_CP32(0, TLBIMVAH) > > : : "r" (va) : "memory"); > > va += PAGE_SIZE; > > } > > +} > > + > > +/* > > + * Flush a range of VA''s hypervisor mappings from the data TLB with > barriers. > > + * This barrier is not sufficient when changing code mappings or for > > +self > > + * modifying code. > > + */ > > +static inline void flush_xen_data_tlb_range_va(unsigned long va, > > +unsigned long size) { > > + dsb(); /* Ensure preceding are visible */ > > + __flush_xen_data_tlb_range_va(va, size); > > dsb(); /* Ensure completion of the TLB flush */ > > isb(); > > } > > diff --git a/xen/include/asm-arm/config.h > > b/xen/include/asm-arm/config.h index 5b7b1a8..15ad56d 100644 > > --- a/xen/include/asm-arm/config.h > > +++ b/xen/include/asm-arm/config.h > > @@ -87,6 +87,7 @@ > > * 0 - 8M <COMMON> > > * > > * 32M - 128M Frametable: 24 bytes per page for 16GB of RAM > > + * 128M - 256M Virtual-linear mapping to P2M table > > * 256M - 1G VMAP: ioremap and early_ioremap use this virtual address > > * space > > * > > @@ -124,7 +125,9 @@ > > #define CONFIG_SEPARATE_XENHEAP 1 > > > > #define FRAMETABLE_VIRT_START _AT(vaddr_t,0x02000000) > > +#define VIRT_LIN_P2M_START _AT(vaddr_t,0x08000000) > > #define VMAP_VIRT_START _AT(vaddr_t,0x10000000) > > +#define VIRT_LIN_P2M_END VMAP_VIRT_START > > #define XENHEAP_VIRT_START _AT(vaddr_t,0x40000000) > > #define XENHEAP_VIRT_END _AT(vaddr_t,0x7fffffff) > > #define DOMHEAP_VIRT_START _AT(vaddr_t,0x80000000) > > @@ -157,6 +160,8 @@ > > > > #define HYPERVISOR_VIRT_END DIRECTMAP_VIRT_END > > > > +/*TODO (ARM_64): define VIRT_LIN_P2M_START VIRT_LIN_P2M_END */ > > + > > #endif > > > > /* Fixmap slots */ > > diff --git a/xen/include/asm-arm/domain.h > > b/xen/include/asm-arm/domain.h index 67bfbbc..4f366f1 100644 > > --- a/xen/include/asm-arm/domain.h > > +++ b/xen/include/asm-arm/domain.h > > @@ -112,6 +112,13 @@ struct arch_domain > > spinlock_t lock; > > } vuart; > > > > + /* dirty-page tracing */ > > + struct { > > + volatile int second_lvl_start; /* for context switch */ > > + volatile int second_lvl_end; > > Are these virtual addresses, or pages or something else? We mostly hav > specific types (e.g. vaddr_t) for such things. >OK> volatile is almost always wrong to use. Why do you want it here? If there > is an actual coherency issue you will need proper barriers, not volatile.Ah, OK. Barriers would be more explicit.> > Ian.
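For context, the payoff of the table set up in prepare_vlpt() is that the permission-fault handler can reach a guest 3rd-level p2m entry with plain pointer arithmetic instead of a p2m walk. A minimal sketch of what get_vlpt_3lvl_pte() amounts to, assuming one lpae_t per guest page laid out linearly from VIRT_LIN_P2M_START (the actual patch may compute the offset slightly differently):

    static inline lpae_t *get_vlpt_3lvl_pte(paddr_t addr)
    {
        /* The window exposes the guest's 3rd-level p2m tables as a flat
         * array indexed by gpfn; addresses below GUEST_RAM_BASE simply
         * fall into an unused (unmapped) part of the window. */
        return (lpae_t *)VIRT_LIN_P2M_START + (addr >> PAGE_SHIFT);
    }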
Jaeyong Yoo
2013-Nov-15 02:26 UTC
Re: [PATCH RESEND v5 5/6] xen/arm: Implement hypercall for dirty page tracing
> -----Original Message----- > From: Ian Campbell [mailto:Ian.Campbell@citrix.com] > Sent: Wednesday, November 13, 2013 1:56 AM > To: Jaeyong Yoo > Cc: xen-devel@lists.xen.org > Subject: Re: [Xen-devel] [PATCH RESEND v5 5/6] xen/arm: Implement > hypercall for dirty page tracing > > On Fri, 2013-11-08 at 16:50 +0900, Jaeyong Yoo wrote: > > Add hypercall (shadow op: enable/disable and clean/peek dirtied page > bitmap). > > It consists of two parts: dirty page detecting and saving. > > For detecting, we setup the guest p2m''s leaf PTE read-only and > > whenever the guest tries to write something, permission fault happens > and traps into xen. > > The permission-faulted GPA should be saved for the toolstack (when it > > wants to see which pages are dirtied). For this purpose, we temporarily > save the GPAs into bitmap. > > > > Changes from v4: > > 1. For temporary saving dirty pages, use bitmap rather than linked list. > > 2. Setup the p2m''s second level page as read-write in the view of xen''s > memory access. > > It happens in p2m_create_table function. > > > > Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com> > > --- > > xen/arch/arm/domain.c | 14 +++ > > xen/arch/arm/domctl.c | 9 ++ > > xen/arch/arm/mm.c | 103 +++++++++++++++++++- > > xen/arch/arm/p2m.c | 206 > ++++++++++++++++++++++++++++++++++++++++ > > xen/arch/arm/traps.c | 9 ++ > > xen/include/asm-arm/domain.h | 7 ++ > > xen/include/asm-arm/mm.h | 7 ++ > > xen/include/asm-arm/p2m.h | 4 + > > xen/include/asm-arm/processor.h | 2 + > > 9 files changed, 360 insertions(+), 1 deletion(-) > > > > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index > > c0b5dd8..0a32301 100644 > > --- a/xen/arch/arm/domain.c > > +++ b/xen/arch/arm/domain.c > > @@ -215,6 +215,12 @@ static void ctxt_switch_to(struct vcpu *n) > > WRITE_SYSREG(hcr, HCR_EL2); > > isb(); > > > > + /* for dirty-page tracing > > + * XXX: how do we consider SMP case? > > + */ > > + if ( n->domain->arch.dirty.mode ) > > + restore_vlpt(n->domain); > > This is an interesting question. xen_second is shared between all pcpus, > which means that the vlpt is currently only usable from a single physical > CPU at a time. > > Currently the only per-cpu area is the domheap region from 2G-4G. We could > steal some address space from the top or bottom of there?oh right hmm. Then, how about place the vlpt area starting from 2G (between dom heap and xen heap), and let the map_domain_page map the domain page starting from the VLPT-end address? For this, I think just changing DOMHEAP_VIRT_START to some place (maybe 0x88000000) would be suffice.> > > /* This is could trigger an hardware interrupt from the virtual > > * timer. The interrupt needs to be injected into the guest. 
*/ > > virt_timer_restore(n); > > @@ -509,11 +515,19 @@ int arch_domain_create(struct domain *d, unsigned > int domcr_flags) > > /* Default the virtual ID to match the physical */ > > d->arch.vpidr = boot_cpu_data.midr.bits; > > > > + /* init for dirty-page tracing */ > > + d->arch.dirty.count = 0; > > + d->arch.dirty.mode = 0; > > + spin_lock_init(&d->arch.dirty.lock); > > + > > d->arch.dirty.second_lvl_start = 0; > > d->arch.dirty.second_lvl_end = 0; > > d->arch.dirty.second_lvl[0] = NULL; > > d->arch.dirty.second_lvl[1] = NULL; > > > > + memset(d->arch.dirty.bitmap, 0, sizeof(d->arch.dirty.bitmap)); > > + d->arch.dirty.bitmap_pages = 0; > > + > > clear_page(d->shared_info); > > share_xen_page_with_guest( > > virt_to_page(d->shared_info), d, XENSHARE_writable); diff > > --git a/xen/arch/arm/domctl.c b/xen/arch/arm/domctl.c index > > cb38e59..eb74225 100644 > > --- a/xen/arch/arm/domctl.c > > +++ b/xen/arch/arm/domctl.c > > @@ -93,6 +93,15 @@ long arch_do_domctl(struct xen_domctl *domctl, struct > domain *d, > > xfree(c.data); > > } > > break; > > + case XEN_DOMCTL_shadow_op: > > + { > > + domain_pause(d); > > + ret = dirty_mode_op(d, &domctl->u.shadow_op); > > + domain_unpause(d); > > + > > + copyback = 1; > > + } > > + break; > > > > default: > > return -EINVAL; > > diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c index > > bf13993..d5a0a11 100644 > > --- a/xen/arch/arm/mm.c > > +++ b/xen/arch/arm/mm.c > > @@ -845,7 +845,6 @@ void destroy_xen_mappings(unsigned long v, unsigned > long e) > > create_xen_entries(REMOVE, v, 0, (e - v) >> PAGE_SHIFT, 0); } > > > > -enum mg { mg_clear, mg_ro, mg_rw, mg_rx }; static void > > set_pte_flags_on_range(const char *p, unsigned long l, enum mg mg) { > > lpae_t pte; > > @@ -1320,6 +1319,60 @@ int is_iomem_page(unsigned long mfn) > > * xen: arm: 64-bit guest support and domU FDT autogeneration > > * will be upstreamed. > > */ > > + > > +static inline void mark_dirty_bitmap(struct domain *d, paddr_t addr) > > +{ > > + paddr_t ram_base = (paddr_t) GUEST_RAM_BASE; > > + int bit_index = PFN_DOWN(addr - ram_base); > > + int page_index = bit_index >> (PAGE_SHIFT + 3); > > +3?+3 is for changing the bit-index to byte-index. I think it would be better to use BYTE_SHIFT kind of thing.> > > + int bit_index_residual = bit_index & ((1ul << (PAGE_SHIFT + 3)) - > > + 1); > > + > > + set_bit(bit_index_residual, d->arch.dirty.bitmap[page_index]); > > +} > > + > > +/* routine for dirty-page tracing > > + * > > + * On first write, it page faults, its entry is changed to > > +read-write, > > + * and on retry the write succeeds. > > + * > > + * for locating p2m of the faulting entry, we use virtual-linear page > table. > > + * returns zero if addr is not valid or dirty mode is not set */ int > > +handle_page_fault(struct domain *d, paddr_t addr) { > > + > > + lpae_t *vlp2m_pte = 0; > > + paddr_t gma_start = 0; > > + paddr_t gma_end = 0; > > + > > + if ( !d->arch.dirty.mode ) return 0; > > + get_gma_start_end(d, &gma_start, &gma_end); > > + > > + /* Ensure that addr is inside guest''s RAM */ > > + if ( addr < gma_start || > > + addr > gma_end ) return 0; > > + > > + vlp2m_pte = get_vlpt_3lvl_pte(addr); > > + if ( vlp2m_pte->p2m.valid && vlp2m_pte->p2m.write == 0 && > > + vlp2m_pte->p2m.avail == 0 /* reuse avail bit as read-only */ ) > > + { > > + lpae_t pte = *vlp2m_pte; > > + pte.p2m.write = 1; > > + write_pte(vlp2m_pte, pte); > > Do we not need the p2m lock around here somewhere?Oh, I think this brings an another question. 
In the dirty-page tracing phase, would it be OK to permit changes in p2m page table (such as adding new pages)? In that case, VLPT and dirty-bitmap should be re-prepared for taking account this change.> > > + flush_tlb_local(); > > If SMP was working this would need to be inner shareable to cope with > vcpus running on other pcpus.You meant the p2m page table should be inner shareable?> > > + /* only necessary to lock between get-dirty bitmap and mark dirty > > + * bitmap. If get-dirty bitmap happens immediately before this > > + * lock, the corresponding dirty-page would be marked at the next > > + * round of get-dirty bitmap */ > > + spin_lock(&d->arch.dirty.lock); > > + mark_dirty_bitmap(d, addr); > > + spin_unlock(&d->arch.dirty.lock); > > Might an atomic set_bit suffice rather than the lock? > > > + } > > + > > + return 1; > > +} > > + > > void get_gma_start_end(struct domain *d, paddr_t *start, paddr_t > > *end) { > > if ( start ) > > @@ -1440,6 +1493,54 @@ void cleanup_vlpt(struct domain *d) > > unmap_domain_page_global(d->arch.dirty.second_lvl[0]); > > unmap_domain_page_global(d->arch.dirty.second_lvl[1]); > > } > > + > > +int prepare_bitmap(struct domain *d) > > +{ > > + paddr_t gma_start = 0; > > + paddr_t gma_end = 0; > > + int nr_bytes; > > + int nr_pages; > > + int i; > > + > > + get_gma_start_end(d, &gma_start, &gma_end); > > + > > + nr_bytes = (PFN_DOWN(gma_end - gma_start) + 7) / 8; > > + nr_pages = (nr_bytes + PAGE_SIZE - 1) / PAGE_SIZE; > > + > > + BUG_ON( nr_pages > MAX_DIRTY_BITMAP_PAGES ); > > + > > + for ( i = 0; i < nr_pages; ++i ) > > + { > > + struct page_info *page; > > + page = alloc_domheap_page(NULL, 0); > > + if ( page == NULL ) > > + goto cleanup_on_failure; > > + > > + d->arch.dirty.bitmap[i] > > + map_domain_page_global(__page_to_mfn(page)); > > I think you may as well just use the xenheap here.OK.> > > +/* Change types across all p2m entries in a domain */ void > > +p2m_change_entry_type_global(struct domain *d, enum mg nt) { > > + struct p2m_domain *p2m = &d->arch.p2m; > > + paddr_t ram_base; > > + int i1, i2, i3; > > + int first_index, second_index, third_index; > > + lpae_t *first = __map_domain_page(p2m->first_level); > > + lpae_t pte, *second = NULL, *third = NULL; > > + > > + get_gma_start_end(d, &ram_base, NULL); > > + > > + first_index = first_table_offset((uint64_t)ram_base); > > + second_index = second_table_offset((uint64_t)ram_base); > > + third_index = third_table_offset((uint64_t)ram_base); > > Are those casts needed?Oh, they do not need to be.> > > + > > + BUG_ON( !first && "Can''t map first level p2m." ); > > + > > + spin_lock(&p2m->lock); > > + > > + for ( i1 = first_index; i1 < LPAE_ENTRIES*2; ++i1 ) > > You avoid iterating over stuff below ram_base but you aren''t worried about > ram end?I think I need to rework this function for accounting all those comments below.> > > + { > > + lpae_walk_t first_pte = first[i1].walk; > > + if ( !first_pte.valid || !first_pte.table ) > > + goto out; > > + > > + second = map_domain_page(first_pte.base); > > + BUG_ON( !second && "Can''t map second level p2m."); > > + for ( i2 = second_index; i2 < LPAE_ENTRIES; ++i2 ) > > + { > > + lpae_walk_t second_pte = second[i2].walk; > > + > > + if ( !second_pte.valid || !second_pte.table ) > > + goto out; > > goto out is a bit strong. Don''t you want to move on to the next second > level page? 
> > > + > > + third = map_domain_page(second_pte.base); > > + BUG_ON( !third && "Can''t map third level p2m."); > > + > > + for ( i3 = third_index; i3 < LPAE_ENTRIES; ++i3 ) > > + { > > + lpae_walk_t third_pte = third[i3].walk; > > + if ( !third_pte.valid ) > > + goto out; > > + > > + pte = third[i3]; > > + if ( nt == mg_ro ) > > + { > > + if ( pte.p2m.write == 1 ) > > + { > > + pte.p2m.write = 0; > > + pte.p2m.avail = 0; > > + } > > + else > > + { > > + /* reuse avail bit as an indicator > > + * of ''actual'' read-only */ > > + pte.p2m.avail = 1; > > + } > > This inner if is equivalent to > pte.p2m.avail = pte.p2m.write; > pte.p2m.write = 1; > > > + } > > + else if ( nt == mg_rw ) > > + { > > + if ( pte.p2m.write == 0 && pte.p2m.avail == 0 ) > > + { > > + pte.p2m.write = 1; > > + } > > and this one is: > pte.p2m.write = pte.p2m.avail; > > > + } > > + write_pte(&third[i3], pte); > > + } > > + unmap_domain_page(third); > > + > > + third = NULL; > > + third_index = 0; > > + } > > + unmap_domain_page(second); > > + > > + second = NULL; > > + second_index = 0; > > + third_index = 0; > > + } > > + > > +out: > > + flush_tlb_all_local(); > > + if ( third ) unmap_domain_page(third); > > + if ( second ) unmap_domain_page(second); > > + if ( first ) unmap_domain_page(first); > > + > > + spin_unlock(&p2m->lock); > > +} > > + > > +/* Read a domain''s log-dirty bitmap and stats. > > + * If the operation is a CLEAN, clear the bitmap and stats. */ int > > +log_dirty_op(struct domain *d, xen_domctl_shadow_op_t *sc) { > > + int peek = 1; > > + int i; > > + int bitmap_size; > > + paddr_t gma_start, gma_end; > > + > > + /* this hypercall is called from domain 0, and we don''t know which > guest''s > > + * vlpt is mapped in xen_second, so, to be sure, we restore vlpt > here */ > > + restore_vlpt(d); > > + > > + get_gma_start_end(d, &gma_start, &gma_end); > > + bitmap_size = (gma_end - gma_start) / 8; > > + > > + if ( guest_handle_is_null(sc->dirty_bitmap) ) > > + { > > + peek = 0; > > What do you do with this?peek was used for just seeing which pages are dirted without resetting the dirty bitmap. It could be used for analyzing phase of dirty-pages (just to see the dirty page generation rates) at toolstack. But, we are having second thought on this and I think the code cleanup of this analysis phase was not perfect. So, the answer is nothing. I will clean up this.> > > + } > > + else > > + { > > + spin_lock(&d->arch.dirty.lock); > > + for ( i = 0; i < d->arch.dirty.bitmap_pages; ++i ) > > + { > > + int j = 0; > > + uint8_t *bitmap; > > + copy_to_guest_offset(sc->dirty_bitmap, i * PAGE_SIZE, > > + d->arch.dirty.bitmap[i], > > + bitmap_size < PAGE_SIZE ? 
bitmap_size : > > + > > + PAGE_SIZE); > > The bitmap is always a mutliple of page_size, isn''t it?But the actually used bits inside of the bitmap are not multiple of page_size I think.> > > + bitmap_size -= PAGE_SIZE; > > + > > + /* set p2m page table read-only */ > > + bitmap = d->arch.dirty.bitmap[i]; > > + while ((j = find_next_bit((const long unsigned int *)bitmap, > > + PAGE_SIZE*8, j)) < PAGE_SIZE*8) > > + { > > + lpae_t *vlpt; > > + paddr_t addr = gma_start + > > + (i << (2*PAGE_SHIFT+3)) + > > + (j << PAGE_SHIFT); > > + vlpt = get_vlpt_3lvl_pte(addr); > > + vlpt->p2m.write = 0; > > Needs to be a write_pte type construct I think.Oh sure.> > > + j++; > > + } > > + } > > + > > + if ( sc->op == XEN_DOMCTL_SHADOW_OP_CLEAN ) > > + { > > + for ( i = 0; i < d->arch.dirty.bitmap_pages; ++i ) > > + { > > + clear_page(d->arch.dirty.bitmap[i]); > > + } > > + } > > + > > + spin_unlock(&d->arch.dirty.lock); > > Ah, I see why the lock in the handler is needed. > > This holds the lock over quite a long period. Could it be made more > granular by copying and clearing each page in sequence, dropping the lock > between pages?OK, I see.> > > diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c index > > 287dd7b..1a7ed11 100644 > > --- a/xen/arch/arm/traps.c > > +++ b/xen/arch/arm/traps.c > > @@ -1321,6 +1321,8 @@ static void do_trap_data_abort_guest(struct > cpu_user_regs *regs, > > const char *msg; > > int rc, level = -1; > > mmio_info_t info; > > + int page_fault = ( (dabt.dfsc & FSC_MASK) => > + (FSC_FLT_PERM | FSC_3D_LEVEL) && dabt.write > > + ); > > I think you can use FSC_TYPE_MASK and FSC_LL_MASK here, can''t you? > > I think this would be better off refactored into a > dabt_is_page_fault(dabt), used in the test below. That would allow you to > more easily comment on why these particular conditions are the ones we > care about etc. 
> > > if ( !check_conditional_instr(regs, hsr) ) > > { > > @@ -1342,6 +1344,13 @@ static void do_trap_data_abort_guest(struct > cpu_user_regs *regs, > > if ( rc == -EFAULT ) > > goto bad_data_abort; > > > > + /* domU page fault handling for guest live migration */ > > + /* dabt.valid can be 0 here */ > > + if ( page_fault && handle_page_fault(current->domain, info.gpa) ) > > + { > > + /* Do not modify pc after page fault to repeat memory operation > */ > > + return; > > + } > > /* XXX: Decode the instruction if ISS is not valid */ > > if ( !dabt.valid ) > > goto bad_data_abort; > > diff --git a/xen/include/asm-arm/domain.h > > b/xen/include/asm-arm/domain.h index 4f366f1..180d924 100644 > > --- a/xen/include/asm-arm/domain.h > > +++ b/xen/include/asm-arm/domain.h > > @@ -114,9 +114,16 @@ struct arch_domain > > > > /* dirty-page tracing */ > > struct { > > +#define MAX_DIRTY_BITMAP_PAGES 64 /* support upto 8GB guest memory > */ > > + spinlock_t lock; /* protect list: head, mvn_head */ > > + volatile int mode; /* 1 if dirty pages tracing enabled > */ > > + volatile unsigned int count; /* dirty pages counter */ > > More unnecessary volatiles i think.OK.> > > volatile int second_lvl_start; /* for context switch */ > > volatile int second_lvl_end; > > lpae_t *second_lvl[2]; /* copy of guest p2m''s first */ > > + /* dirty bitmap */ > > + uint8_t *bitmap[MAX_DIRTY_BITMAP_PAGES]; > > + int bitmap_pages; /* number of dirty bitmap pages */ > > } dirty; > > > > } __cacheline_aligned; > > diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h index > > a74e135..1ce7a4b 100644 > > --- a/xen/include/asm-arm/mm.h > > +++ b/xen/include/asm-arm/mm.h > > @@ -341,11 +341,18 @@ static inline void put_page_and_type(struct > page_info *page) > > put_page(page); > > } > > > > +enum mg { mg_clear, mg_ro, mg_rw, mg_rx }; > > + > > +/* routine for dirty-page tracing */ > > +int handle_page_fault(struct domain *d, paddr_t addr); > > void get_gma_start_end(struct domain *d, paddr_t *start, paddr_t > > *end); int prepare_vlpt(struct domain *d); void cleanup_vlpt(struct > > domain *d); void restore_vlpt(struct domain *d); > > > > +int prepare_bitmap(struct domain *d); void cleanup_bitmap(struct > > +domain *d); > > + > > /* calculate the xen''s virtual address for accessing the leaf PTE of > > * a given address (GPA) */ > > static inline lpae_t * get_vlpt_3lvl_pte(paddr_t addr) diff --git > > a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h index > > c660820..dba9a7b 100644 > > --- a/xen/include/asm-arm/p2m.h > > +++ b/xen/include/asm-arm/p2m.h > > @@ -2,6 +2,7 @@ > > #define _XEN_P2M_H > > > > #include <xen/mm.h> > > +#include <public/domctl.h> > > > > struct domain; > > > > @@ -110,6 +111,9 @@ static inline int get_page_and_type(struct page_info > *page, > > return rc; > > } > > > > +void p2m_change_entry_type_global(struct domain *d, enum mg nt); long > > +dirty_mode_op(struct domain *d, xen_domctl_shadow_op_t *sc); > > + > > #endif /* _XEN_P2M_H */ > > > > /* > > diff --git a/xen/include/asm-arm/processor.h > > b/xen/include/asm-arm/processor.h index 5294421..fced6ad 100644 > > --- a/xen/include/asm-arm/processor.h > > +++ b/xen/include/asm-arm/processor.h > > @@ -399,6 +399,8 @@ union hsr { > > #define FSC_CPR (0x3a) /* Coprocossor Abort */ > > > > #define FSC_LL_MASK (0x03<<0) > > +#define FSC_MASK (0x3f) /* Fault status mask */ > > +#define FSC_3D_LEVEL (0x03) /* Third level fault*/ > > > > /* Time counter hypervisor control register */ > > #define CNTHCTL_PA (1u<<0) /* Kernel/user access to 
physical > counter */
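As a reference for the "+3" byte-index conversion discussed in the reply above, a minimal sketch of the same index arithmetic with the shift named; BITS_PER_BYTE_SHIFT and BITS_PER_BITMAP_PAGE_SHIFT are illustrative names, not existing Xen constants:

    #define BITS_PER_BYTE_SHIFT        3
    /* each bitmap page tracks PAGE_SIZE * 8 guest pages */
    #define BITS_PER_BITMAP_PAGE_SHIFT (PAGE_SHIFT + BITS_PER_BYTE_SHIFT)

    static inline void mark_dirty_bitmap(struct domain *d, paddr_t addr)
    {
        paddr_t ram_base = (paddr_t) GUEST_RAM_BASE;
        int bit_index  = PFN_DOWN(addr - ram_base);
        /* which page of the bitmap holds this bit */
        int page_index = bit_index >> BITS_PER_BITMAP_PAGE_SHIFT;
        /* bit offset within that bitmap page */
        int bit_index_residual =
            bit_index & ((1ul << BITS_PER_BITMAP_PAGE_SHIFT) - 1);

        set_bit(bit_index_residual, d->arch.dirty.bitmap[page_index]);
    }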
Jaeyong Yoo
2013-Nov-15 04:15 UTC
Re: [PATCH RESEND v5 5/6] xen/arm: Implement hypercall for dirty page tracing
I accidentally missed this comment in the previous email.> > diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c index > > 287dd7b..1a7ed11 100644 > > --- a/xen/arch/arm/traps.c > > +++ b/xen/arch/arm/traps.c > > @@ -1321,6 +1321,8 @@ static void do_trap_data_abort_guest(struct > cpu_user_regs *regs, > > const char *msg; > > int rc, level = -1; > > mmio_info_t info; > > + int page_fault = ( (dabt.dfsc & FSC_MASK) => > + (FSC_FLT_PERM | FSC_3D_LEVEL) && dabt.write > > + ); > > I think you can use FSC_TYPE_MASK and FSC_LL_MASK here, can't you? >Something like dabt.dfsc & FSC_TYPE_MASK == FSC_TYPE_FAULT && dabt.dfsc & FSC_LL_MASK == 0x3 /* third level */, right?> I think this would be better off refactored into a > dabt_is_page_fault(dabt), used in the test below. That would allow you to > more easily comment on why these particular conditions are the ones we > care about etc. >Sure. Jaeyong
Eugene Fedotov
2013-Nov-15 07:04 UTC
Re: [PATCH RESEND v5 2/6] xen/arm: Implement get_maximum_gpfn hypercall for arm
13.11.2013 14:58, Ian Campbell wrote:> On Wed, 2013-11-13 at 12:28 +0400, Eugene Fedotov wrote: >>>> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c >>>> index 123280e..3801f07 100644 >>>> --- a/xen/arch/arm/mm.c >>>> +++ b/xen/arch/arm/mm.c >>>> @@ -927,7 +927,11 @@ int page_is_ram_type(unsigned long mfn, unsigned long mem_type) >>>> >>>> unsigned long domain_get_maximum_gpfn(struct domain *d) >>> s/unsigned long/xen_pfn_t/ I think. >>> >>> Urk, which will break the ABI for this hypercall. That's something of a >>> conundrum :-/ >>> >>> It is a domctl so we are at liberty to change it, probably to taking a >>> xen_pfn_t * to fill in instead of returning the value. >>> >>> I'm in two minds about whether we should do so now or postpone it (which >>> risks forgetting and having an obscure bug somewhere down the line). >> OK, this way, or other solution is to purge this hypercall yet in >> migration and use max_memkb field in DOMCTL_getdomaininfo to obtain the >> maximum PFN. > Is there an interface to retrieve that? I don't see it :-(No, it is another hypercall XEN_DOMCTL_getdomaininfo. max_memkb field in xc_dominfo_t structure is a second way to retrieve maximum memory size and calculate PFN. Do we need domain_get_maximum_gpfn ? Best regards, Evgeny.
Jaeyong Yoo
2013-Nov-18 03:47 UTC
Re: [PATCH RESEND v5 4/6] xen/arm: Implement virtual-linear page table for guest p2m mapping in live migration
> > > + > > > + second_lvl_page = alloc_domheap_pages(NULL, 1, 0); > > > > There''d be no harm in allocating the two pages separately I think, and > > avoiding the possiblilty of an order 1 allocation failing. > > > > But if you do allocate them contiguously then d->arch.dirty.second_lvl > > can be a simple lpae_t * and you can just access offsets 0..1023 > > without worrying about managing the array of two pointers. > > > > As it stands I think you have the worst of both worlds. > > > > We could also just consider allocating all firstlevel p2m pages from > > the xenheap instead of the domheap. > > OK. xenheap allocation sounds good. I was not sure if I can use xenheap > for the one that looks like ''domain-specific'' purpose. If it''s OK, it > would be much better code readability. > > > > > > + if ( second_lvl_page == NULL ) > > > + return -ENOMEM; > > > + > > > + /* First level p2m is 2 consecutive pages */ > > > + d->arch.dirty.second_lvl[0] = map_domain_page_global( > > > +page_to_mfn(second_lvl_page) );> > > + d->arch.dirty.second_lvl[1] = map_domain_page_global( > > > + > > > + page_to_mfn(second_lvl_page+1) ); > > > + > > > + first[0] = __map_domain_page(p2m->first_level); > > > + first[1] = __map_domain_page(p2m->first_level+1); > > > + for ( i = gp2m_start_index; i < gp2m_end_index; ++i ) > > > > Can''t this just be a loop over 0..1023 and avoid all this modular > > arithmetic: > > Sure.CTTOI, since p2m->first_level consists of 2 pages, we can''t just loop over 0..1023, unless we have an improved map_domain_page that can map consecutive pages. One way to avoid the modular arithmetic is to use nested loop; loop over first_level and then loop inside a first_level page. I think this one (nested loop) is better than modular arithmetic. Jaeyong
Jaeyong Yoo
2013-Nov-19 01:32 UTC
Re: [PATCH RESEND v5 5/6] xen/arm: Implement hypercall for dirty page tracing
> > > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index > > > c0b5dd8..0a32301 100644 > > > --- a/xen/arch/arm/domain.c > > > +++ b/xen/arch/arm/domain.c > > > @@ -215,6 +215,12 @@ static void ctxt_switch_to(struct vcpu *n) > > > WRITE_SYSREG(hcr, HCR_EL2); > > > isb(); > > > > > > + /* for dirty-page tracing > > > + * XXX: how do we consider SMP case? > > > + */ > > > + if ( n->domain->arch.dirty.mode ) > > > + restore_vlpt(n->domain); > > > > This is an interesting question. xen_second is shared between all > > pcpus, which means that the vlpt is currently only usable from a > > single physical CPU at a time. > > > > Currently the only per-cpu area is the domheap region from 2G-4G. We > > could steal some address space from the top or bottom of there? > > oh right hmm. Then, how about place the vlpt area starting from 2G > (between dom heap and xen heap), and let the map_domain_page map the > domain page starting from the VLPT-end address? > > For this, I think just changing DOMHEAP_VIRT_START to some place (maybe > 0x88000000) would be suffice. >I just found out that DOMHEAP_VIRT_START should be aligned to first-level size. So, 0x88000000 wouldn''t work. So, I''m thinking to use the back of the domheap area and piggybacking the VLPT into the dom-heap region. For this, we should use something like the following layout: #define DOMHEAP_VIRT_START _AT(vaddr_t,0x80000000) #define VIRT_LIN_P2M_START _AT(vaddr_t,0xf8000000) #define VIRT_LIN_P2M_END _AT(vaddr_t<0xffffffff) #define DOMHEAP_VIRT_END _AT(vaddr_t,0xffffffff) where VLPT area is overlapped to domheap. This is necessary since dommap size is decided at booting time by the calculation of DOMHEAP_VIRT_END - DOMHEAP_VIRT_START, and at this time we have to also allocate the dommap for VLPT. Something else we can do is to give VLPT for 1G memory and giving it its own per-cpu page table. But, using 1G for VLPT looks like some waste of virtual address. So, I''m thinking of using overlapped VLPT. Could you tell me your opinion about this? Jaeyong
Eugene Fedotov
2013-Nov-19 11:06 UTC
Re: [PATCH RESEND v5 6/6] xen/arm: Implement toolstack for xl restore/save and migrate
13.11.2013 15:09, Ian Campbell wrote:> On Wed, 2013-11-13 at 13:57 +0400, Eugene Fedotov wrote: >>>> diff --git a/tools/libxc/xc_arm_migrate.c b/tools/libxc/xc_arm_migrate.c >>>> new file mode 100644 >>>> index 0000000..461e339 >>>> --- /dev/null >>>> +++ b/tools/libxc/xc_arm_migrate.c >>>> @@ -0,0 +1,712 @@ >>> Is this implementing the exact protocol as described in >>> tools/libxc/xg_save_restore.h or is it a variant? Are there any docs of >>> the specifics of the ARM protocol? >> This implements a quite different from tools/libxc/xg_save_restore.h >> protocol, it is much more simplified because we do not need some things >> that implemented for x86. So you''re right, it has to be documented. >> Should we use a different header to place documentation to this (and >> place some definitions), for example xc_arm_migrate.h? > I think xg_arm_save_restore.h would be the consistent name. At some > point we should rename some of the x86 specific stuff, but you don''t > need to worry about that.OK>>> We will eventually need to make a statement about the stability of the >>> protocol, i.e on x86 we support X->X+1 migrations across Xen versions. I >>> think we''d need to make similar guarantees on ARM before we would remove >>> the "tech preview" label from the migration feature. >> So, should you believe our results (and where should we place this >> statement) or should you make tests from your side? > By stability I mean the stability of the migration ABI across version > changes, not stability as in the bugginess or otherwise of the code. IOW > a commitment to supporting migration from version X to version X+1 in > the future. > > WRT "tech preview" I don''t think we are going to be able to honestly > call this production ready for the general case in 4.4, I expect it''ll > need a bit longer to "bed in" before we are happy with that statement. > (this has no bearing on whether you want to fully support it as > production ready in your particular application).Our migration protocol consists of: header (Console and XenStore PFN), memory data, HVM context. Can we guarantee that HVM context of version X is compatible (or the same) as HVM context of version X ? I think if we add some HVM hardware (such as virtual RTC) in version X+1 we may have a compatibility problem, because HVM buffer structure had been changed. I think we cannot predict such things in future versions. 
But our migration protocol may have version number and migration between X and X+1 Xen versions is possible when we use the same protocol number in both sides (we may compare protocol versions during the migration runtime).>>>> +/* ============ Memory ============= */ >>>> +static int save_memory(xc_interface *xch, int io_fd, uint32_t dom, >>>> + struct save_callbacks *callbacks, >>>> + uint32_t max_iters, uint32_t max_factor, >>>> + guest_params_t *params) >>>> +{ >>>> + int live = !!(params->flags & XCFLAGS_LIVE); >>>> + int debug = !!(params->flags & XCFLAGS_DEBUG); >>>> + xen_pfn_t i; >>>> + char reportbuf[80]; >>>> + int iter = 0; >>>> + int last_iter = !live; >>>> + int total_dirty_pages_num = 0; >>>> + int dirty_pages_on_prev_iter_num = 0; >>>> + int count = 0; >>>> + char *page = 0; >>>> + xen_pfn_t *busy_pages = 0; >>>> + int busy_pages_count = 0; >>>> + int busy_pages_max = 256; >>>> + >>>> + DECLARE_HYPERCALL_BUFFER(unsigned long, to_send); >>>> + >>>> + xen_pfn_t start = params->start_gpfn; >>>> + const xen_pfn_t end = params->max_gpfn; >>>> + const xen_pfn_t mem_size = end - start; >>>> + >>>> + if ( debug ) >>>> + { >>>> + IPRINTF("(save mem) start=%llx end=%llx!\n", start, end); >>>> + } >>> FYI you don''t need the {}''s for cases like this. >> Actually we don''t need {}, this has been done because we was not sure if >> this macro can be empty-substituted. >>> is if ( debug ) IPRINTF(...) not the equivalent of DPRINTF? >> This equivalence is not obvious for me, because in current code we >> obtain debug flag with XCFLAGS_DEBUG mask (when --debug option passed). >> If it is equivalent I''ll use DPRINTF. > No you are right, these are different. Actually DPRINTF is "detail > level", there is a DBGPRINTF which is "debug level" (levels here being > in terms of XTL_FOO from xentoollog.h). So you probably do want to use > if (debug), although you may separately want to consider which level the > resulting messages are printed at when they are printed...OK>>> [...] >>>> + >>>> +static int restore_guest_params(xc_interface *xch, int io_fd, >>>> + uint32_t dom, guest_params_t *params) >>>> +{ >>>> [...] >>>> + if ( xc_domain_setmaxmem(xch, dom, maxmemkb) ) >>>> + { >>>> + ERROR("Can''t set memory map"); >>>> + return -1; >>>> + } >>>> + >>>> + /* Set max. number of vcpus as max_vcpu_id + 1 */ >>>> + if ( xc_domain_max_vcpus(xch, dom, params->max_vcpu_id + 1) ) >>> Does the higher level toolstack not take care of vcpus and maxmem? I >>> thought so. I think this is how it shoud be. >> For my tests guest config information is not transferred for ARM case >> from high-level stack. At the migration receiver side toolstack always >> create a new domain with vcpus=1 and default max. mem. So we have to >> send guest information as our local guest_params structure (at the >> beginning of migration). >> It is easy way to work "xl save" or "xl migrate" without modification of >> libxl level, but you may have another idea? >> Also, toolstack_restore callback is not set (NULL) for ARM case. > So you aren''t using xl to do the migration? This is what we should > ultimately be aiming for. It is almost certainly going to require fixes > at the libxl level though. > > If you need to send additional information for your usecase then it > should be done at the layer above libxc.I found the reason why guest config information is not saved in libxl. 
the following line in xl_cmdimpl.c: if (!dom_info->quiet) printf("Parsing config from %s\n", config_source); in function "create_domain" calls printf instead of fprintf(stderr, ...). It is not the same on ARM. I have a feeling that printf() causes the output to go to the socket buffer and breaks the config data. Should we use fprintf(stderr, "Parsing config from %s\n", config_source); here? If we do this, the config is transferred and we do not need to set the VCPU number and memory manually. Best regards, Evgeny.
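A rough sketch of the versioned stream header idea described earlier in this mail; field names beyond those already used by the series (start_gpfn, max_gpfn, flags, max_vcpu_id) and the version constant are made up for illustration only:

    /* illustrative only: one possible shape for the migration stream header */
    #define XC_ARM_MIGRATE_VERSION 1   /* bumped whenever the stream layout changes */

    typedef struct guest_params {
        uint32_t version;       /* compared by the receiver before restoring */
        uint64_t console_pfn;
        uint64_t xenstore_pfn;
        uint64_t start_gpfn;    /* GUEST_RAM_BASE >> PAGE_SHIFT */
        uint64_t max_gpfn;
        uint32_t flags;         /* XCFLAGS_* */
        uint32_t max_vcpu_id;
    } guest_params_t;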
Ian Campbell
2013-Nov-19 11:37 UTC
Re: [PATCH RESEND v5 4/6] xen/arm: Implement virtual-linear page table for guest p2m mapping in live migration
> > > +{ > > > + int i; > > > + dsb(); > > > + for ( i = d->arch.dirty.second_lvl_start; > > > + i < d->arch.dirty.second_lvl_end; > > > + ++i ) > > > + { > > > + int k = i % LPAE_ENTRIES; > > > + int l = i / LPAE_ENTRIES; > > > + > > > + if ( xen_second[i].bits != d->arch.dirty.second_lvl[l][k].bits ) > > > + { > > > + __write_pte(&xen_second[i], d->arch.dirty.second_lvl[l][k]); > > > + __flush_xen_data_tlb_range_va(i << SECOND_SHIFT, > > > + 1 << SECOND_SHIFT); > > > + } > > > + } > > > + dsb(); > > > + isb(); > > > +} > > > + > > > +/* setting up the xen page table for vlpt mapping for domain d */ int > > > +prepare_vlpt(struct domain *d) { > > > + int xen_second_linear_base; > > > + int gp2m_start_index, gp2m_end_index; > > > + struct p2m_domain *p2m = &d->arch.p2m; > > > + struct page_info *second_lvl_page; > > > + paddr_t gma_start = 0; > > > + paddr_t gma_end = 0; > > > + lpae_t *first[2]; > > > + int i; > > > + uint64_t required, avail = VIRT_LIN_P2M_END - VIRT_LIN_P2M_START; > > > + > > > + get_gma_start_end(d, &gma_start, &gma_end); > > > + required = (gma_end - gma_start) >> LPAE_SHIFT; > > > + > > > + > > > + if ( required > avail ) > > > > avail is the number of bytes of virtual address space in the linear p2m > > area. > > > > required is the number of pages which the guest has. Does this comparison > > correctly account for each page being represented by an 8-byte lpae_t > > entry? > > Yes, it does. See, LPAE_SHIFT only shifts 9 bits. Shifting PAGE_SHIFT (12 bits) > gives the number of pages (every bit), and 9-bit shift gives the required > memory for storing third-level PTE. Since this one-shifting of LPAE_SHIFT > is confusing enough, we can make it explicit by using something like > required = ((gma_end - gma_start) >> PAGE_SHIFT ) * (sizeof lpae_t).I think this is clearer than requiring the reader to have to think about the effect of shift by N-M and the reasons for it, thanks. The alternative would be a comment explaining what is going on. Ian.
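For reference, the clearer form agreed here might read as follows (a sketch; the error handling shown is assumed, not taken from the patch):

    uint64_t avail    = VIRT_LIN_P2M_END - VIRT_LIN_P2M_START;
    /* one lpae_t of VLPT address space per guest page */
    uint64_t required = ((gma_end - gma_start) >> PAGE_SHIFT) * sizeof(lpae_t);

    if ( required > avail )
        return -ENOMEM;   /* not enough linear map space */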
Ian Campbell
2013-Nov-19 11:38 UTC
Re: [PATCH RESEND v5 5/6] xen/arm: Implement hypercall for dirty page tracing
On Fri, 2013-11-15 at 13:15 +0900, Jaeyong Yoo wrote:> I accidently missed this comment in the previous email. > > > > diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c index > > > 287dd7b..1a7ed11 100644 > > > --- a/xen/arch/arm/traps.c > > > +++ b/xen/arch/arm/traps.c > > > @@ -1321,6 +1321,8 @@ static void do_trap_data_abort_guest(struct > > cpu_user_regs *regs, > > > const char *msg; > > > int rc, level = -1; > > > mmio_info_t info; > > > + int page_fault = ( (dabt.dfsc & FSC_MASK) => > > + (FSC_FLT_PERM | FSC_3D_LEVEL) && dabt.write > > > + ); > > > > I think you can use FSC_TYPE_MASK and FSC_LL_MASK here, can''t you? > > > > Something like > > dabt.dfsc & FSC_TYPE_MASK == FSC_TYPE_FAULT && > dabt.dfsc & FSC_LL_MASK == 0x3 /* third level */, > > right?Something like that, yes. You''d need the bit about it being a perm fault in there somewhere too.> > > I think this would be better off refactored into a > > dabt_is_page_fault(dabt), used in the test below. That would allow you to > > more easily comment on why these particular conditions are the ones we > > care about etc. > > > > Sure. > > Jaeyong >
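Putting the two replies together, a sketch of the helper might look like this (assuming FSC_TYPE_MASK masks out the level bits of the DFSC; the exact constant names in the tree may differ):

    static inline int dabt_is_page_fault(struct hsr_dabt dabt)
    {
        /* only a write causing a third-level permission fault is a
         * candidate dirty-page fault for live migration */
        return dabt.write &&
               (dabt.dfsc & FSC_TYPE_MASK) == FSC_FLT_PERM &&
               (dabt.dfsc & FSC_LL_MASK) == FSC_3D_LEVEL;
    }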
Ian Campbell
2013-Nov-19 11:42 UTC
Re: [PATCH RESEND v5 4/6] xen/arm: Implement virtual-linear page table for guest p2m mapping in live migration
On Mon, 2013-11-18 at 12:47 +0900, Jaeyong Yoo wrote:> > > > + > > > > + second_lvl_page = alloc_domheap_pages(NULL, 1, 0); > > > > > > There''d be no harm in allocating the two pages separately I think, and > > > avoiding the possiblilty of an order 1 allocation failing. > > > > > > But if you do allocate them contiguously then d->arch.dirty.second_lvl > > > can be a simple lpae_t * and you can just access offsets 0..1023 > > > without worrying about managing the array of two pointers. > > > > > > As it stands I think you have the worst of both worlds. > > > > > > We could also just consider allocating all firstlevel p2m pages from > > > the xenheap instead of the domheap. > > > > OK. xenheap allocation sounds good. I was not sure if I can use xenheap > > for the one that looks like ''domain-specific'' purpose. If it''s OK, it > > would be much better code readability. > > > > > > > > > + if ( second_lvl_page == NULL ) > > > > + return -ENOMEM; > > > > + > > > > + /* First level p2m is 2 consecutive pages */ > > > > + d->arch.dirty.second_lvl[0] = map_domain_page_global( > > > > + > page_to_mfn(second_lvl_page) ); > > > > + d->arch.dirty.second_lvl[1] = map_domain_page_global( > > > > + > > > > + page_to_mfn(second_lvl_page+1) ); > > > > + > > > > + first[0] = __map_domain_page(p2m->first_level); > > > > + first[1] = __map_domain_page(p2m->first_level+1); > > > > + for ( i = gp2m_start_index; i < gp2m_end_index; ++i ) > > > > > > Can''t this just be a loop over 0..1023 and avoid all this modular > > > arithmetic: > > > > Sure. > > CTTOI, since p2m->first_level consists of 2 pages, we can''t just loop over > 0..1023, > unless we have an improved map_domain_page that can map consecutive pages. > One way to avoid the modular arithmetic is to use nested loop; loop over > first_level > and then loop inside a first_level page. I think this one (nested loop) is > better than > modular arithmetic.Or if it were a xenheap allocation it would allow consecutive pages. If the modular arithmetic is needed then that''s fine, or if you prefer the nested loop then that is fine too. A helper function to get an entry from a specified offset in the first table which encapsulated the modular arithmetic would likely make that approach more palatable. Ian.
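A sketch of the kind of helper suggested here, assuming first[0] and first[1] are the two mapped pages of the first-level table; the name first_table_entry is illustrative:

    static lpae_t *first_table_entry(lpae_t *first[2], int index)
    {
        /* flat index 0..1023 across the two consecutive first-level pages */
        return &first[index / LPAE_ENTRIES][index % LPAE_ENTRIES];
    }

    for ( i = gp2m_start_index; i < gp2m_end_index; ++i )
    {
        lpae_t *e = first_table_entry(first, i);
        /* ... copy *e into the VLPT second level as the original loop does ... */
    }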
Ian Campbell
2013-Nov-19 11:54 UTC
Re: [PATCH RESEND v5 5/6] xen/arm: Implement hypercall for dirty page tracing
On Fri, 2013-11-15 at 11:26 +0900, Jaeyong Yoo wrote:> > -----Original Message----- > > From: Ian Campbell [mailto:Ian.Campbell@citrix.com] > > Sent: Wednesday, November 13, 2013 1:56 AM > > To: Jaeyong Yoo > > Cc: xen-devel@lists.xen.org > > Subject: Re: [Xen-devel] [PATCH RESEND v5 5/6] xen/arm: Implement > > hypercall for dirty page tracing > > > > On Fri, 2013-11-08 at 16:50 +0900, Jaeyong Yoo wrote: > > > Add hypercall (shadow op: enable/disable and clean/peek dirtied page > > bitmap). > > > It consists of two parts: dirty page detecting and saving. > > > For detecting, we setup the guest p2m''s leaf PTE read-only and > > > whenever the guest tries to write something, permission fault happens > > and traps into xen. > > > The permission-faulted GPA should be saved for the toolstack (when it > > > wants to see which pages are dirtied). For this purpose, we temporarily > > save the GPAs into bitmap. > > > > > > Changes from v4: > > > 1. For temporary saving dirty pages, use bitmap rather than linked list. > > > 2. Setup the p2m''s second level page as read-write in the view of xen''s > > memory access. > > > It happens in p2m_create_table function. > > > > > > Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com> > > > --- > > > xen/arch/arm/domain.c | 14 +++ > > > xen/arch/arm/domctl.c | 9 ++ > > > xen/arch/arm/mm.c | 103 +++++++++++++++++++- > > > xen/arch/arm/p2m.c | 206 > > ++++++++++++++++++++++++++++++++++++++++ > > > xen/arch/arm/traps.c | 9 ++ > > > xen/include/asm-arm/domain.h | 7 ++ > > > xen/include/asm-arm/mm.h | 7 ++ > > > xen/include/asm-arm/p2m.h | 4 + > > > xen/include/asm-arm/processor.h | 2 + > > > 9 files changed, 360 insertions(+), 1 deletion(-) > > > > > > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index > > > c0b5dd8..0a32301 100644 > > > --- a/xen/arch/arm/domain.c > > > +++ b/xen/arch/arm/domain.c > > > @@ -215,6 +215,12 @@ static void ctxt_switch_to(struct vcpu *n) > > > WRITE_SYSREG(hcr, HCR_EL2); > > > isb(); > > > > > > + /* for dirty-page tracing > > > + * XXX: how do we consider SMP case? > > > + */ > > > + if ( n->domain->arch.dirty.mode ) > > > + restore_vlpt(n->domain); > > > > This is an interesting question. xen_second is shared between all pcpus, > > which means that the vlpt is currently only usable from a single physical > > CPU at a time. > > > > Currently the only per-cpu area is the domheap region from 2G-4G. We could > > steal some address space from the top or bottom of there? > > oh right hmm. Then, how about place the vlpt area starting from 2G (between dom > heap and xen heap), and let the map_domain_page map the domain page starting from > the VLPT-end address? > > For this, I think just changing DOMHEAP_VIRT_START to some place (maybe 0x88000000) > would be suffice.You might find that stealing from the start needs a bit more thought in the PT setup code wrt offsets and alignments and assumption that it starts on a 1GB boundary etc. In which case stealing the space from the end might turn out to be simpler by reducing DOMHEAP_ENTRIES. It would be good to keep DOMHEAP_ENTRIES a power of two I suspect.> > > +static inline void mark_dirty_bitmap(struct domain *d, paddr_t addr) > > > +{ > > > + paddr_t ram_base = (paddr_t) GUEST_RAM_BASE; > > > + int bit_index = PFN_DOWN(addr - ram_base); > > > + int page_index = bit_index >> (PAGE_SHIFT + 3); > > > > +3? > > +3 is for changing the bit-index to byte-index. I think it would be better to > use BYTE_SHIFT kind of thing.Or a comment, yes. 
[...]> > > + vlp2m_pte = get_vlpt_3lvl_pte(addr); > > > + if ( vlp2m_pte->p2m.valid && vlp2m_pte->p2m.write == 0 && > > > + vlp2m_pte->p2m.avail == 0 /* reuse avail bit as read-only */ ) > > > + { > > > + lpae_t pte = *vlp2m_pte; > > > + pte.p2m.write = 1; > > > + write_pte(vlp2m_pte, pte); > > > > Do we not need the p2m lock around here somewhere? > > Oh, I think this brings an another question. In the dirty-page tracing phase, > would it be OK to permit changes in p2m page table (such as adding new pages)? > In that case, VLPT and dirty-bitmap should be re-prepared for taking account this change.The p2m can certainly change during migration -- e.g. grant table ops resulting from IO. Those changes should all call back into the p2m. Hopefully this is in the common grant table code etc but you should check, especially that the ARM implementation of the callback isn''t a NULL! Or maybe this is done in arch code -- best to check when and where x86 does this I think.> > > > > > + flush_tlb_local(); > > > > If SMP was working this would need to be inner shareable to cope with > > vcpus running on other pcpus. > > You meant the p2m page table should be inner shareable?I mean the TLB flush would need to be to all cpus in the inner shareable domain, not just the local processor.> > > > > + } > > > + else > > > + { > > > + spin_lock(&d->arch.dirty.lock); > > > + for ( i = 0; i < d->arch.dirty.bitmap_pages; ++i ) > > > + { > > > + int j = 0; > > > + uint8_t *bitmap; > > > + copy_to_guest_offset(sc->dirty_bitmap, i * PAGE_SIZE, > > > + d->arch.dirty.bitmap[i], > > > + bitmap_size < PAGE_SIZE ? bitmap_size : > > > + > > > + PAGE_SIZE); > > > > The bitmap is always a mutliple of page_size, isn''t it? > > But the actually used bits inside of the bitmap are not multiple of page_size I think.Oh right. I think it is up to us to declare in the interface whether the supplied buffer should be rounded up to a page boundary or not. There''s a pretty good chance that the userspace allocations will all be entire pages anyhow. Ian.
Ian Campbell
2013-Nov-19 11:57 UTC
Re: [PATCH RESEND v5 5/6] xen/arm: Implement hypercall for dirty page tracing
On Tue, 2013-11-19 at 10:32 +0900, Jaeyong Yoo wrote:> > > > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index > > > > c0b5dd8..0a32301 100644 > > > > --- a/xen/arch/arm/domain.c > > > > +++ b/xen/arch/arm/domain.c > > > > @@ -215,6 +215,12 @@ static void ctxt_switch_to(struct vcpu *n) > > > > WRITE_SYSREG(hcr, HCR_EL2); > > > > isb(); > > > > > > > > + /* for dirty-page tracing > > > > + * XXX: how do we consider SMP case? > > > > + */ > > > > + if ( n->domain->arch.dirty.mode ) > > > > + restore_vlpt(n->domain); > > > > > > This is an interesting question. xen_second is shared between all > > > pcpus, which means that the vlpt is currently only usable from a > > > single physical CPU at a time. > > > > > > Currently the only per-cpu area is the domheap region from 2G-4G. We > > > could steal some address space from the top or bottom of there? > > > > oh right hmm. Then, how about place the vlpt area starting from 2G > > (between dom heap and xen heap), and let the map_domain_page map the > > domain page starting from the VLPT-end address? > > > > For this, I think just changing DOMHEAP_VIRT_START to some place (maybe > > 0x88000000) would be suffice. > > > > I just found out that DOMHEAP_VIRT_START should be aligned to first-level > size. > So, 0x88000000 wouldn''t work. > So, I''m thinking to use the back of the domheap area and piggybacking the > VLPT > into the dom-heap region. For this, we should use something like the > following layout: > > #define DOMHEAP_VIRT_START _AT(vaddr_t,0x80000000) > #define VIRT_LIN_P2M_START _AT(vaddr_t,0xf8000000) > #define VIRT_LIN_P2M_END _AT(vaddr_t<0xffffffff) > #define DOMHEAP_VIRT_END _AT(vaddr_t,0xffffffff) > > where VLPT area is overlapped to domheap. This is necessary since dommap > size is decided > at booting time by the calculation of DOMHEAP_VIRT_END - DOMHEAP_VIRT_START, > and at this time > we have to also allocate the dommap for VLPT.I''d prefer to make that explicit at setup time than artificially overlaying the two. I was half wondering about providing a function to allocate a chunk of domheap address range which could then be used by this code to get a bit of memory dynamically. I''m not sure about this. One concern is that it might provoke pessimal behaviour from the domheap (which is a hash table, so it would be stealing slots).> Something else we can do is to > give VLPT for 1G memory > and giving it its own per-cpu page table. But, using 1G for VLPT looks like > some waste of virtual address.Yes, I don''t think we want to do that, mainly since it would necessarily halve the size of the domheap range.> So, I''m thinking of using overlapped VLPT. Could you tell me your opinion > about this? > > Jaeyong > >
Ian Campbell
2013-Nov-19 12:01 UTC
Re: [PATCH RESEND v5 6/6] xen/arm: Implement toolstack for xl restore/save and migrate
On Tue, 2013-11-19 at 15:06 +0400, Eugene Fedotov wrote:> Our migration protocol consists of: header (Console and XenStore PFN), > memory data, HVM context. Can we guarantee that HVM context of version X > is compatible (or the same) as HVM context of version X ? I think if we^ +1 ?> add some HVM hardware (such as virtual RTC) in version X+1 we may have a > compatibility problem, because HVM buffer structure had been changed. I > think we cannot predict such things in future versions.Indeed, it may well be that Xen version X+1 needs some code adding to be able to consume Xen version X save records. I thin x86 does this with either explicit or implicit versioning of the individual save structures.> But our > migration protocol may have version number and migration between X and > X+1 Xen versions is possible when we use the same protocol number in > both sides (we may compare protocol versions during the migration runtime).On x86 the protocol almost always changes in some (possibly minor) way between versions, so this approach would mean that we didn''t support X->X+1 migrations in the majority of releases which would be unacceptable.> >>> [...] > >>>> + > >>>> +static int restore_guest_params(xc_interface *xch, int io_fd, > >>>> + uint32_t dom, guest_params_t *params) > >>>> +{ > >>>> [...] > >>>> + if ( xc_domain_setmaxmem(xch, dom, maxmemkb) ) > >>>> + { > >>>> + ERROR("Can''t set memory map"); > >>>> + return -1; > >>>> + } > >>>> + > >>>> + /* Set max. number of vcpus as max_vcpu_id + 1 */ > >>>> + if ( xc_domain_max_vcpus(xch, dom, params->max_vcpu_id + 1) ) > >>> Does the higher level toolstack not take care of vcpus and maxmem? I > >>> thought so. I think this is how it shoud be. > >> For my tests guest config information is not transferred for ARM case > >> from high-level stack. At the migration receiver side toolstack always > >> create a new domain with vcpus=1 and default max. mem. So we have to > >> send guest information as our local guest_params structure (at the > >> beginning of migration). > >> It is easy way to work "xl save" or "xl migrate" without modification of > >> libxl level, but you may have another idea? > >> Also, toolstack_restore callback is not set (NULL) for ARM case. > > So you aren''t using xl to do the migration? This is what we should > > ultimately be aiming for. It is almost certainly going to require fixes > > at the libxl level though. > > > > If you need to send additional information for your usecase then it > > should be done at the layer above libxc. > I found the reason why guest config information is not saved in libxl. > the following line in xl_cmdimpl.c: > if (!dom_info->quiet) > printf("Parsing config from %s\n", config_source); > in function "create_domain" > calls printf instead of fprintf(stderr, ...). > It is not the same on ARM. I have feeling that printf() causes output to > socket buffer and it breaks config data.If the socket buffer is attached to stdout then that is a bug which needs fixing. I''d be surprised if this turned out to be the case though -- nothing ought to be closing stdout!> Should we use fprintf(stderr, "Parsing config from %s\n", > config_source); here? > If we do this, config is transferred and we do not need set VCPU number > and memory manually. > > Best regards, > Evgeny. >
Eugene Fedotov
2013-Nov-19 12:35 UTC
Re: [PATCH RESEND v5 2/6] xen/arm: Implement get_maximum_gpfn hypercall for arm
15.11.2013 11:04, Eugene Fedotov wrote:> 13.11.2013 14:58, Ian Campbell wrote: >> On Wed, 2013-11-13 at 12:28 +0400, Eugene Fedotov wrote: >>>>> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c >>>>> index 123280e..3801f07 100644 >>>>> --- a/xen/arch/arm/mm.c >>>>> +++ b/xen/arch/arm/mm.c >>>>> @@ -927,7 +927,11 @@ int page_is_ram_type(unsigned long mfn, >>>>> unsigned long mem_type) >>>>> unsigned long domain_get_maximum_gpfn(struct domain *d) >>>> s/unsigned long/xen_pfn_t/ I think. >>>> >>>> Urk, which will break the ABI for this hypercall. That''s something >>>> of a >>>> conundrum :-/ >>>> >>>> It is a domctl so we are at liberty to change it, probably to taking a >>>> xen_pfn_t * to fill in instead of returning the value. >>>> >>>> I''m in two minds about whether we should do so now or postpone it >>>> (which >>>> risks forgetting and having an obscure bug somewhere down the line). >>> OK, this way, or other solution is to purge this hypercall yet in >>> migration and use max_memkb field in DOMCTL_getdomaininfo to obtain the >>> maximum PFN. >> Is there an interface to retrieve that? I don''t see it :-(No, it is another hypercall XEN_DOMCTL_getdomaininfo. max_memkb field in xc_dominfo_t structure is a second way to retrieve maximum memory size and calculate PFN. Do we need domain_get_maximum_gpfn ? In addition to this question I need to ask why we need xen_pfn_t (that is always 64 bit on Xen-arm) for physical memory PFN. Currently the following libxc proxies for DOMCTL: xc_domain_setmaxmem and max_memkb field in xc_domain_getinfo() are working with unsigned long max_mem_kb parameter (that is 32 bit for ARM32) In case if 64 bit PFN is needed, xen_pfn_t returning hypercall looks like better then xc_domain_getinfo() . Best regards, Evgeny.
Ian Campbell
2013-Nov-19 12:53 UTC
Re: [PATCH RESEND v5 2/6] xen/arm: Implement get_maximum_gpfn hypercall for arm
On Tue, 2013-11-19 at 16:35 +0400, Eugene Fedotov wrote:> 15.11.2013 11:04, Eugene Fedotov wrote: > > 13.11.2013 14:58, Ian Campbell wrote: > >> On Wed, 2013-11-13 at 12:28 +0400, Eugene Fedotov wrote: > >>>>> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c > >>>>> index 123280e..3801f07 100644 > >>>>> --- a/xen/arch/arm/mm.c > >>>>> +++ b/xen/arch/arm/mm.c > >>>>> @@ -927,7 +927,11 @@ int page_is_ram_type(unsigned long mfn, > >>>>> unsigned long mem_type) > >>>>> unsigned long domain_get_maximum_gpfn(struct domain *d) > >>>> s/unsigned long/xen_pfn_t/ I think. > >>>> > >>>> Urk, which will break the ABI for this hypercall. That''s something > >>>> of a > >>>> conundrum :-/ > >>>> > >>>> It is a domctl so we are at liberty to change it, probably to taking a > >>>> xen_pfn_t * to fill in instead of returning the value. > >>>> > >>>> I''m in two minds about whether we should do so now or postpone it > >>>> (which > >>>> risks forgetting and having an obscure bug somewhere down the line). > >>> OK, this way, or other solution is to purge this hypercall yet in > >>> migration and use max_memkb field in DOMCTL_getdomaininfo to obtain the > >>> maximum PFN. > >> Is there an interface to retrieve that? I don''t see it :-( > No, it is another hypercall XEN_DOMCTL_getdomaininfo. max_memkb field in > xc_dominfo_t structure is a second way to retrieve maximum memory size > and calculate PFN. Do we need domain_get_maximum_gpfn ?I think not yet.> In addition to this question I need to ask why we need xen_pfn_t (that > is always 64 bit on Xen-arm) for physical memory PFN. Currently the > following libxc proxies for DOMCTL: > xc_domain_setmaxmem and max_memkb field in xc_domain_getinfo() are > working with unsigned long max_mem_kb parameter (that is 32 bit for ARM32)The intention is that the hypercall interface be identical across 32- and 64-bit hypervisors to avoid the need for any argument translation when running 32-bit guests on a 64-bit hypervisor etc. Hence MFNs and PFNs are always 64-bits.> > In case if 64 bit PFN is needed, xen_pfn_t returning hypercall looks > like better then xc_domain_getinfo() . > > Best regards, > Evgeny. >
Eugene Fedotov
2013-Nov-19 13:09 UTC
Re: [PATCH RESEND v5 2/6] xen/arm: Implement get_maximum_gpfn hypercall for arm
19.11.2013 16:53, Ian Campbell wrote:> On Tue, 2013-11-19 at 16:35 +0400, Eugene Fedotov wrote: >> 15.11.2013 11:04, Eugene Fedotov wrote: >>> 13.11.2013 14:58, Ian Campbell wrote: >>>> On Wed, 2013-11-13 at 12:28 +0400, Eugene Fedotov wrote: >>>>>>> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c >>>>>>> index 123280e..3801f07 100644 >>>>>>> --- a/xen/arch/arm/mm.c >>>>>>> +++ b/xen/arch/arm/mm.c >>>>>>> @@ -927,7 +927,11 @@ int page_is_ram_type(unsigned long mfn, >>>>>>> unsigned long mem_type) >>>>>>> unsigned long domain_get_maximum_gpfn(struct domain *d) >>>>>> s/unsigned long/xen_pfn_t/ I think. >>>>>> >>>>>> Urk, which will break the ABI for this hypercall. That''s something >>>>>> of a >>>>>> conundrum :-/ >>>>>> >>>>>> It is a domctl so we are at liberty to change it, probably to taking a >>>>>> xen_pfn_t * to fill in instead of returning the value. >>>>>> >>>>>> I''m in two minds about whether we should do so now or postpone it >>>>>> (which >>>>>> risks forgetting and having an obscure bug somewhere down the line). >>>>> OK, this way, or other solution is to purge this hypercall yet in >>>>> migration and use max_memkb field in DOMCTL_getdomaininfo to obtain the >>>>> maximum PFN. >>>> Is there an interface to retrieve that? I don''t see it :-( >> No, it is another hypercall XEN_DOMCTL_getdomaininfo. max_memkb field in >> xc_dominfo_t structure is a second way to retrieve maximum memory size >> and calculate PFN. Do we need domain_get_maximum_gpfn ? > I think not yet.OK, I''ll remove domain_get_maximum_gpfn from patches. Best regards, Evgeny.
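A sketch of the toolstack side once the hypercall is dropped, deriving the PFN range from xc_domain_getinfo() instead; this assumes guest RAM is a single contiguous block starting at GUEST_RAM_BASE, as elsewhere in the series:

    xc_dominfo_t info;
    xen_pfn_t start_gpfn, max_gpfn;

    if ( xc_domain_getinfo(xch, dom, 1, &info) != 1 || info.domid != dom )
        return -1;

    start_gpfn = GUEST_RAM_BASE >> PAGE_SHIFT;
    /* max_memkb is in KiB; convert to a page count */
    max_gpfn   = start_gpfn + (info.max_memkb >> (PAGE_SHIFT - 10));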
Jaeyong Yoo
2013-Nov-20 09:49 UTC
Re: [PATCH RESEND v5 5/6] xen/arm: Implement hypercall for dirty page tracing
> -----Original Message----- > From: Ian Campbell [mailto:Ian.Campbell@citrix.com] > Sent: Tuesday, November 19, 2013 8:57 PM > To: Jaeyong Yoo > Cc: xen-devel@lists.xen.org > Subject: Re: [Xen-devel] [PATCH RESEND v5 5/6] xen/arm: Implement > hypercall for dirty page tracing > > On Tue, 2013-11-19 at 10:32 +0900, Jaeyong Yoo wrote: > > > > > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index > > > > > c0b5dd8..0a32301 100644 > > > > > --- a/xen/arch/arm/domain.c > > > > > +++ b/xen/arch/arm/domain.c > > > > > @@ -215,6 +215,12 @@ static void ctxt_switch_to(struct vcpu *n) > > > > > WRITE_SYSREG(hcr, HCR_EL2); > > > > > isb(); > > > > > > > > > > + /* for dirty-page tracing > > > > > + * XXX: how do we consider SMP case? > > > > > + */ > > > > > + if ( n->domain->arch.dirty.mode ) > > > > > + restore_vlpt(n->domain); > > > > > > > > This is an interesting question. xen_second is shared between all > > > > pcpus, which means that the vlpt is currently only usable from a > > > > single physical CPU at a time. > > > > > > > > Currently the only per-cpu area is the domheap region from 2G-4G. > > > > We could steal some address space from the top or bottom of there? > > > > > > oh right hmm. Then, how about place the vlpt area starting from 2G > > > (between dom heap and xen heap), and let the map_domain_page map the > > > domain page starting from the VLPT-end address? > > > > > > For this, I think just changing DOMHEAP_VIRT_START to some place > > > (maybe > > > 0x88000000) would be suffice. > > > > > > > I just found out that DOMHEAP_VIRT_START should be aligned to > > first-level size. > > So, 0x88000000 wouldn''t work. > > So, I''m thinking to use the back of the domheap area and piggybacking > > the VLPT into the dom-heap region. For this, we should use something > > like the following layout: > > > > #define DOMHEAP_VIRT_START _AT(vaddr_t,0x80000000) > > #define VIRT_LIN_P2M_START _AT(vaddr_t,0xf8000000) > > #define VIRT_LIN_P2M_END _AT(vaddr_t<0xffffffff) > > #define DOMHEAP_VIRT_END _AT(vaddr_t,0xffffffff) > > > > where VLPT area is overlapped to domheap. This is necessary since > > dommap size is decided at booting time by the calculation of > > DOMHEAP_VIRT_END - DOMHEAP_VIRT_START, and at this time we have to > > also allocate the dommap for VLPT. > > I''d prefer to make that explicit at setup time than artificially > overlaying the two. > > I was half wondering about providing a function to allocate a chunk of > domheap address range which could then be used by this code to get a bit > of memory dynamically. I''m not sure about this. One concern is that it > might provoke pessimal behaviour from the domheap (which is a hash table, > so it would be stealing slots).I think so. If we are going to provide such a function like that, we have to think about how to handle a pool of memory region ready for allocation, and how does it work with ''map_domain_page''. One quick thought is to make "DOMHEAP_ENTRIES" as a variable and adjust this value as the large-chunk memory is allocated or freed. But, honestly I don''t want to increase the probability of breaking anything. I think it would be better off setting it statically at booting time. And, by doing so, DOMHEAP_ENTRIES may not be the power of 2. I think it is OK, but you suggested that DOMHEAP_ENTRIES being power of 2 is good. Could you explain a bit more about this? Jaeyong.
Ian Campbell
2013-Nov-20 10:03 UTC
Re: [PATCH RESEND v5 5/6] xen/arm: Implement hypercall for dirty page tracing
On Wed, 2013-11-20 at 18:49 +0900, Jaeyong Yoo wrote:> > -----Original Message----- > > From: Ian Campbell [mailto:Ian.Campbell@citrix.com] > > Sent: Tuesday, November 19, 2013 8:57 PM > > To: Jaeyong Yoo > > Cc: xen-devel@lists.xen.org > > Subject: Re: [Xen-devel] [PATCH RESEND v5 5/6] xen/arm: Implement > > hypercall for dirty page tracing > > > > On Tue, 2013-11-19 at 10:32 +0900, Jaeyong Yoo wrote: > > > > > > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index > > > > > > c0b5dd8..0a32301 100644 > > > > > > --- a/xen/arch/arm/domain.c > > > > > > +++ b/xen/arch/arm/domain.c > > > > > > @@ -215,6 +215,12 @@ static void ctxt_switch_to(struct vcpu *n) > > > > > > WRITE_SYSREG(hcr, HCR_EL2); > > > > > > isb(); > > > > > > > > > > > > + /* for dirty-page tracing > > > > > > + * XXX: how do we consider SMP case? > > > > > > + */ > > > > > > + if ( n->domain->arch.dirty.mode ) > > > > > > + restore_vlpt(n->domain); > > > > > > > > > > This is an interesting question. xen_second is shared between all > > > > > pcpus, which means that the vlpt is currently only usable from a > > > > > single physical CPU at a time. > > > > > > > > > > Currently the only per-cpu area is the domheap region from 2G-4G. > > > > > We could steal some address space from the top or bottom of there? > > > > > > > > oh right hmm. Then, how about place the vlpt area starting from 2G > > > > (between dom heap and xen heap), and let the map_domain_page map the > > > > domain page starting from the VLPT-end address? > > > > > > > > For this, I think just changing DOMHEAP_VIRT_START to some place > > > > (maybe > > > > 0x88000000) would be suffice. > > > > > > > > > > I just found out that DOMHEAP_VIRT_START should be aligned to > > > first-level size. > > > So, 0x88000000 wouldn''t work. > > > So, I''m thinking to use the back of the domheap area and piggybacking > > > the VLPT into the dom-heap region. For this, we should use something > > > like the following layout: > > > > > > #define DOMHEAP_VIRT_START _AT(vaddr_t,0x80000000) > > > #define VIRT_LIN_P2M_START _AT(vaddr_t,0xf8000000) > > > #define VIRT_LIN_P2M_END _AT(vaddr_t<0xffffffff) > > > #define DOMHEAP_VIRT_END _AT(vaddr_t,0xffffffff) > > > > > > where VLPT area is overlapped to domheap. This is necessary since > > > dommap size is decided at booting time by the calculation of > > > DOMHEAP_VIRT_END - DOMHEAP_VIRT_START, and at this time we have to > > > also allocate the dommap for VLPT. > > > > I''d prefer to make that explicit at setup time than artificially > > overlaying the two. > > > > I was half wondering about providing a function to allocate a chunk of > > domheap address range which could then be used by this code to get a bit > > of memory dynamically. I''m not sure about this. One concern is that it > > might provoke pessimal behaviour from the domheap (which is a hash table, > > so it would be stealing slots). > > I think so. If we are going to provide such a function like that, we have to > think about how to handle a pool of memory region ready for allocation, and > how does it work with ''map_domain_page''. One quick thought is to make > "DOMHEAP_ENTRIES" as a variable and adjust this value as the large-chunk > memory is allocated or freed. But, honestly I don''t want to increase the > probability of breaking anything. > > I think it would be better off setting it statically at booting time. And, > by doing so, DOMHEAP_ENTRIES may not be the power of 2. 
I think it is OK, but > you suggested that DOMHEAP_ENTRIES being power of 2 is good. Could you explain > a bit more about this?
I was just thinking that it would make the modular arithmetic simpler and/or that the hashing algorithm might rely on it somehow. But thinking again I think this mustn't be the case, and anyway keeping with powers of two would require halving from 2GB to 1GB which is undesirable. Ian.
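Illustrative only: one way the layout could look if the VLPT range is carved from the top of the per-cpu domheap region at setup time, as discussed, rather than overlaid on it; the actual addresses and the DOMHEAP_ENTRIES definition would need to respect the alignment constraints mentioned above:

    #define DOMHEAP_VIRT_START  _AT(vaddr_t,0x80000000)
    #define DOMHEAP_VIRT_END    _AT(vaddr_t,0xf7ffffff)
    #define VIRT_LIN_P2M_START  _AT(vaddr_t,0xf8000000)
    #define VIRT_LIN_P2M_END    _AT(vaddr_t,0xffffffff)

    /* fewer 2MB domheap slots; no longer a power of two */
    #define DOMHEAP_ENTRIES \
        ((DOMHEAP_VIRT_END + 1 - DOMHEAP_VIRT_START) >> SECOND_SHIFT)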