Jaeyong Yoo
2013-Oct-04 04:43 UTC
[PATCH v4 0/9] xen/arm: live migration support in arndale board
Hello xen-devel, here goes the v4 patch series. The major changes in this version are the following:

1) VLPT is improved. In version 3, we manually constructed xen's page table to establish the VLPT, but in this version, by simply copying the guest p2m's entries into xen's page table, we accomplish the same with a smaller LOC and faster dirty-page detection.

2) Enable SMP-domU live migration.

3) Stability improved. For single-core domU live migration, we can migrate 512 times in a row (512 is the maximum number of domain IDs, as far as I know). For SMP-core domU live migration, we can migrate 50 ~ 300 times in a row (minimum 50), and after that dom0 or domU crashes. We are trying to figure out the reason for the faults. I hope I can get some feedback from community members about this.

And here goes a brief description of each patch:

Patch 1 implements hvm save/restore.
Patch 2 implements vcpu save/restore by adding more required registers for live migration.
Patch 3 implements the 'set_memory_map' hypercall for telling the hypervisor about the DomU's memory map. This memory map is used for dirty-page tracing (patches 4, 7, 8, and 9).
Patch 4 implements 'get_maximum_gpfn'.
Patch 5 implements 'modify_returncode' for switching the return value of the suspend hypercall from domU.
Patch 6 is an obvious fix of the clear_guest_offset macro.
Patch 7 implements the base functions for VLPT.
Patch 8 implements dirty-page tracing by using VLPT.
Patch 9 implements the toolstack part for live migration of ARM.

Best, Jaeyong

Alexey Sokolov, Elena Pyatunina, Evgeny Fedotov, and Nikolay Martyanov (1):
  xen/arm: Implement toolstack for xl restore/save and migrate

Alexey Sokolov (1):
  xen/arm: Implement modify_returncode

Evgeny Fedotov (2):
  xen/arm: Implement set_memory_map hypercall
  xen/arm: Implement get_maximum_gpfn hypercall for arm

Jaeyong Yoo and Evgeny Fedotov (1):
  xen/arm: Implement hvm save and restore

Jaeyong Yoo and Alexey Sokolov (1):
  xen/arm: Add more registers for saving and restoring vcpu registers

Jaeyong Yoo and Elena Pyatunina (1):
  xen/arm: Implement hypercall for dirty page tracing

Jaeyong Yoo (2):
  xen/arm: Implement virtual-linear page table for guest p2m mapping in live migration
  xen/arm: Fixing clear_guest_offset macro

 config/arm32.mk | 1 +
 tools/include/xen-foreign/reference.size | 2 +-
 tools/libxc/Makefile | 5 +
 tools/libxc/xc_arm_migrate.c | 805 +++++++++++++++++++++++++++++++
 tools/libxc/xc_dom_arm.c | 12 +-
 tools/libxc/xc_domain.c | 44 ++
 tools/libxc/xc_resume.c | 25 +
 tools/libxc/xenctrl.h | 23 +
 tools/misc/Makefile | 4 +
 xen/arch/arm/Makefile | 1 +
 xen/arch/arm/domain.c | 52 ++
 xen/arch/arm/domctl.c | 136 +++++-
 xen/arch/arm/hvm.c | 228 +++++++++
 xen/arch/arm/mm.c | 190 +++++++-
 xen/arch/arm/p2m.c | 300 ++++++++++++
 xen/arch/arm/save.c | 66 +++
 xen/arch/arm/traps.c | 21 +-
 xen/common/Makefile | 2 +
 xen/include/asm-arm/config.h | 6 +
 xen/include/asm-arm/domain.h | 14 +
 xen/include/asm-arm/guest_access.h | 5 +-
 xen/include/asm-arm/hvm/support.h | 29 ++
 xen/include/asm-arm/mm.h | 29 ++
 xen/include/asm-arm/p2m.h | 4 +
 xen/include/asm-arm/processor.h | 2 +
 xen/include/public/arch-arm.h | 35 ++
 xen/include/public/arch-arm/hvm/save.h | 66 +++
 xen/include/public/memory.h | 15 +-
 xen/include/xsm/dummy.h | 5 +
 xen/include/xsm/xsm.h | 5 +
 30 files changed, 2119 insertions(+), 13 deletions(-)
 create mode 100644 tools/libxc/xc_arm_migrate.c
 create mode 100644 xen/arch/arm/save.c
 create mode 100644 xen/include/asm-arm/hvm/support.h
-- 1.8.1.2
Implement save/restore of hvm context hypercall. In hvm context save/restore, we save gic, timer and vfp registers. Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com> --- xen/arch/arm/Makefile | 1 + xen/arch/arm/domctl.c | 88 ++++++++++++- xen/arch/arm/hvm.c | 228 +++++++++++++++++++++++++++++++++ xen/arch/arm/save.c | 66 ++++++++++ xen/common/Makefile | 2 + xen/include/asm-arm/hvm/support.h | 29 +++++ xen/include/public/arch-arm/hvm/save.h | 66 ++++++++++ 7 files changed, 479 insertions(+), 1 deletion(-) create mode 100644 xen/arch/arm/save.c create mode 100644 xen/include/asm-arm/hvm/support.h diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile index 003ac84..8910a6c 100644 --- a/xen/arch/arm/Makefile +++ b/xen/arch/arm/Makefile @@ -31,6 +31,7 @@ obj-y += vuart.o obj-y += hvm.o obj-y += device.o obj-y += decode.o +obj-y += save.o #obj-bin-y += ....o diff --git a/xen/arch/arm/domctl.c b/xen/arch/arm/domctl.c index 851ee40..86c07de 100644 --- a/xen/arch/arm/domctl.c +++ b/xen/arch/arm/domctl.c @@ -9,12 +9,98 @@ #include <xen/lib.h> #include <xen/errno.h> #include <xen/sched.h> +#include <xen/hvm/save.h> +#include <xen/guest_access.h> #include <public/domctl.h> long arch_do_domctl(struct xen_domctl *domctl, struct domain *d, XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl) { - return -ENOSYS; + long ret = 0; + bool_t copyback = 0; + + switch ( domctl->cmd ) + { + case XEN_DOMCTL_sethvmcontext: + { + struct hvm_domain_context c = { .size = domctl->u.hvmcontext.size }; + + ret = -ENOMEM; + if ( (c.data = xmalloc_bytes(c.size)) == NULL ) + goto sethvmcontext_out; + + ret = -EFAULT; + if ( copy_from_guest(c.data, domctl->u.hvmcontext.buffer, c.size) != 0 ) + goto sethvmcontext_out; + + domain_pause(d); + ret = hvm_load(d, &c); + domain_unpause(d); + + sethvmcontext_out: + if ( c.data != NULL ) + xfree(c.data); + } + break; + case XEN_DOMCTL_gethvmcontext: + { + struct hvm_domain_context c = { 0 }; + + ret = -EINVAL; + + c.size = hvm_save_size(d); + + if ( guest_handle_is_null(domctl->u.hvmcontext.buffer) ) + { + /* Client is querying for the correct buffer size */ + domctl->u.hvmcontext.size = c.size; + ret = 0; + goto gethvmcontext_out; + } + /* Check that the client has a big enough buffer */ + ret = -ENOSPC; + if ( domctl->u.hvmcontext.size < c.size ) + { + printk("(gethvmcontext) size error: %d and %d\n", + domctl->u.hvmcontext.size, c.size ); + goto gethvmcontext_out; + } + + /* Allocate our own marshalling buffer */ + ret = -ENOMEM; + if ( (c.data = xmalloc_bytes(c.size)) == NULL ) + { + printk("(gethvmcontext) xmalloc_bytes failed: %d\n", c.size ); + goto gethvmcontext_out; + } + + domain_pause(d); + ret = hvm_save(d, &c); + domain_unpause(d); + + domctl->u.hvmcontext.size = c.cur; + if ( copy_to_guest(domctl->u.hvmcontext.buffer, c.data, c.size) != 0 ) + { + printk("(gethvmcontext) copy to guest failed\n"); + ret = -EFAULT; + } + + gethvmcontext_out: + copyback = 1; + + if ( c.data != NULL ) + xfree(c.data); + } + break; + + default: + return -EINVAL; + } + + if ( copyback && __copy_to_guest(u_domctl, domctl, 1) ) + ret = -EFAULT; + + return ret; } void arch_get_info_guest(struct vcpu *v, vcpu_guest_context_u c) diff --git a/xen/arch/arm/hvm.c b/xen/arch/arm/hvm.c index 471c4cd..befd75d 100644 --- a/xen/arch/arm/hvm.c +++ b/xen/arch/arm/hvm.c @@ -7,11 +7,13 @@ #include <xsm/xsm.h> +#include <xen/hvm/save.h> #include <public/xen.h> #include <public/hvm/params.h> #include <public/hvm/hvm_op.h> #include <asm/hypercall.h> +#include <asm/gic.h> long do_hvm_op(unsigned long op, 
XEN_GUEST_HANDLE_PARAM(void) arg) @@ -65,3 +67,229 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg) return rc; } + +static void vgic_irq_rank_save(struct vgic_rank *ext, + struct vgic_irq_rank *rank) +{ + spin_lock(&rank->lock); + /* Some of VGIC registers are not used yet, it is for a future usage */ + /* IENABLE, IACTIVE, IPEND, PENDSGI registers */ + ext->ienable = rank->ienable; + ext->iactive = rank->iactive; + ext->ipend = rank->ipend; + ext->pendsgi = rank->pendsgi; + /* ICFG */ + ext->icfg[0] = rank->icfg[0]; + ext->icfg[1] = rank->icfg[1]; + /* IPRIORITY */ + memcpy(ext->ipriority, rank->ipriority, sizeof(rank->ipriority)); + /* ITARGETS */ + memcpy(ext->itargets, rank->itargets, sizeof(rank->itargets)); + spin_unlock(&rank->lock); +} + +static void vgic_irq_rank_restore(struct vgic_irq_rank *rank, + struct vgic_rank *ext) +{ + spin_lock(&rank->lock); + /* IENABLE, IACTIVE, IPEND, PENDSGI registers */ + rank->ienable = ext->ienable; + rank->iactive = ext->iactive; + rank->ipend = ext->ipend; + rank->pendsgi = ext->pendsgi; + /* ICFG */ + rank->icfg[0] = ext->icfg[0]; + rank->icfg[1] = ext->icfg[1]; + /* IPRIORITY */ + memcpy(rank->ipriority, ext->ipriority, sizeof(rank->ipriority)); + /* ITARGETS */ + memcpy(rank->itargets, ext->itargets, sizeof(rank->itargets)); + spin_unlock(&rank->lock); +} + + +static int gic_save(struct domain *d, hvm_domain_context_t *h) +{ + struct hvm_hw_gic ctxt; + struct vcpu *v; + + /* Save the state of GICs */ + for_each_vcpu( d, v ) + { + ctxt.gic_hcr = v->arch.gic_hcr; + ctxt.gic_vmcr = v->arch.gic_vmcr; + ctxt.gic_apr = v->arch.gic_apr; + + /* Save list registers and masks */ + /* (it is not necessary to save/restore them, but LR state can have + * influence on downtime after Live Migration (to be tested) + */ + memcpy(ctxt.gic_lr, v->arch.gic_lr, sizeof(ctxt.gic_lr)); + ctxt.lr_mask = v->arch.lr_mask; + ctxt.event_mask = v->arch.event_mask; + + /* Save PPI states (per-CPU) */ + /* It is necessary if SMP enabled */ + vgic_irq_rank_save(&ctxt.ppi_state, &v->arch.vgic.private_irqs); + + if ( hvm_save_entry(GIC, v->vcpu_id, h, &ctxt) != 0 ) + return 1; + } + return 0; +} + +static int gic_load(struct domain *d, hvm_domain_context_t *h) +{ + int vcpuid; + struct hvm_hw_gic ctxt; + struct vcpu *v; + + /* Which vcpu is this? */ + vcpuid = hvm_load_instance(h); + if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL ) + { + dprintk(XENLOG_G_ERR, "HVM restore: dom%u has no vcpu%u\n", + d->domain_id, vcpuid); + return -EINVAL; + } + + if ( hvm_load_entry(GIC, h, &ctxt) != 0 ) + return -EINVAL; + + v->arch.gic_hcr = ctxt.gic_hcr; + v->arch.gic_vmcr = ctxt.gic_vmcr; + v->arch.gic_apr = ctxt.gic_apr; + + /* Restore list registers and masks */ + memcpy(v->arch.gic_lr, ctxt.gic_lr, sizeof(ctxt.gic_lr)); + v->arch.lr_mask = ctxt.lr_mask; + v->arch.event_mask = ctxt.event_mask; + + /* Restore PPI states */ + vgic_irq_rank_restore(&v->arch.vgic.private_irqs, &ctxt.ppi_state); + + return 0; +} + +HVM_REGISTER_SAVE_RESTORE(GIC, gic_save, gic_load, 1, HVMSR_PER_VCPU); + +static int timer_save(struct domain *d, hvm_domain_context_t *h) +{ + struct hvm_hw_timer ctxt; + struct vcpu *v; + struct vtimer *t; + int i; + + /* Save the state of vtimer and ptimer */ + for_each_vcpu( d, v ) + { + t = &v->arch.virt_timer; + for ( i = 0; i < 2; i++ ) + { + ctxt.cval = t->cval; + ctxt.ctl = t->ctl; + ctxt.vtb_offset = i ? 
d->arch.phys_timer_base.offset : + d->arch.virt_timer_base.offset; + ctxt.type = i; + if ( hvm_save_entry(A15_TIMER, v->vcpu_id, h, &ctxt) != 0 ) + return 1; + t = &v->arch.phys_timer; + } + } + + return 0; +} + +static int timer_load(struct domain *d, hvm_domain_context_t *h) +{ + int vcpuid; + struct hvm_hw_timer ctxt; + struct vcpu *v; + struct vtimer *t = NULL; + + /* Which vcpu is this? */ + vcpuid = hvm_load_instance(h); + + if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL ) + { + dprintk(XENLOG_G_ERR, "HVM restore: dom%u has no vcpu%u\n", + d->domain_id, vcpuid); + return -EINVAL; + } + + if ( hvm_load_entry(A15_TIMER, h, &ctxt) != 0 ) + return -EINVAL; + + + if ( ctxt.type == TIMER_TYPE_VIRT ) + { + t = &v->arch.virt_timer; + d->arch.virt_timer_base.offset = ctxt.vtb_offset; + + } + else + { + t = &v->arch.phys_timer; + d->arch.phys_timer_base.offset = ctxt.vtb_offset; + } + + t->cval = ctxt.cval; + t->ctl = ctxt.ctl; + t->v = v; + + return 0; +} + +HVM_REGISTER_SAVE_RESTORE(A15_TIMER, timer_save, timer_load, 2, HVMSR_PER_VCPU); + +static int vfp_save(struct domain *d, hvm_domain_context_t *h) +{ + struct hvm_hw_vfp ctxt; + struct vcpu *v; + + /* Save the state of VFP */ + for_each_vcpu( d, v ) + { + if ( sizeof(v->arch.vfp) != sizeof (ctxt) ) + { + dprintk(XENLOG_G_ERR, "HVM: check VFP structure match\n"); + return -EINVAL; + } + memcpy(&ctxt, &v->arch.vfp, sizeof(ctxt)); + + if ( hvm_save_entry(VFP, v->vcpu_id, h, &ctxt) != 0 ) + return 1; + } + return 0; +} + +static int vfp_load(struct domain *d, hvm_domain_context_t *h) +{ + int vcpuid; + struct hvm_hw_vfp ctxt; + struct vcpu *v; + + /* Which vcpu is this? */ + vcpuid = hvm_load_instance(h); + if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL ) + { + dprintk(XENLOG_G_ERR, "HVM restore: dom%u has no vcpu%u\n", + d->domain_id, vcpuid); + return -EINVAL; + } + + if ( sizeof(v->arch.vfp) != sizeof (ctxt) ) + { + dprintk(XENLOG_G_ERR, "HVM: check VFP structure match\n"); + return -EINVAL; + } + + if ( hvm_load_entry(VFP, h, &ctxt) != 0 ) + return -EINVAL; + + memcpy(&v->arch.vfp, &ctxt, sizeof(ctxt)); + + return 0; +} + +HVM_REGISTER_SAVE_RESTORE(VFP, vfp_save, vfp_load, 1, HVMSR_PER_VCPU); diff --git a/xen/arch/arm/save.c b/xen/arch/arm/save.c new file mode 100644 index 0000000..c923910 --- /dev/null +++ b/xen/arch/arm/save.c @@ -0,0 +1,66 @@ +/* + * hvm/save.c: Save and restore HVM guest''s emulated hardware state for ARM. + * + * Copyright (c) 2013, Samsung Electronics. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple + * Place - Suite 330, Boston, MA 02111-1307 USA. 
+ */ + +#include <asm/hvm/support.h> +#include <public/hvm/save.h> + +void arch_hvm_save(struct domain *d, struct hvm_save_header *hdr) +{ + hdr->cpuid = READ_SYSREG32(MIDR_EL1); +} + +int arch_hvm_load(struct domain *d, struct hvm_save_header *hdr) +{ + uint32_t cpuid; + + if ( hdr->magic != HVM_FILE_MAGIC ) + { + printk(XENLOG_G_ERR "HVM%d restore: bad magic number %#"PRIx32"\n", + d->domain_id, hdr->magic); + return -1; + } + + if ( hdr->version != HVM_FILE_VERSION ) + { + printk(XENLOG_G_ERR "HVM%d restore: unsupported version %u\n", + d->domain_id, hdr->version); + return -1; + } + + cpuid = READ_SYSREG32(MIDR_EL1); + if ( hdr->cpuid != cpuid ) + { + printk(XENLOG_G_INFO "HVM%d restore: VM saved on one CPU " + "(%#"PRIx32") and restored on another (%#"PRIx32").\n", + d->domain_id, hdr->cpuid, cpuid); + return -1; + } + + return 0; +} + +/* + * Local variables: + * mode: C + * c-file-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/common/Makefile b/xen/common/Makefile index 6da4651..733b19f 100644 --- a/xen/common/Makefile +++ b/xen/common/Makefile @@ -61,6 +61,8 @@ subdir-$(CONFIG_COMPAT) += compat subdir-$(x86_64) += hvm +subdir-$(CONFIG_ARM) += hvm + subdir-$(coverage) += gcov subdir-y += libelf diff --git a/xen/include/asm-arm/hvm/support.h b/xen/include/asm-arm/hvm/support.h new file mode 100644 index 0000000..8311f2f --- /dev/null +++ b/xen/include/asm-arm/hvm/support.h @@ -0,0 +1,29 @@ +/* + * support.h: HVM support routines used by ARMv7 VE. + * + * Copyright (c) 2012, Citrix Systems + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple + * Place - Suite 330, Boston, MA 02111-1307 USA. 
+ */ + +#ifndef __ASM_ARM_HVM_SUPPORT_H__ +#define __ASM_ARM_HVM_SUPPORT_H__ + +#include <xen/types.h> +#include <public/hvm/ioreq.h> +#include <xen/sched.h> +#include <xen/hvm/save.h> +#include <asm/processor.h> + +#endif /* __ASM_ARM_HVM_SUPPORT_H__ */ diff --git a/xen/include/public/arch-arm/hvm/save.h b/xen/include/public/arch-arm/hvm/save.h index 75b8e65..7e55384 100644 --- a/xen/include/public/arch-arm/hvm/save.h +++ b/xen/include/public/arch-arm/hvm/save.h @@ -26,6 +26,72 @@ #ifndef __XEN_PUBLIC_HVM_SAVE_ARM_H__ #define __XEN_PUBLIC_HVM_SAVE_ARM_H__ +#define HVM_FILE_MAGIC 0x92385520 +#define HVM_FILE_VERSION 0x00000001 + + +struct hvm_save_header +{ + uint32_t magic; /* Must be HVM_FILE_MAGIC */ + uint32_t version; /* File format version */ + uint64_t changeset; /* Version of Xen that saved this file */ + uint32_t cpuid; /* MIDR_EL1 on the saving machine */ +}; + +DECLARE_HVM_SAVE_TYPE(HEADER, 1, struct hvm_save_header); + +struct vgic_rank +{ + uint32_t ienable, iactive, ipend, pendsgi; + uint32_t icfg[2]; + uint32_t ipriority[8]; + uint32_t itargets[8]; +}; + +struct hvm_hw_gic +{ + uint32_t gic_hcr; + uint32_t gic_vmcr; + uint32_t gic_apr; + uint32_t gic_lr[64]; + uint64_t event_mask; + uint64_t lr_mask; + struct vgic_rank ppi_state; +}; + +DECLARE_HVM_SAVE_TYPE(GIC, 2, struct hvm_hw_gic); + +#define TIMER_TYPE_VIRT 0 +#define TIMER_TYPE_PHYS 1 + +struct hvm_hw_timer +{ + uint64_t vtb_offset; + uint32_t ctl; + uint64_t cval; + uint32_t type; +}; + +DECLARE_HVM_SAVE_TYPE(A15_TIMER, 3, struct hvm_hw_timer); + +struct hvm_hw_vfp +{ + uint64_t fpregs1[16]; /* {d0-d15} */ + uint64_t fpregs2[16]; /* {d16-d31} */ + uint32_t fpexc; + uint32_t fpscr; + /* VFP implementation specific state */ + uint32_t fpinst; + uint32_t fpinst2; +}; + +DECLARE_HVM_SAVE_TYPE(VFP, 4, struct hvm_hw_vfp); + +/* + * Largest type-code in use + */ +#define HVM_SAVE_CODE_MAX 4 + #endif /* -- 1.8.1.2
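For reference, a minimal sketch of how a toolstack consumer can drive these two domctls. It assumes the generic libxc wrappers xc_domain_hvm_getcontext()/xc_domain_hvm_setcontext() (already used for x86 HVM save/restore) are reused unchanged on ARM; passing a NULL buffer first queries the required size, matching the gethvmcontext handler above:

#include <stdlib.h>
#include <xenctrl.h>

/* Fetch a domain's HVM context blob (GIC, timers, VFP on ARM). */
static int fetch_hvm_context(xc_interface *xch, uint32_t domid,
                             uint8_t **buf, uint32_t *len)
{
    int sz = xc_domain_hvm_getcontext(xch, domid, NULL, 0); /* size query */

    if ( sz <= 0 )
        return -1;
    if ( (*buf = calloc(1, sz)) == NULL )
        return -1;

    sz = xc_domain_hvm_getcontext(xch, domid, *buf, sz);    /* real fetch */
    if ( sz <= 0 )
    {
        free(*buf);
        return -1;
    }
    *len = sz;
    return 0;
}

Restoring is the mirror image: feed the same blob back through xc_domain_hvm_setcontext() on the destination, which lands in the XEN_DOMCTL_sethvmcontext case above.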
Jaeyong Yoo
2013-Oct-04 04:43 UTC
[PATCH v4 2/9] xen/arm: Add more registers for saving and restoring vcpu registers
Add more registers for saving and restoring vcpu registers for live migration. Those registers are selected from the registers stored when vcpu context switching. Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com> --- tools/include/xen-foreign/reference.size | 2 +- xen/arch/arm/domain.c | 34 +++++++++++++++++++++++++++++++ xen/arch/arm/domctl.c | 35 ++++++++++++++++++++++++++++++++ xen/include/public/arch-arm.h | 35 ++++++++++++++++++++++++++++++++ 4 files changed, 105 insertions(+), 1 deletion(-) diff --git a/tools/include/xen-foreign/reference.size b/tools/include/xen-foreign/reference.size index b3347b4..dd7b0f8 100644 --- a/tools/include/xen-foreign/reference.size +++ b/tools/include/xen-foreign/reference.size @@ -5,7 +5,7 @@ start_info | - - 1112 1168 trap_info | - - 8 16 cpu_user_regs | - - 68 200 vcpu_guest_core_regs | 304 304 - - -vcpu_guest_context | 336 336 2800 5168 +vcpu_guest_context | 440 440 2800 5168 arch_vcpu_info | 0 0 24 16 vcpu_time_info | 32 32 32 32 vcpu_info | 48 48 64 64 diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index cb0424d..d3bac4d 100644 --- a/xen/arch/arm/domain.c +++ b/xen/arch/arm/domain.c @@ -601,6 +601,40 @@ int arch_set_info_guest( v->arch.ttbr1 = ctxt->ttbr1; v->arch.ttbcr = ctxt->ttbcr; + v->arch.dacr = ctxt->dacr; + v->arch.ifar = ctxt->ifar; + v->arch.ifsr = ctxt->ifsr; + v->arch.dfar = ctxt->dfar; + v->arch.dfsr = ctxt->dfsr; + +#ifdef CONFIG_ARM_32 + v->arch.mair0 = ctxt->mair0; + v->arch.mair1 = ctxt->mair1; +#else + v->arch.mair = ctxt->mair; +#endif + + /* Control Registers */ + v->arch.actlr = ctxt->actlr; + v->arch.cpacr = ctxt->cpacr; + + v->arch.contextidr = ctxt->contextidr; + v->arch.tpidr_el0 = ctxt->tpidr_el0; + v->arch.tpidr_el1 = ctxt->tpidr_el1; + v->arch.tpidrro_el0 = ctxt->tpidrro_el0; + + /* CP 15 */ + v->arch.csselr = ctxt->csselr; + + v->arch.afsr0 = ctxt->afsr0; + v->arch.afsr1 = ctxt->afsr1; + v->arch.vbar = ctxt->vbar; + v->arch.par = ctxt->par; + v->arch.teecr = ctxt->teecr; + v->arch.teehbr = ctxt->teehbr; + v->arch.joscr = ctxt->joscr; + v->arch.jmcr = ctxt->jmcr; + v->is_initialised = 1; if ( ctxt->flags & VGCF_online ) diff --git a/xen/arch/arm/domctl.c b/xen/arch/arm/domctl.c index 86c07de..78fb322 100644 --- a/xen/arch/arm/domctl.c +++ b/xen/arch/arm/domctl.c @@ -115,6 +115,41 @@ void arch_get_info_guest(struct vcpu *v, vcpu_guest_context_u c) ctxt->ttbr1 = v->arch.ttbr1; ctxt->ttbcr = v->arch.ttbcr; + ctxt->dacr = v->arch.dacr; + ctxt->ifar = v->arch.ifar; + ctxt->ifsr = v->arch.ifsr; + ctxt->dfar = v->arch.dfar; + ctxt->dfsr = v->arch.dfsr; + +#ifdef CONFIG_ARM_32 + ctxt->mair0 = v->arch.mair0; + ctxt->mair1 = v->arch.mair1; +#else + ctxt->mair = v->arch.mair; +#endif + + /* Control Registers */ + ctxt->actlr = v->arch.actlr; + ctxt->sctlr = v->arch.sctlr; + ctxt->cpacr = v->arch.cpacr; + + ctxt->contextidr = v->arch.contextidr; + ctxt->tpidr_el0 = v->arch.tpidr_el0; + ctxt->tpidr_el1 = v->arch.tpidr_el1; + ctxt->tpidrro_el0 = v->arch.tpidrro_el0; + + /* CP 15 */ + ctxt->csselr = v->arch.csselr; + + ctxt->afsr0 = v->arch.afsr0; + ctxt->afsr1 = v->arch.afsr1; + ctxt->vbar = v->arch.vbar; + ctxt->par = v->arch.par; + ctxt->teecr = v->arch.teecr; + ctxt->teehbr = v->arch.teehbr; + ctxt->joscr = v->arch.joscr; + ctxt->jmcr = v->arch.jmcr; + if ( !test_bit(_VPF_down, &v->pause_flags) ) ctxt->flags |= VGCF_online; } diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h index 5d359af..bf6dc9a 100644 --- a/xen/include/public/arch-arm.h +++ b/xen/include/public/arch-arm.h @@ -253,6 +253,41 
@@ struct vcpu_guest_context { uint32_t sctlr, ttbcr; uint64_t ttbr0, ttbr1; + uint32_t ifar, dfar; + uint32_t ifsr, dfsr; + uint32_t dacr; + uint64_t par; + +#ifdef CONFIG_ARM_32 + uint32_t mair0, mair1; + uint32_t tpidr_el0; + uint32_t tpidr_el1; + uint32_t tpidrro_el0; + uint32_t vbar; +#else + uint64_t mair; + uint64_t tpidr_el0; + uint64_t tpidr_el1; + uint64_t tpidrro_el0; + uint64_t vbar; +#endif + + /* Control Registers */ + uint32_t actlr; + uint32_t cpacr; + + uint32_t afsr0, afsr1; + + uint32_t contextidr; + + uint32_t teecr, teehbr; /* ThumbEE, 32-bit guests only */ + +#ifdef CONFIG_ARM_32 + uint32_t joscr, jmcr; +#endif + + /* CP 15 */ + uint32_t csselr; }; typedef struct vcpu_guest_context vcpu_guest_context_t; DEFINE_XEN_GUEST_HANDLE(vcpu_guest_context_t); -- 1.8.1.2
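As a usage sketch (not part of the series), the widened vcpu_guest_context can be round-tripped with the long-standing libxc calls that patch 9's toolstack relies on; vcpu_guest_context_any_t comes from xenctrl.h:

#include <xenctrl.h>

/* Copy one vcpu's register state from a source to a destination
 * domain, as the save/restore path does at suspend/resume time. */
static int copy_vcpu_regs(xc_interface *xch, uint32_t src_dom,
                          uint32_t dst_dom, uint32_t vcpu)
{
    vcpu_guest_context_any_t ctxt;

    if ( xc_vcpu_getcontext(xch, src_dom, vcpu, &ctxt) )
        return -1;
    /* ctxt.c now also carries ttbr0/1, ttbcr, mair*, vbar, tpidr*,
     * contextidr, par, and the other registers added above. */
    return xc_vcpu_setcontext(xch, dst_dom, vcpu, &ctxt);
}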
Jaeyong Yoo
2013-Oct-04 04:43 UTC
[PATCH v4 3/9] xen/arm: Implement set_memory_map hypercall for arm
From: Evgeny Fedotov <e.fedotov@samsung.com>

When creating a domU in the toolstack, pass the guest memory map info to the hypervisor; the hypervisor stores this info in arch_domain for later use.

Signed-off-by: Evgeny Fedotov <e.fedotov@samsung.com>
---
 tools/libxc/xc_dom_arm.c | 12 +++++++-
 tools/libxc/xc_domain.c | 44 ++++++++++++++++++++++++++++
 tools/libxc/xenctrl.h | 23 +++++++++++++++
 xen/arch/arm/domain.c | 3 ++
 xen/arch/arm/mm.c | 68 ++++++++++++++++++++++++++++++++++++++++++++
 xen/include/asm-arm/domain.h | 2 ++
 xen/include/asm-arm/mm.h | 1 +
 xen/include/public/memory.h | 15 ++++++++--
 xen/include/xsm/dummy.h | 5 ++++
 xen/include/xsm/xsm.h | 5 ++++
 10 files changed, 175 insertions(+), 3 deletions(-)

diff --git a/tools/libxc/xc_dom_arm.c b/tools/libxc/xc_dom_arm.c index df59ffb..20c9095 100644 --- a/tools/libxc/xc_dom_arm.c +++ b/tools/libxc/xc_dom_arm.c @@ -166,6 +166,7 @@ int arch_setup_meminit(struct xc_dom_image *dom) { int rc; xen_pfn_t pfn, allocsz, i; + struct dt_mem_info memmap; dom->shadow_enabled = 1; @@ -191,7 +192,16 @@ int arch_setup_meminit(struct xc_dom_image *dom) 0, 0, &dom->p2m_host[i]); } - return 0; + /* setup guest memory map */ + memmap.nr_banks = 2; + memmap.bank[0].start = (dom->rambase_pfn << PAGE_SHIFT_ARM); + memmap.bank[0].size = (dom->total_pages << PAGE_SHIFT_ARM); + /* The end of main memory: magic pages */ + memmap.bank[1].start = memmap.bank[0].start + memmap.bank[0].size; + memmap.bank[1].size = NR_MAGIC_PAGES << PAGE_SHIFT_ARM; + + return xc_domain_set_memory_map(dom->xch, dom->guest_domid, &memmap); + } int arch_setup_bootearly(struct xc_dom_image *dom) diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c index 81316d3..7b283f1 100644 --- a/tools/libxc/xc_domain.c +++ b/tools/libxc/xc_domain.c @@ -662,7 +662,51 @@ int xc_domain_set_memmap_limit(xc_interface *xch, return -1; } #endif +#if defined(__arm__) +int xc_domain_get_memory_map(xc_interface *xch, + uint32_t domid, + struct dt_mem_info *map) +{ + int rc; + struct xen_arm_memory_map fmap = { + .domid = domid + }; + + DECLARE_HYPERCALL_BOUNCE(map, sizeof(struct dt_mem_info), + XC_HYPERCALL_BUFFER_BOUNCE_OUT); + + if ( !map || xc_hypercall_bounce_pre(xch, map) ) + return -1; + set_xen_guest_handle(fmap.buffer, map); + rc = do_memory_op(xch, XENMEM_memory_map, &fmap, sizeof(fmap)); + + xc_hypercall_bounce_post(xch, map); + + return rc; +} + +int xc_domain_set_memory_map(xc_interface *xch, + uint32_t domid, + struct dt_mem_info *map) +{ + int rc; + struct xen_arm_memory_map fmap = { + .domid = domid + }; + DECLARE_HYPERCALL_BOUNCE(map, sizeof(struct dt_mem_info), + XC_HYPERCALL_BUFFER_BOUNCE_IN); + + if ( !map || xc_hypercall_bounce_pre(xch, map) ) + return -1; + set_xen_guest_handle(fmap.buffer, map); + + rc = do_memory_op(xch, XENMEM_set_memory_map, &fmap, sizeof(fmap)); + + xc_hypercall_bounce_post(xch, map); + return rc; +} +#endif int xc_domain_set_time_offset(xc_interface *xch, uint32_t domid, int32_t time_offset_seconds) diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h index 58d51f3..c1468fa 100644 --- a/tools/libxc/xenctrl.h +++ b/tools/libxc/xenctrl.h @@ -1122,6 +1122,29 @@ int xc_get_machine_memory_map(xc_interface *xch, struct e820entry entries[], uint32_t max_entries); #endif + +#if defined(__arm__) +#define NR_MEM_BANKS 8 +typedef uint64_t paddr_t; + +struct membank { + paddr_t start; + paddr_t size; +}; + +struct dt_mem_info { + int nr_banks; + struct membank bank[NR_MEM_BANKS]; +}; + +int xc_domain_set_memory_map(xc_interface *xch, + uint32_t domid, + struct
dt_mem_info *map); +int xc_domain_get_memory_map(xc_interface *xch, + uint32_t domid, + struct dt_mem_info *map); +#endif + int xc_domain_set_time_offset(xc_interface *xch, uint32_t domid, int32_t time_offset_seconds); diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index d3bac4d..57be341 100644 --- a/xen/arch/arm/domain.c +++ b/xen/arch/arm/domain.c @@ -509,6 +509,9 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags) /* Default the virtual ID to match the physical */ d->arch.vpidr = boot_cpu_data.midr.bits; + spin_lock_init(&d->arch.map_lock); + d->arch.map_domain.nr_banks = 0; + clear_page(d->shared_info); share_xen_page_with_guest( virt_to_page(d->shared_info), d, XENSHARE_writable); diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c index 474dfef..f5d9cf4 100644 --- a/xen/arch/arm/mm.c +++ b/xen/arch/arm/mm.c @@ -1156,6 +1156,74 @@ long arch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg) return rc; } + case XENMEM_set_memory_map: + { + struct xen_arm_memory_map fmap; + struct domain *d; + struct dt_mem_info info; + + if ( copy_from_guest(&fmap, arg, 1) ) + return -EFAULT; + + if ( copy_from_guest(&info, fmap.buffer, 1) ) + { + return -EFAULT; + } + + if ( info.nr_banks > NR_MEM_BANKS ) + return -EINVAL; + + d = rcu_lock_domain_by_any_id(fmap.domid); + if ( d == NULL ) + return -ESRCH; + + rc = xsm_domain_memory_map(XSM_TARGET, d); + if ( rc ) + { + rcu_unlock_domain(d); + return rc; + } + spin_lock(&d->arch.map_lock); + d->arch.map_domain = info; + spin_unlock(&d->arch.map_lock); + + rcu_unlock_domain(d); + return rc; + } + + case XENMEM_memory_map: + { + /* get the domain''s memory map as it was stored */ + struct xen_arm_memory_map fmap; + struct domain *d; + struct dt_mem_info info; + + if ( copy_from_guest(&fmap, arg, 1) ) + return -EFAULT; + + d = rcu_lock_domain_by_any_id(fmap.domid); + if ( d == NULL ) + return -ESRCH; + + spin_lock(&d->arch.map_lock); + info = d->arch.map_domain; + spin_unlock(&d->arch.map_lock); + + if ( copy_to_guest(fmap.buffer, &info, 1) ) + { + rcu_unlock_domain(d); + return -EFAULT; + } + + if ( copy_to_guest(arg, &fmap, 1) ) + { + rcu_unlock_domain(d); + return -EFAULT; + } + + rcu_unlock_domain(d); + return 0; + } /* XXX: memsharing not working yet */ case XENMEM_get_sharing_shared_pages: case XENMEM_get_sharing_freed_pages: diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h index 67bfbbc..755c7af 100644 --- a/xen/include/asm-arm/domain.h +++ b/xen/include/asm-arm/domain.h @@ -112,6 +112,8 @@ struct arch_domain spinlock_t lock; } vuart; + struct dt_mem_info map_domain; + spinlock_t map_lock; } __cacheline_aligned; struct arch_vcpu diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h index ce66099..5142524 100644 --- a/xen/include/asm-arm/mm.h +++ b/xen/include/asm-arm/mm.h @@ -5,6 +5,7 @@ #include <xen/kernel.h> #include <asm/page.h> #include <public/xen.h> +#include <xen/device_tree.h> /* Align Xen to a 2 MiB boundary. */ #define XEN_PADDR_ALIGN (1 << 21) diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h index 7a26dee..264fb8f 100644 --- a/xen/include/public/memory.h +++ b/xen/include/public/memory.h @@ -283,9 +283,12 @@ DEFINE_XEN_GUEST_HANDLE(xen_remove_from_physmap_t); /*#define XENMEM_translate_gpfn_list 8*/ /* - * Returns the pseudo-physical memory map as it was when the domain + * x86: returns the pseudo-physical memory map as it was when the domain * was started (specified by XENMEM_set_memory_map). * arg == addr of xen_memory_map_t. 
+ * ARM: returns the pseudo-physical memory map as it was set + * (specified by XENMEM_set_memory_map). + * arg == addr of xen_arm_memory_map_t. */ #define XENMEM_memory_map 9 struct xen_memory_map { @@ -315,7 +318,8 @@ DEFINE_XEN_GUEST_HANDLE(xen_memory_map_t); /* * Set the pseudo-physical memory map of a domain, as returned by * XENMEM_memory_map. - * arg == addr of xen_foreign_memory_map_t. + * x86: arg == addr of xen_foreign_memory_map_t. + * ARM: arg == addr of xen_arm_memory_map_t */ #define XENMEM_set_memory_map 13 struct xen_foreign_memory_map { @@ -325,6 +329,13 @@ struct xen_foreign_memory_map { typedef struct xen_foreign_memory_map xen_foreign_memory_map_t; DEFINE_XEN_GUEST_HANDLE(xen_foreign_memory_map_t); +struct xen_arm_memory_map { + domid_t domid; + XEN_GUEST_HANDLE(void) buffer; +}; +typedef struct xen_arm_memory_map xen_arm_memory_map_t; +DEFINE_XEN_GUEST_HANDLE(xen_arm_memory_map_t); + #define XENMEM_set_pod_target 16 #define XENMEM_get_pod_target 17 struct xen_pod_target { diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h index 052f3e0..427f800 100644 --- a/xen/include/xsm/dummy.h +++ b/xen/include/xsm/dummy.h @@ -627,4 +627,9 @@ static XSM_INLINE int xsm_map_gmfn_foreign(XSM_DEFAULT_ARG struct domain *d, str XSM_ASSERT_ACTION(XSM_TARGET); return xsm_default_action(action, d, t); } +static XSM_INLINE int xsm_domain_memory_map(XSM_DEFAULT_ARG struct domain *d) +{ + XSM_ASSERT_ACTION(XSM_TARGET); + return xsm_default_action(action, current->domain, d); +} #endif diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h index 1939453..9764011 100644 --- a/xen/include/xsm/xsm.h +++ b/xen/include/xsm/xsm.h @@ -625,6 +625,11 @@ static inline int xsm_map_gmfn_foreign (struct domain *d, struct domain *t) { return xsm_ops->map_gmfn_foreign(d, t); } +static inline int xsm_domain_memory_map(xsm_default_t def, struct domain *d) +{ + return xsm_ops->domain_memory_map(d); +} + #endif /* CONFIG_ARM */ #endif /* XSM_NO_WRAPPERS */ -- 1.8.1.2
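A hedged usage sketch of the new call, mirroring what arch_setup_meminit() above now does (the two-bank layout and the bank values here are illustrative, not mandated by the interface):

#include <xenctrl.h>

/* Describe a guest layout of one RAM bank plus a magic-pages bank
 * and hand it to Xen via XENMEM_set_memory_map. */
static int tell_xen_memory_map(xc_interface *xch, uint32_t domid,
                               uint64_t rambase, uint64_t ramsize,
                               uint64_t magicsize)
{
    struct dt_mem_info memmap = {
        .nr_banks = 2,
        .bank = {
            { .start = rambase,           .size = ramsize   },
            { .start = rambase + ramsize, .size = magicsize },
        },
    };

    return xc_domain_set_memory_map(xch, domid, &memmap);
}

xc_domain_get_memory_map() is the inverse: it bounces the same struct out through XENMEM_memory_map, which the restore side in patch 9 uses to recreate the layout.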
Jaeyong Yoo
2013-Oct-04 04:44 UTC
[PATCH v4 4/9] xen/arm: Implement get_maximum_gpfn hypercall for arm
From: Evgeny Fedotov <e.fedotov@samsung.com>

By using the memory map info in arch_domain (from the set_memory_map hypercall), implement the get_maximum_gpfn hypercall.

Signed-off-by: Evgeny Fedotov <e.fedotov@samsung.com>
---
 xen/arch/arm/mm.c | 21 ++++++++++++++++++++-
 xen/include/asm-arm/mm.h | 2 ++
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c index f5d9cf4..120eae8 100644 --- a/xen/arch/arm/mm.c +++ b/xen/arch/arm/mm.c @@ -918,9 +918,28 @@ int page_is_ram_type(unsigned long mfn, unsigned long mem_type) return 0; } +void get_gma_start_end(struct domain *d, vaddr_t *start, vaddr_t *end) +{ + int nr_banks = d->arch.map_domain.nr_banks; + + ASSERT(nr_banks != 0); + + spin_lock(&d->arch.map_lock); + if ( start ) + *start = d->arch.map_domain.bank[0].start; + if ( end ) + *end = d->arch.map_domain.bank[nr_banks-1].start + + d->arch.map_domain.bank[nr_banks-1].size; + spin_unlock(&d->arch.map_lock); +} + unsigned long domain_get_maximum_gpfn(struct domain *d) { - return -ENOSYS; + vaddr_t end; + + get_gma_start_end(d, NULL, &end); + + return (unsigned long) (end >> PAGE_SHIFT); } void share_xen_page_with_guest(struct page_info *page, diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h index 5142524..021e8d6 100644 --- a/xen/include/asm-arm/mm.h +++ b/xen/include/asm-arm/mm.h @@ -342,6 +342,8 @@ static inline void put_page_and_type(struct page_info *page) put_page(page); } +void get_gma_start_end(struct domain *d, vaddr_t *start, vaddr_t *end); + #endif /* __ARCH_ARM_MM__ */ /* * Local variables: -- 1.8.1.2
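For illustration, the toolstack view of this hypercall: xc_domain_maximum_gpfn() is the existing libxc wrapper for XENMEM_maximum_gpfn, and on this series it yields the end (exclusive) gpfn of guest RAM, which is how patch 9's save_memory() sizes its page loop and dirty bitmap. A minimal sketch:

#include <xenctrl.h>

/* Number of guest RAM pages, given the first bank's start pfn
 * (obtainable from xc_domain_get_memory_map() in patch 3). */
static long guest_ram_pages(xc_interface *xch, uint32_t domid,
                            xen_pfn_t start_pfn)
{
    long end = xc_domain_maximum_gpfn(xch, domid); /* end gpfn, exclusive */

    return end < 0 ? end : end - (long)start_pfn;
}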
From: Alexey Sokolov <sokolov.a@samsung.com>

Make sched_op in do_suspend (drivers/xen/manage.c) return 0 on the success of suspend.

Signed-off-by: Alexey Sokolov <sokolov.a@samsung.com>
---
 tools/libxc/xc_resume.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/tools/libxc/xc_resume.c b/tools/libxc/xc_resume.c index cb61650..934158c 100644 --- a/tools/libxc/xc_resume.c +++ b/tools/libxc/xc_resume.c @@ -73,6 +73,31 @@ static int modify_returncode(xc_interface *xch, uint32_t domid) return 0; } +#elif defined(__arm__) + +static int modify_returncode(xc_interface *xch, uint32_t domid) +{ + vcpu_guest_context_any_t ctxt; + xc_dominfo_t info; + int rc; + + if ( xc_domain_getinfo(xch, domid, 1, &info) != 1 ) + { + PERROR("Could not get domain info"); + return -1; + } + + if ( (rc = xc_vcpu_getcontext(xch, domid, 0, &ctxt)) != 0 ) + return rc; + + ctxt.c.user_regs.r0_usr = 1; + + if ( (rc = xc_vcpu_setcontext(xch, domid, 0, &ctxt)) != 0 ) + return rc; + + return 0; +} + #else static int modify_returncode(xc_interface *xch, uint32_t domid) -- 1.8.1.2
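For context, modify_returncode() is not called directly by users; it runs inside libxc's cooperative resume path, so the value planted in r0_usr is what the guest's SCHEDOP_shutdown(SHUTDOWN_suspend) hypercall appears to return when the vcpu is unpaused. A minimal sketch of the caller side (xc_domain_resume() is existing libxc API; fast=1 selects the cooperative path that invokes modify_returncode()):

#include <xenctrl.h>

static int resume_after_checkpoint(xc_interface *xch, uint32_t domid)
{
    /* Cooperative resume: patch the suspend hypercall's return value
     * via modify_returncode(), then XEN_DOMCTL_resumedomain. */
    return xc_domain_resume(xch, domid, 1 /* fast */);
}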
Fix the broken macro 'clear_guest_offset' in arm.

Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com>
---
 xen/include/asm-arm/guest_access.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/xen/include/asm-arm/guest_access.h b/xen/include/asm-arm/guest_access.h index 34aae14..8ff088f 100644 --- a/xen/include/asm-arm/guest_access.h +++ b/xen/include/asm-arm/guest_access.h @@ -77,8 +77,9 @@ unsigned long raw_clear_guest(void *to, unsigned len); * Clear an array of objects in guest context via a guest handle, * specifying an offset into the guest array. */ -#define clear_guest_offset(hnd, off, ptr, nr) ({ \ - raw_clear_guest(_d+(off), nr); \ +#define clear_guest_offset(hnd, off, nr) ({ \ + void *_d = (hnd).p; \ + raw_clear_guest(_d+(off), nr); \ }) /* -- 1.8.1.2
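A hypothetical hypervisor-side use of the fixed macro (illustrative only; pad_guest_buffer and its caller are not in this series): zero the unwritten tail of a guest-supplied buffer. The point of the fix is that the macro now derives the pointer from the handle itself instead of relying on a _d local that callers never defined:

/* Sketch: clear_guest_offset() forwards to raw_clear_guest(), which
 * returns the number of bytes NOT cleared (nonzero means fault). */
static int pad_guest_buffer(XEN_GUEST_HANDLE(void) buf,
                            unsigned long written, unsigned long total)
{
    return clear_guest_offset(buf, written, total - written) ? -EFAULT : 0;
}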
Jaeyong Yoo
2013-Oct-04 04:44 UTC
[PATCH v4 7/9] xen/arm: Implement virtual-linear page table for guest p2m mapping in live migration
Implement vlpt (virtual-linear page table) for fast access to the 3rd-level PTEs of the guest p2m. When creating a mapping for vlpt, just copy the 1st level PTE of the guest p2m to xen's 2nd level PTE. Then the mapping becomes the following:

    xen's 1st PTE --> xen's 2nd PTE (which is the same as the 1st PTE of the guest p2m) --> guest p2m's 2nd PTE --> guest p2m's 3rd PTE (the memory contents where the vlpt points)

For more info about vlpt, see: http://www.technovelty.org/linux/virtual-linear-page-table.html

This function is used in dirty-page tracing: when a domU write-fault is trapped by xen, xen can immediately locate the 3rd PTE of the guest p2m.

Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com>
---
 xen/arch/arm/mm.c | 78 ++++++++++++++++++++++++++++++++++++++++++++
 xen/include/asm-arm/config.h | 6 ++++
 xen/include/asm-arm/domain.h | 7 ++++
 xen/include/asm-arm/mm.h | 22 +++++++++++++
 4 files changed, 113 insertions(+)

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c index 120eae8..98edbc9 100644 --- a/xen/arch/arm/mm.c +++ b/xen/arch/arm/mm.c @@ -1382,6 +1382,84 @@ int is_iomem_page(unsigned long mfn) return 1; return 0; } + +/* flush the vlpt area */ +static void flush_vlpt(struct domain *d) +{ + int flush_size; + flush_size = (d->arch.dirty.second_lvl_end - + d->arch.dirty.second_lvl_start) << SECOND_SHIFT; + /* flushing the 3rd level mapping */ + flush_xen_data_tlb_range_va(VIRT_LIN_P2M_START, + flush_size); +} + +/* restore the xen page table for vlpt mapping for domain d */ +void restore_vlpt(struct domain *d) +{ + int i, j = 0; + int need_flush = 0; + + for ( i = d->arch.dirty.second_lvl_start; + i < d->arch.dirty.second_lvl_end; + ++i, ++j ) + { + if ( xen_second[i].bits != d->arch.dirty.second_lvl[j].bits ) + { + write_pte(&xen_second[i], d->arch.dirty.second_lvl[j]); + need_flush = 1; + } + } + + if ( need_flush ) flush_vlpt(d); +} + +/* setting up the xen page table for vlpt mapping for domain d */ +void prepare_vlpt(struct domain *d) +{ + int xen_second_linear_base; + int gp2m_start_index, gp2m_end_index; + struct p2m_domain *p2m = &d->arch.p2m; + struct page_info *second_lvl_page; + vaddr_t gma_start = 0; + vaddr_t gma_end = 0; + lpae_t *first; + int i, j; + + xen_second_linear_base = second_linear_offset(VIRT_LIN_P2M_START); + get_gma_start_end(d, &gma_start, &gma_end); + + gp2m_start_index = gma_start >> FIRST_SHIFT; + gp2m_end_index = (gma_end >> FIRST_SHIFT) + 1; + + second_lvl_page = alloc_domheap_page(NULL, 0); + d->arch.dirty.second_lvl = map_domain_page_global( + page_to_mfn(second_lvl_page) ); + + first = __map_domain_page(p2m->first_level); + for ( i = gp2m_start_index, j = 0; i < gp2m_end_index; ++i, ++j ) + { + write_pte(&xen_second[xen_second_linear_base+i], + first[i]); + + /* we copy the mapping into domain's structure as a reference in case + * of the context switch (used in restore_vlpt) */ + d->arch.dirty.second_lvl[j] = first[i]; + } + unmap_domain_page(first); + + /* storing the start and end index */ + d->arch.dirty.second_lvl_start = xen_second_linear_base + gp2m_start_index; + d->arch.dirty.second_lvl_end = xen_second_linear_base + gp2m_end_index; + + flush_vlpt(d); +} + +void cleanup_vlpt(struct domain *d) +{ + unmap_domain_page_global(d->arch.dirty.second_lvl); +} + /* * Local variables: * mode: C diff --git a/xen/include/asm-arm/config.h b/xen/include/asm-arm/config.h index 9e395c2..47fc26e 100644 --- a/xen/include/asm-arm/config.h +++ b/xen/include/asm-arm/config.h @@ -87,6 +87,7 @@ * 0 - 8M <COMMON> * * 32M - 128M Frametable: 24 bytes per page for 16GB of RAM +
* 128M - 256M Virtual-linear mapping to P2M table * 256M - 1G VMAP: ioremap and early_ioremap use this virtual address * space * @@ -132,6 +133,9 @@ #define VMAP_VIRT_END XENHEAP_VIRT_START +#define VIRT_LIN_P2M_START _AT(vaddr_t,0x08000000) +#define VIRT_LIN_P2M_END VMAP_VIRT_START + #define DOMHEAP_ENTRIES 1024 /* 1024 2MB mapping slots */ /* Number of domheap pagetable pages required at the second level (2MB mappings) */ @@ -158,6 +162,8 @@ #define HYPERVISOR_VIRT_END DIRECTMAP_VIRT_END +/*TODO (ARM_64): define VIRT_LIN_P2M_START VIRT_LIN_P2M_END */ + #endif /* Fixmap slots */ diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h index 755c7af..7e7743a 100644 --- a/xen/include/asm-arm/domain.h +++ b/xen/include/asm-arm/domain.h @@ -112,6 +112,13 @@ struct arch_domain spinlock_t lock; } vuart; + /* dirty-page tracing */ + struct { + int second_lvl_start; /* for context switch */ + int second_lvl_end; + lpae_t *second_lvl; + } dirty; + struct dt_mem_info map_domain; spinlock_t map_lock; } __cacheline_aligned; diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h index 021e8d6..b9cbcf2 100644 --- a/xen/include/asm-arm/mm.h +++ b/xen/include/asm-arm/mm.h @@ -344,6 +344,28 @@ static inline void put_page_and_type(struct page_info *page) void get_gma_start_end(struct domain *d, vaddr_t *start, vaddr_t *end); +/* vlpt (virtual linear page table) related functions */ +void prepare_vlpt(struct domain *d); +void cleanup_vlpt(struct domain *d); +void restore_vlpt(struct domain *d); + +/* calculate xen's virtual address for accessing the leaf PTE of + * a given address (GPA) */ +static inline lpae_t * get_vlpt_3lvl_pte(vaddr_t addr) +{ + unsigned long ipa_third = third_table_offset(addr); + unsigned long ipa_second = second_table_offset(addr); + unsigned long ipa_first = first_table_offset(addr); + + /* guest p2m's first level index is the index of xen's second level, + and guest p2m's second level index is the index of xen's third level */ + return (lpae_t *)( + ( (second_linear_offset(VIRT_LIN_P2M_START) + ipa_first) + << SECOND_SHIFT) + + (ipa_second << THIRD_SHIFT) + + (ipa_third << 3) ); +} + #endif /* __ARCH_ARM_MM__ */ /* * Local variables: -- 1.8.1.2
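It may help to note that the three shifted terms in get_vlpt_3lvl_pte() just reassemble the gpfn, so the lookup collapses to one linear formula. A back-of-the-envelope identity, assuming the 32-bit layout above with VIRT_LIN_P2M_START at 0x08000000 and 8-byte LPAE entries:

#include <stdint.h>

#define VIRT_LIN_P2M_START 0x08000000UL /* from asm-arm/config.h above */
#define PAGE_SHIFT         12

/* Leaf-PTE address of a guest physical address within the VLPT window:
 *   va = VIRT_LIN_P2M_START + (gpa >> PAGE_SHIFT) * sizeof(lpae_t)
 * e.g. gpa 0x80000000 -> 0x08000000 + 0x80000 * 8 = 0x08400000,
 * inside the 128M-256M window (which holds 16M PTEs, i.e. 64GB of GPA). */
static inline uintptr_t vlpt_pte_va(uint64_t gpa)
{
    return (uintptr_t)(VIRT_LIN_P2M_START + (gpa >> PAGE_SHIFT) * 8);
}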
Jaeyong Yoo
2013-Oct-04 04:44 UTC
[PATCH v4 8/9] xen/arm: Implement hypercall for dirty page tracing
Add hypercall (shadow op: enable/disable and clean/peek dirtied page bitmap). It consists of two parts: dirty page detecting and saving.

For detecting, we set up the guest p2m's leaf PTEs read-only, and whenever the guest tries to write something, a permission fault happens and traps into xen. The permission-faulted GPA should be saved for the toolstack (when it wants to see which pages are dirtied). For this purpose, we temporarily save the GPAs into a linked list by using the 'add_mapped_vaddr' function, and when the toolstack wants them (log_dirty_op function), the GPAs are copied into the bitmap and the linked list is flushed.

Additionally, for supporting parallel migration of domUs, the vlpt area should be context switched.

Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com>
---
 xen/arch/arm/domain.c | 15 ++
 xen/arch/arm/domctl.c | 13 ++
 xen/arch/arm/mm.c | 23 ++-
 xen/arch/arm/p2m.c | 300 ++++++++++++++++++++++++++++++++++++++++
 xen/arch/arm/traps.c | 21 ++-
 xen/include/asm-arm/domain.h | 5 +
 xen/include/asm-arm/mm.h | 4 +
 xen/include/asm-arm/p2m.h | 4 +
 xen/include/asm-arm/processor.h | 2 +
 9 files changed, 382 insertions(+), 5 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index 57be341..40ebeed 100644 --- a/xen/arch/arm/domain.c +++ b/xen/arch/arm/domain.c @@ -215,6 +215,10 @@ static void ctxt_switch_to(struct vcpu *n) WRITE_SYSREG(hcr, HCR_EL2); isb(); + /* vlpt area context switching for dirty-page tracing */ + if ( n->domain->arch.dirty.mode ) + restore_vlpt(n->domain); + /* This is could trigger an hardware interrupt from the virtual * timer. The interrupt needs to be injected into the guest. */ virt_timer_restore(n); @@ -512,6 +516,17 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags) spin_lock_init(&d->arch.map_lock); d->arch.map_domain.nr_banks = 0; + /* init for dirty-page tracing */ + d->arch.dirty.count = 0; + d->arch.dirty.mode = 0; + d->arch.dirty.head = NULL; + d->arch.dirty.mvn_head = NULL; + spin_lock_init(&d->arch.dirty.lock); + + d->arch.dirty.second_lvl_start = 0; + d->arch.dirty.second_lvl_end = 0; + d->arch.dirty.second_lvl = NULL; + clear_page(d->shared_info); share_xen_page_with_guest( virt_to_page(d->shared_info), d, XENSHARE_writable); diff --git a/xen/arch/arm/domctl.c b/xen/arch/arm/domctl.c index 78fb322..7edce12 100644 --- a/xen/arch/arm/domctl.c +++ b/xen/arch/arm/domctl.c @@ -92,6 +92,19 @@ long arch_do_domctl(struct xen_domctl *domctl, struct domain *d, xfree(c.data); } break; + case XEN_DOMCTL_shadow_op: + { + domain_pause(d); + ret = dirty_mode_op(d, &domctl->u.shadow_op); + domain_unpause(d); + + if ( (&domctl->u.shadow_op)->op == XEN_DOMCTL_SHADOW_OP_CLEAN || + (&domctl->u.shadow_op)->op == XEN_DOMCTL_SHADOW_OP_PEEK ) + { + copyback = 1; + } + } + break; default: return -EINVAL; diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c index 98edbc9..5b9d26d 100644 --- a/xen/arch/arm/mm.c +++ b/xen/arch/arm/mm.c @@ -838,7 +838,6 @@ void destroy_xen_mappings(unsigned long v, unsigned long e) create_xen_entries(REMOVE, v, 0, (e - v) >> PAGE_SHIFT, 0); } -enum mg { mg_clear, mg_ro, mg_rw, mg_rx }; static void set_pte_flags_on_range(const char *p, unsigned long l, enum mg mg) { lpae_t pte; @@ -1460,6 +1459,28 @@ void cleanup_vlpt(struct domain *d) unmap_domain_page_global(d->arch.dirty.second_lvl); } +/* routine for dirty-page tracing + * + * On first write, it page faults, its entry is changed to read-write, + * and on retry the write succeeds. + * + * for locating p2m of the faulting entry, we use virtual-linear page table.
+ */ +void handle_page_fault(struct domain *d, paddr_t addr) +{ + lpae_t *vlp2m_pte = get_vlpt_3lvl_pte(addr); + if ( vlp2m_pte->p2m.valid && vlp2m_pte->p2m.write == 0 ) + { + lpae_t pte = *vlp2m_pte; + pte.p2m.write = 1; + write_pte(vlp2m_pte, pte); + flush_tlb_local(); + + /* in order to remove mappings in cleanup stage */ + add_mapped_vaddr(d, addr); + } +} + /* * Local variables: * mode: C diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c index 2d09fef..3b2b11a 100644 --- a/xen/arch/arm/p2m.c +++ b/xen/arch/arm/p2m.c @@ -6,6 +6,8 @@ #include <xen/bitops.h> #include <asm/flushtlb.h> #include <asm/gic.h> +#include <xen/guest_access.h> +#include <xen/pfn.h> void dump_p2m_lookup(struct domain *d, paddr_t addr) { @@ -408,6 +410,304 @@ unsigned long gmfn_to_mfn(struct domain *d, unsigned long gpfn) return p >> PAGE_SHIFT; } +#define MAX_VA_PER_NODE (PAGE_SIZE - 2 * sizeof(void *) - sizeof(int))\ + / sizeof(unsigned long) + +/* An array-based linked list for storing virtual addresses (i.e., the 3rd + * level table mapping) that should be destroyed after live migration */ +struct mapped_va_node +{ + struct page_info *next; + struct mapped_va_node *mvn_next; + int items; + unsigned long vaddrs[MAX_VA_PER_NODE]; +}; + +int add_mapped_vaddr(struct domain *d, unsigned long va) +{ + struct page_info *page_head; + struct mapped_va_node *mvn_head; + + spin_lock(&d->arch.dirty.lock); + + page_head = d->arch.dirty.head; + mvn_head = d->arch.dirty.mvn_head; + if ( !mvn_head ) + { + page_head = alloc_domheap_page(NULL, 0); + if ( !page_head ) + { + spin_unlock(&d->arch.dirty.lock); + return -ENOMEM; + } + + mvn_head = map_domain_page_global(__page_to_mfn(page_head)); + mvn_head->items = 0; + mvn_head->next = NULL; + mvn_head->mvn_next = NULL; + d->arch.dirty.head = page_head; + d->arch.dirty.mvn_head = mvn_head; + } + + if ( mvn_head->items == MAX_VA_PER_NODE ) + { + struct page_info *page; + + page = alloc_domheap_page(NULL, 0); + if ( !page ) + { + spin_unlock(&d->arch.dirty.lock); + return -ENOMEM; + } + + mvn_head = map_domain_page_global(__page_to_mfn(page)); + mvn_head->items = 0; + mvn_head->next = d->arch.dirty.head; + mvn_head->mvn_next = d->arch.dirty.mvn_head; + + d->arch.dirty.mvn_head = mvn_head; + d->arch.dirty.head = page; + } + + mvn_head->vaddrs[mvn_head->items] = va; + mvn_head->items ++; + + spin_unlock(&d->arch.dirty.lock); + return 0; +} + +/* get the dirty bitmap from the linked list stored by add_mapped_vaddr + * and also clear the mapped vaddrs linked list */ +static void get_dirty_bitmap(struct domain *d, uint8_t *bitmap[]) +{ + struct page_info *page_head; + struct mapped_va_node *mvn_head; + struct page_info *tmp_page; + struct mapped_va_node *tmp_mvn; + vaddr_t gma_start = 0; + vaddr_t gma_end = 0; + + /* this hypercall is called from domain 0, and we don't know which guest's + * vlpt is mapped in xen_second, so, to be sure, we restore vlpt here */ + restore_vlpt(d); + + spin_lock(&d->arch.dirty.lock); + + page_head = d->arch.dirty.head; + mvn_head = d->arch.dirty.mvn_head; + get_gma_start_end(d, &gma_start, &gma_end); + + while ( page_head ) + { + int i; + + for ( i = 0; i < mvn_head->items; ++i ) + { + unsigned int bit_offset; + int bitmap_index, bitmap_offset; + lpae_t *vlp2m_pte; + + bit_offset = third_linear_offset(mvn_head->vaddrs[i] - gma_start); + bitmap_index = bit_offset >> (PAGE_SHIFT + 3); + bitmap_offset = bit_offset & ((1ul << (PAGE_SHIFT + 3)) - 1); + + __set_bit(bitmap_offset, bitmap[bitmap_index]); + + vlp2m_pte = get_vlpt_3lvl_pte(mvn_head->vaddrs[i]); +
vlp2m_pte->p2m.write = 0; + } + + tmp_page = page_head; + page_head = mvn_head->next; + + tmp_mvn = mvn_head; + mvn_head = mvn_head->mvn_next; + + unmap_domain_page_global(tmp_mvn); + free_domheap_page(tmp_page); + } + + d->arch.dirty.head = NULL; + d->arch.dirty.mvn_head = NULL; + + spin_unlock(&d->arch.dirty.lock); +} + + +/* Change types across all p2m entries in a domain */ +static void p2m_change_entry_type_global(struct domain *d, enum mg nt) +{ + struct p2m_domain *p2m = &d->arch.p2m; + vaddr_t ram_base; + int i1, i2, i3; + int first_index, second_index, third_index; + lpae_t *first = __map_domain_page(p2m->first_level); + lpae_t pte, *second = NULL, *third = NULL; + + get_gma_start_end(d, &ram_base, NULL); + + first_index = first_table_offset((uint64_t)ram_base); + second_index = second_table_offset((uint64_t)ram_base); + third_index = third_table_offset((uint64_t)ram_base); + + BUG_ON( !first && "Can't map first level p2m." ); + + spin_lock(&p2m->lock); + + for ( i1 = first_index; i1 < LPAE_ENTRIES*2; ++i1 ) + { + lpae_walk_t first_pte = first[i1].walk; + if ( !first_pte.valid || !first_pte.table ) + goto out; + + second = map_domain_page(first_pte.base); + BUG_ON( !second && "Can't map second level p2m."); + for ( i2 = second_index; i2 < LPAE_ENTRIES; ++i2 ) + { + lpae_walk_t second_pte = second[i2].walk; + + /* a nasty hack for vlpt due to the difference + * of page table entry layout between p2m and pt */ + second[i2].pt.ro = 0; + + if ( !second_pte.valid || !second_pte.table ) + goto out; + + third = map_domain_page(second_pte.base); + BUG_ON( !third && "Can't map third level p2m."); + + for ( i3 = third_index; i3 < LPAE_ENTRIES; ++i3 ) + { + lpae_walk_t third_pte = third[i3].walk; + int write = 0; + if ( !third_pte.valid ) + goto out; + + pte = third[i3]; + if ( pte.p2m.write == 1 && nt == mg_ro ) + { + pte.p2m.write = 0; + write = 1; + } + else if ( pte.p2m.write == 0 && nt == mg_rw ) + { + pte.p2m.write = 1; + write = 1; + } + if ( write ) + write_pte(&third[i3], pte); + } + unmap_domain_page(third); + + third = NULL; + third_index = 0; + } + unmap_domain_page(second); + + second = NULL; + second_index = 0; + third_index = 0; + } + +out: + flush_tlb_all_local(); + if ( third ) unmap_domain_page(third); + if ( second ) unmap_domain_page(second); + if ( first ) unmap_domain_page(first); + + spin_unlock(&p2m->lock); +} + +/* Read a domain's log-dirty bitmap and stats. + * If the operation is a CLEAN, clear the bitmap and stats.
 */ +int log_dirty_op(struct domain *d, xen_domctl_shadow_op_t *sc) +{ + unsigned long gmfn_start; + unsigned long gmfn_end; + unsigned long gmfns; + unsigned int bitmap_pages; + int rc = 0, clean = 0, peek = 1; + uint8_t *bitmap[256]; /* bitmap[256] covers 32GB ram */ + int i; + + BUG_ON( !d->arch.map_domain.nr_banks ); + + gmfn_start = d->arch.map_domain.bank[0].start >> PAGE_SHIFT; + gmfn_end = domain_get_maximum_gpfn(d); + gmfns = gmfn_end - gmfn_start; + bitmap_pages = PFN_UP((gmfns + 7) / 8); + + if ( guest_handle_is_null(sc->dirty_bitmap) ) + { + peek = 0; + } + else + { + /* mapping to the bitmap from guest param */ + vaddr_t to = (vaddr_t)sc->dirty_bitmap.p; + + BUG_ON( to & ~PAGE_MASK && "offset not aligned to PAGE SIZE"); + + for ( i = 0; i < bitmap_pages; ++i ) + { + paddr_t g; + rc = gvirt_to_maddr(to, &g); + if ( rc ) + return rc; + bitmap[i] = map_domain_page(g>>PAGE_SHIFT); + memset(bitmap[i], 0x00, PAGE_SIZE); + to += PAGE_SIZE; + } + } + + clean = (sc->op == XEN_DOMCTL_SHADOW_OP_CLEAN); + sc->stats.dirty_count = d->arch.dirty.count; + + get_dirty_bitmap(d, bitmap); + if ( peek ) + { + for ( i = 0; i < bitmap_pages; ++i ) + { + unmap_domain_page(bitmap[i]); + } + } + + return 0; +} + +long dirty_mode_op(struct domain *d, xen_domctl_shadow_op_t *sc) +{ + long ret = 0; + switch (sc->op) + { + case XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY: + case XEN_DOMCTL_SHADOW_OP_OFF: + { + enum mg nt = sc->op == XEN_DOMCTL_SHADOW_OP_OFF ? mg_rw : mg_ro; + + d->arch.dirty.mode = sc->op == XEN_DOMCTL_SHADOW_OP_OFF ? 0 : 1; + p2m_change_entry_type_global(d, nt); + + if ( sc->op == XEN_DOMCTL_SHADOW_OP_OFF ) + cleanup_vlpt(d); + else + prepare_vlpt(d); + } + break; + + case XEN_DOMCTL_SHADOW_OP_CLEAN: + case XEN_DOMCTL_SHADOW_OP_PEEK: + { + ret = log_dirty_op(d, sc); + } + break; + + default: + return -ENOSYS; + } + return ret; +} + /* * Local variables: * mode: C diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c index 4c0fc32..3b78ed2 100644 --- a/xen/arch/arm/traps.c +++ b/xen/arch/arm/traps.c @@ -1313,6 +1313,8 @@ static void do_trap_data_abort_guest(struct cpu_user_regs *regs, const char *msg; int rc, level = -1; mmio_info_t info; + int page_fault = ((dabt.dfsc & FSC_MASK) == + (FSC_FLT_PERM + FSC_3D_LEVEL) && dabt.write); if ( !check_conditional_instr(regs, hsr) ) { @@ -1327,22 +1329,23 @@ static void do_trap_data_abort_guest(struct cpu_user_regs *regs, info.gva = READ_SYSREG64(FAR_EL2); #endif - if (dabt.s1ptw) + if ( dabt.s1ptw && !page_fault ) goto bad_data_abort; rc = gva_to_ipa(info.gva, &info.gpa); - if ( rc == -EFAULT ) + if ( rc == -EFAULT && !page_fault ) goto bad_data_abort; /* XXX: Decode the instruction if ISS is not valid */ - if ( !dabt.valid ) + if ( !dabt.valid && !page_fault ) goto bad_data_abort; /* * Erratum 766422: Thumb store translation fault to Hypervisor may * not have correct HSR Rt value.
*/ - if ( cpu_has_erratum_766422() && (regs->cpsr & PSR_THUMB) && dabt.write ) + if ( cpu_has_erratum_766422() && (regs->cpsr & PSR_THUMB) && dabt.write + && !page_fault) { rc = decode_instruction(regs, &info.dabt); if ( rc ) @@ -1358,6 +1361,16 @@ static void do_trap_data_abort_guest(struct cpu_user_regs *regs, return; } + /* handle permission fault on write */ + if ( page_fault ) + { + if ( current->domain->arch.dirty.mode == 0 ) + goto bad_data_abort; + + handle_page_fault(current->domain, info.gpa); + return; + } + bad_data_abort: msg = decode_fsc( dabt.dfsc, &level); diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h index 7e7743a..de349e9 100644 --- a/xen/include/asm-arm/domain.h +++ b/xen/include/asm-arm/domain.h @@ -114,6 +114,11 @@ struct arch_domain /* dirty-page tracing */ struct { + spinlock_t lock; + int mode; + unsigned int count; + struct page_info *head; /* maintain the mapped vaddrs */ + struct mapped_va_node *mvn_head; /* va of the head */ int second_lvl_start; /* for context switch */ int second_lvl_end; lpae_t *second_lvl; diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h index b9cbcf2..c0eade0 100644 --- a/xen/include/asm-arm/mm.h +++ b/xen/include/asm-arm/mm.h @@ -366,6 +366,10 @@ static inline lpae_t * get_vlpt_3lvl_pte(vaddr_t addr) (ipa_third << 3) ); } +/* dirty-page tracing */ +enum mg { mg_clear, mg_ro, mg_rw, mg_rx }; +void handle_page_fault(struct domain *d, paddr_t addr); + #endif /* __ARCH_ARM_MM__ */ /* * Local variables: diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h index c660820..0bac332 100644 --- a/xen/include/asm-arm/p2m.h +++ b/xen/include/asm-arm/p2m.h @@ -2,6 +2,7 @@ #define _XEN_P2M_H #include <xen/mm.h> +#include <public/domctl.h> struct domain; @@ -110,6 +111,9 @@ static inline int get_page_and_type(struct page_info *page, return rc; } +long dirty_mode_op(struct domain *d, xen_domctl_shadow_op_t *sc); +int add_mapped_vaddr(struct domain *d, unsigned long va); + #endif /* _XEN_P2M_H */ /* diff --git a/xen/include/asm-arm/processor.h b/xen/include/asm-arm/processor.h index 5294421..fced6ad 100644 --- a/xen/include/asm-arm/processor.h +++ b/xen/include/asm-arm/processor.h @@ -399,6 +399,8 @@ union hsr { #define FSC_CPR (0x3a) /* Coprocossor Abort */ #define FSC_LL_MASK (0x03<<0) +#define FSC_MASK (0x3f) /* Fault status mask */ +#define FSC_3D_LEVEL (0x03) /* Third level fault*/ /* Time counter hypervisor control register */ #define CNTHCTL_PA (1u<<0) /* Kernel/user access to physical counter */ -- 1.8.1.2
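On the toolstack side this hypercall is reached through xc_shadow_control(), exactly as patch 9 below uses it. A trimmed sketch of the enable-then-clean cycle (buffer plumbing as in save_memory(); the internal headers are an assumption about build context, mirroring xc_arm_migrate.c):

#include <xenctrl.h>
#include "xc_private.h"  /* DECLARE_HYPERCALL_BUFFER, NRPAGES */
#include "xc_bitops.h"   /* bitmap_size */

static int track_dirty_once(xc_interface *xch, uint32_t domid,
                            unsigned long mem_pages)
{
    DECLARE_HYPERCALL_BUFFER(unsigned long, to_send);
    int rc;

    /* write-protect guest RAM; faults start feeding the GPA list */
    if ( xc_shadow_control(xch, domid, XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY,
                           NULL, 0, NULL, 0, NULL) < 0 )
        return -1;

    to_send = xc_hypercall_buffer_alloc_pages(xch, to_send,
                                              NRPAGES(bitmap_size(mem_pages)));
    if ( !to_send )
        return -1;

    /* CLEAN: harvest the bitmap accumulated since enable (or the
     * previous CLEAN) and re-arm the write traps */
    rc = xc_shadow_control(xch, domid, XEN_DOMCTL_SHADOW_OP_CLEAN,
                           HYPERCALL_BUFFER(to_send), mem_pages,
                           NULL, 0, NULL);

    xc_hypercall_buffer_free_pages(xch, to_send,
                                   NRPAGES(bitmap_size(mem_pages)));
    return rc < 0 ? -1 : 0;
}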
Jaeyong Yoo
2013-Oct-04 04:44 UTC
[PATCH v4 9/9] xen/arm: Implement toolstack for xl restore/save and migrate
From: Alexey Sokolov <sokolov.a@samsung.com>

Implement the xl restore/save operations (which are also used for migrate) in xc_arm_migrate.c and make it compilable.

The overall process of save is the following:
 1) save guest parameters (i.e., memory map, console and store pfn, etc)
 2) save memory (if it is live, perform dirty-page tracing)
 3) save hvm states (i.e., gic, timer, etc)
 4) save vcpu registers (i.e., pc, sp, lr, etc)
The overall process of restore is the same as the one in save.

Signed-off-by: Alexey Sokolov <sokolov.a@samsung.com>
---
 config/arm32.mk | 1 +
 tools/libxc/Makefile | 5 +
 tools/libxc/xc_arm_migrate.c | 805 +++++++++++++++++++++++++++++++++++++++++++
 tools/misc/Makefile | 4 +
 4 files changed, 815 insertions(+)
 create mode 100644 tools/libxc/xc_arm_migrate.c

diff --git a/config/arm32.mk b/config/arm32.mk index aa79d22..01374c9 100644 --- a/config/arm32.mk +++ b/config/arm32.mk @@ -1,6 +1,7 @@ CONFIG_ARM := y CONFIG_ARM_32 := y CONFIG_ARM_$(XEN_OS) := y +CONFIG_MIGRATE := y CONFIG_XEN_INSTALL_SUFFIX := diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile index 512a994..05dfef4 100644 --- a/tools/libxc/Makefile +++ b/tools/libxc/Makefile @@ -42,8 +42,13 @@ CTRL_SRCS-$(CONFIG_MiniOS) += xc_minios.c GUEST_SRCS-y := GUEST_SRCS-y += xg_private.c xc_suspend.c ifeq ($(CONFIG_MIGRATE),y) +ifeq ($(CONFIG_X86),y) GUEST_SRCS-y += xc_domain_restore.c xc_domain_save.c GUEST_SRCS-y += xc_offline_page.c xc_compression.c +endif +ifeq ($(CONFIG_ARM),y) +GUEST_SRCS-y += xc_arm_migrate.c +endif else GUEST_SRCS-y += xc_nomigrate.c endif diff --git a/tools/libxc/xc_arm_migrate.c b/tools/libxc/xc_arm_migrate.c new file mode 100644 index 0000000..52f06b5 --- /dev/null +++ b/tools/libxc/xc_arm_migrate.c @@ -0,0 +1,805 @@ +/****************************************************************************** + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; + * version 2.1 of the License. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details.
+ * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + * + * Copyright (c) 2013, Samsung Electronics + */ + +#include <inttypes.h> +#include <errno.h> +#include <xenctrl.h> +#include <xenguest.h> + +#include <unistd.h> +#include <xc_private.h> +#include <xc_dom.h> +#include "xc_bitops.h" +#include "xg_private.h" + +#define DEF_MAX_ITERS 29 /* limit us to 30 times round loop */ +#define DEF_MAX_FACTOR 3 /* never send more than 3x p2m_size */ +#define DEF_MIN_DIRTY_PER_ITER 50 /* dirty page count to define last iter */ +#define DEF_PROGRESS_RATE 50 /* progress bar update rate */ + +/* Enable this macro for debug only: "static" migration instead of live */ +/* +#define DISABLE_LIVE_MIGRATION +*/ + +/* Enable this macro for debug only: additional debug info */ +#define ARM_MIGRATE_VERBOSE + +/* + * Guest params to save: used HVM params, save flags, memory map + */ +typedef struct guest_params +{ + unsigned long console_pfn; + unsigned long store_pfn; + uint32_t flags; + struct dt_mem_info memmap; /* Memory map */ + uint32_t max_vcpu_id; +} guest_params_t; + +static int suspend_and_state(int (*suspend)(void*), void *data, + xc_interface *xch, int dom) +{ + xc_dominfo_t info; + if ( !(*suspend)(data) ) + { + ERROR("Suspend request failed"); + return -1; + } + + if ( (xc_domain_getinfo(xch, dom, 1, &info) != 1) || + !info.shutdown || (info.shutdown_reason != SHUTDOWN_suspend) ) + { + ERROR("Domain is not in suspended state after suspend attempt"); + return -1; + } + + return 0; +} + +static int write_exact_handled(xc_interface *xch, int fd, const void *data, + size_t size) +{ + if ( write_exact(fd, data, size) ) + { + ERROR("Write failed, check space"); + return -1; + } + return 0; +} + +/* ============ Memory ============= */ +static int save_memory(xc_interface *xch, int io_fd, uint32_t dom, + struct save_callbacks *callbacks, + uint32_t max_iters, uint32_t max_factor, + guest_params_t *params) +{ + int live = !!(params->flags & XCFLAGS_LIVE); + int debug = !!(params->flags & XCFLAGS_DEBUG); + xen_pfn_t i; + char reportbuf[80]; + int iter = 0; + int last_iter = !live; + int total_dirty_pages_num = 0; + int dirty_pages_on_prev_iter_num = 0; + int count = 0; + char *page = 0; + xen_pfn_t *busy_pages = 0; + int busy_pages_count = 0; + int busy_pages_max = 256; + + DECLARE_HYPERCALL_BUFFER(unsigned long, to_send); + + /* We suppose that guest''s memory base is the first region base */ + xen_pfn_t start = (params->memmap.bank[0].start >> PAGE_SHIFT); + const xen_pfn_t end = xc_domain_maximum_gpfn(xch, dom); + const xen_pfn_t mem_size = end - start; + if ( debug ) + { + IPRINTF("(save mem) start=%llx end=%llx!\n", start, end); + } + + if ( write_exact_handled(xch, io_fd, &end, sizeof(xen_pfn_t)) ) + return -1; + + if ( live ) + { + if ( xc_shadow_control(xch, dom, XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY, + NULL, 0, NULL, 0, NULL) < 0 ) + { + ERROR("Couldn''t enable log-dirty mode !\n"); + return -1; + } + + max_iters = max_iters ? : DEF_MAX_ITERS; + max_factor = max_factor ? 
: DEF_MAX_FACTOR; + + if ( debug ) + IPRINTF("Log-dirty mode enabled, max_iters=%d, max_factor=%d!\n", + max_iters, max_factor); + } + + to_send = xc_hypercall_buffer_alloc_pages(xch, to_send, + NRPAGES(bitmap_size(mem_size))); + if ( !to_send ) + { + ERROR("Couldn''t allocate to_send array!\n"); + return -1; + } + + /* send all pages on first iter */ + memset(to_send, 0xff, bitmap_size(mem_size)); + + for ( ; ; ) + { + int dirty_pages_on_current_iter_num = 0; + int frc; + iter++; + + snprintf(reportbuf, sizeof(reportbuf), + "Saving memory: iter %d (last sent %u)", + iter, dirty_pages_on_prev_iter_num); + + xc_report_progress_start(xch, reportbuf, mem_size); + + if ( (iter > 1 && + dirty_pages_on_prev_iter_num < DEF_MIN_DIRTY_PER_ITER) || + (iter == max_iters) || + (total_dirty_pages_num >= mem_size*max_factor) ) + { + if ( debug ) + IPRINTF("Last iteration"); + last_iter = 1; + } + + if ( last_iter ) + { + if ( suspend_and_state(callbacks->suspend, callbacks->data, + xch, dom) ) + { + ERROR("Domain appears not to have suspended"); + return -1; + } + } + if ( live && iter > 1 ) + { + frc = xc_shadow_control(xch, dom, XEN_DOMCTL_SHADOW_OP_CLEAN, + HYPERCALL_BUFFER(to_send), mem_size, + NULL, 0, NULL); + if ( frc != mem_size ) + { + ERROR("Error peeking shadow bitmap"); + xc_hypercall_buffer_free_pages(xch, to_send, + NRPAGES(bitmap_size(mem_size))); + return -1; + } + } + + busy_pages = malloc(sizeof(xen_pfn_t) * busy_pages_max); + + for ( i = start; i < end; ++i ) + { + if ( test_bit(i - start, to_send) ) + { + page = xc_map_foreign_range(xch, dom, PAGE_SIZE, PROT_READ, i); + if ( !page ) + { + /* This page is mapped elsewhere, should be resent later */ + busy_pages[busy_pages_count] = i; + busy_pages_count++; + if ( busy_pages_count >= busy_pages_max ) + { + busy_pages_max += 256; + busy_pages = realloc(busy_pages, sizeof(xen_pfn_t) * + busy_pages_max); + } + continue; + } + + if ( write_exact_handled(xch, io_fd, &i, sizeof(i)) || + write_exact_handled(xch, io_fd, page, PAGE_SIZE) ) + { + munmap(page, PAGE_SIZE); + free(busy_pages); + return -1; + } + count++; + munmap(page, PAGE_SIZE); + + if ( (i % DEF_PROGRESS_RATE) == 0 ) + xc_report_progress_step(xch, i - start, mem_size); + dirty_pages_on_current_iter_num++; + } + } + + while ( busy_pages_count ) + { + /* Send busy pages */ + busy_pages_count--; + i = busy_pages[busy_pages_count]; + if ( test_bit(i - start, to_send) ) + { + page = xc_map_foreign_range(xch, dom, PAGE_SIZE,PROT_READ, i); + if ( !page ) + { + IPRINTF("WARNING: 2nd attempt to save page " + "busy failed pfn=%llx", i); + continue; + } + + if ( debug ) + { + IPRINTF("save mem: resend busy page %llx\n", i); + } + + if ( write_exact_handled(xch, io_fd, &i, sizeof(i)) || + write_exact_handled(xch, io_fd, page, PAGE_SIZE) ) + { + munmap(page, PAGE_SIZE); + free(busy_pages); + return -1; + } + count++; + munmap(page, PAGE_SIZE); + dirty_pages_on_current_iter_num++; + } + } + free(busy_pages); + + if ( debug ) + IPRINTF("Dirty pages=%d", dirty_pages_on_current_iter_num); + + xc_report_progress_step(xch, mem_size, mem_size); + + dirty_pages_on_prev_iter_num = dirty_pages_on_current_iter_num; + total_dirty_pages_num += dirty_pages_on_current_iter_num; + + if ( last_iter ) + { + xc_hypercall_buffer_free_pages(xch, to_send, + NRPAGES(bitmap_size(mem_size))); + if ( live ) + { + if ( xc_shadow_control(xch, dom, XEN_DOMCTL_SHADOW_OP_OFF, + NULL, 0, NULL, 0, NULL) < 0 ) + ERROR("Couldn''t disable log-dirty mode"); + } + break; + } + } + if ( debug ) + { + IPRINTF("save mem: pages count 
= %d\n", count); + } + + i = (xen_pfn_t) -1; /* end page marker */ + return write_exact_handled(xch, io_fd, &i, sizeof(i)); +} + +static int restore_memory(xc_interface *xch, int io_fd, uint32_t dom, + guest_params_t *params) +{ + xen_pfn_t end; + xen_pfn_t gpfn; + int debug = !!(params->flags & XCFLAGS_DEBUG); + int count = 0; + char *page; + + /* We suppose that guest''s memory base is the first region base */ + xen_pfn_t start = (params->memmap.bank[0].start >> PAGE_SHIFT); + + if ( read_exact(io_fd, &end, sizeof(xen_pfn_t)) ) + { + PERROR("First read of incoming memory failed"); + return -1; + } + + /* TODO allocate several pages per call */ + for ( gpfn = start; gpfn < end; ++gpfn ) + { + if ( xc_domain_populate_physmap_exact(xch, dom, 1, 0, 0, &gpfn) ) + { + PERROR("Memory allocation for a new domain failed"); + return -1; + } + } + while ( 1 ) + { + + if ( read_exact(io_fd, &gpfn, sizeof(gpfn)) ) + { + PERROR("GPFN read failed during memory transfer, count=%d", count); + return -1; + } + if ( gpfn == (xen_pfn_t) -1 ) break; /* end page marker */ + + if ( gpfn < start || gpfn >= end ) + { + ERROR("GPFN %llx doesn''t belong to RAM address space, count=%d", + gpfn, count); + return -1; + } + page = xc_map_foreign_range(xch, dom, PAGE_SIZE, + PROT_READ | PROT_WRITE, gpfn); + if ( !page ) + { + PERROR("xc_map_foreign_range failed, pfn=%llx", gpfn); + return -1; + } + if ( read_exact(io_fd, page, PAGE_SIZE) ) + { + PERROR("Page data read failed during memory transfer, pfn=%llx", + gpfn); + return -1; + } + munmap(page, PAGE_SIZE); + count++; + } + + if ( debug ) + { + IPRINTF("Memory restored, pages count=%d", count); + } + return 0; +} + +/* ============ HVM context =========== */ +static int save_armhvm(xc_interface *xch, int io_fd, uint32_t dom, int debug) +{ + /* HVM: a buffer for holding HVM context */ + uint32_t hvm_buf_size = 0; + uint8_t *hvm_buf = NULL; + uint32_t rec_size; + int retval = -1; + + /* Need another buffer for HVM context */ + hvm_buf_size = xc_domain_hvm_getcontext(xch, dom, 0, 0); + if ( hvm_buf_size == -1 ) + { + ERROR("Couldn''t get HVM context size from Xen"); + goto out; + } + hvm_buf = malloc(hvm_buf_size); + + if ( !hvm_buf ) + { + ERROR("Couldn''t allocate memory for hvm buffer"); + goto out; + } + + /* Get HVM context from Xen and save it too */ + if ( (rec_size = xc_domain_hvm_getcontext(xch, dom, hvm_buf, + hvm_buf_size)) == -1 ) + { + ERROR("HVM:Could not get hvm buffer"); + goto out; + } + + if ( debug ) + IPRINTF("HVM save size %d %d", hvm_buf_size, rec_size); + + if ( write_exact_handled(xch, io_fd, &rec_size, sizeof(uint32_t)) ) + goto out; + + if ( write_exact_handled(xch, io_fd, hvm_buf, rec_size) ) + { + goto out; + } + retval = 0; + +out: + if ( hvm_buf ) + free (hvm_buf); + return retval; +} + +static int restore_armhvm(xc_interface *xch, int io_fd, + uint32_t dom, int debug) +{ + uint32_t rec_size; + uint32_t hvm_buf_size = 0; + uint8_t *hvm_buf = NULL; + int frc = 0; + int retval = -1; + + if ( read_exact(io_fd, &rec_size, sizeof(uint32_t)) ) + { + PERROR("Could not read HVM size"); + goto out; + } + + if ( !rec_size ) + { + ERROR("Zero HVM size"); + goto out; + } + + hvm_buf_size = xc_domain_hvm_getcontext(xch, dom, 0, 0); + if ( hvm_buf_size != rec_size ) + { + ERROR("HVM size for this domain is not the same as stored"); + } + + hvm_buf = malloc(hvm_buf_size); + if ( !hvm_buf ) + { + ERROR("Couldn''t allocate memory"); + goto out; + } + + if ( read_exact(io_fd, hvm_buf, hvm_buf_size) ) + { + PERROR("Could not read HVM context"); + goto out; 
+ } + + frc = xc_domain_hvm_setcontext(xch, dom, hvm_buf, hvm_buf_size); + if ( frc ) + { + ERROR("error setting the HVM context"); + goto out; + } + retval = 0; + + if ( debug ) + { + IPRINTF("HVM restore size %d %d", hvm_buf_size, rec_size); + } +out: + if ( hvm_buf ) + free (hvm_buf); + return retval; +} + +/* ================= Console & Xenstore & Memory map =========== */ +static int save_guest_params(xc_interface *xch, int io_fd, + uint32_t dom, uint32_t flags, + guest_params_t *params) +{ + size_t sz = sizeof(guest_params_t); + xc_dominfo_t dom_info; + + if ( xc_domain_get_memory_map(xch, dom, ¶ms->memmap) ) + { + ERROR("Can''t get memory map"); + return -1; + } + + if ( flags & XCFLAGS_DEBUG ) + { + IPRINTF("Guest param save size: %d ", sz); + IPRINTF("Guest memory map save %d entries", params->memmap.nr_banks); + } + + if ( xc_get_hvm_param(xch, dom, HVM_PARAM_CONSOLE_PFN, + ¶ms->console_pfn) ) + { + ERROR("Can''t get console gpfn"); + return -1; + } + + if ( xc_get_hvm_param(xch, dom, HVM_PARAM_STORE_PFN, ¶ms->store_pfn) ) + { + ERROR("Can''t get store gpfn"); + return -1; + } + + if ( xc_domain_getinfo(xch, dom, 1, &dom_info ) < 0) + { + ERROR("Can''t get domain info for dom %d", dom); + return -1; + } + params->max_vcpu_id = dom_info.max_vcpu_id; + + params->flags = flags; + + if ( write_exact_handled(xch, io_fd, params, sz) ) + { + return -1; + } + + return 0; +} + +static int restore_guest_params(xc_interface *xch, int io_fd, + uint32_t dom, guest_params_t *params) +{ + size_t sz = sizeof(guest_params_t); + + if ( read_exact(io_fd, params, sizeof(guest_params_t)) ) + { + PERROR("Can''t read guest params"); + return -1; + } + + if ( params->flags & XCFLAGS_DEBUG ) + { + IPRINTF("Guest param restore size: %d ", sz); + IPRINTF("Guest memory map restore %d entries", + params->memmap.nr_banks); + } + + if ( xc_domain_set_memory_map(xch, dom, ¶ms->memmap) ) + { + ERROR("Can''t set memory map"); + return -1; + } + /* Set max. 
number of vcpus as max_vcpu_id + 1 */ + if ( xc_domain_max_vcpus(xch, dom, params->max_vcpu_id + 1) ) + { + ERROR("Can''t set max vcpu number for domain"); + return -1; + } + + return 0; +} + +static int set_guest_params(xc_interface *xch, int io_fd, uint32_t dom, + guest_params_t *params, unsigned int console_evtchn, + domid_t console_domid, unsigned int store_evtchn, + domid_t store_domid) +{ + int rc = 0; + + if ( (rc = xc_clear_domain_page(xch, dom, params->console_pfn)) ) + { + ERROR("Can''t clear console page"); + return rc; + } + + if ( (rc = xc_clear_domain_page(xch, dom, params->store_pfn)) ) + { + ERROR("Can''t clear xenstore page"); + return rc; + } + + if ( (rc = xc_dom_gnttab_hvm_seed(xch, dom, params->console_pfn, + params->store_pfn, console_domid, + store_domid)) ) + { + ERROR("Can''t grant console and xenstore pages"); + return rc; + } + + if ( (rc = xc_set_hvm_param(xch, dom, HVM_PARAM_CONSOLE_PFN, + params->console_pfn)) ) + { + ERROR("Can''t set console gpfn"); + return rc; + } + + if ( (rc = xc_set_hvm_param(xch, dom, HVM_PARAM_STORE_PFN, + params->store_pfn)) ) + { + ERROR("Can''t set xenstore gpfn"); + return rc; + } + + if ( (rc = xc_set_hvm_param(xch, dom, HVM_PARAM_CONSOLE_EVTCHN, + console_evtchn)) ) + { + ERROR("Can''t set console event channel"); + return rc; + } + + if ( (rc = xc_set_hvm_param(xch, dom, HVM_PARAM_STORE_EVTCHN, + store_evtchn)) ) + { + ERROR("Can''t set xenstore event channel"); + return rc; + } + return 0; +} + +/* ====================== VCPU ============== */ +static int save_vcpus(xc_interface *xch, int io_fd, uint32_t dom, + guest_params_t *params) +{ + vcpu_guest_context_any_t ctxt; + int rc, nr_online = 0; + uint32_t i; + xc_vcpuinfo_t info; + int debug = !!(params->flags & XCFLAGS_DEBUG); + + for ( i = 0; i<= params->max_vcpu_id; i++ ) + { + rc = xc_vcpu_getinfo(xch, dom, i, &info); + /* Store contexts for online vcpus only; + * vcpu id is also stored before */ + if ( rc || info.online == 0 ) continue; + if ( xc_vcpu_getcontext(xch, dom, i, &ctxt) ) continue; + rc = write_exact_handled(xch, io_fd, &i, sizeof(i)); + if ( rc ) return rc; + rc = write_exact_handled(xch, io_fd, &ctxt, sizeof(ctxt)); + if ( rc ) return rc; + nr_online++; + } + /* end marker */ + i = params->max_vcpu_id + 1; + rc = write_exact_handled(xch, io_fd, &i, sizeof(i)); + if ( rc ) return rc; + if ( debug ) + IPRINTF("save: max_vcpu=%d, nr_online=%d\n", i, nr_online); + return 0; +} + +static int restore_vcpus(xc_interface *xch, int io_fd, uint32_t dom, + guest_params_t *params) +{ + int rc = 0; + uint32_t i; + int nr_online = 0; + int debug = !!(params->flags & XCFLAGS_DEBUG); + + DECLARE_DOMCTL; + DECLARE_HYPERCALL_BUFFER(vcpu_guest_context_any_t, ctxt); + + ctxt = xc_hypercall_buffer_alloc(xch, ctxt, sizeof(*ctxt)); + + while ( 1 ) + { + memset(ctxt, 0, sizeof(*ctxt)); + + if ( read_exact(io_fd, &i, sizeof(i)) ) + { + PERROR("VCPU context read failed"); + rc = -1; + break; + } + if ( i > params->max_vcpu_id ) break; + if ( read_exact(io_fd, ctxt, sizeof(*ctxt)) ) + { + PERROR("VCPU context read failed"); + rc = -1; + break; + } + nr_online++; + memset(&domctl, 0, sizeof(domctl)); + domctl.cmd = XEN_DOMCTL_setvcpucontext; + domctl.domain = dom; + domctl.u.vcpucontext.vcpu = i; + set_xen_guest_handle(domctl.u.vcpucontext.ctxt, ctxt); + rc = do_domctl(xch, &domctl); + if ( rc ) + { + ERROR("VCPU context set failed (error %d)", rc); + break; + } + } + if ( debug ) + IPRINTF("restore: max_vcpu=%d, nr_online=%d\n", + params->max_vcpu_id + 1, nr_online); + + 
xc_hypercall_buffer_free(xch, ctxt); + return rc; +} + +/* ================== Main ============== */ +int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, + uint32_t max_iters, uint32_t max_factor, uint32_t flags, + struct save_callbacks *callbacks, int hvm, + unsigned long vm_generationid_addr) +{ + int debug; + guest_params_t params; + +#ifdef ARM_MIGRATE_VERBOSE + flags |= XCFLAGS_DEBUG; +#endif + +#ifdef DISABLE_LIVE_MIGRATION + flags &= ~(XCFLAGS_LIVE); +#endif + + debug = !!(flags & XCFLAGS_DEBUG); + if ( save_guest_params(xch, io_fd, dom, flags, ¶ms) ) + { + ERROR("Can''t save guest params"); + return -1; + } + + if ( save_memory(xch, io_fd, dom, callbacks, max_iters, + max_factor, ¶ms) ) + { + ERROR("Memory not saved"); + return -1; + } + + if ( save_armhvm(xch, io_fd, dom, debug) ) + { + ERROR("HVM not saved"); + return -1; + } + + if ( save_vcpus(xch, io_fd, dom, ¶ms) ) + { + ERROR("VCPU(s) not saved"); + return -1; + } + if ( debug ) + { + IPRINTF("Domain %d saved", dom); + } + return 0; +} + +int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom, + unsigned int store_evtchn, unsigned long *store_gpfn, + domid_t store_domid, unsigned int console_evtchn, + unsigned long *console_gpfn, domid_t console_domid, + unsigned int hvm, unsigned int pae, int superpages, + int no_incr_generationid, + unsigned long *vm_generationid_addr, + struct restore_callbacks *callbacks) +{ + guest_params_t params; + int debug = 1; + + if ( restore_guest_params(xch, io_fd, dom, ¶ms) ) + { + ERROR("Can''t restore guest params"); + return -1; + } + debug = !!(params.flags & XCFLAGS_DEBUG); + + if ( restore_memory(xch, io_fd, dom, ¶ms) ) + { + ERROR("Can''t restore memory"); + return -1; + } + if ( set_guest_params(xch, io_fd, dom, ¶ms, + console_evtchn, console_domid, + store_evtchn, store_domid) ) + { + ERROR("Can''t setup guest params"); + return -1; + } + + /* Setup console and store PFNs to caller */ + *console_gpfn = params.console_pfn; + *store_gpfn = params.store_pfn; + + if ( restore_armhvm(xch, io_fd, dom, debug) ) + { + ERROR("HVM not restored"); + return -1; + } + + if ( restore_vcpus(xch, io_fd, dom, ¶ms) ) + { + ERROR("Can''t restore VCPU(s)"); + return -1; + } + + if ( debug ) + { + IPRINTF("Domain %d restored", dom); + } + + return 0; +} + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/tools/misc/Makefile b/tools/misc/Makefile index 17aeda5..29a1f66 100644 --- a/tools/misc/Makefile +++ b/tools/misc/Makefile @@ -11,7 +11,9 @@ HDRS = $(wildcard *.h) TARGETS-y := xenperf xenpm xen-tmem-list-parse gtraceview gtracestat xenlockprof xenwatchdogd xencov TARGETS-$(CONFIG_X86) += xen-detect xen-hvmctx xen-hvmcrash xen-lowmemd xen-mfndump +ifeq ($(CONFIG_X86),y) TARGETS-$(CONFIG_MIGRATE) += xen-hptool +endif TARGETS := $(TARGETS-y) SUBDIRS := $(SUBDIRS-y) @@ -23,7 +25,9 @@ INSTALL_BIN := $(INSTALL_BIN-y) INSTALL_SBIN-y := xen-bugtool xen-python-path xenperf xenpm xen-tmem-list-parse gtraceview \ gtracestat xenlockprof xenwatchdogd xen-ringwatch xencov INSTALL_SBIN-$(CONFIG_X86) += xen-hvmctx xen-hvmcrash xen-lowmemd xen-mfndump +ifeq ($(CONFIG_X86),y) INSTALL_SBIN-$(CONFIG_MIGRATE) += xen-hptool +endif INSTALL_SBIN := $(INSTALL_SBIN-y) INSTALL_PRIVBIN-y := xenpvnetboot -- 1.8.1.2
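For orientation, the byte stream produced by the save path above has the
layout sketched below (informational only, reconstructed from the code in
this patch; braces denote a repeated record):

/*
 * guest_params_t params;            memory map, console/store PFNs,
 *                                   flags, max_vcpu_id
 * xen_pfn_t end;                    from xc_domain_maximum_gpfn()
 * { xen_pfn_t gpfn;                 page records, repeated across the
 *   uint8_t data[PAGE_SIZE]; }      live iterations, terminated by
 *                                   gpfn == (xen_pfn_t)-1
 * uint32_t rec_size;                HVM context size
 * uint8_t hvm_buf[rec_size];        HVM context (gic, timer, vfp)
 * { uint32_t vcpu_id;               one per online vcpu, terminated by
 *   vcpu_guest_context_any_t c; }   vcpu_id == max_vcpu_id + 1
 */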
Julien Grall
2013-Oct-07 12:49 UTC
Re: [PATCH v4 1/9] xen/arm: Implement hvm save and restore
On 10/04/2013 05:43 AM, Jaeyong Yoo wrote:> Implement save/restore of hvm context hypercall. In hvm > context save/restore, we save gic, timer and vfp registers. > > Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com> > --- > xen/arch/arm/Makefile | 1 + > xen/arch/arm/domctl.c | 88 ++++++++++++- > xen/arch/arm/hvm.c | 228 +++++++++++++++++++++++++++++++++ > xen/arch/arm/save.c | 66 ++++++++++ > xen/common/Makefile | 2 + > xen/include/asm-arm/hvm/support.h | 29 +++++ > xen/include/public/arch-arm/hvm/save.h | 66 ++++++++++ > 7 files changed, 479 insertions(+), 1 deletion(-) > create mode 100644 xen/arch/arm/save.c > create mode 100644 xen/include/asm-arm/hvm/support.h > > diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile > index 003ac84..8910a6c 100644 > --- a/xen/arch/arm/Makefile > +++ b/xen/arch/arm/Makefile > @@ -31,6 +31,7 @@ obj-y += vuart.o > obj-y += hvm.o > obj-y += device.o > obj-y += decode.o > +obj-y += save.o > > #obj-bin-y += ....o > > diff --git a/xen/arch/arm/domctl.c b/xen/arch/arm/domctl.c > index 851ee40..86c07de 100644 > --- a/xen/arch/arm/domctl.c > +++ b/xen/arch/arm/domctl.c > @@ -9,12 +9,98 @@ > #include <xen/lib.h> > #include <xen/errno.h> > #include <xen/sched.h> > +#include <xen/hvm/save.h> > +#include <xen/guest_access.h> > #include <public/domctl.h> > > long arch_do_domctl(struct xen_domctl *domctl, struct domain *d, > XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl) > { > - return -ENOSYS; > + long ret = 0; > + bool_t copyback = 0; > + > + switch ( domctl->cmd ) > + { > + case XEN_DOMCTL_sethvmcontext: > + { > + struct hvm_domain_context c = { .size = domctl->u.hvmcontext.size }; > + > + ret = -ENOMEM; > + if ( (c.data = xmalloc_bytes(c.size)) == NULL ) > + goto sethvmcontext_out; > + > + ret = -EFAULT; > + if ( copy_from_guest(c.data, domctl->u.hvmcontext.buffer, c.size) != 0 ) > + goto sethvmcontext_out; > + > + domain_pause(d); > + ret = hvm_load(d, &c); > + domain_unpause(d); > + > + sethvmcontext_out: > + if ( c.data != NULL ) > + xfree(c.data); > + } > + break; > + case XEN_DOMCTL_gethvmcontext: > + { > + struct hvm_domain_context c = { 0 }; > + > + ret = -EINVAL; > + > + c.size = hvm_save_size(d); > + > + if ( guest_handle_is_null(domctl->u.hvmcontext.buffer) ) > + { > + /* Client is querying for the correct buffer size */ > + domctl->u.hvmcontext.size = c.size; > + ret = 0; > + goto gethvmcontext_out; > + } > + /* Check that the client has a big enough buffer */ > + ret = -ENOSPC; > + if ( domctl->u.hvmcontext.size < c.size ) > + { > + printk("(gethvmcontext) size error: %d and %d\n", > + domctl->u.hvmcontext.size, c.size ); > + goto gethvmcontext_out; > + } > + > + /* Allocate our own marshalling buffer */ > + ret = -ENOMEM; > + if ( (c.data = xmalloc_bytes(c.size)) == NULL ) > + { > + printk("(gethvmcontext) xmalloc_bytes failed: %d\n", c.size ); > + goto gethvmcontext_out; > + } > + > + domain_pause(d); > + ret = hvm_save(d, &c); > + domain_unpause(d); > + > + domctl->u.hvmcontext.size = c.cur; > + if ( copy_to_guest(domctl->u.hvmcontext.buffer, c.data, c.size) != 0 ) > + { > + printk("(gethvmcontext) copy to guest failed\n"); > + ret = -EFAULT; > + } > + > + gethvmcontext_out: > + copyback = 1; > + > + if ( c.data != NULL ) > + xfree(c.data); > + } > + break; > + > + default: > + return -EINVAL; > + } > + > + if ( copyback && __copy_to_guest(u_domctl, domctl, 1) ) > + ret = -EFAULT; > + > + return ret; > } > > void arch_get_info_guest(struct vcpu *v, vcpu_guest_context_u c) > diff --git a/xen/arch/arm/hvm.c b/xen/arch/arm/hvm.c > index 
471c4cd..befd75d 100644 > --- a/xen/arch/arm/hvm.c > +++ b/xen/arch/arm/hvm.c > @@ -7,11 +7,13 @@ > > #include <xsm/xsm.h> > > +#include <xen/hvm/save.h> > #include <public/xen.h> > #include <public/hvm/params.h> > #include <public/hvm/hvm_op.h> > > #include <asm/hypercall.h> > +#include <asm/gic.h> > > long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg) > > @@ -65,3 +67,229 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg) > > return rc; > } > + > +static void vgic_irq_rank_save(struct vgic_rank *ext, > + struct vgic_irq_rank *rank) > +{ > + spin_lock(&rank->lock); > + /* Some of VGIC registers are not used yet, it is for a future usage */ > + /* IENABLE, IACTIVE, IPEND, PENDSGI registers */ > + ext->ienable = rank->ienable; > + ext->iactive = rank->iactive; > + ext->ipend = rank->ipend; > + ext->pendsgi = rank->pendsgi; > + /* ICFG */ > + ext->icfg[0] = rank->icfg[0]; > + ext->icfg[1] = rank->icfg[1];Can you use memcpy?> + /* IPRIORITY */ > + memcpy(ext->ipriority, rank->ipriority, sizeof(rank->ipriority)); > + /* ITARGETS */ > + memcpy(ext->itargets, rank->itargets, sizeof(rank->itargets)); > + spin_unlock(&rank->lock); > +} > + > +static void vgic_irq_rank_restore(struct vgic_irq_rank *rank, > + struct vgic_rank *ext) > +{ > + spin_lock(&rank->lock); > + /* IENABLE, IACTIVE, IPEND, PENDSGI registers */ > + rank->ienable = ext->ienable; > + rank->iactive = ext->iactive; > + rank->ipend = ext->ipend; > + rank->pendsgi = ext->pendsgi; > + /* ICFG */ > + rank->icfg[0] = ext->icfg[0]; > + rank->icfg[1] = ext->icfg[1];Same here.> + /* IPRIORITY */ > + memcpy(rank->ipriority, ext->ipriority, sizeof(rank->ipriority)); > + /* ITARGETS */ > + memcpy(rank->itargets, ext->itargets, sizeof(rank->itargets)); > + spin_unlock(&rank->lock); > +} > + > + > +static int gic_save(struct domain *d, hvm_domain_context_t *h) > +{ > + struct hvm_hw_gic ctxt; > + struct vcpu *v; > + > + /* Save the state of GICs */ > + for_each_vcpu( d, v ) > + { > + ctxt.gic_hcr = v->arch.gic_hcr; > + ctxt.gic_vmcr = v->arch.gic_vmcr; > + ctxt.gic_apr = v->arch.gic_apr; > + > + /* Save list registers and masks */ > + /* (it is not necessary to save/restore them, but LR state can have > + * influence on downtime after Live Migration (to be tested) > + */ > + memcpy(ctxt.gic_lr, v->arch.gic_lr, sizeof(ctxt.gic_lr)); > + ctxt.lr_mask = v->arch.lr_mask; > + ctxt.event_mask = v->arch.event_mask; > + > + /* Save PPI states (per-CPU) */ > + /* It is necessary if SMP enabled */ > + vgic_irq_rank_save(&ctxt.ppi_state, &v->arch.vgic.private_irqs); > + > + if ( hvm_save_entry(GIC, v->vcpu_id, h, &ctxt) != 0 ) > + return 1; > + } > + return 0; > +} > + > +static int gic_load(struct domain *d, hvm_domain_context_t *h) > +{ > + int vcpuid; > + struct hvm_hw_gic ctxt; > + struct vcpu *v; > + > + /* Which vcpu is this? 
*/ > + vcpuid = hvm_load_instance(h); > + if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL ) > + { > + dprintk(XENLOG_G_ERR, "HVM restore: dom%u has no vcpu%u\n", > + d->domain_id, vcpuid); > + return -EINVAL; > + } > + > + if ( hvm_load_entry(GIC, h, &ctxt) != 0 ) > + return -EINVAL; > + > + v->arch.gic_hcr = ctxt.gic_hcr; > + v->arch.gic_vmcr = ctxt.gic_vmcr; > + v->arch.gic_apr = ctxt.gic_apr; > + > + /* Restore list registers and masks */ > + memcpy(v->arch.gic_lr, ctxt.gic_lr, sizeof(ctxt.gic_lr)); > + v->arch.lr_mask = ctxt.lr_mask; > + v->arch.event_mask = ctxt.event_mask; > + > + /* Restore PPI states */ > + vgic_irq_rank_restore(&v->arch.vgic.private_irqs, &ctxt.ppi_state); > + > + return 0; > +} > + > +HVM_REGISTER_SAVE_RESTORE(GIC, gic_save, gic_load, 1, HVMSR_PER_VCPU); > + > +static int timer_save(struct domain *d, hvm_domain_context_t *h) > +{ > + struct hvm_hw_timer ctxt; > + struct vcpu *v; > + struct vtimer *t; > + int i; > + > + /* Save the state of vtimer and ptimer */ > + for_each_vcpu( d, v ) > + { > + t = &v->arch.virt_timer; > + for ( i = 0; i < 2; i++ ) > + { > + ctxt.cval = t->cval; > + ctxt.ctl = t->ctl; > + ctxt.vtb_offset = i ? d->arch.phys_timer_base.offset : > + d->arch.virt_timer_base.offset; > + ctxt.type = i; > + if ( hvm_save_entry(A15_TIMER, v->vcpu_id, h, &ctxt) != 0 ) > + return 1; > + t = &v->arch.phys_timer; > + } > + } > + > + return 0; > +} > + > +static int timer_load(struct domain *d, hvm_domain_context_t *h) > +{ > + int vcpuid; > + struct hvm_hw_timer ctxt; > + struct vcpu *v; > + struct vtimer *t = NULL; > + > + /* Which vcpu is this? */ > + vcpuid = hvm_load_instance(h); > + > + if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL ) > + { > + dprintk(XENLOG_G_ERR, "HVM restore: dom%u has no vcpu%u\n", > + d->domain_id, vcpuid); > + return -EINVAL; > + } > + > + if ( hvm_load_entry(A15_TIMER, h, &ctxt) != 0 ) > + return -EINVAL; > + > + > + if ( ctxt.type == TIMER_TYPE_VIRT ) > + { > + t = &v->arch.virt_timer; > + d->arch.virt_timer_base.offset = ctxt.vtb_offset; > + > + } > + else > + { > + t = &v->arch.phys_timer; > + d->arch.phys_timer_base.offset = ctxt.vtb_offset; > + } > + > + t->cval = ctxt.cval; > + t->ctl = ctxt.ctl; > + t->v = v; > + > + return 0; > +} > + > +HVM_REGISTER_SAVE_RESTORE(A15_TIMER, timer_save, timer_load, 2, HVMSR_PER_VCPU); > + > +static int vfp_save(struct domain *d, hvm_domain_context_t *h) > +{ > + struct hvm_hw_vfp ctxt; > + struct vcpu *v; > + > + /* Save the state of VFP */ > + for_each_vcpu( d, v ) > + { > + if ( sizeof(v->arch.vfp) != sizeof (ctxt) ) > + { > + dprintk(XENLOG_G_ERR, "HVM: check VFP structure match\n"); > + return -EINVAL; > + } > + memcpy(&ctxt, &v->arch.vfp, sizeof(ctxt)); > + > + if ( hvm_save_entry(VFP, v->vcpu_id, h, &ctxt) != 0 ) > + return 1; > + } > + return 0; > +} > + > +static int vfp_load(struct domain *d, hvm_domain_context_t *h) > +{ > + int vcpuid; > + struct hvm_hw_vfp ctxt; > + struct vcpu *v; > + > + /* Which vcpu is this? 
*/ > + vcpuid = hvm_load_instance(h); > + if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL ) > + { > + dprintk(XENLOG_G_ERR, "HVM restore: dom%u has no vcpu%u\n", > + d->domain_id, vcpuid); > + return -EINVAL; > + } > + > + if ( sizeof(v->arch.vfp) != sizeof (ctxt) ) > + { > + dprintk(XENLOG_G_ERR, "HVM: check VFP structure match\n"); > + return -EINVAL; > + } > + > + if ( hvm_load_entry(VFP, h, &ctxt) != 0 ) > + return -EINVAL; > + > + memcpy(&v->arch.vfp, &ctxt, sizeof(ctxt)); > + > + return 0; > +} > + > +HVM_REGISTER_SAVE_RESTORE(VFP, vfp_save, vfp_load, 1, HVMSR_PER_VCPU); > diff --git a/xen/arch/arm/save.c b/xen/arch/arm/save.c > new file mode 100644 > index 0000000..c923910 > --- /dev/null > +++ b/xen/arch/arm/save.c > @@ -0,0 +1,66 @@ > +/* > + * hvm/save.c: Save and restore HVM guest''s emulated hardware state for ARM. > + * > + * Copyright (c) 2013, Samsung Electronics. > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms and conditions of the GNU General Public License, > + * version 2, as published by the Free Software Foundation. > + * > + * This program is distributed in the hope it will be useful, but WITHOUT > + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or > + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for > + * more details. > + * > + * You should have received a copy of the GNU General Public License along with > + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple > + * Place - Suite 330, Boston, MA 02111-1307 USA. > + */ > + > +#include <asm/hvm/support.h> > +#include <public/hvm/save.h> > + > +void arch_hvm_save(struct domain *d, struct hvm_save_header *hdr) > +{ > + hdr->cpuid = READ_SYSREG32(MIDR_EL1); > +} > + > +int arch_hvm_load(struct domain *d, struct hvm_save_header *hdr) > +{ > + uint32_t cpuid; > + > + if ( hdr->magic != HVM_FILE_MAGIC ) > + { > + printk(XENLOG_G_ERR "HVM%d restore: bad magic number %#"PRIx32"\n", > + d->domain_id, hdr->magic); > + return -1; > + } > + > + if ( hdr->version != HVM_FILE_VERSION ) > + { > + printk(XENLOG_G_ERR "HVM%d restore: unsupported version %u\n", > + d->domain_id, hdr->version); > + return -1; > + } > + > + cpuid = READ_SYSREG32(MIDR_EL1); > + if ( hdr->cpuid != cpuid ) > + { > + printk(XENLOG_G_INFO "HVM%d restore: VM saved on one CPU " > + "(%#"PRIx32") and restored on another (%#"PRIx32").\n", > + d->domain_id, hdr->cpuid, cpuid); > + return -1; > + } > + > + return 0; > +} > + > +/* > + * Local variables: > + * mode: C > + * c-file-style: "BSD" > + * c-basic-offset: 4 > + * tab-width: 4 > + * indent-tabs-mode: nil > + * End: > + */ > diff --git a/xen/common/Makefile b/xen/common/Makefile > index 6da4651..733b19f 100644 > --- a/xen/common/Makefile > +++ b/xen/common/Makefile > @@ -61,6 +61,8 @@ subdir-$(CONFIG_COMPAT) += compat > > subdir-$(x86_64) += hvm > > +subdir-$(CONFIG_ARM) += hvm > + > subdir-$(coverage) += gcov > > subdir-y += libelf > diff --git a/xen/include/asm-arm/hvm/support.h b/xen/include/asm-arm/hvm/support.h > new file mode 100644 > index 0000000..8311f2f > --- /dev/null > +++ b/xen/include/asm-arm/hvm/support.h > @@ -0,0 +1,29 @@ > +/* > + * support.h: HVM support routines used by ARMv7 VE. > + * > + * Copyright (c) 2012, Citrix Systems > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms and conditions of the GNU General Public License, > + * version 2, as published by the Free Software Foundation. 
> + * > + * This program is distributed in the hope it will be useful, but WITHOUT > + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or > + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for > + * more details. > + * > + * You should have received a copy of the GNU General Public License along with > + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple > + * Place - Suite 330, Boston, MA 02111-1307 USA. > + */ > + > +#ifndef __ASM_ARM_HVM_SUPPORT_H__ > +#define __ASM_ARM_HVM_SUPPORT_H__ > + > +#include <xen/types.h> > +#include <public/hvm/ioreq.h> > +#include <xen/sched.h> > +#include <xen/hvm/save.h> > +#include <asm/processor.h> > + > +#endif /* __ASM_ARM_HVM_SUPPORT_H__ */ > diff --git a/xen/include/public/arch-arm/hvm/save.h b/xen/include/public/arch-arm/hvm/save.h > index 75b8e65..7e55384 100644 > --- a/xen/include/public/arch-arm/hvm/save.h > +++ b/xen/include/public/arch-arm/hvm/save.h > @@ -26,6 +26,72 @@ > #ifndef __XEN_PUBLIC_HVM_SAVE_ARM_H__ > #define __XEN_PUBLIC_HVM_SAVE_ARM_H__ > > +#define HVM_FILE_MAGIC 0x92385520 > +#define HVM_FILE_VERSION 0x00000001 > + > + > +struct hvm_save_header > +{ > + uint32_t magic; /* Must be HVM_FILE_MAGIC */ > + uint32_t version; /* File format version */ > + uint64_t changeset; /* Version of Xen that saved this file */ > + uint32_t cpuid; /* MIDR_EL1 on the saving machine */ > +}; > + > +DECLARE_HVM_SAVE_TYPE(HEADER, 1, struct hvm_save_header); > + > +struct vgic_rank > +{ > + uint32_t ienable, iactive, ipend, pendsgi; > + uint32_t icfg[2]; > + uint32_t ipriority[8]; > + uint32_t itargets[8]; > +}; > + > +struct hvm_hw_gic > +{ > + uint32_t gic_hcr; > + uint32_t gic_vmcr; > + uint32_t gic_apr; > + uint32_t gic_lr[64]; > + uint64_t event_mask; > + uint64_t lr_mask; > + struct vgic_rank ppi_state; > +}; > + > +DECLARE_HVM_SAVE_TYPE(GIC, 2, struct hvm_hw_gic); > + > +#define TIMER_TYPE_VIRT 0 > +#define TIMER_TYPE_PHYS 1 > + > +struct hvm_hw_timer > +{ > + uint64_t vtb_offset; > + uint32_t ctl; > + uint64_t cval; > + uint32_t type; > +}; > + > +DECLARE_HVM_SAVE_TYPE(A15_TIMER, 3, struct hvm_hw_timer); > + > +struct hvm_hw_vfp > +{ > + uint64_t fpregs1[16]; /* {d0-d15} */ > + uint64_t fpregs2[16]; /* {d16-d31} */ > + uint32_t fpexc; > + uint32_t fpscr; > + /* VFP implementation specific state */ > + uint32_t fpinst; > + uint32_t fpinst2; > +}; > + > +DECLARE_HVM_SAVE_TYPE(VFP, 4, struct hvm_hw_vfp); > + > +/* > + * Largest type-code in use > + */ > +#define HVM_SAVE_CODE_MAX 4 > + > #endif > > /* >-- Julien Grall
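The memcpy form suggested above would collapse the two ICFG assignments
into a single call on each side (a sketch; both icfg fields are
two-element uint32_t arrays, so the sizes match):

/* In vgic_irq_rank_save(): */
memcpy(ext->icfg, rank->icfg, sizeof(rank->icfg));
/* ...and symmetrically in vgic_irq_rank_restore(): */
memcpy(rank->icfg, ext->icfg, sizeof(rank->icfg));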
Julien Grall
2013-Oct-07 12:53 UTC
Re: [PATCH v4 6/9] xen/arm: Fixing clear_guest_offset macro
On 10/04/2013 05:44 AM, Jaeyong Yoo wrote:
> Fix the broken macro ''clear_guest_offset'' in arm.
>
> Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com>

Reviewed-by: Julien Grall <julien.grall@linaro.org>

> ---
>  xen/include/asm-arm/guest_access.h | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/xen/include/asm-arm/guest_access.h b/xen/include/asm-arm/guest_access.h
> index 34aae14..8ff088f 100644
> --- a/xen/include/asm-arm/guest_access.h
> +++ b/xen/include/asm-arm/guest_access.h
> @@ -77,8 +77,9 @@ unsigned long raw_clear_guest(void *to, unsigned len);
>   * Clear an array of objects in guest context via a guest handle,
>   * specifying an offset into the guest array.
>   */
> -#define clear_guest_offset(hnd, off, ptr, nr) ({    \
> -    raw_clear_guest(_d+(off), nr);                  \
> +#define clear_guest_offset(hnd, off, nr) ({         \
> +    void *_d = (hnd).p;                             \
> +    raw_clear_guest(_d+(off), nr);                  \
>  })
>
>  /*
>

--
Julien Grall
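For illustration, a hypothetical use of the fixed macro (buf, off and nr
are invented names; the macro expands to the raw_clear_guest() call, so a
non-zero result means not all bytes were cleared):

/* Zero nr bytes starting at byte offset off in the guest buffer. */
if ( clear_guest_offset(buf, off, nr) )
    return -EFAULT;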
Julien Grall
2013-Oct-07 13:02 UTC
Re: [PATCH v4 8/9] xen/arm: Implement hypercall for dirty page tracing
On 10/04/2013 05:44 AM, Jaeyong Yoo wrote:
> Add hypercall (shadow op: enable/disable and clean/peek dirtied page bitmap).
> It consists of two parts: dirty page detecting and saving.
> For detecting, we set up the guest p2m''s leaf PTE read-only and whenever
> the guest tries to write something, permission fault happens and traps into xen.
> The permission-faulted GPA should be saved for the toolstack (when it wants to see
> which pages are dirtied). For this purpose, we temporarily save the GPAs into a linked
> list by using the ''add_mapped_vaddr'' function and when the toolstack wants them
> (log_dirty_op function) the GPAs are copied into a bitmap and the linked list is flushed.
>
> Additionally, for supporting parallel migration of domUs, the vlpt area should be context
> switched.
>
> Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com>
> ---

[..]

> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
> index 4c0fc32..3b78ed2 100644
> --- a/xen/arch/arm/traps.c
> +++ b/xen/arch/arm/traps.c
> @@ -1313,6 +1313,8 @@ static void do_trap_data_abort_guest(struct cpu_user_regs *regs,
>      const char *msg;
>      int rc, level = -1;
>      mmio_info_t info;
> +    int page_fault = ((dabt.dfsc & FSC_MASK) ==
> +                      (FSC_FLT_PERM + FSC_3D_LEVEL) && dabt.write);
>
>      if ( !check_conditional_instr(regs, hsr) )
>      {
> @@ -1327,22 +1329,23 @@ static void do_trap_data_abort_guest(struct cpu_user_regs *regs,
>      info.gva = READ_SYSREG64(FAR_EL2);
>  #endif
>
> -    if (dabt.s1ptw)
> +    if ( dabt.s1ptw && !page_fault )

I think checking !page_fault nearly everywhere is error-prone when this
function is modified later.

Can you do something like this?

if ( page_fault )
   // Your code to handle page fault
else
{
   // handle_mmio
}

It will avoid && !page_fault.

>          goto bad_data_abort;
>
>      rc = gva_to_ipa(info.gva, &info.gpa);
> -    if ( rc == -EFAULT )
> +    if ( rc == -EFAULT && !page_fault )
>          goto bad_data_abort;
>
>      /* XXX: Decode the instruction if ISS is not valid */
> -    if ( !dabt.valid )
> +    if ( !dabt.valid && !page_fault )
>          goto bad_data_abort;
>
>      /*
>       * Erratum 766422: Thumb store translation fault to Hypervisor may
>       * not have correct HSR Rt value.
>       */
> -    if ( cpu_has_erratum_766422() && (regs->cpsr & PSR_THUMB) && dabt.write )
> +    if ( cpu_has_erratum_766422() && (regs->cpsr & PSR_THUMB) && dabt.write
> +            && !page_fault)
>      {
>          rc = decode_instruction(regs, &info.dabt);
>          if ( rc )
> @@ -1358,6 +1361,16 @@ static void do_trap_data_abort_guest(struct cpu_user_regs *regs,
>          return;
>      }
>
> +    /* handle permission fault on write */
> +    if ( page_fault )
> +    {
> +        if ( current->domain->arch.dirty.mode == 0 )
> +            goto bad_data_abort;
> +
> +        handle_page_fault(current->domain, info.gpa);

You must call advance_pc(regs, hsr) here.

> +        return;
> +    }
> +
>  bad_data_abort:

--
Julien Grall
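Spelled out against the hunk above, the suggested shape could look like
this (illustrative only, not a tested patch: the IPA is computed first,
and the existing abort/MMIO path then stays free of !page_fault tests):

rc = gva_to_ipa(info.gva, &info.gpa);

if ( page_fault )
{
    /* Permission fault from log-dirty marking: record the page. */
    if ( current->domain->arch.dirty.mode == 0 )
        goto bad_data_abort;
    handle_page_fault(current->domain, info.gpa);
    return;
}

if ( dabt.s1ptw || rc == -EFAULT || !dabt.valid )
    goto bad_data_abort;

/* ... erratum 766422 handling and handle_mmio() as before ... */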
Eugene Fedotov
2013-Oct-07 15:51 UTC
Re: [PATCH v4 8/9] xen/arm: Implement hypercall for dirty page tracing
07.10.2013 17:02, Julien Grall:
> I think checking !page_fault nearly everywhere is error-prone when
> this function is modified later.
>
> Can you do something like this?
>
> if ( page_fault )
>    // Your code to handle page fault
> else
> {
>    // handle_mmio
> }
>
> It will avoid && !page_fault.

Yes, but I think in the page-fault case we should also check whether the
address belongs to an MMIO region (a page fault at an MMIO address is
possible, but we don''t need to transfer that memory).

>
>> >          goto bad_data_abort;
>> >
>> >      rc = gva_to_ipa(info.gva, &info.gpa);
>> > -    if ( rc == -EFAULT )
>> > +    if ( rc == -EFAULT && !page_fault )
>> >          goto bad_data_abort;
>> >
>> >      /* XXX: Decode the instruction if ISS is not valid */
>> > -    if ( !dabt.valid )
>> > +    if ( !dabt.valid && !page_fault )
>> >          goto bad_data_abort;
>> >
>> >      /*
>> >       * Erratum 766422: Thumb store translation fault to Hypervisor may
>> >       * not have correct HSR Rt value.
>> >       */
>> > -    if ( cpu_has_erratum_766422() && (regs->cpsr & PSR_THUMB) && dabt.write )
>> > +    if ( cpu_has_erratum_766422() && (regs->cpsr & PSR_THUMB) && dabt.write
>> > +            && !page_fault)
>> >      {
>> >          rc = decode_instruction(regs, &info.dabt);
>> >          if ( rc )
>> > @@ -1358,6 +1361,16 @@ static void do_trap_data_abort_guest(struct cpu_user_regs *regs,
>> >          return;
>> >      }
>> >
>> > +    /* handle permission fault on write */
>> > +    if ( page_fault )
>> > +    {
>> > +        if ( current->domain->arch.dirty.mode == 0 )
>> > +            goto bad_data_abort;
>> > +
>> > +        handle_page_fault(current->domain, info.gpa);
> You must call advance_pc(regs, hsr) here.
>

Let me explain. I think the difference between handle_page_fault and
handle_mmio is that the memory operation that trapped here should be
replayed after handle_page_fault (the page fault handler clears the
read-only bit), but should be skipped after handle_mmio (to do this we
call advance_pc to advance the pc register). So, advance_pc is
intentionally omitted.

Best regards,
Evgeny.
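The distinction Evgeny describes, as a comment-style sketch (illustrative
only; handle_mmio() and advance_pc() are the existing calls in traps.c):

/* Permission fault from log-dirty marking: handle_page_fault() makes
 * the p2m entry writable again, and returning WITHOUT advance_pc()
 * lets the guest re-execute the faulting store, which now succeeds. */
handle_page_fault(current->domain, info.gpa);
return;

/* MMIO abort: Xen emulates the access itself, so the guest must
 * resume at the next instruction. */
if ( handle_mmio(&info) )
{
    advance_pc(regs, hsr);
    return;
}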
Ian Campbell
2013-Oct-10 10:40 UTC
Re: [PATCH v4 2/9] xen/arm: Add more registers for saving and restoring vcpu registers
On Fri, 2013-10-04 at 13:43 +0900, Jaeyong Yoo wrote:> diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h > index 5d359af..bf6dc9a 100644 > --- a/xen/include/public/arch-arm.h > +++ b/xen/include/public/arch-arm.h > @@ -253,6 +253,41 @@ struct vcpu_guest_context { > > uint32_t sctlr, ttbcr; > uint64_t ttbr0, ttbr1; > + uint32_t ifar, dfar; > + uint32_t ifsr, dfsr; > + uint32_t dacr; > + uint64_t par; > + > +#ifdef CONFIG_ARM_32I''m afraid a per arch ifdef isn''t allowed in the include/public tree. The interface should be identical for both 32 and 64 bit callers. Also think of 32-on-64 guests etc. Also, this struct is guest facing (via VCPUOP_initialise) but many/all of these new registers are not things which a guest needs to specify via a hypercall. IOW I think many of them should be part of some toolstack private save/restore interface. I think x86 has a hvm_hw_cpu struct, which I think seems more appropriate. Tim, what do you think?> + uint32_t mair0, mair1; > + uint32_t tpidr_el0; > + uint32_t tpidr_el1; > + uint32_t tpidrro_el0; > + uint32_t vbar; > +#else > + uint64_t mair; > + uint64_t tpidr_el0; > + uint64_t tpidr_el1; > + uint64_t tpidrro_el0; > + uint64_t vbar; > +#endif > + > + /* Control Registers */ > + uint32_t actlr; > + uint32_t cpacr; > + > + uint32_t afsr0, afsr1; > + > + uint32_t contextidr; > + > + uint32_t teecr, teehbr; /* ThumbEE, 32-bit guests only */ > + > +#ifdef CONFIG_ARM_32 > + uint32_t joscr, jmcr; > +#endif > + > + /* CP 15 */ > + uint32_t csselr; > }; > typedef struct vcpu_guest_context vcpu_guest_context_t; > DEFINE_XEN_GUEST_HANDLE(vcpu_guest_context_t);
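One possible shape for such a toolstack-private record, modelled on the
x86 hvm_hw_cpu (a sketch only: the structure name is invented, save-type
code 5 is assumed free, and every field is fixed-width so the layout is
identical for 32- and 64-bit callers with no #ifdef):

struct hvm_hw_cpu_arm
{
    /* Fault status/address */
    uint32_t ifar, dfar;
    uint32_t ifsr, dfsr;
    uint32_t afsr0, afsr1;
    uint32_t dacr;
    uint32_t contextidr;
    /* Translation and memory attributes */
    uint64_t par;
    uint64_t mair0, mair1;       /* MAIR_EL1 goes in mair0 on 64-bit */
    uint64_t tpidr_el0, tpidr_el1, tpidrro_el0;
    uint64_t vbar;
    /* Control */
    uint32_t actlr, cpacr;
    uint32_t teecr, teehbr;      /* ThumbEE, 32-bit guests only */
    uint32_t joscr, jmcr;        /* Jazelle, 32-bit guests only */
    uint32_t csselr;
    uint32_t _pad;               /* keep the record 64-bit aligned */
};
DECLARE_HVM_SAVE_TYPE(CPU, 5, struct hvm_hw_cpu_arm);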
Ian Campbell
2013-Oct-10 10:56 UTC
Re: [PATCH v4 3/9] xen/arm: Implement set_memory_map hypercall for arm
On Fri, 2013-10-04 at 13:43 +0900, Jaeyong Yoo wrote:
> From: Evgeny Fedotov <e.fedotov@samsung.com>
>
> When creating domU in toolstack, pass the guest memory
> map info to the hypervisor, and the hypervisor stores this info in
> arch_domain for later use.
>
> Signed-off-by: Evgeny Fedotov <e.fedotov@samsung.com>
> ---
>  tools/libxc/xc_dom_arm.c     | 12 +++++++-
>  tools/libxc/xc_domain.c      | 44 ++++++++++++++++++++++++++++
>  tools/libxc/xenctrl.h        | 23 +++++++++++++++
>  xen/arch/arm/domain.c        |  3 ++
>  xen/arch/arm/mm.c            | 68 ++++++++++++++++++++++++++++++++++++++++++++
>  xen/include/asm-arm/domain.h |  2 ++
>  xen/include/asm-arm/mm.h     |  1 +
>  xen/include/public/memory.h  | 15 ++++++++--
>  xen/include/xsm/dummy.h      |  5 ++++
>  xen/include/xsm/xsm.h        |  5 ++++
>  10 files changed, 175 insertions(+), 3 deletions(-)
>
> diff --git a/tools/libxc/xc_dom_arm.c b/tools/libxc/xc_dom_arm.c
> index df59ffb..20c9095 100644
> --- a/tools/libxc/xc_dom_arm.c
> +++ b/tools/libxc/xc_dom_arm.c
> @@ -166,6 +166,7 @@ int arch_setup_meminit(struct xc_dom_image *dom)
>  {
>      int rc;
>      xen_pfn_t pfn, allocsz, i;
> +    struct dt_mem_info memmap;
>
>      dom->shadow_enabled = 1;
>
> @@ -191,7 +192,16 @@ int arch_setup_meminit(struct xc_dom_image *dom)
>                  0, 0, &dom->p2m_host[i]);
>      }
>
> -    return 0;
> +    /* setup guest memory map */
> +    memmap.nr_banks = 2;
> +    memmap.bank[0].start = (dom->rambase_pfn << PAGE_SHIFT_ARM);
> +    memmap.bank[0].size = (dom->total_pages << PAGE_SHIFT_ARM);
> +    /*The end of main memory: magic pages */
> +    memmap.bank[1].start = memmap.bank[0].start + memmap.bank[0].size;
> +    memmap.bank[1].size = NR_MAGIC_PAGES << PAGE_SHIFT_ARM;

Are the 0 and 1 here hardcoded magic numbers?

> +    return xc_domain_set_memory_map(dom->xch, dom->guest_domid, &memmap);

I think this is using set_memory_map in a different way to how it is used
for x86 (where it gives the PV e820 map, a PV version of a BIOS-provided
data structure).

The guest should get its memory map via DTB, not via a PV hypercall. I
know the guest isn''t using get_memory_map but I don''t think we should
even make it available.

On x86 the magic pages are handled specially in the save restore code
via HVM_PARAMS and not exposed to the guest via this memory map call. I
think that''s the way to go here too.

We do need a way to remember and return the guest RAM layout (well, for
now the RAM base is hardcoded, so we could fudge it) but I think a
toolstack internal mechanism like a save record would be better.

Ian.
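One possible shape for such a toolstack-internal record (names invented:
the saver would emit this into the migration stream itself rather than
round-tripping the layout through a hypercall, while the magic pages
travel as HVM params):

/* Toolstack-private save record describing the single guest RAM bank. */
typedef struct guest_ram_record
{
    uint64_t rambase_pfn;   /* base pfn of guest RAM */
    uint64_t nr_pages;      /* size of the RAM bank in pages */
} guest_ram_record_t;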
Ian Campbell
2013-Oct-10 12:00 UTC
Re: [PATCH v4 6/9] xen/arm: Fixing clear_guest_offset macro
On Mon, 2013-10-07 at 13:53 +0100, Julien Grall wrote:> On 10/04/2013 05:44 AM, Jaeyong Yoo wrote: > > Fix the the broken macro ''clear_guest_offset'' in arm. > > > > Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com> > Reviewed-by: Julien Grall <julien.grall@linaro.org>acked + applied. thanks.
Julien Grall
2013-Oct-10 14:10 UTC
Re: [PATCH v4 8/9] xen/arm: Implement hypercall for dirty page tracing
On 10/07/2013 04:51 PM, Eugene Fedotov wrote:> Let me explain. I think the difference between handle_page_fault and > handle_mmio is that the ongoing memory operation (that trapped here) > should be repeated after handle_page_fault (the page fault handler > clears read-only bit), but should be stepped out after handle_mmio (to > do this we call advance_pc to increase the pc register). So, advance_pc > is intentionally missed.Oh right, thanks for the explanation! -- Julien Grall
Jaeyong Yoo
2013-Oct-11 08:30 UTC
Re: [PATCH v4 2/9] xen/arm: Add more registers for saving and restoring vcpu registers
> -----Original Message----- > From: xen-devel-bounces@lists.xen.org [mailto:xen-devel- > bounces@lists.xen.org] On Behalf Of Ian Campbell > Sent: Thursday, October 10, 2013 7:41 PM > To: Jaeyong Yoo > Cc: Tim Deegan; xen-devel@lists.xen.org > Subject: Re: [Xen-devel] [PATCH v4 2/9] xen/arm: Add more registers for > saving and restoring vcpu registers > > On Fri, 2013-10-04 at 13:43 +0900, Jaeyong Yoo wrote: > > diff --git a/xen/include/public/arch-arm.h > > b/xen/include/public/arch-arm.h index 5d359af..bf6dc9a 100644 > > --- a/xen/include/public/arch-arm.h > > +++ b/xen/include/public/arch-arm.h > > @@ -253,6 +253,41 @@ struct vcpu_guest_context { > > > > uint32_t sctlr, ttbcr; > > uint64_t ttbr0, ttbr1; > > + uint32_t ifar, dfar; > > + uint32_t ifsr, dfsr; > > + uint32_t dacr; > > + uint64_t par; > > + > > +#ifdef CONFIG_ARM_32 > > I''m afraid a per arch ifdef isn''t allowed in the include/public tree. > The interface should be identical for both 32 and 64 bit callers. Also > think of 32-on-64 guests etc. > > Also, this struct is guest facing (via VCPUOP_initialise) but many/all of > these new registers are not things which a guest needs to specify via a > hypercall. IOW I think many of them should be part of some toolstack > private save/restore interface.I see, the guest can specify something like sctlr, and ttbr/ttbcr, and the others should be hidden inside hvm save/restore.> > I think x86 has a hvm_hw_cpu struct, which I think seems more appropriate. > Tim, what do you think? > > > + uint32_t mair0, mair1; > > + uint32_t tpidr_el0; > > + uint32_t tpidr_el1; > > + uint32_t tpidrro_el0; > > + uint32_t vbar; > > +#else > > + uint64_t mair; > > + uint64_t tpidr_el0; > > + uint64_t tpidr_el1; > > + uint64_t tpidrro_el0; > > + uint64_t vbar; > > +#endif > > + > > + /* Control Registers */ > > + uint32_t actlr; > > + uint32_t cpacr; > > + > > + uint32_t afsr0, afsr1; > > + > > + uint32_t contextidr; > > + > > + uint32_t teecr, teehbr; /* ThumbEE, 32-bit guests only */ > > + > > +#ifdef CONFIG_ARM_32 > > + uint32_t joscr, jmcr; > > +#endif > > + > > + /* CP 15 */ > > + uint32_t csselr; > > }; > > typedef struct vcpu_guest_context vcpu_guest_context_t; > > DEFINE_XEN_GUEST_HANDLE(vcpu_guest_context_t); > > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
Ian Campbell
2013-Oct-11 08:43 UTC
Re: [PATCH v4 2/9] xen/arm: Add more registers for saving and restoring vcpu registers
On Fri, 2013-10-11 at 17:30 +0900, Jaeyong Yoo wrote:
> > -----Original Message-----
> > From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-
> > bounces@lists.xen.org] On Behalf Of Ian Campbell
> > Sent: Thursday, October 10, 2013 7:41 PM
> > To: Jaeyong Yoo
> > Cc: Tim Deegan; xen-devel@lists.xen.org
> > Subject: Re: [Xen-devel] [PATCH v4 2/9] xen/arm: Add more registers for
> > saving and restoring vcpu registers
> >
> > On Fri, 2013-10-04 at 13:43 +0900, Jaeyong Yoo wrote:
> > > diff --git a/xen/include/public/arch-arm.h
> > > b/xen/include/public/arch-arm.h index 5d359af..bf6dc9a 100644
> > > --- a/xen/include/public/arch-arm.h
> > > +++ b/xen/include/public/arch-arm.h
> > > @@ -253,6 +253,41 @@ struct vcpu_guest_context {
> > >
> > >     uint32_t sctlr, ttbcr;
> > >     uint64_t ttbr0, ttbr1;
> > > +   uint32_t ifar, dfar;
> > > +   uint32_t ifsr, dfsr;
> > > +   uint32_t dacr;
> > > +   uint64_t par;
> > > +
> > > +#ifdef CONFIG_ARM_32
> >
> > I''m afraid a per arch ifdef isn''t allowed in the include/public tree.
> > The interface should be identical for both 32 and 64 bit callers. Also
> > think of 32-on-64 guests etc.
> >
> > Also, this struct is guest facing (via VCPUOP_initialise) but many/all of
> > these new registers are not things which a guest needs to specify via a
> > hypercall. IOW I think many of them should be part of some toolstack
> > private save/restore interface.
>
> I see, the guest can specify something like sctlr, and ttbr/ttbcr, and the
> others should be hidden inside hvm save/restore.

Right, the important thing is that all that additional state is only
visible to the toolstack and the hypervisor, not to guests.

Actually the guest shouldn''t really see this interface anyway, that''s
really a holdover from x86. On ARM only the toolstack really needs to
use this struct.

I wonder if we can drop struct vcpu_guest_context from the guest facing
ABI on ARM. I see that we already don''t expose VCPUOP_initialise and the
only other user is XEN_DOMCTL_(sg)etvcpucontext.

I wonder if we could on x86 do:
typedef vcpu_guest_context_t domctl_guest_context_t
and on ARM s/vcpu_guest_context/domctl_guest_context/ (plus appropriate
__XEN_TOOLS__ ifdeffery) and make the domctls use the new name only
while VCPUOP uses the old name?

Can anyone think of a reason why a guest might need an interface to
set/get VCPU state directly like this on ARM? Or why this might not work
for x86?

Ian.
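The renaming floated above might look like this (a sketch only, untested):

/* x86: keep the guest-visible structure and merely alias it for the
 * domctl users. */
typedef vcpu_guest_context_t domctl_guest_context_t;

/* ARM: the structure stops being guest-facing entirely and is only
 * defined for Xen and the toolstack. */
#if defined(__XEN__) || defined(__XEN_TOOLS__)
struct domctl_guest_context {
    /* ... the register state formerly in vcpu_guest_context ... */
};
typedef struct domctl_guest_context domctl_guest_context_t;
#endif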
Eugene Fedotov
2013-Oct-11 10:06 UTC
Re: [PATCH v4 3/9] xen/arm: Implement set_memory_map hypercall for arm
10.10.2013 14:56, Ian Campbell:
> On Fri, 2013-10-04 at 13:43 +0900, Jaeyong Yoo wrote:
>> From: Evgeny Fedotov <e.fedotov@samsung.com>
>>
>> When creating domU in toolstack, pass the guest memory
>> map info to the hypervisor, and the hypervisor stores this info in
>> arch_domain for later use.
>>
>> Signed-off-by: Evgeny Fedotov <e.fedotov@samsung.com>
>> ---
>>  tools/libxc/xc_dom_arm.c     | 12 +++++++-
>>  tools/libxc/xc_domain.c      | 44 ++++++++++++++++++++++++++++
>>  tools/libxc/xenctrl.h        | 23 +++++++++++++++
>>  xen/arch/arm/domain.c        |  3 ++
>>  xen/arch/arm/mm.c            | 68 ++++++++++++++++++++++++++++++++++++++++++++
>>  xen/include/asm-arm/domain.h |  2 ++
>>  xen/include/asm-arm/mm.h     |  1 +
>>  xen/include/public/memory.h  | 15 ++++++++--
>>  xen/include/xsm/dummy.h      |  5 ++++
>>  xen/include/xsm/xsm.h        |  5 ++++
>>  10 files changed, 175 insertions(+), 3 deletions(-)
>>
>> diff --git a/tools/libxc/xc_dom_arm.c b/tools/libxc/xc_dom_arm.c
>> index df59ffb..20c9095 100644
>> --- a/tools/libxc/xc_dom_arm.c
>> +++ b/tools/libxc/xc_dom_arm.c
>> @@ -166,6 +166,7 @@ int arch_setup_meminit(struct xc_dom_image *dom)
>>  {
>>      int rc;
>>      xen_pfn_t pfn, allocsz, i;
>> +    struct dt_mem_info memmap;
>>
>>      dom->shadow_enabled = 1;
>>
>> @@ -191,7 +192,16 @@ int arch_setup_meminit(struct xc_dom_image *dom)
>>                  0, 0, &dom->p2m_host[i]);
>>      }
>>
>> -    return 0;
>> +    /* setup guest memory map */
>> +    memmap.nr_banks = 2;
>> +    memmap.bank[0].start = (dom->rambase_pfn << PAGE_SHIFT_ARM);
>> +    memmap.bank[0].size = (dom->total_pages << PAGE_SHIFT_ARM);
>> +    /*The end of main memory: magic pages */
>> +    memmap.bank[1].start = memmap.bank[0].start + memmap.bank[0].size;
>> +    memmap.bank[1].size = NR_MAGIC_PAGES << PAGE_SHIFT_ARM;
> Are the 0 and 1 here hardcoded magic numbers?

Well, we hardcode two memory regions here: the first one for RAM, the
second one for the "magic pages".

>> +    return xc_domain_set_memory_map(dom->xch, dom->guest_domid, &memmap);
> I think this is using set_memory_map in a different way to how it is used
> for x86 (where it gives the PV e820 map, a PV version of a BIOS-provided
> data structure).

Do you mean that using the e820 structure for the ARM implementation is
better than the dt_mem_info structure (which was taken from libdt) and
that it should be used in the ARM implementation of get/set memory map?

> The guest should get its memory map via DTB, not via a PV hypercall. I
> know the guest isn''t using get_memory_map but I don''t think we should
> even make it available.

OK, this hypercall will be available from dom0 only.

>
> On x86 the magic pages are handled specially in the save restore code
> via HVM_PARAMS and not exposed to the guest via this memory map call. I
> think that''s the way to go here too.
>
> We do need a way to remember and return the guest RAM layout (well, for
> now the RAM base is hardcoded, so we could fudge it) but I think a
> toolstack internal mechanism like a save record would be better.
>
> Ian.
>

So, the get/set memory map hypercall should get/set only the main RAM
region? Two questions about the details of the next implementation:
1) Can we restrict the memory map to a single region for ARM? In that
case we don''t need a list or array to define the memory map structure.
2) We have already implemented store & restore of HVM params (magic page
PFNs) inside the ARM toolstack, but I don''t know whether we need to
migrate the contents of such pages. The current ARM toolstack migrates
the magic page contents too, as part of the memory map. In the x86 code
I cannot find where the contents of the magic pages are saved or
restored, so maybe the Xen developers are familiar with this.

Best regards,
Eugene.
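For reference, the "store & restore of HVM params" mentioned above can be
pictured as a helper like the one below (hypothetical, but it mirrors how
save_guest_params()/set_guest_params() in patch 9 already move the console
and store PFNs through xc_get_hvm_param()/xc_set_hvm_param()):

static int save_magic_pfns(xc_interface *xch, int io_fd, uint32_t dom)
{
    unsigned long console_pfn, store_pfn;

    /* Fetch the magic page PFNs from the hypervisor... */
    if ( xc_get_hvm_param(xch, dom, HVM_PARAM_CONSOLE_PFN, &console_pfn) ||
         xc_get_hvm_param(xch, dom, HVM_PARAM_STORE_PFN, &store_pfn) )
        return -1;

    /* ...and write them into the migration stream; the restore side
     * reads them back and applies them with xc_set_hvm_param(). */
    if ( write_exact(io_fd, &console_pfn, sizeof(console_pfn)) ||
         write_exact(io_fd, &store_pfn, sizeof(store_pfn)) )
        return -1;

    return 0;
}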
Ian Campbell
2013-Oct-11 10:09 UTC
Re: [PATCH v4 3/9] xen/arm: Implement set_memory_map hypercall for arm
On Fri, 2013-10-11 at 14:06 +0400, Eugene Fedotov wrote:> 10.10.2013 14:56, Ian Campbell: > > On Fri, 2013-10-04 at 13:43 +0900, Jaeyong Yoo wrote: > >> From: Evgeny Fedotov <e.fedotov@samsung.com> > >> > >> When creating domU in toolstack, pass the guest memory > >> map info to the hypervisor, and the hypervisor stores those info in > >> arch_domain for later use. > >> > >> Singed-off-by: Evgeny Fedotov <e.fedotov@samsung.com> > >> --- > >> tools/libxc/xc_dom_arm.c | 12 +++++++- > >> tools/libxc/xc_domain.c | 44 ++++++++++++++++++++++++++++ > >> tools/libxc/xenctrl.h | 23 +++++++++++++++ > >> xen/arch/arm/domain.c | 3 ++ > >> xen/arch/arm/mm.c | 68 ++++++++++++++++++++++++++++++++++++++++++++ > >> xen/include/asm-arm/domain.h | 2 ++ > >> xen/include/asm-arm/mm.h | 1 + > >> xen/include/public/memory.h | 15 ++++++++-- > >> xen/include/xsm/dummy.h | 5 ++++ > >> xen/include/xsm/xsm.h | 5 ++++ > >> 10 files changed, 175 insertions(+), 3 deletions(-) > >> > >> diff --git a/tools/libxc/xc_dom_arm.c b/tools/libxc/xc_dom_arm.c > >> index df59ffb..20c9095 100644 > >> --- a/tools/libxc/xc_dom_arm.c > >> +++ b/tools/libxc/xc_dom_arm.c > >> @@ -166,6 +166,7 @@ int arch_setup_meminit(struct xc_dom_image *dom) > >> { > >> int rc; > >> xen_pfn_t pfn, allocsz, i; > >> + struct dt_mem_info memmap; > >> > >> dom->shadow_enabled = 1; > >> > >> @@ -191,7 +192,16 @@ int arch_setup_meminit(struct xc_dom_image *dom) > >> 0, 0, &dom->p2m_host[i]); > >> } > >> > >> - return 0; > >> + /* setup guest memory map */ > >> + memmap.nr_banks = 2; > >> + memmap.bank[0].start = (dom->rambase_pfn << PAGE_SHIFT_ARM); > >> + memmap.bank[0].size = (dom->total_pages << PAGE_SHIFT_ARM); > >> + /*The end of main memory: magic pages */ > >> + memmap.bank[1].start = memmap.bank[0].start + memmap.bank[0].size; > >> + memmap.bank[1].size = NR_MAGIC_PAGES << PAGE_SHIFT_ARM; > > Are the 0 and 1 here hardcoded magic numbers?> Well, we hardcoded here two memory regions: the first one for RAM, the > second one for "magic pages".> >> + return xc_domain_set_memory_map(dom->xch, dom->guest_domid, &memmap); > > I think this is using set_memory_map in a different way to it is used > > for x86 (where it gives the PV e820 map, a PV version of a BIOS provided > > datastructure).> Do you mean that using e820 structure for ARM implementation is better > than dt_mem_info structure (that has been taken from libdt) and should > be used in ARM implementation of get/set memory map?I mean on x86 it is used as a way of communicating a memory map (in a structure vaguely like an actual defined architecture specific layout, the e820), whereas on ARM it is only used by the tools and should not be exposed to the guest, since the guest gets its info from the DTB.> > The guest should get its memory map via DTB not via a PV hypercall. I > > know the guest isn''t using get_memory_map but I don''t think we should > > even make it available.> OK, this hypercall will be available from dom0 only.It''ll have to move away from the XENMEM space then.> > > > On x86 the magic pages are handled specially in the save restore code > > via HVM_PARAMS and not exposed to the guest via this memory map call. I > > think that''s the way to go here too. > > > > > We do need a way to remember and return the guest RAM layout (well, for > > now the RAM base is hardcoded, so we could fudge it) but I think a > > toolstack internal mechanism like a save record would be better. > > > > Ian. 
> >> So, get/set memory map hypercall should get/set only main RAM region?

If you have multiple RAM regions then it should save them, but the magic
pages are not RAM regions in that sense so should not be included.

> Two questions about details of the next implementation:
> 1) Can we restrict the memory map to a single region for ARM? In this
> case we don't need a list or array to define the memory map structure.
> 2) We have already implemented store & restore of HVM params (magic pages
> PFNs) inside the ARM toolstack, but I don't know if we need to migrate the
> contents of such pages. In the current ARM toolstack we migrate the magic
> pages' contents too, as a part of the memory map. In the x86 code I cannot
> find where the content of the magic pages is saved or restored, so maybe
> the Xen developers are familiar with it.

I have a feeling they are not migrated, and I'm not sure. Perhaps Tim
knows...

Ian.

> Best regards,
> Eugene.
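For illustration, a minimal sketch of the toolstack-internal save record
Ian is suggesting, restricted to the single RAM bank Eugene proposes. The
record layout and helper name are hypothetical, not taken from the posted
series; write_exact() is the usual libxc stream helper.

#include <stdint.h>

/* Hypothetical toolstack-private record: one RAM bank only. The magic
 * pages are deliberately excluded; they travel via HVM params instead. */
struct arm_memmap_record {
    uint64_t ram_base;   /* base address of the single guest RAM bank */
    uint64_t ram_size;   /* size of that bank in bytes */
};

static int save_memmap_record(int io_fd, uint64_t base, uint64_t size)
{
    struct arm_memmap_record rec = { .ram_base = base, .ram_size = size };

    /* write_exact() retries until the whole record is in the stream */
    return write_exact(io_fd, &rec, sizeof(rec));
}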
Tim Deegan
2013-Oct-11 10:18 UTC
Re: [PATCH v4 3/9] xen/arm: Implement set_memory_map hypercall for arm
At 11:09 +0100 on 11 Oct (1381489749), Ian Campbell wrote:
> > Two questions about details of the next implementation:
> > 1) Can we restrict the memory map to a single region for ARM? In this
> > case we don't need a list or array to define the memory map structure.

Sounds very sensible to me.

> > 2) We have already implemented store & restore of HVM params (magic pages
> > PFNs) inside the ARM toolstack, but I don't know if we need to migrate the
> > contents of such pages. In the current ARM toolstack we migrate the magic
> > pages' contents too, as a part of the memory map. In the x86 code I cannot
> > find where the content of the magic pages is saved or restored, so maybe
> > the Xen developers are familiar with it.
>
> I have a feeling they are not migrated, and I'm not sure. Perhaps Tim
> knows...

IIRC they are saved as part of general guest memory, but in fact they'll
all be reset on the other side anyway.

Tim.
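A sketch of the HVM-param route for the magic pages on the libxc side, as
suggested above. This is only an illustration using the existing
xc_get_hvm_param()/xc_set_hvm_param() accessors; the helper names and the
choice of params are assumptions, not code from the posted series.

/* Save side: read the magic-page PFNs and put them in the stream
 * (write_exact is the libxc helper; the page contents themselves go
 * out with the rest of guest RAM). */
static int save_magic_pfns(xc_interface *xch, uint32_t domid, int io_fd)
{
    unsigned long console_pfn, store_pfn;

    if ( xc_get_hvm_param(xch, domid, HVM_PARAM_CONSOLE_PFN, &console_pfn) ||
         xc_get_hvm_param(xch, domid, HVM_PARAM_STORE_PFN, &store_pfn) )
        return -1;

    if ( write_exact(io_fd, &console_pfn, sizeof(console_pfn)) ||
         write_exact(io_fd, &store_pfn, sizeof(store_pfn)) )
        return -1;

    return 0;
}

/* Restore side: re-register the PFNs with the hypervisor; per Tim's
 * comment the contents can simply be reset on this side. */
static int restore_magic_pfns(xc_interface *xch, uint32_t domid,
                              unsigned long console_pfn,
                              unsigned long store_pfn)
{
    if ( xc_set_hvm_param(xch, domid, HVM_PARAM_CONSOLE_PFN, console_pfn) ||
         xc_set_hvm_param(xch, domid, HVM_PARAM_STORE_PFN, store_pfn) )
        return -1;

    return 0;
}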
Tim Deegan
2013-Oct-11 10:22 UTC
Re: [PATCH v4 2/9] xen/arm: Add more registers for saving and restoring vcpu registers
At 09:43 +0100 on 11 Oct (1381484614), Ian Campbell wrote:
> On Fri, 2013-10-11 at 17:30 +0900, Jaeyong Yoo wrote:
> > > -----Original Message-----
> > > From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of Ian Campbell
> > > Sent: Thursday, October 10, 2013 7:41 PM
> > > To: Jaeyong Yoo
> > > Cc: Tim Deegan; xen-devel@lists.xen.org
> > > Subject: Re: [Xen-devel] [PATCH v4 2/9] xen/arm: Add more registers for
> > > saving and restoring vcpu registers
> > >
> > > On Fri, 2013-10-04 at 13:43 +0900, Jaeyong Yoo wrote:
> > > > diff --git a/xen/include/public/arch-arm.h
> > > > b/xen/include/public/arch-arm.h index 5d359af..bf6dc9a 100644
> > > > --- a/xen/include/public/arch-arm.h
> > > > +++ b/xen/include/public/arch-arm.h
> > > > @@ -253,6 +253,41 @@ struct vcpu_guest_context {
> > > >
> > > >      uint32_t sctlr, ttbcr;
> > > >      uint64_t ttbr0, ttbr1;
> > > > +    uint32_t ifar, dfar;
> > > > +    uint32_t ifsr, dfsr;
> > > > +    uint32_t dacr;
> > > > +    uint64_t par;
> > > > +
> > > > +#ifdef CONFIG_ARM_32
> > >
> > > I'm afraid a per arch ifdef isn't allowed in the include/public tree.
> > > The interface should be identical for both 32 and 64 bit callers. Also
> > > think of 32-on-64 guests etc.
> > >
> > > Also, this struct is guest facing (via VCPUOP_initialise) but many/all of
> > > these new registers are not things which a guest needs to specify via a
> > > hypercall. IOW I think many of them should be part of some toolstack
> > > private save/restore interface.
> >
> > I see, the guest can specify something like sctlr, and ttbr/ttbcr, and the
> > others should be hidden inside hvm save/restore.
>
> Right, the important thing is that all that additional state is only
> visible to the toolstack and the hypervisor, not to guests.
>
> Actually the guest shouldn't really see this interface anyway, that's
> really a hold over from x86. On ARM only the toolstack really needs to
> use this struct.
>
> I wonder if we can drop struct vcpu_guest_context from the guest facing
> ABI on ARM. I see that we already don't expose VCPUOP_initialise and the
> only other user is XEN_DOMCTL_(sg)etvcpucontext.

Does the guest need to have those on ARM? How are you arranging SMP
guest AP bringup? If it's using SCI then maybe that hypercall interface
can be dropped.

Tim.
Ian Campbell
2013-Oct-11 10:25 UTC
Re: [PATCH v4 2/9] xen/arm: Add more registers for saving and restoring vcpu registers
On Fri, 2013-10-11 at 11:22 +0100, Tim Deegan wrote:
> At 09:43 +0100 on 11 Oct (1381484614), Ian Campbell wrote:
> > [...]
> > I wonder if we can drop struct vcpu_guest_context from the guest facing
> > ABI on ARM. I see that we already don't expose VCPUOP_initialise and the
> > only other user is XEN_DOMCTL_(sg)etvcpucontext.
>
> Does the guest need to have those on ARM? How are you arranging SMP
> guest AP bringup? If it's using SCI then maybe that hypercall interface
> can be dropped.

SMP bring-up is done via PSCI, which is the firmware "hypercall" interface
defined by ARM for bringing up physical CPUs, which we implement within Xen
for the guests' benefit.

Ian
Jaeyong Yoo
2013-Oct-14 04:39 UTC
Re: [PATCH v4 2/9] xen/arm: Add more registers for saving and restoring vcpu registers
> -----Original Message-----
> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of Ian Campbell
> Sent: Friday, October 11, 2013 7:26 PM
> To: Tim Deegan
> Cc: Keir Fraser; Stefano Stabellini; Jan Beulich; Jaeyong Yoo; xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH v4 2/9] xen/arm: Add more registers for
> saving and restoring vcpu registers
>
> On Fri, 2013-10-11 at 11:22 +0100, Tim Deegan wrote:
> > [...]
> > Does the guest need to have those on ARM? How are you arranging SMP
> > guest AP bringup? If it's using SCI then maybe that hypercall
> > interface can be dropped.
>
> SMP bring-up is done via PSCI, which is the firmware "hypercall" interface
> defined by ARM for bringing up physical CPUs, which we implement within Xen
> for the guests' benefit.

So, could I remove XEN_DOMCTL_(sg)etvcpucontext and use HVM save/restore
for migrating vcpu registers?

And, if I could, arch_get_info_guest becomes a dangling function; would it
be better to remove this function too, or to keep it for symmetry?
Jaeyong

> Ian
Ian Campbell
2013-Oct-16 12:50 UTC
Re: [PATCH v4 7/9] xen/arm: Implement virtual-linear page table for guest p2m mapping in live migration
On Fri, 2013-10-04 at 13:44 +0900, Jaeyong Yoo wrote:
> Implement vlpt (virtual-linear page table) for fast accessing of 3rd
> PTE of guest p2m. When creating a mapping for vlpt, just copy the
> 1st level PTE of guest p2m to the xen's 2nd level PTE. Then the mapping
> becomes the following:
>     xen's 1st PTE -->
>     xen's 2nd PTE (which is the same as 1st PTE of guest p2m) -->
>     guest p2m's 2nd PTE -->
>     guest p2m's 3rd PTE (the memory contents where the vlpt points)
> For more info about vlpt, see:
> http://www.technovelty.org/linux/virtual-linear-page-table.html
>
> This function is used in dirty-page tracing: when domU write-fault is
> trapped by xen, xen can immediately locate the 3rd PTE of guest p2m.

ISTR you had some numbers which backed up the choice of this solution.
Can you either include them here or link to the list archives?

> Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com>
> ---
>  xen/arch/arm/mm.c            | 78 ++++++++++++++++++++++++++++++++++++++++++++
>  xen/include/asm-arm/config.h |  6 ++++
>  xen/include/asm-arm/domain.h |  7 ++++
>  xen/include/asm-arm/mm.h     | 22 +++++++++++++
>  4 files changed, 113 insertions(+)
>
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index 120eae8..98edbc9 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -1382,6 +1382,84 @@ int is_iomem_page(unsigned long mfn)
>          return 1;
>      return 0;
>  }
> +
> +/* flush the vlpt area */
> +static void flush_vlpt(struct domain *d)
> +{
> +    int flush_size;
> +    flush_size = (d->arch.dirty.second_lvl_end -
> +                  d->arch.dirty.second_lvl_start) << SECOND_SHIFT;
> +    /* flushing the 3rd level mapping */
> +    flush_xen_data_tlb_range_va(VIRT_LIN_P2M_START,
> +                                flush_size);

Shouldn't the base here be VIRT_LIN_P2M_START +
(d->arch.dirty.second_lvl_start << SECOND_SHIFT) or something like that?

flush_xen_data_tlb_range_va just turns into a loop over the addresses, so
you might find you may as well do the flushes as you update the ptes
below, perhaps with an optimisation to pull the barriers outside that
loop.

> +}
> +
> +/* restore the xen page table for vlpt mapping for domain d */
> +void restore_vlpt(struct domain *d)
> +{
> +    int i, j = 0;
> +    int need_flush = 0;
> +
> +    for ( i = d->arch.dirty.second_lvl_start;
> +          i < d->arch.dirty.second_lvl_end;
> +          ++i, ++j )
> +    {
> +        if ( xen_second[i].bits != d->arch.dirty.second_lvl[j].bits )
> +        {
> +            write_pte(&xen_second[i], d->arch.dirty.second_lvl[j]);
> +            need_flush = 1;
> +        }
> +    }
> +
> +    if ( need_flush ) flush_vlpt(d);
> +}
> +
> +/* setting up the xen page table for vlpt mapping for domain d */
> +void prepare_vlpt(struct domain *d)
> +{
> +    int xen_second_linear_base;
> +    int gp2m_start_index, gp2m_end_index;
> +    struct p2m_domain *p2m = &d->arch.p2m;
> +    struct page_info *second_lvl_page;
> +    vaddr_t gma_start = 0;
> +    vaddr_t gma_end = 0;
> +    lpae_t *first;
> +    int i, j;
> +
> +    xen_second_linear_base = second_linear_offset(VIRT_LIN_P2M_START);
> +    get_gma_start_end(d, &gma_start, &gma_end);
> +
> +    gp2m_start_index = gma_start >> FIRST_SHIFT;
> +    gp2m_end_index = (gma_end >> FIRST_SHIFT) + 1;
> +
> +    second_lvl_page = alloc_domheap_page(NULL, 0);

The p2m first is two concatenated pages with a total of 1024 entries
(which is needed to give the full 40 bit IPA space). I have a feeling this
means you need two pages here?

Or maybe we shouldn't be supporting the full 40 bit addresses on 32-bit?

NB currently create_p2m_entries doesn't actually support 40 bits, but I
need to fix that for an arm64 platform I'm looking at.

> +    d->arch.dirty.second_lvl = map_domain_page_global(
> +        page_to_mfn(second_lvl_page) );
> +
> +    first = __map_domain_page(p2m->first_level);
> +    for ( i = gp2m_start_index, j = 0; i < gp2m_end_index; ++i, ++j )
> +    {
> +        write_pte(&xen_second[xen_second_linear_base+i],
> +                  first[i]);
> +
> +        /* we copy the mapping into domain's structure as a reference in case
> +         * of the context switch (used in restore_vlpt) */
> +        d->arch.dirty.second_lvl[j] = first[i];
> +    }
> +    unmap_domain_page(first);
> +
> +    /* storing the start and end index */
> +    d->arch.dirty.second_lvl_start = xen_second_linear_base + gp2m_start_index;
> +    d->arch.dirty.second_lvl_end = xen_second_linear_base + gp2m_end_index;
> +
> +    flush_vlpt(d);
> +}
> +
> +void cleanup_vlpt(struct domain *d)
> +{
> +    unmap_domain_page_global(d->arch.dirty.second_lvl);
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/include/asm-arm/config.h b/xen/include/asm-arm/config.h
> index 9e395c2..47fc26e 100644
> --- a/xen/include/asm-arm/config.h
> +++ b/xen/include/asm-arm/config.h
> @@ -87,6 +87,7 @@
>   *   0  -   8M   <COMMON>
>   *
>   *  32M - 128M   Frametable: 24 bytes per page for 16GB of RAM
> + * 128M - 256M   Virtual-linear mapping to P2M table
>   * 256M -   1G   VMAP: ioremap and early_ioremap use this virtual address
>   *               space
>   *
> @@ -132,6 +133,9 @@
>
>  #define VMAP_VIRT_END    XENHEAP_VIRT_START
>
> +#define VIRT_LIN_P2M_START     _AT(vaddr_t,0x08000000)

Can you put this up in the list of addresses in order please.

> +#define VIRT_LIN_P2M_END       VMAP_VIRT_START
> +
>  #define DOMHEAP_ENTRIES        1024  /* 1024 2MB mapping slots */
>
>  /* Number of domheap pagetable pages required at the second level (2MB mappings) */
> @@ -158,6 +162,8 @@
>
>  #define HYPERVISOR_VIRT_END    DIRECTMAP_VIRT_END
>
> +/*TODO (ARM_64): define VIRT_LIN_P2M_START VIRT_LIN_P2M_END */
> +
>  #endif
>
>  /* Fixmap slots */
> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
> index 755c7af..7e7743a 100644
> --- a/xen/include/asm-arm/domain.h
> +++ b/xen/include/asm-arm/domain.h
> @@ -112,6 +112,13 @@ struct arch_domain
>          spinlock_t                  lock;
>      } vuart;
>
> +    /* dirty-page tracing */
> +    struct {
> +        int second_lvl_start;   /* for context switch */
> +        int second_lvl_end;
> +        lpae_t *second_lvl;
> +    } dirty;
> +
>      struct dt_mem_info map_domain;
>      spinlock_t map_lock;
>  } __cacheline_aligned;
> diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
> index 021e8d6..b9cbcf2 100644
> --- a/xen/include/asm-arm/mm.h
> +++ b/xen/include/asm-arm/mm.h
> @@ -344,6 +344,28 @@ static inline void put_page_and_type(struct page_info *page)
>
>  void get_gma_start_end(struct domain *d, vaddr_t *start, vaddr_t *end);
>
> +/* vlpt (virtual linear page table) related functions */
> +void prepare_vlpt(struct domain *d);
> +void cleanup_vlpt(struct domain *d);
> +void restore_vlpt(struct domain *d);
> +
> +/* calculate the xen's virtual address for accessing the leaf PTE of
> + * a given address (GPA) */
> +static inline lpae_t * get_vlpt_3lvl_pte(vaddr_t addr)

Argument is an IPA and should therefore be a paddr_t I think?

> +{
> +    unsigned long ipa_third = third_table_offset(addr);
> +    unsigned long ipa_second = second_table_offset(addr);
> +    unsigned long ipa_first = first_table_offset(addr);
> +
> +    /* guest p2m's first level index is the index of xen's second level,
> +       and guest p2m's second level index is the index of xen's third level */
> +    return (lpae_t *)(
> +        ( (second_linear_offset(VIRT_LIN_P2M_START) + ipa_first)
> +          << SECOND_SHIFT) +
> +        (ipa_second << THIRD_SHIFT) +
> +        (ipa_third << 3) );
> +}

Can we assign the table base to an "lpae_t *table" and then use
&table[ipa_third] rather than hardcoding << 3?

I can't quite shake the feeling that the "ipa_first<<SECOND_SHIFT" and
"ipa_second<<THIRD_SHIFT" elements of this could be achieved with a single
shift of the original address (and maybe a mask).

> +
>  #endif /*  __ARCH_ARM_MM__ */
>  /*
>   * Local variables:
Ian Campbell
2013-Oct-16 13:44 UTC
Re: [PATCH v4 8/9] xen/arm: Implement hypercall for dirty page tracing
On Fri, 2013-10-04 at 13:44 +0900, Jaeyong Yoo wrote:
> Add hypercall (shadow op: enable/disable and clean/peek dirted page bitmap).

"dirtied"

> It consists of two parts: dirty page detecting and saving.
> For detecting, we setup the guest p2m's leaf PTE read-only and whenever
> the guest tries to write something, permission fault happens and traps into xen.
> The permission-faulted GPA should be saved for the toolstack (when it wants to see
> which pages are dirted). For this purpose, we temporarily save the GPAs into linked

"dirtied" again

> list by using 'add_mapped_vaddr' function and when toolstack wants
> (log_dirty_op function) the GPAs are copied into bitmap and the linnked list is flushed.

"linked"

>      spin_lock_init(&d->arch.map_lock);
>      d->arch.map_domain.nr_banks = 0;
>
> +    /* init for dirty-page tracing */
> +    d->arch.dirty.count = 0;
> +    d->arch.dirty.mode = 0;
> +    d->arch.dirty.head = NULL;
> +    d->arch.dirty.mvn_head = NULL;
> +    spin_lock_init(&d->arch.dirty.lock);
> +
> +    d->arch.dirty.second_lvl_start = 0;
> +    d->arch.dirty.second_lvl_end = 0;
> +    d->arch.dirty.second_lvl = NULL;

I think some/all of these belong in the previous patch which added the
fields.

> +
>      clear_page(d->shared_info);
>      share_xen_page_with_guest(
>          virt_to_page(d->shared_info), d, XENSHARE_writable);
> diff --git a/xen/arch/arm/domctl.c b/xen/arch/arm/domctl.c
> index 78fb322..7edce12 100644
> --- a/xen/arch/arm/domctl.c
> +++ b/xen/arch/arm/domctl.c
> @@ -92,6 +92,19 @@ long arch_do_domctl(struct xen_domctl *domctl, struct domain *d,
>              xfree(c.data);
>      }
>      break;
> +    case XEN_DOMCTL_shadow_op:
> +    {
> +        domain_pause(d);
> +        ret = dirty_mode_op(d, &domctl->u.shadow_op);
> +        domain_unpause(d);
> +
> +        if ( (&domctl->u.shadow_op)->op == XEN_DOMCTL_SHADOW_OP_CLEAN ||
> +             (&domctl->u.shadow_op)->op == XEN_DOMCTL_SHADOW_OP_PEEK )

(&domctl->u.shadow_op)->op is just a weird way to write
domctl->u.shadow_op.op isn't it?

Do you only want to copyback on success or is it always correct?

> +        {
> +            copyback = 1;
> +        }
> +    }
> +    break;

> @@ -1460,6 +1459,28 @@ void cleanup_vlpt(struct domain *d)
>      unmap_domain_page_global(d->arch.dirty.second_lvl);
>  }
>
> +/* routine for dirty-page tracing
> + *
> + * On first write, it page faults, its entry is changed to read-write,
> + * and on retry the write succeeds.
> + *
> + * for locating p2m of the faulting entry, we use virtual-linear page table.
> + */
> +void handle_page_fault(struct domain *d, paddr_t addr)
> +{
> +    lpae_t *vlp2m_pte = get_vlpt_3lvl_pte(addr);
> +    if ( vlp2m_pte->p2m.valid && vlp2m_pte->p2m.write == 0 )

I think you need to distinguish p2m entries which are r/o due to log dirty
from those which are r/o due to wanting them to be r/o.

Maybe we don't have the latter right now? We will eventually need p2m
types I think.

> +    {
> +        lpae_t pte = *vlp2m_pte;
> +        pte.p2m.write = 1;
> +        write_pte(vlp2m_pte, pte);
> +        flush_tlb_local();
> +
> +        /* in order to remove mappings in cleanup stage */
> +        add_mapped_vaddr(d, addr);

No locks or atomic operations here? How are races with the tools reading
the dirty bitmap addressed? If it is by clever ordering of the checks and
pte writes then I think a comment would be in order.

> +    }
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
> index 2d09fef..3b2b11a 100644
> --- a/xen/arch/arm/p2m.c
> +++ b/xen/arch/arm/p2m.c
> @@ -6,6 +6,8 @@
>  #include <xen/bitops.h>
>  #include <asm/flushtlb.h>
>  #include <asm/gic.h>
> +#include <xen/guest_access.h>
> +#include <xen/pfn.h>
>
>  void dump_p2m_lookup(struct domain *d, paddr_t addr)
>  {
> @@ -408,6 +410,304 @@ unsigned long gmfn_to_mfn(struct domain *d, unsigned long gpfn)
>      return p >> PAGE_SHIFT;
>  }
>
> +#define MAX_VA_PER_NODE (PAGE_SIZE - 2 * sizeof(void *) - sizeof(int))\
> +                        / sizeof(unsigned long)
> +
> +/* An array-based linked list for storing virtual addresses (i.e., the 3rd
> + * level table mapping) that should be destroyed after live migration */
> +struct mapped_va_node
> +{
> +    struct page_info *next;
> +    struct mapped_va_node *mvn_next;
> +    int items;
> +    unsigned long vaddrs[MAX_VA_PER_NODE];
> +};
> +
> +int add_mapped_vaddr(struct domain *d, unsigned long va)
> +{

This function seems awfully expensive (contains allocations etc) for a
function called on every page fault during a migration.

Why are you remembering the full va of every fault when a single bit per
page would do?

I think you can allocate a bitmap when logdirty is enabled. At a bit per
page it shouldn't be too huge.

You might be able to encode this in the p2m which you walk when the tools
ask for the bit map (i.e. p2m.type==writeable p2m.write==0 => dirty), but
I think a bitmap would do the trick and be easier to implement.

> +    struct page_info *page_head;
> +    struct mapped_va_node *mvn_head;
> +
> +    spin_lock(&d->arch.dirty.lock);
> +
> +    page_head = d->arch.dirty.head;
> +    mvn_head = d->arch.dirty.mvn_head;
> +    if ( !mvn_head )
> +    {
> +        page_head = alloc_domheap_page(NULL, 0);
> +        if ( !page_head )
> +        {
> +            spin_unlock(&d->arch.dirty.lock);
> +            return -ENOMEM;
> +        }
> +
> +        mvn_head = map_domain_page_global(__page_to_mfn(page_head));
> +        mvn_head->items = 0;
> +        mvn_head->next = NULL;
> +        mvn_head->mvn_next = NULL;
> +        d->arch.dirty.head = page_head;
> +        d->arch.dirty.mvn_head = mvn_head;
> +    }
> +
> +    if ( mvn_head->items == MAX_VA_PER_NODE )
> +    {
> +        struct page_info *page;
> +
> +        page = alloc_domheap_page(NULL, 0);
> +        if ( !page )
> +        {
> +            spin_unlock(&d->arch.dirty.lock);
> +            return -ENOMEM;
> +        }
> +
> +        mvn_head = map_domain_page_global(__page_to_mfn(page));
> +        mvn_head->items = 0;
> +        mvn_head->next = d->arch.dirty.head;
> +        mvn_head->mvn_next = d->arch.dirty.mvn_head;
> +
> +        d->arch.dirty.mvn_head = mvn_head;
> +        d->arch.dirty.head = page;
> +    }
> +
> +    mvn_head->vaddrs[mvn_head->items] = va;
> +    mvn_head->items ++;
> +
> +    spin_unlock(&d->arch.dirty.lock);
> +    return 0;
> +}
> +
> +/* get the dirty bitmap from the linked list stored by add_mapped_vaddr
> + * and also clear the mapped vaddrs linked list */
> +static void get_dirty_bitmap(struct domain *d, uint8_t *bitmap[])
> +{
> +    struct page_info *page_head;
> +    struct mapped_va_node *mvn_head;
> +    struct page_info *tmp_page;
> +    struct mapped_va_node *tmp_mvn;
> +    vaddr_t gma_start = 0;
> +    vaddr_t gma_end = 0;
> +
> +    /* this hypercall is called from domain 0, and we don't know which guest's
> +     * vlpt is mapped in xen_second, so, to be sure, we restore vlpt here */
> +    restore_vlpt(d);

AFAICT you don't actually use the vlpt here, just the domain's list of
dirty addresses (which as above could become a bitmap)

> +
> +    spin_lock(&d->arch.dirty.lock);
> +
> +    page_head = d->arch.dirty.head;
> +    mvn_head = d->arch.dirty.mvn_head;
> +    get_gma_start_end(d, &gma_start, &gma_end);
> +
> +    while ( page_head )
> +    {
> +        int i;
> +
> +        for ( i = 0; i < mvn_head->items; ++i )
> +        {
> +            unsigned int bit_offset;
> +            int bitmap_index, bitmap_offset;
> +            lpae_t *vlp2m_pte;
> +
> +            bit_offset = third_linear_offset(mvn_head->vaddrs[i] - gma_start);
> +            bitmap_index = bit_offset >> (PAGE_SHIFT + 3);
> +            bitmap_offset = bit_offset & ((1ul << (PAGE_SHIFT + 3)) - 1);
> +
> +            __set_bit(bitmap_offset, bitmap[bitmap_index]);
> +
> +            vlp2m_pte = get_vlpt_3lvl_pte(mvn_head->vaddrs[i]);
> +            vlp2m_pte->p2m.write = 0;
> +        }
> +
> +        tmp_page = page_head;
> +        page_head = mvn_head->next;
> +
> +        tmp_mvn = mvn_head;
> +        mvn_head = mvn_head->mvn_next;
> +
> +        unmap_domain_page_global(tmp_mvn);
> +        free_domheap_page(tmp_page);
> +    }
> +
> +    d->arch.dirty.head = NULL;
> +    d->arch.dirty.mvn_head = NULL;
> +
> +    spin_unlock(&d->arch.dirty.lock);
> +}
> +
> +
> +/* Change types across all p2m entries in a domain */
> +static void p2m_change_entry_type_global(struct domain *d, enum mg nt)
> +{

Stefano had some patches to generalise create_p2m_entries() into a p2m
walker infrastructure in his series to add XENMEM_exchange_and_pin. That
series turned out to be unnecessary but it could be resurrected for use
here rather than recoding a new one.

> +    struct p2m_domain *p2m = &d->arch.p2m;
> +    vaddr_t ram_base;
> +    int i1, i2, i3;
> +    int first_index, second_index, third_index;
> +    lpae_t *first = __map_domain_page(p2m->first_level);
> +    lpae_t pte, *second = NULL, *third = NULL;
> +
> +    get_gma_start_end(d, &ram_base, NULL);
> +
> +    first_index = first_table_offset((uint64_t)ram_base);
> +    second_index = second_table_offset((uint64_t)ram_base);
> +    third_index = third_table_offset((uint64_t)ram_base);
> +
> +    BUG_ON( !first && "Can't map first level p2m." );
> +
> +    spin_lock(&p2m->lock);
> +
> +    for ( i1 = first_index; i1 < LPAE_ENTRIES*2; ++i1 )
> +    {
> +        lpae_walk_t first_pte = first[i1].walk;
> +        if ( !first_pte.valid || !first_pte.table )
> +            goto out;
> +
> +        second = map_domain_page(first_pte.base);
> +        BUG_ON( !second && "Can't map second level p2m.");
> +        for ( i2 = second_index; i2 < LPAE_ENTRIES; ++i2 )
> +        {
> +            lpae_walk_t second_pte = second[i2].walk;
> +
> +            /* a nasty hack for vlpt due to the difference
> +             * of page table entry layout between p2m and pt */
> +            second[i2].pt.ro = 0;

What is the hack here?

The issue is that the p2m second level, which is a table entry in the p2m,
is installed into the Xen page tables where it ends up the third level,
treated as a block entry. That's OK, because for both PT and P2M bits 2..10
are ignored for table entries.

So I think rather than this hack here we should simply ensure that our
non-leaf p2m's have the correct bits in them to be used as p2m table
entries.

> +
> +            if ( !second_pte.valid || !second_pte.table )
> +                goto out;
> +
> +            third = map_domain_page(second_pte.base);
> +            BUG_ON( !third && "Can't map third level p2m.");
> +
> +            for ( i3 = third_index; i3 < LPAE_ENTRIES; ++i3 )
> +            {
> +                lpae_walk_t third_pte = third[i3].walk;
> +                int write = 0;
> +                if ( !third_pte.valid )
> +                    goto out;
> +
> +                pte = third[i3];
> +                if ( pte.p2m.write == 1 && nt == mg_ro )
> +                {
> +                    pte.p2m.write = 0;
> +                    write = 1;
> +                }
> +                else if ( pte.p2m.write == 0 && nt == mg_rw )
> +                {
> +                    pte.p2m.write = 1;
> +                    write = 1;
> +                }
> +                if ( write )
> +                    write_pte(&third[i3], pte);
> +            }
> +            unmap_domain_page(third);
> +
> +            third = NULL;
> +            third_index = 0;
> +        }
> +        unmap_domain_page(second);
> +
> +        second = NULL;
> +        second_index = 0;
> +        third_index = 0;
> +    }
> +
> +out:
> +    flush_tlb_all_local();
> +    if ( third ) unmap_domain_page(third);
> +    if ( second ) unmap_domain_page(second);
> +    if ( first ) unmap_domain_page(first);
> +
> +    spin_unlock(&p2m->lock);
> +}
> +
> +/* Read a domain's log-dirty bitmap and stats.
> + * If the operation is a CLEAN, clear the bitmap and stats. */
> +int log_dirty_op(struct domain *d, xen_domctl_shadow_op_t *sc)

Can this not be static?

> +{
> +    unsigned long gmfn_start;
> +    unsigned long gmfn_end;
> +    unsigned long gmfns;
> +    unsigned int bitmap_pages;
> +    int rc = 0, clean = 0, peek = 1;
> +    uint8_t *bitmap[256]; /* bitmap[256] covers 32GB ram */
> +    int i;
> +
> +    BUG_ON( !d->arch.map_domain.nr_banks );
> +
> +    gmfn_start = d->arch.map_domain.bank[0].start >> PAGE_SHIFT;
> +    gmfn_end = domain_get_maximum_gpfn(d);
> +    gmfns = gmfn_end - gmfn_start;
> +    bitmap_pages = PFN_UP((gmfns + 7) / 8);
> +
> +    if ( guest_handle_is_null(sc->dirty_bitmap) )
> +    {
> +        peek = 0;
> +    }
> +    else
> +    {
> +        /* mapping to the bitmap from guest param */
> +        vaddr_t to = (vaddr_t)sc->dirty_bitmap.p;

This is not allowed, please use the guest handle interfaces and
copy_(to/from)_(user/guest) instead of diving into the guest handle struct
and open coding it yourself.

This might mean creating a temporary bitmap in hypervisor space, but if you
maintain the bitmap across the whole dirty period as suggested then you
should be able to simply copy it out.

Actually, for consistency you might need an atomic copy and clear
mechanism (otherwise you can perhaps lose updates), in which case you'll
need a temporary buffer.

> +
> +        BUG_ON( to & ~PAGE_MASK && "offset not aligned to PAGE SIZE");
> +
> +        for ( i = 0; i < bitmap_pages; ++i )
> +        {
> +            paddr_t g;
> +            rc = gvirt_to_maddr(to, &g);
> +            if ( rc )
> +                return rc;
> +            bitmap[i] = map_domain_page(g>>PAGE_SHIFT);
> +            memset(bitmap[i], 0x00, PAGE_SIZE);
> +            to += PAGE_SIZE;
> +        }
> +    }
> +
> +    clean = (sc->op == XEN_DOMCTL_SHADOW_OP_CLEAN);
> +    sc->stats.dirty_count = d->arch.dirty.count;
> +
> +    get_dirty_bitmap(d, bitmap);
> +    if ( peek )
> +    {
> +        for ( i = 0; i < bitmap_pages; ++i )
> +        {
> +            unmap_domain_page(bitmap[i]);
> +        }
> +    }
> +
> +    return 0;
> +}
> +
> +long dirty_mode_op(struct domain *d, xen_domctl_shadow_op_t *sc)
> +{
> +    long ret = 0;
> +    switch (sc->op)
> +    {
> +    case XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY:
> +    case XEN_DOMCTL_SHADOW_OP_OFF:
> +    {
> +        enum mg nt = sc->op == XEN_DOMCTL_SHADOW_OP_OFF ? mg_rw : mg_ro;
> +
> +        d->arch.dirty.mode = sc->op == XEN_DOMCTL_SHADOW_OP_OFF ? 0 : 1;
> +        p2m_change_entry_type_global(d, nt);
> +
> +        if ( sc->op == XEN_DOMCTL_SHADOW_OP_OFF )
> +            cleanup_vlpt(d);
> +        else
> +            prepare_vlpt(d);
> +    }
> +    break;
> +
> +    case XEN_DOMCTL_SHADOW_OP_CLEAN:
> +    case XEN_DOMCTL_SHADOW_OP_PEEK:
> +    {
> +        ret = log_dirty_op(d, sc);
> +    }
> +    break;
> +
> +    default:
> +        return -ENOSYS;
> +    }
> +    return ret;
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
> index 4c0fc32..3b78ed2 100644
> --- a/xen/arch/arm/traps.c
> +++ b/xen/arch/arm/traps.c
> @@ -1313,6 +1313,8 @@ static void do_trap_data_abort_guest(struct cpu_user_regs *regs,
>      const char *msg;
>      int rc, level = -1;
>      mmio_info_t info;
> +    int page_fault = ((dabt.dfsc & FSC_MASK) ==
> +                      (FSC_FLT_PERM + FSC_3D_LEVEL) && dabt.write);
>
>      if ( !check_conditional_instr(regs, hsr) )
>      {
> @@ -1327,22 +1329,23 @@ static void do_trap_data_abort_guest(struct cpu_user_regs *regs,
>      info.gva = READ_SYSREG64(FAR_EL2);
>  #endif
>
> -    if (dabt.s1ptw)
> +    if ( dabt.s1ptw && !page_fault )
>          goto bad_data_abort;

I see this has been commented on elsewhere.

Ian.
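A minimal sketch of the bitmap approach suggested above, assuming a
per-domain bitmap allocated when log-dirty mode is switched on; the
d->arch.dirty.bitmap and d->arch.dirty.base_pfn fields are hypothetical,
not in the posted patch:

/* Allocate one bit per guest RAM page when log-dirty is enabled. */
static int logdirty_enable(struct domain *d, paddr_t ram_base, paddr_t ram_end)
{
    unsigned long nr_pages = (ram_end - ram_base) >> PAGE_SHIFT;

    d->arch.dirty.bitmap = xzalloc_array(unsigned long,
                                         BITS_TO_LONGS(nr_pages));
    if ( !d->arch.dirty.bitmap )
        return -ENOMEM;
    d->arch.dirty.base_pfn = ram_base >> PAGE_SHIFT;
    return 0;
}

/* Fault path: O(1), no allocation, and set_bit() is atomic so it does
 * not race with the toolstack reading the bitmap. */
static void mark_dirty(struct domain *d, paddr_t addr)
{
    set_bit((addr >> PAGE_SHIFT) - d->arch.dirty.base_pfn,
            d->arch.dirty.bitmap);
}

Compared with the linked list, the fault path stays constant-time and the
CLEAN/PEEK path becomes a straight copy (and clear) of the bitmap.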
Ian Campbell
2013-Oct-16 13:55 UTC
Re: [PATCH v4 9/9] xen/arm: Implement toolstack for xl restore/save and migrate
On Fri, 2013-10-04 at 13:44 +0900, Jaeyong Yoo wrote:
>  TARGETS-y := xenperf xenpm xen-tmem-list-parse gtraceview gtracestat xenlockprof xenwatchdogd xencov
>  TARGETS-$(CONFIG_X86) += xen-detect xen-hvmctx xen-hvmcrash xen-lowmemd xen-mfndump
> +ifeq ($(CONFIG_X86),y)
>  TARGETS-$(CONFIG_MIGRATE) += xen-hptool
> +endif

I know this isn't your fault but I cannot imagine what xen-hptool (memory
hotplug tool) has to do with CONFIG_MIGRATE. If it doesn't build for ARM
better to make it use TARGETS-$(CONFIG_X86) I think.

>  TARGETS := $(TARGETS-y)
>
>  SUBDIRS := $(SUBDIRS-y)
> @@ -23,7 +25,9 @@ INSTALL_BIN := $(INSTALL_BIN-y)
>  INSTALL_SBIN-y := xen-bugtool xen-python-path xenperf xenpm xen-tmem-list-parse gtraceview \
>      gtracestat xenlockprof xenwatchdogd xen-ringwatch xencov
>  INSTALL_SBIN-$(CONFIG_X86) += xen-hvmctx xen-hvmcrash xen-lowmemd xen-mfndump
> +ifeq ($(CONFIG_X86),y)
>  INSTALL_SBIN-$(CONFIG_MIGRATE) += xen-hptool
> +endif
>  INSTALL_SBIN := $(INSTALL_SBIN-y)
>
>  INSTALL_PRIVBIN-y := xenpvnetboot
Jaeyong Yoo
2013-Oct-17 02:14 UTC
Re: [PATCH v4 2/9] xen/arm: Add more registers for saving and restoring vcpu registers
> -----Original Message-----
> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of Jaeyong Yoo
> Sent: Monday, October 14, 2013 1:39 PM
> To: 'Ian Campbell'; 'Tim Deegan'
> Cc: 'Stefano Stabellini'; 'Keir Fraser'; 'Jan Beulich'; xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH v4 2/9] xen/arm: Add more registers for
> saving and restoring vcpu registers
>
> > > > Does the guest need to have those on ARM? How are you arranging SMP
> > > > guest AP bringup? If it's using SCI then maybe that hypercall
> > > > interface can be dropped.
> > >
> > > SMP bring-up is done via PSCI, which is the firmware "hypercall"
> > > interface defined by ARM for bringing up physical CPUs, which we
> > > implement within Xen for the guests' benefit.
> >
> > So, could I remove XEN_DOMCTL_(sg)etvcpucontext and use HVM save/restore
> > for migrating vcpu registers?
> >
> > And, if I could, arch_get_info_guest becomes a dangling function; would
> > it be better to remove this function too, or to keep it for symmetry?

Were these questions missed?

> Jaeyong
Jaeyong Yoo
2013-Oct-17 08:58 UTC
Re: [PATCH v4 7/9] xen/arm: Implement virtual-linear page table for guest p2m mapping in live migration
> -----Original Message-----
> From: Ian Campbell [mailto:Ian.Campbell@citrix.com]
> Sent: Wednesday, October 16, 2013 9:51 PM
> To: Jaeyong Yoo
> Cc: xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH v4 7/9] xen/arm: Implement virtual-linear
> page table for guest p2m mapping in live migration
>
> On Fri, 2013-10-04 at 13:44 +0900, Jaeyong Yoo wrote:
> > Implement vlpt (virtual-linear page table) for fast accessing of 3rd
> > PTE of guest p2m. [...]
> > This function is used in dirty-page tracing: when domU write-fault is
> > trapped by xen, xen can immediately locate the 3rd PTE of guest p2m.
>
> ISTR you had some numbers which backed up the choice of this solution.
> Can you either include them here or link to the list archives?

I think a link to the list is OK. The full text is too long, I guess.

> > +/* flush the vlpt area */
> > +static void flush_vlpt(struct domain *d)
> > +{
> > +    int flush_size;
> > +    flush_size = (d->arch.dirty.second_lvl_end -
> > +                  d->arch.dirty.second_lvl_start) << SECOND_SHIFT;
> > +    /* flushing the 3rd level mapping */
> > +    flush_xen_data_tlb_range_va(VIRT_LIN_P2M_START,
> > +                                flush_size);
>
> Shouldn't the base here be VIRT_LIN_P2M_START +
> (d->arch.dirty.second_lvl_start << SECOND_SHIFT) or something like that?

Yes, right. It will also decrease the flush overhead.

> flush_xen_data_tlb_range_va just turns into a loop over the addresses, so
> you might find you may as well do the flushes as you update the ptes
> below, perhaps with an optimisation to pull the barriers outside that
> loop.

You mean the barriers in write_pte? OK. It looks better.

> > +    second_lvl_page = alloc_domheap_page(NULL, 0);
>
> The p2m first is two concatenated pages with a total of 1024 entries
> (which is needed to give the full 40 bit IPA space). I have a feeling this
> means you need two pages here?
>
> Or maybe we shouldn't be supporting the full 40 bit addresses on 32-bit?

For generality, I think it is better to support it. But, honestly, I'm not
sure how many people would use a 40-bit ARM guest.

> NB currently create_p2m_entries doesn't actually support 40 bits, but I
> need to fix that for an arm64 platform I'm looking at.

> > +#define VIRT_LIN_P2M_START     _AT(vaddr_t,0x08000000)
>
> Can you put this up in the list of addresses in order please.

> > +static inline lpae_t * get_vlpt_3lvl_pte(vaddr_t addr)
>
> Argument is an IPA and should therefore be a paddr_t I think?

Oh, right, and the type in traps.c is also paddr_t. It is my mistake.

> Can we assign the table base to an "lpae_t *table" and then use
> &table[ipa_third] rather than hardcoding << 3?

OK.

> I can't quite shake the feeling that the "ipa_first<<SECOND_SHIFT" and
> "ipa_second<<THIRD_SHIFT" elements of this could be achieved with a single
> shift of the original address (and maybe a mask).

Right, just one shift with LPAE_SHIFT on the addr is sufficient. I'm
surprised you figured this out immediately.
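For the record, the single shift works because the linear table holds one
8-byte leaf PTE per 4K guest page, so the byte offset of the entry is just
(addr >> PAGE_SHIFT) * sizeof(lpae_t). A sketch of the agreed
simplification, assuming VIRT_LIN_P2M_START stays SECOND_SHIFT-aligned (it
is, at 128M); this is an illustration, not the final patch:

static inline lpae_t *get_vlpt_3lvl_pte(paddr_t addr)
{
    lpae_t *table = (lpae_t *)VIRT_LIN_P2M_START;

    /* One leaf PTE per page: index by the IPA's page number. Pointer
     * arithmetic supplies the former "<< 3"; the equivalent byte-offset
     * form is a single shift by LPAE_SHIFT (== PAGE_SHIFT - 3) with the
     * low three bits masked off. */
    return &table[addr >> PAGE_SHIFT];
}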
Ian Campbell
2013-Oct-17 10:01 UTC
Re: [PATCH v4 2/9] xen/arm: Add more registers for saving and restoring vcpu registers
On Thu, 2013-10-17 at 11:14 +0900, Jaeyong Yoo wrote:
> > > > Does the guest need to have those on ARM? How are you arranging SMP
> > > > guest AP bringup? If it's using SCI then maybe that hypercall
> > > > interface can be dropped.
> > >
> > > SMP bring-up is done via PSCI, which is the firmware "hypercall"
> > > interface defined by ARM for bringing up physical CPUs, which we
> > > implement within Xen for the guests' benefit.
> >
> > So, could I remove XEN_DOMCTL_(sg)etvcpucontext and use HVM save/restore
> > for migrating vcpu registers?
> >
> > And, if I could, arch_get_info_guest becomes a dangling function; would
> > it be better to remove this function too, or to keep it for symmetry?
>
> Were these questions missed?

Yes, sorry!

In principle we might be able to remove XEN_DOMCTL_(sg)etvcpucontext as you
suggest, but I'm not sure to what extent tools are at liberty to poke
inside HVM save/restore blobs? I suppose they are allowed to, and can
ignore compatibility stuff like supporting older versions' layouts?

But to be honest I think it's probably easier to keep the domctls, so that
common(ish) code like xenctx.c continues to work for dumping basic
processor state without having to distinguish between platforms which use
domctl and those which use hvmsave.
Ian Campbell
2013-Oct-17 10:06 UTC
Re: [PATCH v4 7/9] xen/arm: Implement virtual-linear page table for guest p2m mapping in live migration
On Thu, 2013-10-17 at 17:58 +0900, Jaeyong Yoo wrote:
> > > +/* flush the vlpt area */
> > > +static void flush_vlpt(struct domain *d)
> > > +{
> > > +    int flush_size;
> > > +    flush_size = (d->arch.dirty.second_lvl_end -
> > > +                  d->arch.dirty.second_lvl_start) << SECOND_SHIFT;
> > > +    /* flushing the 3rd level mapping */
> > > +    flush_xen_data_tlb_range_va(VIRT_LIN_P2M_START,
> > > +                                flush_size);
> >
> > Shouldn't the base here be VIRT_LIN_P2M_START +
> > (d->arch.dirty.second_lvl_start << SECOND_SHIFT) or something like that?
>
> Yes, right. It will also decrease the flush overhead.
>
> > flush_xen_data_tlb_range_va just turns into a loop over the addresses, so
> > you might find you may as well do the flushes as you update the ptes
> > below, perhaps with an optimisation to pull the barriers outside that
> > loop.
>
> You mean the barriers in write_pte? OK. It looks better.

I meant the ones in flush_xen_data_tlb_range_va, but perhaps the ones in
write_pte too. I suppose in both cases a __foo variant without the barriers
could be made, so we can do

    dsb
    for each page:
        __write_pte
        __flush_xen_data...
    dsb
    isb

(or whatever the right barriers are!)

> > > +    second_lvl_page = alloc_domheap_page(NULL, 0);
> >
> > The p2m first is two concatenated pages with a total of 1024 entries
> > (which is needed to give the full 40 bit IPA space). I have a feeling
> > this means you need two pages here?
> >
> > Or maybe we shouldn't be supporting the full 40 bit addresses on 32-bit?
>
> For generality, I think it is better to support it. But, honestly, I'm not
> sure how many people would use a 40-bit ARM guest.

All it takes is one peripheral at a high address, even if you aren't using
a large amount of RAM etc. I've got a 64-bit system on my desk which has
stuff up in the 50 bit range. It seems less likely to happen on 32-bit
though...

Ian.
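A sketch of the hoisted-barrier form being discussed, assuming hypothetical
barrier-less __write_pte()/__flush_xen_data_tlb_range_va() variants (today's
helpers each carry their own dsb/isb); the barrier placement below is
illustrative and would need checking against the ARM ARM:

static void restore_vlpt(struct domain *d)
{
    int i, j;

    dsb();    /* make any earlier page-table writes visible, once */

    for ( i = d->arch.dirty.second_lvl_start, j = 0;
          i < d->arch.dirty.second_lvl_end;
          ++i, ++j )
    {
        if ( xen_second[i].bits != d->arch.dirty.second_lvl[j].bits )
        {
            /* barrier-less variants: the loop body issues no dsb/isb;
             * slot i of xen_second covers the 2MB of VA at i << SECOND_SHIFT */
            __write_pte(&xen_second[i], d->arch.dirty.second_lvl[j]);
            __flush_xen_data_tlb_range_va((vaddr_t)i << SECOND_SHIFT,
                                          1 << SECOND_SHIFT);
        }
    }

    dsb();    /* complete the PTE writes and TLB maintenance */
    isb();
}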
Jaeyong Yoo
2013-Oct-17 10:12 UTC
Re: [PATCH v4 8/9] xen/arm: Implement hypercall for dirty page tracing
> -----Original Message-----
> From: Ian Campbell [mailto:Ian.Campbell@citrix.com]
> Sent: Wednesday, October 16, 2013 10:45 PM
> To: Jaeyong Yoo
> Cc: xen-devel@lists.xen.org; Stefano Stabellini
> Subject: Re: [Xen-devel] [PATCH v4 8/9] xen/arm: Implement hypercall for
> dirty page tracing
>
> On Fri, 2013-10-04 at 13:44 +0900, Jaeyong Yoo wrote:
> > Add hypercall (shadow op: enable/disable and clean/peek dirted page bitmap).
>
> "dirtied"
>
> > It consists of two parts: dirty page detecting and saving.
> > For detecting, we setup the guest p2m's leaf PTE read-only and whenever
> > the guest tries to write something, permission fault happens and traps
> > into xen.
> > The permission-faulted GPA should be saved for the toolstack (when it
> > wants to see which pages are dirted). For this purpose, we temporarily
> > save the GPAs into linked
>
> "dirtied" again
>
> > list by using 'add_mapped_vaddr' function and when toolstack wants
> > (log_dirty_op function) the GPAs are copied into bitmap and the linnked
> > list is flushed.
>
> "linked"

Oops, typos.

> >      spin_lock_init(&d->arch.map_lock);
> >      d->arch.map_domain.nr_banks = 0;
> >
> > +    /* init for dirty-page tracing */
> > +    d->arch.dirty.count = 0;
> > +    d->arch.dirty.mode = 0;
> > +    d->arch.dirty.head = NULL;
> > +    d->arch.dirty.mvn_head = NULL;
> > +    spin_lock_init(&d->arch.dirty.lock);
> > +
> > +    d->arch.dirty.second_lvl_start = 0;
> > +    d->arch.dirty.second_lvl_end = 0;
> > +    d->arch.dirty.second_lvl = NULL;
>
> I think some/all of these belong in the previous patch which added the
> fields.

Oops, right.

> > +        if ( (&domctl->u.shadow_op)->op == XEN_DOMCTL_SHADOW_OP_CLEAN ||
> > +             (&domctl->u.shadow_op)->op == XEN_DOMCTL_SHADOW_OP_PEEK )
>
> (&domctl->u.shadow_op)->op is just a weird way to write
> domctl->u.shadow_op.op isn't it?

Right!

> Do you only want to copyback on success or is it always correct?

Oh, I think I missed the else condition here.

> > +void handle_page_fault(struct domain *d, paddr_t addr)
> > +{
> > +    lpae_t *vlp2m_pte = get_vlpt_3lvl_pte(addr);
> > +    if ( vlp2m_pte->p2m.valid && vlp2m_pte->p2m.write == 0 )
>
> I think you need to distinguish p2m entries which are r/o due to log dirty
> from those which are r/o due to wanting them to be r/o.
>
> Maybe we don't have the latter right now? We will eventually need p2m
> types I think.

Yes, those should be distinguished. Actually, that was in my mind and
apparently I failed to remember. For doing this, I'm thinking of using an
unused field in the pte as an indicator of 'wanting to be r/o' and checking
this indicator in the permission fault handler.

> > +        /* in order to remove mappings in cleanup stage */
> > +        add_mapped_vaddr(d, addr);
>
> No locks or atomic operations here? How are races with the tools reading
> the dirty bitmap addressed? If it is by clever ordering of the checks and
> pte writes then I think a comment would be in order.

Actually, the lock is inside the add_mapped_vaddr function.

> > +int add_mapped_vaddr(struct domain *d, unsigned long va)
> > +{
>
> This function seems awfully expensive (contains allocations etc) for a
> function called on every page fault during a migration.
>
> Why are you remembering the full va of every fault when a single bit per
> page would do?
>
> I think you can allocate a bitmap when logdirty is enabled. At a bit per
> page it shouldn't be too huge.
>
> You might be able to encode this in the p2m which you walk when the tools
> ask for the bit map (i.e. p2m.type==writeable p2m.write==0 => dirty), but
> I think a bitmap would do the trick and be easier to implement.

There are three candidates:

1) bitmap
2) encoding into the p2m
3) linked list of VAs

And there are two functions involved in dirty-page tracing:

A) dirty-page handling (at trap time)
B) getting the dirty bitmap (for the toolstack)

1) and 2) have an advantage for A), but at B) they have to search the full
range of pages to find out which pages were dirtied and to set those pages
read-only again for the next round of dirty detection. On the other hand,
3) does not need to search the full range at B). I prefer 3) due to the
non-full-range search, but I understand your concern. Maybe I can do some
measurements for each method to see which one is better.

> > +    /* this hypercall is called from domain 0, and we don't know which
> > +     * guest's vlpt is mapped in xen_second, so, to be sure, we restore
> > +     * vlpt here */
> > +    restore_vlpt(d);
>
> AFAICT you don't actually use the vlpt here, just the domain's list of
> dirty addresses (which as above could become a bitmap)

I think the vlpt is still needed because at this stage we have to reset the
write permission of the dirtied pages.
Maybe some tricks for efficiently doing this?> > > + > > + spin_lock(&d->arch.dirty.lock); > > + > > + page_head = d->arch.dirty.head; > > + mvn_head = d->arch.dirty.mvn_head; > > + get_gma_start_end(d, &gma_start, &gma_end); > > + > > + while ( page_head ) > > + { > > + int i; > > + > > + for ( i = 0; i < mvn_head->items; ++i ) > > + { > > + unsigned int bit_offset; > > + int bitmap_index, bitmap_offset; > > + lpae_t *vlp2m_pte; > > + > > + bit_offset = third_linear_offset(mvn_head->vaddrs[i] - > gma_start); > > + bitmap_index = bit_offset >> (PAGE_SHIFT + 3); > > + bitmap_offset = bit_offset & ((1ul << (PAGE_SHIFT + 3)) - > > + 1); > > + > > + __set_bit(bitmap_offset, bitmap[bitmap_index]); > > + > > + vlp2m_pte = get_vlpt_3lvl_pte(mvn_head->vaddrs[i]); > > + vlp2m_pte->p2m.write = 0; > > + } > > + > > + tmp_page = page_head; > > + page_head = mvn_head->next; > > + > > + tmp_mvn = mvn_head; > > + mvn_head = mvn_head->mvn_next; > > + > > + unmap_domain_page_global(tmp_mvn); > > + free_domheap_page(tmp_page); > > + } > > + > > + d->arch.dirty.head = NULL; > > + d->arch.dirty.mvn_head = NULL; > > + > > + spin_unlock(&d->arch.dirty.lock); } > > + > > + > > +/* Change types across all p2m entries in a domain */ static void > > +p2m_change_entry_type_global(struct domain *d, enum mg nt) { > > Stefano had some patches to generalise create_p2m_entries() into a p2m > walker infrastructure in his series to add XENMEM_exchange_and_pin. That > series turned out to be unnecessary but it could be resurrected for use > here rather than recoding a new one.p2m walker infrastructure sounds interesting.> > > + struct p2m_domain *p2m = &d->arch.p2m; > > + vaddr_t ram_base; > > + int i1, i2, i3; > > + int first_index, second_index, third_index; > > + lpae_t *first = __map_domain_page(p2m->first_level); > > + lpae_t pte, *second = NULL, *third = NULL; > > + > > + get_gma_start_end(d, &ram_base, NULL); > > + > > + first_index = first_table_offset((uint64_t)ram_base); > > + second_index = second_table_offset((uint64_t)ram_base); > > + third_index = third_table_offset((uint64_t)ram_base); > > + > > + BUG_ON( !first && "Can''t map first level p2m." ); > > + > > + spin_lock(&p2m->lock); > > + > > + for ( i1 = first_index; i1 < LPAE_ENTRIES*2; ++i1 ) > > + { > > + lpae_walk_t first_pte = first[i1].walk; > > + if ( !first_pte.valid || !first_pte.table ) > > + goto out; > > + > > + second = map_domain_page(first_pte.base); > > + BUG_ON( !second && "Can''t map second level p2m."); > > + for ( i2 = second_index; i2 < LPAE_ENTRIES; ++i2 ) > > + { > > + lpae_walk_t second_pte = second[i2].walk; > > + > > + /* a nasty hack for vlpt due to the difference > > + * of page table entry layout between p2m and pt */ > > + second[i2].pt.ro = 0; > > What is the hack here? > > The issue is that the p2m second level, which is a table entry in the p2m > is installed into the Xen page tables where it ends up the third level, > treated as a block entry. > > That''s OK, because for both PT and P2M bits 2..10 are ignored for table > entries. So I think rather than this hack here we should simply ensure > that our non-leaf p2m''s have the correct bits in them to be used as p2m > table entries.Oh, I see. 
So, should I put something like this in create_p2m_entries function?> > > + > > + if ( !second_pte.valid || !second_pte.table ) > > + goto out; > > + > > + third = map_domain_page(second_pte.base); > > + BUG_ON( !third && "Can''t map third level p2m."); > > + > > + for ( i3 = third_index; i3 < LPAE_ENTRIES; ++i3 ) > > + { > > + lpae_walk_t third_pte = third[i3].walk; > > + int write = 0; > > + if ( !third_pte.valid ) > > + goto out; > > + > > + pte = third[i3]; > > + if ( pte.p2m.write == 1 && nt == mg_ro ) > > + { > > + pte.p2m.write = 0; > > + write = 1; > > + } > > + else if ( pte.p2m.write == 0 && nt == mg_rw ) > > + { > > + pte.p2m.write = 1; > > + write = 1; > > + } > > + if ( write ) > > + write_pte(&third[i3], pte); > > + } > > + unmap_domain_page(third); > > + > > + third = NULL; > > + third_index = 0; > > + } > > + unmap_domain_page(second); > > + > > + second = NULL; > > + second_index = 0; > > + third_index = 0; > > + } > > + > > +out: > > + flush_tlb_all_local(); > > + if ( third ) unmap_domain_page(third); > > + if ( second ) unmap_domain_page(second); > > + if ( first ) unmap_domain_page(first); > > + > > + spin_unlock(&p2m->lock); > > +} > > + > > +/* Read a domain''s log-dirty bitmap and stats. > > + * If the operation is a CLEAN, clear the bitmap and stats. */ int > > +log_dirty_op(struct domain *d, xen_domctl_shadow_op_t *sc) > > Can this not be static?Yes, right.> > > +{ > > + unsigned long gmfn_start; > > + unsigned long gmfn_end; > > + unsigned long gmfns; > > + unsigned int bitmap_pages; > > + int rc = 0, clean = 0, peek = 1; > > + uint8_t *bitmap[256]; /* bitmap[256] covers 32GB ram */ > > + int i; > > + > > + BUG_ON( !d->arch.map_domain.nr_banks ); > > + > > + gmfn_start = d->arch.map_domain.bank[0].start >> PAGE_SHIFT; > > + gmfn_end = domain_get_maximum_gpfn(d); > > + gmfns = gmfn_end - gmfn_start; > > + bitmap_pages = PFN_UP((gmfns + 7) / 8); > > + > > + if ( guest_handle_is_null(sc->dirty_bitmap) ) > > + { > > + peek = 0; > > + } > > + else > > + { > > + /* mapping to the bitmap from guest param */ > > + vaddr_t to = (vaddr_t)sc->dirty_bitmap.p; > > This is not allowed, please use the guest handle interfaces and > copy_(to/from)_(user/guest) instead of diving into the guest handle struct > and open coding it yourself. > > This might mean creating a temporary bitmap in hypervisor space, but if > you maintain the bitmap across the whole dirty period as suggested then > you should be able to simply copy it out.Oh, I''m seeing more advantages for using bitmap.> > Actually, for consistency you might need an atomic copy and clear > mechanism (otherwise you can perhaps loose updates), in which case you''ll > need a temporary buffer. 
> > > + > > + BUG_ON( to & ~PAGE_MASK && "offset not aligned to PAGE > > + SIZE"); > > + > > + for ( i = 0; i < bitmap_pages; ++i ) > > + { > > + paddr_t g; > > + rc = gvirt_to_maddr(to, &g); > > + if ( rc ) > > + return rc; > > + bitmap[i] = map_domain_page(g>>PAGE_SHIFT); > > + memset(bitmap[i], 0x00, PAGE_SIZE); > > + to += PAGE_SIZE; > > + } > > + } > > + > > + clean = (sc->op == XEN_DOMCTL_SHADOW_OP_CLEAN); > > + sc->stats.dirty_count = d->arch.dirty.count; > > + > > + get_dirty_bitmap(d, bitmap); > > + if ( peek ) > > + { > > + for ( i = 0; i < bitmap_pages; ++i ) > > + { > > + unmap_domain_page(bitmap[i]); > > + } > > + } > > + > > + return 0; > > +} > > + > > +long dirty_mode_op(struct domain *d, xen_domctl_shadow_op_t *sc) { > > + long ret = 0; > > + switch (sc->op) > > + { > > + case XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY: > > + case XEN_DOMCTL_SHADOW_OP_OFF: > > + { > > + enum mg nt = sc->op == XEN_DOMCTL_SHADOW_OP_OFF ? mg_rw : > > +mg_ro; > > + > > + d->arch.dirty.mode = sc->op == XEN_DOMCTL_SHADOW_OP_OFF ? 0 : > 1; > > + p2m_change_entry_type_global(d, nt); > > + > > + if ( sc->op == XEN_DOMCTL_SHADOW_OP_OFF ) > > + cleanup_vlpt(d); > > + else > > + prepare_vlpt(d); > > + } > > + break; > > + > > + case XEN_DOMCTL_SHADOW_OP_CLEAN: > > + case XEN_DOMCTL_SHADOW_OP_PEEK: > > + { > > + ret = log_dirty_op(d, sc); > > + } > > + break; > > + > > + default: > > + return -ENOSYS; > > + } > > + return ret; > > +} > > + > > /* > > * Local variables: > > * mode: C > > diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c index > > 4c0fc32..3b78ed2 100644 > > --- a/xen/arch/arm/traps.c > > +++ b/xen/arch/arm/traps.c > > @@ -1313,6 +1313,8 @@ static void do_trap_data_abort_guest(struct > cpu_user_regs *regs, > > const char *msg; > > int rc, level = -1; > > mmio_info_t info; > > + int page_fault = ((dabt.dfsc & FSC_MASK) => > + (FSC_FLT_PERM + FSC_3D_LEVEL) && dabt.write); > > > > if ( !check_conditional_instr(regs, hsr) ) > > { > > @@ -1327,22 +1329,23 @@ static void do_trap_data_abort_guest(struct > cpu_user_regs *regs, > > info.gva = READ_SYSREG64(FAR_EL2); #endif > > > > - if (dabt.s1ptw) > > + if ( dabt.s1ptw && !page_fault ) > > goto bad_data_abort; > > I see this has been commented on elsewhere. > > Ian.
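To make the distinction concrete, here is a minimal sketch of the fault path with such an indicator, assuming a p2m type kept in the PTE's software-defined bits; the enum values, the "avail" field name, and mark_dirty() are illustrative assumptions rather than code from this series:

    /* Sketch: encode a p2m type in the 4 software-defined bits of a
     * stage-2 PTE so the permission-fault handler can tell pages made
     * read-only by log-dirty from pages that must stay read-only. */
    enum mg_p2m_type {
        mg_p2m_ram_rw = 1,   /* ordinary RAM                          */
        mg_p2m_ram_logdirty, /* r/o only while log-dirty is active    */
        mg_p2m_ram_ro,       /* r/o because it is meant to be r/o     */
    };

    void handle_page_fault(struct domain *d, paddr_t addr)
    {
        lpae_t *vlp2m_pte = get_vlpt_3lvl_pte(addr);
        lpae_t pte;

        /* Only log-dirty pages may be silently re-opened for writing. */
        if ( !vlp2m_pte->p2m.valid || vlp2m_pte->p2m.write ||
             vlp2m_pte->p2m.avail != mg_p2m_ram_logdirty )
            return;

        pte = *vlp2m_pte;
        pte.p2m.write = 1;
        pte.p2m.avail = mg_p2m_ram_rw;
        write_pte(vlp2m_pte, pte);
        flush_tlb_local();

        mark_dirty(d, addr); /* record for the next CLEAN/PEEK round */
    }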
Jaeyong Yoo
2013-Oct-17 10:14 UTC
Re: [PATCH v4 9/9] xen/arm: Implement toolstack for xl restore/save and migrate
> -----Original Message----- > From: xen-devel-bounces@lists.xen.org [mailto:xen-devel- > bounces@lists.xen.org] On Behalf Of Ian Campbell > Sent: Wednesday, October 16, 2013 10:55 PM > To: Jaeyong Yoo > Cc: Alexey Sokolov; xen-devel@lists.xen.org > Subject: Re: [Xen-devel] [PATCH v4 9/9] xen/arm: Implement toolstack for > xl restore/save and migrate > > On Fri, 2013-10-04 at 13:44 +0900, Jaeyong Yoo wrote: > > TARGETS-y := xenperf xenpm xen-tmem-list-parse gtraceview gtracestat > > xenlockprof xenwatchdogd xencov > > TARGETS-$(CONFIG_X86) += xen-detect xen-hvmctx xen-hvmcrash > > xen-lowmemd xen-mfndump > > +ifeq ($(CONFIG_X86),y) > > TARGETS-$(CONFIG_MIGRATE) += xen-hptool > > +endif > > I know this isn't your fault but I cannot imagine what xen-hptool (memory > hotplug tool) has to do with CONFIG_MIGRATE. > > If it doesn't build for ARM better to make it use TARGETS-$(CONFIG_X86) I > think. That sounds more reasonable. Thanks for all the comments of this patch series. Jaeyong
Jaeyong Yoo
2013-Oct-17 10:25 UTC
Re: [PATCH v4 7/9] xen/arm: Implement virtual-linear page table for guest p2m mapping in live migration
> -----Original Message----- > From: Ian Campbell [mailto:Ian.Campbell@citrix.com] > Sent: Thursday, October 17, 2013 7:06 PM > To: Jaeyong Yoo > Cc: xen-devel@lists.xen.org > Subject: Re: [Xen-devel] [PATCH v4 7/9] xen/arm: Implement virtual-linear > page table for guest p2m mapping in live migration > > On Thu, 2013-10-17 at 17:58 +0900, Jaeyong Yoo wrote: > > > > @@ -1382,6 +1382,84 @@ int is_iomem_page(unsigned long mfn) > > > > return 1; > > > > return 0; > > > > } > > > > + > > > > +/* flush the vlpt area */ > > > > +static void flush_vlpt(struct domain *d) { > > > > + int flush_size; > > > > + flush_size = (d->arch.dirty.second_lvl_end - > > > > + d->arch.dirty.second_lvl_start) << SECOND_SHIFT; > > > > + /* flushing the 3rd level mapping */ > > > > + flush_xen_data_tlb_range_va(VIRT_LIN_P2M_START, > > > > + flush_size); > > > > > > Shouldn''t the base here be VIRT_LIN_P2M_START + > > > (d->arch.dirty.second_lvl_start) << SECOND_SHIFT) or something like > that? > > > > Yes, right. It will also decrease the flush overhead. > > > > > > > > flush_xen_data_tlb_range_va just turns into a loop over the > > > addresses, so you might find you may as well do the flushes as you > > > update the ptes in the below, perhaps with an optimisation to pull > > > the barriers outside that loop. > > > > You mean the barriers in write_pte? OK. It looks better. > > I meant the ones in flush_xen_data_tlb_range_va, but perhaps the ones in > write_pte too. I suppose in both cases are __foo variant without the > barriers could be made so we can do > dsb > for each page > __write_pte > __flush_xen_data.... > dsb > isb > (or whatever the right barriers are!) >Oh, you meant put both write_pte and flushing into one loop. I got it! No reason for having two loops.> > > > +/* setting up the xen page table for vlpt mapping for domain d */ > > > > +void prepare_vlpt(struct domain *d) { > > > > + int xen_second_linear_base; > > > > + int gp2m_start_index, gp2m_end_index; > > > > + struct p2m_domain *p2m = &d->arch.p2m; > > > > + struct page_info *second_lvl_page; > > > > + vaddr_t gma_start = 0; > > > > + vaddr_t gma_end = 0; > > > > + lpae_t *first; > > > > + int i, j; > > > > + > > > > + xen_second_linear_base > second_linear_offset(VIRT_LIN_P2M_START); > > > > + get_gma_start_end(d, &gma_start, &gma_end); > > > > + > > > > + gp2m_start_index = gma_start >> FIRST_SHIFT; > > > > + gp2m_end_index = (gma_end >> FIRST_SHIFT) + 1; > > > > + > > > > + second_lvl_page = alloc_domheap_page(NULL, 0); > > > > > > The p2m first is two concatenated pages with a total of 1024 entries > > > (which is needed to give the full 40 bit IPA space). I have a > > > feeling this means you need two pages here? > > > > > > Or maybe we shouldn''t be supporting the full 40 bit addresses on 32- > bit? > > > > For generality, I think it is better to support it. But, honestly, I''m > > not sure how many people would use 40 bit ARM guest. > > All it takes is one peripheral at a high address, even if you aren''t using > a large amount of RAM etc. > > I''ve got a 64-bit system on my desk which has stuff up in the 50 bit range. > It seems less likely to happen on 32-bit though...I think I''d better go for generality. Jaeyong
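A sketch of the fused loop being agreed here, with the barriers hoisted out of the loop body; __write_pte() and __flush_xen_data_tlb_va() are assumed barrier-free variants of the existing write_pte() and flush_xen_data_tlb_range_va() and do not exist in the tree, and the per-page stride is illustrative:

    /* Write each updated pte and flush its VA in a single pass, with
     * one dsb before the loop and one dsb/isb pair after it. */
    static void vlpt_sync_range(lpae_t *dst, const lpae_t *src,
                                vaddr_t va_base, int nr)
    {
        int i;

        dsb();                        /* order earlier page-table writes */
        for ( i = 0; i < nr; i++ )
        {
            __write_pte(&dst[i], src[i]);
            __flush_xen_data_tlb_va(va_base + ((vaddr_t)i << PAGE_SHIFT));
        }
        dsb();                        /* writes and flushes complete */
        isb();
    }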
Ian Campbell
2013-Oct-17 10:30 UTC
Re: [PATCH v4 8/9] xen/arm: Implement hypercall for dirty page tracing
On Thu, 2013-10-17 at 19:12 +0900, Jaeyong Yoo wrote:> > > @@ -1460,6 +1459,28 @@ void cleanup_vlpt(struct domain *d) > > > unmap_domain_page_global(d->arch.dirty.second_lvl); > > > } > > > > > > +/* routine for dirty-page tracing > > > + * > > > + * On first write, it page faults, its entry is changed to > > > +read-write, > > > + * and on retry the write succeeds. > > > + * > > > + * for locating p2m of the faulting entry, we use virtual-linear page > > table. > > > + */ > > > +void handle_page_fault(struct domain *d, paddr_t addr) { > > > + lpae_t *vlp2m_pte = get_vlpt_3lvl_pte(addr); > > > + if ( vlp2m_pte->p2m.valid && vlp2m_pte->p2m.write == 0 ) > > > > I think you need to distinguish p2m entries which are r/o due to log dirty > > from those which are r/o due to wanting them to be r/o. > > > > Maybe we don''t have the latter right now? We will eventually need p2m > > types I think. > > Yes, that should be distinguished. Actually, that was in my mind and apparently > I failed to remember. For doing this, I''m thinking to use un-used field in pte > as an indicator of ''wanting to be r/o'' and check this indicator in permission fault.Yes, I think we can use the 4 avail bits to store a p2m type. ISTR discussing this before, but perhaps it wasn''t with you!> > > > > > + { > > > + lpae_t pte = *vlp2m_pte; > > > + pte.p2m.write = 1; > > > + write_pte(vlp2m_pte, pte); > > > + flush_tlb_local(); > > > + > > > + /* in order to remove mappings in cleanup stage */ > > > + add_mapped_vaddr(d, addr); > > > > No locks or atomic operations here? How are races with the tools reading > > the dirty bitmap addressed? If it is by clever ordering of the checks and > > pte writes then I think a comment would be in order. > > Actually, the lock is inside the add_mapped_vaddr function.But that only covers the bitmap update, not the pte frobbing.> > > > > > + } > > > +} > > > + > > > /* > > > * Local variables: > > > * mode: C > > > diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c index > > > 2d09fef..3b2b11a 100644 > > > --- a/xen/arch/arm/p2m.c > > > +++ b/xen/arch/arm/p2m.c > > > @@ -6,6 +6,8 @@ > > > #include <xen/bitops.h> > > > #include <asm/flushtlb.h> > > > #include <asm/gic.h> > > > +#include <xen/guest_access.h> > > > +#include <xen/pfn.h> > > > > > > void dump_p2m_lookup(struct domain *d, paddr_t addr) { @@ -408,6 > > > +410,304 @@ unsigned long gmfn_to_mfn(struct domain *d, unsigned long > > gpfn) > > > return p >> PAGE_SHIFT; > > > } > > > > > > +#define MAX_VA_PER_NODE (PAGE_SIZE - 2 * sizeof(void *) - sizeof(int))\ > > > + / sizeof(unsigned long) > > > + > > > +/* An array-based linked list for storing virtual addresses (i.e., > > > +the 3rd > > > + * level table mapping) that should be destroyed after live migration > > > +*/ struct mapped_va_node { > > > + struct page_info *next; > > > + struct mapped_va_node *mvn_next; > > > + int items; > > > + unsigned long vaddrs[MAX_VA_PER_NODE]; }; > > > + > > > +int add_mapped_vaddr(struct domain *d, unsigned long va) { > > > > This function seems awefuly expensive (contains allocations etc) for a > > function called on every page fault during a migration. > > > > Why are you remembering the full va of every fault when a single bit per > > page would do? > > > > I think you can allocate a bitmap when logdirty is enabled. At a bit per > > page it shouldn''t be too huge. > > > > You might be able to encode this in the p2m which you walk when the tools > > ask for the bit map (i.e. 
p2m.type==writeable p2m.write==0 => dirty), but > > I think a bitmap would do the trick and be easier to implement. > > There are three candidates: > > 1) bitmap > 2) encoding into p2m > 3) linked list of VAs. > > And, two functions for dirty-page tracing: > > A) dirty-page handling (At trap) > B) getting dirty bitmap (for toolstack) > > 1) and 2) have advantage for A) but have to search the full range of pages at B) > to find out which page is dirtied and set the page as read-only for next dirty > detection. On the otherhand, 3) does not need to search the full range at B). > I prefer 3) due to the ''non-full range search'' but I understand your concern. > Maybe I can do some measurement for each method to see which one better.If you have a bitmap then you can do a scan for each set bit, and the offset of each bit corresponds to the guest pfn, which allows you to directly calculate the address of the pte in the vlpt. I don''t think doing a scan for each set bit is going to compare unfavourably to walking a linked list. Certainly not once the memory allocator gets involved.> > > +/* get the dirty bitmap from the linked list stored by > > > +add_mapped_vaddr > > > + * and also clear the mapped vaddrs linked list */ static void > > > +get_dirty_bitmap(struct domain *d, uint8_t *bitmap[]) { > > > + struct page_info *page_head; > > > + struct mapped_va_node *mvn_head; > > > + struct page_info *tmp_page; > > > + struct mapped_va_node *tmp_mvn; > > > + vaddr_t gma_start = 0; > > > + vaddr_t gma_end = 0; > > > + > > > + /* this hypercall is called from domain 0, and we don''t know which > > guest''s > > > + * vlpt is mapped in xen_second, so, to be sure, we restore vlpt > > here */ > > > + restore_vlpt(d); > > > > AFAICT you don''t actually use the vlpt here, just the domains list of > > dirty addresses (which as above could become a bitmap) > > I think vlpt is still needed because at this stage, we have to reset the > write permission of dirtied pages. Maybe some tricks for efficiently doing this?Ah, of course. Do you need to be careful about the context switch of a dom0 vcpu which has a foreign vlpt loaded? I suppose you unload it at the end of this function so it is safe because Xen doesn''t preempt vcpus in hypervisor mode.> > > > > + struct p2m_domain *p2m = &d->arch.p2m; > > > + vaddr_t ram_base; > > > + int i1, i2, i3; > > > + int first_index, second_index, third_index; > > > + lpae_t *first = __map_domain_page(p2m->first_level); > > > + lpae_t pte, *second = NULL, *third = NULL; > > > + > > > + get_gma_start_end(d, &ram_base, NULL); > > > + > > > + first_index = first_table_offset((uint64_t)ram_base); > > > + second_index = second_table_offset((uint64_t)ram_base); > > > + third_index = third_table_offset((uint64_t)ram_base); > > > + > > > + BUG_ON( !first && "Can''t map first level p2m." ); > > > + > > > + spin_lock(&p2m->lock); > > > + > > > + for ( i1 = first_index; i1 < LPAE_ENTRIES*2; ++i1 ) > > > + { > > > + lpae_walk_t first_pte = first[i1].walk; > > > + if ( !first_pte.valid || !first_pte.table ) > > > + goto out; > > > + > > > + second = map_domain_page(first_pte.base); > > > + BUG_ON( !second && "Can''t map second level p2m."); > > > + for ( i2 = second_index; i2 < LPAE_ENTRIES; ++i2 ) > > > + { > > > + lpae_walk_t second_pte = second[i2].walk; > > > + > > > + /* a nasty hack for vlpt due to the difference > > > + * of page table entry layout between p2m and pt */ > > > + second[i2].pt.ro = 0; > > > > What is the hack here? 
> > > > The issue is that the p2m second level, which is a table entry in the p2m > > is installed into the Xen page tables where it ends up the third level, > > treated as a block entry. > > > > That''s OK, because for both PT and P2M bits 2..10 are ignored for table > > entries. So I think rather than this hack here we should simply ensure > > that our non-leaf p2m''s have the correct bits in them to be used as p2m > > table entries. > > Oh, I see. So, should I put something like this in create_p2m_entries function?Probably p2m_create_table or maybe mfn_to_p2m entry would be the right place. Ian.
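For illustration, a rough sketch of the bitmap approach suggested here, with one bit per guest page allocated when log-dirty is enabled; the d->arch.dirty.bitmap and bitmap_pages fields are assumptions, not fields from this series:

    /* Sketch: one dirty bit per guest page, sized from the guest RAM
     * range and allocated when log-dirty is switched on. */
    static int logdirty_bitmap_alloc(struct domain *d)
    {
        vaddr_t gma_start = 0, gma_end = 0;
        unsigned long pages;

        get_gma_start_end(d, &gma_start, &gma_end);
        pages = (gma_end - gma_start) >> PAGE_SHIFT;

        d->arch.dirty.bitmap = xzalloc_bytes(BITS_TO_LONGS(pages) *
                                             sizeof(long));
        if ( !d->arch.dirty.bitmap )
            return -ENOMEM;
        d->arch.dirty.bitmap_pages = pages;
        return 0;
    }

    /* The fault path then becomes a single atomic bit set: no
     * allocation, and the bit offset *is* the guest pfn offset. */
    static inline void mark_dirty(struct domain *d, paddr_t addr)
    {
        vaddr_t gma_start = 0;

        get_gma_start_end(d, &gma_start, NULL);
        set_bit(((vaddr_t)addr - gma_start) >> PAGE_SHIFT,
                d->arch.dirty.bitmap);
    }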
Jaeyong Yoo
2013-Oct-17 11:05 UTC
Re: [PATCH v4 8/9] xen/arm: Implement hypercall for dirty page tracing
> -----Original Message----- > From: xen-devel-bounces@lists.xen.org [mailto:xen-devel- > bounces@lists.xen.org] On Behalf Of Ian Campbell > Sent: Thursday, October 17, 2013 7:30 PM > To: Jaeyong Yoo > Cc: ''Stefano Stabellini''; xen-devel@lists.xen.org > Subject: Re: [Xen-devel] [PATCH v4 8/9] xen/arm: Implement hypercall for > dirty page tracing > > On Thu, 2013-10-17 at 19:12 +0900, Jaeyong Yoo wrote: > > > > > @@ -1460,6 +1459,28 @@ void cleanup_vlpt(struct domain *d) > > > > unmap_domain_page_global(d->arch.dirty.second_lvl); > > > > } > > > > > > > > +/* routine for dirty-page tracing > > > > + * > > > > + * On first write, it page faults, its entry is changed to > > > > +read-write, > > > > + * and on retry the write succeeds. > > > > + * > > > > + * for locating p2m of the faulting entry, we use virtual-linear > > > > +page > > > table. > > > > + */ > > > > +void handle_page_fault(struct domain *d, paddr_t addr) { > > > > + lpae_t *vlp2m_pte = get_vlpt_3lvl_pte(addr); > > > > + if ( vlp2m_pte->p2m.valid && vlp2m_pte->p2m.write == 0 ) > > > > > > I think you need to distinguish p2m entries which are r/o due to log > > > dirty from those which are r/o due to wanting them to be r/o. > > > > > > Maybe we don''t have the latter right now? We will eventually need > > > p2m types I think. > > > > Yes, that should be distinguished. Actually, that was in my mind and > > apparently I failed to remember. For doing this, I''m thinking to use > > un-used field in pte as an indicator of ''wanting to be r/o'' and check > this indicator in permission fault. > > Yes, I think we can use the 4 avail bits to store a p2m type. ISTR > discussing this before, but perhaps it wasn''t with you! > > > > > > > > > > + { > > > > + lpae_t pte = *vlp2m_pte; > > > > + pte.p2m.write = 1; > > > > + write_pte(vlp2m_pte, pte); > > > > + flush_tlb_local(); > > > > + > > > > + /* in order to remove mappings in cleanup stage */ > > > > + add_mapped_vaddr(d, addr); > > > > > > No locks or atomic operations here? How are races with the tools > > > reading the dirty bitmap addressed? If it is by clever ordering of > > > the checks and pte writes then I think a comment would be in order. > > > > Actually, the lock is inside the add_mapped_vaddr function. 
> > But that only covers the bitmap update, not the pte frobbing.I also locked with the same lock at the get_dirty_bitmap which is reading the bitmap.> > > > > > > > > > + } > > > > +} > > > > + > > > > /* > > > > * Local variables: > > > > * mode: C > > > > diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c index > > > > 2d09fef..3b2b11a 100644 > > > > --- a/xen/arch/arm/p2m.c > > > > +++ b/xen/arch/arm/p2m.c > > > > @@ -6,6 +6,8 @@ > > > > #include <xen/bitops.h> > > > > #include <asm/flushtlb.h> > > > > #include <asm/gic.h> > > > > +#include <xen/guest_access.h> > > > > +#include <xen/pfn.h> > > > > > > > > void dump_p2m_lookup(struct domain *d, paddr_t addr) { @@ -408,6 > > > > +410,304 @@ unsigned long gmfn_to_mfn(struct domain *d, unsigned > > > > +long > > > gpfn) > > > > return p >> PAGE_SHIFT; > > > > } > > > > > > > > +#define MAX_VA_PER_NODE (PAGE_SIZE - 2 * sizeof(void *) - > sizeof(int))\ > > > > + / sizeof(unsigned long) > > > > + > > > > +/* An array-based linked list for storing virtual addresses > > > > +(i.e., the 3rd > > > > + * level table mapping) that should be destroyed after live > > > > +migration */ struct mapped_va_node { > > > > + struct page_info *next; > > > > + struct mapped_va_node *mvn_next; > > > > + int items; > > > > + unsigned long vaddrs[MAX_VA_PER_NODE]; }; > > > > + > > > > +int add_mapped_vaddr(struct domain *d, unsigned long va) { > > > > > > This function seems awefuly expensive (contains allocations etc) for > > > a function called on every page fault during a migration. > > > > > > Why are you remembering the full va of every fault when a single bit > > > per page would do? > > > > > > I think you can allocate a bitmap when logdirty is enabled. At a bit > > > per page it shouldn''t be too huge. > > > > > > You might be able to encode this in the p2m which you walk when the > > > tools ask for the bit map (i.e. p2m.type==writeable p2m.write==0 => > > > dirty), but I think a bitmap would do the trick and be easier to > implement. > > > > There are three candidates: > > > > 1) bitmap > > 2) encoding into p2m > > 3) linked list of VAs. > > > > And, two functions for dirty-page tracing: > > > > A) dirty-page handling (At trap) > > B) getting dirty bitmap (for toolstack) > > > > 1) and 2) have advantage for A) but have to search the full range of > > pages at B) to find out which page is dirtied and set the page as > > read-only for next dirty detection. On the otherhand, 3) does not need > to search the full range at B). > > I prefer 3) due to the ''non-full range search'' but I understand your > concern. > > Maybe I can do some measurement for each method to see which one better. > > If you have a bitmap then you can do a scan for each set bit, and the > offset of each bit corresponds to the guest pfn, which allows you to > directly calculate the address of the pte in the vlpt. > > I don''t think doing a scan for each set bit is going to compare > unfavourably to walking a linked list. Certainly not once the memory > allocator gets involved.Oh, do you mean using something like "find_next_zero_bit"? 
At the first time, I thought searching every bit-by-bit but, I just see findbit.S and it looks quite optimized.> > > > > +/* get the dirty bitmap from the linked list stored by > > > > +add_mapped_vaddr > > > > + * and also clear the mapped vaddrs linked list */ static void > > > > +get_dirty_bitmap(struct domain *d, uint8_t *bitmap[]) { > > > > + struct page_info *page_head; > > > > + struct mapped_va_node *mvn_head; > > > > + struct page_info *tmp_page; > > > > + struct mapped_va_node *tmp_mvn; > > > > + vaddr_t gma_start = 0; > > > > + vaddr_t gma_end = 0; > > > > + > > > > + /* this hypercall is called from domain 0, and we don''t know > > > > + which > > > guest''s > > > > + * vlpt is mapped in xen_second, so, to be sure, we restore > > > > + vlpt > > > here */ > > > > + restore_vlpt(d); > > > > > > AFAICT you don''t actually use the vlpt here, just the domains list > > > of dirty addresses (which as above could become a bitmap) > > > > I think vlpt is still needed because at this stage, we have to reset > > the write permission of dirtied pages. Maybe some tricks for efficiently > doing this? > > Ah, of course. > > Do you need to be careful about the context switch of a dom0 vcpu which > has a foreign vlpt loaded? I suppose you unload it at the end of this > function so it is safe because Xen doesn''t preempt vcpus in hypervisor > mode.That is my concern now. For safety, unloading is necessary but it requires flushing vlpt which is not a cheap operation. Actually, there is one more concern: Since we only context switch the vlpt area with ''migrating domains'' when two domains (one migrating and the other not) are context switched, the non-migrating domain have access to the foreign vlpt. In order to prevent the access of foreign vlpt area, do you think unloading of vlpt when switching into non-migrating domains also necessary?> > > > > > > > + struct p2m_domain *p2m = &d->arch.p2m; > > > > + vaddr_t ram_base; > > > > + int i1, i2, i3; > > > > + int first_index, second_index, third_index; > > > > + lpae_t *first = __map_domain_page(p2m->first_level); > > > > + lpae_t pte, *second = NULL, *third = NULL; > > > > + > > > > + get_gma_start_end(d, &ram_base, NULL); > > > > + > > > > + first_index = first_table_offset((uint64_t)ram_base); > > > > + second_index = second_table_offset((uint64_t)ram_base); > > > > + third_index = third_table_offset((uint64_t)ram_base); > > > > + > > > > + BUG_ON( !first && "Can''t map first level p2m." ); > > > > + > > > > + spin_lock(&p2m->lock); > > > > + > > > > + for ( i1 = first_index; i1 < LPAE_ENTRIES*2; ++i1 ) > > > > + { > > > > + lpae_walk_t first_pte = first[i1].walk; > > > > + if ( !first_pte.valid || !first_pte.table ) > > > > + goto out; > > > > + > > > > + second = map_domain_page(first_pte.base); > > > > + BUG_ON( !second && "Can''t map second level p2m."); > > > > + for ( i2 = second_index; i2 < LPAE_ENTRIES; ++i2 ) > > > > + { > > > > + lpae_walk_t second_pte = second[i2].walk; > > > > + > > > > + /* a nasty hack for vlpt due to the difference > > > > + * of page table entry layout between p2m and pt */ > > > > + second[i2].pt.ro = 0; > > > > > > What is the hack here? > > > > > > The issue is that the p2m second level, which is a table entry in > > > the p2m is installed into the Xen page tables where it ends up the > > > third level, treated as a block entry. > > > > > > That''s OK, because for both PT and P2M bits 2..10 are ignored for > > > table entries. 
So I think rather than this hack here we should > > > simply ensure that our non-leaf p2m's have the correct bits in them > > > to be used as p2m table entries. > > > > Oh, I see. So, should I put something like this in create_p2m_entries > function? > > Probably p2m_create_table or maybe mfn_to_p2m entry would be the right > place. OK. > > Ian.
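Roughly what that could look like; a sketch reconstructed around the existing p2m_create_table(), where the details of the current function body are assumptions:

    static int p2m_create_table(struct domain *d, lpae_t *entry)
    {
        struct p2m_domain *p2m = &d->arch.p2m;
        struct page_info *page;
        void *p;
        lpae_t pte;

        page = alloc_domheap_page(NULL, 0);
        if ( page == NULL )
            return -ENOMEM;

        page_list_add(page, &p2m->pages);

        p = __map_domain_page(page);
        clear_page(p);
        unmap_domain_page(p);

        pte = mfn_to_p2m_entry(page_to_mfn(page), MATTR_MEM);
        /* Bits 2..10 are ignored for table entries in both stage-1 and
         * stage-2 walks, so pre-clearing "ro" here is harmless for the
         * p2m and lets the VLPT reuse the entry without in-place
         * fixups like the "second[i2].pt.ro = 0" above. */
        pte.pt.table = 1;
        pte.pt.ro = 0;
        write_pte(entry, pte);

        return 0;
    }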
Ian Campbell
2013-Oct-17 11:47 UTC
Re: [PATCH v4 8/9] xen/arm: Implement hypercall for dirty page tracing
On Thu, 2013-10-17 at 20:05 +0900, Jaeyong Yoo wrote:> > > > > > > > > > > > + { > > > > > + lpae_t pte = *vlp2m_pte; > > > > > + pte.p2m.write = 1; > > > > > + write_pte(vlp2m_pte, pte); > > > > > + flush_tlb_local(); > > > > > + > > > > > + /* in order to remove mappings in cleanup stage */ > > > > > + add_mapped_vaddr(d, addr); > > > > > > > > No locks or atomic operations here? How are races with the tools > > > > reading the dirty bitmap addressed? If it is by clever ordering of > > > > the checks and pte writes then I think a comment would be in order. > > > > > > Actually, the lock is inside the add_mapped_vaddr function. > > > > But that only covers the bitmap update, not the pte frobbing. > > I also locked with the same lock at the get_dirty_bitmap which is reading > the bitmap.I meant the pte frobbing immediately before the call the add_mapped_vaddr quoted above.> > > There are three candidates: > > > > > > 1) bitmap > > > 2) encoding into p2m > > > 3) linked list of VAs. > > > > > > And, two functions for dirty-page tracing: > > > > > > A) dirty-page handling (At trap) > > > B) getting dirty bitmap (for toolstack) > > > > > > 1) and 2) have advantage for A) but have to search the full range of > > > pages at B) to find out which page is dirtied and set the page as > > > read-only for next dirty detection. On the otherhand, 3) does not need > > to search the full range at B). > > > I prefer 3) due to the ''non-full range search'' but I understand your > > concern. > > > Maybe I can do some measurement for each method to see which one better. > > > > If you have a bitmap then you can do a scan for each set bit, and the > > offset of each bit corresponds to the guest pfn, which allows you to > > directly calculate the address of the pte in the vlpt. > > > > I don''t think doing a scan for each set bit is going to compare > > unfavourably to walking a linked list. Certainly not once the memory > > allocator gets involved. > > Oh, do you mean using something like "find_next_zero_bit"?find_next_bit probably since you want to find the 1s. I suppose you could also initialise to all 1 and clear bits instead> At the first > time, > I thought searching every bit-by-bit but, I just see findbit.S and it looks > quite optimized.Yes, I think it should be fast enough.> > > > > +/* get the dirty bitmap from the linked list stored by > > > > > +add_mapped_vaddr > > > > > + * and also clear the mapped vaddrs linked list */ static void > > > > > +get_dirty_bitmap(struct domain *d, uint8_t *bitmap[]) { > > > > > + struct page_info *page_head; > > > > > + struct mapped_va_node *mvn_head; > > > > > + struct page_info *tmp_page; > > > > > + struct mapped_va_node *tmp_mvn; > > > > > + vaddr_t gma_start = 0; > > > > > + vaddr_t gma_end = 0; > > > > > + > > > > > + /* this hypercall is called from domain 0, and we don''t know > > > > > + which > > > > guest''s > > > > > + * vlpt is mapped in xen_second, so, to be sure, we restore > > > > > + vlpt > > > > here */ > > > > > + restore_vlpt(d); > > > > > > > > AFAICT you don''t actually use the vlpt here, just the domains list > > > > of dirty addresses (which as above could become a bitmap) > > > > > > I think vlpt is still needed because at this stage, we have to reset > > > the write permission of dirtied pages. Maybe some tricks for efficiently > > doing this? > > > > Ah, of course. > > > > Do you need to be careful about the context switch of a dom0 vcpu which > > has a foreign vlpt loaded? 
I suppose you unload it at the end of this > > function so it is safe because Xen doesn't preempt vcpus in hypervisor > > mode. > > > That is my concern now. For safety, unloading is necessary but it requires > flushing vlpt which is not a cheap operation. I suppose it could be deferred either to the next load of a vlpt, if it is a different one, or the next context switch away from the vcpu. You'd just need to track which foreign vlpt is currently loaded on the vcpu in vcpu->arch.something? > Actually, there is one more concern: > Since we only context switch the vlpt area with 'migrating domains' when two > domains (one migrating and the other not) are context switched, the > non-migrating > domain have access to the foreign vlpt. In order to prevent the access of > foreign vlpt area, do you think unloading of vlpt when switching into > non-migrating > domains also necessary? The domain itself doesn't have access to the vlpt since it is mapped in hypervisor context so the only danger is that Xen accidentally uses the vlpt for something while the non-migrating domain is scheduled. I think we should just not write that bug ;-) IOW I think flushing the vlpt lazily is OK. Ian.
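A sketch of the scan Ian suggests, again against the hypothetical per-domain bitmap; every set-bit offset is a guest pfn offset and so maps directly to a VLPT entry:

    /* Sketch: report dirty pages to the toolstack, re-protect each one
     * through the VLPT, and clear its bit, with a single TLB flush at
     * the end of the pass. */
    static void scan_and_clean_dirty_bitmap(struct domain *d,
                                            unsigned long *out)
    {
        vaddr_t gma_start = 0;
        unsigned long i, nr = d->arch.dirty.bitmap_pages;

        get_gma_start_end(d, &gma_start, NULL);

        for ( i = find_next_bit(d->arch.dirty.bitmap, nr, 0);
              i < nr;
              i = find_next_bit(d->arch.dirty.bitmap, nr, i + 1) )
        {
            lpae_t *pte = get_vlpt_3lvl_pte(gma_start + (i << PAGE_SHIFT));
            lpae_t new = *pte;

            __set_bit(i, out);      /* report to the toolstack      */
            new.p2m.write = 0;      /* re-arm for the next round    */
            write_pte(pte, new);
            clear_bit(i, d->arch.dirty.bitmap);
        }
        flush_tlb_all_local();
    }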
Jaeyong Yoo
2013-Oct-18 04:17 UTC
Re: [PATCH v4 8/9] xen/arm: Implement hypercall for dirty page tracing
> -----Original Message----- > From: xen-devel-bounces@lists.xen.org [mailto:xen-devel- > bounces@lists.xen.org] On Behalf Of Ian Campbell > Sent: Thursday, October 17, 2013 8:47 PM > To: Jaeyong Yoo > Cc: 'Stefano Stabellini'; xen-devel@lists.xen.org > Subject: Re: [Xen-devel] [PATCH v4 8/9] xen/arm: Implement hypercall for > dirty page tracing > > On Thu, 2013-10-17 at 20:05 +0900, Jaeyong Yoo wrote: > > > > > > > > > > > > > > > + { > > > > > > + lpae_t pte = *vlp2m_pte; > > > > > > + pte.p2m.write = 1; > > > > > > + write_pte(vlp2m_pte, pte); > > > > > > + flush_tlb_local(); > > > > > > + > > > > > > + /* in order to remove mappings in cleanup stage */ > > > > > > + add_mapped_vaddr(d, addr); > > > > > > > > > > No locks or atomic operations here? How are races with the tools > > > > > reading the dirty bitmap addressed? If it is by clever ordering > > > > > of the checks and pte writes then I think a comment would be in > order. > > > > > > > > Actually, the lock is inside the add_mapped_vaddr function. > > > > > > But that only covers the bitmap update, not the pte frobbing. > > > > I also locked with the same lock at the get_dirty_bitmap which is > > reading the bitmap. > > I meant the pte frobbing immediately before the call the add_mapped_vaddr > quoted above. > I don't think it is necessary, because at the moment the dirty bitmap is constructed based on the VAs in the linked list. If the PTE frobbing happens immediately before add_mapped_vaddr, the corresponding dirty page would be caught at the next round of get-dirty-bitmap. Now I am starting to think of using a bitmap rather than a linked list, and I think it is similar there. The critical sections would be the one that writes into the bitmap and the one that reads the bitmap (the copy to the toolstack's bitmap). Jaeyong
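The two critical sections Jaeyong describes could then share a single lock between the fault path and the reader; a sketch, still assuming the hypothetical d->arch.dirty.bitmap:

    /* Sketch: snapshot and clear the dirty bitmap under the same lock
     * the fault path takes to set bits, so no set bit can be lost
     * between the copy and the clear. */
    static void dirty_bitmap_copy_and_clear(struct domain *d,
                                            unsigned long *snapshot,
                                            unsigned long nr_longs)
    {
        spin_lock(&d->arch.dirty.lock);
        memcpy(snapshot, d->arch.dirty.bitmap, nr_longs * sizeof(long));
        memset(d->arch.dirty.bitmap, 0, nr_longs * sizeof(long));
        spin_unlock(&d->arch.dirty.lock);
    }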
Ian Campbell
2013-Oct-18 09:11 UTC
Re: [PATCH v4 8/9] xen/arm: Implement hypercall for dirty page tracing
On Fri, 2013-10-18 at 13:17 +0900, Jaeyong Yoo wrote:> > I meant the pte frobbing immediately before the call the add_mapped_vaddr > > quoted above. > > > > I don''t think it is necessary because at the moment the dirty bitmap > is constructed based on the VAs in the linked list. If the PTE frobbing > happens > immediately before add_mapped_vaddr, the corresponding dirty-page would be > checked at the next round get-dirty-gitmap.OK, thanks. Can we get a code comment on the ordering constraints here please.> > > Now, I start thinking of using bitmap rather than linked list, and I think > it > is similar there. The critical section would be the one that writes into the > bitmap and the one that reads the bitmap (copy to the bitmap from > toolstack). > > Jaeyong >
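The comment being asked for could read roughly like this (the wording is only a suggestion):

    /*
     * Ordering: the pte is made writable and flushed *before* the
     * faulting address is recorded under d->arch.dirty.lock.  A
     * toolstack CLEAN that runs in the window between the two will
     * not see this page, but the record added just afterwards stays
     * in the list/bitmap and is picked up by the next CLEAN, so a
     * dirtying write is reported at most one round late, never lost.
     */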
Eugene Fedotov
2013-Oct-31 08:56 UTC
Re: [PATCH v4 3/9] xen/arm: Implement set_memory_map hypercall for arm
11.10.2013 14:09, Ian Campbell:> On Fri, 2013-10-11 at 14:06 +0400, Eugene Fedotov wrote: >> 10.10.2013 14:56, Ian Campbell: >>> On Fri, 2013-10-04 at 13:43 +0900, Jaeyong Yoo wrote: >>>> From: Evgeny Fedotov <e.fedotov@samsung.com> >>>> >>>> When creating domU in toolstack, pass the guest memory >>>> map info to the hypervisor, and the hypervisor stores those info in >>>> arch_domain for later use. >>>> >>>> Singed-off-by: Evgeny Fedotov <e.fedotov@samsung.com> >>>> --- >>>> tools/libxc/xc_dom_arm.c | 12 +++++++- >>>> tools/libxc/xc_domain.c | 44 ++++++++++++++++++++++++++++ >>>> tools/libxc/xenctrl.h | 23 +++++++++++++++ >>>> xen/arch/arm/domain.c | 3 ++ >>>> xen/arch/arm/mm.c | 68 ++++++++++++++++++++++++++++++++++++++++++++ >>>> xen/include/asm-arm/domain.h | 2 ++ >>>> xen/include/asm-arm/mm.h | 1 + >>>> xen/include/public/memory.h | 15 ++++++++-- >>>> xen/include/xsm/dummy.h | 5 ++++ >>>> xen/include/xsm/xsm.h | 5 ++++ >>>> 10 files changed, 175 insertions(+), 3 deletions(-) >>>> >>>> diff --git a/tools/libxc/xc_dom_arm.c b/tools/libxc/xc_dom_arm.c >>>> index df59ffb..20c9095 100644 >>>> --- a/tools/libxc/xc_dom_arm.c >>>> +++ b/tools/libxc/xc_dom_arm.c >>>> @@ -166,6 +166,7 @@ int arch_setup_meminit(struct xc_dom_image *dom) >>>> { >>>> int rc; >>>> xen_pfn_t pfn, allocsz, i; >>>> + struct dt_mem_info memmap; >>>> >>>> dom->shadow_enabled = 1; >>>> >>>> @@ -191,7 +192,16 @@ int arch_setup_meminit(struct xc_dom_image *dom) >>>> 0, 0, &dom->p2m_host[i]); >>>> } >>>> >>>> - return 0; >>>> + /* setup guest memory map */ >>>> + memmap.nr_banks = 2; >>>> + memmap.bank[0].start = (dom->rambase_pfn << PAGE_SHIFT_ARM); >>>> + memmap.bank[0].size = (dom->total_pages << PAGE_SHIFT_ARM); >>>> + /*The end of main memory: magic pages */ >>>> + memmap.bank[1].start = memmap.bank[0].start + memmap.bank[0].size; >>>> + memmap.bank[1].size = NR_MAGIC_PAGES << PAGE_SHIFT_ARM; >>> Are the 0 and 1 here hardcoded magic numbers? >> Well, we hardcoded here two memory regions: the first one for RAM, the >> second one for "magic pages". >>>> + return xc_domain_set_memory_map(dom->xch, dom->guest_domid, &memmap); >>> I think this is using set_memory_map in a different way to it is used >>> for x86 (where it gives the PV e820 map, a PV version of a BIOS provided >>> datastructure). >> Do you mean that using e820 structure for ARM implementation is better >> than dt_mem_info structure (that has been taken from libdt) and should >> be used in ARM implementation of get/set memory map? > I mean on x86 it is used as a way of communicating a memory map (in a > structure vaguely like an actual defined architecture specific layout, > the e820), whereas on ARM it is only used by the tools and should not be > exposed to the guest, since the guest gets its info from the DTB. > >>> The guest should get its memory map via DTB not via a PV hypercall. I >>> know the guest isn''t using get_memory_map but I don''t think we should >>> even make it available. >> OK, this hypercall will be available from dom0 only. > It''ll have to move away from the XENMEM space then.Can we create new DOMCTL hypercall for get/set memory map?
Ian Campbell
2013-Nov-01 09:14 UTC
Re: [PATCH v4 3/9] xen/arm: Implement set_memory_map hypercall for arm
On Thu, 2013-10-31 at 12:56 +0400, Eugene Fedotov wrote:> Can we create new DOMCTL hypercall for get/set memory map?You could but do you really need it? The guest memory map is pretty much static apart from RAM size and in the next version of my "xen: arm: 64-bit guest support and domU FDT autogeneration" series all the relevant #defines should be available to the toolstack as well as the hypervisor (and if not, we can fix that). Any get/set memory map calls can be deferred until there is a need for a more dynamic guest layout IMHO. Ian.
Ian Campbell
2013-Nov-01 09:48 UTC
Re: [PATCH v4 3/9] xen/arm: Implement set_memory_map hypercall for arm
On Fri, 2013-11-01 at 13:51 +0400, Eugene Fedotov wrote:> 01.11.2013 13:14, Ian Campbell wrote: > > On Thu, 2013-10-31 at 12:56 +0400, Eugene Fedotov wrote: > >> Can we create new DOMCTL hypercall for get/set memory map? > > You could but do you really need it? The guest memory map is pretty much > > static apart from RAM size and in the next version of my "xen: arm: > > 64-bit guest support and domU FDT autogeneration" series all the > > relevant #defines should be available to the toolstack as well as the > > hypervisor (and if not, we can fix that). > > > > Any get/set memory map calls can be deferred until there is a need for a > > more dynamic guest layout IMHO. > > > > Ian. > > > So it is not yet upstreamed: "xen: arm: 64-bit guest support and domU > FDT autogeneration" ? Not yet. > As to the current patch series, how does the hypervisor acquire the RAM size (and > base) in the current upstream? (since only the toolstack knows it yet, or I > missed something). The toolstack makes a hypercall, via libxc:xc_domain_setmaxmem. Ian.
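For reference, the call mentioned looks like this from the toolstack side (a usage sketch; the domain id and the 128MB figure are placeholders):

    #include <xenctrl.h>

    /* Sketch: communicate the guest RAM size to the hypervisor. */
    static int set_guest_ram_size(xc_interface *xch, uint32_t domid)
    {
        unsigned long nr_pages = (128UL << 20) >> XC_PAGE_SHIFT;

        /* xc_domain_setmaxmem() takes the limit in KiB. */
        return xc_domain_setmaxmem(xch, domid, nr_pages << 2);
    }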
Eugene Fedotov
2013-Nov-01 09:51 UTC
Re: [PATCH v4 3/9] xen/arm: Implement set_memory_map hypercall for arm
01.11.2013 13:14, Ian Campbell wrote:> On Thu, 2013-10-31 at 12:56 +0400, Eugene Fedotov wrote: >> Can we create new DOMCTL hypercall for get/set memory map? > You could but do you really need it? The guest memory map is pretty much > static apart from RAM size and in the next version of my "xen: arm: > 64-bit guest support and domU FDT autogeneration" series all the > relevant #defines should be available to the toolstack as well as the > hypervisor (and if not, we can fix that). > > Any get/set memory map calls can be deferred until there is a need for a > more dynamic guest layout IMHO. > > Ian. > So it is not yet upstreamed: "xen: arm: 64-bit guest support and domU FDT autogeneration" ? As to the current patch series, how does the hypervisor acquire the RAM size (and base) in the current upstream? (since only the toolstack knows it yet, or I missed something). Best regards, Evgeny.
Eugene Fedotov
2013-Nov-01 10:08 UTC
Re: [PATCH v4 3/9] xen/arm: Implement set_memory_map hypercall for arm
> The toolstack makes a hypercall, via libxc:xc_domain_setmaxmem. OK, RAM size we know. But for the RAM base, should we use (until the series above is applied) a macro like #define GUEST_RAM_BASE 0x80000000 ? > > Ian.
Ian Campbell
2013-Nov-01 10:30 UTC
Re: [PATCH v4 3/9] xen/arm: Implement set_memory_map hypercall for arm
On Fri, 2013-11-01 at 14:08 +0400, Eugene Fedotov wrote:> > The toolstack makes a hypercall, via libxc:xc_domain_setmaxmem. > OK, RAM size we know. But, for the RAM base should we use (till the > patch above applied) a macro like > #define GUEST_RAM_BASE 0x80000000 ?Yes, some #define which matches what the builder is using will do until the series lands. Ian.
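Such a shared constant might look like the following until the series lands (a sketch; the value is the 0x80000000 base mentioned above, and the PFN helper is illustrative):

    /* Stop-gap guest memory layout shared by the tools and the
     * hypervisor until the FDT autogeneration series exports the
     * real defines. */
    #define GUEST_RAM_BASE      0x80000000UL          /* guest RAM base (IPA) */
    #define GUEST_RAM_BASE_PFN  (GUEST_RAM_BASE >> 12)  /* 4K pages */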