Jaeyong Yoo
2013-Jul-03 09:15 UTC
[PATCH RFC v2 0/9] Preliminary working version of live migration
Hello all, Me and my colleagues have finally made a working version of live migration in xen-on-arndale. Using this patch series, we can xl save/restore and xl migrate <domid> <ip> successfully. I have to further test the stability and the performance, I would like to get some comments and reviews if anything may not nicely fit to the xen framework. It is quite preliminary version, but I hope it could be a start for supporting live migration in xen on arm. Best, Jaeyong Alexey Sokolov (2): xen/arm: Implement get_maximum_gpfn hypercall for arm xen/arm: Implement modify_returncode Alexey Sokolov, Elena Pyatunina, Evgeny Fedotov, and Nikolay Martyanov (1): xen/arm: Implement toolstack for xl restore/save and migrate Elena Pyatunina (2): xen/arm: Add handling write fault for dirty-page tracing xen/arm: Implement hypercall for dirty page tracing (shadow op) Jaeyong Yoo (3): xen-arm: Implement basic save/load for hvm context xen/arm: Add more registers for saving and restoring vcpu registers xen/arm: Missing impl of clear_guest_offset macro Jaeyong Yoo and Evgeny Fedotov (1): xen/arm: Implement save and restore for gic, vtimer, and ptimer config/arm32.mk | 1 + tools/include/xen-foreign/reference.size | 2 +- tools/libxc/Makefile | 5 + tools/libxc/xc_arm_migrate.c | 586 +++++++++++++++++++++++++++++++ tools/libxc/xc_resume.c | 25 ++ tools/misc/Makefile | 4 + xen/arch/arm/Makefile | 2 +- xen/arch/arm/domain.c | 38 ++ xen/arch/arm/domctl.c | 134 ++++++- xen/arch/arm/hvm.c | 67 ---- xen/arch/arm/hvm/Makefile | 2 + xen/arch/arm/hvm/hvm.c | 209 +++++++++++ xen/arch/arm/hvm/save.c | 66 ++++ xen/arch/arm/mm.c | 62 +++- xen/arch/arm/p2m.c | 368 +++++++++++++++++++ xen/arch/arm/traps.c | 7 + xen/common/Makefile | 2 + xen/include/asm-arm/domain.h | 8 + xen/include/asm-arm/guest_access.h | 6 +- xen/include/asm-arm/hvm/support.h | 29 ++ xen/include/asm-arm/mm.h | 2 + xen/include/asm-arm/p2m.h | 8 + xen/include/public/arch-arm.h | 41 +++ xen/include/public/arch-arm/hvm/save.h | 55 +++ 24 files changed, 1655 insertions(+), 74 deletions(-) create mode 100644 tools/libxc/xc_arm_migrate.c delete mode 100644 xen/arch/arm/hvm.c create mode 100644 xen/arch/arm/hvm/Makefile create mode 100644 xen/arch/arm/hvm/hvm.c create mode 100644 xen/arch/arm/hvm/save.c create mode 100644 xen/include/asm-arm/hvm/support.h -- 1.8.1.2
Jaeyong Yoo
2013-Jul-03 09:15 UTC
[PATCH RFC v2 1/9] xen-arm: Implement basic save/load for hvm context
Create a hvm directory for arm and move all the hvm related files to that directory. Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com> --- xen/arch/arm/Makefile | 2 +- xen/arch/arm/hvm.c | 67 ---------------------------------- xen/arch/arm/hvm/Makefile | 2 + xen/arch/arm/hvm/hvm.c | 67 ++++++++++++++++++++++++++++++++++ xen/arch/arm/hvm/save.c | 66 +++++++++++++++++++++++++++++++++ xen/include/asm-arm/hvm/support.h | 29 +++++++++++++++ xen/include/public/arch-arm/hvm/save.h | 17 +++++++++ 7 files changed, 182 insertions(+), 68 deletions(-) delete mode 100644 xen/arch/arm/hvm.c create mode 100644 xen/arch/arm/hvm/Makefile create mode 100644 xen/arch/arm/hvm/hvm.c create mode 100644 xen/arch/arm/hvm/save.c create mode 100644 xen/include/asm-arm/hvm/support.h diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile index 87fabe1..0d43539 100644 --- a/xen/arch/arm/Makefile +++ b/xen/arch/arm/Makefile @@ -1,6 +1,7 @@ subdir-$(arm32) += arm32 subdir-$(arm64) += arm64 subdir-y += platforms +subdir-y += hvm obj-$(EARLY_PRINTK) += early_printk.o obj-y += cpu.o @@ -28,7 +29,6 @@ obj-y += traps.o obj-y += vgic.o obj-y += vtimer.o obj-y += vpl011.o -obj-y += hvm.o obj-y += device.o #obj-bin-y += ....o diff --git a/xen/arch/arm/hvm.c b/xen/arch/arm/hvm.c deleted file mode 100644 index 471c4cd..0000000 --- a/xen/arch/arm/hvm.c +++ /dev/null @@ -1,67 +0,0 @@ -#include <xen/config.h> -#include <xen/init.h> -#include <xen/lib.h> -#include <xen/errno.h> -#include <xen/guest_access.h> -#include <xen/sched.h> - -#include <xsm/xsm.h> - -#include <public/xen.h> -#include <public/hvm/params.h> -#include <public/hvm/hvm_op.h> - -#include <asm/hypercall.h> - -long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg) - -{ - long rc = 0; - - switch ( op ) - { - case HVMOP_set_param: - case HVMOP_get_param: - { - struct xen_hvm_param a; - struct domain *d; - - if ( copy_from_guest(&a, arg, 1) ) - return -EFAULT; - - if ( a.index >= HVM_NR_PARAMS ) - return -EINVAL; - - d = rcu_lock_domain_by_any_id(a.domid); - if ( d == NULL ) - return -ESRCH; - - rc = xsm_hvm_param(XSM_TARGET, d, op); - if ( rc ) - goto param_fail; - - if ( op == HVMOP_set_param ) - { - d->arch.hvm_domain.params[a.index] = a.value; - } - else - { - a.value = d->arch.hvm_domain.params[a.index]; - rc = copy_to_guest(arg, &a, 1) ? -EFAULT : 0; - } - - param_fail: - rcu_unlock_domain(d); - break; - } - - default: - { - printk("%s: Bad HVM op %ld.\n", __func__, op); - rc = -ENOSYS; - break; - } - } - - return rc; -} diff --git a/xen/arch/arm/hvm/Makefile b/xen/arch/arm/hvm/Makefile new file mode 100644 index 0000000..ad38e09 --- /dev/null +++ b/xen/arch/arm/hvm/Makefile @@ -0,0 +1,2 @@ +obj-y += hvm.o +obj-y += save.o diff --git a/xen/arch/arm/hvm/hvm.c b/xen/arch/arm/hvm/hvm.c new file mode 100644 index 0000000..471c4cd --- /dev/null +++ b/xen/arch/arm/hvm/hvm.c @@ -0,0 +1,67 @@ +#include <xen/config.h> +#include <xen/init.h> +#include <xen/lib.h> +#include <xen/errno.h> +#include <xen/guest_access.h> +#include <xen/sched.h> + +#include <xsm/xsm.h> + +#include <public/xen.h> +#include <public/hvm/params.h> +#include <public/hvm/hvm_op.h> + +#include <asm/hypercall.h> + +long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg) + +{ + long rc = 0; + + switch ( op ) + { + case HVMOP_set_param: + case HVMOP_get_param: + { + struct xen_hvm_param a; + struct domain *d; + + if ( copy_from_guest(&a, arg, 1) ) + return -EFAULT; + + if ( a.index >= HVM_NR_PARAMS ) + return -EINVAL; + + d = rcu_lock_domain_by_any_id(a.domid); + if ( d == NULL ) + return -ESRCH; + + rc = xsm_hvm_param(XSM_TARGET, d, op); + if ( rc ) + goto param_fail; + + if ( op == HVMOP_set_param ) + { + d->arch.hvm_domain.params[a.index] = a.value; + } + else + { + a.value = d->arch.hvm_domain.params[a.index]; + rc = copy_to_guest(arg, &a, 1) ? -EFAULT : 0; + } + + param_fail: + rcu_unlock_domain(d); + break; + } + + default: + { + printk("%s: Bad HVM op %ld.\n", __func__, op); + rc = -ENOSYS; + break; + } + } + + return rc; +} diff --git a/xen/arch/arm/hvm/save.c b/xen/arch/arm/hvm/save.c new file mode 100644 index 0000000..0062ea8 --- /dev/null +++ b/xen/arch/arm/hvm/save.c @@ -0,0 +1,66 @@ +/* + * hvm/save.c: Save and restore HVM guest''s emulated hardware state for ARM. + * + * Copyright (c) 2013, Samsung Electronics. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple + * Place - Suite 330, Boston, MA 02111-1307 USA. + */ + +#include <asm/hvm/support.h> +#include <public/hvm/save.h> + +void arch_hvm_save(struct domain *d, struct hvm_save_header *hdr) +{ + hdr->cpuid = READ_SYSREG32(MIDR_EL1); +} + +int arch_hvm_load(struct domain *d, struct hvm_save_header *hdr) +{ + uint32_t cpuid; + + if ( hdr->magic != HVM_FILE_MAGIC ) + { + printk(XENLOG_G_ERR "HVM%d restore: bad magic number %#"PRIx32"\n", + d->domain_id, hdr->magic); + return -1; + } + + if ( hdr->version != HVM_FILE_VERSION ) + { + printk(XENLOG_G_ERR "HVM%d restore: unsupported version %u\n", + d->domain_id, hdr->version); + return -1; + } + + cpuid = READ_SYSREG32(MIDR_EL1); + if ( hdr->cpuid != cpuid ) + { + printk(XENLOG_G_INFO "HVM%d restore: VM saved on one CPU " + "(%#"PRIx32") and restored on another (%#"PRIx32").\n", + d->domain_id, hdr->cpuid, cpuid); + return -1; + } + + return 0; +} + +/* + * Local variables: + * mode: C + * c-file-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/include/asm-arm/hvm/support.h b/xen/include/asm-arm/hvm/support.h new file mode 100644 index 0000000..33390b0 --- /dev/null +++ b/xen/include/asm-arm/hvm/support.h @@ -0,0 +1,29 @@ +/* + * support.h: HVM support routines used by ARMv7 VE. + * + * Copyright (c) 2013, Samsung Electronics + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple + * Place - Suite 330, Boston, MA 02111-1307 USA. + */ + +#ifndef __ASM_ARM_HVM_SUPPORT_H__ +#define __ASM_ARM_HVM_SUPPORT_H__ + +#include <xen/types.h> +#include <public/hvm/ioreq.h> +#include <xen/sched.h> +#include <xen/hvm/save.h> +#include <asm/processor.h> + +#endif /* __ASM_ARM_HVM_SUPPORT_H__ */ diff --git a/xen/include/public/arch-arm/hvm/save.h b/xen/include/public/arch-arm/hvm/save.h index 75b8e65..fbfb2db 100644 --- a/xen/include/public/arch-arm/hvm/save.h +++ b/xen/include/public/arch-arm/hvm/save.h @@ -26,6 +26,23 @@ #ifndef __XEN_PUBLIC_HVM_SAVE_ARM_H__ #define __XEN_PUBLIC_HVM_SAVE_ARM_H__ +/* + * Save/restore header: general info about the save file. + */ + +#define HVM_FILE_MAGIC 0x92385520 +#define HVM_FILE_VERSION 0x00000001 + +struct hvm_save_header +{ + uint32_t magic; /* Must be HVM_FILE_MAGIC */ + uint32_t version; /* File format version */ + uint64_t changeset; /* Version of Xen that saved this file */ + uint32_t cpuid; /* MIDR_EL1 on the saving machine */ +}; + +DECLARE_HVM_SAVE_TYPE(HEADER, 1, struct hvm_save_header); + #endif /* -- 1.8.1.2
Jaeyong Yoo
2013-Jul-03 09:15 UTC
[PATCH RFC v2 2/9] xen/arm: Implement save and restore for gic, vtimer, and ptimer
In save/load of hvm context, add registers for gic, vtimer, and ptimer. Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com> Singed-off-by: Evgeny Fedotov <e.fedotov@samsung.com> --- xen/arch/arm/domctl.c | 80 ++++++++++++++++++- xen/arch/arm/hvm/hvm.c | 142 +++++++++++++++++++++++++++++++++ xen/common/Makefile | 2 + xen/include/public/arch-arm/hvm/save.h | 38 +++++++++ 4 files changed, 261 insertions(+), 1 deletion(-) diff --git a/xen/arch/arm/domctl.c b/xen/arch/arm/domctl.c index 851ee40..0203501 100644 --- a/xen/arch/arm/domctl.c +++ b/xen/arch/arm/domctl.c @@ -9,12 +9,90 @@ #include <xen/lib.h> #include <xen/errno.h> #include <xen/sched.h> +#include <xen/hvm/save.h> +#include <xen/guest_access.h> #include <public/domctl.h> long arch_do_domctl(struct xen_domctl *domctl, struct domain *d, XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl) { - return -ENOSYS; + long ret = 0; + bool_t copyback = 0; + + switch ( domctl->cmd ) + { + case XEN_DOMCTL_sethvmcontext: + { + struct hvm_domain_context c = { .size = domctl->u.hvmcontext.size }; + + ret = -ENOMEM; + if ( (c.data = xmalloc_bytes(c.size)) == NULL ) + goto sethvmcontext_out; + + ret = -EFAULT; + if ( copy_from_guest(c.data, domctl->u.hvmcontext.buffer, c.size) != 0) + goto sethvmcontext_out; + + domain_pause(d); + ret = hvm_load(d, &c); + domain_unpause(d); + + sethvmcontext_out: + if ( c.data != NULL ) + xfree(c.data); + } + break; + case XEN_DOMCTL_gethvmcontext: + { + struct hvm_domain_context c = { 0 }; + + ret = -EINVAL; + + c.size = hvm_save_size(d); + + if ( guest_handle_is_null(domctl->u.hvmcontext.buffer) ) + { + /* Client is querying for the correct buffer size */ + domctl->u.hvmcontext.size = c.size; + ret = 0; + goto gethvmcontext_out; + } + + /* Check that the client has a big enough buffer */ + ret = -ENOSPC; + if ( domctl->u.hvmcontext.size < c.size ) + { + printk("(gethvmcontext) size error: %d and %d\n", + domctl->u.hvmcontext.size, c.size ); + goto gethvmcontext_out; + } + + domain_pause(d); + ret = hvm_save(d, &c); + domain_unpause(d); + + domctl->u.hvmcontext.size = c.cur; + if ( copy_to_guest(domctl->u.hvmcontext.buffer, c.data, c.size) != 0 ) + { + printk("(gethvmcontext) copy to guest failed\n"); + ret = -EFAULT; + } + + gethvmcontext_out: + copyback = 1; + + if ( c.data != NULL ) + xfree(c.data); + } + break; + default: + return -EINVAL; + } + + if ( copyback && __copy_to_guest(u_domctl, domctl, 1) ) + ret = -EFAULT; + + return ret; } void arch_get_info_guest(struct vcpu *v, vcpu_guest_context_u c) diff --git a/xen/arch/arm/hvm/hvm.c b/xen/arch/arm/hvm/hvm.c index 471c4cd..e95a794 100644 --- a/xen/arch/arm/hvm/hvm.c +++ b/xen/arch/arm/hvm/hvm.c @@ -7,9 +7,11 @@ #include <xsm/xsm.h> +#include <xen/hvm/save.h> #include <public/xen.h> #include <public/hvm/params.h> #include <public/hvm/hvm_op.h> +#include <public/arch-arm/hvm/save.h> #include <asm/hypercall.h> @@ -65,3 +67,143 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg) return rc; } + +static int gic_save(struct domain *d, hvm_domain_context_t *h) +{ + struct hvm_hw_gic ctxt; + struct vcpu *v; + + /* Save the state of GICs */ + for_each_vcpu( d, v ) + { + ctxt.gic_hcr = v->arch.gic_hcr; + ctxt.gic_vmcr = v->arch.gic_vmcr; + ctxt.gic_apr = v->arch.gic_apr; + + if ( hvm_save_entry(GIC, v->vcpu_id, h, &ctxt) != 0 ) + return 1; + } + return 0; +} + +static int gic_load(struct domain *d, hvm_domain_context_t *h) +{ + int vcpuid; + struct hvm_hw_gic ctxt; + struct vcpu *v; + + /* Which vcpu is this? */ + vcpuid = hvm_load_instance(h); + if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL ) + { + dprintk(XENLOG_G_ERR, "HVM restore: dom%u has no vcpu%u\n", + d->domain_id, vcpuid); + return -EINVAL; + } + + if ( hvm_load_entry(GIC, h, &ctxt) != 0 ) + return -EINVAL; + + v->arch.gic_hcr = ctxt.gic_hcr; + v->arch.gic_vmcr = ctxt.gic_vmcr; + v->arch.gic_apr = ctxt.gic_apr; + + return 0; +} + +HVM_REGISTER_SAVE_RESTORE(GIC, gic_save, gic_load, 1, HVMSR_PER_VCPU); + +static int vtimer_save(struct domain *d, hvm_domain_context_t *h) +{ + struct hvm_hw_vtimer ctxt; + struct vcpu *v; + + ctxt.vtb_offset = d->arch.virt_timer_base.offset; + + /* Save the state of vtimer */ + for_each_vcpu( d, v ) + { + ctxt.cval = v->arch.virt_timer.cval; + ctxt.ctl = v->arch.virt_timer.ctl; + if ( hvm_save_entry(VTIMER, v->vcpu_id, h, &ctxt) != 0 ) + return 1; + } + + return 0; +} + +static int vtimer_load(struct domain *d, hvm_domain_context_t *h) +{ + int vcpuid; + struct hvm_hw_vtimer ctxt; + struct vcpu *v; + /* Which vcpu is this? */ + vcpuid = hvm_load_instance(h); + if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL ) + { + dprintk(XENLOG_G_ERR, "HVM restore: dom%u has no vcpu%u\n", + d->domain_id, vcpuid); + return -EINVAL; + } + + if ( hvm_load_entry(VTIMER, h, &ctxt) != 0 ) + return -EINVAL; + + v->arch.virt_timer.cval = ctxt.cval; + v->arch.virt_timer.ctl = ctxt.ctl; + v->arch.virt_timer.v = v; + + d->arch.virt_timer_base.offset = ctxt.vtb_offset; + + return 0; +} + +HVM_REGISTER_SAVE_RESTORE(VTIMER, vtimer_save, vtimer_load, 1, HVMSR_PER_VCPU); + +static int ptimer_save(struct domain *d, hvm_domain_context_t *h) +{ + struct hvm_hw_ptimer ctxt; + struct vcpu *v; + + ctxt.ptb_offset = d->arch.phys_timer_base.offset; + + /* Save the state of ptimer */ + for_each_vcpu( d, v ) + { + ctxt.p_cval = v->arch.phys_timer.cval; + ctxt.p_ctl = v->arch.phys_timer.ctl; + + if ( hvm_save_entry(PTIMER, v->vcpu_id, h, &ctxt) != 0 ) + return 1; + } + + return 0; +} + +static int ptimer_load(struct domain *d, hvm_domain_context_t *h) +{ + int vcpuid; + struct hvm_hw_ptimer ctxt; + struct vcpu *v; + /* Which vcpu is this? */ + vcpuid = hvm_load_instance(h); + if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL ) + { + dprintk(XENLOG_G_ERR, "HVM restore: dom%u has no vcpu%u\n", + d->domain_id, vcpuid); + return -EINVAL; + } + + if ( hvm_load_entry(PTIMER, h, &ctxt) != 0 ) + return -EINVAL; + + v->arch.phys_timer.cval = ctxt.p_cval; + v->arch.phys_timer.ctl = ctxt.p_ctl; + v->arch.phys_timer.v = v; + + d->arch.phys_timer_base.offset = ctxt.ptb_offset; + + return 0; +} + +HVM_REGISTER_SAVE_RESTORE(PTIMER, ptimer_save, ptimer_load, 1, HVMSR_PER_VCPU); diff --git a/xen/common/Makefile b/xen/common/Makefile index 0dc2050..956cf29 100644 --- a/xen/common/Makefile +++ b/xen/common/Makefile @@ -60,6 +60,8 @@ subdir-$(CONFIG_COMPAT) += compat subdir-$(x86_64) += hvm +subdir-$(CONFIG_ARM) += hvm + subdir-$(coverage) += gcov subdir-y += libelf diff --git a/xen/include/public/arch-arm/hvm/save.h b/xen/include/public/arch-arm/hvm/save.h index fbfb2db..a84f49e 100644 --- a/xen/include/public/arch-arm/hvm/save.h +++ b/xen/include/public/arch-arm/hvm/save.h @@ -43,6 +43,44 @@ struct hvm_save_header DECLARE_HVM_SAVE_TYPE(HEADER, 1, struct hvm_save_header); +struct hvm_hw_gic +{ + uint32_t gic_hcr; + uint32_t gic_vmcr; + uint32_t gic_apr; +}; + +DECLARE_HVM_SAVE_TYPE(GIC, 3, struct hvm_hw_gic); + +struct hvm_hw_vtimer +{ + uint64_t vtb_offset; + uint64_t vtb_cval; + + uint32_t ctl; + uint64_t offset; + uint64_t cval; +}; + +DECLARE_HVM_SAVE_TYPE(VTIMER, 4, struct hvm_hw_vtimer); + +struct hvm_hw_ptimer +{ + uint64_t ptb_offset; + uint64_t ptb_cval; + + uint32_t p_ctl; + uint64_t p_offset; + uint64_t p_cval; +}; + +DECLARE_HVM_SAVE_TYPE(PTIMER, 5, struct hvm_hw_ptimer); + +/* + * Largest type-code in use + */ +#define HVM_SAVE_CODE_MAX 19 + #endif /* -- 1.8.1.2
Jaeyong Yoo
2013-Jul-03 09:15 UTC
[PATCH RFC v2 3/9] xen/arm: Add more registers for saving and restoring vcpu registers
Add more registers for saving and restoring vcpu registers for live migration. Those registers are selected from the registers stored when vcpu context switching. Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com> Singed-off-by: Alexey Sokolov <sokolov.a@samsung.com> --- xen/arch/arm/domain.c | 34 ++++++++++++++++++++++++++++++++++ xen/arch/arm/domctl.c | 35 +++++++++++++++++++++++++++++++++++ xen/include/public/arch-arm.h | 35 +++++++++++++++++++++++++++++++++++ 3 files changed, 104 insertions(+) diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index 4c434a1..73dd0de 100644 --- a/xen/arch/arm/domain.c +++ b/xen/arch/arm/domain.c @@ -572,6 +572,40 @@ int arch_set_info_guest( v->arch.ttbr1 = ctxt->ttbr1; v->arch.ttbcr = ctxt->ttbcr; + v->arch.dacr = ctxt->dacr; + v->arch.ifar = ctxt->ifar; + v->arch.ifsr = ctxt->ifsr; + v->arch.dfar = ctxt->dfar; + v->arch.dfsr = ctxt->dfsr; + +#ifdef CONFIG_ARM_32 + v->arch.mair0 = ctxt->mair0; + v->arch.mair1 = ctxt->mair1; +#else + v->arch.mair = ctxt->mair; +#endif + + /* Control Registers */ + v->arch.actlr = ctxt->actlr; + v->arch.cpacr = ctxt->cpacr; + + v->arch.contextidr = ctxt->contextidr; + v->arch.tpidr_el0 = ctxt->tpidr_el0; + v->arch.tpidr_el1 = ctxt->tpidr_el1; + v->arch.tpidrro_el0 = ctxt->tpidrro_el0; + + /* CP 15 */ + v->arch.csselr = ctxt->csselr; + + v->arch.afsr0 = ctxt->afsr0; + v->arch.afsr1 = ctxt->afsr1; + v->arch.vbar = ctxt->vbar; + v->arch.par = ctxt->par; + v->arch.teecr = ctxt->teecr; + v->arch.teehbr = ctxt->teehbr; + v->arch.joscr = ctxt->joscr; + v->arch.jmcr = ctxt->jmcr; + v->is_initialised = 1; if ( ctxt->flags & VGCF_online ) diff --git a/xen/arch/arm/domctl.c b/xen/arch/arm/domctl.c index 0203501..a0719b0 100644 --- a/xen/arch/arm/domctl.c +++ b/xen/arch/arm/domctl.c @@ -107,6 +107,41 @@ void arch_get_info_guest(struct vcpu *v, vcpu_guest_context_u c) ctxt->ttbr1 = v->arch.ttbr1; ctxt->ttbcr = v->arch.ttbcr; + ctxt->dacr = v->arch.dacr; + ctxt->ifar = v->arch.ifar; + ctxt->ifsr = v->arch.ifsr; + ctxt->dfar = v->arch.dfar; + ctxt->dfsr = v->arch.dfsr; + +#ifdef CONFIG_ARM_32 + ctxt->mair0 = v->arch.mair0; + ctxt->mair1 = v->arch.mair1; +#else + ctxt->mair = v->arch.mair; +#endif + + /* Control Registers */ + ctxt->actlr = v->arch.actlr; + ctxt->sctlr = v->arch.sctlr; + ctxt->cpacr = v->arch.cpacr; + + ctxt->contextidr = v->arch.contextidr; + ctxt->tpidr_el0 = v->arch.tpidr_el0; + ctxt->tpidr_el1 = v->arch.tpidr_el1; + ctxt->tpidrro_el0 = v->arch.tpidrro_el0; + + /* CP 15 */ + ctxt->csselr = v->arch.csselr; + + ctxt->afsr0 = v->arch.afsr0; + ctxt->afsr1 = v->arch.afsr1; + ctxt->vbar = v->arch.vbar; + ctxt->par = v->arch.par; + ctxt->teecr = v->arch.teecr; + ctxt->teehbr = v->arch.teehbr; + ctxt->joscr = v->arch.joscr; + ctxt->jmcr = v->arch.jmcr; + if ( !test_bit(_VPF_down, &v->pause_flags) ) ctxt->flags |= VGCF_online; } diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h index 8aa62d3..352c08d 100644 --- a/xen/include/public/arch-arm.h +++ b/xen/include/public/arch-arm.h @@ -191,6 +191,41 @@ struct vcpu_guest_context { uint32_t sctlr, ttbcr; uint64_t ttbr0, ttbr1; + uint32_t ifar, dfar; + uint32_t ifsr, dfsr; + uint32_t dacr; + uint64_t par; + +#ifdef CONFIG_ARM_32 + uint32_t mair0, mair1; + uint32_t tpidr_el0; + uint32_t tpidr_el1; + uint32_t tpidrro_el0; + uint32_t vbar; +#else + uint64_t mair; + uint64_t tpidr_el0; + uint64_t tpidr_el1; + uint64_t tpidrro_el0; + uint64_t vbar; +#endif + + /* Control Registers */ + uint32_t actlr; + uint32_t cpacr; + + uint32_t afsr0, afsr1; + + uint32_t contextidr; + + uint32_t teecr, teehbr; /* ThumbEE, 32-bit guests only */ + +#ifdef CONFIG_ARM_32 + uint32_t joscr, jmcr; +#endif + + /* CP 15 */ + uint32_t csselr; }; typedef struct vcpu_guest_context vcpu_guest_context_t; DEFINE_XEN_GUEST_HANDLE(vcpu_guest_context_t); -- 1.8.1.2
Jaeyong Yoo
2013-Jul-03 09:15 UTC
[PATCH RFC v2 4/9] xen/arm: Implement get_maximum_gpfn hypercall for arm
From: Alexey Sokolov <sokolov.a@samsung.com> Since we do not know the maximum gpfn size for guest domain, we walk the page table of guest in order to see the maximum size of gpfn. Singed-off-by: Alexey Sokolov <sokolov.a@samsung.com> --- xen/arch/arm/mm.c | 3 +- xen/arch/arm/p2m.c | 69 +++++++++++++++++++++++++++++++++++++++++++ xen/include/asm-arm/p2m.h | 3 ++ xen/include/public/arch-arm.h | 6 ++++ 4 files changed, 80 insertions(+), 1 deletion(-) diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c index d1290cd..650b1fc 100644 --- a/xen/arch/arm/mm.c +++ b/xen/arch/arm/mm.c @@ -762,7 +762,8 @@ int page_is_ram_type(unsigned long mfn, unsigned long mem_type) unsigned long domain_get_maximum_gpfn(struct domain *d) { - return -ENOSYS; + xen_pfn_t result = p2m_get_next_non_used_gpfn(d, GUEST_RAM_BASE >> PAGE_SHIFT); + return result; } void share_xen_page_with_guest(struct page_info *page, diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c index 9fc5534..8bf7eb7 100644 --- a/xen/arch/arm/p2m.c +++ b/xen/arch/arm/p2m.c @@ -5,6 +5,75 @@ #include <xen/domain_page.h> #include <asm/flushtlb.h> #include <asm/gic.h> +#include <xen/guest_access.h> + +/* + * walk the guest page table to find out the next non-used gpfn + */ +xen_pfn_t p2m_get_next_non_used_gpfn(struct domain *d, xen_pfn_t start) +{ + struct p2m_domain *p2m = &d->arch.p2m; + paddr_t start_addr = start << PAGE_SHIFT; + int first_index = first_table_offset(start_addr); + int second_index = second_table_offset(start_addr); + int third_index = third_table_offset(start_addr); + lpae_t *first = __map_domain_page(p2m->first_level); + lpae_t *second = NULL; + lpae_t *third = NULL; + + BUG_ON(!first && "Can''t map first level p2m"); + + spin_lock(&p2m->lock); + + while (first_index < LPAE_ENTRIES*2) + { + lpae_walk_t first_pte = first[first_index].walk; + if (!first_pte.valid || !first_pte.table) + { + goto out; + } + second = map_domain_page(first_pte.base); + BUG_ON(!second && "Can''t map second level p2m"); + while (second_index < LPAE_ENTRIES) + { + lpae_walk_t second_pte = second[second_index].walk; + if (!second_pte.valid || !second_pte.table) + { + goto out; + } + third = map_domain_page(second_pte.base); + BUG_ON(!third && "Can''t map third level p2m"); + while (third_index < LPAE_ENTRIES) + { + lpae_walk_t third_pte = third[third_index].walk; + if (!third_pte.valid) + { + goto out; + } + third_index++; + } + unmap_domain_page(third); third = NULL; + second_index++; + third_index = 0; + } + unmap_domain_page(second); second = NULL; + first_index++; + second_index = 0; + third_index = 0; + } + +out: + if (third) unmap_domain_page(third); + if (second) unmap_domain_page(second); + if (first) unmap_domain_page(first); + + spin_unlock(&p2m->lock); + + return ( ((xen_pfn_t)first_index << FIRST_SHIFT) | + (second_index << SECOND_SHIFT) | + (third_index << THIRD_SHIFT) + ) >> PAGE_SHIFT; +} void dump_p2m_lookup(struct domain *d, paddr_t addr) { diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h index a00069b..379c453 100644 --- a/xen/include/asm-arm/p2m.h +++ b/xen/include/asm-arm/p2m.h @@ -57,6 +57,9 @@ void guest_physmap_remove_page(struct domain *d, unsigned long gmfn_to_mfn(struct domain *d, unsigned long gpfn); +/* walk the guest page table to find out the next non-used gpfn */ +xen_pfn_t p2m_get_next_non_used_gpfn(struct domain *d, xen_pfn_t start); + /* * Populate-on-demand */ diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h index 352c08d..4a53692 100644 --- a/xen/include/public/arch-arm.h +++ b/xen/include/public/arch-arm.h @@ -274,6 +274,12 @@ typedef uint64_t xen_callback_t; #define PSR_GUEST_INIT (PSR_ABT_MASK|PSR_FIQ_MASK|PSR_IRQ_MASK|PSR_MODE_SVC) +/* + * Guest virtual RAM starts here. This must be consistent with the DTB + * appended to the guest kernel. + */ +#define GUEST_RAM_BASE 0x80000000 + #endif /* __XEN_PUBLIC_ARCH_ARM_H__ */ /* -- 1.8.1.2
From: Alexey Sokolov <sokolov.a@samsung.com> Making sched_op in do_suspend (driver/xen/manage.c) returns 0 on the success of suspend. Singed-off-by: Alexey Sokolov <sokolov.a@samsung.com> --- tools/include/xen-foreign/reference.size | 2 +- tools/libxc/xc_resume.c | 25 +++++++++++++++++++++++++ 2 files changed, 26 insertions(+), 1 deletion(-) diff --git a/tools/include/xen-foreign/reference.size b/tools/include/xen-foreign/reference.size index de36455..dfd691c 100644 --- a/tools/include/xen-foreign/reference.size +++ b/tools/include/xen-foreign/reference.size @@ -5,7 +5,7 @@ start_info | - - 1112 1168 trap_info | - - 8 16 cpu_user_regs | - - 68 200 vcpu_guest_core_regs | 304 304 - - -vcpu_guest_context | 336 336 2800 5168 +vcpu_guest_context | 440 440 2800 5168 arch_vcpu_info | - - 24 16 vcpu_time_info | - - 32 32 vcpu_info | - - 64 64 diff --git a/tools/libxc/xc_resume.c b/tools/libxc/xc_resume.c index 1c43ec6..32b58b9 100644 --- a/tools/libxc/xc_resume.c +++ b/tools/libxc/xc_resume.c @@ -87,6 +87,31 @@ static int modify_returncode(xc_interface *xch, uint32_t domid) return 0; } +#elif defined(__arm__) + +static int modify_returncode(xc_interface *xch, uint32_t domid) +{ + vcpu_guest_context_any_t ctxt; + xc_dominfo_t info; + int rc; + + if (xc_domain_getinfo(xch, domid, 1, &info) != 1) + { + PERROR("Could not get domain info"); + return -1; + } + + if ( (rc = xc_vcpu_getcontext(xch, domid, 0, &ctxt)) != 0 ) + return rc; + + ctxt.c.user_regs.r0_usr = 1; + + if ( (rc = xc_vcpu_setcontext(xch, domid, 0, &ctxt)) != 0 ) + return rc; + + return 0; +} + #else static int modify_returncode(xc_interface *xch, uint32_t domid) -- 1.8.1.2
Jaeyong Yoo
2013-Jul-03 09:15 UTC
[PATCH RFC v2 6/9] xen/arm: Add handling write fault for dirty-page tracing
From: Elena Pyatunina <e.pyatunina@samsung.com> Add handling write fault in do_trap_data_abort_guest for dirty-page tracing. Mark dirty function makes the bitmap of dirty pages. Signed-off-by: Elena Pyatunina <e.pyatunina@samsung.com> Singed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com> --- xen/arch/arm/mm.c | 59 ++++++++++++++++++++++++++++++- xen/arch/arm/p2m.c | 84 ++++++++++++++++++++++++++++++++++++++++++++ xen/arch/arm/traps.c | 7 ++++ xen/include/asm-arm/domain.h | 8 +++++ xen/include/asm-arm/mm.h | 2 ++ xen/include/asm-arm/p2m.h | 3 ++ 6 files changed, 162 insertions(+), 1 deletion(-) diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c index 650b1fc..f509008 100644 --- a/xen/arch/arm/mm.c +++ b/xen/arch/arm/mm.c @@ -680,7 +680,6 @@ void destroy_xen_mappings(unsigned long v, unsigned long e) create_xen_entries(REMOVE, v, 0, (e - v) >> PAGE_SHIFT, 0); } -enum mg { mg_clear, mg_ro, mg_rw, mg_rx }; static void set_pte_flags_on_range(const char *p, unsigned long l, enum mg mg) { lpae_t pte; @@ -1142,6 +1141,64 @@ int is_iomem_page(unsigned long mfn) return 1; return 0; } + +/* + * Set l3e entries of P2M table to be read-only. + * + * On first write, it page faults, its entry is changed to read-write, + * and on retry the write succeeds. + * + * Populate dirty_bitmap by looking for entries that have been + * switched to read-write. + */ +int handle_page_fault(struct domain *d, paddr_t addr) +{ + lpae_t pte, *first = NULL, *second = NULL, *third = NULL; + struct p2m_domain *p2m = &d->arch.p2m; + + if ( d->arch.dirty.mode == 0 || first_table_offset(addr) >= LPAE_ENTRIES ) + return 1; + + spin_lock(&p2m->lock); + + first = __map_domain_page(p2m->first_level); + + if ( !first || + !first[first_table_offset(addr)].walk.valid || + !first[first_table_offset(addr)].walk.table ) + goto done; + + second = map_domain_page(first[first_table_offset(addr)].walk.base); + + if ( !second || + !second[second_table_offset(addr)].walk.valid || + !second[second_table_offset(addr)].walk.table ) + goto done; + + third = map_domain_page(second[second_table_offset(addr)].walk.base); + + if ( !third ) + goto done; + + pte = third[third_table_offset(addr)]; + + if (pte.p2m.valid && pte.p2m.write == 0) + { + mark_dirty(d, addr); + pte.p2m.write = 1; + write_pte(&third[third_table_offset(addr)], pte); + flush_tlb_local(); + } + +done: + if (third) unmap_domain_page(third); + if (second) unmap_domain_page(second); + if (first) unmap_domain_page(first); + + spin_unlock(&p2m->lock); + return 0; +} + /* * Local variables: * mode: C diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c index 8bf7eb7..bae7af7 100644 --- a/xen/arch/arm/p2m.c +++ b/xen/arch/arm/p2m.c @@ -412,6 +412,90 @@ unsigned long gmfn_to_mfn(struct domain *d, unsigned long gpfn) return p >> PAGE_SHIFT; } +static struct page_info * new_dirty_page(void) +{ + struct page_info *page = NULL; + void *p; + + page = alloc_domheap_page(NULL, 0); + if ( page == NULL ) + return NULL; + + p = __map_domain_page(page); + + clear_page(p); + unmap_domain_page(p); + + return page; +} + +void mark_dirty(struct domain *d, paddr_t addr) +{ + struct page_info *page; + xen_pfn_t *first = NULL, *second = NULL, *third = NULL; + void *p; + int changed; + + spin_lock(&d->arch.dirty.lock); + + if (d->arch.dirty.top == NULL) + { + page = alloc_domheap_pages(NULL, 1, 0); + if (page == NULL) + { + printk("Error: couldn''t allocate page for dirty bitmap!\n"); + spin_unlock(&d->arch.dirty.lock); + return; + } + + INIT_PAGE_LIST_HEAD(&d->arch.dirty.pages); + page_list_add(page, &d->arch.dirty.pages); + + /* Clear both first level pages */ + p = __map_domain_page(page); + clear_page(p); + unmap_domain_page(p); + + p = __map_domain_page(page + 1); + clear_page(p); + unmap_domain_page(p); + + d->arch.dirty.top = page; + } + + first = __map_domain_page(d->arch.dirty.top); + BUG_ON(!first && "Can''t map first level p2m."); + if ( !first[first_table_offset(addr)]) + { + page = new_dirty_page(); + page_list_add(page, &d->arch.dirty.pages); + first[first_table_offset(addr)] = page_to_mfn(page); + } + + second = map_domain_page(first[first_table_offset(addr)]); + BUG_ON(!second && "Can''t map second level p2m."); + if (!second[second_table_offset(addr)]) + { + page = new_dirty_page(); + page_list_add(page, &d->arch.dirty.pages); + second[second_table_offset(addr)] = page_to_mfn(page); + } + + third = map_domain_page(second[second_table_offset(addr)]); + BUG_ON(!third && "Can''t map third level p2m."); + changed = !__test_and_set_bit(third_table_offset(addr), third); + if (changed) + { + d->arch.dirty.count++; + } + + if (third) unmap_domain_page(third); + if (second) unmap_domain_page(second); + if (first) unmap_domain_page(first); + + spin_unlock(&d->arch.dirty.lock); +} + /* * Local variables: * mode: C diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c index 398d209..9927712 100644 --- a/xen/arch/arm/traps.c +++ b/xen/arch/arm/traps.c @@ -1011,6 +1011,13 @@ static void do_trap_data_abort_guest(struct cpu_user_regs *regs, if ( rc == -EFAULT ) goto bad_data_abort; + /* handle permission fault on write */ + if ((dabt.dfsc & 0x3f) == (FSC_FLT_PERM + 3) && dabt.write) + { + if (handle_page_fault(current->domain, info.gpa) == 0) + return; + } + if (handle_mmio(&info)) { regs->pc += dabt.len ? 4 : 2; diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h index cb251cc..3536757 100644 --- a/xen/include/asm-arm/domain.h +++ b/xen/include/asm-arm/domain.h @@ -107,6 +107,14 @@ struct arch_domain spinlock_t lock; } uart0; + struct { + spinlock_t lock; + int mode; + unsigned int count; + struct page_info *top; + struct page_list_head pages; + } dirty; + } __cacheline_aligned; struct arch_vcpu diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h index 5e7c5a3..2b961de 100644 --- a/xen/include/asm-arm/mm.h +++ b/xen/include/asm-arm/mm.h @@ -327,6 +327,8 @@ static inline void put_page_and_type(struct page_info *page) put_page(page); } +int handle_page_fault(struct domain *d, paddr_t addr); + #endif /* __ARCH_ARM_MM__ */ /* * Local variables: diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h index 379c453..fd5890f 100644 --- a/xen/include/asm-arm/p2m.h +++ b/xen/include/asm-arm/p2m.h @@ -110,6 +110,9 @@ static inline int get_page_and_type(struct page_info *page, return rc; } +enum mg { mg_clear, mg_ro, mg_rw, mg_rx }; +void mark_dirty(struct domain *d, paddr_t addr); + #endif /* _XEN_P2M_H */ /* -- 1.8.1.2
Jaeyong Yoo
2013-Jul-03 09:15 UTC
[PATCH RFC v2 7/9] xen/arm: Missing impl of clear_guest_offset macro
This macro is somehow accidently broken. Fill up the rest of it. Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com> --- xen/include/asm-arm/guest_access.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/xen/include/asm-arm/guest_access.h b/xen/include/asm-arm/guest_access.h index 34aae14..1104090 100644 --- a/xen/include/asm-arm/guest_access.h +++ b/xen/include/asm-arm/guest_access.h @@ -77,8 +77,10 @@ unsigned long raw_clear_guest(void *to, unsigned len); * Clear an array of objects in guest context via a guest handle, * specifying an offset into the guest array. */ -#define clear_guest_offset(hnd, off, ptr, nr) ({ \ - raw_clear_guest(_d+(off), nr); \ +#define clear_guest_offset(hnd, off, ptr, nr) ({ \ + const typeof(*(ptr)) *_s = (ptr); \ + char (*_d)[sizeof(*_s)] = (void *)(hnd).p; \ + raw_clear_guest(_d+(off), nr); \ }) /* -- 1.8.1.2
Jaeyong Yoo
2013-Jul-03 09:15 UTC
[PATCH RFC v2 8/9] xen/arm: Implement hypercall for dirty page tracing (shadow op)
From: Elena Pyatunina <e.pyatunina@samsung.com> Add hypercall (shadow op: enable/disable and clean/peek dirted page bitmap) Signed-off-by: Elena Pyatunina <e.pyatunina@samsung.com> Singed-off-by: Alexey Sokolov <sokolov.a@samsung.com> --- xen/arch/arm/domain.c | 4 + xen/arch/arm/domctl.c | 19 ++++ xen/arch/arm/p2m.c | 219 +++++++++++++++++++++++++++++++++++++++++++++- xen/include/asm-arm/p2m.h | 2 + 4 files changed, 242 insertions(+), 2 deletions(-) diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index 73dd0de..e3fc533 100644 --- a/xen/arch/arm/domain.c +++ b/xen/arch/arm/domain.c @@ -484,6 +484,10 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags) d->arch.vpidr = boot_cpu_data.midr.bits; d->arch.vmpidr = boot_cpu_data.mpidr.bits; + /* init for dirty-page tracing */ + d->arch.dirty.count = 0; + d->arch.dirty.top = NULL; + clear_page(d->shared_info); share_xen_page_with_guest( virt_to_page(d->shared_info), d, XENSHARE_writable); diff --git a/xen/arch/arm/domctl.c b/xen/arch/arm/domctl.c index a0719b0..bc06534 100644 --- a/xen/arch/arm/domctl.c +++ b/xen/arch/arm/domctl.c @@ -67,6 +67,14 @@ long arch_do_domctl(struct xen_domctl *domctl, struct domain *d, goto gethvmcontext_out; } + /* Allocate our own marshalling buffer */ + ret = -ENOMEM; + if ( (c.data = xmalloc_bytes(c.size)) == NULL ) + { + printk("(gethvmcontext) xmalloc_bytes failed: %d\n", c.size ); + goto gethvmcontext_out; + } + domain_pause(d); ret = hvm_save(d, &c); domain_unpause(d); @@ -85,6 +93,17 @@ long arch_do_domctl(struct xen_domctl *domctl, struct domain *d, xfree(c.data); } break; + case XEN_DOMCTL_shadow_op: + { + ret = dirty_mode_op(d, &domctl->u.shadow_op); + if ( (&domctl->u.shadow_op)->op == XEN_DOMCTL_SHADOW_OP_CLEAN + || (&domctl->u.shadow_op)->op == XEN_DOMCTL_SHADOW_OP_PEEK) + { + copyback = 1; + } + } + break; + default: return -EINVAL; } diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c index bae7af7..fb8ce10 100644 --- a/xen/arch/arm/p2m.c +++ b/xen/arch/arm/p2m.c @@ -69,8 +69,8 @@ out: spin_unlock(&p2m->lock); - return ( ((xen_pfn_t)first_index << FIRST_SHIFT) | - (second_index << SECOND_SHIFT) | + return ( ((xen_pfn_t)first_index << FIRST_SHIFT) | + (second_index << SECOND_SHIFT) | (third_index << THIRD_SHIFT) ) >> PAGE_SHIFT; } @@ -496,6 +496,221 @@ void mark_dirty(struct domain *d, paddr_t addr) spin_unlock(&d->arch.dirty.lock); } +static void free_dirty_bitmap(struct domain *d) +{ + struct page_info *pg; + + spin_lock(&d->arch.dirty.lock); + + if(d->arch.dirty.top) + { + while ( (pg = page_list_remove_head(&d->arch.dirty.pages)) ) + free_domheap_page(pg); + } + d->arch.dirty.top = NULL; + d->arch.dirty.count = 0; + + spin_unlock(&d->arch.dirty.lock); +} + +/* Change types across all p2m entries in a domain */ +static void p2m_change_entry_type_global(struct domain *d, enum mg ot, enum mg nt) +{ + struct p2m_domain *p2m = &d->arch.p2m; + int i1, i2, i3; + int first_index = first_table_offset(GUEST_RAM_BASE); + int second_index = second_table_offset(GUEST_RAM_BASE); + int third_index = third_table_offset(GUEST_RAM_BASE); + + lpae_t *first = __map_domain_page(p2m->first_level); + lpae_t pte, *second = NULL, *third = NULL; + + BUG_ON(!first && "Can''t map first level p2m."); + + spin_lock(&p2m->lock); + + for (i1 = first_index; i1 < LPAE_ENTRIES*2; ++i1) + { + lpae_walk_t first_pte = first[i1].walk; + if (!first_pte.valid || !first_pte.table) + goto out; + + second = map_domain_page(first_pte.base); + BUG_ON(!second && "Can''t map second level p2m."); + for(i2 = second_index; i2 < LPAE_ENTRIES; ++i2) + { + lpae_walk_t second_pte = second[i2].walk; + if (!second_pte.valid || !second_pte.table) + goto out; + + third = map_domain_page(second_pte.base); + BUG_ON(!third && "Can''t map third level p2m."); + + for (i3 = third_index; i3 < LPAE_ENTRIES; ++i3) + { + lpae_walk_t third_pte = third[i3].walk; + int write = 0; + if (!third_pte.valid) + goto out; + + pte = third[i3]; + if (pte.p2m.write == 1 && nt == mg_ro) + { + pte.p2m.write = 0; + write = 1; + } + else if (pte.p2m.write == 0 && nt == mg_rw) + { + pte.p2m.write = 1; + write = 1; + } + if (write) + write_pte(&third[i3], pte); + } + unmap_domain_page(third); + + third = NULL; + third_index = 0; + } + unmap_domain_page(second); + + second = NULL; + second_index = 0; + third_index = 0; + } + +out: + flush_tlb_all_local(); + if (third) unmap_domain_page(third); + if (second) unmap_domain_page(second); + if (first) unmap_domain_page(first); + + spin_unlock(&p2m->lock); +} + +/* Read a domain''s log-dirty bitmap and stats. + * If the operation is a CLEAN, clear the bitmap and stats. */ +static int log_dirty_op(struct domain *d, xen_domctl_shadow_op_t *sc) +{ + int i1, i2, rv = 0, clean = 0, peek = 1; + xen_pfn_t *first = NULL, *second = NULL; + unsigned long pages = 0, *third = NULL; + + clean = (sc->op == XEN_DOMCTL_SHADOW_OP_CLEAN); + + if (guest_handle_is_null(sc->dirty_bitmap)) + peek = 0; + + domain_pause(d); + + printk("DEBUG: log-dirty %s: dom %u dirty=%u\n", + (clean) ? "clean" : "peek", + d->domain_id, d->arch.dirty.count); + + sc->stats.dirty_count = d->arch.dirty.count; + + spin_lock(&d->arch.dirty.lock); + first = d->arch.dirty.top ? __map_domain_page(d->arch.dirty.top) : NULL; + + /* 32-bit address space starts from 0 and ends at ffffffff. + * First level of p2m tables is indexed by bits 39-30. + * So significant bits for 32-bit addresses in first table are 31-30. + * So we need to iterate from 0 to 4. + * But RAM starts from 2 gigabytes. So start from 2 instead. + * TODO remove this hardcode + */ + for (i1 = 2; (pages < sc->pages) && (i1 < 4); ++i1) + { + second = (first && first[i1]) ? map_domain_page(first[i1]) : NULL; + + for (i2 = 0; (i2 < LPAE_ENTRIES); ++i2) + { + unsigned int bytes = LPAE_ENTRIES/8; + third = (second && second[i2]) ? map_domain_page(second[i2]) : NULL; + if ( unlikely(((sc->pages - pages + 7) >> 3) < bytes) ) + bytes = (unsigned int)((sc->pages - pages + 7) >> 3); + if (peek) + { + if ( (third ? copy_to_guest_offset(sc->dirty_bitmap, pages >> 3, + (uint8_t *)third, bytes) + : clear_guest_offset(sc->dirty_bitmap, pages >> 3, + (uint8_t*)NULL /*only type matters*/, + bytes)) != 0 ) + { + rv = 1; + goto out; + } + } + pages += bytes << 3; + if (third) + { + if (clean) clear_page(third); + unmap_domain_page(third); + } + } + if (second) unmap_domain_page(second); + } + if (first) unmap_domain_page(first); + + spin_unlock(&d->arch.dirty.lock); + + if ( pages < sc->pages ) + sc->pages = pages; + + if (clean) + { + p2m_change_entry_type_global(d, mg_rw, mg_ro); + free_dirty_bitmap(d); + } + +out: + if (rv) + { + spin_unlock(&d->arch.dirty.lock); + } + domain_unpause(d); + return rv; +} + +long dirty_mode_op(struct domain *d, xen_domctl_shadow_op_t *sc) +{ + long ret = 0; + + switch (sc->op) + { + case XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY: + case XEN_DOMCTL_SHADOW_OP_OFF: + { + enum mg ot = mg_clear, nt = mg_clear; + + ot = sc->op == XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY ? mg_rw : mg_ro; + nt = sc->op == XEN_DOMCTL_SHADOW_OP_OFF ? mg_rw : mg_ro; + + domain_pause(d); + + d->arch.dirty.mode = sc->op == XEN_DOMCTL_SHADOW_OP_OFF ? 0 : 1; + p2m_change_entry_type_global(d, ot, nt); + + if (sc->op == XEN_DOMCTL_SHADOW_OP_OFF) + free_dirty_bitmap(d); + + domain_unpause(d); + } + break; + + case XEN_DOMCTL_SHADOW_OP_CLEAN: + case XEN_DOMCTL_SHADOW_OP_PEEK: + { + ret = log_dirty_op(d, sc); + } + break; + + default: + return -ENOSYS; + } + return ret; +} + /* * Local variables: * mode: C diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h index fd5890f..498cf0b 100644 --- a/xen/include/asm-arm/p2m.h +++ b/xen/include/asm-arm/p2m.h @@ -2,6 +2,7 @@ #define _XEN_P2M_H #include <xen/mm.h> +#include <public/domctl.h> struct domain; @@ -112,6 +113,7 @@ static inline int get_page_and_type(struct page_info *page, enum mg { mg_clear, mg_ro, mg_rw, mg_rx }; void mark_dirty(struct domain *d, paddr_t addr); +long dirty_mode_op(struct domain *d, xen_domctl_shadow_op_t *sc); #endif /* _XEN_P2M_H */ -- 1.8.1.2
Jaeyong Yoo
2013-Jul-03 09:15 UTC
[PATCH RFC v2 9/9] xen/arm: Implement toolstack for xl restore/save and migrate
From: "From: Alexey Sokolov" <sokolov.a@samsung.com> Implement for xl restore/save (which are also used for migrate) operation in xc_arm_migrate.c and make it compilable. Singed-off-by: Alexey Sokolov <sokolov.a@samsung.com> Signed-off-by: Elena Pyatunina <e.pyatunina@samsung.com> Singed-off-by: Evgeny Fedotov <e.fedotov@samsung.com> Singed-off-by: Nikolay Martyanov <n.martyanov@samsung.com> --- config/arm32.mk | 1 + tools/libxc/Makefile | 5 + tools/libxc/xc_arm_migrate.c | 586 +++++++++++++++++++++++++++++++++++++++++++ tools/misc/Makefile | 4 + 4 files changed, 596 insertions(+) create mode 100644 tools/libxc/xc_arm_migrate.c diff --git a/config/arm32.mk b/config/arm32.mk index d8e958b..4e35337 100644 --- a/config/arm32.mk +++ b/config/arm32.mk @@ -1,6 +1,7 @@ CONFIG_ARM := y CONFIG_ARM_32 := y CONFIG_ARM_$(XEN_OS) := y +CONFIG_MIGRATE := y # -march= -mcpu diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile index 512a994..05dfef4 100644 --- a/tools/libxc/Makefile +++ b/tools/libxc/Makefile @@ -42,8 +42,13 @@ CTRL_SRCS-$(CONFIG_MiniOS) += xc_minios.c GUEST_SRCS-y : GUEST_SRCS-y += xg_private.c xc_suspend.c ifeq ($(CONFIG_MIGRATE),y) +ifeq ($(CONFIG_X86),y) GUEST_SRCS-y += xc_domain_restore.c xc_domain_save.c GUEST_SRCS-y += xc_offline_page.c xc_compression.c +endif +ifeq ($(CONFIG_ARM),y) +GUEST_SRCS-y += xc_arm_migrate.c +endif else GUEST_SRCS-y += xc_nomigrate.c endif diff --git a/tools/libxc/xc_arm_migrate.c b/tools/libxc/xc_arm_migrate.c new file mode 100644 index 0000000..936dd3d --- /dev/null +++ b/tools/libxc/xc_arm_migrate.c @@ -0,0 +1,586 @@ +/****************************************************************************** + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; + * version 2.1 of the License. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + * + * Copyright (c) 2013, Samsung Electronics + */ + +#include <inttypes.h> +#include <errno.h> +#include <xenctrl.h> +#include <xenguest.h> + +#include <unistd.h> +#include <xc_private.h> +#include <xc_dom.h> +#include "xc_bitops.h" +#include "xg_private.h" + +#define DEF_MAX_ITERS 29 /* limit us to 30 times round loop */ +#define DEF_MAX_FACTOR 3 /* never send more than 3x p2m_size */ +#define DEF_MIN_DIRTY_PER_ITER 50 /* dirty page count to define last iter */ +#define DEF_PROGRESS_RATE 50 /* progress bar update rate */ + +static int suspend_and_state(int (*suspend)(void*), void *data, + xc_interface *xch, int dom) +{ + xc_dominfo_t info; + if ( !(*suspend)(data) ) + { + ERROR("Suspend request failed"); + return -1; + } + + if ( (xc_domain_getinfo(xch, dom, 1, &info) != 1) || + !info.shutdown || (info.shutdown_reason != SHUTDOWN_suspend) ) + { + ERROR("Domain is not in suspended state after suspend attempt"); + return -1; + } + + return 0; +} + +static int write_exact_handled(xc_interface *xch, int fd, const void *data, size_t size) +{ + if (write_exact(fd, data, size)) + { + ERROR("Write failed, check space"); + return -1; + } + return 0; +} + +/* ============ Memory ============= */ + +static int save_memory(xc_interface *xch, int io_fd, uint32_t dom, + int live, struct save_callbacks* callbacks, + uint32_t max_iters, uint32_t max_factor, int debug) +{ + const xen_pfn_t start = 0x80000; /* Arm-domU RAM starts from 2 gigabytes */ + const char zero = 0; + char reportbuf[80]; + int iter = 0; + int last_iter = !live; + + DECLARE_HYPERCALL_BUFFER(unsigned long, to_send); + + const xen_pfn_t end = xc_domain_maximum_gpfn(xch, dom); + const xen_pfn_t mem_size = end - start; + xen_pfn_t i; + if (write_exact_handled(xch, io_fd, &end, sizeof(xen_pfn_t))) + { + return -1; + } + + if (live) + { + if (xc_shadow_control(xch, dom, XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY, + NULL, 0, NULL, 0, NULL) < 0) + { + ERROR("Couldn''t enable log-dirty mode !\n"); + return -1; + } + if (debug) + { + IPRINTF("Log-dirty mode enabled!\n"); + } + + to_send = xc_hypercall_buffer_alloc_pages(xch, to_send, NRPAGES(bitmap_size(mem_size))); + + if (!to_send) + { + ERROR("Couldn''t allocate to_send array!\n"); + return -1; + } + max_iters = max_iters ? : DEF_MAX_ITERS; + max_factor = max_factor ? : DEF_MAX_FACTOR; + + memset(to_send, 0xff, bitmap_size(mem_size)); + } + else + { + /* This is a non-live suspend. Suspend the domain .*/ + if (suspend_and_state(callbacks->suspend, callbacks->data, xch, dom)) + { + ERROR("Domain appears not to have suspended"); + return -1; + } + } + + snprintf(reportbuf, sizeof(reportbuf), + "Saving memory: iter %d ", iter); + xc_report_progress_start(xch, reportbuf, mem_size); + + for (i = start; i < end; ++i) + { + const char one = 1; + char *page = xc_map_foreign_range(xch, dom, PAGE_SIZE, PROT_READ | PROT_WRITE, i); + if ( !page ) + { + PERROR("xc_map_foreign_range failed, pfn=%llx", i); + return -1; + } + if (write_exact_handled(xch, io_fd, &one, 1) || write_exact_handled(xch, io_fd, &i, sizeof(i)) + || write_exact_handled(xch, io_fd, page, PAGE_SIZE)) + { + munmap(page, PAGE_SIZE); + return -1; + } + munmap(page, PAGE_SIZE); + if (i % DEF_PROGRESS_RATE == 0) xc_report_progress_step(xch, i - start, mem_size); + } + xc_report_progress_step(xch, mem_size, mem_size); + + if (live) + { + int total_dirty_pages_num = 0; + int dirty_pages_on_prev_iter_num = mem_size; + for ( ; ; ) + { + int dirty_pages_on_current_iter_num = 0; + int frc; + iter++; + + snprintf(reportbuf, sizeof(reportbuf), + "Saving memory: iter %d (last sent %u)", + iter, dirty_pages_on_prev_iter_num); + + xc_report_progress_start(xch, reportbuf, mem_size); + + if ( (iter > 1 && dirty_pages_on_prev_iter_num < DEF_MIN_DIRTY_PER_ITER) || + (iter == max_iters) || + (total_dirty_pages_num >= mem_size*max_factor) ) + { + if (debug) IPRINTF("Last iteration"); + last_iter = 1; + } + + if (last_iter) + { + if (suspend_and_state(callbacks->suspend, callbacks->data, xch, dom)) + { + ERROR("Domain appears not to have suspended"); + return -1; + } + } + + frc = xc_shadow_control(xch, dom, XEN_DOMCTL_SHADOW_OP_CLEAN, + HYPERCALL_BUFFER(to_send), mem_size, NULL, 0, NULL); + if ( frc != mem_size ) + { + ERROR("Error peeking shadow bitmap"); + xc_hypercall_buffer_free_pages(xch, to_send, NRPAGES(bitmap_size(mem_size))); + return -1; + } + + for (i = start; i < end; ++i) + { + if (test_bit(i - start, to_send)) + { + const char one = 1; + char *page = xc_map_foreign_range(xch, dom, PAGE_SIZE, PROT_READ | PROT_WRITE, i); + if ( !page ) + { + PERROR("xc_map_foreign_range failed, pfn=%llx", i); + return -1; + } + if (write_exact_handled(xch, io_fd, &one, 1) || write_exact_handled(xch, io_fd, &i, sizeof(i)) + || write_exact_handled(xch, io_fd, page, PAGE_SIZE)) + { + munmap(page, PAGE_SIZE); + return -1; + } + munmap(page, PAGE_SIZE); + if (i % DEF_PROGRESS_RATE == 0) xc_report_progress_step(xch, i - start, mem_size); + dirty_pages_on_current_iter_num++; + } + } + xc_report_progress_step(xch, mem_size, mem_size); + + dirty_pages_on_prev_iter_num = dirty_pages_on_current_iter_num; + total_dirty_pages_num += dirty_pages_on_current_iter_num; + + if (last_iter) + { + xc_hypercall_buffer_free_pages(xch, to_send, NRPAGES(bitmap_size(mem_size))); + if (xc_shadow_control(xch, dom, XEN_DOMCTL_SHADOW_OP_OFF, NULL, 0, NULL, 0, NULL) < 0) + ERROR("Couldn''t disable log-dirty mode"); + break; + } + } + } + return write_exact_handled(xch, io_fd, &zero, 1); +} + +static int restore_memory(xc_interface *xch, int io_fd, uint32_t dom) +{ + const xen_pfn_t start = 0x80000; /* Arm-domU RAM starts from 2 gigabytes */ + xen_pfn_t end; + xen_pfn_t gpfn; + if (read_exact(io_fd, &end, sizeof(xen_pfn_t))) + { + PERROR("First read of incoming memory failed"); + return -1; + } + /* TODO allocate several pages per call */ + for (gpfn = start; gpfn < end; ++gpfn) + { + if (xc_domain_populate_physmap_exact(xch, dom, 1, 0, 0, &gpfn)) + { + PERROR("Memory allocation for a new domain failed"); + return -1; + } + } + while (1) + { + char new_page; + xen_pfn_t gpfn; + char *page; + if (read_exact(io_fd, &new_page, 1)) + { + PERROR("End-checking flag read failed during memory transfer"); + return -1; + } + if (!new_page) + { + break; + } + if (read_exact(io_fd, &gpfn, sizeof(gpfn))) + { + PERROR("GPFN read failed during memory transfer"); + return -1; + } + if (gpfn < start || gpfn >= end) { + ERROR("GPFN %llx doesn''t belong to RAM address space", gpfn); + return -1; + } + page = xc_map_foreign_range(xch, dom, PAGE_SIZE, PROT_READ | PROT_WRITE, gpfn); + if ( !page ) + { + PERROR("xc_map_foreign_range failed, pfn=%llx", gpfn); + return -1; + } + if (read_exact(io_fd, page, PAGE_SIZE)) + { + PERROR("Page data read failed during memory transfer"); + return -1; + } + munmap(page, PAGE_SIZE); + } + + return 0; +} + + +/* ============ HVM context =========== */ + +static int save_armhvm(xc_interface *xch, int io_fd, uint32_t dom, int debug) +{ + /* HVM: a buffer for holding HVM context */ + uint32_t hvm_buf_size = 0; + uint8_t *hvm_buf = NULL; + uint32_t rec_size; + int retval = -1; + /* Need another buffer for HVM context */ + hvm_buf_size = xc_domain_hvm_getcontext(xch, dom, 0, 0); + if ( hvm_buf_size == -1 ) + { + ERROR("Couldn''t get HVM context size from Xen"); + goto out; + } + hvm_buf = malloc(hvm_buf_size); + if ( !hvm_buf ) + { + ERROR("Couldn''t allocate memory for hvm buffer"); + goto out; + } + /* Get HVM context from Xen and save it too */ + if ( (rec_size = xc_domain_hvm_getcontext(xch, dom, hvm_buf, + hvm_buf_size)) == -1 ) + { + ERROR("HVM:Could not get hvm buffer"); + goto out; + } + if (debug) IPRINTF("HVM save size %d %d", hvm_buf_size, rec_size); + + if ( write_exact_handled(xch, io_fd, &rec_size, sizeof(uint32_t)) ) + { + goto out; + } + + if ( write_exact_handled(xch, io_fd, hvm_buf, rec_size) ) + { + goto out; + } + retval = 0; +out: + if (hvm_buf) free (hvm_buf); + return retval; +} + +static int restore_armhvm(xc_interface *xch, int io_fd, uint32_t dom) +{ + uint32_t rec_size; + uint32_t hvm_buf_size = 0; + uint8_t *hvm_buf = NULL; + int frc = 0; + int retval = -1; + if (read_exact(io_fd, &rec_size, sizeof(uint32_t))) + { + PERROR("Could not read HVM size"); + goto out; + } + + if ( !rec_size ) + { + ERROR("Zero HVM size"); + goto out; + } +#ifdef ARM_MIGRATE_VERBOSE + IPRINTF("HVM restore size %d %d", hvm_buf_size, rec_size); +#endif + + hvm_buf_size = xc_domain_hvm_getcontext(xch, dom, 0, 0); + if (hvm_buf_size != rec_size) + { + ERROR("HVM size for this domain is not the same as stored"); + } + hvm_buf = malloc(hvm_buf_size); + if ( !hvm_buf ) + { + ERROR("Couldn''t allocate memory"); + goto out; + } + if (read_exact(io_fd, hvm_buf, hvm_buf_size)) + { + PERROR("Could not read HVM context"); + goto out; + } + frc = xc_domain_hvm_setcontext(xch, dom, hvm_buf, hvm_buf_size); + if ( frc ) + { + ERROR("error setting the HVM context"); + goto out; + } + retval = 0; +out: + if (hvm_buf) free (hvm_buf); + return retval; +} + + +/* ================= Console and Xenstore =========== */ + +static int save_console_and_xenstore(xc_interface *xch, int io_fd, uint32_t dom) +{ + unsigned long int console_pfn, store_pfn; + if (xc_get_hvm_param(xch, dom, HVM_PARAM_CONSOLE_PFN, &console_pfn)) + { + ERROR("Can''t get console gpfn"); + return -1; + } + if (xc_get_hvm_param(xch, dom, HVM_PARAM_STORE_PFN, &store_pfn)) + { + ERROR("Can''t get store gpfn"); + return -1; + } + if ( write_exact_handled(xch, io_fd, &console_pfn, sizeof(console_pfn)) || + write_exact_handled(xch, io_fd, &store_pfn, sizeof(store_pfn)) ) + { + return -1; + } + return 0; +} + +static int restore_console_and_xenstore(xc_interface *xch, int io_fd, uint32_t dom, + unsigned int console_evtchn, unsigned long int *console_gpfn, domid_t console_domid, + unsigned int store_evtchn, unsigned long int *store_gpfn, domid_t store_domid) +{ + int rc = 0; + if (read_exact(io_fd, console_gpfn, sizeof(*console_gpfn))) + { + PERROR("Can''t read console gpfn"); + return -1; + } + if (read_exact(io_fd, store_gpfn, sizeof(*store_gpfn))) + { + PERROR("Can''t read xenstore gpfn"); + return -1; + } + + if ((rc = xc_clear_domain_page(xch, dom, *console_gpfn))) + { + ERROR("Can''t clear console page"); + return rc; + } + if ((rc = xc_clear_domain_page(xch, dom, *store_gpfn))) + { + ERROR("Can''t clear xenstore page"); + return rc; + } + if ((rc = xc_dom_gnttab_hvm_seed(xch, dom, *console_gpfn, *store_gpfn, console_domid, store_domid))) + { + ERROR("Can''t grant console and xenstore pages"); + return rc; + } + if ((rc = xc_set_hvm_param(xch, dom, HVM_PARAM_CONSOLE_PFN, *console_gpfn))) + { + ERROR("Can''t set console gpfn"); + return rc; + } + if ((rc = xc_set_hvm_param(xch, dom, HVM_PARAM_STORE_PFN, *store_gpfn))) + { + ERROR("Can''t set xenstore gpfn"); + return rc; + } + if ((rc = xc_set_hvm_param(xch, dom, HVM_PARAM_CONSOLE_EVTCHN, console_evtchn))) + { + ERROR("Can''t set console event channel"); + return rc; + } + if ((rc = xc_set_hvm_param(xch, dom, HVM_PARAM_STORE_EVTCHN, store_evtchn))) + { + ERROR("Can''t set xenstore event channel"); + return rc; + } + + return 0; +} + + +/* ====================== VCPU ============== */ + +static int save_vcpu(xc_interface *xch, int io_fd, uint32_t dom) +{ + vcpu_guest_context_any_t ctxt; + xc_vcpu_getcontext(xch, dom, 0, &ctxt); + return write_exact_handled(xch, io_fd, &ctxt, sizeof(ctxt)); +} + +static int restore_vcpu(xc_interface *xch, int io_fd, uint32_t dom) +{ + int rc = -1; + DECLARE_DOMCTL; + DECLARE_HYPERCALL_BUFFER(vcpu_guest_context_any_t, ctxt); + ctxt = xc_hypercall_buffer_alloc(xch, ctxt, sizeof(*ctxt)); + memset(ctxt, 0, sizeof(*ctxt)); + if (read_exact(io_fd, ctxt, sizeof(*ctxt))) + { + PERROR("VCPU context read failed"); + goto out; + } + memset(&domctl, 0, sizeof(domctl)); + domctl.cmd = XEN_DOMCTL_setvcpucontext; + domctl.domain = dom; + domctl.u.vcpucontext.vcpu = 0; + set_xen_guest_handle(domctl.u.vcpucontext.ctxt, ctxt); + rc = do_domctl(xch, &domctl); + if (rc) + { + ERROR("VCPU context set failed (error %d)", rc); + } +out: + xc_hypercall_buffer_free(xch, ctxt); + return rc; +} + + +/* ================== Main ============== */ + +int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters, + uint32_t max_factor, uint32_t flags, + struct save_callbacks *callbacks, int hvm, + unsigned long vm_generationid_addr) +{ + int live = !!(flags & XCFLAGS_LIVE); + int debug = !! (flags & XCFLAGS_DEBUG); + + if (save_memory(xch, io_fd, dom, live, callbacks, max_iters, max_factor, debug)) + { + ERROR("Memory not saved"); + return -1; + } + + if (save_console_and_xenstore(xch, io_fd, dom)) + { + ERROR("Can''t save console and xenstore"); + return -1; + } + + if (save_armhvm(xch, io_fd, dom, debug)) + { + ERROR("HVM not saved"); + return -1; + } + + if (save_vcpu(xch, io_fd, dom)) + { + ERROR("VCPU not saved"); + return -1; + } + return 0; +} + +int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom, + unsigned int store_evtchn, unsigned long *store_gpfn, + domid_t store_domid, unsigned int console_evtchn, + unsigned long *console_gpfn, domid_t console_domid, + unsigned int hvm, unsigned int pae, int superpages, + int no_incr_generationid, + unsigned long *vm_generationid_addr, + struct restore_callbacks *callbacks) +{ +#ifdef ARM_MIGRATE_VERBOSE + IPRINTF("Hello restoration"); +#endif + if (restore_memory(xch, io_fd, dom)) + { + ERROR("Can''t restore memory"); + return -1; + } + + if (restore_console_and_xenstore(xch, io_fd, dom, + console_evtchn, console_gpfn, console_domid, + store_evtchn, store_gpfn, store_domid)) + { + ERROR("Can''t setup console and xenstore"); + return -1; + } + + if (restore_armhvm(xch, io_fd, dom)) + { + ERROR("HVM not restored"); + return -1; + } + + if (restore_vcpu(xch, io_fd, dom)) + { + ERROR("Can''t restore VCPU"); + return -1; + } + + + return 0; +} + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/tools/misc/Makefile b/tools/misc/Makefile index 520ef80..5338f87 100644 --- a/tools/misc/Makefile +++ b/tools/misc/Makefile @@ -11,7 +11,9 @@ HDRS = $(wildcard *.h) TARGETS-y := xenperf xenpm xen-tmem-list-parse gtraceview gtracestat xenlockprof xenwatchdogd xencov TARGETS-$(CONFIG_X86) += xen-detect xen-hvmctx xen-hvmcrash xen-lowmemd +ifeq ($(CONFIG_X86),y) TARGETS-$(CONFIG_MIGRATE) += xen-hptool +endif TARGETS := $(TARGETS-y) SUBDIRS-$(CONFIG_LOMOUNT) += lomount @@ -25,7 +27,9 @@ INSTALL_BIN := $(INSTALL_BIN-y) INSTALL_SBIN-y := xm xen-bugtool xen-python-path xend xenperf xsview xenpm xen-tmem-list-parse gtraceview \ gtracestat xenlockprof xenwatchdogd xen-ringwatch xencov INSTALL_SBIN-$(CONFIG_X86) += xen-hvmctx xen-hvmcrash xen-lowmemd +ifeq ($(CONFIG_X86),y) INSTALL_SBIN-$(CONFIG_MIGRATE) += xen-hptool +endif INSTALL_SBIN := $(INSTALL_SBIN-y) INSTALL_PRIVBIN-y := xenpvnetboot -- 1.8.1.2
Ian Campbell
2013-Jul-03 09:21 UTC
Re: [PATCH RFC v2 0/9] Preliminary working version of live migration
On Wed, 2013-07-03 at 18:15 +0900, Jaeyong Yoo wrote:> Hello all, > > Me and my colleagues have finally made a working version of live migration > in xen-on-arndale.Wow, nice one! Congratulations and thanks!> Using this patch series, we can xl save/restore and > xl migrate <domid> <ip> successfully. > > I have to further test the stability and the performance, I would like to > get some comments and reviews if anything may not nicely fit to the xen > framework. > > It is quite preliminary version, but I hope it could be a start for > supporting live migration in xen on arm.Absolutely, I''ll put this on my queue of things to look at.> > > Best, > Jaeyong > > Alexey Sokolov (2): > xen/arm: Implement get_maximum_gpfn hypercall for arm > xen/arm: Implement modify_returncode > > Alexey Sokolov, Elena Pyatunina, Evgeny Fedotov, and Nikolay Martyanov (1): > xen/arm: Implement toolstack for xl restore/save and migrate > > Elena Pyatunina (2): > xen/arm: Add handling write fault for dirty-page tracing > xen/arm: Implement hypercall for dirty page tracing (shadow op) > > Jaeyong Yoo (3): > xen-arm: Implement basic save/load for hvm context > xen/arm: Add more registers for saving and restoring vcpu > registers > xen/arm: Missing impl of clear_guest_offset macro > > Jaeyong Yoo and Evgeny Fedotov (1): > xen/arm: Implement save and restore for gic, vtimer, and ptimer > > config/arm32.mk | 1 + > tools/include/xen-foreign/reference.size | 2 +- > tools/libxc/Makefile | 5 + > tools/libxc/xc_arm_migrate.c | 586 +++++++++++++++++++++++++++++++ > tools/libxc/xc_resume.c | 25 ++ > tools/misc/Makefile | 4 + > xen/arch/arm/Makefile | 2 +- > xen/arch/arm/domain.c | 38 ++ > xen/arch/arm/domctl.c | 134 ++++++- > xen/arch/arm/hvm.c | 67 ---- > xen/arch/arm/hvm/Makefile | 2 + > xen/arch/arm/hvm/hvm.c | 209 +++++++++++ > xen/arch/arm/hvm/save.c | 66 ++++ > xen/arch/arm/mm.c | 62 +++- > xen/arch/arm/p2m.c | 368 +++++++++++++++++++ > xen/arch/arm/traps.c | 7 + > xen/common/Makefile | 2 + > xen/include/asm-arm/domain.h | 8 + > xen/include/asm-arm/guest_access.h | 6 +- > xen/include/asm-arm/hvm/support.h | 29 ++ > xen/include/asm-arm/mm.h | 2 + > xen/include/asm-arm/p2m.h | 8 + > xen/include/public/arch-arm.h | 41 +++ > xen/include/public/arch-arm/hvm/save.h | 55 +++ > 24 files changed, 1655 insertions(+), 74 deletions(-) > create mode 100644 tools/libxc/xc_arm_migrate.c > delete mode 100644 xen/arch/arm/hvm.c > create mode 100644 xen/arch/arm/hvm/Makefile > create mode 100644 xen/arch/arm/hvm/hvm.c > create mode 100644 xen/arch/arm/hvm/save.c > create mode 100644 xen/include/asm-arm/hvm/support.h >
Stefano Stabellini
2013-Jul-03 11:29 UTC
Re: [PATCH RFC v2 1/9] xen-arm: Implement basic save/load for hvm context
On Wed, 3 Jul 2013, Jaeyong Yoo wrote:> Create a hvm directory for arm and move all the hvm related files > to that directory. > > Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com> > > xen/arch/arm/Makefile | 2 +- > xen/arch/arm/hvm.c | 67 ---------------------------------- > xen/arch/arm/hvm/Makefile | 2 + > xen/arch/arm/hvm/hvm.c | 67 ++++++++++++++++++++++++++++++++++ > xen/arch/arm/hvm/save.c | 66 +++++++++++++++++++++++++++++++++ > xen/include/asm-arm/hvm/support.h | 29 +++++++++++++++ > xen/include/public/arch-arm/hvm/save.h | 17 +++++++++Does it have to be a subdirectory? I don''t think it makes much sense to separate out the hvm files on ARM.
Stefano Stabellini
2013-Jul-03 11:29 UTC
Re: [PATCH RFC v2 2/9] xen/arm: Implement save and restore for gic, vtimer, and ptimer
On Wed, 3 Jul 2013, Jaeyong Yoo wrote:> +static int vtimer_save(struct domain *d, hvm_domain_context_t *h) > +{ > + struct hvm_hw_vtimer ctxt; > + struct vcpu *v; > + > + ctxt.vtb_offset = d->arch.virt_timer_base.offset; > + > + /* Save the state of vtimer */ > + for_each_vcpu( d, v ) > + { > + ctxt.cval = v->arch.virt_timer.cval; > + ctxt.ctl = v->arch.virt_timer.ctl; > + if ( hvm_save_entry(VTIMER, v->vcpu_id, h, &ctxt) != 0 ) > + return 1; > + } > + > + return 0; > +} > + > +static int vtimer_load(struct domain *d, hvm_domain_context_t *h) > +{ > + int vcpuid; > + struct hvm_hw_vtimer ctxt; > + struct vcpu *v; > + /* Which vcpu is this? */ > + vcpuid = hvm_load_instance(h); > + if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL ) > + { > + dprintk(XENLOG_G_ERR, "HVM restore: dom%u has no vcpu%u\n", > + d->domain_id, vcpuid); > + return -EINVAL; > + } > + > + if ( hvm_load_entry(VTIMER, h, &ctxt) != 0 ) > + return -EINVAL; > + > + v->arch.virt_timer.cval = ctxt.cval; > + v->arch.virt_timer.ctl = ctxt.ctl; > + v->arch.virt_timer.v = v; > + > + d->arch.virt_timer_base.offset = ctxt.vtb_offset; > + > + return 0; > +} > + > +HVM_REGISTER_SAVE_RESTORE(VTIMER, vtimer_save, vtimer_load, 1, HVMSR_PER_VCPU); > + > +static int ptimer_save(struct domain *d, hvm_domain_context_t *h) > +{ > + struct hvm_hw_ptimer ctxt; > + struct vcpu *v; > + > + ctxt.ptb_offset = d->arch.phys_timer_base.offset; > + > + /* Save the state of ptimer */ > + for_each_vcpu( d, v ) > + { > + ctxt.p_cval = v->arch.phys_timer.cval; > + ctxt.p_ctl = v->arch.phys_timer.ctl; > + > + if ( hvm_save_entry(PTIMER, v->vcpu_id, h, &ctxt) != 0 ) > + return 1; > + } > + > + return 0; > +} > + > +static int ptimer_load(struct domain *d, hvm_domain_context_t *h) > +{ > + int vcpuid; > + struct hvm_hw_ptimer ctxt; > + struct vcpu *v; > + /* Which vcpu is this? */ > + vcpuid = hvm_load_instance(h); > + if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL ) > + { > + dprintk(XENLOG_G_ERR, "HVM restore: dom%u has no vcpu%u\n", > + d->domain_id, vcpuid); > + return -EINVAL; > + } > + > + if ( hvm_load_entry(PTIMER, h, &ctxt) != 0 ) > + return -EINVAL; > + > + v->arch.phys_timer.cval = ctxt.p_cval; > + v->arch.phys_timer.ctl = ctxt.p_ctl; > + v->arch.phys_timer.v = v; > + > + d->arch.phys_timer_base.offset = ctxt.ptb_offset; > + > + return 0; > +}Given that the ptimer and vtimer save and load functions as well as the save records are identical, I think it might be better to declare a single timer record with a timer type and reuse the same load and save functions for both ptimer and vtimer.
Stefano Stabellini
2013-Jul-03 11:29 UTC
Re: [PATCH RFC v2 3/9] xen/arm: Add more registers for saving and restoring vcpu registers
On Wed, 3 Jul 2013, Jaeyong Yoo wrote:> Add more registers for saving and restoring vcpu registers for > live migration. Those registers are selected from the registers > stored when vcpu context switching. > > Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com> > Singed-off-by: Alexey Sokolov <sokolov.a@samsung.com>Looks OK
Stefano Stabellini
2013-Jul-03 11:29 UTC
Re: [PATCH RFC v2 4/9] xen/arm: Implement get_maximum_gpfn hypercall for arm
On Wed, 3 Jul 2013, Jaeyong Yoo wrote:> From: Alexey Sokolov <sokolov.a@samsung.com> > > Since we do not know the maximum gpfn size for guest domain, > we walk the page table of guest in order to see the maximum size > of gpfn. > > Singed-off-by: Alexey Sokolov <sokolov.a@samsung.com> > --- > xen/arch/arm/mm.c | 3 +- > xen/arch/arm/p2m.c | 69 +++++++++++++++++++++++++++++++++++++++++++ > xen/include/asm-arm/p2m.h | 3 ++ > xen/include/public/arch-arm.h | 6 ++++ > 4 files changed, 80 insertions(+), 1 deletion(-) > > diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c > index d1290cd..650b1fc 100644 > --- a/xen/arch/arm/mm.c > +++ b/xen/arch/arm/mm.c > @@ -762,7 +762,8 @@ int page_is_ram_type(unsigned long mfn, unsigned long mem_type) > > unsigned long domain_get_maximum_gpfn(struct domain *d) > { > - return -ENOSYS; > + xen_pfn_t result = p2m_get_next_non_used_gpfn(d, GUEST_RAM_BASE >> PAGE_SHIFT); > + return result; > }Rather than implementing p2m_get_next_non_used_gpfn, I think we should keep track of the maximum gpfn, like we do on x86, see: xen/arch/x86/mm.c:domain_get_maximum_gpfn
Ian Campbell
2013-Jul-03 11:35 UTC
Re: [PATCH RFC v2 4/9] xen/arm: Implement get_maximum_gpfn hypercall for arm
On Wed, 2013-07-03 at 12:29 +0100, Stefano Stabellini wrote:> On Wed, 3 Jul 2013, Jaeyong Yoo wrote: > > From: Alexey Sokolov <sokolov.a@samsung.com> > > > > Since we do not know the maximum gpfn size for guest domain, > > we walk the page table of guest in order to see the maximum size > > of gpfn. > > > > Singed-off-by: Alexey Sokolov <sokolov.a@samsung.com> > > --- > > xen/arch/arm/mm.c | 3 +- > > xen/arch/arm/p2m.c | 69 +++++++++++++++++++++++++++++++++++++++++++ > > xen/include/asm-arm/p2m.h | 3 ++ > > xen/include/public/arch-arm.h | 6 ++++ > > 4 files changed, 80 insertions(+), 1 deletion(-) > > > > diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c > > index d1290cd..650b1fc 100644 > > --- a/xen/arch/arm/mm.c > > +++ b/xen/arch/arm/mm.c > > @@ -762,7 +762,8 @@ int page_is_ram_type(unsigned long mfn, unsigned long mem_type) > > > > unsigned long domain_get_maximum_gpfn(struct domain *d) > > { > > - return -ENOSYS; > > + xen_pfn_t result = p2m_get_next_non_used_gpfn(d, GUEST_RAM_BASE >> PAGE_SHIFT); > > + return result; > > } > > Rather than implementing p2m_get_next_non_used_gpfn, I think we should > keep track of the maximum gpfn, like we do on x86, see:ISTR Tim wanting to get rid of this concept on x86. Anything which is walking the P2M should be doing so via iterator based interfaces and not by looping on O...N where N is potentially guest controlled. Not to mention 0...N might be quite sparse. Ian.
Stefano Stabellini
2013-Jul-03 11:37 UTC
Re: [PATCH RFC v2 7/9] xen/arm: Missing impl of clear_guest_offset macro
On Wed, 3 Jul 2013, Jaeyong Yoo wrote:> This macro is somehow accidently broken. Fill up the rest of it. > > Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com> > --- > xen/include/asm-arm/guest_access.h | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/xen/include/asm-arm/guest_access.h b/xen/include/asm-arm/guest_access.h > index 34aae14..1104090 100644 > --- a/xen/include/asm-arm/guest_access.h > +++ b/xen/include/asm-arm/guest_access.h > @@ -77,8 +77,10 @@ unsigned long raw_clear_guest(void *to, unsigned len); > * Clear an array of objects in guest context via a guest handle, > * specifying an offset into the guest array. > */ > -#define clear_guest_offset(hnd, off, ptr, nr) ({ \ > - raw_clear_guest(_d+(off), nr); \ > +#define clear_guest_offset(hnd, off, ptr, nr) ({ \ > + const typeof(*(ptr)) *_s = (ptr); \ > + char (*_d)[sizeof(*_s)] = (void *)(hnd).p; \ > + raw_clear_guest(_d+(off), nr); \ > })You are right that is broken, in fact shouldn''t this macro have only 3 arguments?
Ian Campbell
2013-Jul-03 11:57 UTC
Re: [PATCH RFC v2 7/9] xen/arm: Missing impl of clear_guest_offset macro
On Wed, 2013-07-03 at 12:37 +0100, Stefano Stabellini wrote:> On Wed, 3 Jul 2013, Jaeyong Yoo wrote: > > This macro is somehow accidently broken. Fill up the rest of it. > > > > Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com> > > --- > > xen/include/asm-arm/guest_access.h | 6 ++++-- > > 1 file changed, 4 insertions(+), 2 deletions(-) > > > > diff --git a/xen/include/asm-arm/guest_access.h b/xen/include/asm-arm/guest_access.h > > index 34aae14..1104090 100644 > > --- a/xen/include/asm-arm/guest_access.h > > +++ b/xen/include/asm-arm/guest_access.h > > @@ -77,8 +77,10 @@ unsigned long raw_clear_guest(void *to, unsigned len); > > * Clear an array of objects in guest context via a guest handle, > > * specifying an offset into the guest array. > > */ > > -#define clear_guest_offset(hnd, off, ptr, nr) ({ \ > > - raw_clear_guest(_d+(off), nr); \ > > +#define clear_guest_offset(hnd, off, ptr, nr) ({ \ > > + const typeof(*(ptr)) *_s = (ptr); \ > > + char (*_d)[sizeof(*_s)] = (void *)(hnd).p; \ > > + raw_clear_guest(_d+(off), nr); \ > > }) > > You are right that is broken, in fact shouldn''t this macro have only 3 > arguments?It does everywhere else! The implementation should be the same as the x86 one I think. Ian.> > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
Stefano Stabellini
2013-Jul-03 12:10 UTC
Re: [PATCH RFC v2 6/9] xen/arm: Add handling write fault for dirty-page tracing
On Wed, 3 Jul 2013, Jaeyong Yoo wrote:> From: Elena Pyatunina <e.pyatunina@samsung.com> > > Add handling write fault in do_trap_data_abort_guest for dirty-page > tracing. Mark dirty function makes the bitmap of dirty pages. > > Signed-off-by: Elena Pyatunina <e.pyatunina@samsung.com> > Singed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com> > --- > xen/arch/arm/mm.c | 59 ++++++++++++++++++++++++++++++- > xen/arch/arm/p2m.c | 84 ++++++++++++++++++++++++++++++++++++++++++++ > xen/arch/arm/traps.c | 7 ++++ > xen/include/asm-arm/domain.h | 8 +++++ > xen/include/asm-arm/mm.h | 2 ++ > xen/include/asm-arm/p2m.h | 3 ++ > 6 files changed, 162 insertions(+), 1 deletion(-) > > diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c > index 650b1fc..f509008 100644 > --- a/xen/arch/arm/mm.c > +++ b/xen/arch/arm/mm.c > @@ -680,7 +680,6 @@ void destroy_xen_mappings(unsigned long v, unsigned long e) > create_xen_entries(REMOVE, v, 0, (e - v) >> PAGE_SHIFT, 0); > } > > -enum mg { mg_clear, mg_ro, mg_rw, mg_rx }; > static void set_pte_flags_on_range(const char *p, unsigned long l, enum mg mg) > { > lpae_t pte; > @@ -1142,6 +1141,64 @@ int is_iomem_page(unsigned long mfn) > return 1; > return 0; > } > + > +/* > + * Set l3e entries of P2M table to be read-only. > + * > + * On first write, it page faults, its entry is changed to read-write, > + * and on retry the write succeeds. > + * > + * Populate dirty_bitmap by looking for entries that have been > + * switched to read-write. > + */ > +int handle_page_fault(struct domain *d, paddr_t addr) > +{ > + lpae_t pte, *first = NULL, *second = NULL, *third = NULL; > + struct p2m_domain *p2m = &d->arch.p2m; > + > + if ( d->arch.dirty.mode == 0 || first_table_offset(addr) >= LPAE_ENTRIES ) > + return 1;if this is an error condition, it would be better to return an error> + spin_lock(&p2m->lock); > + > + first = __map_domain_page(p2m->first_level); > + > + if ( !first || > + !first[first_table_offset(addr)].walk.valid || > + !first[first_table_offset(addr)].walk.table ) > + goto done; > + > + second = map_domain_page(first[first_table_offset(addr)].walk.base); > + > + if ( !second || > + !second[second_table_offset(addr)].walk.valid || > + !second[second_table_offset(addr)].walk.table ) > + goto done; > + > + third = map_domain_page(second[second_table_offset(addr)].walk.base); > + > + if ( !third ) > + goto done; > + > + pte = third[third_table_offset(addr)]; > + > + if (pte.p2m.valid && pte.p2m.write == 0) > + { > + mark_dirty(d, addr); > + pte.p2m.write = 1;Would this bit be sufficient? Otherwise could we take one of the other available bits in the p2m pte? That would free us from having to handle a separate data structure to handle dirty bits.> + write_pte(&third[third_table_offset(addr)], pte); > + flush_tlb_local();flush_tlb_local should be OK because I think that we can always guarantee that the current VMID is the VMID of the guest we are handling the fault for.> + } > + > +done: > + if (third) unmap_domain_page(third); > + if (second) unmap_domain_page(second); > + if (first) unmap_domain_page(first); > + > + spin_unlock(&p2m->lock); > + return 0; > +} > + > /* > * Local variables: > * mode: C > diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c > index 8bf7eb7..bae7af7 100644 > --- a/xen/arch/arm/p2m.c > +++ b/xen/arch/arm/p2m.c > @@ -412,6 +412,90 @@ unsigned long gmfn_to_mfn(struct domain *d, unsigned long gpfn) > return p >> PAGE_SHIFT; > } > > +static struct page_info * new_dirty_page(void) > +{ > + struct page_info *page = NULL; > + void *p; > + > + page = alloc_domheap_page(NULL, 0); > + if ( page == NULL ) > + return NULL; > + > + p = __map_domain_page(page); > + > + clear_page(p); > + unmap_domain_page(p); > + > + return page; > +} > + > +void mark_dirty(struct domain *d, paddr_t addr) > +{ > + struct page_info *page; > + xen_pfn_t *first = NULL, *second = NULL, *third = NULL; > + void *p; > + int changed; > + > + spin_lock(&d->arch.dirty.lock); > + > + if (d->arch.dirty.top == NULL) > + { > + page = alloc_domheap_pages(NULL, 1, 0); > + if (page == NULL) > + { > + printk("Error: couldn''t allocate page for dirty bitmap!\n"); > + spin_unlock(&d->arch.dirty.lock); > + return;no error return codes?> + } > + > + INIT_PAGE_LIST_HEAD(&d->arch.dirty.pages); > + page_list_add(page, &d->arch.dirty.pages); > + > + /* Clear both first level pages */ > + p = __map_domain_page(page); > + clear_page(p); > + unmap_domain_page(p); > + > + p = __map_domain_page(page + 1); > + clear_page(p); > + unmap_domain_page(p); > + > + d->arch.dirty.top = page; > + } > + > + first = __map_domain_page(d->arch.dirty.top); > + BUG_ON(!first && "Can''t map first level p2m."); > > + if ( !first[first_table_offset(addr)]) > + { > + page = new_dirty_page(); > + page_list_add(page, &d->arch.dirty.pages); > + first[first_table_offset(addr)] = page_to_mfn(page); > + } > + > + second = map_domain_page(first[first_table_offset(addr)]); > + BUG_ON(!second && "Can''t map second level p2m."); > > + if (!second[second_table_offset(addr)]) > + { > + page = new_dirty_page(); > + page_list_add(page, &d->arch.dirty.pages); > + second[second_table_offset(addr)] = page_to_mfn(page); > + } > + > + third = map_domain_page(second[second_table_offset(addr)]); > + BUG_ON(!third && "Can''t map third level p2m."); > > + changed = !__test_and_set_bit(third_table_offset(addr), third);If I am not missing something, the third_table_offset was written to give you an index into a page to retrieve a 64 bit pte. If you are only using 1 bit, shouldn''t you be using a different indexing system? Otherwise you are wasting 63 bits for each entry.> + if (changed)^ code style> + { > + d->arch.dirty.count++; > + } > + > + if (third) unmap_domain_page(third); > + if (second) unmap_domain_page(second); > + if (first) unmap_domain_page(first); > + > + spin_unlock(&d->arch.dirty.lock); > +} > + > /* > * Local variables: > * mode: C
Ian Campbell
2013-Jul-03 12:26 UTC
Re: [PATCH RFC v2 6/9] xen/arm: Add handling write fault for dirty-page tracing
On Wed, 2013-07-03 at 18:15 +0900, Jaeyong Yoo wrote:> From: Elena Pyatunina <e.pyatunina@samsung.com> > > Add handling write fault in do_trap_data_abort_guest for dirty-page > tracing. Mark dirty function makes the bitmap of dirty pages.You''ve gone with the model of allocating a bitmap when logdirty is enabled and updating it directly from the fault handler? The bit map is based around a three level structure mirroring the p2m tables? I had been considering the option of reusing one of the bits available for software use in the p2m entries to track the dirty status and walking the p2m when the toolstack asked for the current bitmap instead. I think this would have less memory overhead and also would allow us to be lock free on the fault handling path (by using suitable atomic exchanges on the p2m entries), as well as avoiding memory allocations on that path. I think we''d want to establish a linear map of the current guest p2m for these purposes so we could also avoid all the map_domain_page stuff in the fault path. I think there is enough virtual address space left for that even on 32-bit, it''s less critical on 64-bit anyway since we are about to have a direct map of RAM available to us.> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c > index 398d209..9927712 100644 > --- a/xen/arch/arm/traps.c > +++ b/xen/arch/arm/traps.c > @@ -1011,6 +1011,13 @@ static void do_trap_data_abort_guest(struct cpu_user_regs *regs, > if ( rc == -EFAULT ) > goto bad_data_abort; > > + /* handle permission fault on write */ > + if ((dabt.dfsc & 0x3f) == (FSC_FLT_PERM + 3) && dabt.write)Can we get some #defines for these magic constants please.> + { > + if (handle_page_fault(current->domain, info.gpa) == 0) > + return; > + } > + > if (handle_mmio(&info))Since you handle page fault first, won''t this make MMIO trapping regions writable? Ian.
Stefano Stabellini
2013-Jul-03 12:38 UTC
Re: [PATCH RFC v2 8/9] xen/arm: Implement hypercall for dirty page tracing (shadow op)
On Wed, 3 Jul 2013, Jaeyong Yoo wrote:> From: Elena Pyatunina <e.pyatunina@samsung.com> > > Add hypercall (shadow op: enable/disable and clean/peek dirted page bitmap) > > Signed-off-by: Elena Pyatunina <e.pyatunina@samsung.com> > Singed-off-by: Alexey Sokolov <sokolov.a@samsung.com> > --- > xen/arch/arm/domain.c | 4 + > xen/arch/arm/domctl.c | 19 ++++ > xen/arch/arm/p2m.c | 219 +++++++++++++++++++++++++++++++++++++++++++++- > xen/include/asm-arm/p2m.h | 2 + > 4 files changed, 242 insertions(+), 2 deletions(-) > > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c > index 73dd0de..e3fc533 100644 > --- a/xen/arch/arm/domain.c > +++ b/xen/arch/arm/domain.c > @@ -484,6 +484,10 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags) > d->arch.vpidr = boot_cpu_data.midr.bits; > d->arch.vmpidr = boot_cpu_data.mpidr.bits; > > + /* init for dirty-page tracing */ > + d->arch.dirty.count = 0; > + d->arch.dirty.top = NULL; > + > clear_page(d->shared_info); > share_xen_page_with_guest( > virt_to_page(d->shared_info), d, XENSHARE_writable); > diff --git a/xen/arch/arm/domctl.c b/xen/arch/arm/domctl.c > index a0719b0..bc06534 100644 > --- a/xen/arch/arm/domctl.c > +++ b/xen/arch/arm/domctl.c > @@ -67,6 +67,14 @@ long arch_do_domctl(struct xen_domctl *domctl, struct domain *d, > goto gethvmcontext_out; > } > > + /* Allocate our own marshalling buffer */ > + ret = -ENOMEM; > + if ( (c.data = xmalloc_bytes(c.size)) == NULL ) > + { > + printk("(gethvmcontext) xmalloc_bytes failed: %d\n", c.size ); > + goto gethvmcontext_out; > + } > + > domain_pause(d); > ret = hvm_save(d, &c); > domain_unpause(d);Shouldn''t this hunk belong to patch #2?> @@ -85,6 +93,17 @@ long arch_do_domctl(struct xen_domctl *domctl, struct domain *d, > xfree(c.data); > } > break; > + case XEN_DOMCTL_shadow_op: > + { > + ret = dirty_mode_op(d, &domctl->u.shadow_op); > + if ( (&domctl->u.shadow_op)->op == XEN_DOMCTL_SHADOW_OP_CLEAN > + || (&domctl->u.shadow_op)->op == XEN_DOMCTL_SHADOW_OP_PEEK) > + { > + copyback = 1; > + } > + } > + break; > + > default: > return -EINVAL; > } > diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c > index bae7af7..fb8ce10 100644 > --- a/xen/arch/arm/p2m.c > +++ b/xen/arch/arm/p2m.c > @@ -69,8 +69,8 @@ out: > > spin_unlock(&p2m->lock); > > - return ( ((xen_pfn_t)first_index << FIRST_SHIFT) | > - (second_index << SECOND_SHIFT) | > + return ( ((xen_pfn_t)first_index << FIRST_SHIFT) | > + (second_index << SECOND_SHIFT) | > (third_index << THIRD_SHIFT) > ) >> PAGE_SHIFT;Does this belong here? If so, why?> @@ -496,6 +496,221 @@ void mark_dirty(struct domain *d, paddr_t addr) > spin_unlock(&d->arch.dirty.lock); > } > > +static void free_dirty_bitmap(struct domain *d) > +{ > + struct page_info *pg; > + > + spin_lock(&d->arch.dirty.lock); > + > + if(d->arch.dirty.top) > + { > + while ( (pg = page_list_remove_head(&d->arch.dirty.pages)) ) > + free_domheap_page(pg); > + } > + d->arch.dirty.top = NULL; > + d->arch.dirty.count = 0; > + > + spin_unlock(&d->arch.dirty.lock); > +} > + > +/* Change types across all p2m entries in a domain */ > +static void p2m_change_entry_type_global(struct domain *d, enum mg ot, enum mg nt) > +{ > + struct p2m_domain *p2m = &d->arch.p2m; > + int i1, i2, i3; > + int first_index = first_table_offset(GUEST_RAM_BASE); > + int second_index = second_table_offset(GUEST_RAM_BASE); > + int third_index = third_table_offset(GUEST_RAM_BASE);you shouldn''t assume GUEST_RAM_BASE is static anywhere> + lpae_t *first = __map_domain_page(p2m->first_level); > + lpae_t pte, *second = NULL, *third = NULL; > + > + BUG_ON(!first && "Can''t map first level p2m."); > + > + spin_lock(&p2m->lock); > + > + for (i1 = first_index; i1 < LPAE_ENTRIES*2; ++i1) > + { > + lpae_walk_t first_pte = first[i1].walk; > + if (!first_pte.valid || !first_pte.table) > + goto out; > + > + second = map_domain_page(first_pte.base); > + BUG_ON(!second && "Can''t map second level p2m."); > + for(i2 = second_index; i2 < LPAE_ENTRIES; ++i2) > + { > + lpae_walk_t second_pte = second[i2].walk; > + if (!second_pte.valid || !second_pte.table) > + goto out; > + > + third = map_domain_page(second_pte.base); > + BUG_ON(!third && "Can''t map third level p2m."); > + > + for (i3 = third_index; i3 < LPAE_ENTRIES; ++i3) > + { > + lpae_walk_t third_pte = third[i3].walk; > + int write = 0; > + if (!third_pte.valid) > + goto out; > + > + pte = third[i3]; > + if (pte.p2m.write == 1 && nt == mg_ro) > + { > + pte.p2m.write = 0; > + write = 1; > + } > + else if (pte.p2m.write == 0 && nt == mg_rw) > + { > + pte.p2m.write = 1; > + write = 1; > + } > + if (write) > + write_pte(&third[i3], pte); > + } > + unmap_domain_page(third); > + > + third = NULL; > + third_index = 0; > + } > + unmap_domain_page(second); > + > + second = NULL; > + second_index = 0; > + third_index = 0; > + } > + > +out: > + flush_tlb_all_local(); > + if (third) unmap_domain_page(third); > + if (second) unmap_domain_page(second); > + if (first) unmap_domain_page(first); > + > + spin_unlock(&p2m->lock); > +} > + > +/* Read a domain''s log-dirty bitmap and stats. > + * If the operation is a CLEAN, clear the bitmap and stats. */ > +static int log_dirty_op(struct domain *d, xen_domctl_shadow_op_t *sc) > +{ > + int i1, i2, rv = 0, clean = 0, peek = 1; > + xen_pfn_t *first = NULL, *second = NULL; > + unsigned long pages = 0, *third = NULL; > + > + clean = (sc->op == XEN_DOMCTL_SHADOW_OP_CLEAN); > + > + if (guest_handle_is_null(sc->dirty_bitmap)) > + peek = 0; > + > + domain_pause(d); > + > + printk("DEBUG: log-dirty %s: dom %u dirty=%u\n", > + (clean) ? "clean" : "peek", > + d->domain_id, d->arch.dirty.count); > + > + sc->stats.dirty_count = d->arch.dirty.count; > + > + spin_lock(&d->arch.dirty.lock); > + first = d->arch.dirty.top ? __map_domain_page(d->arch.dirty.top) : NULL; > + > + /* 32-bit address space starts from 0 and ends at ffffffff.This should be a 40 bit address space due to LPAE support> + * First level of p2m tables is indexed by bits 39-30. > + * So significant bits for 32-bit addresses in first table are 31-30. > + * So we need to iterate from 0 to 4. > + * But RAM starts from 2 gigabytes. So start from 2 instead. > + * TODO remove this hardcodeI wouldn''t assume that the guest ram start at 2GB> + */ > + for (i1 = 2; (pages < sc->pages) && (i1 < 4); ++i1) > + { > + second = (first && first[i1]) ? map_domain_page(first[i1]) : NULL; > + > + for (i2 = 0; (i2 < LPAE_ENTRIES); ++i2) > + { > + unsigned int bytes = LPAE_ENTRIES/8; > + third = (second && second[i2]) ? map_domain_page(second[i2]) : NULL; > + if ( unlikely(((sc->pages - pages + 7) >> 3) < bytes) ) > + bytes = (unsigned int)((sc->pages - pages + 7) >> 3); > + if (peek)^ code style> + { > + if ( (third ? copy_to_guest_offset(sc->dirty_bitmap, pages >> 3, > + (uint8_t *)third, bytes) > + : clear_guest_offset(sc->dirty_bitmap, pages >> 3, > + (uint8_t*)NULL /*only type matters*/, > + bytes)) != 0 ) > + { > + rv = 1; > + goto out; > + } > + } > + pages += bytes << 3; > + if (third)^ code style> + { > + if (clean) clear_page(third); > + unmap_domain_page(third); > + } > + } > + if (second) unmap_domain_page(second); > + } > + if (first) unmap_domain_page(first); > + > + spin_unlock(&d->arch.dirty.lock); > + > + if ( pages < sc->pages ) > + sc->pages = pages; > + > + if (clean)^ code style> + { > + p2m_change_entry_type_global(d, mg_rw, mg_ro); > + free_dirty_bitmap(d); > + } > + > +out: > + if (rv) > + { > + spin_unlock(&d->arch.dirty.lock); > + } > + domain_unpause(d); > + return rv; > +} > + > +long dirty_mode_op(struct domain *d, xen_domctl_shadow_op_t *sc) > +{ > + long ret = 0; > + > + switch (sc->op) > + { > + case XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY: > + case XEN_DOMCTL_SHADOW_OP_OFF: > + { > + enum mg ot = mg_clear, nt = mg_clear; > + > + ot = sc->op == XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY ? mg_rw : mg_ro; > + nt = sc->op == XEN_DOMCTL_SHADOW_OP_OFF ? mg_rw : mg_ro; > + > + domain_pause(d); > + > + d->arch.dirty.mode = sc->op == XEN_DOMCTL_SHADOW_OP_OFF ? 0 : 1; > + p2m_change_entry_type_global(d, ot, nt); > + > + if (sc->op == XEN_DOMCTL_SHADOW_OP_OFF) > + free_dirty_bitmap(d); > + > + domain_unpause(d);Do we need a lock to make sure that all the operations happen atomically?> + } > + break; > + > + case XEN_DOMCTL_SHADOW_OP_CLEAN: > + case XEN_DOMCTL_SHADOW_OP_PEEK: > + { > + ret = log_dirty_op(d, sc); > + } > + break;For symmetry maybe it would be better to pause the domain outside log_dirty_op> + default: > + return -ENOSYS; > + } > + return ret; > +} > + > /* > * Local variables: > * mode: C > diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h > index fd5890f..498cf0b 100644 > --- a/xen/include/asm-arm/p2m.h > +++ b/xen/include/asm-arm/p2m.h > @@ -2,6 +2,7 @@ > #define _XEN_P2M_H > > #include <xen/mm.h> > +#include <public/domctl.h> > > struct domain; > > @@ -112,6 +113,7 @@ static inline int get_page_and_type(struct page_info *page, > > enum mg { mg_clear, mg_ro, mg_rw, mg_rx }; > void mark_dirty(struct domain *d, paddr_t addr); > +long dirty_mode_op(struct domain *d, xen_domctl_shadow_op_t *sc); > > #endif /* _XEN_P2M_H */ > > -- > 1.8.1.2 > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel >
Stefano Stabellini
2013-Jul-03 12:54 UTC
Re: [PATCH RFC v2 0/9] Preliminary working version of live migration
On Wed, 3 Jul 2013, Jaeyong Yoo wrote:> Hello all, > > Me and my colleagues have finally made a working version of live migration > in xen-on-arndale. Using this patch series, we can xl save/restore and > xl migrate <domid> <ip> successfully.Good job!! Thanks for the patches!> I have to further test the stability and the performance, I would like to > get some comments and reviews if anything may not nicely fit to the xen > framework. > > It is quite preliminary version, but I hope it could be a start for > supporting live migration in xen on arm. > > > Best, > Jaeyong > > Alexey Sokolov (2): > xen/arm: Implement get_maximum_gpfn hypercall for arm > xen/arm: Implement modify_returncode > > Alexey Sokolov, Elena Pyatunina, Evgeny Fedotov, and Nikolay Martyanov (1): > xen/arm: Implement toolstack for xl restore/save and migrate > > Elena Pyatunina (2): > xen/arm: Add handling write fault for dirty-page tracing > xen/arm: Implement hypercall for dirty page tracing (shadow op) > > Jaeyong Yoo (3): > xen-arm: Implement basic save/load for hvm context > xen/arm: Add more registers for saving and restoring vcpu > registers > xen/arm: Missing impl of clear_guest_offset macro > > Jaeyong Yoo and Evgeny Fedotov (1): > xen/arm: Implement save and restore for gic, vtimer, and ptimer > > config/arm32.mk | 1 + > tools/include/xen-foreign/reference.size | 2 +- > tools/libxc/Makefile | 5 + > tools/libxc/xc_arm_migrate.c | 586 +++++++++++++++++++++++++++++++ > tools/libxc/xc_resume.c | 25 ++ > tools/misc/Makefile | 4 + > xen/arch/arm/Makefile | 2 +- > xen/arch/arm/domain.c | 38 ++ > xen/arch/arm/domctl.c | 134 ++++++- > xen/arch/arm/hvm.c | 67 ---- > xen/arch/arm/hvm/Makefile | 2 + > xen/arch/arm/hvm/hvm.c | 209 +++++++++++ > xen/arch/arm/hvm/save.c | 66 ++++ > xen/arch/arm/mm.c | 62 +++- > xen/arch/arm/p2m.c | 368 +++++++++++++++++++ > xen/arch/arm/traps.c | 7 + > xen/common/Makefile | 2 + > xen/include/asm-arm/domain.h | 8 + > xen/include/asm-arm/guest_access.h | 6 +- > xen/include/asm-arm/hvm/support.h | 29 ++ > xen/include/asm-arm/mm.h | 2 + > xen/include/asm-arm/p2m.h | 8 + > xen/include/public/arch-arm.h | 41 +++ > xen/include/public/arch-arm/hvm/save.h | 55 +++ > 24 files changed, 1655 insertions(+), 74 deletions(-) > create mode 100644 tools/libxc/xc_arm_migrate.c > delete mode 100644 xen/arch/arm/hvm.c > create mode 100644 xen/arch/arm/hvm/Makefile > create mode 100644 xen/arch/arm/hvm/hvm.c > create mode 100644 xen/arch/arm/hvm/save.c > create mode 100644 xen/include/asm-arm/hvm/support.h > > -- > 1.8.1.2 > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel >
Jaeyong Yoo
2013-Jul-03 23:51 UTC
Re: [PATCH RFC v2 1/9] xen-arm: Implement basic save/load for hvm context
> -----Original Message----- > From: Stefano Stabellini [mailto:stefano.stabellini@eu.citrix.com] > Sent: Wednesday, July 03, 2013 8:29 PM > To: Jaeyong Yoo > Cc: xen-devel@lists.xen.org > Subject: Re: [Xen-devel] [PATCH RFC v2 1/9] xen-arm: Implement basic > save/load for hvm context > > On Wed, 3 Jul 2013, Jaeyong Yoo wrote: > > Create a hvm directory for arm and move all the hvm related files to > > that directory. > > > > Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com> > > > > xen/arch/arm/Makefile | 2 +- > > xen/arch/arm/hvm.c | 67--------------------------------> -- > > xen/arch/arm/hvm/Makefile | 2 + > > xen/arch/arm/hvm/hvm.c | 67 > ++++++++++++++++++++++++++++++++++ > > xen/arch/arm/hvm/save.c | 66 > +++++++++++++++++++++++++++++++++ > > xen/include/asm-arm/hvm/support.h | 29 +++++++++++++++ > > xen/include/public/arch-arm/hvm/save.h | 17 +++++++++ > > Does it have to be a subdirectory? > I don''t think it makes much sense to separate out the hvm files on ARM.Got it.
Jaeyong Yoo
2013-Jul-03 23:53 UTC
Re: [PATCH RFC v2 2/9] xen/arm: Implement save and restore for gic, vtimer, and ptimer
> -----Original Message----- > From: Stefano Stabellini [mailto:stefano.stabellini@eu.citrix.com] > Sent: Wednesday, July 03, 2013 8:30 PM > To: Jaeyong Yoo > Cc: xen-devel@lists.xen.org > Subject: Re: [Xen-devel] [PATCH RFC v2 2/9] xen/arm: Implement save and > restore for gic, vtimer, and ptimer > > On Wed, 3 Jul 2013, Jaeyong Yoo wrote: > > +static int vtimer_save(struct domain *d, hvm_domain_context_t *h) { > > + struct hvm_hw_vtimer ctxt; > > + struct vcpu *v; > > + > > + ctxt.vtb_offset = d->arch.virt_timer_base.offset; > > + > > + /* Save the state of vtimer */ > > + for_each_vcpu( d, v ) > > + { > > + ctxt.cval = v->arch.virt_timer.cval; > > + ctxt.ctl = v->arch.virt_timer.ctl; > > + if ( hvm_save_entry(VTIMER, v->vcpu_id, h, &ctxt) != 0 ) > > + return 1; > > + } > > + > > + return 0; > > +} > > + > > +static int vtimer_load(struct domain *d, hvm_domain_context_t *h) { > > + int vcpuid; > > + struct hvm_hw_vtimer ctxt; > > + struct vcpu *v; > > + /* Which vcpu is this? */ > > + vcpuid = hvm_load_instance(h); > > + if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL ) > > + { > > + dprintk(XENLOG_G_ERR, "HVM restore: dom%u has no vcpu%u\n", > > + d->domain_id, vcpuid); > > + return -EINVAL; > > + } > > + > > + if ( hvm_load_entry(VTIMER, h, &ctxt) != 0 ) > > + return -EINVAL; > > + > > + v->arch.virt_timer.cval = ctxt.cval; > > + v->arch.virt_timer.ctl = ctxt.ctl; > > + v->arch.virt_timer.v = v; > > + > > + d->arch.virt_timer_base.offset = ctxt.vtb_offset; > > + > > + return 0; > > +} > > + > > +HVM_REGISTER_SAVE_RESTORE(VTIMER, vtimer_save, vtimer_load, 1, > > +HVMSR_PER_VCPU); > > + > > +static int ptimer_save(struct domain *d, hvm_domain_context_t *h) { > > + struct hvm_hw_ptimer ctxt; > > + struct vcpu *v; > > + > > + ctxt.ptb_offset = d->arch.phys_timer_base.offset; > > + > > + /* Save the state of ptimer */ > > + for_each_vcpu( d, v ) > > + { > > + ctxt.p_cval = v->arch.phys_timer.cval; > > + ctxt.p_ctl = v->arch.phys_timer.ctl; > > + > > + if ( hvm_save_entry(PTIMER, v->vcpu_id, h, &ctxt) != 0 ) > > + return 1; > > + } > > + > > + return 0; > > +} > > + > > +static int ptimer_load(struct domain *d, hvm_domain_context_t *h) { > > + int vcpuid; > > + struct hvm_hw_ptimer ctxt; > > + struct vcpu *v; > > + /* Which vcpu is this? */ > > + vcpuid = hvm_load_instance(h); > > + if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL ) > > + { > > + dprintk(XENLOG_G_ERR, "HVM restore: dom%u has no vcpu%u\n", > > + d->domain_id, vcpuid); > > + return -EINVAL; > > + } > > + > > + if ( hvm_load_entry(PTIMER, h, &ctxt) != 0 ) > > + return -EINVAL; > > + > > + v->arch.phys_timer.cval = ctxt.p_cval; > > + v->arch.phys_timer.ctl = ctxt.p_ctl; > > + v->arch.phys_timer.v = v; > > + > > + d->arch.phys_timer_base.offset = ctxt.ptb_offset; > > + > > + return 0; > > +} > > Given that the ptimer and vtimer save and load functions as well as the > save records are identical, I think it might be better to declare a single > timer record with a timer type and reuse the same load and save functions > for both ptimer and vtimer.Got it.
Jaeyong Yoo
2013-Jul-04 00:09 UTC
Re: [PATCH RFC v2 4/9] xen/arm: Implement get_maximum_gpfn hypercall for arm
> > > > Rather than implementing p2m_get_next_non_used_gpfn, I think we should > > keep track of the maximum gpfn, like we do on x86, see: > > ISTR Tim wanting to get rid of this concept on x86. > > Anything which is walking the P2M should be doing so via iterator based > interfaces and not by looping on O...N where N is potentially guest > controlled. Not to mention 0...N might be quite sparse. >I guess we are now going to implment via iterator? Jaeyong
Jaeyong Yoo
2013-Jul-04 00:09 UTC
Re: [PATCH RFC v2 7/9] xen/arm: Missing impl of clear_guest_offset macro
> -----Original Message----- > From: Stefano Stabellini [mailto:stefano.stabellini@eu.citrix.com] > Sent: Wednesday, July 03, 2013 8:37 PM > To: Jaeyong Yoo > Cc: xen-devel@lists.xen.org > Subject: Re: [Xen-devel] [PATCH RFC v2 7/9] xen/arm: Missing impl of > clear_guest_offset macro > > On Wed, 3 Jul 2013, Jaeyong Yoo wrote: > > This macro is somehow accidently broken. Fill up the rest of it. > > > > Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com> > > --- > > xen/include/asm-arm/guest_access.h | 6 ++++-- > > 1 file changed, 4 insertions(+), 2 deletions(-) > > > > diff --git a/xen/include/asm-arm/guest_access.h > > b/xen/include/asm-arm/guest_access.h > > index 34aae14..1104090 100644 > > --- a/xen/include/asm-arm/guest_access.h > > +++ b/xen/include/asm-arm/guest_access.h > > @@ -77,8 +77,10 @@ unsigned long raw_clear_guest(void *to, unsignedlen);> > * Clear an array of objects in guest context via a guest handle, > > * specifying an offset into the guest array. > > */ > > -#define clear_guest_offset(hnd, off, ptr, nr) ({ \ > > - raw_clear_guest(_d+(off), nr); \ > > +#define clear_guest_offset(hnd, off, ptr, nr) ({ \ > > + const typeof(*(ptr)) *_s = (ptr); \ > > + char (*_d)[sizeof(*_s)] = (void *)(hnd).p; \ > > + raw_clear_guest(_d+(off), nr); \ > > }) > > You are right that is broken, in fact shouldn''t this macro have only 3 > arguments?Oops right! Jaeyong
Jaeyong Yoo
2013-Jul-04 00:44 UTC
Re: [PATCH RFC v2 6/9] xen/arm: Add handling write fault for dirty-page tracing
> -----Original Message----- > From: Stefano Stabellini [mailto:stefano.stabellini@eu.citrix.com] > Sent: Wednesday, July 03, 2013 9:11 PM > To: Jaeyong Yoo > Cc: xen-devel@lists.xen.org; Elena Pyatunina > Subject: Re: [Xen-devel] [PATCH RFC v2 6/9] xen/arm: Add handling write > fault for dirty-page tracing > > On Wed, 3 Jul 2013, Jaeyong Yoo wrote: > > From: Elena Pyatunina <e.pyatunina@samsung.com> > > > > Add handling write fault in do_trap_data_abort_guest for dirty-page > > tracing. Mark dirty function makes the bitmap of dirty pages. > > > > Signed-off-by: Elena Pyatunina <e.pyatunina@samsung.com> > > Singed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com> > > --- > > xen/arch/arm/mm.c | 59 ++++++++++++++++++++++++++++++- > > xen/arch/arm/p2m.c | 84 > ++++++++++++++++++++++++++++++++++++++++++++ > > xen/arch/arm/traps.c | 7 ++++ > > xen/include/asm-arm/domain.h | 8 +++++ > > xen/include/asm-arm/mm.h | 2 ++ > > xen/include/asm-arm/p2m.h | 3 ++ > > 6 files changed, 162 insertions(+), 1 deletion(-) > > > > diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c index > > 650b1fc..f509008 100644 > > --- a/xen/arch/arm/mm.c > > +++ b/xen/arch/arm/mm.c > > @@ -680,7 +680,6 @@ void destroy_xen_mappings(unsigned long v, unsigned > long e) > > create_xen_entries(REMOVE, v, 0, (e - v) >> PAGE_SHIFT, 0); } > > > > -enum mg { mg_clear, mg_ro, mg_rw, mg_rx }; static void > > set_pte_flags_on_range(const char *p, unsigned long l, enum mg mg) { > > lpae_t pte; > > @@ -1142,6 +1141,64 @@ int is_iomem_page(unsigned long mfn) > > return 1; > > return 0; > > } > > + > > +/* > > + * Set l3e entries of P2M table to be read-only. > > + * > > + * On first write, it page faults, its entry is changed to > > +read-write, > > + * and on retry the write succeeds. > > + * > > + * Populate dirty_bitmap by looking for entries that have been > > + * switched to read-write. > > + */ > > +int handle_page_fault(struct domain *d, paddr_t addr) { > > + lpae_t pte, *first = NULL, *second = NULL, *third = NULL; > > + struct p2m_domain *p2m = &d->arch.p2m; > > + > > + if ( d->arch.dirty.mode == 0 || first_table_offset(addr) >> LPAE_ENTRIES ) > > + return 1; > > if this is an error condition, it would be better to return an errorThe first condition is not an error and the second one is. And, I got the idea.> > > > + spin_lock(&p2m->lock); > > + > > + first = __map_domain_page(p2m->first_level); > > + > > + if ( !first || > > + !first[first_table_offset(addr)].walk.valid || > > + !first[first_table_offset(addr)].walk.table ) > > + goto done; > > + > > + second > > + map_domain_page(first[first_table_offset(addr)].walk.base); > > + > > + if ( !second || > > + !second[second_table_offset(addr)].walk.valid || > > + !second[second_table_offset(addr)].walk.table ) > > + goto done; > > + > > + third > > + map_domain_page(second[second_table_offset(addr)].walk.base); > > + > > + if ( !third ) > > + goto done; > > + > > + pte = third[third_table_offset(addr)]; > > + > > + if (pte.p2m.valid && pte.p2m.write == 0) > > + { > > + mark_dirty(d, addr); > > + pte.p2m.write = 1; > > Would this bit be sufficient? Otherwise could we take one of the other > available bits in the p2m pte? That would free us from having to handle a > separate data structure to handle dirty bits.I think it would fix the memory inefficiency problem. But, I would like to follow the way Ian suggested, making a linear dirty-bit map for guest p2m. I think it can increase the write fault handling performance and also peeking performance.> > > > + write_pte(&third[third_table_offset(addr)], pte); > > + flush_tlb_local(); > > flush_tlb_local should be OK because I think that we can always guarantee > that the current VMID is the VMID of the guest we are handling the fault > for.Got it.> > > > + } > > + > > +done: > > + if (third) unmap_domain_page(third); > > + if (second) unmap_domain_page(second); > > + if (first) unmap_domain_page(first); > > + > > + spin_unlock(&p2m->lock); > > + return 0; > > +} > > + > > /* > > * Local variables: > > * mode: C > > diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c index > > 8bf7eb7..bae7af7 100644 > > --- a/xen/arch/arm/p2m.c > > +++ b/xen/arch/arm/p2m.c > > @@ -412,6 +412,90 @@ unsigned long gmfn_to_mfn(struct domain *d, > unsigned long gpfn) > > return p >> PAGE_SHIFT; > > } > > > > +static struct page_info * new_dirty_page(void) { > > + struct page_info *page = NULL; > > + void *p; > > + > > + page = alloc_domheap_page(NULL, 0); > > + if ( page == NULL ) > > + return NULL; > > + > > + p = __map_domain_page(page); > > + > > + clear_page(p); > > + unmap_domain_page(p); > > + > > + return page; > > +} > > + > > +void mark_dirty(struct domain *d, paddr_t addr) { > > + struct page_info *page; > > + xen_pfn_t *first = NULL, *second = NULL, *third = NULL; > > + void *p; > > + int changed; > > + > > + spin_lock(&d->arch.dirty.lock); > > + > > + if (d->arch.dirty.top == NULL) > > + { > > + page = alloc_domheap_pages(NULL, 1, 0); > > + if (page == NULL) > > + { > > + printk("Error: couldn''t allocate page for dirtybitmap!\n");> > + spin_unlock(&d->arch.dirty.lock); > > + return; > > no error return codes?Got it.> > > > + } > > + > > + INIT_PAGE_LIST_HEAD(&d->arch.dirty.pages); > > + page_list_add(page, &d->arch.dirty.pages); > > + > > + /* Clear both first level pages */ > > + p = __map_domain_page(page); > > + clear_page(p); > > + unmap_domain_page(p); > > + > > + p = __map_domain_page(page + 1); > > + clear_page(p); > > + unmap_domain_page(p); > > + > > + d->arch.dirty.top = page; > > + } > > + > > + first = __map_domain_page(d->arch.dirty.top); > > + BUG_ON(!first && "Can''t map first level p2m."); > > > > + if ( !first[first_table_offset(addr)]) > > + { > > + page = new_dirty_page(); > > + page_list_add(page, &d->arch.dirty.pages); > > + first[first_table_offset(addr)] = page_to_mfn(page); > > + } > > + > > + second = map_domain_page(first[first_table_offset(addr)]); > > + BUG_ON(!second && "Can''t map second level p2m."); > > > > + if (!second[second_table_offset(addr)]) > > + { > > + page = new_dirty_page(); > > + page_list_add(page, &d->arch.dirty.pages); > > + second[second_table_offset(addr)] = page_to_mfn(page); > > + } > > + > > + third = map_domain_page(second[second_table_offset(addr)]); > > + BUG_ON(!third && "Can''t map third level p2m."); > > > > + changed = !__test_and_set_bit(third_table_offset(addr), third); > > If I am not missing something, the third_table_offset was written to give > you an index into a page to retrieve a 64 bit pte. > If you are only using 1 bit, shouldn''t you be using a different indexing > system? Otherwise you are wasting 63 bits for each entry.You got this right. It is also memory inefficient.> > > > + if (changed) > ^ code styleOops!> > > + { > > + d->arch.dirty.count++; > > + } > > + > > + if (third) unmap_domain_page(third); > > + if (second) unmap_domain_page(second); > > + if (first) unmap_domain_page(first); > > + > > + spin_unlock(&d->arch.dirty.lock); } > > + > > /* > > * Local variables: > > * mode: C
Jaeyong Yoo
2013-Jul-04 01:02 UTC
Re: [PATCH RFC v2 6/9] xen/arm: Add handling write fault for dirty-page tracing
> -----Original Message----- > From: Ian Campbell [mailto:Ian.Campbell@citrix.com] > Sent: Wednesday, July 03, 2013 9:27 PM > To: Jaeyong Yoo > Cc: xen-devel@lists.xen.org; Elena Pyatunina > Subject: Re: [Xen-devel] [PATCH RFC v2 6/9] xen/arm: Add handling write > fault for dirty-page tracing > > On Wed, 2013-07-03 at 18:15 +0900, Jaeyong Yoo wrote: > > From: Elena Pyatunina <e.pyatunina@samsung.com> > > > > Add handling write fault in do_trap_data_abort_guest for dirty-page > > tracing. Mark dirty function makes the bitmap of dirty pages. > > You''ve gone with the model of allocating a bitmap when logdirty is enabled > and updating it directly from the fault handler? The bit map is based > around a three level structure mirroring the p2m tables?Yes, you got it right.> > I had been considering the option of reusing one of the bits available for > software use in the p2m entries to track the dirty status and walking the > p2m when the toolstack asked for the current bitmap instead. > > I think this would have less memory overhead and also would allow us to be > lock free on the fault handling path (by using suitable atomic exchanges > on the p2m entries), as well as avoiding memory allocations on that path. > > I think we''d want to establish a linear map of the current guest p2m for > these purposes so we could also avoid all the map_domain_page stuff in the > fault path. I think there is enough virtual address space left for that > even on 32-bit, it''s less critical on 64-bit anyway since we are about to > have a direct map of RAM available to us.I think it is a good idea to have an ever-mapping linear dirty-bit map for guest p2m. Maybe between fixmap and frametable?> > > diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c index > > 398d209..9927712 100644 > > --- a/xen/arch/arm/traps.c > > +++ b/xen/arch/arm/traps.c > > @@ -1011,6 +1011,13 @@ static void do_trap_data_abort_guest(struct > cpu_user_regs *regs, > > if ( rc == -EFAULT ) > > goto bad_data_abort; > > > > + /* handle permission fault on write */ > > + if ((dabt.dfsc & 0x3f) == (FSC_FLT_PERM + 3) && dabt.write) > > Can we get some #defines for these magic constants please.Got it.> > > + { > > + if (handle_page_fault(current->domain, info.gpa) == 0) > > + return; > > + } > > + > > if (handle_mmio(&info)) > > Since you handle page fault first, won''t this make MMIO trapping regions > writable?Oops. I forget to exchange the sequence of handle_page_fault and handle_mmio. Then, it would fix the problem right? Because handle_mmio is handling virtual emulated devices, I don''t think writing to mmio is necessary to be dirty-page traced.> > Ian.
Jaeyong Yoo
2013-Jul-04 01:25 UTC
Re: [PATCH RFC v2 8/9] xen/arm: Implement hypercall for dirty page tracing (shadow op)
> -----Original Message----- > From: Stefano Stabellini [mailto:stefano.stabellini@eu.citrix.com] > Sent: Wednesday, July 03, 2013 9:39 PM > To: Jaeyong Yoo > Cc: xen-devel@lists.xen.org; Elena Pyatunina > Subject: Re: [Xen-devel] [PATCH RFC v2 8/9] xen/arm: Implement hypercall > for dirty page tracing (shadow op) > > On Wed, 3 Jul 2013, Jaeyong Yoo wrote: > > From: Elena Pyatunina <e.pyatunina@samsung.com> > > > > Add hypercall (shadow op: enable/disable and clean/peek dirted page > > bitmap) > > > > Signed-off-by: Elena Pyatunina <e.pyatunina@samsung.com> > > Singed-off-by: Alexey Sokolov <sokolov.a@samsung.com> > > --- > > xen/arch/arm/domain.c | 4 + > > xen/arch/arm/domctl.c | 19 ++++ > > xen/arch/arm/p2m.c | 219 > +++++++++++++++++++++++++++++++++++++++++++++- > > xen/include/asm-arm/p2m.h | 2 + > > 4 files changed, 242 insertions(+), 2 deletions(-) > > > > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index > > 73dd0de..e3fc533 100644 > > --- a/xen/arch/arm/domain.c > > +++ b/xen/arch/arm/domain.c > > @@ -484,6 +484,10 @@ int arch_domain_create(struct domain *d, unsigned > int domcr_flags) > > d->arch.vpidr = boot_cpu_data.midr.bits; > > d->arch.vmpidr = boot_cpu_data.mpidr.bits; > > > > + /* init for dirty-page tracing */ > > + d->arch.dirty.count = 0; > > + d->arch.dirty.top = NULL; > > + > > clear_page(d->shared_info); > > share_xen_page_with_guest( > > virt_to_page(d->shared_info), d, XENSHARE_writable); diff > > --git a/xen/arch/arm/domctl.c b/xen/arch/arm/domctl.c index > > a0719b0..bc06534 100644 > > --- a/xen/arch/arm/domctl.c > > +++ b/xen/arch/arm/domctl.c > > @@ -67,6 +67,14 @@ long arch_do_domctl(struct xen_domctl *domctl, struct > domain *d, > > goto gethvmcontext_out; > > } > > > > + /* Allocate our own marshalling buffer */ > > + ret = -ENOMEM; > > + if ( (c.data = xmalloc_bytes(c.size)) == NULL ) > > + { > > + printk("(gethvmcontext) xmalloc_bytes failed: %d\n", c.size);> > + goto gethvmcontext_out; > > + } > > + > > domain_pause(d); > > ret = hvm_save(d, &c); > > domain_unpause(d); > > Shouldn''t this hunk belong to patch #2?Oops my mistake!> > > > @@ -85,6 +93,17 @@ long arch_do_domctl(struct xen_domctl *domctl, struct > domain *d, > > xfree(c.data); > > } > > break; > > + case XEN_DOMCTL_shadow_op: > > + { > > + ret = dirty_mode_op(d, &domctl->u.shadow_op); > > + if ( (&domctl->u.shadow_op)->op == XEN_DOMCTL_SHADOW_OP_CLEAN > > + || (&domctl->u.shadow_op)->op =XEN_DOMCTL_SHADOW_OP_PEEK) > > + { > > + copyback = 1; > > + } > > + } > > + break; > > + > > default: > > return -EINVAL; > > } > > diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c index > > bae7af7..fb8ce10 100644 > > --- a/xen/arch/arm/p2m.c > > +++ b/xen/arch/arm/p2m.c > > @@ -69,8 +69,8 @@ out: > > > > spin_unlock(&p2m->lock); > > > > - return ( ((xen_pfn_t)first_index << FIRST_SHIFT) | > > - (second_index << SECOND_SHIFT) | > > + return ( ((xen_pfn_t)first_index << FIRST_SHIFT) | > > + (second_index << SECOND_SHIFT) | > > (third_index << THIRD_SHIFT) > > ) >> PAGE_SHIFT; > > Does this belong here? If so, why?Oops, mistake again. It doesn''t belong here. Sorry.> > > > @@ -496,6 +496,221 @@ void mark_dirty(struct domain *d, paddr_t addr) > > spin_unlock(&d->arch.dirty.lock); } > > > > +static void free_dirty_bitmap(struct domain *d) { > > + struct page_info *pg; > > + > > + spin_lock(&d->arch.dirty.lock); > > + > > + if(d->arch.dirty.top) > > + { > > + while ( (pg = page_list_remove_head(&d->arch.dirty.pages)) ) > > + free_domheap_page(pg); > > + } > > + d->arch.dirty.top = NULL; > > + d->arch.dirty.count = 0; > > + > > + spin_unlock(&d->arch.dirty.lock); } > > + > > +/* Change types across all p2m entries in a domain */ static void > > +p2m_change_entry_type_global(struct domain *d, enum mg ot, enum mg > > +nt) { > > + struct p2m_domain *p2m = &d->arch.p2m; > > + int i1, i2, i3; > > + int first_index = first_table_offset(GUEST_RAM_BASE); > > + int second_index = second_table_offset(GUEST_RAM_BASE); > > + int third_index = third_table_offset(GUEST_RAM_BASE); > > you shouldn''t assume GUEST_RAM_BASE is static anywhereYes, you are right. Since there is no way to figure out this value at the moment, we use this static value here. I think we need to parse dtb of domu guest and get this value from there. Do you have any suggestion?> > > > + lpae_t *first = __map_domain_page(p2m->first_level); > > + lpae_t pte, *second = NULL, *third = NULL; > > + > > + BUG_ON(!first && "Can''t map first level p2m."); > > + > > + spin_lock(&p2m->lock); > > + > > + for (i1 = first_index; i1 < LPAE_ENTRIES*2; ++i1) > > + { > > + lpae_walk_t first_pte = first[i1].walk; > > + if (!first_pte.valid || !first_pte.table) > > + goto out; > > + > > + second = map_domain_page(first_pte.base); > > + BUG_ON(!second && "Can''t map second level p2m."); > > + for(i2 = second_index; i2 < LPAE_ENTRIES; ++i2) > > + { > > + lpae_walk_t second_pte = second[i2].walk; > > + if (!second_pte.valid || !second_pte.table) > > + goto out; > > + > > + third = map_domain_page(second_pte.base); > > + BUG_ON(!third && "Can''t map third level p2m."); > > + > > + for (i3 = third_index; i3 < LPAE_ENTRIES; ++i3) > > + { > > + lpae_walk_t third_pte = third[i3].walk; > > + int write = 0; > > + if (!third_pte.valid) > > + goto out; > > + > > + pte = third[i3]; > > + if (pte.p2m.write == 1 && nt == mg_ro) > > + { > > + pte.p2m.write = 0; > > + write = 1; > > + } > > + else if (pte.p2m.write == 0 && nt == mg_rw) > > + { > > + pte.p2m.write = 1; > > + write = 1; > > + } > > + if (write) > > + write_pte(&third[i3], pte); > > + } > > + unmap_domain_page(third); > > + > > + third = NULL; > > + third_index = 0; > > + } > > + unmap_domain_page(second); > > + > > + second = NULL; > > + second_index = 0; > > + third_index = 0; > > + } > > + > > +out: > > + flush_tlb_all_local(); > > + if (third) unmap_domain_page(third); > > + if (second) unmap_domain_page(second); > > + if (first) unmap_domain_page(first); > > + > > + spin_unlock(&p2m->lock); > > +} > > + > > +/* Read a domain''s log-dirty bitmap and stats. > > + * If the operation is a CLEAN, clear the bitmap and stats. */ static > > +int log_dirty_op(struct domain *d, xen_domctl_shadow_op_t *sc) { > > + int i1, i2, rv = 0, clean = 0, peek = 1; > > + xen_pfn_t *first = NULL, *second = NULL; > > + unsigned long pages = 0, *third = NULL; > > + > > + clean = (sc->op == XEN_DOMCTL_SHADOW_OP_CLEAN); > > + > > + if (guest_handle_is_null(sc->dirty_bitmap)) > > + peek = 0; > > + > > + domain_pause(d); > > + > > + printk("DEBUG: log-dirty %s: dom %u dirty=%u\n", > > + (clean) ? "clean" : "peek", > > + d->domain_id, d->arch.dirty.count); > > + > > + sc->stats.dirty_count = d->arch.dirty.count; > > + > > + spin_lock(&d->arch.dirty.lock); > > + first = d->arch.dirty.top ? __map_domain_page(d->arch.dirty.top) > > + : NULL; > > + > > + /* 32-bit address space starts from 0 and ends at ffffffff. > > This should be a 40 bit address space due to LPAE supportYes, you are right.> > > > + * First level of p2m tables is indexed by bits 39-30. > > + * So significant bits for 32-bit addresses in first table are31-30.> > + * So we need to iterate from 0 to 4. > > + * But RAM starts from 2 gigabytes. So start from 2 instead. > > + * TODO remove this hardcode > > I wouldn''t assume that the guest ram start at 2GBYes, right, and we are thinking of using dtb for guest> > > > + */ > > + for (i1 = 2; (pages < sc->pages) && (i1 < 4); ++i1) > > + { > > + second = (first && first[i1]) ? map_domain_page(first[i1]) : > > + NULL; > > + > > + for (i2 = 0; (i2 < LPAE_ENTRIES); ++i2) > > + { > > + unsigned int bytes = LPAE_ENTRIES/8; > > + third = (second && second[i2]) ?map_domain_page(second[i2]) :> NULL; > > + if ( unlikely(((sc->pages - pages + 7) >> 3) < bytes) ) > > + bytes = (unsigned int)((sc->pages - pages + 7) >> 3); > > + if (peek) > ^ code style > > + { > > + if ( (third ? copy_to_guest_offset(sc->dirty_bitmap,pages> >> 3, > > + (uint8_t *)third,bytes)> > + : clear_guest_offset(sc->dirty_bitmap,pages >> 3,> > + (uint8_t*)NULL /*onlytype> matters*/, > > + bytes)) != 0 ) > > + { > > + rv = 1; > > + goto out; > > + } > > + } > > + pages += bytes << 3; > > + if (third) > ^ code style > > + { > > + if (clean) clear_page(third); > > + unmap_domain_page(third); > > + } > > + } > > + if (second) unmap_domain_page(second); > > + } > > + if (first) unmap_domain_page(first); > > + > > + spin_unlock(&d->arch.dirty.lock); > > + > > + if ( pages < sc->pages ) > > + sc->pages = pages; > > + > > + if (clean) > ^ code style > > + { > > + p2m_change_entry_type_global(d, mg_rw, mg_ro); > > + free_dirty_bitmap(d); > > + } > > + > > +out: > > + if (rv) > > + { > > + spin_unlock(&d->arch.dirty.lock); > > + } > > + domain_unpause(d); > > + return rv; > > +} > > + > > +long dirty_mode_op(struct domain *d, xen_domctl_shadow_op_t *sc) { > > + long ret = 0; > > + > > + switch (sc->op) > > + { > > + case XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY: > > + case XEN_DOMCTL_SHADOW_OP_OFF: > > + { > > + enum mg ot = mg_clear, nt = mg_clear; > > + > > + ot = sc->op == XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY ? mg_rw:> mg_ro; > > + nt = sc->op == XEN_DOMCTL_SHADOW_OP_OFF ? mg_rw : mg_ro; > > + > > + domain_pause(d); > > + > > + d->arch.dirty.mode = sc->op == XEN_DOMCTL_SHADOW_OP_OFF ? 0:> 1; > > + p2m_change_entry_type_global(d, ot, nt); > > + > > + if (sc->op == XEN_DOMCTL_SHADOW_OP_OFF) > > + free_dirty_bitmap(d); > > + > > + domain_unpause(d); > > Do we need a lock to make sure that all the operations happen atomically?Yes right. Since anyone can call xl simultaneously.> > > > + } > > + break; > > + > > + case XEN_DOMCTL_SHADOW_OP_CLEAN: > > + case XEN_DOMCTL_SHADOW_OP_PEEK: > > + { > > + ret = log_dirty_op(d, sc); > > + } > > + break; > > For symmetry maybe it would be better to pause the domain outside > log_dirty_opOK.> > > > + default: > > + return -ENOSYS; > > + } > > + return ret; > > +} > > + > > /* > > * Local variables: > > * mode: C > > diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h > > index fd5890f..498cf0b 100644 > > --- a/xen/include/asm-arm/p2m.h > > +++ b/xen/include/asm-arm/p2m.h > > @@ -2,6 +2,7 @@ > > #define _XEN_P2M_H > > > > #include <xen/mm.h> > > +#include <public/domctl.h> > > > > struct domain; > > > > @@ -112,6 +113,7 @@ static inline int get_page_and_type(struct > > page_info *page, > > > > enum mg { mg_clear, mg_ro, mg_rw, mg_rx }; void mark_dirty(struct > > domain *d, paddr_t addr); > > +long dirty_mode_op(struct domain *d, xen_domctl_shadow_op_t *sc); > > > > #endif /* _XEN_P2M_H */ > > > > -- > > 1.8.1.2 > > > > > > _______________________________________________ > > Xen-devel mailing list > > Xen-devel@lists.xen.org > > http://lists.xen.org/xen-devel > >
Jaeyong Yoo
2013-Jul-04 07:21 UTC
Re: [PATCH RFC v2 6/9] xen/arm: Add handling write fault for dirty-page tracing
> > I think we''d want to establish a linear map of the current guest p2m > > for these purposes so we could also avoid all the map_domain_page > > stuff in the fault path. I think there is enough virtual address space > > left for that even on 32-bit, it''s less critical on 64-bit anyway > > since we are about to have a direct map of RAM available to us. > > I think it is a good idea to have an ever-mapping linear dirty-bit map for > guest p2m. Maybe between fixmap and frametable? >CTTOI, just mapping to xenheap is enough. Jaeyong
Ian Campbell
2013-Jul-04 08:42 UTC
Re: [PATCH RFC v2 8/9] xen/arm: Implement hypercall for dirty page tracing (shadow op)
On Thu, 2013-07-04 at 10:25 +0900, Jaeyong Yoo wrote:> > > @@ -496,6 +496,221 @@ void mark_dirty(struct domain *d, paddr_t addr) > > > spin_unlock(&d->arch.dirty.lock); } > > > > > > +static void free_dirty_bitmap(struct domain *d) { > > > + struct page_info *pg; > > > + > > > + spin_lock(&d->arch.dirty.lock); > > > + > > > + if(d->arch.dirty.top) > > > + { > > > + while ( (pg = page_list_remove_head(&d->arch.dirty.pages)) ) > > > + free_domheap_page(pg); > > > + } > > > + d->arch.dirty.top = NULL; > > > + d->arch.dirty.count = 0; > > > + > > > + spin_unlock(&d->arch.dirty.lock); } > > > + > > > +/* Change types across all p2m entries in a domain */ static void > > > +p2m_change_entry_type_global(struct domain *d, enum mg ot, enum mg > > > +nt) { > > > + struct p2m_domain *p2m = &d->arch.p2m; > > > + int i1, i2, i3; > > > + int first_index = first_table_offset(GUEST_RAM_BASE); > > > + int second_index = second_table_offset(GUEST_RAM_BASE); > > > + int third_index = third_table_offset(GUEST_RAM_BASE); > > > > you shouldn''t assume GUEST_RAM_BASE is static anywhere > > Yes, you are right. Since there is no way to figure out this value at > the moment, we use this static value here. I think we need to parse dtb > of domu guest and get this value from there. Do you have any suggestion?The hypervisor should never be parsing (or even seeing) the domU DTB. Instead the toolstack should poke this sort of information down via a (presumably new) DOMCTL. Ian.
Ian Campbell
2013-Jul-04 08:46 UTC
Re: [PATCH RFC v2 6/9] xen/arm: Add handling write fault for dirty-page tracing
On Thu, 2013-07-04 at 10:02 +0900, Jaeyong Yoo wrote:> > I had been considering the option of reusing one of the bits available for > > software use in the p2m entries to track the dirty status and walking the > > p2m when the toolstack asked for the current bitmap instead. > > > > I think this would have less memory overhead and also would allow us to be > > lock free on the fault handling path (by using suitable atomic exchanges > > on the p2m entries), as well as avoiding memory allocations on that path. > > > > I think we''d want to establish a linear map of the current guest p2m for > > these purposes so we could also avoid all the map_domain_page stuff in the > > fault path. I think there is enough virtual address space left for that > > even on 32-bit, it''s less critical on 64-bit anyway since we are about to > > have a direct map of RAM available to us. > > I think it is a good idea to have an ever-mapping linear dirty-bit map for > guest p2m.By Linear map I meant slotting the p2m into the hypervisor''s own page tables (I believe the relevant p2m and pte bits in the entries do not overlap, so this is possible), such that the p2m table entries are available at known virtual addresses and you can locate the L1, L2 and L3 entries corresponding to a given guest physical address with just arithmetic, i.e. something like: http://www.technovelty.org/linux/virtual-linear-page-table.html As opposed to an always mapped version of the dirty bitmap, which I think we can sync only when the tools ask.> Maybe between fixmap and frametable? > > > > > > > diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c index > > > 398d209..9927712 100644 > > > --- a/xen/arch/arm/traps.c > > > +++ b/xen/arch/arm/traps.c > > > @@ -1011,6 +1011,13 @@ static void do_trap_data_abort_guest(struct > > cpu_user_regs *regs, > > > if ( rc == -EFAULT ) > > > goto bad_data_abort; > > > > > > + /* handle permission fault on write */ > > > + if ((dabt.dfsc & 0x3f) == (FSC_FLT_PERM + 3) && dabt.write) > > > > Can we get some #defines for these magic constants please. > > Got it. > > > > > > + { > > > + if (handle_page_fault(current->domain, info.gpa) == 0) > > > + return; > > > + } > > > + > > > if (handle_mmio(&info)) > > > > Since you handle page fault first, won''t this make MMIO trapping regions > > writable? > > Oops. I forget to exchange the sequence of handle_page_fault and handle_mmio. > Then, it would fix the problem right? > Because handle_mmio is handling virtual emulated devices, > I don''t think writing to mmio is necessary to be dirty-page traced. > > > > > > Ian. >
Ian Campbell
2013-Jul-04 08:47 UTC
Re: [PATCH RFC v2 4/9] xen/arm: Implement get_maximum_gpfn hypercall for arm
On Thu, 2013-07-04 at 09:09 +0900, Jaeyong Yoo wrote:> > > > > > Rather than implementing p2m_get_next_non_used_gpfn, I think we should > > > keep track of the maximum gpfn, like we do on x86, see: > > > > ISTR Tim wanting to get rid of this concept on x86. > > > > Anything which is walking the P2M should be doing so via iterator based > > interfaces and not by looping on O...N where N is potentially guest > > controlled. Not to mention 0...N might be quite sparse. > > > > I guess we are now going to implment via iterator?I think we should, Tim may have some ideas about what such an interface should look like.
Jaeyong Yoo
2013-Jul-04 10:18 UTC
Re: [PATCH RFC v2 8/9] xen/arm: Implement hypercall for dirty page tracing (shadow op)
> > Yes, you are right. Since there is no way to figure out this value at > > the moment, we use this static value here. I think we need to parse > > dtb of domu guest and get this value from there. Do you have any > suggestion? > > The hypervisor should never be parsing (or even seeing) the domU DTB. > Instead the toolstack should poke this sort of information down via a > (presumably new) DOMCTL.Got it.> > Ian.
Stefano Stabellini
2013-Jul-04 10:29 UTC
Re: [PATCH RFC v2 8/9] xen/arm: Implement hypercall for dirty page tracing (shadow op)
On Thu, 4 Jul 2013, Jaeyong Yoo wrote:> > > Yes, you are right. Since there is no way to figure out this value at > > > the moment, we use this static value here. I think we need to parse > > > dtb of domu guest and get this value from there. Do you have any > > suggestion? > > > > The hypervisor should never be parsing (or even seeing) the domU DTB. > > Instead the toolstack should poke this sort of information down via a > > (presumably new) DOMCTL. > > Got it.I wonder whether we could find the start of memory by looking at the p2m and then storing the value in a per-domain variable to avoid a pagetable walk every time.
Ian Campbell
2013-Jul-04 10:36 UTC
Re: [PATCH RFC v2 8/9] xen/arm: Implement hypercall for dirty page tracing (shadow op)
On Thu, 2013-07-04 at 11:29 +0100, Stefano Stabellini wrote:> On Thu, 4 Jul 2013, Jaeyong Yoo wrote: > > > > Yes, you are right. Since there is no way to figure out this value at > > > > the moment, we use this static value here. I think we need to parse > > > > dtb of domu guest and get this value from there. Do you have any > > > suggestion? > > > > > > The hypervisor should never be parsing (or even seeing) the domU DTB. > > > Instead the toolstack should poke this sort of information down via a > > > (presumably new) DOMCTL. > > > > Got it. > > I wonder whether we could find the start of memory by looking at the p2m > and then storing the value in a per-domain variable to avoid a pagetable > walk every time.I think the toolstack should just tell the hypervisor. Ian.
Jaeyong Yoo
2013-Jul-04 11:47 UTC
Re: [PATCH RFC v2 6/9] xen/arm: Add handling write fault for dirty-page tracing
> -----Original Message----- > From: Ian Campbell [mailto:Ian.Campbell@citrix.com] > Sent: Thursday, July 04, 2013 5:47 PM > To: Jaeyong Yoo > Cc: xen-devel@lists.xen.org; ''Elena Pyatunina'' > Subject: Re: [Xen-devel] [PATCH RFC v2 6/9] xen/arm: Add handling write > fault for dirty-page tracing > > On Thu, 2013-07-04 at 10:02 +0900, Jaeyong Yoo wrote: > > > > I had been considering the option of reusing one of the bits > > > available for software use in the p2m entries to track the dirty > > > status and walking the p2m when the toolstack asked for the current > bitmap instead. > > > > > > I think this would have less memory overhead and also would allow us > > > to be lock free on the fault handling path (by using suitable atomic > > > exchanges on the p2m entries), as well as avoiding memory allocations > on that path. > > > > > > I think we''d want to establish a linear map of the current guest p2m > > > for these purposes so we could also avoid all the map_domain_page > > > stuff in the fault path. I think there is enough virtual address > > > space left for that even on 32-bit, it''s less critical on 64-bit > > > anyway since we are about to have a direct map of RAM available to us. > > > > I think it is a good idea to have an ever-mapping linear dirty-bit map > > for guest p2m. > > By Linear map I meant slotting the p2m into the hypervisor''s own page > tables (I believe the relevant p2m and pte bits in the entries do not > overlap, so this is possible), such that the p2m table entries are > available at known virtual addresses and you can locate the L1, L2 and > L3 entries corresponding to a given guest physical address with just > arithmetic, i.e. something like: > http://www.technovelty.org/linux/virtual-linear-page-table.html >Oh it looks interesting. And we can try this. Thanks for the good tip :)> As opposed to an always mapped version of the dirty bitmap, which I think > we can sync only when the tools ask.Got it. Jaeyong
Jaeyong Yoo
2013-Jul-05 04:42 UTC
Re: [PATCH RFC v2 6/9] xen/arm: Add handling write fault for dirty-page tracing
> > > I think we''d want to establish a linear map of the current guest p2m > > > for these purposes so we could also avoid all the map_domain_page > > > stuff in the fault path. I think there is enough virtual address > > > space left for that even on 32-bit, it''s less critical on 64-bit > > > anyway since we are about to have a direct map of RAM available to us. > > > > I think it is a good idea to have an ever-mapping linear dirty-bit map > > for guest p2m. > > By Linear map I meant slotting the p2m into the hypervisor''s own page > tables (I believe the relevant p2m and pte bits in the entries do not > overlap, so this is possible), such that the p2m table entries are > available at known virtual addresses and you can locate the L1, L2 and > L3 entries corresponding to a given guest physical address with just > arithmetic, i.e. something like: > http://www.technovelty.org/linux/virtual-linear-page-table.htmlWhich virtual address range should we map the p2m? Would it be xenheap, domheap, or unassigned range (for instance, between frametable and vmap)? Jaeyong
Ian Campbell
2013-Jul-07 16:53 UTC
Re: [PATCH RFC v2 6/9] xen/arm: Add handling write fault for dirty-page tracing
On Fri, 2013-07-05 at 13:42 +0900, Jaeyong Yoo wrote:> > > > I think we''d want to establish a linear map of the current guest p2m > > > > for these purposes so we could also avoid all the map_domain_page > > > > stuff in the fault path. I think there is enough virtual address > > > > space left for that even on 32-bit, it''s less critical on 64-bit > > > > anyway since we are about to have a direct map of RAM available to us. > > > > > > I think it is a good idea to have an ever-mapping linear dirty-bit map > > > for guest p2m. > > > > By Linear map I meant slotting the p2m into the hypervisor''s own page > > tables (I believe the relevant p2m and pte bits in the entries do not > > overlap, so this is possible), such that the p2m table entries are > > available at known virtual addresses and you can locate the L1, L2 and > > L3 entries corresponding to a given guest physical address with just > > arithmetic, i.e. something like: > > http://www.technovelty.org/linux/virtual-linear-page-table.html > > Which virtual address range should we map the p2m? Would it be xenheap, > domheap, or unassigned range (for instance, between frametable and vmap)?Since it needs to take over a bunch of entries in the page tables it should be its own virtual address range. Looking at xen/include/asm-arm/config.h between the frametable and vmap seems sensible. Ian.
Eugene Fedotov
2013-Jul-10 15:34 UTC
Re: [PATCH RFC v2 8/9] xen/arm: Implement hypercall for dirty page tracing (shadow op)
1) Maybe it reasonable use a new HVM parameter for guest RAM base PFN instead of new DOMCTL and setup it from libxc through existing xc_set_hvm_param hypercall? 2) Since a static GUEST_RAM_BASE is also used in libxc zImage loader (xc_dom_armzimageloader.c) should we parse guest''s DTB in function xc_dom_parse_zimage_kernel or simply provide dom->rambase_pfn to the hypervisor? Best regards, Evgeny 04.07.2013 14:36, Ian Campbell> On Thu, 2013-07-04 at 11:29 +0100, Stefano Stabellini wrote: >> On Thu, 4 Jul 2013, Jaeyong Yoo wrote: >>>>> Yes, you are right. Since there is no way to figure out this value at >>>>> the moment, we use this static value here. I think we need to parse >>>>> dtb of domu guest and get this value from there. Do you have any >>>> suggestion? >>>> >>>> The hypervisor should never be parsing (or even seeing) the domU DTB. >>>> Instead the toolstack should poke this sort of information down via a >>>> (presumably new) DOMCTL. >>> Got it. >> I wonder whether we could find the start of memory by looking at the p2m >> and then storing the value in a per-domain variable to avoid a pagetable >> walk every time. > I think the toolstack should just tell the hypervisor. > > Ian. > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel >
Ian Campbell
2013-Jul-10 15:39 UTC
Re: [PATCH RFC v2 8/9] xen/arm: Implement hypercall for dirty page tracing (shadow op)
On Wed, 2013-07-10 at 19:34 +0400, Eugene Fedotov wrote:> 1) Maybe it reasonable use a new HVM parameter for guest RAM base PFN > instead of new DOMCTL and setup it from libxc through existing > xc_set_hvm_param hypercall?Either that or perhaps reuse the existing set memory map hypercall?> 2) Since a static GUEST_RAM_BASE is also used in libxc zImage loader > (xc_dom_armzimageloader.c) should we parse guest''s DTB in function > xc_dom_parse_zimage_kernel or simply provide dom->rambase_pfn to the > hypervisor?Actually, we should be generating the DTB based on the guest configuration, rather than parsing the DTB to determine the configuration IYSWIM. Ian.
Eugene Fedotov
2013-Jul-11 10:18 UTC
Re: [PATCH RFC v2 8/9] xen/arm: Implement hypercall for dirty page tracing (shadow op)
Hi.> Either that or perhaps reuse the existing set memory map hypercall?The HVM parameter is easy to implement since we have many of them unused on ARM. I''ll think of reusing set_memory_map (this looks like a PV-specific on x86 and compat code for x86-64)> Actually, we should be generating the DTB based on the guest > configuration, rather than parsing the DTB to determine the > configuration IYSWIM.Could guest''s device tree generation be done automatically with some tool? Best regards, Evgeny.
Ian Campbell
2013-Jul-11 11:24 UTC
Re: [PATCH RFC v2 8/9] xen/arm: Implement hypercall for dirty page tracing (shadow op)
On Thu, 2013-07-11 at 14:18 +0400, Eugene Fedotov wrote:> Hi. > > > Either that or perhaps reuse the existing set memory map hypercall? > > The HVM parameter is easy to implement since we have many of them unused > on ARM. I''ll think of reusing set_memory_map (this looks like a > PV-specific on x86 and compat code for x86-64)Right, I menat to cop-opt extend it to support ARM too. The benefit is that it gives you a complete map and not just a base. I don''t know how useful that will be in practice but it seems like it might come up. I suppose you could push a map into an hvm param but it would not be a very good fit for that interface.> > > Actually, we should be generating the DTB based on the guest > > configuration, rather than parsing the DTB to determine the > > configuration IYSWIM. > > Could guest''s device tree generation be done automatically with some tool?My vague plan was to link against libfdt and use that.> > Best regards, > Evgeny. > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
Jaeyong Yoo
2013-Jul-12 00:54 UTC
Re: [PATCH RFC v2 6/9] xen/arm: Add handling write fault for dirty-page tracing
> -----Original Message----- > From: Ian Campbell [mailto:ian.campbell@citrix.com] > Sent: Monday, July 08, 2013 1:53 AM > To: Jaeyong Yoo > Cc: ''Elena Pyatunina''; xen-devel@lists.xen.org > Subject: Re: [Xen-devel] [PATCH RFC v2 6/9] xen/arm: Add handling write > fault for dirty-page tracing > > On Fri, 2013-07-05 at 13:42 +0900, Jaeyong Yoo wrote: > > > > > I think we''d want to establish a linear map of the current guest > > > > > p2m for these purposes so we could also avoid all the > > > > > map_domain_page stuff in the fault path. I think there is enough > > > > > virtual address space left for that even on 32-bit, it''s less > > > > > critical on 64-bit anyway since we are about to have a direct map > of RAM available to us. > > > > > > > > I think it is a good idea to have an ever-mapping linear dirty-bit > > > > map for guest p2m. > > > > > > By Linear map I meant slotting the p2m into the hypervisor''s own > > > page tables (I believe the relevant p2m and pte bits in the entries > > > do not overlap, so this is possible), such that the p2m table > > > entries are available at known virtual addresses and you can locate > > > the L1, L2 and > > > L3 entries corresponding to a given guest physical address with just > > > arithmetic, i.e. something like: > > > http://www.technovelty.org/linux/virtual-linear-page-table.html > > > > Which virtual address range should we map the p2m? Would it be > > xenheap, domheap, or unassigned range (for instance, between frametable > and vmap)? > > Since it needs to take over a bunch of entries in the page tables it > should be its own virtual address range. Looking at xen/include/asm- > arm/config.h between the frametable and vmap seems sensible.I have finished VLPT (virtual-linear-page-table) and it gives much faster speed for dirty-page marking. For original version, it took around 5 usec for marking a dirty-bit, but now it only takes around 1.5 usec since we remove all map/unmap domain pages. Now, I''m thinking of migrating multiple VMs at a moment, then we need to assign virtual memory for VLPT to different each migrating domain. And, I''m thinking to implement similar to vmap assignment. Can you tell me your opinion about this? Jaeyong