Jaeyong Yoo
2013-Aug-01 12:57 UTC
[PATCH v3 00/10] xen/arm: live migration support in arndale board
Hi all, here goes the v3 patch series for live migration on the Arndale board.

This version applies the comments from the v2 patch series, which are mainly:

1) Use just one timer struct for storing vtimer and ptimer in the hvm context: patch 1
2) For dirty-page tracing, use a virtual-linear page table for accessing the guest p2m in xen: patches 6, 7, and 9
3) Rather than using a hard-coded guest memory map in xen, use the one from the toolstack by implementing set/get_memory_map hypercalls: patches 3 and 10

This patch series does not support SMP-guest migration. We are expecting to provide SMP-guest live migration in the version 4 patch series.

We have also tested the stability of the v3 patch series as follows:

- set up two arndale boards with xen (let's say A and B)
- launch 3 domUs at A
- simultaneously migrate the 3 domUs from A to B, back and forth
- we say one round of migration is complete if all 3 domUs migrate from A to B and migrate back from B to A

When we perform the above tests without any load on the domUs, the migration goes to 80~100 rounds. After that, dom0 suddenly stops responding. When we perform with network load (iperf on each domU), the migration goes to 2~3 rounds and dom0 stops responding. After several repeated tests, we gathered the PCs where dom0 gets stuck, and those are:

- _raw_spin_lock (called by try_to_wake_up)
- panic_smp_self_stop
- cpu_v7_dcache_clean_area

I think those bugs are somehow related to live migration, or maybe to other parts that end up in a complicated cause-and-effect chain with live migration. In any case, I would like to look into the details to figure out the cause and possibly fix the bugs. In the meanwhile, I would appreciate your comments on this patch series :)

Best, Jaeyong

Alexey Sokolov, Elena Pyatunina, Evgeny Fedotov, and Nikolay Martyanov (1):
  xen/arm: Implement toolstack for xl restore/save and migrate

Alexey Sokolov (1):
  xen/arm: Implement modify_returncode

Evgeny Fedotov (2):
  xen/arm: Implement set_memory_map hypercall
  xen/arm: Implement get_maximum_gpfn hypercall for arm

Jaeyong Yoo and Evgeny Fedotov (1):
  xen/arm: Implement hvm save and restore

Jaeyong Yoo and Alexey Sokolov (1):
  xen/arm: Add more registers for saving and restoring vcpu registers

Jaeyong Yoo and Elena Pyatunina (2):
  xen/arm: Add handling write fault for dirty-page tracing
  xen/arm: Implement hypercall for dirty page tracing (shadow op)

Jaeyong Yoo (2):
  xen/arm: Implement virtual-linear page table for guest p2m mapping in live migration
  xen/arm: Fixing clear_guest_offset macro

 config/arm32.mk                          |   1 +
 tools/include/xen-foreign/reference.size |   2 +-
 tools/libxc/Makefile                     |   5 +
 tools/libxc/xc_arm_migrate.c             | 686 +++++++++++++++++++++++++++++++
 tools/libxc/xc_dom_arm.c                 |  12 +-
 tools/libxc/xc_domain.c                  |  44 ++
 tools/libxc/xc_resume.c                  |  25 ++
 tools/libxc/xenctrl.h                    |  23 ++
 tools/misc/Makefile                      |   4 +
 xen/arch/arm/Makefile                    |   2 +
 xen/arch/arm/domain.c                    |  44 ++
 xen/arch/arm/domctl.c                    | 137 +++++-
 xen/arch/arm/hvm.c                       | 124 ++++++
 xen/arch/arm/mm.c                        | 284 ++++++++++++-
 xen/arch/arm/p2m.c                       | 307 ++++++++++++++
 xen/arch/arm/save.c                      |  66 +++
 xen/arch/arm/setup.c                     |   3 +
 xen/arch/arm/traps.c                     |  16 +-
 xen/arch/arm/vlpt.c                      | 162 ++++++++
 xen/common/Makefile                      |   2 +
 xen/include/asm-arm/config.h             |   3 +
 xen/include/asm-arm/domain.h             |  13 +
 xen/include/asm-arm/guest_access.h       |   5 +-
 xen/include/asm-arm/hvm/support.h        |  29 ++
 xen/include/asm-arm/mm.h                 |   7 +
 xen/include/asm-arm/p2m.h                |   4 +
 xen/include/asm-arm/processor.h          |   2 +
 xen/include/asm-arm/vlpt.h               |  10 +
 xen/include/public/arch-arm.h            |  35 ++
 xen/include/public/arch-arm/hvm/save.h   |  41 ++
 xen/include/public/memory.h              |  15 +-
 xen/include/xsm/dummy.h                  |   5 +
 xen/include/xsm/xsm.h                    |   5 +
 33 files changed, 2113 insertions(+), 10 deletions(-)
 create mode 100644 tools/libxc/xc_arm_migrate.c
 create mode 100644 xen/arch/arm/save.c
 create mode 100644 xen/arch/arm/vlpt.c
 create mode 100644 xen/include/asm-arm/hvm/support.h
 create mode 100644 xen/include/asm-arm/vlpt.h

-- 
1.8.1.2
Jaeyong Yoo
2013-Aug-01 12:57 UTC
[PATCH v3 01/10] xen/arm: Implement hvm save and restore
Implement save/restore of hvm context hypercall. In hvm context save/restore, we save gic and timer registers. Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com> --- xen/arch/arm/Makefile | 1 + xen/arch/arm/domctl.c | 89 ++++++++++++++++++++++- xen/arch/arm/hvm.c | 124 +++++++++++++++++++++++++++++++++ xen/arch/arm/save.c | 66 ++++++++++++++++++ xen/common/Makefile | 2 + xen/include/asm-arm/hvm/support.h | 29 ++++++++ xen/include/public/arch-arm/hvm/save.h | 41 +++++++++++ 7 files changed, 351 insertions(+), 1 deletion(-) create mode 100644 xen/arch/arm/save.c create mode 100644 xen/include/asm-arm/hvm/support.h diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile index 5ae5831..fa15412 100644 --- a/xen/arch/arm/Makefile +++ b/xen/arch/arm/Makefile @@ -30,6 +30,7 @@ obj-y += vtimer.o obj-y += vpl011.o obj-y += hvm.o obj-y += device.o +obj-y += save.o #obj-bin-y += ....o diff --git a/xen/arch/arm/domctl.c b/xen/arch/arm/domctl.c index 851ee40..cb38e59 100644 --- a/xen/arch/arm/domctl.c +++ b/xen/arch/arm/domctl.c @@ -9,12 +9,99 @@ #include <xen/lib.h> #include <xen/errno.h> #include <xen/sched.h> +#include <xen/hvm/save.h> +#include <xen/guest_access.h> #include <public/domctl.h> long arch_do_domctl(struct xen_domctl *domctl, struct domain *d, XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl) { - return -ENOSYS; + long ret = 0; + bool_t copyback = 0; + + switch ( domctl->cmd ) + { + case XEN_DOMCTL_sethvmcontext: + { + struct hvm_domain_context c = { .size = domctl->u.hvmcontext.size }; + + ret = -ENOMEM; + if ( (c.data = xmalloc_bytes(c.size)) == NULL ) + goto sethvmcontext_out; + + ret = -EFAULT; + if ( copy_from_guest(c.data, domctl->u.hvmcontext.buffer, c.size) != 0 ) + goto sethvmcontext_out; + + domain_pause(d); + ret = hvm_load(d, &c); + domain_unpause(d); + + sethvmcontext_out: + if ( c.data != NULL ) + xfree(c.data); + } + break; + case XEN_DOMCTL_gethvmcontext: + { + struct hvm_domain_context c = { 0 }; + + ret = -EINVAL; + + c.size = hvm_save_size(d); + + if ( guest_handle_is_null(domctl->u.hvmcontext.buffer) ) + { + /* Client is querying for the correct buffer size */ + domctl->u.hvmcontext.size = c.size; + ret = 0; + goto gethvmcontext_out; + } + + /* Check that the client has a big enough buffer */ + ret = -ENOSPC; + if ( domctl->u.hvmcontext.size < c.size ) + { + printk("(gethvmcontext) size error: %d and %d\n", + domctl->u.hvmcontext.size, c.size ); + goto gethvmcontext_out; + } + + /* Allocate our own marshalling buffer */ + ret = -ENOMEM; + if ( (c.data = xmalloc_bytes(c.size)) == NULL ) + { + printk("(gethvmcontext) xmalloc_bytes failed: %d\n", c.size ); + goto gethvmcontext_out; + } + + domain_pause(d); + ret = hvm_save(d, &c); + domain_unpause(d); + + domctl->u.hvmcontext.size = c.cur; + if ( copy_to_guest(domctl->u.hvmcontext.buffer, c.data, c.size) != 0 ) + { + printk("(gethvmcontext) copy to guest failed\n"); + ret = -EFAULT; + } + + gethvmcontext_out: + copyback = 1; + + if ( c.data != NULL ) + xfree(c.data); + } + break; + + default: + return -EINVAL; + } + + if ( copyback && __copy_to_guest(u_domctl, domctl, 1) ) + ret = -EFAULT; + + return ret; } void arch_get_info_guest(struct vcpu *v, vcpu_guest_context_u c) diff --git a/xen/arch/arm/hvm.c b/xen/arch/arm/hvm.c index 471c4cd..9db5b20 100644 --- a/xen/arch/arm/hvm.c +++ b/xen/arch/arm/hvm.c @@ -7,6 +7,7 @@ #include <xsm/xsm.h> +#include <xen/hvm/save.h> #include <public/xen.h> #include <public/hvm/params.h> #include <public/hvm/hvm_op.h> @@ -65,3 +66,126 @@ long do_hvm_op(unsigned long op, 
XEN_GUEST_HANDLE_PARAM(void) arg) return rc; } + +static int gic_save(struct domain *d, hvm_domain_context_t *h) +{ + struct hvm_hw_gic ctxt; + struct vcpu *v; + + /* Save the state of GICs */ + for_each_vcpu( d, v ) + { + ctxt.gic_hcr = v->arch.gic_hcr; + ctxt.gic_vmcr = v->arch.gic_vmcr; + ctxt.gic_apr = v->arch.gic_apr; + + if ( hvm_save_entry(GIC, v->vcpu_id, h, &ctxt) != 0 ) + return 1; + } + return 0; +} + +static int gic_load(struct domain *d, hvm_domain_context_t *h) +{ + int vcpuid; + struct hvm_hw_gic ctxt; + struct vcpu *v; + + /* Which vcpu is this? */ + vcpuid = hvm_load_instance(h); + if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL ) + { + dprintk(XENLOG_G_ERR, "HVM restore: dom%u has no vcpu%u\n", + d->domain_id, vcpuid); + return -EINVAL; + } + + if ( hvm_load_entry(GIC, h, &ctxt) != 0 ) + return -EINVAL; + + v->arch.gic_hcr = ctxt.gic_hcr; + v->arch.gic_vmcr = ctxt.gic_vmcr; + v->arch.gic_apr = ctxt.gic_apr; + + return 0; +} + +HVM_REGISTER_SAVE_RESTORE(GIC, gic_save, gic_load, 1, HVMSR_PER_VCPU); + +static int timer_save(struct domain *d, hvm_domain_context_t *h) +{ + struct hvm_hw_timer ctxt; + struct vcpu *v; + struct vtimer *t; + int i; + + /* Save the state of vtimer and ptimer */ + for_each_vcpu( d, v ) + { + t = &v->arch.virt_timer; + for ( i = 0; i < 2; i++ ) + { + ctxt.cval = t->cval; + ctxt.ctl = t->ctl; + ctxt.vtb_offset = i ? d->arch.phys_timer_base.offset : + d->arch.virt_timer_base.offset; + ctxt.type = i; + if ( hvm_save_entry(A15_TIMER, v->vcpu_id, h, &ctxt) != 0 ) + return 1; + t = &v->arch.phys_timer; + } + } + + return 0; +} + +static int timer_load(struct domain *d, hvm_domain_context_t *h) +{ + int vcpuid; + struct hvm_hw_timer ctxt; + struct vcpu *v; + struct vtimer *t = NULL; + + /* Which vcpu is this? */ + vcpuid = hvm_load_instance(h); + + if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL ) + { + dprintk(XENLOG_G_ERR, "HVM restore: dom%u has no vcpu%u\n", + d->domain_id, vcpuid); + return -EINVAL; + } + + if ( hvm_load_entry(A15_TIMER, h, &ctxt) != 0 ) + return -EINVAL; + + if ( ctxt.type == TIMER_TYPE_VIRT ) + { + t = &v->arch.virt_timer; + d->arch.virt_timer_base.offset = ctxt.vtb_offset; + + } + else + { + t = &v->arch.phys_timer; + d->arch.phys_timer_base.offset = ctxt.vtb_offset; + } + + t->cval = ctxt.cval; + t->ctl = ctxt.ctl; + t->v = v; + + return 0; +} + +HVM_REGISTER_SAVE_RESTORE(A15_TIMER, timer_save, timer_load, 2, HVMSR_PER_VCPU); + +/* + * Local variables: + * mode: C + * c-file-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/arch/arm/save.c b/xen/arch/arm/save.c new file mode 100644 index 0000000..c923910 --- /dev/null +++ b/xen/arch/arm/save.c @@ -0,0 +1,66 @@ +/* + * hvm/save.c: Save and restore HVM guest''s emulated hardware state for ARM. + * + * Copyright (c) 2013, Samsung Electronics. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. 
+ * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple + * Place - Suite 330, Boston, MA 02111-1307 USA. + */ + +#include <asm/hvm/support.h> +#include <public/hvm/save.h> + +void arch_hvm_save(struct domain *d, struct hvm_save_header *hdr) +{ + hdr->cpuid = READ_SYSREG32(MIDR_EL1); +} + +int arch_hvm_load(struct domain *d, struct hvm_save_header *hdr) +{ + uint32_t cpuid; + + if ( hdr->magic != HVM_FILE_MAGIC ) + { + printk(XENLOG_G_ERR "HVM%d restore: bad magic number %#"PRIx32"\n", + d->domain_id, hdr->magic); + return -1; + } + + if ( hdr->version != HVM_FILE_VERSION ) + { + printk(XENLOG_G_ERR "HVM%d restore: unsupported version %u\n", + d->domain_id, hdr->version); + return -1; + } + + cpuid = READ_SYSREG32(MIDR_EL1); + if ( hdr->cpuid != cpuid ) + { + printk(XENLOG_G_INFO "HVM%d restore: VM saved on one CPU " + "(%#"PRIx32") and restored on another (%#"PRIx32").\n", + d->domain_id, hdr->cpuid, cpuid); + return -1; + } + + return 0; +} + +/* + * Local variables: + * mode: C + * c-file-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/common/Makefile b/xen/common/Makefile index 0dc2050..956cf29 100644 --- a/xen/common/Makefile +++ b/xen/common/Makefile @@ -60,6 +60,8 @@ subdir-$(CONFIG_COMPAT) += compat subdir-$(x86_64) += hvm +subdir-$(CONFIG_ARM) += hvm + subdir-$(coverage) += gcov subdir-y += libelf diff --git a/xen/include/asm-arm/hvm/support.h b/xen/include/asm-arm/hvm/support.h new file mode 100644 index 0000000..8311f2f --- /dev/null +++ b/xen/include/asm-arm/hvm/support.h @@ -0,0 +1,29 @@ +/* + * support.h: HVM support routines used by ARMv7 VE. + * + * Copyright (c) 2012, Citrix Systems + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple + * Place - Suite 330, Boston, MA 02111-1307 USA. 
+ */ + +#ifndef __ASM_ARM_HVM_SUPPORT_H__ +#define __ASM_ARM_HVM_SUPPORT_H__ + +#include <xen/types.h> +#include <public/hvm/ioreq.h> +#include <xen/sched.h> +#include <xen/hvm/save.h> +#include <asm/processor.h> + +#endif /* __ASM_ARM_HVM_SUPPORT_H__ */ diff --git a/xen/include/public/arch-arm/hvm/save.h b/xen/include/public/arch-arm/hvm/save.h index 75b8e65..26f8755 100644 --- a/xen/include/public/arch-arm/hvm/save.h +++ b/xen/include/public/arch-arm/hvm/save.h @@ -26,6 +26,47 @@ #ifndef __XEN_PUBLIC_HVM_SAVE_ARM_H__ #define __XEN_PUBLIC_HVM_SAVE_ARM_H__ +#define HVM_FILE_MAGIC 0x92385520 +#define HVM_FILE_VERSION 0x00000001 + +struct hvm_save_header +{ + uint32_t magic; /* Must be HVM_FILE_MAGIC */ + uint32_t version; /* File format version */ + uint64_t changeset; /* Version of Xen that saved this file */ + uint32_t cpuid; /* MIDR_EL1 on the saving machine */ +}; + +DECLARE_HVM_SAVE_TYPE(HEADER, 1, struct hvm_save_header); + +struct hvm_hw_gic +{ + uint32_t gic_hcr; + uint32_t gic_vmcr; + uint32_t gic_apr; +}; + +DECLARE_HVM_SAVE_TYPE(GIC, 2, struct hvm_hw_gic); + +#define TIMER_TYPE_VIRT 0 +#define TIMER_TYPE_PHYS 1 + +struct hvm_hw_timer +{ + uint64_t vtb_offset; + + uint32_t ctl; + uint64_t cval; + uint32_t type; +}; + +DECLARE_HVM_SAVE_TYPE(A15_TIMER, 3, struct hvm_hw_timer); + +/* + * Largest type-code in use + */ +#define HVM_SAVE_CODE_MAX 3 + #endif /* -- 1.8.1.2
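[Note: the XEN_DOMCTL_gethvmcontext plumbing above follows the usual two-call convention: a first call with a NULL buffer only reports the required size, and a second call marshals the GIC and timer records. For illustration only, a minimal sketch of a toolstack caller going through libxc's existing xc_domain_hvmcontext_get() wrapper, assumed to be built for ARM as it already is for x86; error handling trimmed, not part of the patch:

#include <stdlib.h>
#include <xenctrl.h>

/* Sketch: fetch a domain's HVM context blob, as a save path would. */
static uint8_t *fetch_hvm_context(xc_interface *xch, uint32_t domid,
                                  uint32_t *size_out)
{
    int ret;
    uint8_t *buf;

    /* First call with a NULL buffer returns the required size. */
    ret = xc_domain_hvmcontext_get(xch, domid, NULL, 0);
    if ( ret < 0 )
        return NULL;

    buf = malloc(ret);
    if ( !buf )
        return NULL;

    /* Second call fills the buffer with the GIC and timer records. */
    if ( xc_domain_hvmcontext_get(xch, domid, buf, ret) < 0 )
    {
        free(buf);
        return NULL;
    }

    *size_out = ret;
    return buf;
}
]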
Jaeyong Yoo
2013-Aug-01 12:57 UTC
[PATCH v3 02/10] xen/arm: Add more registers for saving and restoring vcpu registers
Add more registers for saving and restoring vcpu registers for live migration. Those registers are selected from the registers stored when vcpu context switching. And, make it build-able by fixing vcpu_guest_context size in reference.size. Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com> --- tools/include/xen-foreign/reference.size | 2 +- xen/arch/arm/domain.c | 34 +++++++++++++++++++++++++++++++ xen/arch/arm/domctl.c | 35 ++++++++++++++++++++++++++++++++ xen/include/public/arch-arm.h | 35 ++++++++++++++++++++++++++++++++ 4 files changed, 105 insertions(+), 1 deletion(-) diff --git a/tools/include/xen-foreign/reference.size b/tools/include/xen-foreign/reference.size index de36455..dfd691c 100644 --- a/tools/include/xen-foreign/reference.size +++ b/tools/include/xen-foreign/reference.size @@ -5,7 +5,7 @@ start_info | - - 1112 1168 trap_info | - - 8 16 cpu_user_regs | - - 68 200 vcpu_guest_core_regs | 304 304 - - -vcpu_guest_context | 336 336 2800 5168 +vcpu_guest_context | 440 440 2800 5168 arch_vcpu_info | - - 24 16 vcpu_time_info | - - 32 32 vcpu_info | - - 64 64 diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index 4e9cece..4fab443 100644 --- a/xen/arch/arm/domain.c +++ b/xen/arch/arm/domain.c @@ -597,6 +597,40 @@ int arch_set_info_guest( v->arch.ttbr1 = ctxt->ttbr1; v->arch.ttbcr = ctxt->ttbcr; + v->arch.dacr = ctxt->dacr; + v->arch.ifar = ctxt->ifar; + v->arch.ifsr = ctxt->ifsr; + v->arch.dfar = ctxt->dfar; + v->arch.dfsr = ctxt->dfsr; + +#ifdef CONFIG_ARM_32 + v->arch.mair0 = ctxt->mair0; + v->arch.mair1 = ctxt->mair1; +#else + v->arch.mair = ctxt->mair; +#endif + + /* Control Registers */ + v->arch.actlr = ctxt->actlr; + v->arch.cpacr = ctxt->cpacr; + + v->arch.contextidr = ctxt->contextidr; + v->arch.tpidr_el0 = ctxt->tpidr_el0; + v->arch.tpidr_el1 = ctxt->tpidr_el1; + v->arch.tpidrro_el0 = ctxt->tpidrro_el0; + + /* CP 15 */ + v->arch.csselr = ctxt->csselr; + + v->arch.afsr0 = ctxt->afsr0; + v->arch.afsr1 = ctxt->afsr1; + v->arch.vbar = ctxt->vbar; + v->arch.par = ctxt->par; + v->arch.teecr = ctxt->teecr; + v->arch.teehbr = ctxt->teehbr; + v->arch.joscr = ctxt->joscr; + v->arch.jmcr = ctxt->jmcr; + v->is_initialised = 1; if ( ctxt->flags & VGCF_online ) diff --git a/xen/arch/arm/domctl.c b/xen/arch/arm/domctl.c index cb38e59..9cfb48a 100644 --- a/xen/arch/arm/domctl.c +++ b/xen/arch/arm/domctl.c @@ -116,6 +116,41 @@ void arch_get_info_guest(struct vcpu *v, vcpu_guest_context_u c) ctxt->ttbr1 = v->arch.ttbr1; ctxt->ttbcr = v->arch.ttbcr; + ctxt->dacr = v->arch.dacr; + ctxt->ifar = v->arch.ifar; + ctxt->ifsr = v->arch.ifsr; + ctxt->dfar = v->arch.dfar; + ctxt->dfsr = v->arch.dfsr; + +#ifdef CONFIG_ARM_32 + ctxt->mair0 = v->arch.mair0; + ctxt->mair1 = v->arch.mair1; +#else + ctxt->mair = v->arch.mair; +#endif + + /* Control Registers */ + ctxt->actlr = v->arch.actlr; + ctxt->sctlr = v->arch.sctlr; + ctxt->cpacr = v->arch.cpacr; + + ctxt->contextidr = v->arch.contextidr; + ctxt->tpidr_el0 = v->arch.tpidr_el0; + ctxt->tpidr_el1 = v->arch.tpidr_el1; + ctxt->tpidrro_el0 = v->arch.tpidrro_el0; + + /* CP 15 */ + ctxt->csselr = v->arch.csselr; + + ctxt->afsr0 = v->arch.afsr0; + ctxt->afsr1 = v->arch.afsr1; + ctxt->vbar = v->arch.vbar; + ctxt->par = v->arch.par; + ctxt->teecr = v->arch.teecr; + ctxt->teehbr = v->arch.teehbr; + ctxt->joscr = v->arch.joscr; + ctxt->jmcr = v->arch.jmcr; + if ( !test_bit(_VPF_down, &v->pause_flags) ) ctxt->flags |= VGCF_online; } diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h index cbd53a9..28388ce 100644 --- 
a/xen/include/public/arch-arm.h +++ b/xen/include/public/arch-arm.h @@ -191,6 +191,41 @@ struct vcpu_guest_context { uint32_t sctlr, ttbcr; uint64_t ttbr0, ttbr1; + uint32_t ifar, dfar; + uint32_t ifsr, dfsr; + uint32_t dacr; + uint64_t par; + +#ifdef CONFIG_ARM_32 + uint32_t mair0, mair1; + uint32_t tpidr_el0; + uint32_t tpidr_el1; + uint32_t tpidrro_el0; + uint32_t vbar; +#else + uint64_t mair; + uint64_t tpidr_el0; + uint64_t tpidr_el1; + uint64_t tpidrro_el0; + uint64_t vbar; +#endif + + /* Control Registers */ + uint32_t actlr; + uint32_t cpacr; + + uint32_t afsr0, afsr1; + + uint32_t contextidr; + + uint32_t teecr, teehbr; /* ThumbEE, 32-bit guests only */ + +#ifdef CONFIG_ARM_32 + uint32_t joscr, jmcr; +#endif + + /* CP 15 */ + uint32_t csselr; }; typedef struct vcpu_guest_context vcpu_guest_context_t; DEFINE_XEN_GUEST_HANDLE(vcpu_guest_context_t); -- 1.8.1.2
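[Note: since the new fields round-trip through vcpu_guest_context, one quick way to sanity-check the plumbing from dom0 is to read a vcpu's context back and inspect a couple of the added registers. A hypothetical test snippet (not part of the series, 32-bit tools assumed) could look like:

#include <inttypes.h>
#include <stdio.h>
#include <xenctrl.h>

/* Sketch: read vcpu0's context back and print two of the newly
 * transferred registers (TTBR0 and VBAR, 32-bit guest). */
static int dump_vcpu0_regs(xc_interface *xch, uint32_t domid)
{
    vcpu_guest_context_any_t ctxt;

    if ( xc_vcpu_getcontext(xch, domid, 0, &ctxt) )
        return -1;

    printf("dom%"PRIu32" vcpu0: ttbr0=%#"PRIx64" vbar=%#"PRIx32"\n",
           domid, ctxt.c.ttbr0, ctxt.c.vbar);
    return 0;
}
]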
Jaeyong Yoo
2013-Aug-01 12:57 UTC
[PATCH v3 03/10] xen/arm: Implement set_memory_map hypercall
From: Evgeny Fedotov <e.fedotov@samsung.com>

When creating a domU in the toolstack, pass the guest memory map info to the hypervisor; the hypervisor stores that info in arch_domain for later use.

Signed-off-by: Evgeny Fedotov <e.fedotov@samsung.com>
---
 tools/libxc/xc_dom_arm.c     | 12 +++++++-
 tools/libxc/xc_domain.c      | 44 ++++++++++++++++++++++++++++
 tools/libxc/xenctrl.h        | 23 +++++++++++++++
 xen/arch/arm/domain.c        |  3 ++
 xen/arch/arm/mm.c            | 68 ++++++++++++++++++++++++++++++++++++++++++++
 xen/include/asm-arm/domain.h |  2 ++
 xen/include/asm-arm/mm.h     |  1 +
 xen/include/public/memory.h  | 15 ++++++++--
 xen/include/xsm/dummy.h      |  5 ++++
 xen/include/xsm/xsm.h        |  5 ++++
 10 files changed, 175 insertions(+), 3 deletions(-)

diff --git a/tools/libxc/xc_dom_arm.c b/tools/libxc/xc_dom_arm.c
index df59ffb..20c9095 100644
--- a/tools/libxc/xc_dom_arm.c
+++ b/tools/libxc/xc_dom_arm.c
@@ -166,6 +166,7 @@ int arch_setup_meminit(struct xc_dom_image *dom)
 {
     int rc;
     xen_pfn_t pfn, allocsz, i;
+    struct dt_mem_info memmap;
 
     dom->shadow_enabled = 1;
 
@@ -191,7 +192,16 @@ int arch_setup_meminit(struct xc_dom_image *dom)
                                           0, 0, &dom->p2m_host[i]);
     }
 
-    return 0;
+    /* setup guest memory map */
+    memmap.nr_banks = 2;
+    memmap.bank[0].start = (dom->rambase_pfn << PAGE_SHIFT_ARM);
+    memmap.bank[0].size = (dom->total_pages << PAGE_SHIFT_ARM);
+    /* The end of main memory: magic pages */
+    memmap.bank[1].start = memmap.bank[0].start + memmap.bank[0].size;
+    memmap.bank[1].size = NR_MAGIC_PAGES << PAGE_SHIFT_ARM;
+
+    return xc_domain_set_memory_map(dom->xch, dom->guest_domid, &memmap);
+
 }
 
 int arch_setup_bootearly(struct xc_dom_image *dom)

diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 3257e2a..10627f7 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -644,7 +644,51 @@ int xc_domain_set_memmap_limit(xc_interface *xch,
     return -1;
 }
 #endif
+#if defined(__arm__)
+int xc_domain_get_memory_map(xc_interface *xch,
+                             uint32_t domid,
+                             struct dt_mem_info *map)
+{
+    int rc;
+    struct xen_arm_memory_map fmap = {
+        .domid = domid
+    };
+
+    DECLARE_HYPERCALL_BOUNCE(map, sizeof(struct dt_mem_info),
+                             XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+
+    if ( !map || xc_hypercall_bounce_pre(xch, map) )
+        return -1;
+    set_xen_guest_handle(fmap.buffer, map);
+
+    rc = do_memory_op(xch, XENMEM_memory_map, &fmap, sizeof(fmap));
+
+    xc_hypercall_bounce_post(xch, map);
+
+    return rc;
+}
+
+int xc_domain_set_memory_map(xc_interface *xch,
+                             uint32_t domid,
+                             struct dt_mem_info *map)
+{
+    int rc;
+    struct xen_arm_memory_map fmap = {
+        .domid = domid
+    };
+
+    DECLARE_HYPERCALL_BOUNCE(map, sizeof(struct dt_mem_info),
+                             XC_HYPERCALL_BUFFER_BOUNCE_IN);
+
+    if ( !map || xc_hypercall_bounce_pre(xch, map) )
+        return -1;
+    set_xen_guest_handle(fmap.buffer, map);
+
+    rc = do_memory_op(xch, XENMEM_set_memory_map, &fmap, sizeof(fmap));
+
+    xc_hypercall_bounce_post(xch, map);
+
+    return rc;
+}
+#endif
 
 int xc_domain_set_time_offset(xc_interface *xch,
                               uint32_t domid,
                               int32_t time_offset_seconds)

diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index 388a9c3..e12d49c 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -1110,6 +1110,29 @@ int xc_get_machine_memory_map(xc_interface *xch,
                               struct e820entry entries[],
                               uint32_t max_entries);
 #endif
+
+#if defined(__arm__)
+#define NR_MEM_BANKS 8
+typedef uint64_t paddr_t;
+
+struct membank {
+    paddr_t start;
+    paddr_t size;
+};
+
+struct dt_mem_info {
+    int nr_banks;
+    struct membank bank[NR_MEM_BANKS];
+};
+
+int xc_domain_set_memory_map(xc_interface *xch,
+                             uint32_t domid,
+                             struct dt_mem_info *map);
+int xc_domain_get_memory_map(xc_interface *xch,
+                             uint32_t domid,
+                             struct dt_mem_info *map);
+#endif
+
 int xc_domain_set_time_offset(xc_interface *xch,
                               uint32_t domid,
                               int32_t time_offset_seconds);

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 4fab443..e9cfc81 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -509,6 +509,9 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
     /* Default the virtual ID to match the physical */
     d->arch.vpidr = boot_cpu_data.midr.bits;
 
+    spin_lock_init(&d->arch.map_lock);
+    d->arch.map_domain.nr_banks = 0;
+
     clear_page(d->shared_info);
     share_xen_page_with_guest(
         virt_to_page(d->shared_info), d, XENSHARE_writable);

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index f301e65..3c83447 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -998,6 +998,74 @@ long arch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg)
         return rc;
     }
 
+    case XENMEM_set_memory_map:
+    {
+        struct xen_arm_memory_map fmap;
+        struct domain *d;
+        struct dt_mem_info info;
+
+        if ( copy_from_guest(&fmap, arg, 1) )
+            return -EFAULT;
+
+        if ( copy_from_guest(&info, fmap.buffer, 1) )
+        {
+            return -EFAULT;
+        }
+
+        if ( info.nr_banks > NR_MEM_BANKS )
+            return -EINVAL;
+
+        d = rcu_lock_domain_by_any_id(fmap.domid);
+        if ( d == NULL )
+            return -ESRCH;
+
+        rc = xsm_domain_memory_map(XSM_TARGET, d);
+        if ( rc )
+        {
+            rcu_unlock_domain(d);
+            return rc;
+        }
+
+        spin_lock(&d->arch.map_lock);
+        d->arch.map_domain = info;
+        spin_unlock(&d->arch.map_lock);
+
+        rcu_unlock_domain(d);
+        return rc;
+    }
+
+    case XENMEM_memory_map:
+    {
+        /* get the domain's memory map as it was stored */
+        struct xen_arm_memory_map fmap;
+        struct domain *d;
+        struct dt_mem_info info;
+
+        if ( copy_from_guest(&fmap, arg, 1) )
+            return -EFAULT;
+
+        d = rcu_lock_domain_by_any_id(fmap.domid);
+        if ( d == NULL )
+            return -ESRCH;
+
+        spin_lock(&d->arch.map_lock);
+        info = d->arch.map_domain;
+        spin_unlock(&d->arch.map_lock);
+
+        if ( copy_to_guest(fmap.buffer, &info, 1) )
+        {
+            rcu_unlock_domain(d);
+            return -EFAULT;
+        }
+
+        if ( copy_to_guest(arg, &fmap, 1) )
+        {
+            rcu_unlock_domain(d);
+            return -EFAULT;
+        }
+
+        rcu_unlock_domain(d);
+        return 0;
+    }
+
     /* XXX: memsharing not working yet */
     case XENMEM_get_sharing_shared_pages:
     case XENMEM_get_sharing_freed_pages:

diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
index 89f88f6..0c80c65 100644
--- a/xen/include/asm-arm/domain.h
+++ b/xen/include/asm-arm/domain.h
@@ -110,6 +110,8 @@ struct arch_domain
         spinlock_t lock;
     } uart0;
 
+    struct dt_mem_info map_domain;
+    spinlock_t map_lock;
 }  __cacheline_aligned;
 
 struct arch_vcpu

diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
index 5e7c5a3..404ec4d 100644
--- a/xen/include/asm-arm/mm.h
+++ b/xen/include/asm-arm/mm.h
@@ -5,6 +5,7 @@
 #include <xen/kernel.h>
 #include <asm/page.h>
 #include <public/xen.h>
+#include <xen/device_tree.h>
 
 #if defined(CONFIG_ARM_32)
 # include <asm/arm32/io.h>

diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 7a26dee..264fb8f 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -283,9 +283,12 @@ DEFINE_XEN_GUEST_HANDLE(xen_remove_from_physmap_t);
 /*#define XENMEM_translate_gpfn_list  8*/
 
 /*
- * Returns the pseudo-physical memory map as it was when the domain
+ * x86: returns the pseudo-physical memory map as it was when the domain
  * was started (specified by XENMEM_set_memory_map).
  * arg == addr of xen_memory_map_t.
+ * ARM: returns the pseudo-physical memory map as it was set
+ * (specified by XENMEM_set_memory_map).
+ * arg == addr of xen_arm_memory_map_t.
  */
 #define XENMEM_memory_map           9
 struct xen_memory_map {
@@ -315,7 +318,8 @@ DEFINE_XEN_GUEST_HANDLE(xen_memory_map_t);
 /*
  * Set the pseudo-physical memory map of a domain, as returned by
  * XENMEM_memory_map.
- * arg == addr of xen_foreign_memory_map_t.
+ * x86: arg == addr of xen_foreign_memory_map_t.
+ * ARM: arg == addr of xen_arm_memory_map_t
  */
 #define XENMEM_set_memory_map       13
 struct xen_foreign_memory_map {
@@ -325,6 +329,13 @@ struct xen_foreign_memory_map {
 typedef struct xen_foreign_memory_map xen_foreign_memory_map_t;
 DEFINE_XEN_GUEST_HANDLE(xen_foreign_memory_map_t);
 
+struct xen_arm_memory_map {
+    domid_t domid;
+    XEN_GUEST_HANDLE(void) buffer;
+};
+typedef struct xen_arm_memory_map xen_arm_memory_map_t;
+DEFINE_XEN_GUEST_HANDLE(xen_arm_memory_map_t);
+
 #define XENMEM_set_pod_target       16
 #define XENMEM_get_pod_target       17
 struct xen_pod_target {

diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index cc0a5a8..fef9904 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -626,4 +626,9 @@ static XSM_INLINE int xsm_map_gmfn_foreign(XSM_DEFAULT_ARG struct domain *d, str
     XSM_ASSERT_ACTION(XSM_TARGET);
     return xsm_default_action(action, d, t);
 }
+static XSM_INLINE int xsm_domain_memory_map(XSM_DEFAULT_ARG struct domain *d)
+{
+    XSM_ASSERT_ACTION(XSM_TARGET);
+    return xsm_default_action(action, current->domain, d);
+}
 #endif

diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index 1939453..9764011 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -625,6 +625,11 @@ static inline int xsm_map_gmfn_foreign (struct domain *d, struct domain *t)
 {
     return xsm_ops->map_gmfn_foreign(d, t);
 }
 
+static inline int xsm_domain_memory_map(xsm_default_t def, struct domain *d)
+{
+    return xsm_ops->domain_memory_map(d);
+}
+
 #endif /* CONFIG_ARM */
 #endif /* XSM_NO_WRAPPERS */
-- 
1.8.1.2
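[Note: with the two wrappers above in place, the toolstack side is symmetric: build a dt_mem_info describing the guest's RAM banks, push it with xc_domain_set_memory_map(), and read it back later with xc_domain_get_memory_map(). For illustration, a minimal caller of the new interface (hypothetical bank values, mirroring what arch_setup_meminit above does; not part of the patch):

#include <xenctrl.h>

/* Sketch: describe a single 512MB RAM bank at 0x80000000 and hand
 * the map to the hypervisor for later use (e.g. migration). */
static int set_guest_map(xc_interface *xch, uint32_t domid)
{
    struct dt_mem_info memmap = {
        .nr_banks = 1,
        .bank[0] = {
            .start = 0x80000000ULL,  /* guest RAM base */
            .size  = 512ULL << 20,   /* 512MB */
        },
    };

    return xc_domain_set_memory_map(xch, domid, &memmap);
}
]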
Jaeyong Yoo
2013-Aug-01 12:57 UTC
[PATCH v3 04/10] xen/arm: Implement get_maximum_gpfn hypercall for arm
From: Evgeny Fedotov <e.fedotov@samsung.com>

By using the memory map info in arch_domain (from the set_memory_map hypercall), implement the get_maximum_gpfn hypercall.

Signed-off-by: Evgeny Fedotov <e.fedotov@samsung.com>
---
 xen/arch/arm/mm.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 3c83447..9d5d3e0 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -762,7 +762,16 @@ int page_is_ram_type(unsigned long mfn, unsigned long mem_type)
 
 unsigned long domain_get_maximum_gpfn(struct domain *d)
 {
-    return -ENOSYS;
+    xen_pfn_t max = 0;
+    int nr_banks;
+
+    spin_lock(&d->arch.map_lock);
+    nr_banks = d->arch.map_domain.nr_banks;
+    if ( nr_banks )
+        max = (d->arch.map_domain.bank[nr_banks - 1].start +
+               d->arch.map_domain.bank[nr_banks - 1].size) >> PAGE_SHIFT;
+    spin_unlock(&d->arch.map_lock);
+    return (unsigned long) max;
 }
 
 void share_xen_page_with_guest(struct page_info *page,
-- 
1.8.1.2
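[Note: as a worked example, with a single bank at 0x80000000 of size 512MB the last bank ends at 0xA0000000, so the hypercall returns 0xA0000000 >> PAGE_SHIFT = 0xa0000. Patch 10 below uses exactly this value to bound its page-send loop; a sketch of that use (illustration only):

#include <xenctrl.h>

/* Sketch: bound the pfn range a save loop must walk.
 * With the example bank above, end == 0xa0000. */
static xen_pfn_t guest_pfn_span(xc_interface *xch, uint32_t domid,
                                xen_pfn_t ram_base_pfn)
{
    xen_pfn_t end = xc_domain_maximum_gpfn(xch, domid);

    return end - ram_base_pfn; /* number of pfns to consider */
}
]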
Jaeyong Yoo
2013-Aug-01 12:57 UTC
[PATCH v3 05/10] xen/arm: Implement modify_returncode
From: Alexey Sokolov <sokolov.a@samsung.com>

Make sched_op in do_suspend (drivers/xen/manage.c) return 0 on the success of suspend.

Signed-off-by: Alexey Sokolov <sokolov.a@samsung.com>
---
 tools/libxc/xc_resume.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/tools/libxc/xc_resume.c b/tools/libxc/xc_resume.c
index 1c43ec6..4aeda6a 100644
--- a/tools/libxc/xc_resume.c
+++ b/tools/libxc/xc_resume.c
@@ -87,6 +87,31 @@ static int modify_returncode(xc_interface *xch, uint32_t domid)
     return 0;
 }
 
+#elif defined(__arm__)
+
+static int modify_returncode(xc_interface *xch, uint32_t domid)
+{
+    vcpu_guest_context_any_t ctxt;
+    xc_dominfo_t info;
+    int rc;
+
+    if ( xc_domain_getinfo(xch, domid, 1, &info) != 1 )
+    {
+        PERROR("Could not get domain info");
+        return -1;
+    }
+
+    if ( (rc = xc_vcpu_getcontext(xch, domid, 0, &ctxt)) != 0 )
+        return rc;
+
+    ctxt.c.user_regs.r0_usr = 1;
+
+    if ( (rc = xc_vcpu_setcontext(xch, domid, 0, &ctxt)) != 0 )
+        return rc;
+
+    return 0;
+}
+
 #else
 
 static int modify_returncode(xc_interface *xch, uint32_t domid)
-- 
1.8.1.2
Jaeyong Yoo
2013-Aug-01 12:57 UTC
[PATCH v3 06/10] xen/arm: Implement virtual-linear page table for guest p2m mapping in live migration
Allocate and free the xen''s virtual memory for virtual-linear page table of guest p2m. Slotting the guest p2m into the hypervisor''s own page tables, such that the guest p2m table entries are available at known virtual addresses. For more info, see: http://www.technovelty.org/linux/virtual-linear-page-table.html This function is used in dirty-page tracing: when domU write-fault is trapped by xen, xen can immediately locate the p2m entry of the write-fault. Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com> --- xen/arch/arm/Makefile | 1 + xen/arch/arm/setup.c | 3 + xen/arch/arm/vlpt.c | 162 +++++++++++++++++++++++++++++++++++++++++++ xen/include/asm-arm/config.h | 3 + xen/include/asm-arm/vlpt.h | 10 +++ 5 files changed, 179 insertions(+) create mode 100644 xen/arch/arm/vlpt.c create mode 100644 xen/include/asm-arm/vlpt.h diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile index fa15412..86165e7 100644 --- a/xen/arch/arm/Makefile +++ b/xen/arch/arm/Makefile @@ -31,6 +31,7 @@ obj-y += vpl011.o obj-y += hvm.o obj-y += device.o obj-y += save.o +obj-y += vlpt.o #obj-bin-y += ....o diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c index 1ec5e38..27f0cca 100644 --- a/xen/arch/arm/setup.c +++ b/xen/arch/arm/setup.c @@ -35,6 +35,7 @@ #include <xen/cpu.h> #include <xen/pfn.h> #include <xen/vmap.h> +#include <asm/vlpt.h> #include <asm/page.h> #include <asm/current.h> #include <asm/setup.h> @@ -447,6 +448,8 @@ void __init start_xen(unsigned long boot_phys_offset, dt_unflatten_host_device_tree(); dt_irq_xlate = gic_irq_xlate; + vlpt_init(); + dt_uart_init(); console_init_preirq(); diff --git a/xen/arch/arm/vlpt.c b/xen/arch/arm/vlpt.c new file mode 100644 index 0000000..49b1887 --- /dev/null +++ b/xen/arch/arm/vlpt.c @@ -0,0 +1,162 @@ +#ifdef VIRT_LIN_P2M_START +#include <xen/bitmap.h> +#include <xen/cache.h> +#include <xen/init.h> +#include <xen/mm.h> +#include <xen/pfn.h> +#include <xen/spinlock.h> +#include <xen/types.h> +#include <asm/vlpt.h> +#include <asm/page.h> +#include <asm/early_printk.h> + +static DEFINE_SPINLOCK(vlpt_lock); +static void *__read_mostly vlpt_base; +#define vlpt_bitmap ((unsigned long *)vlpt_base) +/* highest allocated bit in the bitmap */ +static unsigned int __read_mostly vlpt_top; +/* total number of bits in the bitmap */ +static unsigned int __read_mostly vlpt_end; +/* lowest known clear bit in the bitmap */ +static unsigned int vlpt_low; + +void __init vlpt_init(void) +{ + unsigned int i, nr; + unsigned long va; + + vlpt_base = (void *)VIRT_LIN_P2M_START; + vlpt_end = PFN_DOWN((void *)VIRT_LIN_P2M_END - vlpt_base); + vlpt_low = PFN_UP((vlpt_end + 7) / 8); + nr = PFN_UP((vlpt_low + 7) / 8); + vlpt_top = nr * PAGE_SIZE * 8; + + for ( i = 0, va = (unsigned long)vlpt_bitmap; i < nr; ++i, va += PAGE_SIZE ) + { + struct page_info *pg = alloc_domheap_page(NULL, 0); + + map_pages_to_xen(va, page_to_mfn(pg), 1, PAGE_HYPERVISOR); + clear_page((void *)va); + } + bitmap_fill(vlpt_bitmap, vlpt_low); + /* Populate page tables for the bitmap if necessary. 
*/ + map_pages_to_xen(va, 0, vlpt_low - nr, MAP_SMALL_PAGES); +} + +void *vlpt_alloc(unsigned int nr, unsigned int align) +{ + unsigned int start, bit; + + if ( !align ) + align = 1; + else if ( align & (align - 1) ) + align &= -align; + + spin_lock(&vlpt_lock); + for ( ; ; ) + { + struct page_info *pg; + + ASSERT(!test_bit(vlpt_low, vlpt_bitmap)); + for ( start = vlpt_low; ; ) + { + bit = find_next_bit(vlpt_bitmap, vlpt_top, start + 1); + if ( bit > vlpt_top ) + bit = vlpt_top; + /* + * Note that this skips the first bit, making the + * corresponding page a guard one. + */ + start = (start + align) & ~(align - 1); + if ( start + nr <= bit ) + break; + start = bit < vlpt_top ? + find_next_zero_bit(vlpt_bitmap, vlpt_top, bit + 1) : bit; + if ( start >= vlpt_top ) + break; + } + + if ( start < vlpt_top ) + break; + + spin_unlock(&vlpt_lock); + + if ( vlpt_top >= vlpt_end ) + return NULL; + + pg = alloc_domheap_page(NULL, 0); + if ( !pg ) + return NULL; + + spin_lock(&vlpt_lock); + + if ( start >= vlpt_top ) + { + unsigned long va = (unsigned long)vlpt_bitmap + vlpt_top / 8; + + if ( !map_pages_to_xen(va, page_to_mfn(pg), 1, PAGE_HYPERVISOR) ) + { + clear_page((void *)va); + vlpt_top += PAGE_SIZE * 8; + if ( vlpt_top > vlpt_end ) + vlpt_top = vlpt_end; + continue; + } + } + + free_domheap_page(pg); + + if ( start >= vlpt_top ) + { + spin_unlock(&vlpt_lock); + return NULL; + } + } + + for ( bit = start; bit < start + nr; ++bit ) + __set_bit(bit, vlpt_bitmap); + if ( start <= vlpt_low + 2 ) + vlpt_low = bit; + spin_unlock(&vlpt_lock); + + return vlpt_base + start * PAGE_SIZE; +} + +static unsigned int vlpt_index(const void *va) +{ + unsigned long addr = (unsigned long)va & ~(PAGE_SIZE - 1); + unsigned int idx; + + if ( addr < VIRT_LIN_P2M_START + (vlpt_end / 8) || + addr >= VIRT_LIN_P2M_START + vlpt_top * PAGE_SIZE ) + return 0; + + idx = PFN_DOWN(va - vlpt_base); + return !test_bit(idx - 1, vlpt_bitmap) && + test_bit(idx, vlpt_bitmap) ? 
idx : 0; +} + +void vlpt_free(const void *va) +{ + unsigned int bit = vlpt_index(va); + + if ( !bit ) + { + WARN_ON(va != NULL); + return; + } + + spin_lock(&vlpt_lock); + if ( bit < vlpt_low ) + { + vlpt_low = bit - 1; + while ( !test_bit(vlpt_low - 1, vlpt_bitmap) ) + --vlpt_low; + } + while ( __test_and_clear_bit(bit, vlpt_bitmap) ) + if ( ++bit == vlpt_top ) + break; + spin_unlock(&vlpt_lock); +} + +#endif diff --git a/xen/include/asm-arm/config.h b/xen/include/asm-arm/config.h index e3cfaf1..f9a7063 100644 --- a/xen/include/asm-arm/config.h +++ b/xen/include/asm-arm/config.h @@ -80,6 +80,7 @@ * 6M - 8M Early boot misc (see below) * * 32M - 128M Frametable: 24 bytes per page for 16GB of RAM + * 128M - 256M Virtual-linear mapping to P2M table * 256M - 1G VMAP: ioremap and early_ioremap use this virtual address * space * @@ -95,12 +96,14 @@ #define FIXMAP_ADDR(n) (mk_unsigned_long(0x00400000) + (n) * PAGE_SIZE) #define BOOT_MISC_VIRT_START mk_unsigned_long(0x00600000) #define FRAMETABLE_VIRT_START mk_unsigned_long(0x02000000) +#define VIRT_LIN_P2M_START mk_unsigned_long(0x08000000) #define VMAP_VIRT_START mk_unsigned_long(0x10000000) #define XENHEAP_VIRT_START mk_unsigned_long(0x40000000) #define DOMHEAP_VIRT_START mk_unsigned_long(0x80000000) #define DOMHEAP_VIRT_END mk_unsigned_long(0xffffffff) #define VMAP_VIRT_END XENHEAP_VIRT_START +#define VIRT_LIN_P2M_END VMAP_VIRT_START #define HYPERVISOR_VIRT_START XEN_VIRT_START #define DOMHEAP_ENTRIES 1024 /* 1024 2MB mapping slots */ diff --git a/xen/include/asm-arm/vlpt.h b/xen/include/asm-arm/vlpt.h new file mode 100644 index 0000000..da55293 --- /dev/null +++ b/xen/include/asm-arm/vlpt.h @@ -0,0 +1,10 @@ +#if !defined(__XEN_VLPT_H__) && defined(VIRT_LIN_P2M_START) +#define __XEN_VLPT_H__ + +#include <xen/types.h> + +void *vlpt_alloc(unsigned int nr, unsigned int align); +void vlpt_free(const void *); +void vlpt_init(void); + +#endif /* __XEN_VLPT_H__ */ -- 1.8.1.2
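[Note: concretely, the payoff of the linear window is that finding the p2m entry for a guest physical address becomes plain pointer arithmetic rather than a map_domain_page() walk; this is what patch 7 exploits in its fault handler. A standalone model of that arithmetic, under the assumptions spelled out in the comments (illustration only, not code from the series):

#include <stdint.h>
#include <stdio.h>

/* Standalone model of the VLPT arithmetic (assumed layout: one
 * 8-byte lpae_t per guest page, laid out linearly from the start
 * of the window, with the window based at the guest's RAM base). */
#define PAGE_SHIFT 12

static uintptr_t vlpt_entry_va(uintptr_t vlpt_start,
                               uint64_t ram_base, uint64_t gpa)
{
    uint64_t idx = (gpa - ram_base) >> PAGE_SHIFT; /* guest page index */
    return vlpt_start + idx * 8;                   /* sizeof(lpae_t) == 8 */
}

int main(void)
{
    /* Example: VLPT window at 128M (VIRT_LIN_P2M_START above),
     * guest RAM at 0x80000000. */
    uintptr_t e = vlpt_entry_va(0x08000000, 0x80000000ULL, 0x80042000ULL);
    printf("p2m entry for gpa 0x80042000 is at %#lx\n", (unsigned long)e);
    return 0;
}
]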
Jaeyong Yoo
2013-Aug-01 12:57 UTC
[PATCH v3 07/10] xen/arm: Add handling write fault for dirty-page tracing
Add handling write fault in do_trap_data_abort_guest for dirty-page tracing. Rather than maintaining a bitmap for dirty pages, we use the avail bit in p2m entry. For locating the write fault pte in guest p2m, we use virtual-linear page table that slots guest p2m into xen''s virtual memory. Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com> --- xen/arch/arm/mm.c | 110 +++++++++++++++++++++++++++++++++++++++- xen/arch/arm/traps.c | 16 +++++- xen/include/asm-arm/domain.h | 11 ++++ xen/include/asm-arm/mm.h | 5 ++ xen/include/asm-arm/processor.h | 2 + 5 files changed, 142 insertions(+), 2 deletions(-) diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c index 9d5d3e0..a24afe6 100644 --- a/xen/arch/arm/mm.c +++ b/xen/arch/arm/mm.c @@ -680,7 +680,6 @@ void destroy_xen_mappings(unsigned long v, unsigned long e) create_xen_entries(REMOVE, v, 0, (e - v) >> PAGE_SHIFT, 0); } -enum mg { mg_clear, mg_ro, mg_rw, mg_rx }; static void set_pte_flags_on_range(const char *p, unsigned long l, enum mg mg) { lpae_t pte; @@ -1214,6 +1213,115 @@ int is_iomem_page(unsigned long mfn) return 1; return 0; } + +static uint64_t find_guest_p2m_mfn(struct domain *d, paddr_t addr) +{ + lpae_t *first = NULL, *second = NULL; + struct p2m_domain *p2m = &d->arch.p2m; + uint64_t mfn = -EFAULT; + + if ( first_table_offset(addr) >= LPAE_ENTRIES ) + return mfn; + + first = __map_domain_page(p2m->first_level); + + if ( !first || + !first[first_table_offset(addr)].walk.valid || + !first[first_table_offset(addr)].walk.table ) + goto done; + + second = map_domain_page(first[first_table_offset(addr)].walk.base); + + if ( !second || + !second[second_table_offset(addr)].walk.valid || + !second[second_table_offset(addr)].walk.table ) + goto done; + + mfn = second[second_table_offset(addr)].walk.base; + +done: + if ( second ) unmap_domain_page(second); + if ( first ) unmap_domain_page(first); + + return mfn; +} + +/* + * routine for dirty-page tracing + * + * On first write, it page faults, its entry is changed to read-write, + * and on retry the write succeeds. + * + * for locating p2m of the faulting entry, we use virtual-linear page table. 
+ */ +int handle_page_fault(struct domain *d, paddr_t addr) +{ + int rc = 0; + struct p2m_domain *p2m = &d->arch.p2m; + uint64_t gma_start; + int gma_third_index; + int xen_second_linear, xen_third_table; + lpae_t *xen_third; + lpae_t *vlp2m_pte; + + BUG_ON( !d->arch.map_domain.nr_banks ); + + gma_start = d->arch.map_domain.bank[0].start; + gma_third_index = third_linear_offset(addr - gma_start); + vlp2m_pte = (lpae_t *)(d->arch.dirty.vlpt_start + + sizeof(lpae_t) * gma_third_index); + + BUG_ON( (void *)vlp2m_pte > d->arch.dirty.vlpt_end ); + + spin_lock(&p2m->lock); + + xen_second_linear = second_linear_offset((unsigned long)vlp2m_pte); + xen_third_table = third_table_offset((unsigned long)vlp2m_pte); + + /* starting from xen second level page table */ + if ( !xen_second[xen_second_linear].pt.valid ) + { + unsigned long va = (unsigned long)vlp2m_pte & ~(PAGE_SIZE-1); + + rc = create_xen_table(&xen_second[second_linear_offset(va)]); + if ( rc < 0 ) + goto out; + } + + BUG_ON( !xen_second[xen_second_linear].pt.valid ); + + /* at this point, xen second level pt has valid entry + * check again the validity of third level pt */ + xen_third = __va(pfn_to_paddr(xen_second[xen_second_linear].pt.base)); + + /* xen third-level page table invalid */ + if ( !xen_third[xen_third_table].p2m.valid ) + { + uint64_t mfn = find_guest_p2m_mfn(d, addr); + lpae_t pte = mfn_to_xen_entry(mfn); + unsigned long va = (unsigned long)vlp2m_pte & ~(PAGE_SIZE-1); + + pte.pt.table = 1; /* 4k mappings always have this bit set */ + write_pte(&xen_third[xen_third_table], pte); + flush_xen_data_tlb_range_va(va, PAGE_SIZE); + } + + /* at this point, xen third level pt has valid entry: means we can access + * vlp2m_pte vlp2m_pte is like a fourth level pt for xen, but for guest, + * it is third level pt */ + if ( vlp2m_pte->p2m.valid && vlp2m_pte->p2m.write == 0 ) + { + vlp2m_pte->p2m.write = 1; + vlp2m_pte->p2m.avail = 1; + write_pte(vlp2m_pte, *vlp2m_pte); + flush_tlb_local(); + } + +out: + spin_unlock(&p2m->lock); + return rc; +} + /* * Local variables: * mode: C diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c index 1b9209d..f844f56 100644 --- a/xen/arch/arm/traps.c +++ b/xen/arch/arm/traps.c @@ -1226,7 +1226,12 @@ static void do_trap_data_abort_guest(struct cpu_user_regs *regs, goto bad_data_abort; /* XXX: Decode the instruction if ISS is not valid */ - if ( !dabt.valid ) + /* Note: add additional check before goto bad_data_abort. dabt.valid + * bit is for telling the validity of ISS[23:16] bits. For dirty-page + * tracing, we need to see DFSC bits. 
If DFSC bits are indicating the + * possibility of dirty page tracing, do not go to bad_data_abort */ + if ( !dabt.valid && + (dabt.dfsc & FSC_MASK) != (FSC_FLT_PERM + FSC_3D_LEVEL) && dabt.write) goto bad_data_abort; if (handle_mmio(&info)) @@ -1235,6 +1240,15 @@ static void do_trap_data_abort_guest(struct cpu_user_regs *regs, return; } + /* handle permission fault on write */ + if ( (dabt.dfsc & FSC_MASK) == (FSC_FLT_PERM + FSC_3D_LEVEL) && dabt.write ) + { + if ( current->domain->arch.dirty.mode == 0 ) + goto bad_data_abort; + if ( handle_page_fault(current->domain, info.gpa) == 0 ) + return; + } + bad_data_abort: msg = decode_fsc( dabt.dfsc, &level); diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h index 0c80c65..413b89a 100644 --- a/xen/include/asm-arm/domain.h +++ b/xen/include/asm-arm/domain.h @@ -110,6 +110,17 @@ struct arch_domain spinlock_t lock; } uart0; + /* dirty-page tracing */ + struct { + spinlock_t lock; + int mode; + unsigned int count; + uint32_t gmfn_guest_start; /* guest physical memory start address */ + void *vlpt_start; /* va-start of guest p2m */ + void *vlpt_end; /* va-end of guest p2m */ + struct page_info *head; /* maintain the mapped vaddrs */ + } dirty; + struct dt_mem_info map_domain; spinlock_t map_lock; } __cacheline_aligned; diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h index 404ec4d..fd976e3 100644 --- a/xen/include/asm-arm/mm.h +++ b/xen/include/asm-arm/mm.h @@ -328,6 +328,11 @@ static inline void put_page_and_type(struct page_info *page) put_page(page); } +enum mg { mg_clear, mg_ro, mg_rw, mg_rx }; + +/* routine for dirty-page tracing */ +int handle_page_fault(struct domain *d, paddr_t addr); + #endif /* __ARCH_ARM_MM__ */ /* * Local variables: diff --git a/xen/include/asm-arm/processor.h b/xen/include/asm-arm/processor.h index 06b0b25..34c21de 100644 --- a/xen/include/asm-arm/processor.h +++ b/xen/include/asm-arm/processor.h @@ -383,6 +383,8 @@ union hsr { #define FSC_CPR (0x3a) /* Coprocossor Abort */ #define FSC_LL_MASK (0x03<<0) +#define FSC_MASK (0x3f) /* Fault status mask */ +#define FSC_3D_LEVEL (0x03) /* Third level fault*/ /* Time counter hypervisor control register */ #define CNTHCTL_PA (1u<<0) /* Kernel/user access to physical counter */ -- 1.8.1.2
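[Note: the dirty marker itself is simply the PTE's software-available bit: pages start read-only, the guest's first write takes a third-level permission fault into handle_page_fault(), which sets p2m.write (so the retried write succeeds) and p2m.avail (so a later bitmap PEEK sees the page as dirty). A toy standalone model of that state transition (reduced, hypothetical pte layout, for illustration only):

#include <stdio.h>

/* Reduced model of an lpae p2m entry: only the two bits the
 * dirty tracer cares about (hypothetical layout). */
struct fake_pte {
    unsigned int write : 1; /* write permission */
    unsigned int avail : 1; /* software "dirty" marker */
};

static void fault_on_write(struct fake_pte *pte)
{
    if ( !pte->write )      /* permission-fault path */
    {
        pte->write = 1;     /* let the retried write through */
        pte->avail = 1;     /* remember the page is dirty */
    }
}

int main(void)
{
    struct fake_pte pte = { .write = 0, .avail = 0 };
    fault_on_write(&pte);   /* guest's first write traps */
    printf("write=%u dirty=%u\n", pte.write, pte.avail);
    return 0;
}
]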
Jaeyong Yoo
2013-Aug-01 12:57 UTC
[PATCH v3 08/10] xen/arm: Fixing clear_guest_offset macro
Fix the broken macro 'clear_guest_offset' in arm.

Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com>
---
 xen/include/asm-arm/guest_access.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/xen/include/asm-arm/guest_access.h b/xen/include/asm-arm/guest_access.h
index 34aae14..8ff088f 100644
--- a/xen/include/asm-arm/guest_access.h
+++ b/xen/include/asm-arm/guest_access.h
@@ -77,8 +77,9 @@ unsigned long raw_clear_guest(void *to, unsigned len);
  * Clear an array of objects in guest context via a guest handle,
  * specifying an offset into the guest array.
  */
-#define clear_guest_offset(hnd, off, ptr, nr) ({    \
-    raw_clear_guest(_d+(off), nr);                  \
+#define clear_guest_offset(hnd, off, nr) ({         \
+    void *_d = (hnd).p;                             \
+    raw_clear_guest(_d+(off), nr);                  \
 })
 
 /*
-- 
1.8.1.2
Jaeyong Yoo
2013-Aug-01 12:57 UTC
[PATCH v3 09/10] xen/arm: Implement hypercall for dirty page tracing (shadow op)
Add hypercall (shadow op: enable/disable and clean/peek dirted page bitmap). For generating the dirty-bitmap, loop over the xen''s page table mapped to guest p2m. In this way, we don''t need to map/unmap domain page for guest p2m. For unmapping the guest p2m slotted into xen''s page table after finishing live migration, we implement add_mapped_vaddr for storing the write-faulting addresses. In destroy_all_mapped_vaddrs function, the actual unmap happens. Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com> --- xen/arch/arm/domain.c | 7 ++ xen/arch/arm/domctl.c | 13 ++ xen/arch/arm/mm.c | 95 ++++++++++++++ xen/arch/arm/p2m.c | 307 ++++++++++++++++++++++++++++++++++++++++++++++ xen/include/asm-arm/mm.h | 1 + xen/include/asm-arm/p2m.h | 4 + 6 files changed, 427 insertions(+) diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index e9cfc81..b629988 100644 --- a/xen/arch/arm/domain.c +++ b/xen/arch/arm/domain.c @@ -512,6 +512,13 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags) spin_lock_init(&d->arch.map_lock); d->arch.map_domain.nr_banks = 0; + /* init for dirty-page tracing */ + d->arch.dirty.count = 0; + d->arch.dirty.gmfn_guest_start = 0; + d->arch.dirty.vlpt_start = NULL; + d->arch.dirty.vlpt_end = NULL; + d->arch.dirty.head = NULL; + clear_page(d->shared_info); share_xen_page_with_guest( virt_to_page(d->shared_info), d, XENSHARE_writable); diff --git a/xen/arch/arm/domctl.c b/xen/arch/arm/domctl.c index 9cfb48a..87c5184 100644 --- a/xen/arch/arm/domctl.c +++ b/xen/arch/arm/domctl.c @@ -93,6 +93,19 @@ long arch_do_domctl(struct xen_domctl *domctl, struct domain *d, xfree(c.data); } break; + case XEN_DOMCTL_shadow_op: + { + domain_pause(d); + ret = dirty_mode_op(d, &domctl->u.shadow_op); + domain_unpause(d); + + if ( (&domctl->u.shadow_op)->op == XEN_DOMCTL_SHADOW_OP_CLEAN || + (&domctl->u.shadow_op)->op == XEN_DOMCTL_SHADOW_OP_PEEK ) + { + copyback = 1; + } + } + break; default: return -EINVAL; diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c index a24afe6..cd7bdff 100644 --- a/xen/arch/arm/mm.c +++ b/xen/arch/arm/mm.c @@ -1304,6 +1304,9 @@ int handle_page_fault(struct domain *d, paddr_t addr) pte.pt.table = 1; /* 4k mappings always have this bit set */ write_pte(&xen_third[xen_third_table], pte); flush_xen_data_tlb_range_va(va, PAGE_SIZE); + + /* in order to remove mappings in free stage */ + add_mapped_vaddr(d, va); } /* at this point, xen third level pt has valid entry: means we can access @@ -1322,6 +1325,98 @@ out: return rc; } +int get_dirty_bitmap(struct domain *d, uint8_t *bitmap[], int peek, int clean) +{ + vaddr_t vlpt_start = (vaddr_t)d->arch.dirty.vlpt_start; + vaddr_t vlpt_end = (vaddr_t)d->arch.dirty.vlpt_end; + int xen_second_linear_start, xen_second_linear_end; + int xen_third_table_start, xen_third_table_end; + int i1, i2, i3; + + xen_second_linear_start = second_linear_offset((unsigned long)vlpt_start); + xen_second_linear_end = second_linear_offset((unsigned long)vlpt_end) + 1; + + for ( i1 = xen_second_linear_start; i1 < xen_second_linear_end; i1++ ) + { + vaddr_t xen_second_start_va; + int i1_offset = 0; + lpae_t *xen_third; + + /* if xen_second page table does not have valid entry, it means, + * the corresponding region is not dirtied, so we do nothing */ + if ( !xen_second[i1].pt.valid ) + continue; + + xen_second_start_va = i1 << (LPAE_SHIFT + PAGE_SHIFT); + + /* since vlpt is partialy laying over xen_second, + we need to find the start index of third */ + if ( vlpt_start > xen_second_start_va ) + { + xen_third_table_start = 
third_table_offset(vlpt_start); + i1_offset = (vlpt_start - xen_second_start_va) / sizeof(lpae_t); + } + else + xen_third_table_start = 0; + + if ( vlpt_end < xen_second_start_va + + (1ul << (LPAE_SHIFT + PAGE_SHIFT)) ) + xen_third_table_end = third_table_offset(vlpt_end) + 1; + else + xen_third_table_end = LPAE_ENTRIES; + + xen_third = __va(pfn_to_paddr(xen_second[i1].pt.base)); + + for ( i2 = xen_third_table_start; i2 < xen_third_table_end; i2 ++ ) + { + lpae_t *guest_third; + if ( !xen_third[i2].pt.valid ) + continue; + + guest_third = (lpae_t *)((i1 << (LPAE_SHIFT+PAGE_SHIFT)) + + (i2 << PAGE_SHIFT)); + for ( i3 = 0; i3 < LPAE_ENTRIES; i3++ ) + { + lpae_t pte; + lpae_walk_t third_pte = guest_third[i3].walk; + int write = 0; + int bit_offset; + if ( !third_pte.valid ) + return -EINVAL; + + pte = guest_third[i3]; + if ( peek && pte.p2m.avail ) + { + int bitmap_index; + int bitmap_offset; + bit_offset = (i1 - xen_second_linear_start) * + LPAE_ENTRIES * LPAE_ENTRIES + + i2 * LPAE_ENTRIES + + i3 - + i1_offset; + + bitmap_index = bit_offset >> (PAGE_SHIFT + 3); + bitmap_offset = bit_offset & ((1ul << (PAGE_SHIFT + 3)) - + 1); + __test_and_set_bit(bitmap_offset, bitmap[bitmap_index]); + write = 1; + } + if ( clean && pte.p2m.write ) + { + pte.p2m.write = 0; + pte.p2m.avail = 0; + write = 1; + } + if ( write ) + write_pte(&guest_third[i3], pte); + } + } + } + + flush_tlb_all_local(); + return 0; +} + /* * Local variables: * mode: C diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c index 307c6d4..c62a383 100644 --- a/xen/arch/arm/p2m.c +++ b/xen/arch/arm/p2m.c @@ -5,6 +5,9 @@ #include <xen/domain_page.h> #include <asm/flushtlb.h> #include <asm/gic.h> +#include <asm/vlpt.h> +#include <xen/guest_access.h> +#include <xen/pfn.h> void dump_p2m_lookup(struct domain *d, paddr_t addr) { @@ -345,6 +348,310 @@ unsigned long gmfn_to_mfn(struct domain *d, unsigned long gpfn) return p >> PAGE_SHIFT; } +static int alloc_vlpt_for_p2m(struct domain *d) +{ + unsigned long gmfn_start = 0, gmfn_end = 0, gmfns, pgts_3rd; + void *vlpt_start, *vlpt_end; + int nr_banks; + + spin_lock(&d->arch.map_lock); + /* The guest memory map must be ordered by start addr */ + nr_banks = d->arch.map_domain.nr_banks; + if ( nr_banks ) + { + gmfn_start = d->arch.map_domain.bank[0].start >> PAGE_SHIFT; + gmfn_end = (d->arch.map_domain.bank[nr_banks - 1].start + + d->arch.map_domain.bank[nr_banks - 1].size) >> PAGE_SHIFT; + } + spin_unlock(&d->arch.map_lock); + gmfns = gmfn_end - gmfn_start; + pgts_3rd = (gmfns + LPAE_ENTRIES - 1) >> LPAE_SHIFT; + + vlpt_start = vlpt_alloc(pgts_3rd, 1); + + if ( !vlpt_start ) + { + printk("Out of memory for allocating VLPT mapping\n"); + goto out; + } + + vlpt_end = vlpt_start + pgts_3rd*PAGE_SIZE; + + d->arch.dirty.vlpt_start = vlpt_start; + d->arch.dirty.vlpt_end = vlpt_end; + + d->arch.dirty.head = NULL; + + return 0; +out: + if ( vlpt_start ) vlpt_free(vlpt_start); + return -ENOMEM; +} + +#define MAX_VA_PER_NODE (PAGE_SIZE - sizeof(struct page_info *) -\ + sizeof(int)) / sizeof(unsigned long) + +/* an array-based linked list for storing virtual addresses + * where the third-table mapping should be destroyed after + * live migration */ +struct mapped_va_node +{ + struct page_info *next; + int items; + unsigned long vaddrs[MAX_VA_PER_NODE]; +}; + +int add_mapped_vaddr(struct domain *d, unsigned long va) +{ + struct page_info *head_page = d->arch.dirty.head; + struct mapped_va_node *mvn = NULL; + + if ( !head_page ) + { + head_page = alloc_domheap_page(NULL, 0); + if ( !head_page ) + return 
-ENOMEM; + + mvn = __map_domain_page(head_page); + mvn->items = 0; + mvn->next = NULL; + d->arch.dirty.head = head_page; + } + + if ( !mvn ) + mvn = __map_domain_page(head_page); + + if ( mvn->items == MAX_VA_PER_NODE ) + { + struct page_info *page; + unmap_domain_page(mvn); + + page = alloc_domheap_page(NULL, 0); + if ( !page ) + return -ENOMEM; + + mvn = __map_domain_page(page); + mvn->items = 0; + mvn->next = head_page; + + d->arch.dirty.head = page; + } + + mvn->vaddrs[mvn->items] = va; + mvn->items ++; + + unmap_domain_page(mvn); + return 0; +} + +static void destroy_all_mapped_vaddrs(struct domain *d) +{ + struct page_info *head_page = d->arch.dirty.head; + struct mapped_va_node *mvn = NULL; + + while ( head_page ) + { + int i; + mvn = __map_domain_page(head_page); + head_page = mvn->next; + + for ( i = 0; i < mvn->items; ++i ) + destroy_xen_mappings(mvn->vaddrs[i], mvn->vaddrs[i] + PAGE_SIZE); + + unmap_domain_page(mvn); + } + + d->arch.dirty.head = NULL; +} + +static void free_vlpt_for_p2m(struct domain *d) +{ + destroy_all_mapped_vaddrs(d); + + vlpt_free(d->arch.dirty.vlpt_start); + d->arch.dirty.vlpt_start = NULL; + d->arch.dirty.vlpt_end = NULL; + d->arch.dirty.head = NULL; +} + +/* Change types across all p2m entries in a domain */ +static void p2m_change_entry_type_global(struct domain *d, enum mg nt) +{ + struct p2m_domain *p2m = &d->arch.p2m; + uint64_t ram_base = 0; + int i1, i2, i3; + int first_index, second_index, third_index; + lpae_t *first = __map_domain_page(p2m->first_level); + lpae_t pte, *second = NULL, *third = NULL; + + spin_lock(&d->arch.map_lock); + /*Suppose that first map base is a guest''s RAM base */ + if ( d->arch.map_domain.nr_banks ) + ram_base = d->arch.map_domain.bank[0].start; + spin_unlock(&d->arch.map_lock); + first_index = first_table_offset(ram_base); + second_index = second_table_offset(ram_base); + third_index = third_table_offset(ram_base); + + BUG_ON( !ram_base && "RAM base is undefined" ); + BUG_ON( !first && "Can''t map first level p2m." ); + + spin_lock(&p2m->lock); + + for ( i1 = first_index; i1 < LPAE_ENTRIES*2; ++i1 ) + { + lpae_walk_t first_pte = first[i1].walk; + if ( !first_pte.valid || !first_pte.table ) + goto out; + + second = map_domain_page(first_pte.base); + BUG_ON( !second && "Can''t map second level p2m."); + for ( i2 = second_index; i2 < LPAE_ENTRIES; ++i2 ) + { + lpae_walk_t second_pte = second[i2].walk; + if ( !second_pte.valid || !second_pte.table ) + goto out; + + third = map_domain_page(second_pte.base); + BUG_ON( !third && "Can''t map third level p2m."); + + for ( i3 = third_index; i3 < LPAE_ENTRIES; ++i3 ) + { + lpae_walk_t third_pte = third[i3].walk; + int write = 0; + if ( !third_pte.valid ) + goto out; + + pte = third[i3]; + if ( pte.p2m.write == 1 && nt == mg_ro ) + { + pte.p2m.write = 0; + write = 1; + } + else if ( pte.p2m.write == 0 && nt == mg_rw ) + { + pte.p2m.write = 1; + write = 1; + } + if ( write ) + write_pte(&third[i3], pte); + } + unmap_domain_page(third); + + third = NULL; + third_index = 0; + } + unmap_domain_page(second); + + second = NULL; + second_index = 0; + third_index = 0; + } + +out: + flush_tlb_all_local(); + if ( third ) unmap_domain_page(third); + if ( second ) unmap_domain_page(second); + if ( first ) unmap_domain_page(first); + + spin_unlock(&p2m->lock); +} + +/* Read a domain''s log-dirty bitmap and stats. + * If the operation is a CLEAN, clear the bitmap and stats. 
*/ +int log_dirty_op(struct domain *d, xen_domctl_shadow_op_t *sc) +{ + unsigned long gmfn_start; + unsigned long gmfn_end; + unsigned long gmfns; + unsigned int bitmap_pages; + int rc = 0, clean = 0, peek = 1; + uint8_t *bitmap[256]; /* bitmap[256] covers 32GB ram */ + int i; + + BUG_ON( !d->arch.map_domain.nr_banks ); + + gmfn_start = d->arch.map_domain.bank[0].start >> PAGE_SHIFT; + gmfn_end = domain_get_maximum_gpfn(d); + gmfns = gmfn_end - gmfn_start; + bitmap_pages = PFN_UP((gmfns + 7) / 8); + + if ( guest_handle_is_null(sc->dirty_bitmap) ) + { + peek = 0; + } + else + { + /* prepare a mapping to the bitmap from guest param */ + vaddr_t to = (vaddr_t)sc->dirty_bitmap.p; /* TODO: use macro */ + + BUG_ON( to & ~PAGE_MASK && "offset not aligned to PAGE SIZE"); + + for ( i = 0; i < bitmap_pages; ++i ) + { + paddr_t g; + rc = gvirt_to_maddr(to, &g); + if ( rc ) + return rc; + bitmap[i] = map_domain_page(g>>PAGE_SHIFT); + memset(bitmap[i], 0x00, PAGE_SIZE); + to += PAGE_SIZE; + } + } + + clean = (sc->op == XEN_DOMCTL_SHADOW_OP_CLEAN); + + sc->stats.dirty_count = d->arch.dirty.count; + + spin_lock(&d->arch.dirty.lock); + + get_dirty_bitmap(d, bitmap, peek, clean); + + if ( peek ) + { + for ( i = 0; i < bitmap_pages; ++i ) + { + unmap_domain_page(bitmap[i]); + } + } + spin_unlock(&d->arch.dirty.lock); + + return 0; +} + +long dirty_mode_op(struct domain *d, xen_domctl_shadow_op_t *sc) +{ + long ret = 0; + switch (sc->op) + { + case XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY: + case XEN_DOMCTL_SHADOW_OP_OFF: + { + enum mg nt = sc->op == XEN_DOMCTL_SHADOW_OP_OFF ? mg_rw : mg_ro; + + d->arch.dirty.mode = sc->op == XEN_DOMCTL_SHADOW_OP_OFF ? 0 : 1; + p2m_change_entry_type_global(d, nt); + + if ( sc->op == XEN_DOMCTL_SHADOW_OP_OFF ) + free_vlpt_for_p2m(d); + else + ret = alloc_vlpt_for_p2m(d); + } + break; + + case XEN_DOMCTL_SHADOW_OP_CLEAN: + case XEN_DOMCTL_SHADOW_OP_PEEK: + { + ret = log_dirty_op(d, sc); + } + break; + + default: + return -ENOSYS; + } + return ret; +} + /* * Local variables: * mode: C diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h index fd976e3..be67349 100644 --- a/xen/include/asm-arm/mm.h +++ b/xen/include/asm-arm/mm.h @@ -332,6 +332,7 @@ enum mg { mg_clear, mg_ro, mg_rw, mg_rx }; /* routine for dirty-page tracing */ int handle_page_fault(struct domain *d, paddr_t addr); +int get_dirty_bitmap(struct domain *d, uint8_t *bitmap[], int peek, int clean); #endif /* __ARCH_ARM_MM__ */ /* diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h index a00069b..fe33360 100644 --- a/xen/include/asm-arm/p2m.h +++ b/xen/include/asm-arm/p2m.h @@ -2,6 +2,7 @@ #define _XEN_P2M_H #include <xen/mm.h> +#include <public/domctl.h> struct domain; @@ -107,6 +108,9 @@ static inline int get_page_and_type(struct page_info *page, return rc; } +long dirty_mode_op(struct domain *d, xen_domctl_shadow_op_t *sc); +int add_mapped_vaddr(struct domain *d, unsigned long va); + #endif /* _XEN_P2M_H */ /* -- 1.8.1.2
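[Note: from the toolstack, these ops are reached through the existing xc_shadow_control() wrapper, the same way the x86 migration loop drives them: ENABLE_LOGDIRTY once up front (as save_memory in patch 10 does), then CLEAN once per iteration to fetch and reset the bitmap. A rough sketch of one live-loop round (hypercall-buffer setup elided; not part of the patch):

#include <xenctrl.h>

/* Sketch: one round of the live loop. 'to_send' must be a
 * hypercall-safe bitmap buffer covering at least 'pages' bits. */
static int logdirty_round(xc_interface *xch, uint32_t domid,
                          xc_hypercall_buffer_t *to_send,
                          unsigned long pages)
{
    /* Fetch the dirty bitmap and clear it for the next iteration. */
    return xc_shadow_control(xch, domid, XEN_DOMCTL_SHADOW_OP_CLEAN,
                             to_send, pages, NULL, 0, NULL);
}
]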
Jaeyong Yoo
2013-Aug-01 12:57 UTC
[PATCH v3 10/10] xen/arm: Implement toolstack for xl restore/save and migrate
From: Alexey Sokolov <sokolov.a@samsung.com> Implement the xl restore/save operations (which are also used for migrate) in xc_arm_migrate.c and make it compilable. The overall process of save is the following: 1) save guest parameters (i.e., memory map, console and store pfn, etc) 2) save memory (if it is live, perform dirty-page tracing) 3) save hvm states (i.e., gic, timer, etc) 4) save vcpu registers (i.e., pc, sp, lr, etc) The overall process of restore is the same as that of save. Signed-off-by: Alexey Sokolov <sokolov.a@samsung.com> --- config/arm32.mk | 1 + tools/libxc/Makefile | 5 + tools/libxc/xc_arm_migrate.c | 686 +++++++++++++++++++++++++++++++++++++++ tools/misc/Makefile | 4 + 4 files changed, 696 insertions(+) create mode 100644 tools/libxc/xc_arm_migrate.c diff --git a/config/arm32.mk b/config/arm32.mk index 8e21158..0100ee2 100644 --- a/config/arm32.mk +++ b/config/arm32.mk @@ -1,6 +1,7 @@ CONFIG_ARM := y CONFIG_ARM_32 := y CONFIG_ARM_$(XEN_OS) := y +CONFIG_MIGRATE := y CONFIG_XEN_INSTALL_SUFFIX := diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile index 512a994..05dfef4 100644 --- a/tools/libxc/Makefile +++ b/tools/libxc/Makefile @@ -42,8 +42,13 @@ CTRL_SRCS-$(CONFIG_MiniOS) += xc_minios.c GUEST_SRCS-y := GUEST_SRCS-y += xg_private.c xc_suspend.c ifeq ($(CONFIG_MIGRATE),y) +ifeq ($(CONFIG_X86),y) GUEST_SRCS-y += xc_domain_restore.c xc_domain_save.c GUEST_SRCS-y += xc_offline_page.c xc_compression.c +endif +ifeq ($(CONFIG_ARM),y) +GUEST_SRCS-y += xc_arm_migrate.c +endif else GUEST_SRCS-y += xc_nomigrate.c endif diff --git a/tools/libxc/xc_arm_migrate.c b/tools/libxc/xc_arm_migrate.c new file mode 100644 index 0000000..9f642f3 --- /dev/null +++ b/tools/libxc/xc_arm_migrate.c @@ -0,0 +1,686 @@ +/****************************************************************************** + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; + * version 2.1 of the License. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details.
+ * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + * + * Copyright (c) 2013, Samsung Electronics + */ + +#include <inttypes.h> +#include <errno.h> +#include <xenctrl.h> +#include <xenguest.h> + +#include <unistd.h> +#include <xc_private.h> +#include <xc_dom.h> +#include "xc_bitops.h" +#include "xg_private.h" + +#define DEF_MAX_ITERS 29 /* limit us to 30 times round loop */ +#define DEF_MAX_FACTOR 3 /* never send more than 3x p2m_size */ +#define DEF_MIN_DIRTY_PER_ITER 50 /* dirty page count to define last iter */ +#define DEF_PROGRESS_RATE 50 /* progress bar update rate */ + +/* + * Guest params to save: used HVM params, save flags, memory map + */ +typedef struct guest_params { + unsigned long console_pfn; + unsigned long store_pfn; + uint32_t flags; + uint32_t mem_map_nr_entries; + struct dt_mem_info memmap; /* Memory map */ +} guest_params_t; + +static int suspend_and_state(int (*suspend)(void*), void *data, + xc_interface *xch, int dom) +{ + xc_dominfo_t info; + if ( !(*suspend)(data) ) + { + ERROR("Suspend request failed"); + return -1; + } + + if ( (xc_domain_getinfo(xch, dom, 1, &info) != 1) || + !info.shutdown || (info.shutdown_reason != SHUTDOWN_suspend) ) + { + ERROR("Domain is not in suspended state after suspend attempt"); + return -1; + } + + return 0; +} + +static int write_exact_handled(xc_interface *xch, int fd, const void *data, + size_t size) +{ + if ( write_exact(fd, data, size) ) + { + ERROR("Write failed, check space"); + return -1; + } + return 0; +} + +/* ============ Memory ============= */ +static int save_memory(xc_interface *xch, int io_fd, uint32_t dom, + struct save_callbacks *callbacks, + uint32_t max_iters, uint32_t max_factor, + guest_params_t *params) +{ + int live = !!(params->flags & XCFLAGS_LIVE); + int debug = !!(params->flags & XCFLAGS_DEBUG); + const char zero = 0; + char reportbuf[80]; + int iter = 0; + int last_iter = !live; + int total_dirty_pages_num = 0; + int dirty_pages_on_prev_iter_num = 0; + + DECLARE_HYPERCALL_BUFFER(unsigned long, to_send); + + /* We suppose that guest''s memory base is the first region base */ + xen_pfn_t start = (params->memmap.bank[0].start >> PAGE_SHIFT); + const xen_pfn_t end = xc_domain_maximum_gpfn(xch, dom); + const xen_pfn_t mem_size = end - start; + xen_pfn_t i; + + if ( write_exact_handled(xch, io_fd, &end, sizeof(xen_pfn_t)) ) + return -1; + + if ( live ) + { + if ( xc_shadow_control(xch, dom, XEN_DOMCTL_SHADOW_OP_ENABLE_LOGDIRTY, + NULL, 0, NULL, 0, NULL) < 0 ) + { + ERROR("Couldn''t enable log-dirty mode !\n"); + return -1; + } + if ( debug ) + IPRINTF("Log-dirty mode enabled!\n"); + + max_iters = max_iters ? : DEF_MAX_ITERS; + max_factor = max_factor ? 
: DEF_MAX_FACTOR; + } + + to_send = xc_hypercall_buffer_alloc_pages(xch, to_send, + NRPAGES(bitmap_size(mem_size))); + if ( !to_send ) + { + ERROR("Couldn''t allocate to_send array!\n"); + return -1; + } + + /* send all pages on first iter */ + memset(to_send, 0xff, bitmap_size(mem_size)); + + for ( ; ; ) + { + int dirty_pages_on_current_iter_num = 0; + int frc; + iter++; + + snprintf(reportbuf, sizeof(reportbuf), + "Saving memory: iter %d (last sent %u)", + iter, dirty_pages_on_prev_iter_num); + + xc_report_progress_start(xch, reportbuf, mem_size); + + if ( (iter > 1 && + dirty_pages_on_prev_iter_num < DEF_MIN_DIRTY_PER_ITER) || + (iter == max_iters) || + (total_dirty_pages_num >= mem_size*max_factor) ) + { + if ( debug ) + IPRINTF("Last iteration"); + last_iter = 1; + } + + if ( last_iter ) + { + if ( suspend_and_state(callbacks->suspend, callbacks->data, + xch, dom) ) + { + ERROR("Domain appears not to have suspended"); + return -1; + } + } + if ( live && iter > 1 ) + { + frc = xc_shadow_control(xch, dom, XEN_DOMCTL_SHADOW_OP_CLEAN, + HYPERCALL_BUFFER(to_send), mem_size, + NULL, 0, NULL); + if ( frc != mem_size ) + { + ERROR("Error peeking shadow bitmap"); + xc_hypercall_buffer_free_pages(xch, to_send, + NRPAGES(bitmap_size(mem_size))); + return -1; + } + } + + for ( i = start; i < end; ++i ) + { + if ( test_bit(i - start, to_send) ) + { + const char one = 1; + char *page = xc_map_foreign_range(xch, dom, PAGE_SIZE, + PROT_READ | PROT_WRITE, i); + if ( !page ) + { + PERROR("xc_map_foreign_range failed, pfn=%llx", i); + return -1; + } + + if ( write_exact_handled(xch, io_fd, &one, 1) || + write_exact_handled(xch, io_fd, &i, sizeof(i)) || + write_exact_handled(xch, io_fd, page, PAGE_SIZE) ) + { + munmap(page, PAGE_SIZE); + return -1; + } + munmap(page, PAGE_SIZE); + + if ( (i % DEF_PROGRESS_RATE) == 0 ) + xc_report_progress_step(xch, i - start, mem_size); + dirty_pages_on_current_iter_num++; + } + } + + if ( debug ) + IPRINTF("Dirty pages=%d", dirty_pages_on_current_iter_num); + xc_report_progress_step(xch, mem_size, mem_size); + + dirty_pages_on_prev_iter_num = dirty_pages_on_current_iter_num; + total_dirty_pages_num += dirty_pages_on_current_iter_num; + + if ( last_iter ) + { + xc_hypercall_buffer_free_pages(xch, to_send, + NRPAGES(bitmap_size(mem_size))); + if ( live ) + { + if ( xc_shadow_control(xch, dom, XEN_DOMCTL_SHADOW_OP_OFF, + NULL, 0, NULL, 0, NULL) < 0 ) + ERROR("Couldn''t disable log-dirty mode"); + } + break; + } + } + return write_exact_handled(xch, io_fd, &zero, 1); +} + +static int restore_memory(xc_interface *xch, int io_fd, uint32_t dom, + guest_params_t *params) +{ + xen_pfn_t end; + xen_pfn_t gpfn; + + /* We suppose that guest''s memory base is the first region base */ + xen_pfn_t start = (params->memmap.bank[0].start >> PAGE_SHIFT); + + if ( read_exact(io_fd, &end, sizeof(xen_pfn_t)) ) + { + PERROR("First read of incoming memory failed"); + return -1; + } + + /* TODO allocate several pages per call */ + for ( gpfn = start; gpfn < end; ++gpfn ) + { + if ( xc_domain_populate_physmap_exact(xch, dom, 1, 0, 0, &gpfn) ) + { + PERROR("Memory allocation for a new domain failed"); + return -1; + } + } + while ( 1 ) + { + char new_page; + xen_pfn_t gpfn; + char *page; + if ( read_exact(io_fd, &new_page, 1) ) + { + PERROR("End-checking flag read failed during memory transfer"); + return -1; + } + if ( !new_page ) + break; + + if ( read_exact(io_fd, &gpfn, sizeof(gpfn)) ) + { + PERROR("GPFN read failed during memory transfer"); + return -1; + } + if ( gpfn < start || gpfn >= end 
) + { + ERROR("GPFN %llx doesn''t belong to RAM address space", gpfn); + return -1; + } + page = xc_map_foreign_range(xch, dom, PAGE_SIZE, + PROT_READ | PROT_WRITE, gpfn); + if ( !page ) + { + PERROR("xc_map_foreign_range failed, pfn=%llx", gpfn); + return -1; + } + if ( read_exact(io_fd, page, PAGE_SIZE) ) + { + PERROR("Page data read failed during memory transfer"); + return -1; + } + munmap(page, PAGE_SIZE); + } + + return 0; +} + +/* ============ HVM context =========== */ +static int save_armhvm(xc_interface *xch, int io_fd, uint32_t dom, int debug) +{ + /* HVM: a buffer for holding HVM context */ + uint32_t hvm_buf_size = 0; + uint8_t *hvm_buf = NULL; + uint32_t rec_size; + int retval = -1; + + /* Need another buffer for HVM context */ + hvm_buf_size = xc_domain_hvm_getcontext(xch, dom, 0, 0); + if ( hvm_buf_size == -1 ) + { + ERROR("Couldn''t get HVM context size from Xen"); + goto out; + } + hvm_buf = malloc(hvm_buf_size); + + if ( !hvm_buf ) + { + ERROR("Couldn''t allocate memory for hvm buffer"); + goto out; + } + + /* Get HVM context from Xen and save it too */ + if ( (rec_size = xc_domain_hvm_getcontext(xch, dom, hvm_buf, + hvm_buf_size)) == -1 ) + { + ERROR("HVM:Could not get hvm buffer"); + goto out; + } + + if ( debug ) + IPRINTF("HVM save size %d %d", hvm_buf_size, rec_size); + + if ( write_exact_handled(xch, io_fd, &rec_size, sizeof(uint32_t)) ) + goto out; + + if ( write_exact_handled(xch, io_fd, hvm_buf, rec_size) ) + { + goto out; + } + retval = 0; + +out: + if ( hvm_buf ) + free (hvm_buf); + return retval; +} + +static int restore_armhvm(xc_interface *xch, int io_fd, + uint32_t dom, int debug) +{ + uint32_t rec_size; + uint32_t hvm_buf_size = 0; + uint8_t *hvm_buf = NULL; + int frc = 0; + int retval = -1; + + if ( read_exact(io_fd, &rec_size, sizeof(uint32_t)) ) + { + PERROR("Could not read HVM size"); + goto out; + } + + if ( !rec_size ) + { + ERROR("Zero HVM size"); + goto out; + } + + if ( debug ) + { + IPRINTF("HVM restore size %d %d", hvm_buf_size, rec_size); + } + + hvm_buf_size = xc_domain_hvm_getcontext(xch, dom, 0, 0); + if ( hvm_buf_size != rec_size ) + { + ERROR("HVM size for this domain is not the same as stored"); + } + + hvm_buf = malloc(hvm_buf_size); + if ( !hvm_buf ) + { + ERROR("Couldn''t allocate memory"); + goto out; + } + + if ( read_exact(io_fd, hvm_buf, hvm_buf_size) ) + { + PERROR("Could not read HVM context"); + goto out; + } + + frc = xc_domain_hvm_setcontext(xch, dom, hvm_buf, hvm_buf_size); + if ( frc ) + { + ERROR("error setting the HVM context"); + goto out; + } + retval = 0; + +out: + if ( hvm_buf ) + free (hvm_buf); + return retval; +} + + +/* ================= Console & Xenstore & Memory map =========== */ + +static guest_params_t * save_guest_params(xc_interface *xch, int io_fd, + uint32_t dom, uint32_t flags) +{ + + guest_params_t *p = NULL; + size_t sz = sizeof(guest_params_t); + + p = malloc(sz); + if ( p == NULL ) + { + ERROR("Couldn''t allocate memory"); + return NULL; + } + + if ( xc_domain_get_memory_map(xch, dom, &p->memmap) ) + { + ERROR("Can''t get memory map"); + free(p); + return NULL; + } + + if ( flags & XCFLAGS_DEBUG ) + { + IPRINTF("Guest param save size: %d ", sz); + IPRINTF("Guest memory map save %d entries", p->memmap.nr_banks); + } + + if ( xc_get_hvm_param(xch, dom, HVM_PARAM_CONSOLE_PFN, &p->console_pfn) ) + { + ERROR("Can''t get console gpfn"); + free (p); + return NULL; + } + + if ( xc_get_hvm_param(xch, dom, HVM_PARAM_STORE_PFN, &p->store_pfn) ) + { + ERROR("Can''t get store gpfn"); + free (p); + return NULL; + 
} + + p->flags = flags; + + if ( write_exact_handled(xch, io_fd, p, sz) ) + { + free (p); + return NULL; + } + return p; +} + +static guest_params_t * restore_guest_params(xc_interface *xch, int io_fd, + uint32_t dom) +{ + guest_params_t *p = NULL; + size_t sz = sizeof(guest_params_t); + + p = malloc(sz); + if ( p == NULL ) + { + ERROR("Couldn''t allocate memory"); + return NULL; + } + + if ( read_exact(io_fd, p, sizeof(guest_params_t)) ) + { + PERROR("Can''t read guest params"); + free(p); + return NULL; + } + + if ( p->flags & XCFLAGS_DEBUG ) + { + IPRINTF("Guest param restore size: %d ", sz); + IPRINTF("Guest memory map restore %d entries", p->memmap.nr_banks); + } + + if ( xc_domain_set_memory_map(xch, dom, &p->memmap) ) + { + free (p); + ERROR("Can''t set memory map"); + return NULL; + } + return p; +} + +static int set_guest_params(xc_interface *xch, int io_fd, uint32_t dom, + guest_params_t *params, unsigned int console_evtchn, + domid_t console_domid, unsigned int store_evtchn, + domid_t store_domid) +{ + int rc = 0; + + if ( (rc = xc_clear_domain_page(xch, dom, params->console_pfn)) ) + { + ERROR("Can''t clear console page"); + return rc; + } + + if ( (rc = xc_clear_domain_page(xch, dom, params->store_pfn)) ) + { + ERROR("Can''t clear xenstore page"); + return rc; + } + + if ( (rc = xc_dom_gnttab_hvm_seed(xch, dom, params->console_pfn, + params->store_pfn, console_domid, + store_domid)) ) + { + ERROR("Can''t grant console and xenstore pages"); + return rc; + } + + if ( (rc = xc_set_hvm_param(xch, dom, HVM_PARAM_CONSOLE_PFN, + params->console_pfn)) ) + { + ERROR("Can''t set console gpfn"); + return rc; + } + + if ( (rc = xc_set_hvm_param(xch, dom, HVM_PARAM_STORE_PFN, + params->store_pfn)) ) + { + ERROR("Can''t set xenstore gpfn"); + return rc; + } + + if ( (rc = xc_set_hvm_param(xch, dom, HVM_PARAM_CONSOLE_EVTCHN, + console_evtchn)) ) + { + ERROR("Can''t set console event channel"); + return rc; + } + + if ( (rc = xc_set_hvm_param(xch, dom, HVM_PARAM_STORE_EVTCHN, + store_evtchn)) ) + { + ERROR("Can''t set xenstore event channel"); + return rc; + } + return 0; +} + +/* ====================== VCPU ============== */ +static int save_vcpu(xc_interface *xch, int io_fd, uint32_t dom) +{ + vcpu_guest_context_any_t ctxt; + xc_vcpu_getcontext(xch, dom, 0, &ctxt); + return write_exact_handled(xch, io_fd, &ctxt, sizeof(ctxt)); +} + +static int restore_vcpu(xc_interface *xch, int io_fd, uint32_t dom) +{ + int rc = -1; + DECLARE_DOMCTL; + DECLARE_HYPERCALL_BUFFER(vcpu_guest_context_any_t, ctxt); + + ctxt = xc_hypercall_buffer_alloc(xch, ctxt, sizeof(*ctxt)); + memset(ctxt, 0, sizeof(*ctxt)); + + if ( read_exact(io_fd, ctxt, sizeof(*ctxt)) ) + { + PERROR("VCPU context read failed"); + goto out; + } + + memset(&domctl, 0, sizeof(domctl)); + domctl.cmd = XEN_DOMCTL_setvcpucontext; + domctl.domain = dom; + domctl.u.vcpucontext.vcpu = 0; + set_xen_guest_handle(domctl.u.vcpucontext.ctxt, ctxt); + rc = do_domctl(xch, &domctl); + if ( rc ) + ERROR("VCPU context set failed (error %d)", rc); + +out: + xc_hypercall_buffer_free(xch, ctxt); + return rc; +} + +/* ================== Main ============== */ +int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, + uint32_t max_iters, uint32_t max_factor, uint32_t flags, + struct save_callbacks *callbacks, int hvm, + unsigned long vm_generationid_addr) +{ + int debug = !!(flags & XCFLAGS_DEBUG); + guest_params_t *params = NULL; + + if ( (params = save_guest_params(xch, io_fd, dom, flags)) == NULL ) + { + ERROR("Can''t save guest params"); + return -1; 
+ } + + if ( save_memory(xch, io_fd, dom, callbacks, max_iters, + max_factor, params) ) + { + ERROR("Memory not saved"); + free(params); + return -1; + } + + if ( save_armhvm(xch, io_fd, dom, debug) ) + { + ERROR("HVM not saved"); + free(params); + return -1; + } + + if ( save_vcpu(xch, io_fd, dom) ) + { + ERROR("VCPU not saved"); + free(params); + return -1; + } + free(params); + return 0; +} + +int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom, + unsigned int store_evtchn, unsigned long *store_gpfn, + domid_t store_domid, unsigned int console_evtchn, + unsigned long *console_gpfn, domid_t console_domid, + unsigned int hvm, unsigned int pae, int superpages, + int no_incr_generationid, + unsigned long *vm_generationid_addr, + struct restore_callbacks *callbacks) +{ + guest_params_t *params = NULL; + int debug = 0; + + if ( (params = restore_guest_params(xch, io_fd, dom)) == NULL ) + { + ERROR("Can''t restore guest params"); + return -1; + } + debug = !!( params->flags & XCFLAGS_DEBUG ); + + if ( restore_memory(xch, io_fd, dom, params) ) + { + ERROR("Can''t restore memory"); + free(params); + return -1; + } + if ( set_guest_params(xch, io_fd, dom, params, + console_evtchn, console_domid, + store_evtchn, store_domid) ) + { + ERROR("Can''t setup guest params"); + free(params); + return -1; + } + + /* Setup console and store PFNs to caller */ + *console_gpfn = params->console_pfn; + *store_gpfn = params->store_pfn; + + if ( restore_armhvm(xch, io_fd, dom, debug) ) + { + ERROR("HVM not restored"); + free(params); + return -1; + } + + if ( restore_vcpu(xch, io_fd, dom) ) + { + ERROR("Can''t restore VCPU"); + free(params); + return -1; + } + + free(params); + return 0; +} + +/* + * Local variables: + * mode: C + * c-set-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/tools/misc/Makefile b/tools/misc/Makefile index 520ef80..5338f87 100644 --- a/tools/misc/Makefile +++ b/tools/misc/Makefile @@ -11,7 +11,9 @@ HDRS = $(wildcard *.h) TARGETS-y := xenperf xenpm xen-tmem-list-parse gtraceview gtracestat xenlockprof xenwatchdogd xencov TARGETS-$(CONFIG_X86) += xen-detect xen-hvmctx xen-hvmcrash xen-lowmemd +ifeq ($(CONFIG_X86),y) TARGETS-$(CONFIG_MIGRATE) += xen-hptool +endif TARGETS := $(TARGETS-y) SUBDIRS-$(CONFIG_LOMOUNT) += lomount @@ -25,7 +27,9 @@ INSTALL_BIN := $(INSTALL_BIN-y) INSTALL_SBIN-y := xm xen-bugtool xen-python-path xend xenperf xsview xenpm xen-tmem-list-parse gtraceview \ gtracestat xenlockprof xenwatchdogd xen-ringwatch xencov INSTALL_SBIN-$(CONFIG_X86) += xen-hvmctx xen-hvmcrash xen-lowmemd +ifeq ($(CONFIG_X86),y) INSTALL_SBIN-$(CONFIG_MIGRATE) += xen-hptool +endif INSTALL_SBIN := $(INSTALL_SBIN-y) INSTALL_PRIVBIN-y := xenpvnetboot -- 1.8.1.2
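For anyone reviewing the pairing of the write_exact() and read_exact() calls above, the save-stream layout the code implies can be summarised as follows. This is reconstructed from the code; the format is not separately specified anywhere.

/*
 * Stream layout implied by xc_arm_migrate.c (reconstruction):
 *
 *   guest_params_t            params;      // memmap, console/store pfns, flags
 *   xen_pfn_t                 end;         // from xc_domain_maximum_gpfn()
 *   repeated, over one or more live iterations {
 *       char                  one = 1;     // "a page follows" marker
 *       xen_pfn_t             gpfn;        // which page
 *       uint8_t               page[PAGE_SIZE];
 *   }
 *   char                      zero = 0;    // end-of-memory marker
 *   uint32_t                  rec_size;    // HVM context size
 *   uint8_t                   hvm_buf[rec_size];
 *   vcpu_guest_context_any_t  ctxt;        // VCPU0 state
 */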
Stefano Stabellini
2013-Aug-04 16:27 UTC
Re: [PATCH v3 07/10] xen/arm: Add handling write fault for dirty-page tracing
On Thu, 1 Aug 2013, Jaeyong Yoo wrote:> Add handling write fault in do_trap_data_abort_guest for dirty-page tracing. > Rather than maintaining a bitmap for dirty pages, we use the avail bit in p2m entry. > For locating the write fault pte in guest p2m, we use virtual-linear page table > that slots guest p2m into xen''s virtual memory. > > Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com>Looks good to me. I would appreciated some more comments in the code to explain the inner working of the vlp2m. Nonetheless: Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>> xen/arch/arm/mm.c | 110 +++++++++++++++++++++++++++++++++++++++- > xen/arch/arm/traps.c | 16 +++++- > xen/include/asm-arm/domain.h | 11 ++++ > xen/include/asm-arm/mm.h | 5 ++ > xen/include/asm-arm/processor.h | 2 + > 5 files changed, 142 insertions(+), 2 deletions(-) > > diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c > index 9d5d3e0..a24afe6 100644 > --- a/xen/arch/arm/mm.c > +++ b/xen/arch/arm/mm.c > @@ -680,7 +680,6 @@ void destroy_xen_mappings(unsigned long v, unsigned long e) > create_xen_entries(REMOVE, v, 0, (e - v) >> PAGE_SHIFT, 0); > } > > -enum mg { mg_clear, mg_ro, mg_rw, mg_rx }; > static void set_pte_flags_on_range(const char *p, unsigned long l, enum mg mg) > { > lpae_t pte; > @@ -1214,6 +1213,115 @@ int is_iomem_page(unsigned long mfn) > return 1; > return 0; > } > + > +static uint64_t find_guest_p2m_mfn(struct domain *d, paddr_t addr) > +{ > + lpae_t *first = NULL, *second = NULL; > + struct p2m_domain *p2m = &d->arch.p2m; > + uint64_t mfn = -EFAULT; > + > + if ( first_table_offset(addr) >= LPAE_ENTRIES ) > + return mfn; > + > + first = __map_domain_page(p2m->first_level); > + > + if ( !first || > + !first[first_table_offset(addr)].walk.valid || > + !first[first_table_offset(addr)].walk.table ) > + goto done; > + > + second = map_domain_page(first[first_table_offset(addr)].walk.base); > + > + if ( !second || > + !second[second_table_offset(addr)].walk.valid || > + !second[second_table_offset(addr)].walk.table ) > + goto done; > + > + mfn = second[second_table_offset(addr)].walk.base; > + > +done: > + if ( second ) unmap_domain_page(second); > + if ( first ) unmap_domain_page(first); > + > + return mfn; > +} > + > +/* > + * routine for dirty-page tracing > + * > + * On first write, it page faults, its entry is changed to read-write, > + * and on retry the write succeeds. > + * > + * for locating p2m of the faulting entry, we use virtual-linear page table. 
> + */ > +int handle_page_fault(struct domain *d, paddr_t addr) > +{ > + int rc = 0; > + struct p2m_domain *p2m = &d->arch.p2m; > + uint64_t gma_start; > + int gma_third_index; > + int xen_second_linear, xen_third_table; > + lpae_t *xen_third; > + lpae_t *vlp2m_pte; > + > + BUG_ON( !d->arch.map_domain.nr_banks ); > + > + gma_start = d->arch.map_domain.bank[0].start; > + gma_third_index = third_linear_offset(addr - gma_start); > + vlp2m_pte = (lpae_t *)(d->arch.dirty.vlpt_start + > + sizeof(lpae_t) * gma_third_index); > + > + BUG_ON( (void *)vlp2m_pte > d->arch.dirty.vlpt_end ); > + > + spin_lock(&p2m->lock); > + > + xen_second_linear = second_linear_offset((unsigned long)vlp2m_pte); > + xen_third_table = third_table_offset((unsigned long)vlp2m_pte); > + > + /* starting from xen second level page table */ > + if ( !xen_second[xen_second_linear].pt.valid ) > + { > + unsigned long va = (unsigned long)vlp2m_pte & ~(PAGE_SIZE-1); > + > + rc = create_xen_table(&xen_second[second_linear_offset(va)]); > + if ( rc < 0 ) > + goto out; > + } > + > + BUG_ON( !xen_second[xen_second_linear].pt.valid ); > + > + /* at this point, xen second level pt has valid entry > + * check again the validity of third level pt */ > + xen_third = __va(pfn_to_paddr(xen_second[xen_second_linear].pt.base)); > + > + /* xen third-level page table invalid */ > + if ( !xen_third[xen_third_table].p2m.valid ) > + { > + uint64_t mfn = find_guest_p2m_mfn(d, addr); > + lpae_t pte = mfn_to_xen_entry(mfn); > + unsigned long va = (unsigned long)vlp2m_pte & ~(PAGE_SIZE-1); > + > + pte.pt.table = 1; /* 4k mappings always have this bit set */ > + write_pte(&xen_third[xen_third_table], pte); > + flush_xen_data_tlb_range_va(va, PAGE_SIZE); > + } > + > + /* at this point, xen third level pt has valid entry: means we can access > + * vlp2m_pte vlp2m_pte is like a fourth level pt for xen, but for guest, > + * it is third level pt */ > + if ( vlp2m_pte->p2m.valid && vlp2m_pte->p2m.write == 0 ) > + { > + vlp2m_pte->p2m.write = 1; > + vlp2m_pte->p2m.avail = 1; > + write_pte(vlp2m_pte, *vlp2m_pte); > + flush_tlb_local(); > + } > + > +out: > + spin_unlock(&p2m->lock); > + return rc; > +} > + > /* > * Local variables: > * mode: C > diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c > index 1b9209d..f844f56 100644 > --- a/xen/arch/arm/traps.c > +++ b/xen/arch/arm/traps.c > @@ -1226,7 +1226,12 @@ static void do_trap_data_abort_guest(struct cpu_user_regs *regs, > goto bad_data_abort; > > /* XXX: Decode the instruction if ISS is not valid */ > - if ( !dabt.valid ) > + /* Note: add additional check before goto bad_data_abort. dabt.valid > + * bit is for telling the validity of ISS[23:16] bits. For dirty-page > + * tracing, we need to see DFSC bits. 
If DFSC bits are indicating the > + * possibility of dirty page tracing, do not go to bad_data_abort */ > + if ( !dabt.valid && > + (dabt.dfsc & FSC_MASK) != (FSC_FLT_PERM + FSC_3D_LEVEL) && dabt.write) > goto bad_data_abort; > > if (handle_mmio(&info)) > @@ -1235,6 +1240,15 @@ static void do_trap_data_abort_guest(struct cpu_user_regs *regs, > return; > } > > + /* handle permission fault on write */ > + if ( (dabt.dfsc & FSC_MASK) == (FSC_FLT_PERM + FSC_3D_LEVEL) && dabt.write ) > + { > + if ( current->domain->arch.dirty.mode == 0 ) > + goto bad_data_abort; > + if ( handle_page_fault(current->domain, info.gpa) == 0 ) > + return; > + } > + > bad_data_abort: > > msg = decode_fsc( dabt.dfsc, &level); > diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h > index 0c80c65..413b89a 100644 > --- a/xen/include/asm-arm/domain.h > +++ b/xen/include/asm-arm/domain.h > @@ -110,6 +110,17 @@ struct arch_domain > spinlock_t lock; > } uart0; > > + /* dirty-page tracing */ > + struct { > + spinlock_t lock; > + int mode; > + unsigned int count; > + uint32_t gmfn_guest_start; /* guest physical memory start address */ > + void *vlpt_start; /* va-start of guest p2m */ > + void *vlpt_end; /* va-end of guest p2m */ > + struct page_info *head; /* maintain the mapped vaddrs */ > + } dirty; > + > struct dt_mem_info map_domain; > spinlock_t map_lock; > } __cacheline_aligned; > diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h > index 404ec4d..fd976e3 100644 > --- a/xen/include/asm-arm/mm.h > +++ b/xen/include/asm-arm/mm.h > @@ -328,6 +328,11 @@ static inline void put_page_and_type(struct page_info *page) > put_page(page); > } > > +enum mg { mg_clear, mg_ro, mg_rw, mg_rx }; > + > +/* routine for dirty-page tracing */ > +int handle_page_fault(struct domain *d, paddr_t addr); > + > #endif /* __ARCH_ARM_MM__ */ > /* > * Local variables: > diff --git a/xen/include/asm-arm/processor.h b/xen/include/asm-arm/processor.h > index 06b0b25..34c21de 100644 > --- a/xen/include/asm-arm/processor.h > +++ b/xen/include/asm-arm/processor.h > @@ -383,6 +383,8 @@ union hsr { > #define FSC_CPR (0x3a) /* Coprocossor Abort */ > > #define FSC_LL_MASK (0x03<<0) > +#define FSC_MASK (0x3f) /* Fault status mask */ > +#define FSC_3D_LEVEL (0x03) /* Third level fault*/ > > /* Time counter hypervisor control register */ > #define CNTHCTL_PA (1u<<0) /* Kernel/user access to physical counter */ > -- > 1.8.1.2 > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel >
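For reference, the DFSC test added in the traps.c hunk above reduces to one status value. FSC_MASK and FSC_3D_LEVEL are introduced by the patch; FSC_FLT_PERM comes from the pre-existing processor.h definitions (0x0c, assuming the usual LPAE encoding in which DFSC 0b0011LL is a permission fault at level LL).

/* Sketch of the check: a third-level permission fault has
 * DFSC == 0b001111 == 0x0f == FSC_FLT_PERM (0x0c) + FSC_3D_LEVEL (0x03). */
static inline int is_third_level_perm_fault(uint32_t dfsc)
{
    return (dfsc & FSC_MASK) == (FSC_FLT_PERM + FSC_3D_LEVEL);
}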
Jaeyong Yoo
2013-Aug-05 00:23 UTC
Re: [PATCH v3 07/10] xen/arm: Add handling write fault for dirty-page tracing
> -----Original Message----- > From: Stefano Stabellini [mailto:stefano.stabellini@eu.citrix.com] > Sent: Monday, August 05, 2013 1:28 AM > To: Jaeyong Yoo > Cc: xen-devel@lists.xen.org > Subject: Re: [Xen-devel] [PATCH v3 07/10] xen/arm: Add handling write > fault for dirty-page tracing > > On Thu, 1 Aug 2013, Jaeyong Yoo wrote: > > Add handling write fault in do_trap_data_abort_guest for dirty-page tracing. > > Rather than maintaining a bitmap for dirty pages, we use the avail bit in p2m entry. > > For locating the write fault pte in guest p2m, we use virtual-linear > > page table that slots guest p2m into xen''s virtual memory. > > > > Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com> > > Looks good to me. > I would appreciated some more comments in the code to explain the inner > working of the vlp2m.

I got it.

One question: If you see patch #6, it implements the allocation and free of vlp2m memory (xen/arch/arm/vlpt.c), which is almost the same as the vmap allocation (xen/arch/arm/vmap.c). To be honest, I copied vmap.c and changed the virtual address start/end points and the name. While I was doing that, I thought it would be better if we make a common interface, something like a virtual address allocator. That is, if we create a virtual address allocator giving the VA range from A to B, the allocator allocates VAs in between A and B. And we initialize the virtual allocator instance at boot stage.

> Nonetheless: > > Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

[The rest of the quoted patch is trimmed; see Stefano''s review above for the full diff.]
Stefano Stabellini
2013-Aug-05 11:11 UTC
Re: [PATCH v3 07/10] xen/arm: Add handling write fault for dirty-page tracing
On Mon, 5 Aug 2013, Jaeyong Yoo wrote: > One question: If you see patch #6, it implements the allocation and free of > vlp2m memory (xen/arch/arm/vlpt.c), which is almost the same as the vmap > allocation (xen/arch/arm/vmap.c). To be honest, I copied vmap.c and changed > the virtual address start/end points and the name. While I was doing that, > I thought it would be better if we make a common interface, something like > a virtual address allocator. That is, if we create a virtual address > allocator giving the VA range from A to B, the allocator allocates VAs in > between A and B. And we initialize the virtual allocator instance at boot > stage.

Good question. I think it might be best to improve the current vmap (it''s actually xen/common/vmap.c) so that we can have multiple vmap instances for different virtual address ranges at the same time.
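One possible shape of such an interface, sketched with hypothetical names (the vm_alloc()/vm_free() in xen/common/vmap.c at this point operate on a single fixed region; a range-parameterised variant along these lines is what the thread converges on):

/* Hypothetical range-parameterised allocator; names and fields are
 * illustrative, not an existing Xen API. */
struct vm_region {
    unsigned long start;    /* first VA the instance may hand out */
    unsigned long end;      /* one past the last usable VA */
    /* a bitmap of allocated pages, a lock, etc. would live here */
};

/* set up an instance at boot, e.g. one for vmap and one for the vlpt */
int vm_region_init(struct vm_region *r, unsigned long start, unsigned long end);

/* carve nr_pages of VA (with no backing mappings) out of the instance */
void *vm_region_alloc(struct vm_region *r, unsigned int nr_pages);
void vm_region_free(struct vm_region *r, const void *va);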
Jaeyong Yoo
2013-Aug-05 11:39 UTC
Re: [PATCH v3 07/10] xen/arm: Add handling write fault for dirty-page tracing
> -----Original Message----- > From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of Stefano Stabellini > Sent: Monday, August 05, 2013 8:11 PM > To: Jaeyong Yoo > Cc: xen-devel@lists.xen.org; ''Stefano Stabellini'' > Subject: Re: [Xen-devel] [PATCH v3 07/10] xen/arm: Add handling write > fault for dirty-page tracing > > [...] > > Good question. I think it might be best to improve the current vmap (it''s > actually xen/common/vmap.c) so that we can have multiple vmap instances > for different virtual address ranges at the same time.

This looks interesting to me. Could I provide the patch for this?

Jaeyong
Stefano Stabellini
2013-Aug-05 13:49 UTC
Re: [PATCH v3 07/10] xen/arm: Add handling write fault for dirty-page tracing
On Mon, 5 Aug 2013, Jaeyong Yoo wrote: > This looks interesting to me. > Could I provide the patch for this?

Yes, that would be great. You could include it in your series, then you could substitute patch #6 with simple calls to vmap.
Ian Campbell
2013-Aug-05 13:52 UTC
Re: [PATCH v3 07/10] xen/arm: Add handling write fault for dirty-page tracing
On Mon, 2013-08-05 at 12:11 +0100, Stefano Stabellini wrote: > On Mon, 5 Aug 2013, Jaeyong Yoo wrote: > > One question: If you see patch #6, it implements the allocation and free > > of vlp2m memory (xen/arch/arm/vlpt.c), which is almost the same as the > > vmap allocation (xen/arch/arm/vmap.c). [...] I thought it would be better > > if we make a common interface, something like a virtual address allocator. > > Good question. I think it might be best to improve the current vmap > (it''s actually xen/common/vmap.c) so that we can have multiple vmap > instances for different virtual address ranges at the same time.

Before we go off and do that:

I don''t think this patch implements a linear p2m mapping in the sense in which I intended it when I suggested it. The patch implements a manual lookup with a kind of cache of the resulting mapping, I think.

A linear mapping means inserting the current p2m base pointer into Xen''s own pagetables in such a way that you can access a leaf node of the p2m by dereferencing a virtual address. Given this setup there should be no need for on-demand mapping as part of the log-dirty stuff, all the smarts happen at context switch time.

Normally a linear memory map is done by creating a loop in the page tables, i.e. HTTBR[N] would contain an entry which referenced HTTBR again. In this case we actually have a separate p2m table which we want to stitch into the normal tables, which makes it a bit different to the classical case.

Let''s assume both Xen''s page tables and the p2m are two level, to simplify the ascii art.

So for the P2M you have:

VTTBR
`-------> P2M FIRST
          `----------> P2M SECOND
                       `-------------GUEST RAM

Now if we arrange that Xen''s page tables contain the VTTBR in a top level page table slot:

HTTBR
`-------> VTTBR
          `----------> P2M FIRST
                       `-------------P2M SECOND, ACCESSED AS XEN RAM

So now Xen can access the leaf PTEs of the P2M directly just by using the correct virtual address.

This can be slightly tricky if P2M FIRST can contain super page mappings, since you need to arrange to stop a level sooner to get the correct PT entry. This means we need to arrange for a second virtual address region which maps to that, by arranging for a loop in the page table, e.g.
HTTBR
`-------> HTTBR
          `----------> VTTBR
                       `-------------P2M FIRST, ACCESSED AS XEN RAM

Under Xen, which uses LPAE and 3-level tables, I think the P2M SECOND would require 16 first level slots in the xen_second tables, which need to be context switched, the regions needed to hit the super page mappings would need slots too. If we use the gap between 128M and 256M in the Xen memory map then that means we are using xen_second[64..80]=p2m[0..16] for the linear map of the p2m leaf nodes. We can then use xen_second[80..144] to point back to xen_second allowing xen_second[64..80] to be dereferenced and create the loop needed for mapping for the superpage ptes in the P2M.

So given

VTTBR->P2M FIRST->P2M SECOND->P2M THIRD->GUEST RAM

We have in the Xen mappings:

HTTBR->XEN_SECOND[64..80]->P2M FIRST[0..16]->P2M SECOND->P2M THIRD AS XEN RAM
HTTBR->XEN_SECOND[80..144]->XEN_SECOND(*)->P2M FIRST[0..16]->P2M SECOND AS XEN RAM

(*) here we only care about XEN_SECOND[64..80] but the loop maps XEN_SECOND[0..512], a larger region which we can safely ignore.

So if my maths is correct this means Xen can access P2M THIRD entries at virtual addresses 0x8000000..0xa000000 and P2M SECOND entries at 0x12000000..0x14000000, which means that the fault handler just needs to lookup the P2M SECOND to check it isn''t super page mapping and then lookup P2M FIRST to mark it dirty etc.

If for some reason we also need to access P2M FIRST efficiently we could add a third region, but I don''t think we will be doing 1GB P2M mappings for the time being.

It occurs to me now that with 16 slots changing on context switch and a further 16 aliasing them (and hence requiring maintenance too) for the super pages it is possible that the TLB maintenance at context switch might get prohibitively expensive. We could address this by firstly only doing it when switching to/from domains which have log dirty mode enabled and then secondly by seeing if we can make use of global or locked down mappings for the static Xen .text/.data/.xenheap mappings and therefore allow us to use a bigger global flush.

In hindsight it might be the case that doing the domain_map_page walk on each lookup might be offset by the need to do all that TLB maintenance on context switch. It may be that this is something which we can only resolve by measuring?

BTW, eventually we will have a direct map of all RAM for 64-bit only, so we would likely end up with different schemes for p2m lookups for the two sub arches, since in the 64-bit direct map case the domain_map_page is very cheap.

I hope my description of a linear map makes sense, hard to do without a whiteboard ;-)

Ian.
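Under the example layout above (third-level p2m PTEs visible from 0x8000000, second-level from 0x12000000), the fault-handler lookups reduce to address arithmetic. A minimal sketch, assuming a guest IPA space starting at 0 (the series offsets by the RAM base) and using the illustrative base addresses from the mail:

/* Illustrative constants from the mail above; not committed code. */
#define VLPT_THIRD_BASE   0x08000000UL  /* p2m third-level PTEs as Xen VAs */
#define VLPT_SECOND_BASE  0x12000000UL  /* p2m second-level PTEs as Xen VAs */
#define SECOND_SHIFT      21            /* 2MB, as in asm-arm/page.h */

/* One 8-byte PTE per 4KB guest page. */
static inline lpae_t *vlpt_third_entry(paddr_t ipa)
{
    return (lpae_t *)VLPT_THIRD_BASE + (ipa >> PAGE_SHIFT);
}

/* One 8-byte PTE per 2MB of guest space; consult this first to rule out
 * a superpage mapping before dereferencing the third-level entry. */
static inline lpae_t *vlpt_second_entry(paddr_t ipa)
{
    return (lpae_t *)VLPT_SECOND_BASE + (ipa >> SECOND_SHIFT);
}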
Jaeyong Yoo
2013-Aug-06 11:56 UTC
Re: [PATCH v3 07/10] xen/arm: Add handling write fault for dirty-page tracing
> -----Original Message----- > From: Ian Campbell [mailto:Ian.Campbell@citrix.com] > Sent: Monday, August 05, 2013 10:53 PM > To: Stefano Stabellini > Cc: Jaeyong Yoo; xen-devel@lists.xen.org > Subject: Re: [Xen-devel] [PATCH v3 07/10] xen/arm: Add handling write > fault for dirty-page tracing > > [...] > > I hope my description of a linear map makes sense, hard to do without a > whiteboard ;-)

Thanks a lot for the ascii art!
Even without a whiteboard, it works very nicely for me :)

I think I understand your points. Previously, in my implementation, I created the Xen mapping to the leaf PTEs of the P2M by looking up the guest leaf p2m and calling create_xen_table, but everything could be better if I just map xen_second to the guest''s P2M first. Then, by just reading the correct VA, I can immediately access the leaf PTEs of the guest p2m.

As a minor issue, I don''t quite understand your numbers for XEN_SECOND[64..80] for the p2m third and XEN_SECOND[80..144] for the p2m second. I think the p2m third should have a larger VA range than the p2m second. If I''m not mistaken, migrating a domU with 4GB of memory requires 8MB of VA for the p2m third and 16KB for the p2m second. Since we have 128MB of VA for the vlpt, how about allocating vlpt ranges within that 128MB to each migrating domU? This way, we don''t need to context switch the xen second page tables. Although it limits how many large-memory domUs can migrate simultaneously, I think that is reasonable for ARM.

Best, Jaeyong
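The figures above are straightforward to verify; a worked check, assuming 4KB pages and 8-byte LPAE entries:

/* For a 4GB guest:
 *   third level:  4GB / 4KB = 1M PTEs,  1M * 8 bytes = 8MB of VA
 *   second level: 4GB / 2MB = 2K PTEs,  2K * 8 bytes = 16KB of VA
 * so a 128MB vlpt window could hold the third-level view of up to
 * sixteen 4GB guests, which is the sharing proposed above. */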
Ian Campbell
2013-Aug-06 13:17 UTC
Re: [PATCH v3 07/10] xen/arm: Add handling write fault for dirty-page tracing
On Tue, 2013-08-06 at 20:56 +0900, Jaeyong Yoo wrote:
> > I hope my description of a linear map makes sense, hard to do without a
> > whiteboard ;-)
>
> Thanks a lot for the ascii art! Even without a whiteboard, it works very
> nicely for me :)

Oh good, I was worried I'd made it very confusing!

> I think I understand your points. Previously, in my implementation, I
> created the Xen mapping to the leaf PTEs of the P2M by looking up the guest
> leaf p2m and using create_xen_table, but everything could be better if I
> just map the xen_second to the guest's P2M first. Then, by just reading the
> correct VA, I can immediately access the leaf PTEs of the guest p2m.

Correct.

> As a minor issue, I don't quite understand your numbers within
> XEN_SECOND[64..80] for p2m third and XEN_SECOND[80..144] for p2m second.
> I think p2m third should have larger VA ranges than the one for p2m second.

The numbers are slot numbers (i.e. entries) within the xen_second table.

I think I was just a bit confused; the p2m_second map only actually needs
one entry, I think, since you just need it to loop back to the entry
containing the p2m_third mapping.

> If I'm not mistaken, if we try to migrate a domU with a memory size of 4GB,
> it requires VA sizes of 8MB for p2m third and 16KB for p2m second.

Sounds about right. The reason I used 16 slots is that we, at least in
theory, support domU memory sizes up to 16GB, and each slot == 1GB.

> Since we have a 128MB size for the vlpt, how about allocating different
> vlpt ranges within that 128MB to each migrating domU? This way, we don't
> need to context switch the xen second page tables. Although it limits
> simultaneous live migration of DomUs with large memory sizes, for ARM I
> think it is reasonable.

You'd only be able to fit 4, I think, if supporting 16GB guests. I'm not
sure how I feel about making a slightly random limitation like that.

Why don't we just context switch the slots for now, only for domains where
log dirty is enabled, and then we can measure and see how bad it is etc.

Ian.
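As a cross-check of the "each slot == 1GB" figure, a short sketch of the
slot geometry (assuming LPAE second-level slots spanning 2MB of VA and
8-byte PTEs; the constants and names are derived from the discussion, not
taken from the patches):

    #include <stdio.h>

    /* Assumed LPAE geometry: one xen_second slot spans 2MB of VA,
     * pages are 4KB, and an lpae_t PTE is 8 bytes. */
    #define SECOND_SHIFT 21
    #define PAGE_SHIFT   12
    #define PTE_SHIFT     3

    int main(void)
    {
        /* PTEs that fit in one 2MB slot, each mapping one 4KB page:
         * 2^(21-3) = 256K PTEs -> 256K * 4KB = 1GB of guest RAM per slot. */
        unsigned long long per_slot =
            (1ULL << (SECOND_SHIFT - PTE_SHIFT)) << PAGE_SHIFT;

        /* Hence 16 slots for the (theoretical) 16GB domU limit; placing
         * them in the 128MB..256MB gap gives xen_second[64..80], since
         * 128MB / 2MB == slot 64. */
        printf("guest RAM per slot: %lluGB, slots for 16GB: %llu\n",
               per_slot >> 30, (16ULL << 30) / per_slot);
        return 0;
    }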
Jaeyong Yoo
2013-Aug-07 01:24 UTC
Re: [PATCH v3 07/10] xen/arm: Add handling write fault for dirty-page tracing
> -----Original Message-----
> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-
> bounces@lists.xen.org] On Behalf Of Ian Campbell
> Sent: Tuesday, August 06, 2013 10:17 PM
> To: Jaeyong Yoo
> Cc: xen-devel@lists.xen.org; 'Stefano Stabellini'
> Subject: Re: [Xen-devel] [PATCH v3 07/10] xen/arm: Add handling write
> fault for dirty-page tracing
>
> On Tue, 2013-08-06 at 20:56 +0900, Jaeyong Yoo wrote:
> > > I hope my description of a linear map makes sense, hard to do
> > > without a whiteboard ;-)
> >
> > Thanks a lot for the ascii art! Even without a whiteboard, it works very
> > nicely for me :)
>
> Oh good, I was worried I'd made it very confusing!
>
> > I think I understand your points. Previously, in my implementation, I
> > created the Xen mapping to the leaf PTEs of the P2M by looking up the
> > guest leaf p2m and using create_xen_table, but everything could be
> > better if I just map the xen_second to the guest's P2M first. Then, by
> > just reading the correct VA, I can immediately access the leaf PTEs of
> > the guest p2m.
>
> Correct.
>
> > As a minor issue, I don't quite understand your numbers within
> > XEN_SECOND[64..80] for p2m third and XEN_SECOND[80..144] for p2m second.
> > I think p2m third should have larger VA ranges than the one for p2m
> > second.
>
> The numbers are slot numbers (i.e. entries) within the xen_second table.
>
> I think I was just a bit confused; the p2m_second map only actually needs
> one entry, I think, since you just need it to loop back to the entry
> containing the p2m_third mapping.
>
> > If I'm not mistaken, if we try to migrate a domU with a memory size of
> > 4GB, it requires VA sizes of 8MB for p2m third and 16KB for p2m second.
>
> Sounds about right. The reason I used 16 slots is that we, at least in
> theory, support domU memory sizes up to 16GB, and each slot == 1GB.
>
> > Since we have a 128MB size for the vlpt, how about allocating different
> > vlpt ranges within that 128MB to each migrating domU? This way, we
> > don't need to context switch the xen second page tables. Although it
> > limits simultaneous live migration of DomUs with large memory sizes,
> > for ARM I think it is reasonable.
>
> You'd only be able to fit 4, I think, if supporting 16GB guests. I'm not
> sure how I feel about making a slightly random limitation like that.
>
> Why don't we just context switch the slots for now, only for domains where
> log dirty is enabled, and then we can measure and see how bad it is etc.

OK. It is no problem.

> Ian.
Jaeyong Yoo
2013-Aug-15 04:24 UTC
Re: [PATCH v3 07/10] xen/arm: Add handling write fault for dirty-page tracing
> -----Original Message-----
> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-
> bounces@lists.xen.org] On Behalf Of Ian Campbell
> Sent: Tuesday, August 06, 2013 10:17 PM
> To: Jaeyong Yoo
> Cc: xen-devel@lists.xen.org; 'Stefano Stabellini'
> Subject: Re: [Xen-devel] [PATCH v3 07/10] xen/arm: Add handling write
> fault for dirty-page tracing
>
> On Tue, 2013-08-06 at 20:56 +0900, Jaeyong Yoo wrote:
> > > I hope my description of a linear map makes sense, hard to do
> > > without a whiteboard ;-)
> >
> > Thanks a lot for the ascii art! Even without a whiteboard, it works very
> > nicely for me :)
>
> Oh good, I was worried I'd made it very confusing!
>
> > I think I understand your points. Previously, in my implementation, I
> > created the Xen mapping to the leaf PTEs of the P2M by looking up the
> > guest leaf p2m and using create_xen_table, but everything could be
> > better if I just map the xen_second to the guest's P2M first. Then, by
> > just reading the correct VA, I can immediately access the leaf PTEs of
> > the guest p2m.
>
> Correct.
>
> > As a minor issue, I don't quite understand your numbers within
> > XEN_SECOND[64..80] for p2m third and XEN_SECOND[80..144] for p2m second.
> > I think p2m third should have larger VA ranges than the one for p2m
> > second.
>
> The numbers are slot numbers (i.e. entries) within the xen_second table.
>
> I think I was just a bit confused; the p2m_second map only actually needs
> one entry, I think, since you just need it to loop back to the entry
> containing the p2m_third mapping.
>
> > If I'm not mistaken, if we try to migrate a domU with a memory size of
> > 4GB, it requires VA sizes of 8MB for p2m third and 16KB for p2m second.
>
> Sounds about right. The reason I used 16 slots is that we, at least in
> theory, support domU memory sizes up to 16GB, and each slot == 1GB.
>
> > Since we have a 128MB size for the vlpt, how about allocating different
> > vlpt ranges within that 128MB to each migrating domU? This way, we
> > don't need to context switch the xen second page tables. Although it
> > limits simultaneous live migration of DomUs with large memory sizes,
> > for ARM I think it is reasonable.
>
> You'd only be able to fit 4, I think, if supporting 16GB guests. I'm not
> sure how I feel about making a slightly random limitation like that.
>
> Why don't we just context switch the slots for now, only for domains where
> log dirty is enabled, and then we can measure and see how bad it is etc.

Here goes the measurement results:

For a better understanding of the trade-off between vlpt and page-table
walk in dirty-page handling, let's consider the following two cases:
- Migrating a single domain at a time
- Migrating multiple domains concurrently

For each case, the metrics that we are going to look at are the following:
- page-table walk overhead: for handling a single dirty page, a
  page-table walk requires 6us and the vlpt (improved version) requires
  1.5us. From this, we consider 4.5us of pure overhead compared to the
  vlpt, and it is incurred for every dirty page.
- vlpt overhead: the only vlpt overhead is the flushes at context
  switch. Flushing a 34MB virtual address range (which is what supporting
  a 16GB domU requires) takes 130us, and it happens whenever two
  migrating domUs are context switched.
Here goes the results:

- Migrating a domain at a time:
  * page-table walk overhead: 4.5us * 611 times = 2.7ms
  * vlpt overhead: 0 (no flush required)

- Migrating two domains concurrently:
  * page-table walk overhead: 4.5us * 8653 times = 39ms
  * vlpt overhead: 130us * 357 times = 46ms

Although the page-table walk gives slightly better performance when
migrating two domains, I think it is better to choose the vlpt for the
following reasons:
- In the above tests, I did not run any workloads in the migrating domUs,
  and IIRC, when I run gzip or bonnie++ in a domU, the dirty pages grow
  to a few thousand. Then, the page-table walk overhead becomes a few
  hundred milliseconds even when migrating a single domain.
- I would expect that migrating a single domain will be used more
  frequently than migrating multiple domains at a time.

One more thing: regarding your comments about TLB lockdown, which were:

> It occurs to me now that with 16 slots changing on context switch and
> a further 16 aliasing them (and hence requiring maintenance too) for
> the super pages it is possible that the TLB maintenance at context
> switch might get prohibitively expensive. We could address this by
> firstly only doing it when switching to/from domains which have log
> dirty mode enabled and then secondly by seeing if we can make use of
> global or locked down mappings for the static Xen .text/.data/.xenheap
> mappings and therefore allow us to use a bigger global flush.

Unfortunately the Cortex-A15 does not appear to support TLB lockdown:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0438d/CHDGEDAE.html

And I am not sure that setting the global bit of a page table entry
prevents it from being flushed by a TLB flush operation. If it works, we
may decrease the vlpt overhead a lot.

Jaeyong

> Ian.
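The trade-off can be captured in a tiny cost model (a sketch only: the
per-event costs and the event counts are the measured figures quoted above,
hard-coded as inputs; nothing here is derived from the patches themselves):

    #include <stdio.h>

    /* Per-event costs measured above, in microseconds. */
    #define WALK_OVERHEAD_US    4.5   /* 6us walk minus 1.5us vlpt         */
    #define FLUSH_OVERHEAD_US 130.0   /* flushing 34MB of VA (16GB domU)   */

    static void model(const char *name, long faults, long switches)
    {
        /* Walk cost is paid per dirty page; vlpt cost per context switch
         * between two migrating domains. */
        printf("%s: walk %.1fms, vlpt %.1fms\n", name,
               faults * WALK_OVERHEAD_US / 1000.0,
               switches * FLUSH_OVERHEAD_US / 1000.0);
    }

    int main(void)
    {
        model("one domain ",  611,   0);  /* 2.7ms vs 0    */
        model("two domains", 8653, 357);  /* 39ms  vs 46ms */
        return 0;
    }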
Ian Campbell
2013-Aug-17 22:16 UTC
Re: [PATCH v3 07/10] xen/arm: Add handling write fault for dirty-page tracing
On Thu, 2013-08-15 at 13:24 +0900, Jaeyong Yoo wrote:
> > Why don't we just context switch the slots for now, only for domains where
> > log dirty is enabled, and then we can measure and see how bad it is etc.
>
> Here goes the measurement results:

Wow, that was quick, thanks.

> For a better understanding of the trade-off between vlpt and page-table
> walk in dirty-page handling, let's consider the following two cases:
> - Migrating a single domain at a time
> - Migrating multiple domains concurrently
>
> For each case, the metrics that we are going to look at are the following:
> - page-table walk overhead: for handling a single dirty page, a
>   page-table walk requires 6us and the vlpt (improved version) requires
>   1.5us. From this, we consider 4.5us of pure overhead compared to the
>   vlpt, and it is incurred for every dirty page.

map_domain_page has a hash table structure in which the PTE entries are
reference counted; however, we don't clear the pte when the ref reaches 0,
so if we immediately use it again we don't need to flush. But we may need
to flush if there is a hash table collision. So in practice there will be
a bit more overhead; I'm not sure how significant that will be. I suppose
the chance of collision depends on the size of the guest.

> - vlpt overhead: the only vlpt overhead is the flushes at context
>   switch. Flushing a 34MB virtual address range (which is what supporting
>   a 16GB domU requires) takes 130us, and it happens whenever two
>   migrating domUs are context switched.
>
> Here goes the results:
>
> - Migrating a domain at a time:
>   * page-table walk overhead: 4.5us * 611 times = 2.7ms
>   * vlpt overhead: 0 (no flush required)
>
> - Migrating two domains concurrently:
>   * page-table walk overhead: 4.5us * 8653 times = 39ms
>   * vlpt overhead: 130us * 357 times = 46ms

The 611, 8653 and 357s in here are from an actual test, right?

Out of interest, what was the total time for each case?

> Although the page-table walk gives slightly better performance when
> migrating two domains, I think it is better to choose the vlpt for the
> following reasons:
> - In the above tests, I did not run any workloads in the migrating domUs,
>   and IIRC, when I run gzip or bonnie++ in a domU, the dirty pages grow
>   to a few thousand. Then, the page-table walk overhead becomes a few
>   hundred milliseconds even when migrating a single domain.
> - I would expect that migrating a single domain will be used more
>   frequently than migrating multiple domains at a time.

Both of those seem like sound arguments to me.

> One more thing: regarding your comments about TLB lockdown, which were:
> > It occurs to me now that with 16 slots changing on context switch and
> > a further 16 aliasing them (and hence requiring maintenance too) for
> > the super pages it is possible that the TLB maintenance at context
> > switch might get prohibitively expensive. We could address this by
> > firstly only doing it when switching to/from domains which have log
> > dirty mode enabled and then secondly by seeing if we can make use of
> > global or locked down mappings for the static Xen .text/.data/.xenheap
> > mappings and therefore allow us to use a bigger global flush.
>
> Unfortunately the Cortex-A15 does not appear to support TLB lockdown:
> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0438d/CHDGEDAE.html

Oh well.

> And I am not sure that setting the global bit of a page table entry
> prevents it from being flushed by a TLB flush operation. If it works, we
> may decrease the vlpt overhead a lot.

Yes, this is something to investigate, but not urgently I don't think.

Ian.
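To illustrate the behaviour Ian describes, here is a heavily simplified
sketch. This is not Xen's actual map_domain_page implementation; the
structure, names, and hashing scheme are assumptions made for illustration.
The point is only that re-mapping the same MFN reuses a still-valid PTE for
free, while a hash collision forces a PTE rewrite and a flush:

    #define NSLOTS 32   /* per-CPU mapping slots, hashed by MFN (illustrative) */

    struct map_slot {
        unsigned long mfn;    /* MFN currently (or last) mapped in this slot */
        unsigned int refcnt;  /* 0 means reusable, but the PTE stays valid   */
    };

    static struct map_slot slots[NSLOTS];

    static struct map_slot *map_page(unsigned long mfn)
    {
        struct map_slot *s = &slots[mfn % NSLOTS];

        if ( s->mfn != mfn )
        {
            /* Hash collision (or first use): repoint the slot's PTE to the
             * new MFN.  Only on this path would a TLB flush of the slot's
             * VA be needed, e.g.:
             *   write_pte(slot_pte(s), mfn_to_xen_entry(mfn));
             *   flush_xen_data_tlb_range_va(slot_va(s), PAGE_SIZE);
             */
            s->mfn = mfn;
        }
        /* Re-mapping the same MFN is free: the PTE was left valid when the
         * refcount last dropped to zero, so no rewrite and no flush. */
        s->refcnt++;
        return s;
    }

    static void unmap_page(struct map_slot *s)
    {
        /* Reaching zero does NOT clear the PTE, which is what makes the
         * fast path above possible. */
        s->refcnt--;
    }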
Ian Campbell
2013-Aug-17 22:21 UTC
Re: [PATCH v3 07/10] xen/arm: Add handling write fault for dirty-page tracing
On Sat, 2013-08-17 at 23:16 +0100, Ian Campbell wrote:
> > And I am not sure that setting the global bit of a page table entry
> > prevents it from being flushed by a TLB flush operation. If it works, we
> > may decrease the vlpt overhead a lot.
>
> Yes, this is something to investigate, but not urgently I don't think.

Except I've just had a look and there is no support for the nG bit in the
NS PL2 stage one page tables. Which makes sense, since nG is tied to ASIDs
and there is no such concept at NS PL2. Oh well.

Ian.
Julien Grall
2013-Aug-17 23:51 UTC
Re: [PATCH v3 07/10] xen/arm: Add handling write fault for dirty-page tracing
On Thu, Aug 1, 2013 at 1:57 PM, Jaeyong Yoo <jaeyong.yoo@samsung.com> wrote:
> Add handling write fault in do_trap_data_abort_guest for dirty-page tracing.
> Rather than maintaining a bitmap for dirty pages, we use the avail bit in p2m entry.
> For locating the write fault pte in guest p2m, we use virtual-linear page table
> that slots guest p2m into xen's virtual memory.
>
> Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com>
> ---
>  xen/arch/arm/mm.c               | 110 +++++++++++++++++++++++++++++++++++++++-
>  xen/arch/arm/traps.c            |  16 +++++-
>  xen/include/asm-arm/domain.h    |  11 ++++
>  xen/include/asm-arm/mm.h        |   5 ++
>  xen/include/asm-arm/processor.h |   2 +
>  5 files changed, 142 insertions(+), 2 deletions(-)
>
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index 9d5d3e0..a24afe6 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -680,7 +680,6 @@ void destroy_xen_mappings(unsigned long v, unsigned long e)
>      create_xen_entries(REMOVE, v, 0, (e - v) >> PAGE_SHIFT, 0);
>  }
>
> -enum mg { mg_clear, mg_ro, mg_rw, mg_rx };
>  static void set_pte_flags_on_range(const char *p, unsigned long l, enum mg mg)
>  {
>      lpae_t pte;
> @@ -1214,6 +1213,115 @@ int is_iomem_page(unsigned long mfn)
>          return 1;
>      return 0;
>  }
> +
> +static uint64_t find_guest_p2m_mfn(struct domain *d, paddr_t addr)
> +{
> +    lpae_t *first = NULL, *second = NULL;
> +    struct p2m_domain *p2m = &d->arch.p2m;
> +    uint64_t mfn = -EFAULT;
> +
> +    if ( first_table_offset(addr) >= LPAE_ENTRIES )
> +        return mfn;
> +
> +    first = __map_domain_page(p2m->first_level);
> +
> +    if ( !first ||
> +         !first[first_table_offset(addr)].walk.valid ||
> +         !first[first_table_offset(addr)].walk.table )
> +        goto done;
> +
> +    second = map_domain_page(first[first_table_offset(addr)].walk.base);
> +
> +    if ( !second ||
> +         !second[second_table_offset(addr)].walk.valid ||
> +         !second[second_table_offset(addr)].walk.table )
> +        goto done;
> +
> +    mfn = second[second_table_offset(addr)].walk.base;
> +
> +done:
> +    if ( second ) unmap_domain_page(second);
> +    if ( first ) unmap_domain_page(first);
> +
> +    return mfn;
> +}
> +
> +/*
> + * routine for dirty-page tracing
> + *
> + * On first write, it page faults, its entry is changed to read-write,
> + * and on retry the write succeeds.
> + *
> + * for locating p2m of the faulting entry, we use virtual-linear page table.
> + */
> +int handle_page_fault(struct domain *d, paddr_t addr)
> +{
> +    int rc = 0;
> +    struct p2m_domain *p2m = &d->arch.p2m;
> +    uint64_t gma_start;
> +    int gma_third_index;
> +    int xen_second_linear, xen_third_table;
> +    lpae_t *xen_third;
> +    lpae_t *vlp2m_pte;
> +
> +    BUG_ON( !d->arch.map_domain.nr_banks );
> +
> +    gma_start = d->arch.map_domain.bank[0].start;
> +    gma_third_index = third_linear_offset(addr - gma_start);
> +    vlp2m_pte = (lpae_t *)(d->arch.dirty.vlpt_start +
> +                           sizeof(lpae_t) * gma_third_index);
> +
> +    BUG_ON( (void *)vlp2m_pte > d->arch.dirty.vlpt_end );
> +
> +    spin_lock(&p2m->lock);
> +
> +    xen_second_linear = second_linear_offset((unsigned long)vlp2m_pte);
> +    xen_third_table = third_table_offset((unsigned long)vlp2m_pte);
> +
> +    /* starting from xen second level page table */
> +    if ( !xen_second[xen_second_linear].pt.valid )
> +    {
> +        unsigned long va = (unsigned long)vlp2m_pte & ~(PAGE_SIZE-1);
> +
> +        rc = create_xen_table(&xen_second[second_linear_offset(va)]);
> +        if ( rc < 0 )
> +            goto out;
> +    }
> +
> +    BUG_ON( !xen_second[xen_second_linear].pt.valid );
> +
> +    /* at this point, xen second level pt has valid entry
> +     * check again the validity of third level pt */
> +    xen_third = __va(pfn_to_paddr(xen_second[xen_second_linear].pt.base));
> +
> +    /* xen third-level page table invalid */
> +    if ( !xen_third[xen_third_table].p2m.valid )
> +    {
> +        uint64_t mfn = find_guest_p2m_mfn(d, addr);
> +        lpae_t pte = mfn_to_xen_entry(mfn);
> +        unsigned long va = (unsigned long)vlp2m_pte & ~(PAGE_SIZE-1);
> +
> +        pte.pt.table = 1; /* 4k mappings always have this bit set */
> +        write_pte(&xen_third[xen_third_table], pte);
> +        flush_xen_data_tlb_range_va(va, PAGE_SIZE);
> +    }
> +
> +    /* at this point, xen third level pt has valid entry: means we can access
> +     * vlp2m_pte. vlp2m_pte is like a fourth level pt for xen, but for guest,
> +     * it is third level pt */
> +    if ( vlp2m_pte->p2m.valid && vlp2m_pte->p2m.write == 0 )
> +    {
> +        vlp2m_pte->p2m.write = 1;
> +        vlp2m_pte->p2m.avail = 1;
> +        write_pte(vlp2m_pte, *vlp2m_pte);
> +        flush_tlb_local();
> +    }
> +
> +out:
> +    spin_unlock(&p2m->lock);
> +    return rc;
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
> index 1b9209d..f844f56 100644
> --- a/xen/arch/arm/traps.c
> +++ b/xen/arch/arm/traps.c
> @@ -1226,7 +1226,12 @@ static void do_trap_data_abort_guest(struct cpu_user_regs *regs,
>          goto bad_data_abort;
>
>      /* XXX: Decode the instruction if ISS is not valid */
> -    if ( !dabt.valid )
> +    /* Note: add additional check before goto bad_data_abort. dabt.valid
> +     * bit is for telling the validity of ISS[23:16] bits. For dirty-page
> +     * tracing, we need to see DFSC bits. If DFSC bits are indicating the
> +     * possibility of dirty page tracing, do not go to bad_data_abort */
> +    if ( !dabt.valid &&
> +         (dabt.dfsc & FSC_MASK) != (FSC_FLT_PERM + FSC_3D_LEVEL) && dabt.write )

It's better to use | for bitmask instead of +.

handle_mmio should not be called with dabt.valid == 0.
If I understand your patch correctly, this case can happen.

>          goto bad_data_abort;
>
>      if (handle_mmio(&info))
> @@ -1235,6 +1240,15 @@ static void do_trap_data_abort_guest(struct cpu_user_regs *regs,
>          return;
>      }
>
> +    /* handle permission fault on write */
> +    if ( (dabt.dfsc & FSC_MASK) == (FSC_FLT_PERM + FSC_3D_LEVEL) && dabt.write )

Same here.

> +    {
> +        if ( current->domain->arch.dirty.mode == 0 )
> +            goto bad_data_abort;
> +        if ( handle_page_fault(current->domain, info.gpa) == 0 )
> +            return;
> +    }
> +
>  bad_data_abort:
>
>      msg = decode_fsc( dabt.dfsc, &level);
> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
> index 0c80c65..413b89a 100644
> --- a/xen/include/asm-arm/domain.h
> +++ b/xen/include/asm-arm/domain.h
> @@ -110,6 +110,17 @@ struct arch_domain
>          spinlock_t lock;
>      } uart0;
>
> +    /* dirty-page tracing */
> +    struct {
> +        spinlock_t lock;
> +        int mode;
> +        unsigned int count;
> +        uint32_t gmfn_guest_start;  /* guest physical memory start address */
> +        void *vlpt_start;           /* va-start of guest p2m */
> +        void *vlpt_end;             /* va-end of guest p2m */
> +        struct page_info *head;     /* maintain the mapped vaddrs */
> +    } dirty;
> +
>      struct dt_mem_info map_domain;
>      spinlock_t map_lock;
>  } __cacheline_aligned;
> diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
> index 404ec4d..fd976e3 100644
> --- a/xen/include/asm-arm/mm.h
> +++ b/xen/include/asm-arm/mm.h
> @@ -328,6 +328,11 @@ static inline void put_page_and_type(struct page_info *page)
>      put_page(page);
>  }
>
> +enum mg { mg_clear, mg_ro, mg_rw, mg_rx };
> +
> +/* routine for dirty-page tracing */
> +int handle_page_fault(struct domain *d, paddr_t addr);
> +
>  #endif /* __ARCH_ARM_MM__ */
>  /*
>   * Local variables:
> diff --git a/xen/include/asm-arm/processor.h b/xen/include/asm-arm/processor.h
> index 06b0b25..34c21de 100644
> --- a/xen/include/asm-arm/processor.h
> +++ b/xen/include/asm-arm/processor.h
> @@ -383,6 +383,8 @@ union hsr {
>  #define FSC_CPR        (0x3a) /* Coprocessor Abort */
>
>  #define FSC_LL_MASK    (0x03<<0)
> +#define FSC_MASK       (0x3f) /* Fault status mask */
> +#define FSC_3D_LEVEL   (0x03) /* Third level fault */
>
>  /* Time counter hypervisor control register */
>  #define CNTHCTL_PA     (1u<<0) /* Kernel/user access to physical counter */

--
Julien Grall
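For illustration, the check with Julien's suggestion applied might look like
the sketch below. The helper name is made up; the constants are quoted from
the patch (FSC_FLT_PERM is 0x0c in Xen's processor.h, if I recall the header
correctly), and since 0x0c and 0x03 share no bits, | and + produce the same
value here, but | states the intent of composing a fault-status code:

    #include <stdint.h>

    /* Values as in asm-arm/processor.h (the latter two added by the patch);
     * repeated here so the sketch stands alone. */
    #define FSC_FLT_PERM  (0x0c)  /* permission fault */
    #define FSC_MASK      (0x3f)  /* fault status mask */
    #define FSC_3D_LEVEL  (0x03)  /* third level fault */

    /* Illustrative helper (not in the patch): true for a third-level
     * permission fault on a write, the case dirty-page tracing handles. */
    static inline int is_dirty_page_fault(uint32_t dfsc, int write)
    {
        return ((dfsc & FSC_MASK) == (FSC_FLT_PERM | FSC_3D_LEVEL)) && write;
    }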
Ian Campbell
2013-Aug-18 06:39 UTC
Re: [PATCH v3 07/10] xen/arm: Add handling write fault for dirty-page tracing
On Thu, 2013-08-15 at 13:24 +0900, Jaeyong Yoo wrote:
> For a better understanding of the trade-off between vlpt and page-table
> walk in dirty-page handling,

I meant to ask before: "vlpt" is the implementation arising from linking
the p2m into Xen's own (stage 1) pagetable, rather than the initial
vmap-like implementation which was posted, right?

Ian.
Jaeyong Yoo
2013-Aug-20 10:15 UTC
Re: [PATCH v3 07/10] xen/arm: Add handling write fault for dirty-page tracing
> -----Original Message-----
> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-
> bounces@lists.xen.org] On Behalf Of Ian Campbell
> Sent: Sunday, August 18, 2013 7:16 AM
> To: Jaeyong Yoo
> Cc: 'Stefano Stabellini'; xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH v3 07/10] xen/arm: Add handling write
> fault for dirty-page tracing
>
> On Thu, 2013-08-15 at 13:24 +0900, Jaeyong Yoo wrote:
> > > Why don't we just context switch the slots for now, only for domains
> > > where log dirty is enabled, and then we can measure and see how bad it
> > > is etc.
> >
> > Here goes the measurement results:
>
> Wow, that was quick, thanks.

Your explanation with the ascii art does help a lot. Thanks again!

> > For a better understanding of the trade-off between vlpt and page-table
> > walk in dirty-page handling, let's consider the following two cases:
> > - Migrating a single domain at a time
> > - Migrating multiple domains concurrently
> >
> > For each case, the metrics that we are going to look at are the following:
> > - page-table walk overhead: for handling a single dirty page, a
> >   page-table walk requires 6us and the vlpt (improved version) requires
> >   1.5us. From this, we consider 4.5us of pure overhead compared to the
> >   vlpt, and it is incurred for every dirty page.
>
> map_domain_page has a hash table structure in which the PTE entries are
> reference counted; however, we don't clear the pte when the ref reaches 0,
> so if we immediately use it again we don't need to flush. But we may need
> to flush if there is a hash table collision. So in practice there will be
> a bit more overhead; I'm not sure how significant that will be. I suppose
> the chance of collision depends on the size of the guest.

Yes, right. The overhead of unmap_domain_page may be under-estimated.

> > - vlpt overhead: the only vlpt overhead is the flushes at context
> >   switch. Flushing a 34MB virtual address range (which is what supporting
> >   a 16GB domU requires) takes 130us, and it happens whenever two
> >   migrating domUs are context switched.
> >
> > Here goes the results:
> >
> > - Migrating a domain at a time:
> >   * page-table walk overhead: 4.5us * 611 times = 2.7ms
> >   * vlpt overhead: 0 (no flush required)
> >
> > - Migrating two domains concurrently:
> >   * page-table walk overhead: 4.5us * 8653 times = 39ms
> >   * vlpt overhead: 130us * 357 times = 46ms
>
> The 611, 8653 and 357s in here are from an actual test, right?
>
> Out of interest, what was the total time for each case?
>
> > Although the page-table walk gives slightly better performance when
> > migrating two domains, I think it is better to choose the vlpt for the
> > following reasons:
> > - In the above tests, I did not run any workloads in the migrating domUs,
> >   and IIRC, when I run gzip or bonnie++ in a domU, the dirty pages grow
> >   to a few thousand. Then, the page-table walk overhead becomes a few
> >   hundred milliseconds even when migrating a single domain.
> > - I would expect that migrating a single domain will be used more
> >   frequently than migrating multiple domains at a time.
>
> Both of those seem like sound arguments to me.
>
> > One more thing: regarding your comments about TLB lockdown, which were:
> > > It occurs to me now that with 16 slots changing on context switch
> > > and a further 16 aliasing them (and hence requiring maintenance too)
> > > for the super pages it is possible that the TLB maintenance at
> > > context switch might get prohibitively expensive. We could address
> > > this by firstly only doing it when switching to/from domains which
> > > have log dirty mode enabled and then secondly by seeing if we can
> > > make use of global or locked down mappings for the static Xen
> > > .text/.data/.xenheap mappings and therefore allow us to use a bigger
> > > global flush.
> >
> > Unfortunately the Cortex-A15 does not appear to support TLB lockdown:
> > http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0438d/CHDGEDAE.html
>
> Oh well.
>
> > And I am not sure that setting the global bit of a page table entry
> > prevents it from being flushed by a TLB flush operation. If it works,
> > we may decrease the vlpt overhead a lot.
>
> Yes, this is something to investigate, but not urgently I don't think.

Got it. Making it absolutely stable is more important, I think.

> Ian.
Jaeyong Yoo
2013-Aug-20 10:16 UTC
Re: [PATCH v3 07/10] xen/arm: Add handling write fault for dirty-page tracing
> -----Original Message-----
> From: julien.grall@gmail.com [mailto:julien.grall@gmail.com] On Behalf Of
> Julien Grall
> Sent: Sunday, August 18, 2013 8:51 AM
> To: Jaeyong Yoo
> Cc: xen-devel@lists.xen.org; Stefano Stabellini; Ian Campbell
> Subject: Re: [Xen-devel] [PATCH v3 07/10] xen/arm: Add handling write
> fault for dirty-page tracing
>
> On Thu, Aug 1, 2013 at 1:57 PM, Jaeyong Yoo <jaeyong.yoo@samsung.com>
> wrote:
> > Add handling write fault in do_trap_data_abort_guest for dirty-page tracing.
> > Rather than maintaining a bitmap for dirty pages, we use the avail bit in p2m entry.
> > For locating the write fault pte in guest p2m, we use virtual-linear page
> > table that slots guest p2m into xen's virtual memory.
> >
> > Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com>
> > ---
> >  xen/arch/arm/mm.c               | 110 +++++++++++++++++++++++++++++++++++++++-
> >  xen/arch/arm/traps.c            |  16 +++++-
> >  xen/include/asm-arm/domain.h    |  11 ++++
> >  xen/include/asm-arm/mm.h        |   5 ++
> >  xen/include/asm-arm/processor.h |   2 +
> >  5 files changed, 142 insertions(+), 2 deletions(-)
> >
> > diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> > index 9d5d3e0..a24afe6 100644
> > --- a/xen/arch/arm/mm.c
> > +++ b/xen/arch/arm/mm.c
> > @@ -680,7 +680,6 @@ void destroy_xen_mappings(unsigned long v, unsigned long e)
> >      create_xen_entries(REMOVE, v, 0, (e - v) >> PAGE_SHIFT, 0);
> >  }
> >
> > -enum mg { mg_clear, mg_ro, mg_rw, mg_rx };
> >  static void set_pte_flags_on_range(const char *p, unsigned long l, enum mg mg)
> >  {
> >      lpae_t pte;
> > @@ -1214,6 +1213,115 @@ int is_iomem_page(unsigned long mfn)
> >          return 1;
> >      return 0;
> >  }
> > +
> > +static uint64_t find_guest_p2m_mfn(struct domain *d, paddr_t addr)
> > +{
> > +    lpae_t *first = NULL, *second = NULL;
> > +    struct p2m_domain *p2m = &d->arch.p2m;
> > +    uint64_t mfn = -EFAULT;
> > +
> > +    if ( first_table_offset(addr) >= LPAE_ENTRIES )
> > +        return mfn;
> > +
> > +    first = __map_domain_page(p2m->first_level);
> > +
> > +    if ( !first ||
> > +         !first[first_table_offset(addr)].walk.valid ||
> > +         !first[first_table_offset(addr)].walk.table )
> > +        goto done;
> > +
> > +    second = map_domain_page(first[first_table_offset(addr)].walk.base);
> > +
> > +    if ( !second ||
> > +         !second[second_table_offset(addr)].walk.valid ||
> > +         !second[second_table_offset(addr)].walk.table )
> > +        goto done;
> > +
> > +    mfn = second[second_table_offset(addr)].walk.base;
> > +
> > +done:
> > +    if ( second ) unmap_domain_page(second);
> > +    if ( first ) unmap_domain_page(first);
> > +
> > +    return mfn;
> > +}
> > +
> > +/*
> > + * routine for dirty-page tracing
> > + *
> > + * On first write, it page faults, its entry is changed to read-write,
> > + * and on retry the write succeeds.
> > + *
> > + * for locating p2m of the faulting entry, we use virtual-linear page table.
> > + */
> > +int handle_page_fault(struct domain *d, paddr_t addr)
> > +{
> > +    int rc = 0;
> > +    struct p2m_domain *p2m = &d->arch.p2m;
> > +    uint64_t gma_start;
> > +    int gma_third_index;
> > +    int xen_second_linear, xen_third_table;
> > +    lpae_t *xen_third;
> > +    lpae_t *vlp2m_pte;
> > +
> > +    BUG_ON( !d->arch.map_domain.nr_banks );
> > +
> > +    gma_start = d->arch.map_domain.bank[0].start;
> > +    gma_third_index = third_linear_offset(addr - gma_start);
> > +    vlp2m_pte = (lpae_t *)(d->arch.dirty.vlpt_start +
> > +                           sizeof(lpae_t) * gma_third_index);
> > +
> > +    BUG_ON( (void *)vlp2m_pte > d->arch.dirty.vlpt_end );
> > +
> > +    spin_lock(&p2m->lock);
> > +
> > +    xen_second_linear = second_linear_offset((unsigned long)vlp2m_pte);
> > +    xen_third_table = third_table_offset((unsigned long)vlp2m_pte);
> > +
> > +    /* starting from xen second level page table */
> > +    if ( !xen_second[xen_second_linear].pt.valid )
> > +    {
> > +        unsigned long va = (unsigned long)vlp2m_pte & ~(PAGE_SIZE-1);
> > +
> > +        rc = create_xen_table(&xen_second[second_linear_offset(va)]);
> > +        if ( rc < 0 )
> > +            goto out;
> > +    }
> > +
> > +    BUG_ON( !xen_second[xen_second_linear].pt.valid );
> > +
> > +    /* at this point, xen second level pt has valid entry
> > +     * check again the validity of third level pt */
> > +    xen_third = __va(pfn_to_paddr(xen_second[xen_second_linear].pt.base));
> > +
> > +    /* xen third-level page table invalid */
> > +    if ( !xen_third[xen_third_table].p2m.valid )
> > +    {
> > +        uint64_t mfn = find_guest_p2m_mfn(d, addr);
> > +        lpae_t pte = mfn_to_xen_entry(mfn);
> > +        unsigned long va = (unsigned long)vlp2m_pte & ~(PAGE_SIZE-1);
> > +
> > +        pte.pt.table = 1; /* 4k mappings always have this bit set */
> > +        write_pte(&xen_third[xen_third_table], pte);
> > +        flush_xen_data_tlb_range_va(va, PAGE_SIZE);
> > +    }
> > +
> > +    /* at this point, xen third level pt has valid entry: means we can access
> > +     * vlp2m_pte. vlp2m_pte is like a fourth level pt for xen, but for guest,
> > +     * it is third level pt */
> > +    if ( vlp2m_pte->p2m.valid && vlp2m_pte->p2m.write == 0 )
> > +    {
> > +        vlp2m_pte->p2m.write = 1;
> > +        vlp2m_pte->p2m.avail = 1;
> > +        write_pte(vlp2m_pte, *vlp2m_pte);
> > +        flush_tlb_local();
> > +    }
> > +
> > +out:
> > +    spin_unlock(&p2m->lock);
> > +    return rc;
> > +}
> > +
> >  /*
> >   * Local variables:
> >   * mode: C
> > diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
> > index 1b9209d..f844f56 100644
> > --- a/xen/arch/arm/traps.c
> > +++ b/xen/arch/arm/traps.c
> > @@ -1226,7 +1226,12 @@ static void do_trap_data_abort_guest(struct cpu_user_regs *regs,
> >          goto bad_data_abort;
> >
> >      /* XXX: Decode the instruction if ISS is not valid */
> > -    if ( !dabt.valid )
> > +    /* Note: add additional check before goto bad_data_abort. dabt.valid
> > +     * bit is for telling the validity of ISS[23:16] bits. For dirty-page
> > +     * tracing, we need to see DFSC bits. If DFSC bits are indicating the
> > +     * possibility of dirty page tracing, do not go to bad_data_abort */
> > +    if ( !dabt.valid &&
> > +         (dabt.dfsc & FSC_MASK) != (FSC_FLT_PERM + FSC_3D_LEVEL) && dabt.write )
>
> It's better to use | for bitmask instead of +.

Yes, that is better.

> handle_mmio should not be called with dabt.valid == 0. If I understand
> your patch correctly, this case can happen.
> >          goto bad_data_abort;
> >
> >      if (handle_mmio(&info))
> > @@ -1235,6 +1240,15 @@ static void do_trap_data_abort_guest(struct cpu_user_regs *regs,
> >          return;
> >      }
> >
> > +    /* handle permission fault on write */
> > +    if ( (dabt.dfsc & FSC_MASK) == (FSC_FLT_PERM + FSC_3D_LEVEL) && dabt.write )
>
> Same here.

OK.

> > +    {
> > +        if ( current->domain->arch.dirty.mode == 0 )
> > +            goto bad_data_abort;
> > +        if ( handle_page_fault(current->domain, info.gpa) == 0 )
> > +            return;
> > +    }
> > +
> >  bad_data_abort:
> >
> >      msg = decode_fsc( dabt.dfsc, &level);
> > diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
> > index 0c80c65..413b89a 100644
> > --- a/xen/include/asm-arm/domain.h
> > +++ b/xen/include/asm-arm/domain.h
> > @@ -110,6 +110,17 @@ struct arch_domain
> >          spinlock_t lock;
> >      } uart0;
> >
> > +    /* dirty-page tracing */
> > +    struct {
> > +        spinlock_t lock;
> > +        int mode;
> > +        unsigned int count;
> > +        uint32_t gmfn_guest_start;  /* guest physical memory start address */
> > +        void *vlpt_start;           /* va-start of guest p2m */
> > +        void *vlpt_end;             /* va-end of guest p2m */
> > +        struct page_info *head;     /* maintain the mapped vaddrs */
> > +    } dirty;
> > +
> >      struct dt_mem_info map_domain;
> >      spinlock_t map_lock;
> >  } __cacheline_aligned;
> > diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
> > index 404ec4d..fd976e3 100644
> > --- a/xen/include/asm-arm/mm.h
> > +++ b/xen/include/asm-arm/mm.h
> > @@ -328,6 +328,11 @@ static inline void put_page_and_type(struct page_info *page)
> >      put_page(page);
> >  }
> >
> > +enum mg { mg_clear, mg_ro, mg_rw, mg_rx };
> > +
> > +/* routine for dirty-page tracing */
> > +int handle_page_fault(struct domain *d, paddr_t addr);
> > +
> >  #endif /* __ARCH_ARM_MM__ */
> >  /*
> >   * Local variables:
> > diff --git a/xen/include/asm-arm/processor.h b/xen/include/asm-arm/processor.h
> > index 06b0b25..34c21de 100644
> > --- a/xen/include/asm-arm/processor.h
> > +++ b/xen/include/asm-arm/processor.h
> > @@ -383,6 +383,8 @@ union hsr {
> >  #define FSC_CPR        (0x3a) /* Coprocessor Abort */
> >
> >  #define FSC_LL_MASK    (0x03<<0)
> > +#define FSC_MASK       (0x3f) /* Fault status mask */
> > +#define FSC_3D_LEVEL   (0x03) /* Third level fault */
> >
> >  /* Time counter hypervisor control register */
> >  #define CNTHCTL_PA     (1u<<0) /* Kernel/user access to physical counter */
>
> --
> Julien Grall
Jaeyong Yoo
2013-Aug-20 10:19 UTC
Re: [PATCH v3 07/10] xen/arm: Add handling write fault for dirty-page tracing
> -----Original Message-----
> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-
> bounces@lists.xen.org] On Behalf Of Ian Campbell
> Sent: Sunday, August 18, 2013 3:39 PM
> To: Jaeyong Yoo
> Cc: 'Stefano Stabellini'; xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH v3 07/10] xen/arm: Add handling write
> fault for dirty-page tracing
>
> On Thu, 2013-08-15 at 13:24 +0900, Jaeyong Yoo wrote:
> > For a better understanding of the trade-off between vlpt and page-table
> > walk in dirty-page handling,
>
> I meant to ask before: "vlpt" is the implementation arising from linking
> the p2m into Xen's own (stage 1) pagetable, rather than the initial
> vmap-like implementation which was posted, right?

Yes, right. The vlpt itself means slotting the guest p2m into Xen's page
table. The vmap-like implementation is for sharing Xen's virtual address
range for the vlpt, and currently migrating DomUs context switch the page
tables of this range.

Jaeyong

> Ian.
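A minimal sketch of what that context switch amounts to, for illustration
only: vlpt_slots is a hypothetical per-domain field, the slot range and the
34MB flush size are taken from the discussion above, and per Ian's
suggestion only log-dirty domains pay the cost.

    /* Install the incoming domain's vlpt slots and flush the VA range they
     * cover.  Called on context switch; a no-op unless the domain is being
     * migrated (log-dirty mode enabled). */
    static void vlpt_switch_to(struct domain *n)
    {
        int i;

        if ( !n->arch.dirty.mode )
            return;

        /* 16 xen_second slots cover up to 16GB of guest RAM (1GB/slot). */
        for ( i = 0; i < 16; i++ )
            write_pte(&xen_second[64 + i], n->arch.dirty.vlpt_slots[i]);

        /* ~34MB of VA for a 16GB guest; measured above at ~130us. */
        flush_xen_data_tlb_range_va((unsigned long)n->arch.dirty.vlpt_start,
                                    34UL << 20);
    }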
Ian Campbell
2013-Sep-25 15:59 UTC
Re: [PATCH v3 00/10] xen/arm: live migration support in arndale board
On Thu, 2013-08-01 at 21:57 +0900, Jaeyong Yoo wrote:
> Hi all,
> here goes the v3 patch series for live migration in arndale board.

Was a v4 of this ever posted (with the vlpt stuff)? If so I appear to have
missed it, sorry.

Feature freeze for Xen 4.4 is currently 18th October[0]; we'd really like
to see live migration support before then! (Which in reality means it
needs to be posted well before, to give ample time for review of what is
likely to be a complex patch set.)

Ian.

[0] http://article.gmane.org/gmane.comp.emulators.xen.devel/168132

> This version applies the comments from the v2 patch series, which are majorly:
>
> 1) just use one timer struct for storing vtimer and ptimer
>    for hvm context: patch 1
>
> 2) for dirty page tracing, use virtual-linear page table for accessing
>    guest p2m in xen: patch 6, 7, and 9
>
> 3) Rather than using hard-coded guest memory map in xen, use the one from
>    toolstack by implementing set/get_memory_map hypercall: patch 3 and 10
>
> This patch series does not support SMP guest migration. We are expecting
> to provide SMP-guests live migration in the version 4 patch series.
>
> We also have tested the stability of the v3 patch series as follows:
>
> - setup two arndale boards with xen (let's say A and B)
> - launch 3 domUs at A
> - simultaneously migrate 3 domUs from A to B back and forth.
> - we say one round of migration if all 3 domUs migrate
>   from A to B and migrate back from B to A.
>
> When we perform the above tests without any load on domUs, the migration
> goes to 80~100 rounds. After that, dom0 suddenly stops responding. When we
> perform with network load (iperf on each domU), the migration goes to 2~3
> rounds and dom0 stops responding.
>
> After several repeated tests, we gathered the PCs where dom0 gets stuck,
> and those are
> - _raw_spin_lock (called by try_to_wake_up)
> - panic_smp_self_stop
> - cpu_v7_dcache_clean_area
>
> I think those bugs are somehow related to live migration, or maybe other
> parts that may result in a complicated cause-and-effect chain to live
> migration. In any case, I would like to look into the detail to figure out
> the cause and possibly fix the bugs.
> In the meanwhile, I would appreciate your comments on this patch series :)
>
> Best,
> Jaeyong
>
> Alexey Sokolov, Elena Pyatunina, Evgeny Fedotov, and Nikolay Martyanov (1):
>   xen/arm: Implement toolstack for xl restore/save and migrate
>
> Alexey Sokolov (1):
>   xen/arm: Implement modify_returncode
>
> Evgeny Fedotov (2):
>   xen/arm: Implement set_memory_map hypercall
>   xen/arm: Implement get_maximum_gpfn hypercall for arm
>
> Jaeyong Yoo and Evgeny Fedotov (1):
>   xen/arm: Implement hvm save and restore
>
> Jaeyong Yoo and Alexey Sokolov (1):
>   xen/arm: Add more registers for saving and restoring vcpu registers
>
> Jaeyong Yoo and Elena Pyatunina (2):
>   xen/arm: Add handling write fault for dirty-page tracing
>   xen/arm: Implement hypercall for dirty page tracing (shadow op)
>
> Jaeyong Yoo (2):
>   xen/arm: Implement virtual-linear page table for guest p2m
>     mapping in live migration
>   xen/arm: Fixing clear_guest_offset macro
>
>  config/arm32.mk                          |   1 +
>  tools/include/xen-foreign/reference.size |   2 +-
>  tools/libxc/Makefile                     |   5 +
>  tools/libxc/xc_arm_migrate.c             | 686 +++++++++++++++++++++++++++++++
>  tools/libxc/xc_dom_arm.c                 |  12 +-
>  tools/libxc/xc_domain.c                  |  44 ++
>  tools/libxc/xc_resume.c                  |  25 ++
>  tools/libxc/xenctrl.h                    |  23 ++
>  tools/misc/Makefile                      |   4 +
>  xen/arch/arm/Makefile                    |   2 +
>  xen/arch/arm/domain.c                    |  44 ++
>  xen/arch/arm/domctl.c                    | 137 +++++-
>  xen/arch/arm/hvm.c                       | 124 ++++++
>  xen/arch/arm/mm.c                        | 284 ++++++++++++-
>  xen/arch/arm/p2m.c                       | 307 ++++++++++++++
>  xen/arch/arm/save.c                      |  66 +++
>  xen/arch/arm/setup.c                     |   3 +
>  xen/arch/arm/traps.c                     |  16 +-
>  xen/arch/arm/vlpt.c                      | 162 ++++++++
>  xen/common/Makefile                      |   2 +
>  xen/include/asm-arm/config.h             |   3 +
>  xen/include/asm-arm/domain.h             |  13 +
>  xen/include/asm-arm/guest_access.h       |   5 +-
>  xen/include/asm-arm/hvm/support.h        |  29 ++
>  xen/include/asm-arm/mm.h                 |   7 +
>  xen/include/asm-arm/p2m.h                |   4 +
>  xen/include/asm-arm/processor.h          |   2 +
>  xen/include/asm-arm/vlpt.h               |  10 +
>  xen/include/public/arch-arm.h            |  35 ++
>  xen/include/public/arch-arm/hvm/save.h   |  41 ++
>  xen/include/public/memory.h              |  15 +-
>  xen/include/xsm/dummy.h                  |   5 +
>  xen/include/xsm/xsm.h                    |   5 +
>  33 files changed, 2113 insertions(+), 10 deletions(-)
>  create mode 100644 tools/libxc/xc_arm_migrate.c
>  create mode 100644 xen/arch/arm/save.c
>  create mode 100644 xen/arch/arm/vlpt.c
>  create mode 100644 xen/include/asm-arm/hvm/support.h
>  create mode 100644 xen/include/asm-arm/vlpt.h
Jaeyong Yoo
2013-Sep-26 06:23 UTC
Re: [PATCH v3 00/10] xen/arm: live migration support in arndale board
> -----Original Message-----
> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-
> bounces@lists.xen.org] On Behalf Of Ian Campbell
> Sent: Thursday, September 26, 2013 12:59 AM
> To: Jaeyong Yoo
> Cc: Stefano Stabellini; xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH v3 00/10] xen/arm: live migration support
> in arndale board
>
> On Thu, 2013-08-01 at 21:57 +0900, Jaeyong Yoo wrote:
> > Hi all,
> > here goes the v3 patch series for live migration in arndale board.
>
> Was a v4 of this ever posted (with the vlpt stuff)? If so I appear to have
> missed it, sorry.

No, I didn't post it yet.

> Feature freeze for Xen 4.4 is currently 18th October[0]; we'd really like
> to see live migration support before then! (Which in reality means it
> needs to be posted well before, to give ample time for review of what is
> likely to be a complex patch set.)

I also would like to see live migration in Xen 4.4, and I will post the v4
within the next week. Then, you have 2 weeks for review before the feature
freeze. I'm sorry for the delay.

Jaeyong.

> Ian.
>
> [0] http://article.gmane.org/gmane.comp.emulators.xen.devel/168132
>
> > This version applies the comments from the v2 patch series, which are
> > majorly:
> >
> > 1) just use one timer struct for storing vtimer and ptimer
> >    for hvm context: patch 1
> >
> > 2) for dirty page tracing, use virtual-linear page table for accessing
> >    guest p2m in xen: patch 6, 7, and 9
> >
> > 3) Rather than using hard-coded guest memory map in xen, use the one
> >    from toolstack by implementing set/get_memory_map hypercall: patch 3
> >    and 10
> >
> > This patch series does not support SMP guest migration. We are
> > expecting to provide SMP-guests live migration in the version 4 patch
> > series.
> >
> > We also have tested the stability of the v3 patch series as follows:
> >
> > - setup two arndale boards with xen (let's say A and B)
> > - launch 3 domUs at A
> > - simultaneously migrate 3 domUs from A to B back and forth.
> > - we say one round of migration if all 3 domUs migrate
> >   from A to B and migrate back from B to A.
> >
> > When we perform the above tests without any load on domUs, the
> > migration goes to 80~100 rounds. After that, dom0 suddenly stops
> > responding. When we perform with network load (iperf on each domU),
> > the migration goes to 2~3 rounds and dom0 stops responding.
> >
> > After several repeated tests, we gathered the PCs where dom0 gets stuck,
> > and those are
> > - _raw_spin_lock (called by try_to_wake_up)
> > - panic_smp_self_stop
> > - cpu_v7_dcache_clean_area
> >
> > I think those bugs are somehow related to live migration, or maybe
> > other parts that may result in a complicated cause-and-effect chain to
> > live migration. In any case, I would like to look into the detail to
> > figure out the cause and possibly fix the bugs.
> > In the meanwhile, I would appreciate your comments on this patch
> > series :)
> >
> > Best,
> > Jaeyong
> >
> > Alexey Sokolov, Elena Pyatunina, Evgeny Fedotov, and Nikolay Martyanov (1):
> >   xen/arm: Implement toolstack for xl restore/save and migrate
> >
> > Alexey Sokolov (1):
> >   xen/arm: Implement modify_returncode
> >
> > Evgeny Fedotov (2):
> >   xen/arm: Implement set_memory_map hypercall
> >   xen/arm: Implement get_maximum_gpfn hypercall for arm
> >
> > Jaeyong Yoo and Evgeny Fedotov (1):
> >   xen/arm: Implement hvm save and restore
> >
> > Jaeyong Yoo and Alexey Sokolov (1):
> >   xen/arm: Add more registers for saving and restoring vcpu registers
> >
> > Jaeyong Yoo and Elena Pyatunina (2):
> >   xen/arm: Add handling write fault for dirty-page tracing
> >   xen/arm: Implement hypercall for dirty page tracing (shadow op)
> >
> > Jaeyong Yoo (2):
> >   xen/arm: Implement virtual-linear page table for guest p2m
> >     mapping in live migration
> >   xen/arm: Fixing clear_guest_offset macro
> >
> >  config/arm32.mk                          |   1 +
> >  tools/include/xen-foreign/reference.size |   2 +-
> >  tools/libxc/Makefile                     |   5 +
> >  tools/libxc/xc_arm_migrate.c             | 686 +++++++++++++++++++++++++++++++
> >  tools/libxc/xc_dom_arm.c                 |  12 +-
> >  tools/libxc/xc_domain.c                  |  44 ++
> >  tools/libxc/xc_resume.c                  |  25 ++
> >  tools/libxc/xenctrl.h                    |  23 ++
> >  tools/misc/Makefile                      |   4 +
> >  xen/arch/arm/Makefile                    |   2 +
> >  xen/arch/arm/domain.c                    |  44 ++
> >  xen/arch/arm/domctl.c                    | 137 +++++-
> >  xen/arch/arm/hvm.c                       | 124 ++++++
> >  xen/arch/arm/mm.c                        | 284 ++++++++++++-
> >  xen/arch/arm/p2m.c                       | 307 ++++++++++++++
> >  xen/arch/arm/save.c                      |  66 +++
> >  xen/arch/arm/setup.c                     |   3 +
> >  xen/arch/arm/traps.c                     |  16 +-
> >  xen/arch/arm/vlpt.c                      | 162 ++++++++
> >  xen/common/Makefile                      |   2 +
> >  xen/include/asm-arm/config.h             |   3 +
> >  xen/include/asm-arm/domain.h             |  13 +
> >  xen/include/asm-arm/guest_access.h       |   5 +-
> >  xen/include/asm-arm/hvm/support.h        |  29 ++
> >  xen/include/asm-arm/mm.h                 |   7 +
> >  xen/include/asm-arm/p2m.h                |   4 +
> >  xen/include/asm-arm/processor.h          |   2 +
> >  xen/include/asm-arm/vlpt.h               |  10 +
> >  xen/include/public/arch-arm.h            |  35 ++
> >  xen/include/public/arch-arm/hvm/save.h   |  41 ++
> >  xen/include/public/memory.h              |  15 +-
> >  xen/include/xsm/dummy.h                  |   5 +
> >  xen/include/xsm/xsm.h                    |   5 +
> >  33 files changed, 2113 insertions(+), 10 deletions(-)
> >  create mode 100644 tools/libxc/xc_arm_migrate.c
> >  create mode 100644 xen/arch/arm/save.c
> >  create mode 100644 xen/arch/arm/vlpt.c
> >  create mode 100644 xen/include/asm-arm/hvm/support.h
> >  create mode 100644 xen/include/asm-arm/vlpt.h
Ian Campbell
2013-Sep-26 15:13 UTC
Re: [PATCH v3 00/10] xen/arm: live migration support in arndale board
On Thu, 2013-09-26 at 15:23 +0900, Jaeyong Yoo wrote:
> I also would like to see live migration in Xen 4.4, and I will post the v4
> within the next week. Then, you have 2 weeks for review before the feature
> freeze.

Brilliant, thanks!

> I'm sorry for the delay.

No worries.

Cheers,
Ian.