Mukesh Rathor
2013-Jul-20 01:44 UTC
[PATCH 00/23][V9] PVH xen: Phase I, Version 9 patches...
Hi all,

This is V9 of the PVH patches for xen: the xen changes to support boot of a 64-bit PVH domU guest. Built on top of unstable git c/s: d4435fe5e2f0dfadb41ef46c38f462f45d67762e

Patches 2, 4, 5, 6, 7, 10, 11, 12, and 15 have already been "Reviewed-by".

The public git tree for this:
    git clone -n git://oss.oracle.com/git/mrathor/xen.git
    git checkout pvh.v9

Two more patch sets will follow after this one is done: 1) tools changes, and 2) dom0 changes.

Thanks for all the help,
Mukesh
Mukesh Rathor
2013-Jul-20 01:44 UTC
[PATCH 01/23] PVH xen: Add readme docs/misc/pvh-readme.txt
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 docs/misc/pvh-readme.txt |   56 ++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 56 insertions(+), 0 deletions(-)
 create mode 100644 docs/misc/pvh-readme.txt

diff --git a/docs/misc/pvh-readme.txt b/docs/misc/pvh-readme.txt
new file mode 100644
index 0000000..3b14aa7
--- /dev/null
+++ b/docs/misc/pvh-readme.txt
@@ -0,0 +1,56 @@
+
+PVH : an x86 PV guest running in an HVM container. HAP is required for PVH.
+
+See: http://blog.xen.org/index.php/2012/10/23/the-paravirtualization-spectrum-part-1-the-ends-of-the-spectrum/
+
+At present the only PVH guest is an x86 64bit PV linux. Patches are at:
+   git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git
+
+A PVH guest kernel must support the following features, as defined for linux
+in arch/x86/xen/xen-head.S:
+
+   #define FEATURES_PVH "|writable_descriptor_tables" \
+                        "|auto_translated_physmap"    \
+                        "|supervisor_mode_kernel"     \
+                        "|hvm_callback_vector"
+
+In a nutshell: the guest uses auto translation, i.e. the p2m is managed by
+xen; it uses an event callback and not vlapic emulation; and the page tables
+are native, so the mmu_update hcall is N/A for a PVH guest. Moreover, the
+IDT is native, so the set_trap_table hcall is also N/A for a PVH guest. For
+a full list of hcalls supported for PVH, see pvh_hypercall64_table in
+arch/x86/hvm/hvm.c in xen. From the ABI perspective, it's mostly a PV guest
+with auto translation, although it does use hvm_op for setting the callback
+vector.
+
+The initial phase targets the booting of a 64bit UP/SMP linux guest in PVH
+mode. This is done by adding pvh=1 in the config file. xl, and not xm, is
+supported. Phase I patches are broken into three parts:
+   - xen changes for booting of a 64bit PVH guest
+   - tools changes for creating a PVH guest
+   - boot of a 64bit dom0 in PVH mode.
+
+The following fixmes exist in the code:
+   - Add support for more memory types in arch/x86/hvm/mtrr.c.
+   - arch/x86/time.c: support more tsc modes.
+   - check_guest_io_breakpoint(): check/add support for IO breakpoint.
+   - implement arch_get_info_guest() for pvh.
+   - vmxit_msr_read(): during AMD port go thru hvm_msr_read_intercept() again.
+   - verify that bp matching on emulated instructions will work the same as
+     HVM for a PVH guest. See instruction_done() and
+     check_guest_io_breakpoint().
+
+The following remain to be done for PVH:
+   - AMD port.
+   - 32bit PVH guest support in both linux and xen. Xen changes are tagged
+     "32bitfixme".
+   - Add support for monitoring guest behavior. See the hvm_memory_event*
+     functions in hvm.c.
+   - vcpu hotplug support.
+   - Live migration of PVH guests.
+   - Make posted interrupts available to PVH dom0 (this will be a big win).
+
+
+Note, any emails to me must be cc'd to the xen devel mailing list. OTOH,
+please cc me on PVH emails to the xen devel mailing list.
+
+Mukesh Rathor
+mukesh.rathor [at] oracle [dot] com
--
1.7.2.3
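The feature strings above map onto Xen's public feature interface. As a guest-kernel-side illustration only (assuming Linux's xen_feature() accessor from include/xen/features.h, with flag names from Xen's public/features.h; this is not part of the patch), a guest could confirm the PVH-style environment the readme describes like this:

    #include <xen/features.h>   /* Linux guest side; not a Xen hypervisor file */

    /*
     * Sketch: test the properties the readme calls out -- a Xen-managed p2m
     * (auto translation), event delivery via callback vector rather than an
     * emulated vlapic, and a kernel running in ring 0.
     */
    static int pvh_style_environment(void)
    {
        return xen_feature(XENFEAT_auto_translated_physmap) &&
               xen_feature(XENFEAT_hvm_callback_vector) &&
               xen_feature(XENFEAT_supervisor_mode_kernel);
    }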
Mukesh Rathor
2013-Jul-20 01:44 UTC
[PATCH 02/23] PVH xen: turn gdb_frames/gdt_ents into union.
Changes in V2:
  - Add __XEN_INTERFACE_VERSION__
Changes in V3:
  - Rename union to 'gdt' and rename field names.
Changes in V9:
  - Update __XEN_LATEST_INTERFACE_VERSION__ to 0x00040400 for compat.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
tools/libxc/xc_domain_restore.c | 8 ++++---- tools/libxc/xc_domain_save.c | 6 +++--- xen/arch/x86/domain.c | 12 ++++++------ xen/arch/x86/domctl.c | 12 ++++++------ xen/include/public/arch-x86/xen.h | 14 ++++++++++++++ xen/include/public/xen-compat.h | 2 +- 6 files changed, 34 insertions(+), 20 deletions(-) diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c index 63d36cd..47aaca0 100644 --- a/tools/libxc/xc_domain_restore.c +++ b/tools/libxc/xc_domain_restore.c @@ -2055,15 +2055,15 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom, munmap(start_info, PAGE_SIZE); } /* Uncanonicalise each GDT frame number. */ - if ( GET_FIELD(ctxt, gdt_ents) > 8192 ) + if ( GET_FIELD(ctxt, gdt.pv.num_ents) > 8192 ) { ERROR("GDT entry count out of range"); goto out; } - for ( j = 0; (512*j) < GET_FIELD(ctxt, gdt_ents); j++ ) + for ( j = 0; (512*j) < GET_FIELD(ctxt, gdt.pv.num_ents); j++ ) { - pfn = GET_FIELD(ctxt, gdt_frames[j]); + pfn = GET_FIELD(ctxt, gdt.pv.frames[j]); if ( (pfn >= dinfo->p2m_size) || (pfn_type[pfn] != XEN_DOMCTL_PFINFO_NOTAB) ) { @@ -2071,7 +2071,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom, j, (unsigned long)pfn); goto out; } - SET_FIELD(ctxt, gdt_frames[j], ctx->p2m[pfn]); + SET_FIELD(ctxt, gdt.pv.frames[j], ctx->p2m[pfn]); } /* Uncanonicalise the page table base pointer. */ pfn = UNFOLD_CR3(GET_FIELD(ctxt, ctrlreg[3])); diff --git a/tools/libxc/xc_domain_save.c b/tools/libxc/xc_domain_save.c index fbc15e9..e938628 100644 --- a/tools/libxc/xc_domain_save.c +++ b/tools/libxc/xc_domain_save.c @@ -1907,15 +1907,15 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter } /* Canonicalise each GDT frame number. */ - for ( j = 0; (512*j) < GET_FIELD(&ctxt, gdt_ents); j++ ) + for ( j = 0; (512*j) < GET_FIELD(&ctxt, gdt.pv.num_ents); j++ ) { - mfn = GET_FIELD(&ctxt, gdt_frames[j]); + mfn = GET_FIELD(&ctxt, gdt.pv.frames[j]); if ( !MFN_IS_IN_PSEUDOPHYS_MAP(mfn) ) { ERROR("GDT frame is not in range of pseudophys map"); goto out; } - SET_FIELD(&ctxt, gdt_frames[j], mfn_to_pfn(mfn)); + SET_FIELD(&ctxt, gdt.pv.frames[j], mfn_to_pfn(mfn)); } /* Canonicalise the page table base pointer. 
*/ diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index 874742c..73ddad7 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -784,8 +784,8 @@ int arch_set_info_guest( } for ( i = 0; i < ARRAY_SIZE(v->arch.pv_vcpu.gdt_frames); ++i ) - fail |= v->arch.pv_vcpu.gdt_frames[i] != c(gdt_frames[i]); - fail |= v->arch.pv_vcpu.gdt_ents != c(gdt_ents); + fail |= v->arch.pv_vcpu.gdt_frames[i] != c(gdt.pv.frames[i]); + fail |= v->arch.pv_vcpu.gdt_ents != c(gdt.pv.num_ents); fail |= v->arch.pv_vcpu.ldt_base != c(ldt_base); fail |= v->arch.pv_vcpu.ldt_ents != c(ldt_ents); @@ -838,17 +838,17 @@ int arch_set_info_guest( return rc; if ( !compat ) - rc = (int)set_gdt(v, c.nat->gdt_frames, c.nat->gdt_ents); + rc = (int)set_gdt(v, c.nat->gdt.pv.frames, c.nat->gdt.pv.num_ents); else { unsigned long gdt_frames[ARRAY_SIZE(v->arch.pv_vcpu.gdt_frames)]; - unsigned int n = (c.cmp->gdt_ents + 511) / 512; + unsigned int n = (c.cmp->gdt.pv.num_ents + 511) / 512; if ( n > ARRAY_SIZE(v->arch.pv_vcpu.gdt_frames) ) return -EINVAL; for ( i = 0; i < n; ++i ) - gdt_frames[i] = c.cmp->gdt_frames[i]; - rc = (int)set_gdt(v, gdt_frames, c.cmp->gdt_ents); + gdt_frames[i] = c.cmp->gdt.pv.frames[i]; + rc = (int)set_gdt(v, gdt_frames, c.cmp->gdt.pv.num_ents); } if ( rc != 0 ) return rc; diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c index c2a04c4..f87d6ab 100644 --- a/xen/arch/x86/domctl.c +++ b/xen/arch/x86/domctl.c @@ -1300,12 +1300,12 @@ void arch_get_info_guest(struct vcpu *v, vcpu_guest_context_u c) c(ldt_base = v->arch.pv_vcpu.ldt_base); c(ldt_ents = v->arch.pv_vcpu.ldt_ents); for ( i = 0; i < ARRAY_SIZE(v->arch.pv_vcpu.gdt_frames); ++i ) - c(gdt_frames[i] = v->arch.pv_vcpu.gdt_frames[i]); - BUILD_BUG_ON(ARRAY_SIZE(c.nat->gdt_frames) != ARRAY_SIZE(c.cmp->gdt_frames)); - for ( ; i < ARRAY_SIZE(c.nat->gdt_frames); ++i ) - c(gdt_frames[i] = 0); - c(gdt_ents = v->arch.pv_vcpu.gdt_ents); + c(gdt.pv.frames[i] = v->arch.pv_vcpu.gdt_frames[i]); + BUILD_BUG_ON(ARRAY_SIZE(c.nat->gdt.pv.frames) != ARRAY_SIZE(c.cmp->gdt.pv.frames)); + for ( ; i < ARRAY_SIZE(c.nat->gdt.pv.frames); ++i ) + c(gdt.pv.frames[i] = 0); + c(gdt.pv.num_ents = v->arch.pv_vcpu.gdt_ents); c(kernel_ss = v->arch.pv_vcpu.kernel_ss); c(kernel_sp = v->arch.pv_vcpu.kernel_sp); for ( i = 0; i < ARRAY_SIZE(v->arch.pv_vcpu.ctrlreg); ++i ) diff --git a/xen/include/public/arch-x86/xen.h b/xen/include/public/arch-x86/xen.h index b7f6a51..25c8519 100644 --- a/xen/include/public/arch-x86/xen.h +++ b/xen/include/public/arch-x86/xen.h @@ -170,7 +170,21 @@ struct vcpu_guest_context { struct cpu_user_regs user_regs; /* User-level CPU registers */ struct trap_info trap_ctxt[256]; /* Virtual IDT */ unsigned long ldt_base, ldt_ents; /* LDT (linear address, # ents) */ +#if __XEN_INTERFACE_VERSION__ < 0x00040400 unsigned long gdt_frames[16], gdt_ents; /* GDT (machine frames, # ents) */ +#else + union { + struct { + /* GDT (machine frames, # ents) */ + unsigned long frames[16], num_ents; + } pv; + struct { + /* PVH: GDTR addr and size */ + uint64_t addr; + uint16_t limit; + } pvh; + } gdt; +#endif unsigned long kernel_ss, kernel_sp; /* Virtual TSS (only SS1/SP1) */ /* NB. User pagetable on x86/64 is placed in ctrlreg[1]. 
*/ unsigned long ctrlreg[8]; /* CR0-CR7 (control registers) */ diff --git a/xen/include/public/xen-compat.h b/xen/include/public/xen-compat.h index 69141c4..3eb80a0 100644 --- a/xen/include/public/xen-compat.h +++ b/xen/include/public/xen-compat.h @@ -27,7 +27,7 @@ #ifndef __XEN_PUBLIC_XEN_COMPAT_H__ #define __XEN_PUBLIC_XEN_COMPAT_H__ -#define __XEN_LATEST_INTERFACE_VERSION__ 0x00040300 +#define __XEN_LATEST_INTERFACE_VERSION__ 0x00040400 #if defined(__XEN__) || defined(__XEN_TOOLS__) /* Xen is built with matching headers and implements the latest interface. */ -- 1.7.2.3
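To make the new layout concrete, here is a small standalone mirror of the union (toy types, not the real vcpu_guest_context; a sketch only). A PV toolstack keeps filling frames/num_ents, while a PVH one supplies a native GDTR base and limit:

    #include <stdint.h>
    #include <stdio.h>

    /* Toy mirror of the union this patch adds to vcpu_guest_context. */
    struct ctx {
        union {
            struct { unsigned long frames[16], num_ents; } pv;  /* PV: GDT machine frames + entry count */
            struct { uint64_t addr; uint16_t limit; } pvh;      /* PVH: native GDTR address and size */
        } gdt;
    };

    int main(void)
    {
        struct ctx c = { { .pvh = { 0xffff88007f704000ULL, 0x2f } } };

        printf("PVH GDTR: base %#llx limit %#x\n",
               (unsigned long long)c.gdt.pvh.addr, c.gdt.pvh.limit);
        return 0;
    }

Old consumers are unaffected: with __XEN_INTERFACE_VERSION__ below 0x00040400 the header keeps the flat gdt_frames/gdt_ents fields, as the #if in the hunk above shows.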
Mukesh Rathor
2013-Jul-20 01:44 UTC
[PATCH 03/23] PVH xen: add params to read_segment_register
In this preparatory patch, the read_segment_register macro is changed to take vcpu and regs parameters. No functionality change.

Changes in V2: None
Changes in V3:
  - Replace read_sreg with read_segment_register
Changes in V7:
  - Don't make emulate_privileged_op() public here.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
xen/arch/x86/domain.c | 8 ++++---- xen/arch/x86/traps.c | 26 ++++++++++++-------------- xen/arch/x86/x86_64/traps.c | 16 ++++++++-------- xen/include/asm-x86/system.h | 2 +- 4 files changed, 25 insertions(+), 27 deletions(-) diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index 73ddad7..5de5e49 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -1221,10 +1221,10 @@ static void save_segments(struct vcpu *v) struct cpu_user_regs *regs = &v->arch.user_regs; unsigned int dirty_segment_mask = 0; - regs->ds = read_segment_register(ds); - regs->es = read_segment_register(es); - regs->fs = read_segment_register(fs); - regs->gs = read_segment_register(gs); + regs->ds = read_segment_register(v, regs, ds); + regs->es = read_segment_register(v, regs, es); + regs->fs = read_segment_register(v, regs, fs); + regs->gs = read_segment_register(v, regs, gs); if ( regs->ds ) dirty_segment_mask |= DIRTY_DS; diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c index 57dbd0c..378ef0a 100644 --- a/xen/arch/x86/traps.c +++ b/xen/arch/x86/traps.c @@ -1831,8 +1831,6 @@ static inline uint64_t guest_misc_enable(uint64_t val) } \ (eip) += sizeof(_x); _x; }) -#define read_sreg(regs, sr) read_segment_register(sr) - static int is_cpufreq_controller(struct domain *d) { return ((cpufreq_controller == FREQCTL_dom0_kernel) && @@ -1877,7 +1875,7 @@ static int emulate_privileged_op(struct cpu_user_regs *regs) goto fail; /* emulating only opcodes not allowing SS to be default */ - data_sel = read_sreg(regs, ds); + data_sel = read_segment_register(v, regs, ds); /* Legacy prefixes. 
*/ for ( i = 0; i < 8; i++, rex == opcode || (rex = 0) ) @@ -1895,17 +1893,17 @@ static int emulate_privileged_op(struct cpu_user_regs *regs) data_sel = regs->cs; continue; case 0x3e: /* DS override */ - data_sel = read_sreg(regs, ds); + data_sel = read_segment_register(v, regs, ds); continue; case 0x26: /* ES override */ - data_sel = read_sreg(regs, es); + data_sel = read_segment_register(v, regs, es); continue; case 0x64: /* FS override */ - data_sel = read_sreg(regs, fs); + data_sel = read_segment_register(v, regs, fs); lm_ovr = lm_seg_fs; continue; case 0x65: /* GS override */ - data_sel = read_sreg(regs, gs); + data_sel = read_segment_register(v, regs, gs); lm_ovr = lm_seg_gs; continue; case 0x36: /* SS override */ @@ -1952,7 +1950,7 @@ static int emulate_privileged_op(struct cpu_user_regs *regs) if ( !(opcode & 2) ) { - data_sel = read_sreg(regs, es); + data_sel = read_segment_register(v, regs, es); lm_ovr = lm_seg_none; } @@ -2685,22 +2683,22 @@ static void emulate_gate_op(struct cpu_user_regs *regs) ASSERT(opnd_sel); continue; case 0x3e: /* DS override */ - opnd_sel = read_sreg(regs, ds); + opnd_sel = read_segment_register(v, regs, ds); if ( !opnd_sel ) opnd_sel = dpl; continue; case 0x26: /* ES override */ - opnd_sel = read_sreg(regs, es); + opnd_sel = read_segment_register(v, regs, es); if ( !opnd_sel ) opnd_sel = dpl; continue; case 0x64: /* FS override */ - opnd_sel = read_sreg(regs, fs); + opnd_sel = read_segment_register(v, regs, fs); if ( !opnd_sel ) opnd_sel = dpl; continue; case 0x65: /* GS override */ - opnd_sel = read_sreg(regs, gs); + opnd_sel = read_segment_register(v, regs, gs); if ( !opnd_sel ) opnd_sel = dpl; continue; @@ -2753,7 +2751,7 @@ static void emulate_gate_op(struct cpu_user_regs *regs) switch ( modrm & 7 ) { default: - opnd_sel = read_sreg(regs, ds); + opnd_sel = read_segment_register(v, regs, ds); break; case 4: case 5: opnd_sel = regs->ss; @@ -2781,7 +2779,7 @@ static void emulate_gate_op(struct cpu_user_regs *regs) break; } if ( !opnd_sel ) - opnd_sel = read_sreg(regs, ds); + opnd_sel = read_segment_register(v, regs, ds); switch ( modrm & 7 ) { case 0: case 2: case 4: diff --git a/xen/arch/x86/x86_64/traps.c b/xen/arch/x86/x86_64/traps.c index bcd7609..9e0571d 100644 --- a/xen/arch/x86/x86_64/traps.c +++ b/xen/arch/x86/x86_64/traps.c @@ -122,10 +122,10 @@ void show_registers(struct cpu_user_regs *regs) fault_crs[0] = read_cr0(); fault_crs[3] = read_cr3(); fault_crs[4] = read_cr4(); - fault_regs.ds = read_segment_register(ds); - fault_regs.es = read_segment_register(es); - fault_regs.fs = read_segment_register(fs); - fault_regs.gs = read_segment_register(gs); + fault_regs.ds = read_segment_register(v, regs, ds); + fault_regs.es = read_segment_register(v, regs, es); + fault_regs.fs = read_segment_register(v, regs, fs); + fault_regs.gs = read_segment_register(v, regs, gs); } print_xen_info(); @@ -240,10 +240,10 @@ void do_double_fault(struct cpu_user_regs *regs) crs[2] = read_cr2(); crs[3] = read_cr3(); crs[4] = read_cr4(); - regs->ds = read_segment_register(ds); - regs->es = read_segment_register(es); - regs->fs = read_segment_register(fs); - regs->gs = read_segment_register(gs); + regs->ds = read_segment_register(current, regs, ds); + regs->es = read_segment_register(current, regs, es); + regs->fs = read_segment_register(current, regs, fs); + regs->gs = read_segment_register(current, regs, gs); printk("CPU: %d\n", cpu); _show_registers(regs, crs, CTXT_hypervisor, NULL); diff --git a/xen/include/asm-x86/system.h b/xen/include/asm-x86/system.h index 
6ab7d56..9bb22cb 100644 --- a/xen/include/asm-x86/system.h +++ b/xen/include/asm-x86/system.h @@ -4,7 +4,7 @@ #include <xen/lib.h> #include <xen/bitops.h> -#define read_segment_register(name) \ +#define read_segment_register(vcpu, regs, name) \ ({ u16 __sel; \ asm volatile ( "movw %%" STR(name) ",%0" : "=r" (__sel) ); \ __sel; \ -- 1.7.2.3
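For reference, a standalone approximation of the reworked macro (GCC on x86-64 assumed; Xen's STR() is replaced by plain # stringification so the snippet compiles outside the tree). The vcpu and regs arguments are deliberately unused at this stage, which is what lets a later patch dispatch PVH reads to the VMCS without touching every caller again:

    #include <stdint.h>
    #include <stdio.h>

    typedef uint16_t u16;

    /* Approximation of the new macro shape: extra parameters, same behaviour. */
    #define read_segment_register(vcpu, regs, name)                \
    ({  u16 sel_;                                                  \
        asm volatile ( "movw %%" #name ",%0" : "=r" (sel_) );      \
        sel_;                                                      \
    })

    int main(void)
    {
        /* vcpu/regs are ignored for now, exactly as in this patch. */
        printf("ds = %#x\n", read_segment_register(NULL, NULL, ds));
        return 0;
    }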
Mukesh Rathor
2013-Jul-20 01:44 UTC
[PATCH 04/23] PVH xen: Move e820 fields out of pv_domain struct
This patch moves fields out of the pv_domain struct, as they are used by PVH as well.

Changes in V6:
  - Don't base the initialization and cleanup on the guest type.
Changes in V7:
  - The if statement doesn't need to be split across lines anymore.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
xen/arch/x86/domain.c | 10 ++++------ xen/arch/x86/mm.c | 26 ++++++++++++-------------- xen/include/asm-x86/domain.h | 10 +++++----- 3 files changed, 21 insertions(+), 25 deletions(-) diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index 5de5e49..c361abf 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -553,6 +553,7 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags) if ( (rc = iommu_domain_init(d)) != 0 ) goto fail; } + spin_lock_init(&d->arch.e820_lock); if ( is_hvm_domain(d) ) { @@ -563,13 +564,9 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags) } } else - { /* 64-bit PV guest by default. */ d->arch.is_32bit_pv = d->arch.has_32bit_shinfo = 0; - spin_lock_init(&d->arch.pv_domain.e820_lock); - } - /* initialize default tsc behavior in case tools don't */ tsc_set_info(d, TSC_MODE_DEFAULT, 0UL, 0, 0); spin_lock_init(&d->arch.vtsc_lock); @@ -592,8 +589,9 @@ void arch_domain_destroy(struct domain *d) { if ( is_hvm_domain(d) ) hvm_domain_destroy(d); - else - xfree(d->arch.pv_domain.e820); + + if ( d->arch.e820 ) + xfree(d->arch.e820); free_domain_pirqs(d); if ( !is_idle_domain(d) ) diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c index 286e903..e980431 100644 --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -4763,11 +4763,11 @@ long arch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg) return -EFAULT; } - spin_lock(&d->arch.pv_domain.e820_lock); - xfree(d->arch.pv_domain.e820); - d->arch.pv_domain.e820 = e820; - d->arch.pv_domain.nr_e820 = fmap.map.nr_entries; - spin_unlock(&d->arch.pv_domain.e820_lock); + spin_lock(&d->arch.e820_lock); + xfree(d->arch.e820); + d->arch.e820 = e820; + d->arch.nr_e820 = fmap.map.nr_entries; + spin_unlock(&d->arch.e820_lock); rcu_unlock_domain(d); return rc; @@ -4781,26 +4781,24 @@ long arch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg) if ( copy_from_guest(&map, arg, 1) ) return -EFAULT; - spin_lock(&d->arch.pv_domain.e820_lock); + spin_lock(&d->arch.e820_lock); /* Backwards compatibility. */ - if ( (d->arch.pv_domain.nr_e820 == 0) || - (d->arch.pv_domain.e820 == NULL) ) + if ( (d->arch.nr_e820 == 0) || (d->arch.e820 == NULL) ) { - spin_unlock(&d->arch.pv_domain.e820_lock); + spin_unlock(&d->arch.e820_lock); return -ENOSYS; } - map.nr_entries = min(map.nr_entries, d->arch.pv_domain.nr_e820); - if ( copy_to_guest(map.buffer, d->arch.pv_domain.e820, - map.nr_entries) || + map.nr_entries = min(map.nr_entries, d->arch.nr_e820); + if ( copy_to_guest(map.buffer, d->arch.e820, map.nr_entries) || __copy_to_guest(arg, &map, 1) ) { - spin_unlock(&d->arch.pv_domain.e820_lock); + spin_unlock(&d->arch.e820_lock); return -EFAULT; } - spin_unlock(&d->arch.pv_domain.e820_lock); + spin_unlock(&d->arch.e820_lock); return 0; } diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h index d79464d..c3f9f8e 100644 --- a/xen/include/asm-x86/domain.h +++ b/xen/include/asm-x86/domain.h @@ -234,11 +234,6 @@ struct pv_domain /* map_domain_page() mapping cache. */ struct mapcache_domain mapcache; - - /* Pseudophysical e820 map (XENMEM_memory_map). 
*/ - spinlock_t e820_lock; - struct e820entry *e820; - unsigned int nr_e820; }; struct arch_domain @@ -313,6 +308,11 @@ struct arch_domain (possibly other cases in the future */ uint64_t vtsc_kerncount; /* for hvm, counts all vtsc */ uint64_t vtsc_usercount; /* not used for hvm */ + + /* Pseudophysical e820 map (XENMEM_memory_map). */ + spinlock_t e820_lock; + struct e820entry *e820; + unsigned int nr_e820; } __cacheline_aligned; #define has_arch_pdevs(d) (!list_empty(&(d)->arch.pdev_list)) -- 1.7.2.3
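A toy model of the resulting layout and locking discipline (a pthread mutex stands in for Xen's spinlock; compile with -lpthread; purely illustrative): the map and its lock now live in the common arch_domain, so the XENMEM_memory_map handler above works unmodified for both PV and PVH domains.

    #include <stdio.h>
    #include <string.h>
    #include <pthread.h>

    struct e820entry { unsigned long long addr, size; unsigned int type; };

    /* Toy mirror of the relocated fields: common to all guest types now. */
    struct arch_domain {
        pthread_mutex_t e820_lock;   /* spin_lock in Xen; mutex here */
        struct e820entry *e820;
        unsigned int nr_e820;
    };

    /* Same shape as the XENMEM_memory_map reader: lock, clamp, copy, unlock. */
    static unsigned int read_map(struct arch_domain *arch,
                                 struct e820entry *buf, unsigned int max)
    {
        unsigned int n;

        pthread_mutex_lock(&arch->e820_lock);
        n = arch->nr_e820 < max ? arch->nr_e820 : max;   /* min() in the hunk */
        memcpy(buf, arch->e820, n * sizeof(*buf));
        pthread_mutex_unlock(&arch->e820_lock);
        return n;
    }

    int main(void)
    {
        struct e820entry map[1] = { { 0, 0x9fc00, 1 } }, out[4];
        struct arch_domain arch = { PTHREAD_MUTEX_INITIALIZER, map, 1 };

        printf("copied %u entries\n", read_map(&arch, out, 4));
        return 0;
    }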
Mukesh Rathor
2013-Jul-20 01:44 UTC
[PATCH 05/23] PVH xen: hvm related preparatory changes for PVH
This patch contains small changes to hvm.c because hvm_domain.params is not set/used/supported for PVH in the present series. Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> --- xen/arch/x86/hvm/hvm.c | 10 ++++++---- 1 files changed, 6 insertions(+), 4 deletions(-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 1fcaed0..8284b3b 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -1070,10 +1070,13 @@ int hvm_vcpu_initialise(struct vcpu *v) { int rc; struct domain *d = v->domain; - domid_t dm_domid = d->arch.hvm_domain.params[HVM_PARAM_DM_DOMAIN]; + domid_t dm_domid; hvm_asid_flush_vcpu(v); + spin_lock_init(&v->arch.hvm_vcpu.tm_lock); + INIT_LIST_HEAD(&v->arch.hvm_vcpu.tm_list); + if ( (rc = vlapic_init(v)) != 0 ) goto fail1; @@ -1084,6 +1087,8 @@ int hvm_vcpu_initialise(struct vcpu *v) && (rc = nestedhvm_vcpu_initialise(v)) < 0 ) goto fail3; + dm_domid = d->arch.hvm_domain.params[HVM_PARAM_DM_DOMAIN]; + /* Create ioreq event channel. */ rc = alloc_unbound_xen_event_channel(v, dm_domid, NULL); if ( rc < 0 ) @@ -1106,9 +1111,6 @@ int hvm_vcpu_initialise(struct vcpu *v) get_ioreq(v)->vp_eport = v->arch.hvm_vcpu.xen_port; spin_unlock(&d->arch.hvm_domain.ioreq.lock); - spin_lock_init(&v->arch.hvm_vcpu.tm_lock); - INIT_LIST_HEAD(&v->arch.hvm_vcpu.tm_list); - v->arch.hvm_vcpu.inject_trap.vector = -1; rc = setup_compat_arg_xlat(v); -- 1.7.2.3
Mukesh Rathor
2013-Jul-20 01:44 UTC
[PATCH 06/23] PVH xen: vmx related preparatory changes for PVH
This is another preparatory patch for PVH. In this patch, the following functions are made available for general/public use: vmx_fpu_enter(), get_instruction_length(), update_guest_eip(), and vmx_dr_access(). There is no functionality change.

Changes in V2:
  - prepend vmx_ to get_instruction_length and update_guest_eip.
  - Do not export/use vmr().
Changes in V3:
  - Do not change emulate_forced_invalid_op() in this patch.
Changes in V7:
  - Drop pv_cpuid going public here.
Changes in V8:
  - Move vmx_fpu_enter prototype from vmcs.h to vmx.h

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
xen/arch/x86/hvm/vmx/vmx.c | 72 +++++++++++++++--------------------- xen/arch/x86/hvm/vmx/vvmx.c | 2 +- xen/include/asm-x86/hvm/vmx/vmx.h | 17 ++++++++- 3 files changed, 47 insertions(+), 44 deletions(-) diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index 8ed7026..7292357 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -577,7 +577,7 @@ static int vmx_load_vmcs_ctxt(struct vcpu *v, struct hvm_hw_cpu *ctxt) return 0; } -static void vmx_fpu_enter(struct vcpu *v) +void vmx_fpu_enter(struct vcpu *v) { vcpu_restore_fpu_lazy(v); v->arch.hvm_vmx.exception_bitmap &= ~(1u << TRAP_no_device); @@ -1608,24 +1608,12 @@ const struct hvm_function_table * __init start_vmx(void) return &vmx_function_table; } -/* - * Not all cases receive valid value in the VM-exit instruction length field. - * Callers must know what they're doing! - */ -static int get_instruction_length(void) -{ - int len; - len = __vmread(VM_EXIT_INSTRUCTION_LEN); /* Safe: callers audited */ - BUG_ON((len < 1) || (len > 15)); - return len; -} - -void update_guest_eip(void) +void vmx_update_guest_eip(void) { struct cpu_user_regs *regs = guest_cpu_user_regs(); unsigned long x; - regs->eip += get_instruction_length(); /* Safe: callers audited */ + regs->eip += vmx_get_instruction_length(); /* Safe: callers audited */ regs->eflags &= ~X86_EFLAGS_RF; x = __vmread(GUEST_INTERRUPTIBILITY_INFO); @@ -1698,8 +1686,8 @@ static void vmx_do_cpuid(struct cpu_user_regs *regs) regs->edx = edx; } -static void vmx_dr_access(unsigned long exit_qualification, - struct cpu_user_regs *regs) +void vmx_dr_access(unsigned long exit_qualification, + struct cpu_user_regs *regs) { struct vcpu *v = current; @@ -2312,7 +2300,7 @@ static int vmx_handle_eoi_write(void) if ( (((exit_qualification >> 12) & 0xf) == 1) && ((exit_qualification & 0xfff) == APIC_EOI) ) { - update_guest_eip(); /* Safe: APIC data write */ + vmx_update_guest_eip(); /* Safe: APIC data write */ vlapic_EOI_set(vcpu_vlapic(current)); HVMTRACE_0D(VLAPIC); return 1; @@ -2525,7 +2513,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs) HVMTRACE_1D(TRAP, vector); if ( v->domain->debugger_attached ) { - update_guest_eip(); /* Safe: INT3 */ + vmx_update_guest_eip(); /* Safe: INT3 */ current->arch.gdbsx_vcpu_event = TRAP_int3; domain_pause_for_debugger(); break; @@ -2633,7 +2621,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs) */ inst_len = ((source != 3) || /* CALL, IRET, or JMP? */ (idtv_info & (1u<<10))) /* IntrType > 3? */ - ? get_instruction_length() /* Safe: SDM 3B 23.2.4 */ : 0; + ?
vmx_get_instruction_length() /* Safe: SDM 3B 23.2.4 */ : 0; if ( (source == 3) && (idtv_info & INTR_INFO_DELIVER_CODE_MASK) ) ecode = __vmread(IDT_VECTORING_ERROR_CODE); regs->eip += inst_len; @@ -2641,15 +2629,15 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs) break; } case EXIT_REASON_CPUID: - update_guest_eip(); /* Safe: CPUID */ + vmx_update_guest_eip(); /* Safe: CPUID */ vmx_do_cpuid(regs); break; case EXIT_REASON_HLT: - update_guest_eip(); /* Safe: HLT */ + vmx_update_guest_eip(); /* Safe: HLT */ hvm_hlt(regs->eflags); break; case EXIT_REASON_INVLPG: - update_guest_eip(); /* Safe: INVLPG */ + vmx_update_guest_eip(); /* Safe: INVLPG */ exit_qualification = __vmread(EXIT_QUALIFICATION); vmx_invlpg_intercept(exit_qualification); break; @@ -2657,7 +2645,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs) regs->ecx = hvm_msr_tsc_aux(v); /* fall through */ case EXIT_REASON_RDTSC: - update_guest_eip(); /* Safe: RDTSC, RDTSCP */ + vmx_update_guest_eip(); /* Safe: RDTSC, RDTSCP */ hvm_rdtsc_intercept(regs); break; case EXIT_REASON_VMCALL: @@ -2667,7 +2655,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs) rc = hvm_do_hypercall(regs); if ( rc != HVM_HCALL_preempted ) { - update_guest_eip(); /* Safe: VMCALL */ + vmx_update_guest_eip(); /* Safe: VMCALL */ if ( rc == HVM_HCALL_invalidate ) send_invalidate_req(); } @@ -2677,7 +2665,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs) { exit_qualification = __vmread(EXIT_QUALIFICATION); if ( vmx_cr_access(exit_qualification) == X86EMUL_OKAY ) - update_guest_eip(); /* Safe: MOV Cn, LMSW, CLTS */ + vmx_update_guest_eip(); /* Safe: MOV Cn, LMSW, CLTS */ break; } case EXIT_REASON_DR_ACCESS: @@ -2691,7 +2679,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs) { regs->eax = (uint32_t)msr_content; regs->edx = (uint32_t)(msr_content >> 32); - update_guest_eip(); /* Safe: RDMSR */ + vmx_update_guest_eip(); /* Safe: RDMSR */ } break; } @@ -2700,63 +2688,63 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs) uint64_t msr_content; msr_content = ((uint64_t)regs->edx << 32) | (uint32_t)regs->eax; if ( hvm_msr_write_intercept(regs->ecx, msr_content) == X86EMUL_OKAY ) - update_guest_eip(); /* Safe: WRMSR */ + vmx_update_guest_eip(); /* Safe: WRMSR */ break; } case EXIT_REASON_VMXOFF: if ( nvmx_handle_vmxoff(regs) == X86EMUL_OKAY ) - update_guest_eip(); + vmx_update_guest_eip(); break; case EXIT_REASON_VMXON: if ( nvmx_handle_vmxon(regs) == X86EMUL_OKAY ) - update_guest_eip(); + vmx_update_guest_eip(); break; case EXIT_REASON_VMCLEAR: if ( nvmx_handle_vmclear(regs) == X86EMUL_OKAY ) - update_guest_eip(); + vmx_update_guest_eip(); break; case EXIT_REASON_VMPTRLD: if ( nvmx_handle_vmptrld(regs) == X86EMUL_OKAY ) - update_guest_eip(); + vmx_update_guest_eip(); break; case EXIT_REASON_VMPTRST: if ( nvmx_handle_vmptrst(regs) == X86EMUL_OKAY ) - update_guest_eip(); + vmx_update_guest_eip(); break; case EXIT_REASON_VMREAD: if ( nvmx_handle_vmread(regs) == X86EMUL_OKAY ) - update_guest_eip(); + vmx_update_guest_eip(); break; case EXIT_REASON_VMWRITE: if ( nvmx_handle_vmwrite(regs) == X86EMUL_OKAY ) - update_guest_eip(); + vmx_update_guest_eip(); break; case EXIT_REASON_VMLAUNCH: if ( nvmx_handle_vmlaunch(regs) == X86EMUL_OKAY ) - update_guest_eip(); + vmx_update_guest_eip(); break; case EXIT_REASON_VMRESUME: if ( nvmx_handle_vmresume(regs) == X86EMUL_OKAY ) - update_guest_eip(); + vmx_update_guest_eip(); break; case EXIT_REASON_INVEPT: if ( nvmx_handle_invept(regs) == X86EMUL_OKAY ) - update_guest_eip(); + vmx_update_guest_eip(); break; case 
EXIT_REASON_INVVPID: if ( nvmx_handle_invvpid(regs) == X86EMUL_OKAY ) - update_guest_eip(); + vmx_update_guest_eip(); break; case EXIT_REASON_MWAIT_INSTRUCTION: @@ -2804,14 +2792,14 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs) int bytes = (exit_qualification & 0x07) + 1; int dir = (exit_qualification & 0x08) ? IOREQ_READ : IOREQ_WRITE; if ( handle_pio(port, bytes, dir) ) - update_guest_eip(); /* Safe: IN, OUT */ + vmx_update_guest_eip(); /* Safe: IN, OUT */ } break; case EXIT_REASON_INVD: case EXIT_REASON_WBINVD: { - update_guest_eip(); /* Safe: INVD, WBINVD */ + vmx_update_guest_eip(); /* Safe: INVD, WBINVD */ vmx_wbinvd_intercept(); break; } @@ -2843,7 +2831,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs) case EXIT_REASON_XSETBV: if ( hvm_handle_xsetbv(regs->ecx, (regs->rdx << 32) | regs->_eax) == 0 ) - update_guest_eip(); /* Safe: XSETBV */ + vmx_update_guest_eip(); /* Safe: XSETBV */ break; case EXIT_REASON_APIC_WRITE: diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c index 5dfbc54..82be4cc 100644 --- a/xen/arch/x86/hvm/vmx/vvmx.c +++ b/xen/arch/x86/hvm/vmx/vvmx.c @@ -2139,7 +2139,7 @@ int nvmx_n2_vmexit_handler(struct cpu_user_regs *regs, tsc += __get_vvmcs(nvcpu->nv_vvmcx, TSC_OFFSET); regs->eax = (uint32_t)tsc; regs->edx = (uint32_t)(tsc >> 32); - update_guest_eip(); + vmx_update_guest_eip(); return 1; } diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h index c33b9f9..c21a303 100644 --- a/xen/include/asm-x86/hvm/vmx/vmx.h +++ b/xen/include/asm-x86/hvm/vmx/vmx.h @@ -446,6 +446,18 @@ static inline int __vmxon(u64 addr) return rc; } +/* + * Not all cases receive valid value in the VM-exit instruction length field. + * Callers must know what they're doing! + */ +static inline int vmx_get_instruction_length(void) +{ + int len; + len = __vmread(VM_EXIT_INSTRUCTION_LEN); /* Safe: callers audited */ + BUG_ON((len < 1) || (len > 15)); + return len; +} + void vmx_get_segment_register(struct vcpu *, enum x86_segment, struct segment_register *); void vmx_inject_extint(int trap); @@ -457,7 +469,10 @@ void ept_p2m_uninit(struct p2m_domain *p2m); void ept_walk_table(struct domain *d, unsigned long gfn); void setup_ept_dump(void); -void update_guest_eip(void); +void vmx_update_guest_eip(void); +void vmx_dr_access(unsigned long exit_qualification, + struct cpu_user_regs *regs); +void vmx_fpu_enter(struct vcpu *v); int alloc_p2m_hap_data(struct p2m_domain *p2m); void free_p2m_hap_data(struct p2m_domain *p2m); -- 1.7.2.3
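The contract of the two exported helpers can be restated with a tiny standalone model (the VMREAD is faked with a constant; an assumption-laden toy, not Xen code): the instruction-length field is only trusted for audited exit reasons, and advancing the guest RIP is just an add of that length.

    #include <assert.h>
    #include <stdio.h>

    /* Stand-in for __vmread(VM_EXIT_INSTRUCTION_LEN); always 2 here. */
    static int vmx_get_instruction_length(void)
    {
        int len = 2;                    /* e.g. a 2-byte CPUID (0f a2) */
        assert(len >= 1 && len <= 15);  /* mirrors the BUG_ON() bounds */
        return len;
    }

    int main(void)
    {
        unsigned long rip = 0x1000;

        /* The core of vmx_update_guest_eip(): skip the exiting instruction. */
        rip += vmx_get_instruction_length();
        printf("guest resumes at %#lx\n", rip);
        return 0;
    }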
Mukesh Rathor
2013-Jul-20 01:44 UTC
[PATCH 07/23] PVH xen: vmcs related preparatory changes for PVH
In this patch, some common code is factored out of construct_vmcs() to create vmx_set_common_host_vmcs_fields() to be used by PVH. Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> --- xen/arch/x86/hvm/vmx/vmcs.c | 58 +++++++++++++++++++++++------------------- 1 files changed, 32 insertions(+), 26 deletions(-) diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c index de9f592..36f167f 100644 --- a/xen/arch/x86/hvm/vmx/vmcs.c +++ b/xen/arch/x86/hvm/vmx/vmcs.c @@ -825,11 +825,40 @@ void virtual_vmcs_vmwrite(void *vvmcs, u32 vmcs_encoding, u64 val) virtual_vmcs_exit(vvmcs); } -static int construct_vmcs(struct vcpu *v) +static void vmx_set_common_host_vmcs_fields(struct vcpu *v) { - struct domain *d = v->domain; uint16_t sysenter_cs; unsigned long sysenter_eip; + + /* Host data selectors. */ + __vmwrite(HOST_SS_SELECTOR, __HYPERVISOR_DS); + __vmwrite(HOST_DS_SELECTOR, __HYPERVISOR_DS); + __vmwrite(HOST_ES_SELECTOR, __HYPERVISOR_DS); + __vmwrite(HOST_FS_SELECTOR, 0); + __vmwrite(HOST_GS_SELECTOR, 0); + __vmwrite(HOST_FS_BASE, 0); + __vmwrite(HOST_GS_BASE, 0); + + /* Host control registers. */ + v->arch.hvm_vmx.host_cr0 = read_cr0() | X86_CR0_TS; + __vmwrite(HOST_CR0, v->arch.hvm_vmx.host_cr0); + __vmwrite(HOST_CR4, + mmu_cr4_features | (xsave_enabled(v) ? X86_CR4_OSXSAVE : 0)); + + /* Host CS:RIP. */ + __vmwrite(HOST_CS_SELECTOR, __HYPERVISOR_CS); + __vmwrite(HOST_RIP, (unsigned long)vmx_asm_vmexit_handler); + + /* Host SYSENTER CS:RIP. */ + rdmsrl(MSR_IA32_SYSENTER_CS, sysenter_cs); + __vmwrite(HOST_SYSENTER_CS, sysenter_cs); + rdmsrl(MSR_IA32_SYSENTER_EIP, sysenter_eip); + __vmwrite(HOST_SYSENTER_EIP, sysenter_eip); +} + +static int construct_vmcs(struct vcpu *v) +{ + struct domain *d = v->domain; u32 vmexit_ctl = vmx_vmexit_control; u32 vmentry_ctl = vmx_vmentry_control; @@ -932,30 +961,7 @@ static int construct_vmcs(struct vcpu *v) __vmwrite(POSTED_INTR_NOTIFICATION_VECTOR, posted_intr_vector); } - /* Host data selectors. */ - __vmwrite(HOST_SS_SELECTOR, __HYPERVISOR_DS); - __vmwrite(HOST_DS_SELECTOR, __HYPERVISOR_DS); - __vmwrite(HOST_ES_SELECTOR, __HYPERVISOR_DS); - __vmwrite(HOST_FS_SELECTOR, 0); - __vmwrite(HOST_GS_SELECTOR, 0); - __vmwrite(HOST_FS_BASE, 0); - __vmwrite(HOST_GS_BASE, 0); - - /* Host control registers. */ - v->arch.hvm_vmx.host_cr0 = read_cr0() | X86_CR0_TS; - __vmwrite(HOST_CR0, v->arch.hvm_vmx.host_cr0); - __vmwrite(HOST_CR4, - mmu_cr4_features | (xsave_enabled(v) ? X86_CR4_OSXSAVE : 0)); - - /* Host CS:RIP. */ - __vmwrite(HOST_CS_SELECTOR, __HYPERVISOR_CS); - __vmwrite(HOST_RIP, (unsigned long)vmx_asm_vmexit_handler); - - /* Host SYSENTER CS:RIP. */ - rdmsrl(MSR_IA32_SYSENTER_CS, sysenter_cs); - __vmwrite(HOST_SYSENTER_CS, sysenter_cs); - rdmsrl(MSR_IA32_SYSENTER_EIP, sysenter_eip); - __vmwrite(HOST_SYSENTER_EIP, sysenter_eip); + vmx_set_common_host_vmcs_fields(v); /* MSR intercepts. */ __vmwrite(VM_EXIT_MSR_LOAD_COUNT, 0); -- 1.7.2.3
Mukesh Rathor
2013-Jul-20 01:44 UTC
[PATCH 08/23] PVH xen: Introduce PVH guest type and some basic changes.
This patch introduces the concept of a PVH guest. There are other basic changes, like creating macros to check for a pv/pvh vcpu/domain, and also modifying the copy macros to account for PVH. Finally, guest_kernel_mode is changed to reflect that a PVH guest doesn't need to check the TF_kernel_mode flag, since the kernel runs in ring 0.

Changes in V2:
  - make is_pvh/is_hvm enum instead of adding is_pvh as a new flag.
  - fix indentation and spacing in guest_kernel_mode macro.
  - add debug only BUG() in GUEST_KERNEL_RPL macro as it should no longer
    be called in any PVH paths.
Changes in V3:
  - Rename enum fields, and add is_pv to it.
  - Get rid of is_hvm_or_pvh_* macros.
Changes in V4:
  - Move e820 fields out of pv_domain struct.
Changes in V5:
  - Move e820 changes above in V4, to a separate patch.
Changes in V5:
  - Rename enum guest_type from is_pv, ... to guest_type_pv, ....
Changes in V8:
  - Go to the VMCS for the DPL check instead of checking the rpl in
    guest_kernel_mode. Note, we drop the const qualifier from
    vcpu_show_registers() to accommodate the hvm function call in
    guest_kernel_mode().
  - Also, hvm_kernel_mode is put in hvm.c because it's called from
    guest_kernel_mode in regs.h, which is a pretty early header include.
    Hence, we can't place it in hvm.h like other similar functions. The
    other alternative is to put hvm_kernel_mode in regs.h itself, but it
    calls hvm_get_segment_register(), for which we would either need to
    include hvm.h in regs.h (not possible) or add a prototype for
    hvm_get_segment_register(). But then the args to
    hvm_get_segment_register() also need their headers. So, in the end,
    this seems to be the best/only way.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
xen/arch/x86/debug.c | 2 +- xen/arch/x86/hvm/hvm.c | 8 ++++++++ xen/arch/x86/x86_64/traps.c | 2 +- xen/common/domain.c | 2 +- xen/include/asm-x86/desc.h | 4 +++- xen/include/asm-x86/domain.h | 2 +- xen/include/asm-x86/guest_access.h | 12 ++++++------ xen/include/asm-x86/x86_64/regs.h | 11 +++++++---- xen/include/public/domctl.h | 3 +++ xen/include/xen/sched.h | 21 ++++++++++++++++++--- 10 files changed, 49 insertions(+), 18 deletions(-) diff --git a/xen/arch/x86/debug.c b/xen/arch/x86/debug.c index e67473e..167421d 100644 --- a/xen/arch/x86/debug.c +++ b/xen/arch/x86/debug.c @@ -158,7 +158,7 @@ dbg_rw_guest_mem(dbgva_t addr, dbgbyte_t *buf, int len, struct domain *dp, pagecnt = min_t(long, PAGE_SIZE - (addr & ~PAGE_MASK), len); - mfn = (dp->is_hvm + mfn = (!is_pv_domain(dp) ?
dbg_hvm_va2mfn(addr, dp, toaddr, &gfn) : dbg_pv_va2mfn(addr, dp, pgd3)); if ( mfn == INVALID_MFN ) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 8284b3b..bac4708 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -4642,6 +4642,14 @@ enum hvm_intblk nhvm_interrupt_blocked(struct vcpu *v) return hvm_funcs.nhvm_intr_blocked(v); } +bool_t hvm_kernel_mode(struct vcpu *v) +{ + struct segment_register seg; + + hvm_get_segment_register(v, x86_seg_ss, &seg); + return (seg.attr.fields.dpl == 0); +} + /* * Local variables: * mode: C diff --git a/xen/arch/x86/x86_64/traps.c b/xen/arch/x86/x86_64/traps.c index 9e0571d..feb50ff 100644 --- a/xen/arch/x86/x86_64/traps.c +++ b/xen/arch/x86/x86_64/traps.c @@ -141,7 +141,7 @@ void show_registers(struct cpu_user_regs *regs) } } -void vcpu_show_registers(const struct vcpu *v) +void vcpu_show_registers(struct vcpu *v) { const struct cpu_user_regs *regs = &v->arch.user_regs; unsigned long crs[8]; diff --git a/xen/common/domain.c b/xen/common/domain.c index 6c264a5..38b1bad 100644 --- a/xen/common/domain.c +++ b/xen/common/domain.c @@ -236,7 +236,7 @@ struct domain *domain_create( goto fail; if ( domcr_flags & DOMCRF_hvm ) - d->is_hvm = 1; + d->guest_type = guest_type_hvm; if ( domid == 0 ) { diff --git a/xen/include/asm-x86/desc.h b/xen/include/asm-x86/desc.h index 354b889..041e9d3 100644 --- a/xen/include/asm-x86/desc.h +++ b/xen/include/asm-x86/desc.h @@ -38,7 +38,9 @@ #ifndef __ASSEMBLY__ -#define GUEST_KERNEL_RPL(d) (is_pv_32bit_domain(d) ? 1 : 3) +/* PVH 32bitfixme : see emulate_gate_op call from do_general_protection */ +#define GUEST_KERNEL_RPL(d) ({ ASSERT(is_pv_domain(d)); \ + is_pv_32bit_domain(d) ? 1 : 3; }) /* Fix up the RPL of a guest segment selector. */ #define __fixup_guest_selector(d, sel) \ diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h index c3f9f8e..22a72df 100644 --- a/xen/include/asm-x86/domain.h +++ b/xen/include/asm-x86/domain.h @@ -447,7 +447,7 @@ struct arch_vcpu #define hvm_svm hvm_vcpu.u.svm void vcpu_show_execution_state(struct vcpu *); -void vcpu_show_registers(const struct vcpu *); +void vcpu_show_registers(struct vcpu *); /* Clean up CR4 bits that are not under guest control. */ unsigned long pv_guest_cr4_fixup(const struct vcpu *, unsigned long guest_cr4); diff --git a/xen/include/asm-x86/guest_access.h b/xen/include/asm-x86/guest_access.h index ca700c9..675dda1 100644 --- a/xen/include/asm-x86/guest_access.h +++ b/xen/include/asm-x86/guest_access.h @@ -14,27 +14,27 @@ /* Raw access functions: no type checking. */ #define raw_copy_to_guest(dst, src, len) \ - (is_hvm_vcpu(current) ? \ + (!is_pv_vcpu(current) ? \ copy_to_user_hvm((dst), (src), (len)) : \ copy_to_user((dst), (src), (len))) #define raw_copy_from_guest(dst, src, len) \ - (is_hvm_vcpu(current) ? \ + (!is_pv_vcpu(current) ? \ copy_from_user_hvm((dst), (src), (len)) : \ copy_from_user((dst), (src), (len))) #define raw_clear_guest(dst, len) \ - (is_hvm_vcpu(current) ? \ + (!is_pv_vcpu(current) ? \ clear_user_hvm((dst), (len)) : \ clear_user((dst), (len))) #define __raw_copy_to_guest(dst, src, len) \ - (is_hvm_vcpu(current) ? \ + (!is_pv_vcpu(current) ? \ copy_to_user_hvm((dst), (src), (len)) : \ __copy_to_user((dst), (src), (len))) #define __raw_copy_from_guest(dst, src, len) \ - (is_hvm_vcpu(current) ? \ + (!is_pv_vcpu(current) ? \ copy_from_user_hvm((dst), (src), (len)) : \ __copy_from_user((dst), (src), (len))) #define __raw_clear_guest(dst, len) \ - (is_hvm_vcpu(current) ? \ + (!is_pv_vcpu(current) ? 
\ clear_user_hvm((dst), (len)) : \ clear_user((dst), (len))) diff --git a/xen/include/asm-x86/x86_64/regs.h b/xen/include/asm-x86/x86_64/regs.h index 3cdc702..d91a84b 100644 --- a/xen/include/asm-x86/x86_64/regs.h +++ b/xen/include/asm-x86/x86_64/regs.h @@ -10,10 +10,13 @@ #define ring_2(r) (((r)->cs & 3) == 2) #define ring_3(r) (((r)->cs & 3) == 3) -#define guest_kernel_mode(v, r) \ - (!is_pv_32bit_vcpu(v) ? \ - (ring_3(r) && ((v)->arch.flags & TF_kernel_mode)) : \ - (ring_1(r))) +bool_t hvm_kernel_mode(struct vcpu *); + +#define guest_kernel_mode(v, r) \ + (is_pvh_vcpu(v) ? hvm_kernel_mode(v) : \ + (!is_pv_32bit_vcpu(v) ? \ + (ring_3(r) && ((v)->arch.flags & TF_kernel_mode)) : \ + (ring_1(r)))) #define permit_softint(dpl, v, r) \ ((dpl) >= (guest_kernel_mode(v, r) ? 1 : 3)) diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h index 4c5b2bb..6b1aa11 100644 --- a/xen/include/public/domctl.h +++ b/xen/include/public/domctl.h @@ -89,6 +89,9 @@ struct xen_domctl_getdomaininfo { /* Being debugged. */ #define _XEN_DOMINF_debugged 6 #define XEN_DOMINF_debugged (1U<<_XEN_DOMINF_debugged) +/* domain is PVH */ +#define _XEN_DOMINF_pvh_guest 7 +#define XEN_DOMINF_pvh_guest (1U<<_XEN_DOMINF_pvh_guest) /* XEN_DOMINF_shutdown guest-supplied code. */ #define XEN_DOMINF_shutdownmask 255 #define XEN_DOMINF_shutdownshift 16 diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index ae6a3b8..2d48d22 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -238,6 +238,14 @@ struct mem_event_per_domain struct mem_event_domain access; }; +/* + * PVH is a PV guest running in an HVM container. While is_hvm_* checks are + * false for it, it uses many of the HVM data structs. + */ +enum guest_type { + guest_type_pv, guest_type_pvh, guest_type_hvm +}; + struct domain { domid_t domain_id; @@ -285,8 +293,8 @@ struct domain struct rangeset *iomem_caps; struct rangeset *irq_caps; - /* Is this an HVM guest? */ - bool_t is_hvm; + enum guest_type guest_type; + #ifdef HAS_PASSTHROUGH /* Does this guest need iommu mappings? */ bool_t need_iommu; @@ -464,6 +472,9 @@ struct domain *domain_create( /* DOMCRF_oos_off: dont use out-of-sync optimization for shadow page tables */ #define _DOMCRF_oos_off 4 #define DOMCRF_oos_off (1U<<_DOMCRF_oos_off) + /* DOMCRF_pvh: Create PV domain in HVM container. */ +#define _DOMCRF_pvh 5 +#define DOMCRF_pvh (1U<<_DOMCRF_pvh) /* * rcu_lock_domain_by_id() is more efficient than get_domain_by_id(). @@ -732,8 +743,12 @@ void watchdog_domain_destroy(struct domain *d); #define VM_ASSIST(_d,_t) (test_bit((_t), &(_d)->vm_assist)) -#define is_hvm_domain(d) ((d)->is_hvm) +#define is_pv_domain(d) ((d)->guest_type == guest_type_pv) +#define is_pv_vcpu(v) (is_pv_domain((v)->domain)) +#define is_hvm_domain(d) ((d)->guest_type == guest_type_hvm) #define is_hvm_vcpu(v) (is_hvm_domain(v->domain)) +#define is_pvh_domain(d) ((d)->guest_type == guest_type_pvh) +#define is_pvh_vcpu(v) (is_pvh_domain((v)->domain)) #define is_pinned_vcpu(v) ((v)->domain->is_pinned || \ cpumask_weight((v)->cpu_affinity) == 1) #ifdef HAS_PASSTHROUGH -- 1.7.2.3
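The effect of the new three-way type is easiest to see in a toy mirror (standalone, simplified structs; not the Xen definitions). The key property is that is_hvm_*() stays false for a PVH guest, so existing HVM-only paths are untouched, while shared paths switch from is_hvm checks to !is_pv checks:

    #include <stdio.h>

    /* Toy mirror of the enum and predicates this patch introduces. */
    enum guest_type { guest_type_pv, guest_type_pvh, guest_type_hvm };

    struct domain { enum guest_type guest_type; };

    #define is_pv_domain(d)  ((d)->guest_type == guest_type_pv)
    #define is_hvm_domain(d) ((d)->guest_type == guest_type_hvm)
    #define is_pvh_domain(d) ((d)->guest_type == guest_type_pvh)

    int main(void)
    {
        struct domain d = { guest_type_pvh };

        /* PVH takes HVM-style shared paths via !is_pv, yet is not "HVM". */
        printf("pv=%d hvm=%d pvh=%d !pv=%d\n",
               is_pv_domain(&d), is_hvm_domain(&d),
               is_pvh_domain(&d), !is_pv_domain(&d));
        return 0;
    }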
Mukesh Rathor
2013-Jul-20 01:44 UTC
[PATCH 09/23] PVH xen: introduce pvh_set_vcpu_info() and vmx_pvh_set_vcpu_info()
vmx_pvh_set_vcpu_info() is added to a new file, pvh.c, to which more changes are added later, like the pvh vmexit handler. Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> --- xen/arch/x86/hvm/vmx/Makefile | 1 + xen/arch/x86/hvm/vmx/pvh.c | 78 +++++++++++++++++++++++++++++++++++++ xen/arch/x86/hvm/vmx/vmx.c | 1 + xen/include/asm-x86/hvm/hvm.h | 8 ++++ xen/include/asm-x86/hvm/vmx/vmx.h | 1 + 5 files changed, 89 insertions(+), 0 deletions(-) create mode 100644 xen/arch/x86/hvm/vmx/pvh.c diff --git a/xen/arch/x86/hvm/vmx/Makefile b/xen/arch/x86/hvm/vmx/Makefile index 373b3d9..59fb5d4 100644 --- a/xen/arch/x86/hvm/vmx/Makefile +++ b/xen/arch/x86/hvm/vmx/Makefile @@ -1,5 +1,6 @@ obj-bin-y += entry.o obj-y += intr.o +obj-y += pvh.o obj-y += realmode.o obj-y += vmcs.o obj-y += vmx.o diff --git a/xen/arch/x86/hvm/vmx/pvh.c b/xen/arch/x86/hvm/vmx/pvh.c new file mode 100644 index 0000000..b37e423 --- /dev/null +++ b/xen/arch/x86/hvm/vmx/pvh.c @@ -0,0 +1,78 @@ +/* + * Copyright (C) 2013, Mukesh Rathor, Oracle Corp. All rights reserved. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public + * License v2 as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + */ + +#include <xen/hypercall.h> +#include <xen/guest_access.h> +#include <asm/p2m.h> +#include <asm/traps.h> +#include <asm/hvm/vmx/vmx.h> +#include <public/sched.h> +#include <asm/hvm/nestedhvm.h> +#include <asm/xstate.h> + +/* + * Set vmcs fields in support of vcpu_op -> VCPUOP_initialise hcall. Called + * from arch_set_info_guest() which sets the (PVH relevant) non-vmcs fields. + * + * In case of linux: + * The boot vcpu calls this to set some context for the non boot smp vcpu. + * The call comes from cpu_initialize_context(). (boot vcpu 0 context is + * set by the tools via do_domctl -> vcpu_initialise). + * + * NOTE: In case of VMCS, loading a selector doesn't cause the hidden fields + * to be automatically loaded. We load selectors here but not the hidden + * parts, except for GS_BASE and FS_BASE. 
This means we require the + * guest to have same hidden values as the default values loaded in the + * vmcs in pvh_construct_vmcs(), ie, the GDT the vcpu is coming up on + * should be something like following, + * (from 64bit linux, CS:0x10 DS/SS:0x18) : + * + * ffff88007f704000: 0000000000000000 00cf9b000000ffff + * ffff88007f704010: 00af9b000000ffff 00cf93000000ffff + * ffff88007f704020: 00cffb000000ffff 00cff3000000ffff + * + */ +int vmx_pvh_set_vcpu_info(struct vcpu *v, struct vcpu_guest_context *ctxtp) +{ + if ( v->vcpu_id == 0 ) + return 0; + + if ( !(ctxtp->flags & VGCF_in_kernel) ) + return -EINVAL; + + vmx_vmcs_enter(v); + __vmwrite(GUEST_GDTR_BASE, ctxtp->gdt.pvh.addr); + __vmwrite(GUEST_GDTR_LIMIT, ctxtp->gdt.pvh.limit); + __vmwrite(GUEST_LDTR_BASE, ctxtp->ldt_base); + __vmwrite(GUEST_LDTR_LIMIT, ctxtp->ldt_ents); + + __vmwrite(GUEST_FS_BASE, ctxtp->fs_base); + __vmwrite(GUEST_GS_BASE, ctxtp->gs_base_kernel); + + __vmwrite(GUEST_CS_SELECTOR, ctxtp->user_regs.cs); + __vmwrite(GUEST_SS_SELECTOR, ctxtp->user_regs.ss); + __vmwrite(GUEST_ES_SELECTOR, ctxtp->user_regs.es); + __vmwrite(GUEST_DS_SELECTOR, ctxtp->user_regs.ds); + __vmwrite(GUEST_FS_SELECTOR, ctxtp->user_regs.fs); + __vmwrite(GUEST_GS_SELECTOR, ctxtp->user_regs.gs); + + if ( vmx_add_guest_msr(MSR_SHADOW_GS_BASE) ) + { + vmx_vmcs_exit(v); + return -EINVAL; + } + vmx_write_guest_msr(MSR_SHADOW_GS_BASE, ctxtp->gs_base_user); + + vmx_vmcs_exit(v); + return 0; +} diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index 7292357..e3c7515 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -1562,6 +1562,7 @@ static struct hvm_function_table __initdata vmx_function_table = { .sync_pir_to_irr = vmx_sync_pir_to_irr, .handle_eoi = vmx_handle_eoi, .nhvm_hap_walk_L1_p2m = nvmx_hap_walk_L1_p2m, + .pvh_set_vcpu_info = vmx_pvh_set_vcpu_info, }; const struct hvm_function_table * __init start_vmx(void) diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h index 00489cf..072a2a7 100644 --- a/xen/include/asm-x86/hvm/hvm.h +++ b/xen/include/asm-x86/hvm/hvm.h @@ -193,6 +193,8 @@ struct hvm_function_table { paddr_t *L1_gpa, unsigned int *page_order, uint8_t *p2m_acc, bool_t access_r, bool_t access_w, bool_t access_x); + + int (*pvh_set_vcpu_info)(struct vcpu *v, struct vcpu_guest_context *ctxtp); }; extern struct hvm_function_table hvm_funcs; @@ -326,6 +328,12 @@ static inline unsigned long hvm_get_shadow_gs_base(struct vcpu *v) return hvm_funcs.get_shadow_gs_base(v); } +static inline int pvh_set_vcpu_info(struct vcpu *v, + struct vcpu_guest_context *ctxtp) +{ + return hvm_funcs.pvh_set_vcpu_info(v, ctxtp); +} + #define is_viridian_domain(_d) \ (is_hvm_domain(_d) && ((_d)->arch.hvm_domain.params[HVM_PARAM_VIRIDIAN])) diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h index c21a303..9e6c481 100644 --- a/xen/include/asm-x86/hvm/vmx/vmx.h +++ b/xen/include/asm-x86/hvm/vmx/vmx.h @@ -473,6 +473,7 @@ void vmx_update_guest_eip(void); void vmx_dr_access(unsigned long exit_qualification, struct cpu_user_regs *regs); void vmx_fpu_enter(struct vcpu *v); +int vmx_pvh_set_vcpu_info(struct vcpu *v, struct vcpu_guest_context *ctxtp); int alloc_p2m_hap_data(struct p2m_domain *p2m); void free_p2m_hap_data(struct p2m_domain *p2m); -- 1.7.2.3
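The plumbing around vmx_pvh_set_vcpu_info() follows the usual hvm_funcs pattern: common code reaches the VMX-specific implementation through the per-arch function table, never referencing VMX directly. A compressed standalone model of that indirection (toy types; printf stands in for __vmwrite):

    #include <stdio.h>

    struct vcpu;
    struct vcpu_guest_context { unsigned long gdt_addr; };

    /* Toy mirror of the hook added to hvm_function_table in this patch. */
    struct hvm_function_table {
        int (*pvh_set_vcpu_info)(struct vcpu *v, struct vcpu_guest_context *c);
    };

    static int vmx_pvh_set_vcpu_info(struct vcpu *v, struct vcpu_guest_context *c)
    {
        printf("would vmwrite GDTR base %#lx\n", c->gdt_addr);  /* fake __vmwrite() */
        return 0;
    }

    static struct hvm_function_table hvm_funcs = { vmx_pvh_set_vcpu_info };

    /* The thin wrapper hvm.h adds; arch_set_info_guest() calls this. */
    static int pvh_set_vcpu_info(struct vcpu *v, struct vcpu_guest_context *c)
    {
        return hvm_funcs.pvh_set_vcpu_info(v, c);
    }

    int main(void)
    {
        struct vcpu_guest_context c = { 0xffff88007f704000UL };
        return pvh_set_vcpu_info(NULL, &c);
    }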
Mukesh Rathor
2013-Jul-20 01:44 UTC
[PATCH 10/23] PVH xen: domain create, context switch related code changes
This patch mostly contains changes to arch/x86/domain.c to allow for PVH domain creation. The new function pvh_set_vcpu_info(), introduced in the previous patch, is called here to set some guest context in the VMCS. This patch also changes the context_switch code in the same file to follow HVM behaviour for PVH.

Changes in V2:
  - changes to read_segment_register() moved to this patch.
  - The other comment was to create NULL functions for pvh_set_vcpu_info
    and pvh_read_descriptor, which are implemented in a later patch; but
    since I disable PVH creation until all patches are checked in, it is
    not needed. It does, however, help the breaking down of patches.
Changes in V3:
  - Fix read_segment_register() macro to make sure args are evaluated
    once, and use # instead of STR for name in the macro.
Changes in V4:
  - Remove pvh substruct in the hvm substruct, as the vcpu_info_mfn has
    been moved out of pv_vcpu struct.
  - rename hvm_pvh_* functions to hvm_*.
Changes in V5:
  - remove pvh_read_descriptor().
Changes in V7:
  - remove hap_update_cr3() and read_segment_register changes from here.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
xen/arch/x86/domain.c | 56 ++++++++++++++++++++++++++++++------------------ xen/arch/x86/mm.c | 3 ++ 2 files changed, 40 insertions(+), 19 deletions(-) diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index c361abf..fccb4ee 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -385,7 +385,7 @@ int vcpu_initialise(struct vcpu *v) vmce_init_vcpu(v); - if ( is_hvm_domain(d) ) + if ( !is_pv_domain(d) ) { rc = hvm_vcpu_initialise(v); goto done; @@ -452,7 +452,7 @@ void vcpu_destroy(struct vcpu *v) vcpu_destroy_fpu(v); - if ( is_hvm_vcpu(v) ) + if ( !is_pv_vcpu(v) ) hvm_vcpu_destroy(v); else xfree(v->arch.pv_vcpu.trap_ctxt); @@ -464,7 +464,7 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags) int rc = -ENOMEM; d->arch.hvm_domain.hap_enabled = - is_hvm_domain(d) && + !is_pv_domain(d) && hvm_funcs.hap_supported && (domcr_flags & DOMCRF_hap); d->arch.hvm_domain.mem_sharing_enabled = 0; @@ -512,7 +512,7 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags) mapcache_domain_init(d); HYPERVISOR_COMPAT_VIRT_START(d) = - is_hvm_domain(d) ? ~0u : __HYPERVISOR_COMPAT_VIRT_START; + is_pv_domain(d) ? __HYPERVISOR_COMPAT_VIRT_START : ~0u; if ( (rc = paging_domain_init(d, domcr_flags)) != 0 ) goto fail; @@ -555,7 +555,7 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags) } spin_lock_init(&d->arch.e820_lock); - if ( is_hvm_domain(d) ) + if ( !is_pv_domain(d) ) { if ( (rc = hvm_domain_initialise(d)) != 0 ) { @@ -651,7 +651,7 @@ int arch_set_info_guest( #define c(fld) (compat ? 
(c.cmp->fld) : (c.nat->fld)) flags = c(flags); - if ( !is_hvm_vcpu(v) ) + if ( is_pv_vcpu(v) ) { if ( !compat ) { @@ -704,7 +704,7 @@ int arch_set_info_guest( v->fpu_initialised = !!(flags & VGCF_I387_VALID); v->arch.flags &= ~TF_kernel_mode; - if ( (flags & VGCF_in_kernel) || is_hvm_vcpu(v)/*???*/ ) + if ( (flags & VGCF_in_kernel) || !is_pv_vcpu(v)/*???*/ ) v->arch.flags |= TF_kernel_mode; v->arch.vgc_flags = flags; @@ -719,7 +719,7 @@ int arch_set_info_guest( if ( !compat ) { memcpy(&v->arch.user_regs, &c.nat->user_regs, sizeof(c.nat->user_regs)); - if ( !is_hvm_vcpu(v) ) + if ( is_pv_vcpu(v) ) memcpy(v->arch.pv_vcpu.trap_ctxt, c.nat->trap_ctxt, sizeof(c.nat->trap_ctxt)); } @@ -735,10 +735,13 @@ int arch_set_info_guest( v->arch.user_regs.eflags |= 2; - if ( is_hvm_vcpu(v) ) + if ( !is_pv_vcpu(v) ) { hvm_set_info_guest(v); - goto out; + if ( is_hvm_vcpu(v) || v->is_initialised ) + goto out; + else + goto pvh_skip_pv_stuff; } init_int80_direct_trap(v); @@ -853,6 +856,7 @@ int arch_set_info_guest( set_bit(_VPF_in_reset, &v->pause_flags); + pvh_skip_pv_stuff: if ( !compat ) cr3_gfn = xen_cr3_to_pfn(c.nat->ctrlreg[3]); else @@ -861,7 +865,7 @@ int arch_set_info_guest( if ( !cr3_page ) rc = -EINVAL; - else if ( paging_mode_refcounts(d) ) + else if ( paging_mode_refcounts(d) || is_pvh_vcpu(v) ) /* nothing */; else if ( cr3_page == v->arch.old_guest_table ) { @@ -893,8 +897,15 @@ int arch_set_info_guest( /* handled below */; else if ( !compat ) { + /* PVH 32bitfixme. */ + if ( is_pvh_vcpu(v) ) + { + v->arch.cr3 = page_to_mfn(cr3_page); + v->arch.hvm_vcpu.guest_cr[3] = c.nat->ctrlreg[3]; + } + v->arch.guest_table = pagetable_from_page(cr3_page); - if ( c.nat->ctrlreg[1] ) + if ( c.nat->ctrlreg[1] && !is_pvh_vcpu(v) ) { cr3_gfn = xen_cr3_to_pfn(c.nat->ctrlreg[1]); cr3_page = get_page_from_gfn(d, cr3_gfn, NULL, P2M_ALLOC); @@ -954,6 +965,13 @@ int arch_set_info_guest( update_cr3(v); + if ( is_pvh_vcpu(v) ) + { + /* Set VMCS fields. */ + if ( (rc = pvh_set_vcpu_info(v, c.nat)) != 0 ) + return rc; + } + out: if ( flags & VGCF_online ) clear_bit(_VPF_down, &v->pause_flags); @@ -1315,7 +1333,7 @@ static void update_runstate_area(struct vcpu *v) static inline int need_full_gdt(struct vcpu *v) { - return (!is_hvm_vcpu(v) && !is_idle_vcpu(v)); + return (is_pv_vcpu(v) && !is_idle_vcpu(v)); } static void __context_switch(void) @@ -1450,7 +1468,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next) /* Re-enable interrupts before restoring state which may fault. */ local_irq_enable(); - if ( !is_hvm_vcpu(next) ) + if ( is_pv_vcpu(next) ) { load_LDT(next); load_segments(next); @@ -1576,12 +1594,12 @@ unsigned long hypercall_create_continuation( regs->eax = op; /* Ensure the hypercall trap instruction is re-executed. */ - if ( !is_hvm_vcpu(current) ) + if ( is_pv_vcpu(current) ) regs->eip -= 2; /* re-execute 'syscall' / 'int $xx' */ else current->arch.hvm_vcpu.hcall_preempted = 1; - if ( !is_hvm_vcpu(current) ? + if ( is_pv_vcpu(current) ? 
!is_pv_32on64_vcpu(current) : (hvm_guest_x86_mode(current) == 8) ) { @@ -1849,7 +1867,7 @@ int domain_relinquish_resources(struct domain *d) return ret; } - if ( !is_hvm_domain(d) ) + if ( is_pv_domain(d) ) { for_each_vcpu ( d, v ) { @@ -1922,7 +1940,7 @@ int domain_relinquish_resources(struct domain *d) BUG(); } - if ( is_hvm_domain(d) ) + if ( !is_pv_domain(d) ) hvm_domain_relinquish_resources(d); return 0; @@ -2006,7 +2024,7 @@ void vcpu_mark_events_pending(struct vcpu *v) if ( already_pending ) return; - if ( is_hvm_vcpu(v) ) + if ( !is_pv_vcpu(v) ) hvm_assert_evtchn_irq(v); else vcpu_kick(v); diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c index e980431..de7ba45 100644 --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -4334,6 +4334,9 @@ void destroy_gdt(struct vcpu *v) int i; unsigned long pfn; + if ( is_pvh_vcpu(v) ) + return; + v->arch.pv_vcpu.gdt_ents = 0; pl1e = gdt_ldt_ptes(v->domain, v); for ( i = 0; i < FIRST_RESERVED_GDT_PAGE; i++ ) -- 1.7.2.3
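The hypercall-continuation hunk above is worth restating: only classic PV rewinds eip over the 2-byte trap instruction, while PVH joins HVM in flagging the preemption so the exit path skips its usual eip advance (PVH hypercalls, like HVM's, arrive via VMCALL vmexits in this series). A standalone toy of just that branch (simplified types, illustrative only):

    #include <stdio.h>

    enum guest_type { guest_type_pv, guest_type_pvh, guest_type_hvm };

    struct vcpu { enum guest_type type; unsigned long eip; int hcall_preempted; };

    /* Mirrors the hunk: PV rewinds eip; PVH now takes the HVM path. */
    static void mark_hypercall_preempted(struct vcpu *v)
    {
        if ( v->type == guest_type_pv )
            v->eip -= 2;               /* re-execute 'syscall' / 'int $xx' */
        else
            v->hcall_preempted = 1;    /* exit handler skips the eip advance */
    }

    int main(void)
    {
        struct vcpu v = { guest_type_pvh, 0x1002, 0 };

        mark_hypercall_preempted(&v);
        printf("eip=%#lx preempted=%d\n", v.eip, v.hcall_preempted);
        return 0;
    }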
Mukesh Rathor
2013-Jul-20 01:44 UTC
[PATCH 11/23] PVH xen: support invalid op emulation for PVH
This patch supports invalid op emulation for PVH by calling the appropriate copy macros and an HVM function to inject a #PF. Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> --- xen/arch/x86/traps.c | 17 ++++++++++++++--- xen/include/asm-x86/traps.h | 1 + 2 files changed, 15 insertions(+), 3 deletions(-) diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c index 378ef0a..a3ca70b 100644 --- a/xen/arch/x86/traps.c +++ b/xen/arch/x86/traps.c @@ -459,6 +459,11 @@ static void instruction_done( struct cpu_user_regs *regs, unsigned long eip, unsigned int bpmatch) { regs->eip = eip; + + /* PVH fixme: debug trap below */ + if ( is_pvh_vcpu(current) ) + return; + regs->eflags &= ~X86_EFLAGS_RF; if ( bpmatch || (regs->eflags & X86_EFLAGS_TF) ) { @@ -913,7 +918,7 @@ static int emulate_invalid_rdtscp(struct cpu_user_regs *regs) return EXCRET_fault_fixed; } -static int emulate_forced_invalid_op(struct cpu_user_regs *regs) +int emulate_forced_invalid_op(struct cpu_user_regs *regs) { char sig[5], instr[2]; unsigned long eip, rc; @@ -921,7 +926,7 @@ static int emulate_forced_invalid_op(struct cpu_user_regs *regs) eip = regs->eip; /* Check for forced emulation signature: ud2 ; .ascii "xen". */ - if ( (rc = copy_from_user(sig, (char *)eip, sizeof(sig))) != 0 ) + if ( (rc = raw_copy_from_guest(sig, (char *)eip, sizeof(sig))) != 0 ) { propagate_page_fault(eip + sizeof(sig) - rc, 0); return EXCRET_fault_fixed; @@ -931,7 +936,7 @@ static int emulate_forced_invalid_op(struct cpu_user_regs *regs) eip += sizeof(sig); /* We only emulate CPUID. */ - if ( ( rc = copy_from_user(instr, (char *)eip, sizeof(instr))) != 0 ) + if ( ( rc = raw_copy_from_guest(instr, (char *)eip, sizeof(instr))) != 0 ) { propagate_page_fault(eip + sizeof(instr) - rc, 0); return EXCRET_fault_fixed; @@ -1076,6 +1081,12 @@ void propagate_page_fault(unsigned long addr, u16 error_code) struct vcpu *v = current; struct trap_bounce *tb = &v->arch.pv_vcpu.trap_bounce; + if ( is_pvh_vcpu(v) ) + { + hvm_inject_page_fault(error_code, addr); + return; + } + v->arch.pv_vcpu.ctrlreg[2] = addr; arch_set_cr2(v, addr); diff --git a/xen/include/asm-x86/traps.h b/xen/include/asm-x86/traps.h index 82cbcee..1d9b087 100644 --- a/xen/include/asm-x86/traps.h +++ b/xen/include/asm-x86/traps.h @@ -48,5 +48,6 @@ extern int guest_has_trap_callback(struct domain *d, uint16_t vcpuid, */ extern int send_guest_trap(struct domain *d, uint16_t vcpuid, unsigned int trap_nr); +int emulate_forced_invalid_op(struct cpu_user_regs *regs); #endif /* ASM_TRAP_H */ -- 1.7.2.3
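Concretely, the forced-emulation sequence being matched is ud2 (0f 0b) followed by the ASCII bytes "xen", then the instruction to emulate (only cpuid, 0f a2). A standalone toy of the match (the patch's real change is only that the bytes are now fetched with raw_copy_from_guest(), which works for PV and PVH alike, instead of the PV-only copy_from_user()):

    #include <stdio.h>
    #include <string.h>

    static const unsigned char sig[5] = { 0x0f, 0x0b, 'x', 'e', 'n' }; /* ud2 + "xen" */
    static const unsigned char cpuid_insn[2] = { 0x0f, 0xa2 };          /* cpuid */

    /* Toy match against an already-fetched instruction stream. */
    static int is_forced_cpuid(const unsigned char *eip)
    {
        return memcmp(eip, sig, sizeof(sig)) == 0 &&
               memcmp(eip + sizeof(sig), cpuid_insn, sizeof(cpuid_insn)) == 0;
    }

    int main(void)
    {
        unsigned char text[] = { 0x0f, 0x0b, 'x', 'e', 'n', 0x0f, 0xa2 };

        printf("forced cpuid: %d\n", is_forced_cpuid(text));
        return 0;
    }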
Mukesh Rathor
2013-Jul-20 01:44 UTC
[PATCH 12/23] PVH xen: Support privileged op emulation for PVH
This patch mostly changes traps.c to support privileged op emulation for
PVH. A new function, read_descriptor_sel(), is introduced to read a
descriptor for PVH given a selector. Another new function,
vmx_read_selector(), reads a selector from the VMCS, to support
read_segment_register() for PVH.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/hvm/vmx/vmx.c    | 40 +++++++++++++++++++
 xen/arch/x86/traps.c          | 86 +++++++++++++++++++++++++++++++++++-----
 xen/include/asm-x86/hvm/hvm.h |  7 +++
 xen/include/asm-x86/system.h  | 19 +++++++--
 4 files changed, 137 insertions(+), 15 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index e3c7515..80109c1 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -664,6 +664,45 @@ static void vmx_ctxt_switch_to(struct vcpu *v)
     .fields = { .type = 0xb, .s = 0, .dpl = 0, .p = 1, .avl = 0,    \
                 .l = 0, .db = 0, .g = 0, .pad = 0 } }).bytes)
 
+u16 vmx_read_selector(struct vcpu *v, enum x86_segment seg)
+{
+    u16 sel = 0;
+
+    vmx_vmcs_enter(v);
+    switch ( seg )
+    {
+    case x86_seg_cs:
+        sel = __vmread(GUEST_CS_SELECTOR);
+        break;
+
+    case x86_seg_ss:
+        sel = __vmread(GUEST_SS_SELECTOR);
+        break;
+
+    case x86_seg_es:
+        sel = __vmread(GUEST_ES_SELECTOR);
+        break;
+
+    case x86_seg_ds:
+        sel = __vmread(GUEST_DS_SELECTOR);
+        break;
+
+    case x86_seg_fs:
+        sel = __vmread(GUEST_FS_SELECTOR);
+        break;
+
+    case x86_seg_gs:
+        sel = __vmread(GUEST_GS_SELECTOR);
+        break;
+
+    default:
+        BUG();
+    }
+    vmx_vmcs_exit(v);
+
+    return sel;
+}
+
 void vmx_get_segment_register(struct vcpu *v, enum x86_segment seg,
                               struct segment_register *reg)
 {
@@ -1563,6 +1602,7 @@ static struct hvm_function_table __initdata vmx_function_table = {
     .handle_eoi           = vmx_handle_eoi,
     .nhvm_hap_walk_L1_p2m = nvmx_hap_walk_L1_p2m,
     .pvh_set_vcpu_info    = vmx_pvh_set_vcpu_info,
+    .read_selector        = vmx_read_selector,
 };
 
 const struct hvm_function_table * __init start_vmx(void)
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index a3ca70b..fe8b94c 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -480,6 +480,10 @@ static unsigned int check_guest_io_breakpoint(struct vcpu *v,
     unsigned int width, i, match = 0;
     unsigned long start;
 
+    /* PVH fixme: support io breakpoint. */
+    if ( is_pvh_vcpu(v) )
+        return 0;
+
     if ( !(v->arch.debugreg[5]) ||
          !(v->arch.pv_vcpu.ctrlreg[4] & X86_CR4_DE) )
         return 0;
@@ -1525,6 +1529,49 @@ static int read_descriptor(unsigned int sel,
     return 1;
 }
 
+static int read_descriptor_sel(unsigned int sel,
+                               enum x86_segment which_sel,
+                               struct vcpu *v,
+                               const struct cpu_user_regs *regs,
+                               unsigned long *base,
+                               unsigned long *limit,
+                               unsigned int *ar,
+                               unsigned int vm86attr)
+{
+    struct segment_register seg;
+    bool_t long_mode;
+
+    if ( !is_pvh_vcpu(v) )
+        return read_descriptor(sel, v, regs, base, limit, ar, vm86attr);
+
+    hvm_get_segment_register(v, x86_seg_cs, &seg);
+    long_mode = seg.attr.fields.l;
+
+    if ( which_sel != x86_seg_cs )
+        hvm_get_segment_register(v, which_sel, &seg);
+
+    /* "ar" is returned packed as in segment_attributes_t. Fix it up. */
+    *ar = seg.attr.bytes;
+    *ar = (*ar & 0xff) | ((*ar & 0xf00) << 4);
+    *ar <<= 8;
+
+    if ( long_mode )
+    {
+        *limit = ~0UL;
+
+        if ( which_sel < x86_seg_fs )
+        {
+            *base = 0UL;
+            return 1;
+        }
+    }
+    else
+        *limit = seg.limit;
+
+    *base = seg.base;
+    return 1;
+}
+
 static int read_gate_descriptor(unsigned int gate_sel,
                                 const struct vcpu *v,
                                 unsigned int *sel,
@@ -1590,6 +1637,13 @@ static int guest_io_okay(
     int user_mode = !(v->arch.flags & TF_kernel_mode);
 #define TOGGLE_MODE() if ( user_mode ) toggle_guest_mode(v)
 
+    /*
+     * For PVH we check this in vmexit for EXIT_REASON_IO_INSTRUCTION
+     * and so don't need to check again here.
+     */
+    if ( is_pvh_vcpu(v) )
+        return 1;
+
     if ( !vm86_mode(regs) &&
          (v->arch.pv_vcpu.iopl >= (guest_kernel_mode(v, regs) ? 1 : 3)) )
         return 1;
@@ -1835,7 +1889,7 @@ static inline uint64_t guest_misc_enable(uint64_t val)
         _ptr = (unsigned int)_ptr;                                          \
     if ( (limit) < sizeof(_x) - 1 || (eip) > (limit) - (sizeof(_x) - 1) )   \
         goto fail;                                                          \
-    if ( (_rc = copy_from_user(&_x, (type *)_ptr, sizeof(_x))) != 0 )       \
+    if ( (_rc = raw_copy_from_guest(&_x, (type *)_ptr, sizeof(_x))) != 0 )  \
     {                                                                       \
         propagate_page_fault(_ptr + sizeof(_x) - _rc, 0);                   \
         goto skip;                                                          \
@@ -1852,6 +1906,7 @@ static int is_cpufreq_controller(struct domain *d)
 
 static int emulate_privileged_op(struct cpu_user_regs *regs)
 {
+    enum x86_segment which_sel;
     struct vcpu *v = current;
     unsigned long *reg, eip = regs->eip;
     u8 opcode, modrm_reg = 0, modrm_rm = 0, rep_prefix = 0, lock = 0, rex = 0;
@@ -1874,9 +1929,10 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
     void (*io_emul)(struct cpu_user_regs *) __attribute__((__regparm__(1)));
     uint64_t val, msr_content;
 
-    if ( !read_descriptor(regs->cs, v, regs,
-                          &code_base, &code_limit, &ar,
-                          _SEGMENT_CODE|_SEGMENT_S|_SEGMENT_DPL|_SEGMENT_P) )
+    if ( !read_descriptor_sel(regs->cs, x86_seg_cs, v, regs,
+                              &code_base, &code_limit, &ar,
+                              _SEGMENT_CODE|_SEGMENT_S|
+                              _SEGMENT_DPL|_SEGMENT_P) )
         goto fail;
     op_default = op_bytes = (ar & (_SEGMENT_L|_SEGMENT_DB)) ? 4 : 2;
     ad_default = ad_bytes = (ar & _SEGMENT_L) ? 8 : op_default;
@@ -1887,6 +1943,7 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
 
     /* emulating only opcodes not allowing SS to be default */
     data_sel = read_segment_register(v, regs, ds);
+    which_sel = x86_seg_ds;
 
     /* Legacy prefixes. */
     for ( i = 0; i < 8; i++, rex == opcode || (rex = 0) )
@@ -1902,23 +1959,29 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
             continue;
         case 0x2e: /* CS override */
             data_sel = regs->cs;
+            which_sel = x86_seg_cs;
             continue;
         case 0x3e: /* DS override */
             data_sel = read_segment_register(v, regs, ds);
+            which_sel = x86_seg_ds;
             continue;
         case 0x26: /* ES override */
            data_sel = read_segment_register(v, regs, es);
+            which_sel = x86_seg_es;
             continue;
         case 0x64: /* FS override */
             data_sel = read_segment_register(v, regs, fs);
+            which_sel = x86_seg_fs;
             lm_ovr = lm_seg_fs;
             continue;
         case 0x65: /* GS override */
             data_sel = read_segment_register(v, regs, gs);
+            which_sel = x86_seg_gs;
             lm_ovr = lm_seg_gs;
             continue;
         case 0x36: /* SS override */
             data_sel = regs->ss;
+            which_sel = x86_seg_ss;
             continue;
         case 0xf0: /* LOCK */
             lock = 1;
@@ -1962,15 +2025,16 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
         if ( !(opcode & 2) )
         {
             data_sel = read_segment_register(v, regs, es);
+            which_sel = x86_seg_es;
             lm_ovr = lm_seg_none;
         }
 
         if ( !(ar & _SEGMENT_L) )
         {
-            if ( !read_descriptor(data_sel, v, regs,
-                                  &data_base, &data_limit, &ar,
-                                  _SEGMENT_WR|_SEGMENT_S|_SEGMENT_DPL|
-                                  _SEGMENT_P) )
+            if ( !read_descriptor_sel(data_sel, which_sel, v, regs,
+                                      &data_base, &data_limit, &ar,
+                                      _SEGMENT_WR|_SEGMENT_S|_SEGMENT_DPL|
+                                      _SEGMENT_P) )
                 goto fail;
             if ( !(ar & _SEGMENT_S) ||
                  !(ar & _SEGMENT_P) ||
@@ -2000,9 +2064,9 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
             }
         }
         else
-            read_descriptor(data_sel, v, regs,
-                            &data_base, &data_limit, &ar,
-                            0);
+            read_descriptor_sel(data_sel, which_sel, v, regs,
+                                &data_base, &data_limit, &ar,
+                                0);
         data_limit = ~0UL;
         ar = _SEGMENT_WR|_SEGMENT_S|_SEGMENT_DPL|_SEGMENT_P;
     }
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 072a2a7..29ed313 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -195,6 +195,8 @@ struct hvm_function_table {
                                 bool_t access_w, bool_t access_x);
 
     int (*pvh_set_vcpu_info)(struct vcpu *v, struct vcpu_guest_context *ctxtp);
+
+    u16 (*read_selector)(struct vcpu *v, enum x86_segment seg);
 };
 
 extern struct hvm_function_table hvm_funcs;
@@ -334,6 +336,11 @@ static inline int pvh_set_vcpu_info(struct vcpu *v,
     return hvm_funcs.pvh_set_vcpu_info(v, ctxtp);
 }
 
+static inline u16 pvh_get_selector(struct vcpu *v, enum x86_segment seg)
+{
+    return hvm_funcs.read_selector(v, seg);
+}
+
 #define is_viridian_domain(_d)                                             \
  (is_hvm_domain(_d) && ((_d)->arch.hvm_domain.params[HVM_PARAM_VIRIDIAN]))
 
diff --git a/xen/include/asm-x86/system.h b/xen/include/asm-x86/system.h
index 9bb22cb..1242657 100644
--- a/xen/include/asm-x86/system.h
+++ b/xen/include/asm-x86/system.h
@@ -4,10 +4,21 @@
 #include <xen/lib.h>
 #include <xen/bitops.h>
 
-#define read_segment_register(vcpu, regs, name)                 \
-({  u16 __sel;                                                  \
-    asm volatile ( "movw %%" STR(name) ",%0" : "=r" (__sel) );  \
-    __sel;                                                      \
+/*
+ * We need vcpu because during context switch, going from PV to PVH,
+ * in save_segments() current has been updated to next, and no longer pointing
+ * to the PV, but the intention is to get selector for the PV. Checking
+ * is_pvh_vcpu(current) will yield incorrect results in such a case.
+ */
+#define read_segment_register(vcpu, regs, name)                   \
+({  u16 __sel;                                                    \
+    struct cpu_user_regs *_regs = (regs);                         \
+                                                                  \
+    if ( is_pvh_vcpu(vcpu) && guest_mode(_regs) )                 \
+        __sel = pvh_get_selector(vcpu, x86_seg_##name);           \
+    else                                                          \
+        asm volatile ( "movw %%" #name ",%0" : "=r" (__sel) );    \
+    __sel;                                                        \
})
 
 #define wbinvd() \
-- 
1.7.2.3
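
A quick illustration of what the read_segment_register() change buys: for
a PVH vcpu in guest mode the selector must come from the VMCS, since the
hardware segment registers hold Xen's own selectors at that point. A
hedged sketch of a call site (the helper is illustrative, not part of the
patch):

    /* Sketch: fetch the guest's %ds for either guest flavour. */
    static u16 guest_ds_selector(struct vcpu *v, struct cpu_user_regs *regs)
    {
        /*
         * PVH + guest_mode: routed to vmx_read_selector(), i.e. a
         * __vmread(GUEST_DS_SELECTOR).  PV: a literal "movw %ds".
         */
        return read_segment_register(v, regs, ds);
    }
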
Mukesh Rathor
2013-Jul-20 01:44 UTC
[PATCH 13/23] PVH xen: interrupt/event-channel delivery to PVH
PVH uses HVMIRQ_callback_vector for interrupt delivery. Also, change
hvm_vcpu_has_pending_irq(): since PVH doesn't use vlapic emulation, we can
skip the vlapic checks in that function. Moreover, a PVH guest installs its
IDT natively and sets the callback via during boot for interrupt delivery.
Once that is done, it receives interrupts via the callback.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/arch/x86/hvm/irq.c       | 3 +++
 xen/arch/x86/hvm/vmx/intr.c  | 8 ++++++--
 xen/include/asm-x86/domain.h | 2 +-
 xen/include/asm-x86/event.h  | 2 +-
 4 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/hvm/irq.c b/xen/arch/x86/hvm/irq.c
index 9eae5de..92fb245 100644
--- a/xen/arch/x86/hvm/irq.c
+++ b/xen/arch/x86/hvm/irq.c
@@ -405,6 +405,9 @@ struct hvm_intack hvm_vcpu_has_pending_irq(struct vcpu *v)
          && vcpu_info(v, evtchn_upcall_pending) )
         return hvm_intack_vector(plat->irq.callback_via.vector);
 
+    if ( is_pvh_vcpu(v) )
+        return hvm_intack_none;
+
     if ( vlapic_accept_pic_intr(v) && plat->vpic[0].int_output )
         return hvm_intack_pic(0);
 
diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
index e376f3c..ce42950 100644
--- a/xen/arch/x86/hvm/vmx/intr.c
+++ b/xen/arch/x86/hvm/vmx/intr.c
@@ -165,6 +165,9 @@ static int nvmx_intr_intercept(struct vcpu *v, struct hvm_intack intack)
 {
     u32 ctrl;
 
+    if ( is_pvh_vcpu(v) )
+        return 0;
+
     if ( nvmx_intr_blocked(v) != hvm_intblk_none )
     {
         enable_intr_window(v, intack);
@@ -219,8 +222,9 @@ void vmx_intr_assist(void)
         return;
     }
 
-    /* Crank the handle on interrupt state. */
-    pt_vector = pt_update_irq(v);
+    if ( !is_pvh_vcpu(v) )
+        /* Crank the handle on interrupt state. */
+        pt_vector = pt_update_irq(v);
 
     do {
         intack = hvm_vcpu_has_pending_irq(v);
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 22a72df..21a9954 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -16,7 +16,7 @@
 #define is_pv_32on64_domain(d) (is_pv_32bit_domain(d))
 #define is_pv_32on64_vcpu(v)   (is_pv_32on64_domain((v)->domain))
 
-#define is_hvm_pv_evtchn_domain(d) (is_hvm_domain(d) && \
+#define is_hvm_pv_evtchn_domain(d) (!is_pv_domain(d) && \
         d->arch.hvm_domain.irq.callback_via_type == HVMIRQ_callback_vector)
 #define is_hvm_pv_evtchn_vcpu(v) (is_hvm_pv_evtchn_domain(v->domain))
 
diff --git a/xen/include/asm-x86/event.h b/xen/include/asm-x86/event.h
index 06057c7..7ed5812 100644
--- a/xen/include/asm-x86/event.h
+++ b/xen/include/asm-x86/event.h
@@ -18,7 +18,7 @@ int hvm_local_events_need_delivery(struct vcpu *v);
 static inline int local_events_need_delivery(void)
 {
     struct vcpu *v = current;
-    return (is_hvm_vcpu(v) ? hvm_local_events_need_delivery(v) :
+    return (!is_pv_vcpu(v) ? hvm_local_events_need_delivery(v) :
             (vcpu_info(v, evtchn_upcall_pending) &&
              !vcpu_info(v, evtchn_upcall_mask)));
 }
-- 
1.7.2.3
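
For reference, the guest registers the callback vector with an
HVMOP_set_param hypercall on HVM_PARAM_CALLBACK_IRQ, encoding the "via
type" in the top byte of the value (type 2 corresponds to
HVMIRQ_callback_vector). A hedged guest-side sketch; the helper name is
illustrative:

    /* Sketch: ask Xen to deliver events on IDT vector `vec'. */
    static int set_callback_vector(uint8_t vec)
    {
        struct xen_hvm_param hp = {
            .domid = DOMID_SELF,
            .index = HVM_PARAM_CALLBACK_IRQ,
            .value = (2ULL << 56) | vec,   /* 2 == vector callback type */
        };
        return HYPERVISOR_hvm_op(HVMOP_set_param, &hp);
    }

This is also why hvm_vcpu_has_pending_irq() can return early for PVH
above: with no vlapic or vpic, the callback vector is the only delivery
path.
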
Mukesh Rathor
2013-Jul-20 01:44 UTC
[PATCH 14/23] PVH xen: additional changes to support PVH guest creation and execution.
Fail creation of 32bit PVH guests. Change hap_paging_get_mode() to return
long mode for PVH; this is called during domain creation from
arch_set_info_guest(). Return the correct features to a PVH guest during
its boot.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/arch/x86/domain.c     |  8 ++++++++
 xen/arch/x86/mm/hap/hap.c |  4 +++-
 xen/common/domain.c       | 10 ++++++++++
 xen/common/domctl.c       |  5 +++++
 xen/common/kernel.c       |  6 +++++-
 5 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index fccb4ee..288872a 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -339,6 +339,14 @@ int switch_compat(struct domain *d)
 
     if ( d == NULL )
         return -EINVAL;
+
+    if ( is_pvh_domain(d) )
+    {
+        printk(XENLOG_INFO
+               "Xen currently does not support 32bit PVH guests\n");
+        return -EINVAL;
+    }
+
     if ( !may_switch_mode(d) )
         return -EACCES;
     if ( is_pv_32on64_domain(d) )
diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index bff05d9..19a085c 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -639,7 +639,9 @@ static void hap_update_cr3(struct vcpu *v, int do_locking)
 const struct paging_mode *
 hap_paging_get_mode(struct vcpu *v)
 {
-    return !hvm_paging_enabled(v)   ? &hap_paging_real_mode :
+    /* PVH 32bitfixme. */
+    return is_pvh_vcpu(v)           ? &hap_paging_long_mode :
+           !hvm_paging_enabled(v)   ? &hap_paging_real_mode :
            hvm_long_mode_enabled(v) ? &hap_paging_long_mode :
            hvm_pae_enabled(v)       ? &hap_paging_pae_mode  :
                                       &hap_paging_protected_mode;
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 38b1bad..3b4af4b 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -237,6 +237,16 @@ struct domain *domain_create(
 
     if ( domcr_flags & DOMCRF_hvm )
         d->guest_type = guest_type_hvm;
+    else if ( domcr_flags & DOMCRF_pvh )
+    {
+        if ( !(domcr_flags & DOMCRF_hap) )
+        {
+            err = -EOPNOTSUPP;
+            printk(XENLOG_INFO "PVH guest must have HAP on\n");
+            goto fail;
+        }
+        d->guest_type = guest_type_pvh;
+    }
 
     if ( domid == 0 )
     {
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index c653efb..48e4c08 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -187,6 +187,8 @@ void getdomaininfo(struct domain *d, struct xen_domctl_getdomaininfo *info)
 
     if ( is_hvm_domain(d) )
         info->flags |= XEN_DOMINF_hvm_guest;
+    else if ( is_pvh_domain(d) )
+        info->flags |= XEN_DOMINF_pvh_guest;
 
     xsm_security_domaininfo(d, info);
 
@@ -443,6 +445,9 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
         domcr_flags = 0;
         if ( op->u.createdomain.flags & XEN_DOMCTL_CDF_hvm_guest )
             domcr_flags |= DOMCRF_hvm;
+        else if ( op->u.createdomain.flags & XEN_DOMCTL_CDF_hap )
+            domcr_flags |= DOMCRF_pvh;  /* PV with HAP is a PVH guest */
+
         if ( op->u.createdomain.flags & XEN_DOMCTL_CDF_hap )
             domcr_flags |= DOMCRF_hap;
         if ( op->u.createdomain.flags & XEN_DOMCTL_CDF_s3_integrity )
diff --git a/xen/common/kernel.c b/xen/common/kernel.c
index 72fb905..3bba758 100644
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -289,7 +289,11 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
             if ( current->domain == dom0 )
                 fi.submap |= 1U << XENFEAT_dom0;
 #ifdef CONFIG_X86
-            if ( !is_hvm_vcpu(current) )
+            if ( is_pvh_vcpu(current) )
+                fi.submap |= (1U << XENFEAT_hvm_safe_pvclock) |
+                             (1U << XENFEAT_supervisor_mode_kernel) |
+                             (1U << XENFEAT_hvm_callback_vector);
+            else if ( !is_hvm_vcpu(current) )
                 fi.submap |= (1U << XENFEAT_mmu_pt_update_preserve_ad) |
                              (1U << XENFEAT_highmem_assist) |
                              (1U << XENFEAT_gnttab_map_avail_bits);
-- 
1.7.2.3
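
The kernel.c hunk above is what a PVH guest kernel sees when it queries
its feature submap at boot. A short sketch of the guest-side check, using
the public xen_feature_info interface (error handling elided):

    /* Sketch: confirm the PVH feature bits advertised by Xen. */
    struct xen_feature_info fi = { .submap_idx = 0 };

    if ( HYPERVISOR_xen_version(XENVER_get_features, &fi) == 0 &&
         (fi.submap & (1U << XENFEAT_hvm_callback_vector)) &&
         (fi.submap & (1U << XENFEAT_supervisor_mode_kernel)) )
    {
        /* Safe to proceed with a PVH-style boot. */
    }
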
PVH doesn't use the map cache. show_registers() for PVH takes the HVM path.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/domain_page.c  | 10 +++++-----
 xen/arch/x86/x86_64/traps.c |  6 +++---
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c
index bc18263..3903952 100644
--- a/xen/arch/x86/domain_page.c
+++ b/xen/arch/x86/domain_page.c
@@ -35,7 +35,7 @@ static inline struct vcpu *mapcache_current_vcpu(void)
      * then it means we are running on the idle domain's page table and must
      * therefore use its mapcache.
      */
-    if ( unlikely(pagetable_is_null(v->arch.guest_table)) && !is_hvm_vcpu(v) )
+    if ( unlikely(pagetable_is_null(v->arch.guest_table)) && is_pv_vcpu(v) )
     {
         /* If we really are idling, perform lazy context switch now. */
         if ( (v = idle_vcpu[smp_processor_id()]) == current )
@@ -72,7 +72,7 @@ void *map_domain_page(unsigned long mfn)
 #endif
 
     v = mapcache_current_vcpu();
-    if ( !v || is_hvm_vcpu(v) )
+    if ( !v || !is_pv_vcpu(v) )
         return mfn_to_virt(mfn);
 
     dcache = &v->domain->arch.pv_domain.mapcache;
@@ -177,7 +177,7 @@ void unmap_domain_page(const void *ptr)
     ASSERT(va >= MAPCACHE_VIRT_START && va < MAPCACHE_VIRT_END);
 
     v = mapcache_current_vcpu();
-    ASSERT(v && !is_hvm_vcpu(v));
+    ASSERT(v && is_pv_vcpu(v));
 
     dcache = &v->domain->arch.pv_domain.mapcache;
     ASSERT(dcache->inuse);
@@ -244,7 +244,7 @@ int mapcache_domain_init(struct domain *d)
     struct mapcache_domain *dcache = &d->arch.pv_domain.mapcache;
     unsigned int bitmap_pages;
 
-    if ( is_hvm_domain(d) || is_idle_domain(d) )
+    if ( !is_pv_domain(d) || is_idle_domain(d) )
         return 0;
 
 #ifdef NDEBUG
@@ -275,7 +275,7 @@ int mapcache_vcpu_init(struct vcpu *v)
     unsigned int ents = d->max_vcpus * MAPCACHE_VCPU_ENTRIES;
     unsigned int nr = PFN_UP(BITS_TO_LONGS(ents) * sizeof(long));
 
-    if ( is_hvm_vcpu(v) || !dcache->inuse )
+    if ( !is_pv_vcpu(v) || !dcache->inuse )
         return 0;
 
     if ( ents > dcache->entries )
diff --git a/xen/arch/x86/x86_64/traps.c b/xen/arch/x86/x86_64/traps.c
index feb50ff..6ac7762 100644
--- a/xen/arch/x86/x86_64/traps.c
+++ b/xen/arch/x86/x86_64/traps.c
@@ -85,7 +85,7 @@ void show_registers(struct cpu_user_regs *regs)
     enum context context;
     struct vcpu *v = current;
 
-    if ( is_hvm_vcpu(v) && guest_mode(regs) )
+    if ( !is_pv_vcpu(v) && guest_mode(regs) )
     {
         struct segment_register sreg;
         context = CTXT_hvm_guest;
@@ -146,8 +146,8 @@ void vcpu_show_registers(struct vcpu *v)
     const struct cpu_user_regs *regs = &v->arch.user_regs;
     unsigned long crs[8];
 
-    /* No need to handle HVM for now. */
-    if ( is_hvm_vcpu(v) )
+    /* No need to handle HVM and PVH for now. */
+    if ( !is_pv_vcpu(v) )
         return;
 
     crs[0] = v->arch.pv_vcpu.ctrlreg[0];
-- 
1.7.2.3
PVH only supports limited memory types in Phase I. For the moment, the TSC
is also limited to native mode. Finally, grant mapping of iomem for PVH
hasn't been explored in Phase I.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/arch/x86/hvm/mtrr.c  | 8 ++++++++
 xen/arch/x86/time.c      | 8 ++++++++
 xen/common/grant_table.c | 4 ++--
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/mtrr.c b/xen/arch/x86/hvm/mtrr.c
index ef51a8d..b9d6411 100644
--- a/xen/arch/x86/hvm/mtrr.c
+++ b/xen/arch/x86/hvm/mtrr.c
@@ -693,6 +693,14 @@ uint8_t epte_get_entry_emt(struct domain *d, unsigned long gfn, mfn_t mfn,
          ((d->vcpu == NULL) || ((v = d->vcpu[0]) == NULL)) )
         return MTRR_TYPE_WRBACK;
 
+    /* PVH fixme: Add support for more memory types. */
+    if ( is_pvh_domain(d) )
+    {
+        if ( direct_mmio )
+            return MTRR_TYPE_UNCACHABLE;
+        return MTRR_TYPE_WRBACK;
+    }
+
     if ( !v->domain->arch.hvm_domain.params[HVM_PARAM_IDENT_PT] )
         return MTRR_TYPE_WRBACK;
 
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index cf8bc78..55f6f4b 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -1890,6 +1890,14 @@ void tsc_set_info(struct domain *d,
         d->arch.vtsc = 0;
         return;
     }
+    if ( is_pvh_domain(d) && tsc_mode != TSC_MODE_NEVER_EMULATE )
+    {
+        /* PVH fixme: support more tsc modes. */
+        printk(XENLOG_WARNING
+               "PVH currently does not support tsc emulation. Setting timer_mode = native\n");
+        d->arch.vtsc = 0;
+        return;
+    }
 
     switch ( d->arch.tsc_mode = tsc_mode )
     {
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index eb50288..c51da30 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -721,7 +721,7 @@ __gnttab_map_grant_ref(
 
     double_gt_lock(lgt, rgt);
 
-    if ( !is_hvm_domain(ld) && need_iommu(ld) )
+    if ( is_pv_domain(ld) && need_iommu(ld) )
     {
         unsigned int wrc, rdc;
         int err = 0;
@@ -932,7 +932,7 @@ __gnttab_unmap_common(
             act->pin -= GNTPIN_hstw_inc;
     }
 
-    if ( !is_hvm_domain(ld) && need_iommu(ld) )
+    if ( is_pv_domain(ld) && need_iommu(ld) )
     {
         unsigned int wrc, rdc;
         int err = 0;
-- 
1.7.2.3
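
At the guest-config level, the tsc_set_info() check above means any mode
other than TSC_MODE_NEVER_EMULATE is coerced to native with a warning. A
hedged sketch of what that looks like once the (separate, later) tools
series lands, assuming the existing xl tsc_mode option applies unchanged
to PVH:

    # domU.cfg (sketch)
    pvh = 1
    tsc_mode = "native"    # modes requiring emulation are forced to native
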
Mukesh Rathor
2013-Jul-20 01:44 UTC
[PATCH 17/23] PVH xen: Checks, asserts, and limitations for PVH
This patch adds some precautionary checks and debug asserts for PVH. Also,
PVH doesn't support any HVM-type guest monitoring at present.

Changes in V9:
  - Remove ASSERTs from emulate_gate_op and do_device_not_available.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/arch/x86/hvm/hvm.c  | 13 +++++++++++++
 xen/arch/x86/hvm/mtrr.c |  4 ++++
 xen/arch/x86/physdev.c  | 13 +++++++++++++
 3 files changed, 30 insertions(+), 0 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index bac4708..3b47e6f 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4526,8 +4526,11 @@ static int hvm_memory_event_traps(long p, uint32_t reason,
     return 1;
 }
 
+/* PVH fixme: add support for monitoring guest behaviour in below functions. */
 void hvm_memory_event_cr0(unsigned long value, unsigned long old)
 {
+    if ( is_pvh_vcpu(current) )
+        return;
     hvm_memory_event_traps(current->domain->arch.hvm_domain
                              .params[HVM_PARAM_MEMORY_EVENT_CR0],
                            MEM_EVENT_REASON_CR0,
@@ -4536,6 +4539,8 @@ void hvm_memory_event_cr0(unsigned long value, unsigned long old)
 
 void hvm_memory_event_cr3(unsigned long value, unsigned long old)
 {
+    if ( is_pvh_vcpu(current) )
+        return;
     hvm_memory_event_traps(current->domain->arch.hvm_domain
                              .params[HVM_PARAM_MEMORY_EVENT_CR3],
                            MEM_EVENT_REASON_CR3,
@@ -4544,6 +4549,8 @@ void hvm_memory_event_cr3(unsigned long value, unsigned long old)
 
 void hvm_memory_event_cr4(unsigned long value, unsigned long old)
 {
+    if ( is_pvh_vcpu(current) )
+        return;
     hvm_memory_event_traps(current->domain->arch.hvm_domain
                              .params[HVM_PARAM_MEMORY_EVENT_CR4],
                            MEM_EVENT_REASON_CR4,
@@ -4552,6 +4559,8 @@ void hvm_memory_event_cr4(unsigned long value, unsigned long old)
 
 void hvm_memory_event_msr(unsigned long msr, unsigned long value)
 {
+    if ( is_pvh_vcpu(current) )
+        return;
     hvm_memory_event_traps(current->domain->arch.hvm_domain
                              .params[HVM_PARAM_MEMORY_EVENT_MSR],
                            MEM_EVENT_REASON_MSR,
@@ -4564,6 +4573,8 @@ int hvm_memory_event_int3(unsigned long gla)
     unsigned long gfn;
     gfn = paging_gva_to_gfn(current, gla, &pfec);
 
+    if ( is_pvh_vcpu(current) )
+        return 0;
     return hvm_memory_event_traps(current->domain->arch.hvm_domain
                                     .params[HVM_PARAM_MEMORY_EVENT_INT3],
                                   MEM_EVENT_REASON_INT3,
@@ -4576,6 +4587,8 @@ int hvm_memory_event_single_step(unsigned long gla)
     unsigned long gfn;
     gfn = paging_gva_to_gfn(current, gla, &pfec);
 
+    if ( is_pvh_vcpu(current) )
+        return 0;
     return hvm_memory_event_traps(current->domain->arch.hvm_domain
                                     .params[HVM_PARAM_MEMORY_EVENT_SINGLE_STEP],
                                   MEM_EVENT_REASON_SINGLESTEP,
diff --git a/xen/arch/x86/hvm/mtrr.c b/xen/arch/x86/hvm/mtrr.c
index b9d6411..6706af6 100644
--- a/xen/arch/x86/hvm/mtrr.c
+++ b/xen/arch/x86/hvm/mtrr.c
@@ -578,6 +578,10 @@ int32_t hvm_set_mem_pinned_cacheattr(
 {
     struct hvm_mem_pinned_cacheattr_range *range;
 
+    /* Side note: A PVH guest writes to MSR_IA32_CR_PAT natively. */
+    if ( is_pvh_domain(d) )
+        return -EOPNOTSUPP;
+
     if ( !((type == PAT_TYPE_UNCACHABLE) ||
            (type == PAT_TYPE_WRCOMB) ||
           (type == PAT_TYPE_WRTHROUGH) ||
diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index 3733c7a..73c8d2a 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -475,6 +475,13 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     case PHYSDEVOP_set_iopl: {
         struct physdev_set_iopl set_iopl;
+
+        if ( is_pvh_vcpu(current) )
+        {
+            ret = -EPERM;
+            break;
+        }
+
         ret = -EFAULT;
         if ( copy_from_guest(&set_iopl, arg, 1) != 0 )
             break;
@@ -488,6 +495,12 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     case PHYSDEVOP_set_iobitmap: {
         struct physdev_set_iobitmap set_iobitmap;
+
+        if ( is_pvh_vcpu(current) )
+        {
+            ret = -EPERM;
+            break;
+        }
         ret = -EFAULT;
         if ( copy_from_guest(&set_iobitmap, arg, 1) != 0 )
             break;
-- 
1.7.2.3
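
Guest-side, the two -EPERM cases above mean a PVH kernel must simply skip
the PV iopl/iobitmap hypercalls. A hedged sketch of the call site (the
xen_pvh_domain() predicate is the Linux-side spelling; treat it as
illustrative here):

    /* Sketch: only PV guests may use PHYSDEVOP_set_iopl. */
    if ( !xen_pvh_domain() )
    {
        struct physdev_set_iopl set_iopl = { .iopl = 1 };
        HYPERVISOR_physdev_op(PHYSDEVOP_set_iopl, &set_iopl);
    }

A PVH guest instead controls I/O privilege via EFLAGS.IOPL directly, which
is exactly what the vmexit IO handler in patch 23 assumes.
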
This patch expands HVM hcall support to include PVH.

Changes in v8:
  - Carve out PVH support of hvm_op to a small function.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/arch/x86/hvm/hvm.c      | 80 +++++++++++++++++++++++++++++++++++------
 xen/arch/x86/x86_64/traps.c |  2 +-
 2 files changed, 70 insertions(+), 12 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 3b47e6f..0be98a1 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3188,6 +3188,17 @@ static long hvm_vcpu_op(
     case VCPUOP_register_vcpu_time_memory_area:
         rc = do_vcpu_op(cmd, vcpuid, arg);
         break;
+
+    case VCPUOP_is_up:
+    case VCPUOP_up:
+    case VCPUOP_initialise:
+        /* PVH fixme: this white list should be removed eventually */
+        if ( is_pvh_vcpu(current) )
+            rc = do_vcpu_op(cmd, vcpuid, arg);
+        else
+            rc = -ENOSYS;
+        break;
+
     default:
         rc = -ENOSYS;
         break;
@@ -3308,6 +3319,24 @@ static hvm_hypercall_t *const hvm_hypercall32_table[NR_hypercalls] = {
     HYPERCALL(tmem_op)
 };
 
+/* PVH 32bitfixme. */
+static hvm_hypercall_t *const pvh_hypercall64_table[NR_hypercalls] = {
+    HYPERCALL(platform_op),
+    HYPERCALL(memory_op),
+    HYPERCALL(xen_version),
+    HYPERCALL(console_io),
+    [ __HYPERVISOR_grant_table_op ]  = (hvm_hypercall_t *)hvm_grant_table_op,
+    [ __HYPERVISOR_vcpu_op ]         = (hvm_hypercall_t *)hvm_vcpu_op,
+    HYPERCALL(mmuext_op),
+    HYPERCALL(xsm_op),
+    HYPERCALL(sched_op),
+    HYPERCALL(event_channel_op),
+    [ __HYPERVISOR_physdev_op ]      = (hvm_hypercall_t *)hvm_physdev_op,
+    HYPERCALL(hvm_op),
+    HYPERCALL(sysctl),
+    HYPERCALL(domctl)
+};
+
 int hvm_do_hypercall(struct cpu_user_regs *regs)
 {
     struct vcpu *curr = current;
@@ -3334,7 +3363,9 @@ int hvm_do_hypercall(struct cpu_user_regs *regs)
     if ( (eax & 0x80000000) && is_viridian_domain(curr->domain) )
         return viridian_hypercall(regs);
 
-    if ( (eax >= NR_hypercalls) || !hvm_hypercall32_table[eax] )
+    if ( (eax >= NR_hypercalls) ||
+         (is_pvh_vcpu(curr) && !pvh_hypercall64_table[eax]) ||
+         (is_hvm_vcpu(curr) && !hvm_hypercall32_table[eax]) )
     {
         regs->eax = -ENOSYS;
         return HVM_HCALL_completed;
@@ -3349,16 +3380,20 @@ int hvm_do_hypercall(struct cpu_user_regs *regs)
                     regs->r10, regs->r8, regs->r9);
 
         curr->arch.hvm_vcpu.hcall_64bit = 1;
-        regs->rax = hvm_hypercall64_table[eax](regs->rdi,
-                                               regs->rsi,
-                                               regs->rdx,
-                                               regs->r10,
-                                               regs->r8,
-                                               regs->r9);
+        if ( is_pvh_vcpu(curr) )
+            regs->rax = pvh_hypercall64_table[eax](regs->rdi, regs->rsi,
+                                                   regs->rdx, regs->r10,
+                                                   regs->r8, regs->r9);
+        else
+            regs->rax = hvm_hypercall64_table[eax](regs->rdi, regs->rsi,
+                                                   regs->rdx, regs->r10,
+                                                   regs->r8, regs->r9);
         curr->arch.hvm_vcpu.hcall_64bit = 0;
     }
     else
     {
+        ASSERT(!is_pvh_vcpu(curr));    /* PVH 32bitfixme. */
+
         HVM_DBG_LOG(DBG_LEVEL_HCALL, "hcall%u(%x, %x, %x, %x, %x, %x)", eax,
                     (uint32_t)regs->ebx, (uint32_t)regs->ecx,
                     (uint32_t)regs->edx, (uint32_t)regs->esi,
@@ -3756,6 +3791,23 @@ static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid,
     return 0;
 }
 
+static long pvh_hvm_op(unsigned long op, struct domain *d,
+                       struct xen_hvm_param *harg)
+{
+    long rc = -ENOSYS;
+
+    if ( op == HVMOP_set_param )
+    {
+        if ( harg->index == HVM_PARAM_CALLBACK_IRQ )
+        {
+            hvm_set_callback_via(d, harg->value);
+            hvm_latch_shinfo_size(d);
+            rc = 0;
+        }
+    }
+    return rc;
+}
+
 long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
 
 {
@@ -3783,12 +3835,18 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
             return -ESRCH;
 
         rc = -EINVAL;
-        if ( !is_hvm_domain(d) )
-            goto param_fail;
+        if ( is_pv_domain(d) )
+            goto param_done;
 
         rc = xsm_hvm_param(XSM_TARGET, d, op);
         if ( rc )
-            goto param_fail;
+            goto param_done;
+
+        if ( is_pvh_domain(d) )
+        {
+            rc = pvh_hvm_op(op, d, &a);
+            goto param_done;
+        }
 
         if ( op == HVMOP_set_param )
         {
@@ -3997,7 +4055,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
                     op == HVMOP_set_param ? "set" : "get",
                     a.index, a.value);
 
-    param_fail:
+    param_done:
         rcu_unlock_domain(d);
         break;
     }
diff --git a/xen/arch/x86/x86_64/traps.c b/xen/arch/x86/x86_64/traps.c
index 6ac7762..af727f9 100644
--- a/xen/arch/x86/x86_64/traps.c
+++ b/xen/arch/x86/x86_64/traps.c
@@ -623,7 +623,7 @@ static void hypercall_page_initialise_ring3_kernel(void *hypercall_page)
 void hypercall_page_initialise(struct domain *d, void *hypercall_page)
 {
     memset(hypercall_page, 0xCC, PAGE_SIZE);
-    if ( is_hvm_domain(d) )
+    if ( !is_pv_domain(d) )
         hvm_hypercall_page_initialise(d, hypercall_page);
     else if ( !is_pv_32bit_domain(d) )
         hypercall_page_initialise_ring3_kernel(hypercall_page);
-- 
1.7.2.3
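
Since PVH enters Xen through the HVM path, hypercalls use vmcall with the
64-bit argument registers that hvm_do_hypercall() unpacks above. A hedged
sketch of the guest-side stub (equivalent in effect to jumping through the
hypercall page):

    /* Sketch: two-argument PVH/HVM hypercall, 64-bit convention. */
    static inline long pvh_hypercall2(unsigned int nr,
                                      unsigned long a1, unsigned long a2)
    {
        long ret;
        asm volatile ( "vmcall"
                       : "=a" (ret)
                       : "a" (nr), "D" (a1), "S" (a2)
                       : "memory" );
        return ret;
    }

Anything not present in pvh_hypercall64_table -- for instance mmu_update
or set_trap_table -- returns -ENOSYS to the guest, per the table check in
hvm_do_hypercall().
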
This patch contains the VMCS changes related to PVH, mainly creating a
VMCS for a PVH guest.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/arch/x86/hvm/vmx/vmcs.c | 247 ++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 245 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 36f167f..8d35370 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -634,7 +634,7 @@ void vmx_vmcs_exit(struct vcpu *v)
     {
         /* Don't confuse vmx_do_resume (for @v or @current!) */
         vmx_clear_vmcs(v);
-        if ( is_hvm_vcpu(current) )
+        if ( !is_pv_vcpu(current) )
             vmx_load_vmcs(current);
 
         spin_unlock(&v->arch.hvm_vmx.vmcs_lock);
@@ -856,6 +856,239 @@ static void vmx_set_common_host_vmcs_fields(struct vcpu *v)
     __vmwrite(HOST_SYSENTER_EIP, sysenter_eip);
 }
 
+static int pvh_check_requirements(struct vcpu *v)
+{
+    u64 required, tmpval = real_cr4_to_pv_guest_cr4(mmu_cr4_features);
+
+    if ( !paging_mode_hap(v->domain) )
+    {
+        printk(XENLOG_G_INFO "HAP is required for PVH guest.\n");
+        return -EINVAL;
+    }
+    if ( !cpu_has_vmx_pat )
+    {
+        printk(XENLOG_G_INFO "PVH: CPU does not have PAT support\n");
+        return -ENOSYS;
+    }
+    if ( !cpu_has_vmx_msr_bitmap )
+    {
+        printk(XENLOG_G_INFO "PVH: CPU does not have msr bitmap\n");
+        return -ENOSYS;
+    }
+    if ( !cpu_has_vmx_vpid )
+    {
+        printk(XENLOG_G_INFO "PVH: CPU doesn't have VPID support\n");
+        return -ENOSYS;
+    }
+    if ( !cpu_has_vmx_secondary_exec_control )
+    {
+        printk(XENLOG_G_INFO "CPU Secondary exec is required to run PVH\n");
+        return -ENOSYS;
+    }
+
+    if ( v->domain->arch.vtsc )
+    {
+        printk(XENLOG_G_INFO
+               "At present PVH only supports the default timer mode\n");
+        return -ENOSYS;
+    }
+
+    required = X86_CR4_PAE | X86_CR4_VMXE | X86_CR4_OSFXSR;
+    if ( (tmpval & required) != required )
+    {
+        printk(XENLOG_G_INFO "PVH: required CR4 features not available:%lx\n",
+               required);
+        return -ENOSYS;
+    }
+
+    return 0;
+}
+
+static int pvh_construct_vmcs(struct vcpu *v)
+{
+    int rc, msr_type;
+    unsigned long *msr_bitmap;
+    struct domain *d = v->domain;
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    struct ept_data *ept = &p2m->ept;
+    u32 vmexit_ctl = vmx_vmexit_control;
+    u32 vmentry_ctl = vmx_vmentry_control;
+    u64 host_pat, tmpval = -1;
+
+    if ( (rc = pvh_check_requirements(v)) )
+        return rc;
+
+    msr_bitmap = alloc_xenheap_page();
+    if ( msr_bitmap == NULL )
+        return -ENOMEM;
+
+    /* 1. Pin-Based Controls: */
+    __vmwrite(PIN_BASED_VM_EXEC_CONTROL, vmx_pin_based_exec_control);
+
+    v->arch.hvm_vmx.exec_control = vmx_cpu_based_exec_control;
+
+    /* 2. Primary Processor-based controls: */
+    /*
+     * If rdtsc exiting is turned on and it goes thru emulate_privileged_op,
+     * then pv_vcpu.ctrlreg must be added to the pvh struct.
+     */
+    v->arch.hvm_vmx.exec_control &= ~CPU_BASED_RDTSC_EXITING;
+    v->arch.hvm_vmx.exec_control &= ~CPU_BASED_USE_TSC_OFFSETING;
+
+    v->arch.hvm_vmx.exec_control &= ~(CPU_BASED_INVLPG_EXITING |
+                                      CPU_BASED_CR3_LOAD_EXITING |
+                                      CPU_BASED_CR3_STORE_EXITING);
+    v->arch.hvm_vmx.exec_control |= CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
+    v->arch.hvm_vmx.exec_control &= ~CPU_BASED_MONITOR_TRAP_FLAG;
+    v->arch.hvm_vmx.exec_control |= CPU_BASED_ACTIVATE_MSR_BITMAP;
+    v->arch.hvm_vmx.exec_control &= ~CPU_BASED_TPR_SHADOW;
+    v->arch.hvm_vmx.exec_control &= ~CPU_BASED_VIRTUAL_NMI_PENDING;
+
+    __vmwrite(CPU_BASED_VM_EXEC_CONTROL, v->arch.hvm_vmx.exec_control);
+
+    /* 3. Secondary Processor-based controls (Intel SDM: resvd bits are 0): */
+    v->arch.hvm_vmx.secondary_exec_control = SECONDARY_EXEC_ENABLE_EPT;
+    v->arch.hvm_vmx.secondary_exec_control |= SECONDARY_EXEC_ENABLE_VPID;
+    v->arch.hvm_vmx.secondary_exec_control |= SECONDARY_EXEC_PAUSE_LOOP_EXITING;
+
+    __vmwrite(SECONDARY_VM_EXEC_CONTROL,
+              v->arch.hvm_vmx.secondary_exec_control);
+
+    __vmwrite(IO_BITMAP_A, virt_to_maddr((char *)hvm_io_bitmap + 0));
+    __vmwrite(IO_BITMAP_B, virt_to_maddr((char *)hvm_io_bitmap + PAGE_SIZE));
+
+    /* MSR bitmap for intercepts. */
+    memset(msr_bitmap, ~0, PAGE_SIZE);
+    v->arch.hvm_vmx.msr_bitmap = msr_bitmap;
+    __vmwrite(MSR_BITMAP, virt_to_maddr(msr_bitmap));
+
+    msr_type = MSR_TYPE_R | MSR_TYPE_W;
+    /* Disable intercepts for MSRs that have corresponding VMCS fields. */
+    vmx_disable_intercept_for_msr(v, MSR_FS_BASE, msr_type);
+    vmx_disable_intercept_for_msr(v, MSR_GS_BASE, msr_type);
+    vmx_disable_intercept_for_msr(v, MSR_IA32_SYSENTER_CS, msr_type);
+    vmx_disable_intercept_for_msr(v, MSR_IA32_SYSENTER_ESP, msr_type);
+    vmx_disable_intercept_for_msr(v, MSR_IA32_SYSENTER_EIP, msr_type);
+    vmx_disable_intercept_for_msr(v, MSR_SHADOW_GS_BASE, msr_type);
+    vmx_disable_intercept_for_msr(v, MSR_IA32_CR_PAT, msr_type);
+
+    /*
+     * We don't disable intercepts for MSRs: MSR_STAR, MSR_LSTAR, MSR_CSTAR,
+     * and MSR_SYSCALL_MASK because we need to specify save/restore area to
+     * save/restore at every VM exit and entry. Instead, let the intercept
+     * functions save them into vmx_msr_state fields. See comment in
+     * vmx_restore_host_msrs(). See also vmx_restore_guest_msrs().
+     */
+    __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, 0);
+    __vmwrite(VM_EXIT_MSR_LOAD_COUNT, 0);
+    __vmwrite(VM_EXIT_MSR_STORE_COUNT, 0);
+
+    __vmwrite(VM_EXIT_CONTROLS, vmexit_ctl);
+
+    /*
+     * Note: we run with default VM_ENTRY_LOAD_DEBUG_CTLS of 1, which means
+     * upon vmentry, the cpu reads/loads VMCS.DR7 and VMCS.DEBUGCTLS, and not
+     * use the host values. 0 would cause it to not use the VMCS values.
+     */
+    vmentry_ctl &= ~VM_ENTRY_LOAD_GUEST_EFER;
+    vmentry_ctl &= ~VM_ENTRY_SMM;
+    vmentry_ctl &= ~VM_ENTRY_DEACT_DUAL_MONITOR;
+    /* PVH 32bitfixme. */
+    vmentry_ctl |= VM_ENTRY_IA32E_MODE;   /* GUEST_EFER.LME/LMA ignored */
+
+    __vmwrite(VM_ENTRY_CONTROLS, vmentry_ctl);
+
+    vmx_set_common_host_vmcs_fields(v);
+
+    __vmwrite(VM_ENTRY_INTR_INFO, 0);
+    __vmwrite(CR3_TARGET_COUNT, 0);
+    __vmwrite(GUEST_ACTIVITY_STATE, 0);
+
+    /* These are somewhat irrelevant as we load the descriptors directly. */
+    __vmwrite(GUEST_CS_SELECTOR, 0);
+    __vmwrite(GUEST_DS_SELECTOR, 0);
+    __vmwrite(GUEST_SS_SELECTOR, 0);
+    __vmwrite(GUEST_ES_SELECTOR, 0);
+    __vmwrite(GUEST_FS_SELECTOR, 0);
+    __vmwrite(GUEST_GS_SELECTOR, 0);
+
+    __vmwrite(GUEST_CS_BASE, 0);
+    __vmwrite(GUEST_CS_LIMIT, ~0u);
+    /* CS.L == 1, exec, read/write, accessed. PVH 32bitfixme. */
+    __vmwrite(GUEST_CS_AR_BYTES, 0xa09b);
+
+    __vmwrite(GUEST_DS_BASE, 0);
+    __vmwrite(GUEST_DS_LIMIT, ~0u);
+    __vmwrite(GUEST_DS_AR_BYTES, 0xc093); /* read/write, accessed */
+
+    __vmwrite(GUEST_SS_BASE, 0);
+    __vmwrite(GUEST_SS_LIMIT, ~0u);
+    __vmwrite(GUEST_SS_AR_BYTES, 0xc093); /* read/write, accessed */
+
+    __vmwrite(GUEST_ES_BASE, 0);
+    __vmwrite(GUEST_ES_LIMIT, ~0u);
+    __vmwrite(GUEST_ES_AR_BYTES, 0xc093); /* read/write, accessed */
+
+    __vmwrite(GUEST_FS_BASE, 0);
+    __vmwrite(GUEST_FS_LIMIT, ~0u);
+    __vmwrite(GUEST_FS_AR_BYTES, 0xc093); /* read/write, accessed */
+
+    __vmwrite(GUEST_GS_BASE, 0);
+    __vmwrite(GUEST_GS_LIMIT, ~0u);
+    __vmwrite(GUEST_GS_AR_BYTES, 0xc093); /* read/write, accessed */
+
+    __vmwrite(GUEST_GDTR_BASE, 0);
+    __vmwrite(GUEST_GDTR_LIMIT, 0);
+
+    __vmwrite(GUEST_LDTR_BASE, 0);
+    __vmwrite(GUEST_LDTR_LIMIT, 0);
+    __vmwrite(GUEST_LDTR_AR_BYTES, 0x82); /* LDT */
+    __vmwrite(GUEST_LDTR_SELECTOR, 0);
+
+    /* Guest TSS. */
+    __vmwrite(GUEST_TR_BASE, 0);
+    __vmwrite(GUEST_TR_LIMIT, 0xff);
+    __vmwrite(GUEST_TR_AR_BYTES, 0x8b); /* 32-bit TSS (busy) */
+
+    __vmwrite(GUEST_INTERRUPTIBILITY_INFO, 0);
+    __vmwrite(GUEST_DR7, 0);
+    __vmwrite(VMCS_LINK_POINTER, ~0UL);
+
+    __vmwrite(PAGE_FAULT_ERROR_CODE_MASK, 0);
+    __vmwrite(PAGE_FAULT_ERROR_CODE_MATCH, 0);
+
+    v->arch.hvm_vmx.exception_bitmap = HVM_TRAP_MASK | (1U << TRAP_debug) |
+                                       (1U << TRAP_int3) |
+                                       (1U << TRAP_no_device);
+    __vmwrite(EXCEPTION_BITMAP, v->arch.hvm_vmx.exception_bitmap);
+
+    /* Set WP bit so rdonly pages are not written from CPL 0. */
+    tmpval = X86_CR0_PG | X86_CR0_NE | X86_CR0_PE | X86_CR0_WP;
+    __vmwrite(GUEST_CR0, tmpval);
+    __vmwrite(CR0_READ_SHADOW, tmpval);
+    v->arch.hvm_vcpu.hw_cr[0] = v->arch.hvm_vcpu.guest_cr[0] = tmpval;
+
+    tmpval = real_cr4_to_pv_guest_cr4(mmu_cr4_features);
+    __vmwrite(GUEST_CR4, tmpval);
+    __vmwrite(CR4_READ_SHADOW, tmpval);
+    v->arch.hvm_vcpu.guest_cr[4] = tmpval;
+
+    __vmwrite(CR0_GUEST_HOST_MASK, ~0UL);
+    __vmwrite(CR4_GUEST_HOST_MASK, ~0UL);
+
+    v->arch.hvm_vmx.vmx_realmode = 0;
+
+    ept->asr  = pagetable_get_pfn(p2m_get_pagetable(p2m));
+    __vmwrite(EPT_POINTER, ept_get_eptp(ept));
+
+    rdmsrl(MSR_IA32_CR_PAT, host_pat);
+    __vmwrite(HOST_PAT, host_pat);
+    __vmwrite(GUEST_PAT, MSR_IA32_CR_PAT_RESET);
+
+    /* The paging mode is updated for PVH by arch_set_info_guest(). */
+
+    return 0;
+}
+
 static int construct_vmcs(struct vcpu *v)
 {
     struct domain *d = v->domain;
@@ -864,6 +1097,13 @@ static int construct_vmcs(struct vcpu *v)
 
     vmx_vmcs_enter(v);
 
+    if ( is_pvh_vcpu(v) )
+    {
+        int rc = pvh_construct_vmcs(v);
+        vmx_vmcs_exit(v);
+        return rc;
+    }
+
     /* VMCS controls. */
     __vmwrite(PIN_BASED_VM_EXEC_CONTROL, vmx_pin_based_exec_control);
 
@@ -1294,6 +1534,9 @@ void vmx_do_resume(struct vcpu *v)
         hvm_asid_flush_vcpu(v);
     }
 
+    if ( is_pvh_vcpu(v) )
+        reset_stack_and_jump(vmx_asm_do_vmentry);
+
     debug_state = v->domain->debugger_attached
                   || v->domain->arch.hvm_domain.params[HVM_PARAM_MEMORY_EVENT_INT3]
                   || v->domain->arch.hvm_domain.params[HVM_PARAM_MEMORY_EVENT_SINGLE_STEP];
@@ -1477,7 +1720,7 @@ static void vmcs_dump(unsigned char ch)
 
     for_each_domain ( d )
     {
-        if ( !is_hvm_domain(d) )
+        if ( is_pv_domain(d) )
             continue;
         printk("\n>>> Domain %d <<<\n", d->domain_id);
         for_each_vcpu ( d, v )
-- 
1.7.2.3
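
A quick check on the segment attribute constants used above: the VMX
access-rights word packs type/s/dpl/p into bits 0-7 and avl/l/db/g into
bits 12-15 (Intel SDM layout). Decomposing the two interesting values:

    /* 0xa09b: type=0xb (code, exec/read, accessed), s=1, dpl=0, p=1
     *         -> low byte 0x9b; l=1 (bit 13), g=1 (bit 15) -> 0xa0.
     * 0xc093: type=0x3 (data, read/write, accessed), s=1, dpl=0, p=1
     *         -> low byte 0x93; db=1 (bit 14), g=1 (bit 15) -> 0xc0.
     */

So GUEST_CS_AR_BYTES = 0xa09b is exactly a flat 64-bit code segment, and
the 0xc093 data segments are flat 4G read/write -- consistent with the
64-bit-only scope of this phase (hence the adjacent "PVH 32bitfixme").
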
Mukesh Rathor
2013-Jul-20 01:45 UTC
[PATCH 20/23] PVH xen: HVM support of PVH guest creation/destruction
This patch implements the HVM portion of guest creation, i.e. the vcpu and
domain initialization, plus some changes to support the destroy path.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/arch/x86/hvm/hvm.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 65 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 0be98a1..a21391f 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -510,6 +510,30 @@ static int hvm_print_line(
     return X86EMUL_OKAY;
 }
 
+static int pvh_dom_initialise(struct domain *d)
+{
+    int rc;
+
+    if ( !d->arch.hvm_domain.hap_enabled )
+        return -EINVAL;
+
+    spin_lock_init(&d->arch.hvm_domain.irq_lock);
+
+    hvm_init_cacheattr_region_list(d);
+
+    if ( (rc = paging_enable(d, PG_refcounts|PG_translate|PG_external)) != 0 )
+        goto pvh_dominit_fail;
+
+    if ( (rc = hvm_funcs.domain_initialise(d)) != 0 )
+        goto pvh_dominit_fail;
+
+    return 0;
+
+pvh_dominit_fail:
+    hvm_destroy_cacheattr_region_list(d);
+    return rc;
+}
+
 int hvm_domain_initialise(struct domain *d)
 {
     int rc;
@@ -520,6 +544,8 @@ int hvm_domain_initialise(struct domain *d)
                  "on a non-VT/AMDV platform.\n");
         return -EINVAL;
     }
+    if ( is_pvh_domain(d) )
+        return pvh_dom_initialise(d);
 
     spin_lock_init(&d->arch.hvm_domain.pbuf_lock);
     spin_lock_init(&d->arch.hvm_domain.irq_lock);
@@ -584,6 +610,9 @@ int hvm_domain_initialise(struct domain *d)
 
 void hvm_domain_relinquish_resources(struct domain *d)
 {
+    if ( is_pvh_domain(d) )
+        return;
+
     if ( hvm_funcs.nhvm_domain_relinquish_resources )
         hvm_funcs.nhvm_domain_relinquish_resources(d);
 
@@ -609,10 +638,14 @@ void hvm_domain_relinquish_resources(struct domain *d)
 void hvm_domain_destroy(struct domain *d)
 {
     hvm_funcs.domain_destroy(d);
+    hvm_destroy_cacheattr_region_list(d);
+
+    if ( is_pvh_domain(d) )
+        return;
+
     rtc_deinit(d);
     stdvga_deinit(d);
     vioapic_deinit(d);
-    hvm_destroy_cacheattr_region_list(d);
 }
 
 static int hvm_save_tsc_adjust(struct domain *d, hvm_domain_context_t *h)
@@ -1066,6 +1099,30 @@ static int __init __hvm_register_CPU_XSAVE_save_and_restore(void)
 }
 __initcall(__hvm_register_CPU_XSAVE_save_and_restore);
 
+static int pvh_vcpu_initialise(struct vcpu *v)
+{
+    int rc;
+
+    if ( (rc = hvm_funcs.vcpu_initialise(v)) != 0 )
+        return rc;
+
+    softirq_tasklet_init(&v->arch.hvm_vcpu.assert_evtchn_irq_tasklet,
+                         (void(*)(unsigned long))hvm_assert_evtchn_irq,
+                         (unsigned long)v);
+
+    v->arch.hvm_vcpu.hcall_64bit = 1;    /* PVH 32bitfixme. */
+    v->arch.user_regs.eflags = 2;
+    v->arch.hvm_vcpu.inject_trap.vector = -1;
+
+    if ( (rc = hvm_vcpu_cacheattr_init(v)) != 0 )
+    {
+        hvm_funcs.vcpu_destroy(v);
+        return rc;
+    }
+
+    return 0;
+}
+
 int hvm_vcpu_initialise(struct vcpu *v)
 {
     int rc;
@@ -1077,6 +1134,9 @@ int hvm_vcpu_initialise(struct vcpu *v)
     spin_lock_init(&v->arch.hvm_vcpu.tm_lock);
     INIT_LIST_HEAD(&v->arch.hvm_vcpu.tm_list);
 
+    if ( is_pvh_vcpu(v) )
+        return pvh_vcpu_initialise(v);
+
     if ( (rc = vlapic_init(v)) != 0 )
         goto fail1;
 
@@ -1165,7 +1225,10 @@ void hvm_vcpu_destroy(struct vcpu *v)
 
     tasklet_kill(&v->arch.hvm_vcpu.assert_evtchn_irq_tasklet);
     hvm_vcpu_cacheattr_destroy(v);
-    vlapic_destroy(v);
+
+    if ( !is_pvh_vcpu(v) )
+        vlapic_destroy(v);
+
     hvm_funcs.vcpu_destroy(v);
 
     /* Event channel is already freed by evtchn_destroy(). */
-- 
1.7.2.3
Mukesh Rathor
2013-Jul-20 01:45 UTC
[PATCH 21/23] PVH xen: VMX support of PVH guest creation/destruction
This patch implements the VMX portion of guest creation, i.e. the vcpu and
domain initialization, plus some changes to support the destroy path.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/arch/x86/hvm/vmx/vmx.c | 40 ++++++++++++++++++++++++++++++++++++++++
 1 files changed, 40 insertions(+), 0 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 80109c1..7b141d6 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -82,6 +82,9 @@ static int vmx_domain_initialise(struct domain *d)
 {
     int rc;
 
+    if ( is_pvh_domain(d) )
+        return 0;
+
     if ( (rc = vmx_alloc_vlapic_mapping(d)) != 0 )
         return rc;
 
@@ -90,6 +93,9 @@ static int vmx_domain_initialise(struct domain *d)
 
 static void vmx_domain_destroy(struct domain *d)
 {
+    if ( is_pvh_domain(d) )
+        return;
+
     vmx_free_vlapic_mapping(d);
 }
 
@@ -113,6 +119,12 @@ static int vmx_vcpu_initialise(struct vcpu *v)
 
     vpmu_initialise(v);
 
+    if ( is_pvh_vcpu(v) )
+    {
+        /* This is for hvm_long_mode_enabled(v). */
+        v->arch.hvm_vcpu.guest_efer = EFER_SCE | EFER_LMA | EFER_LME;
+        return 0;
+    }
     vmx_install_vlapic_mapping(v);
 
     /* %eax == 1 signals full real-mode support to the guest loader. */
@@ -1076,6 +1088,28 @@ static void vmx_update_host_cr3(struct vcpu *v)
     vmx_vmcs_exit(v);
 }
 
+/*
+ * A PVH guest never causes a CR3 write vmexit. This is called during the
+ * guest setup.
+ */
+static void vmx_update_pvh_cr(struct vcpu *v, unsigned int cr)
+{
+    vmx_vmcs_enter(v);
+    switch ( cr )
+    {
+    case 3:
+        __vmwrite(GUEST_CR3, v->arch.hvm_vcpu.guest_cr[3]);
+        hvm_asid_flush_vcpu(v);
+        break;
+
+    default:
+        printk(XENLOG_ERR
+               "PVH: d%d v%d unexpected cr%d update at rip:%lx\n",
+               v->domain->domain_id, v->vcpu_id, cr, __vmread(GUEST_RIP));
+    }
+    vmx_vmcs_exit(v);
+}
+
 void vmx_update_debug_state(struct vcpu *v)
 {
     unsigned long mask;
@@ -1095,6 +1129,12 @@ void vmx_update_debug_state(struct vcpu *v)
 
 static void vmx_update_guest_cr(struct vcpu *v, unsigned int cr)
 {
+    if ( is_pvh_vcpu(v) )
+    {
+        vmx_update_pvh_cr(v, cr);
+        return;
+    }
+
     vmx_vmcs_enter(v);
 
     switch ( cr )
-- 
1.7.2.3
Mukesh Rathor
2013-Jul-20 01:45 UTC
[PATCH 22/23] PVH xen: preparatory patch for the pvh vmexit handler patch
This is a preparatory patch for the next pvh vmexit handler patch.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/arch/x86/hvm/vmx/pvh.c        | 5 +++++
 xen/arch/x86/hvm/vmx/vmx.c        | 6 ++++++
 xen/arch/x86/traps.c              | 4 ++--
 xen/include/asm-x86/hvm/vmx/vmx.h | 1 +
 xen/include/asm-x86/processor.h   | 2 ++
 xen/include/asm-x86/traps.h       | 2 ++
 6 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/pvh.c b/xen/arch/x86/hvm/vmx/pvh.c
index b37e423..8e61d23 100644
--- a/xen/arch/x86/hvm/vmx/pvh.c
+++ b/xen/arch/x86/hvm/vmx/pvh.c
@@ -20,6 +20,11 @@
 #include <asm/hvm/nestedhvm.h>
 #include <asm/xstate.h>
 
+/* Implemented in the next patch */
+void vmx_pvh_vmexit_handler(struct cpu_user_regs *regs)
+{
+}
+
 /*
  * Set vmcs fields in support of vcpu_op -> VCPUOP_initialise hcall.  Called
  * from arch_set_info_guest() which sets the (PVH relevant) non-vmcs fields.
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 7b141d6..97c3f76 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2502,6 +2502,12 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
     if ( unlikely(exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) )
         return vmx_failed_vmentry(exit_reason, regs);
 
+    if ( is_pvh_vcpu(v) )
+    {
+        vmx_pvh_vmexit_handler(regs);
+        return;
+    }
+
     if ( v->arch.hvm_vmx.vmx_realmode )
     {
         /* Put RFLAGS back the way the guest wants it */
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index fe8b94c..9c82b45 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -745,7 +745,7 @@ int cpuid_hypervisor_leaves( uint32_t idx, uint32_t sub_idx,
     return 1;
 }
 
-static void pv_cpuid(struct cpu_user_regs *regs)
+void pv_cpuid(struct cpu_user_regs *regs)
 {
     uint32_t a, b, c, d;
 
@@ -1904,7 +1904,7 @@ static int is_cpufreq_controller(struct domain *d)
 
 #include "x86_64/mmconfig.h"
 
-static int emulate_privileged_op(struct cpu_user_regs *regs)
+int emulate_privileged_op(struct cpu_user_regs *regs)
 {
     enum x86_segment which_sel;
     struct vcpu *v = current;
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index 9e6c481..44e4136 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -474,6 +474,7 @@ void vmx_dr_access(unsigned long exit_qualification,
                    struct cpu_user_regs *regs);
 void vmx_fpu_enter(struct vcpu *v);
 int vmx_pvh_set_vcpu_info(struct vcpu *v, struct vcpu_guest_context *ctxtp);
+void vmx_pvh_vmexit_handler(struct cpu_user_regs *regs);
 
 int alloc_p2m_hap_data(struct p2m_domain *p2m);
 void free_p2m_hap_data(struct p2m_domain *p2m);
diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index 5cdacc7..22a9653 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -566,6 +566,8 @@ void microcode_set_module(unsigned int);
 int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void), unsigned long len);
 int microcode_resume_cpu(int cpu);
 
+void pv_cpuid(struct cpu_user_regs *regs);
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ASM_X86_PROCESSOR_H */
diff --git a/xen/include/asm-x86/traps.h b/xen/include/asm-x86/traps.h
index 1d9b087..8c3540a 100644
--- a/xen/include/asm-x86/traps.h
+++ b/xen/include/asm-x86/traps.h
@@ -50,4 +50,6 @@ extern int send_guest_trap(struct domain *d, uint16_t vcpuid,
                            unsigned int trap_nr);
 int emulate_forced_invalid_op(struct cpu_user_regs *regs);
 
+int emulate_privileged_op(struct cpu_user_regs *regs);
+
 #endif /* ASM_TRAP_H */
-- 
1.7.2.3
Mukesh Rathor
2013-Jul-20 01:45 UTC
[PATCH 23/23] PVH xen: introduce vmexit handler for PVH
This patch contains vmx exit handler for PVH guest. Note it contains a macro dbgp1 to print vmexit reasons and a lot of other data to go with it. It can be enabled by setting pvhdbg to 1. This can be very useful debugging for the first few months of testing, after which it can be removed at the maintainer''s discretion. Changes in V2: - Move non VMX generic code to arch/x86/hvm/pvh.c - Remove get_gpr_ptr() and use existing decode_register() instead. - Defer call to pvh vmx exit handler until interrupts are enabled. So the caller vmx_pvh_vmexit_handler() handles the NMI/EXT-INT/TRIPLE_FAULT now. - Fix the CPUID (wrongly) clearing bit 24. No need to do this now, set the correct feature bits in CR4 during vmcs creation. - Fix few hard tabs. Changes in V3: - Lot of cleanup and rework in PVH vm exit handler. - add parameter to emulate_forced_invalid_op(). Changes in V5: - Move pvh.c and emulate_forced_invalid_op related changes to another patch. - Formatting. - Remove vmx_pvh_read_descriptor(). - Use SS DPL instead of CS.RPL for CPL. - Remove pvh_user_cpuid() and call pv_cpuid for user mode also. Changes in V6: - Replace domain_crash_synchronous() with domain_crash(). Changes in V7: - Don''t read all selectors on every vmexit. Do that only for the IO instruction vmexit. - Add couple checks and set guest_cr[4] in access_cr4(). - Add period after all comments in case that''s an issue. - Move making pv_cpuid and emulate_privileged_op public here. Changes in V8: - Mainly, don''t read selectors on vmexit. The macros now come to VMCS to read selectors on demand. Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> --- xen/arch/x86/hvm/vmx/pvh.c | 451 +++++++++++++++++++++++++++++++++++++++++++- 1 files changed, 450 insertions(+), 1 deletions(-) diff --git a/xen/arch/x86/hvm/vmx/pvh.c b/xen/arch/x86/hvm/vmx/pvh.c index 8e61d23..ba11967 100644 --- a/xen/arch/x86/hvm/vmx/pvh.c +++ b/xen/arch/x86/hvm/vmx/pvh.c @@ -20,9 +20,458 @@ #include <asm/hvm/nestedhvm.h> #include <asm/xstate.h> -/* Implemented in the next patch */ +#ifndef NDEBUG +static int pvhdbg = 0; +#define dbgp1(...) do { (pvhdbg == 1) ? printk(__VA_ARGS__) : 0; } while ( 0 ) +#else +#define dbgp1(...) ((void)0) +#endif + +/* Returns : 0 == msr read successfully. */ +static int vmxit_msr_read(struct cpu_user_regs *regs) +{ + u64 msr_content = 0; + + switch ( regs->ecx ) + { + case MSR_IA32_MISC_ENABLE: + rdmsrl(MSR_IA32_MISC_ENABLE, msr_content); + msr_content |= MSR_IA32_MISC_ENABLE_BTS_UNAVAIL | + MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL; + break; + + default: + /* PVH fixme: see hvm_msr_read_intercept(). */ + rdmsrl(regs->ecx, msr_content); + break; + } + regs->eax = (uint32_t)msr_content; + regs->edx = (uint32_t)(msr_content >> 32); + vmx_update_guest_eip(); + + dbgp1("msr read c:%lx a:%lx d:%lx RIP:%lx RSP:%lx\n", regs->ecx, regs->eax, + regs->edx, regs->rip, regs->rsp); + + return 0; +} + +/* Returns : 0 == msr written successfully. */ +static int vmxit_msr_write(struct cpu_user_regs *regs) +{ + uint64_t msr_content = (uint32_t)regs->eax | ((uint64_t)regs->edx << 32); + + dbgp1("PVH: msr write:0x%lx. eax:0x%lx edx:0x%lx\n", regs->ecx, + regs->eax, regs->edx); + + if ( hvm_msr_write_intercept(regs->ecx, msr_content) == X86EMUL_OKAY ) + { + vmx_update_guest_eip(); + return 0; + } + return 1; +} + +static int vmxit_debug(struct cpu_user_regs *regs) +{ + struct vcpu *vp = current; + unsigned long exit_qualification = __vmread(EXIT_QUALIFICATION); + + write_debugreg(6, exit_qualification | 0xffff0ff0); + + /* gdbsx or another debugger. 
Never pause dom0. */
+    if ( vp->domain->domain_id != 0 && vp->domain->debugger_attached )
+        domain_pause_for_debugger();
+    else
+        hvm_inject_hw_exception(TRAP_debug, HVM_DELIVER_NO_ERROR_CODE);
+
+    return 0;
+}
+
+/* Returns: rc == 0: handled the MTF vmexit. */
+static int vmxit_mtf(struct cpu_user_regs *regs)
+{
+    struct vcpu *vp = current;
+    int rc = -EINVAL, ss = vp->arch.hvm_vcpu.single_step;
+
+    vp->arch.hvm_vmx.exec_control &= ~CPU_BASED_MONITOR_TRAP_FLAG;
+    __vmwrite(CPU_BASED_VM_EXEC_CONTROL, vp->arch.hvm_vmx.exec_control);
+    vp->arch.hvm_vcpu.single_step = 0;
+
+    if ( vp->domain->debugger_attached && ss )
+    {
+        domain_pause_for_debugger();
+        rc = 0;
+    }
+    return rc;
+}
+
+static int vmxit_int3(struct cpu_user_regs *regs)
+{
+    int ilen = vmx_get_instruction_length();
+    struct vcpu *vp = current;
+    struct hvm_trap trap_info = {
+        .vector = TRAP_int3,
+        .type = X86_EVENTTYPE_SW_EXCEPTION,
+        .error_code = HVM_DELIVER_NO_ERROR_CODE,
+        .insn_len = ilen
+    };
+
+    /* gdbsx or another debugger. Never pause dom0. */
+    if ( vp->domain->domain_id != 0 && vp->domain->debugger_attached )
+    {
+        regs->eip += ilen;
+        dbgp1("[%d]PVH: domain pause for debugger\n", smp_processor_id());
+        current->arch.gdbsx_vcpu_event = TRAP_int3;
+        domain_pause_for_debugger();
+        return 0;
+    }
+    hvm_inject_trap(&trap_info);
+
+    return 0;
+}
+
+/* Just like HVM, PVH should be using "cpuid" from the kernel mode. */
+static int vmxit_invalid_op(struct cpu_user_regs *regs)
+{
+    if ( guest_kernel_mode(current, regs) || !emulate_forced_invalid_op(regs) )
+        hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
+
+    return 0;
+}
+
+/* Returns: rc == 0: handled the exception. */
+static int vmxit_exception(struct cpu_user_regs *regs)
+{
+    int vector = (__vmread(VM_EXIT_INTR_INFO)) & INTR_INFO_VECTOR_MASK;
+    int rc = -ENOSYS;
+
+    dbgp1(" EXCPT: vec:%d cs:%lx r.IP:%lx\n", vector,
+          __vmread(GUEST_CS_SELECTOR), regs->eip);
+
+    switch ( vector )
+    {
+    case TRAP_debug:
+        rc = vmxit_debug(regs);
+        break;
+
+    case TRAP_int3:
+        rc = vmxit_int3(regs);
+        break;
+
+    case TRAP_invalid_op:
+        rc = vmxit_invalid_op(regs);
+        break;
+
+    case TRAP_no_device:
+        hvm_funcs.fpu_dirty_intercept();
+        rc = 0;
+        break;
+
+    default:
+        printk(XENLOG_G_WARNING
+               "PVH: Unhandled trap:%d. IP:%lx\n", vector, regs->eip);
+    }
+    return rc;
+}
+
+static int vmxit_vmcall(struct cpu_user_regs *regs)
+{
+    if ( hvm_do_hypercall(regs) != HVM_HCALL_preempted )
+        vmx_update_guest_eip();
+    return 0;
+}
+
+/* Returns: rc == 0: success. */
+static int access_cr0(struct cpu_user_regs *regs, uint acc_typ, uint64_t *regp)
+{
+    struct vcpu *vp = current;
+
+    if ( acc_typ == VMX_CONTROL_REG_ACCESS_TYPE_MOV_TO_CR )
+    {
+        unsigned long new_cr0 = *regp;
+        unsigned long old_cr0 = __vmread(GUEST_CR0);
+
+        dbgp1("PVH:writing to CR0. RIP:%lx val:0x%lx\n", regs->rip, *regp);
+        if ( (u32)new_cr0 != new_cr0 )
+        {
+            printk(XENLOG_G_WARNING
+                   "Guest setting upper 32 bits in CR0: %lx", new_cr0);
+            return -EPERM;
+        }
+
+        new_cr0 &= ~HVM_CR0_GUEST_RESERVED_BITS;
+        /* ET is reserved and should always be 1. */
+        new_cr0 |= X86_CR0_ET;
+
+        /* A PVH guest is not expected to change to real mode. */
+        if ( (new_cr0 & (X86_CR0_PE | X86_CR0_PG)) !=
+             (X86_CR0_PG | X86_CR0_PE) )
+        {
+            printk(XENLOG_G_WARNING
+                   "PVH attempting to turn off PE/PG. CR0:%lx\n", new_cr0);
+            return -EPERM;
+        }
+        /* TS going from 1 to 0. */
+        if ( (old_cr0 & X86_CR0_TS) && ((new_cr0 & X86_CR0_TS) == 0) )
+            vmx_fpu_enter(vp);
+
+        vp->arch.hvm_vcpu.hw_cr[0] = vp->arch.hvm_vcpu.guest_cr[0] = new_cr0;
+        __vmwrite(GUEST_CR0, new_cr0);
+        __vmwrite(CR0_READ_SHADOW, new_cr0);
+    }
+    else
+        *regp = __vmread(GUEST_CR0);
+
+    return 0;
+}
+
+/* Returns: rc == 0: success. */
+static int access_cr4(struct cpu_user_regs *regs, uint acc_typ, uint64_t *regp)
+{
+    if ( acc_typ == VMX_CONTROL_REG_ACCESS_TYPE_MOV_TO_CR )
+    {
+        struct vcpu *vp = current;
+        u64 old_val = __vmread(GUEST_CR4);
+        u64 new = *regp;
+
+        if ( new & HVM_CR4_GUEST_RESERVED_BITS(vp) )
+        {
+            printk(XENLOG_G_WARNING
+                   "PVH guest attempts to set reserved bit in CR4: %lx", new);
+            hvm_inject_hw_exception(TRAP_gp_fault, 0);
+            return 0;
+        }
+
+        if ( !(new & X86_CR4_PAE) && hvm_long_mode_enabled(vp) )
+        {
+            printk(XENLOG_G_WARNING "Guest cleared CR4.PAE while "
+                   "EFER.LMA is set");
+            hvm_inject_hw_exception(TRAP_gp_fault, 0);
+            return 0;
+        }
+
+        vp->arch.hvm_vcpu.guest_cr[4] = new;
+
+        if ( (old_val ^ new) & (X86_CR4_PSE | X86_CR4_PGE | X86_CR4_PAE) )
+            vpid_sync_all();
+
+        __vmwrite(CR4_READ_SHADOW, new);
+
+        new &= ~X86_CR4_PAE;          /* PVH always runs with hap enabled. */
+        new |= X86_CR4_VMXE | X86_CR4_MCE;
+        __vmwrite(GUEST_CR4, new);
+    }
+    else
+        *regp = __vmread(CR4_READ_SHADOW);
+
+    return 0;
+}
+
+/* Returns: rc == 0: success, else -errno. */
+static int vmxit_cr_access(struct cpu_user_regs *regs)
+{
+    unsigned long exit_qualification = __vmread(EXIT_QUALIFICATION);
+    uint acc_typ = VMX_CONTROL_REG_ACCESS_TYPE(exit_qualification);
+    int cr, rc = -EINVAL;
+
+    switch ( acc_typ )
+    {
+    case VMX_CONTROL_REG_ACCESS_TYPE_MOV_TO_CR:
+    case VMX_CONTROL_REG_ACCESS_TYPE_MOV_FROM_CR:
+    {
+        uint gpr = VMX_CONTROL_REG_ACCESS_GPR(exit_qualification);
+        uint64_t *regp = decode_register(gpr, regs, 0);
+        cr = VMX_CONTROL_REG_ACCESS_NUM(exit_qualification);
+
+        if ( regp == NULL )
+            break;
+
+        switch ( cr )
+        {
+        case 0:
+            rc = access_cr0(regs, acc_typ, regp);
+            break;
+
+        case 3:
+            printk(XENLOG_G_ERR "PVH: unexpected cr3 vmexit. rip:%lx\n",
+                   regs->rip);
+            domain_crash(current->domain);
+            break;
+
+        case 4:
+            rc = access_cr4(regs, acc_typ, regp);
+            break;
+        }
+        if ( rc == 0 )
+            vmx_update_guest_eip();
+        break;
+    }
+
+    case VMX_CONTROL_REG_ACCESS_TYPE_CLTS:
+    {
+        struct vcpu *vp = current;
+        unsigned long cr0 = vp->arch.hvm_vcpu.guest_cr[0] & ~X86_CR0_TS;
+        vp->arch.hvm_vcpu.hw_cr[0] = vp->arch.hvm_vcpu.guest_cr[0] = cr0;
+
+        vmx_fpu_enter(vp);
+        __vmwrite(GUEST_CR0, cr0);
+        __vmwrite(CR0_READ_SHADOW, cr0);
+        vmx_update_guest_eip();
+        rc = 0;
+    }
+    }
+    return rc;
+}
+
+/*
+ * Note: A PVH guest sets IOPL natively by setting bits in the eflags, and
+ * not via the hypercalls used by a PV guest.
+ */
+static int vmxit_io_instr(struct cpu_user_regs *regs)
+{
+    struct segment_register seg;
+    int requested = (regs->rflags & X86_EFLAGS_IOPL) >> 12;
+    int curr_lvl = (regs->rflags & X86_EFLAGS_VM) ? 3 : 0;
+
+    if ( curr_lvl == 0 )
+    {
+        hvm_get_segment_register(current, x86_seg_ss, &seg);
+        curr_lvl = seg.attr.fields.dpl;
+    }
+    if ( requested >= curr_lvl && emulate_privileged_op(regs) )
+        return 0;
+
+    hvm_inject_hw_exception(TRAP_gp_fault, regs->error_code);
+    return 0;
+}
+
+static int pvh_ept_handle_violation(unsigned long qualification,
+                                    paddr_t gpa, struct cpu_user_regs *regs)
+{
+    unsigned long gla, gfn = gpa >> PAGE_SHIFT;
+    p2m_type_t p2mt;
+    mfn_t mfn = get_gfn_query_unlocked(current->domain, gfn, &p2mt);
+
+    printk(XENLOG_G_ERR "EPT violation %#lx (%c%c%c/%c%c%c), "
+           "gpa %#"PRIpaddr", mfn %#lx, type %i. IP:0x%lx RSP:0x%lx\n",
+           qualification,
+           (qualification & EPT_READ_VIOLATION) ? 'r' : '-',
+           (qualification & EPT_WRITE_VIOLATION) ? 'w' : '-',
+           (qualification & EPT_EXEC_VIOLATION) ? 'x' : '-',
+           (qualification & EPT_EFFECTIVE_READ) ? 'r' : '-',
+           (qualification & EPT_EFFECTIVE_WRITE) ? 'w' : '-',
+           (qualification & EPT_EFFECTIVE_EXEC) ? 'x' : '-',
+           gpa, mfn_x(mfn), p2mt, regs->rip, regs->rsp);
+
+    ept_walk_table(current->domain, gfn);
+
+    if ( qualification & EPT_GLA_VALID )
+    {
+        gla = __vmread(GUEST_LINEAR_ADDRESS);
+        printk(XENLOG_G_ERR " --- GLA %#lx\n", gla);
+    }
+    hvm_inject_hw_exception(TRAP_gp_fault, 0);
+    return 0;
+}
+
+/*
+ * Main vm exit handler for PVH. Called from vmx_vmexit_handler().
+ * Note: vmx_asm_vmexit_handler updates rip/rsp/eflags in the regs{} struct.
+ */
 void vmx_pvh_vmexit_handler(struct cpu_user_regs *regs)
 {
+    unsigned long exit_qualification;
+    unsigned int exit_reason = __vmread(VM_EXIT_REASON);
+    int rc = 0, ccpu = smp_processor_id();
+    struct vcpu *v = current;
+
+    dbgp1("PVH:[%d]left VMCS exitreas:%d RIP:%lx RSP:%lx EFLAGS:%lx CR0:%lx\n",
+          ccpu, exit_reason, regs->rip, regs->rsp, regs->rflags,
+          __vmread(GUEST_CR0));
+
+    switch ( (uint16_t)exit_reason )
+    {
+    /* NMI and machine check are handled by the caller; we handle the rest. */
+    case EXIT_REASON_EXCEPTION_NMI:      /* 0 */
+        rc = vmxit_exception(regs);
+        break;
+
+    case EXIT_REASON_EXTERNAL_INTERRUPT: /* 1 */
+        break;                           /* handled in vmx_vmexit_handler() */
+
+    case EXIT_REASON_PENDING_VIRT_INTR:  /* 7 */
+        /* Disable the interrupt window. */
+        v->arch.hvm_vmx.exec_control &= ~CPU_BASED_VIRTUAL_INTR_PENDING;
+        __vmwrite(CPU_BASED_VM_EXEC_CONTROL, v->arch.hvm_vmx.exec_control);
+        break;
+
+    case EXIT_REASON_CPUID:              /* 10 */
+        pv_cpuid(regs);
+        vmx_update_guest_eip();
+        break;
+
+    case EXIT_REASON_HLT:                /* 12 */
+        vmx_update_guest_eip();
+        hvm_hlt(regs->eflags);
+        break;
+
+    case EXIT_REASON_VMCALL:             /* 18 */
+        rc = vmxit_vmcall(regs);
+        break;
+
+    case EXIT_REASON_CR_ACCESS:          /* 28 */
+        rc = vmxit_cr_access(regs);
+        break;
+
+    case EXIT_REASON_DR_ACCESS:          /* 29 */
+        exit_qualification = __vmread(EXIT_QUALIFICATION);
+        vmx_dr_access(exit_qualification, regs);
+        break;
+
+    case EXIT_REASON_IO_INSTRUCTION:     /* 30 */
+        vmxit_io_instr(regs);
+        break;
+
+    case EXIT_REASON_MSR_READ:           /* 31 */
+        rc = vmxit_msr_read(regs);
+        break;
+
+    case EXIT_REASON_MSR_WRITE:          /* 32 */
+        rc = vmxit_msr_write(regs);
+        break;
+
+    case EXIT_REASON_MONITOR_TRAP_FLAG:  /* 37 */
+        rc = vmxit_mtf(regs);
+        break;
+
+    case EXIT_REASON_MCE_DURING_VMENTRY: /* 41 */
+        break;                           /* handled in vmx_vmexit_handler() */
+
+    case EXIT_REASON_EPT_VIOLATION:      /* 48 */
+    {
+        paddr_t gpa = __vmread(GUEST_PHYSICAL_ADDRESS);
+        exit_qualification = __vmread(EXIT_QUALIFICATION);
+        rc = pvh_ept_handle_violation(exit_qualification, gpa, regs);
+        break;
+    }
+
+    default:
+        rc = 1;
+        printk(XENLOG_G_ERR
+               "PVH: Unexpected exit reason:%#x\n", exit_reason);
+    }
+
+    if ( rc )
+    {
+        exit_qualification = __vmread(EXIT_QUALIFICATION);
+        printk(XENLOG_G_WARNING
+               "PVH: [%d] exit_reas:%d %#x qual:%ld 0x%lx cr0:0x%016lx\n",
+               ccpu, exit_reason, exit_reason, exit_qualification,
+               exit_qualification, __vmread(GUEST_CR0));
+        printk(XENLOG_G_WARNING "PVH: RIP:%lx RSP:%lx EFLAGS:%lx CR3:%lx\n",
+               regs->rip, regs->rsp, regs->rflags, __vmread(GUEST_CR3));
+        domain_crash(v->domain);
+    }
 }

 /*
--
1.7.2.3
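For context, the hand-off from the common VMX exit path to this handler might look like the minimal sketch below; the exact placement inside vmx_vmexit_handler() is an assumption here, not taken from the patch, which only states that NMI and machine-check exits are handled by the caller:

    /* Hypothetical call site; placement and surrounding code assumed. */
    void vmx_vmexit_handler(struct cpu_user_regs *regs)
    {
        /* ... NMI and machine-check exits handled here for all guests ... */

        if ( is_pvh_vcpu(current) )
        {
            vmx_pvh_vmexit_handler(regs);   /* all other PVH exit reasons */
            return;
        }

        /* ... existing HVM exit dispatch continues below ... */
    }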
Egger, Christoph
2013-Jul-22 14:25 UTC
Re: [PATCH 21/23] PVH xen: VMX support of PVH guest creation/destruction
On 20.07.13 03:45, Mukesh Rathor wrote:
> This patch implements the vmx portion of guest creation, i.e., vcpu and
> domain initialization. Some changes to support the destroy path.
>
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> ---
>  xen/arch/x86/hvm/vmx/vmx.c |   40 ++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 40 insertions(+), 0 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 80109c1..7b141d6 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -82,6 +82,9 @@ static int vmx_domain_initialise(struct domain *d)
>  {
>      int rc;
>
> +    if ( is_pvh_domain(d) )
> +        return 0;
> +

This is not VMX specific. You can move this one call level up.

>      if ( (rc = vmx_alloc_vlapic_mapping(d)) != 0 )
>          return rc;
>
> @@ -90,6 +93,9 @@ static int vmx_domain_initialise(struct domain *d)
>
>  static void vmx_domain_destroy(struct domain *d)
>  {
> +    if ( is_pvh_domain(d) )
> +        return;
> +

Ditto.

>      vmx_free_vlapic_mapping(d);
>  }
>
> @@ -113,6 +119,12 @@ static int vmx_vcpu_initialise(struct vcpu *v)
>
>      vpmu_initialise(v);
>
> +    if ( is_pvh_vcpu(v) )
> +    {
> +        /* This is for hvm_long_mode_enabled(v). */
> +        v->arch.hvm_vcpu.guest_efer = EFER_SCE | EFER_LMA | EFER_LME;
> +        return 0;
> +    }

Ditto. The changes below are VMX specific and can stay.

Christoph

>      vmx_install_vlapic_mapping(v);
>
>      /* %eax == 1 signals full real-mode support to the guest loader. */
> @@ -1076,6 +1088,28 @@ static void vmx_update_host_cr3(struct vcpu *v)
>      vmx_vmcs_exit(v);
>  }
>
> +/*
> + * A PVH guest never causes a CR3 write vmexit. This is called during
> + * guest setup.
> + */
> +static void vmx_update_pvh_cr(struct vcpu *v, unsigned int cr)
> +{
> +    vmx_vmcs_enter(v);
> +    switch ( cr )
> +    {
> +    case 3:
> +        __vmwrite(GUEST_CR3, v->arch.hvm_vcpu.guest_cr[3]);
> +        hvm_asid_flush_vcpu(v);
> +        break;
> +
> +    default:
> +        printk(XENLOG_ERR
> +               "PVH: d%d v%d unexpected cr%d update at rip:%lx\n",
> +               v->domain->domain_id, v->vcpu_id, cr, __vmread(GUEST_RIP));
> +    }
> +    vmx_vmcs_exit(v);
> +}
> +
>  void vmx_update_debug_state(struct vcpu *v)
>  {
>      unsigned long mask;
> @@ -1095,6 +1129,12 @@ void vmx_update_debug_state(struct vcpu *v)
>
>  static void vmx_update_guest_cr(struct vcpu *v, unsigned int cr)
>  {
> +    if ( is_pvh_vcpu(v) )
> +    {
> +        vmx_update_pvh_cr(v, cr);
> +        return;
> +    }
> +
>      vmx_vmcs_enter(v);
>
>      switch ( cr )
>
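To make the suggestion above concrete: the PVH early-returns could live in the generic HVM layer instead of in each vendor backend. A sketch, assuming hvm_domain_initialise() is the call site one level up (the surrounding code shown here is an assumption, not part of the patch):

    /* Sketch: hoisting the PVH check out of the VMX backend. */
    int hvm_domain_initialise(struct domain *d)
    {
        /* ... common HVM setup ... */

        if ( is_pvh_domain(d) )
            return 0;    /* skip vendor-specific (e.g. vlapic) setup for PVH */

        return hvm_funcs.domain_initialise(d);  /* vmx_domain_initialise() */
    }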
George Dunlap
2013-Aug-05 15:55 UTC
Re: [PATCH 02/23] PVH xen: turn gdb_frames/gdt_ents into union.
On Sat, Jul 20, 2013 at 2:44 AM, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> Changes in V2:
>    - Add __XEN_INTERFACE_VERSION__
>
> Changes in V3:
>    - Rename union to 'gdt' and rename field names.
>
> Change in V9:
>    - Update __XEN_LATEST_INTERFACE_VERSION__ to 0x00040400 for compat.
>
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

One thing that's missing here is a description of *why* this change is
being made. Seeing that it's introducing a union isn't too difficult;
harder is figuring out why that's necessary.

Presumably this is to more closely reflect how an HVM guest's GDT is
stored -- i.e., in the guest's memory and checked by the hardware on
use, rather than in Xen's memory and checked by Xen on assignment?

 -George
Mukesh Rathor
2013-Aug-05 21:51 UTC
Re: [PATCH 02/23] PVH xen: turn gdb_frames/gdt_ents into union.
On Mon, 5 Aug 2013 16:55:22 +0100
George Dunlap <George.Dunlap@eu.citrix.com> wrote:

> On Sat, Jul 20, 2013 at 2:44 AM, Mukesh Rathor
> <mukesh.rathor@oracle.com> wrote:
> > Changes in V2:
> >    - Add __XEN_INTERFACE_VERSION__
> >
> > Changes in V3:
> >    - Rename union to 'gdt' and rename field names.
> >
> > Change in V9:
> >    - Update __XEN_LATEST_INTERFACE_VERSION__ to 0x00040400 for
> > compat.
> >
> > Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> > Reviewed-by: Jan Beulich <jbeulich@suse.com>
>
> One thing that's missing here is a description of *why* this change is
> being made. Seeing that it's introducing a union isn't too difficult;
> harder is figuring out why that's necessary.
>
> Presumably this is to more closely reflect how an HVM guest's GDT is
> stored -- i.e., in the guest's memory and checked by the hardware on
> use, rather than in Xen's memory and checked by Xen on assignment?

Right. Unlike PV, which passes its gdt pages to xen to be installed, a
PVH guest passes only the GDT base and size, as it manages its own GDT.
I'll add more to the comments.

thanks
mukesh
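The shape of the change under discussion is roughly the union below. The quoted changelog only says the union is named 'gdt' and that fields were renamed; the field names and sizes here are assumptions for illustration, not taken from the patch:

    /* Sketch of the 'gdt' union in struct vcpu_guest_context. */
    union {
        struct {
            /* PV: guest hands Xen the GDT frames to install. */
            unsigned long gdt_frames[16], gdt_ents;
        } pv;
        struct {
            /* PVH: guest manages its own GDT; only base/size are passed. */
            uint64_t gdtaddr;    /* GDT base (virtual address), assumed name */
            uint16_t gdtsz;      /* GDT size, assumed name */
        } pvh;
    } gdt;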
Ian Campbell
2013-Aug-06 07:36 UTC
Re: [PATCH 02/23] PVH xen: turn gdb_frames/gdt_ents into union.
On Mon, 2013-08-05 at 14:51 -0700, Mukesh Rathor wrote:
> On Mon, 5 Aug 2013 16:55:22 +0100
> George Dunlap <George.Dunlap@eu.citrix.com> wrote:
>
> > On Sat, Jul 20, 2013 at 2:44 AM, Mukesh Rathor
> > <mukesh.rathor@oracle.com> wrote:
> > > Changes in V2:
> > >    - Add __XEN_INTERFACE_VERSION__
> > >
> > > Changes in V3:
> > >    - Rename union to 'gdt' and rename field names.
> > >
> > > Change in V9:
> > >    - Update __XEN_LATEST_INTERFACE_VERSION__ to 0x00040400 for
> > > compat.
> > >
> > > Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> > > Reviewed-by: Jan Beulich <jbeulich@suse.com>
> >
> > One thing that's missing here is a description of *why* this change is
> > being made. Seeing that it's introducing a union isn't too difficult;
> > harder is figuring out why that's necessary.
> >
> > Presumably this is to more closely reflect how an HVM guest's GDT is
> > stored -- i.e., in the guest's memory and checked by the hardware on
> > use, rather than in Xen's memory and checked by Xen on assignment?
>
> Right. Unlike PV, which passes its gdt pages to xen to be installed, a
> PVH guest passes only the GDT base and size, as it manages its own GDT.
> I'll add more to the comments.

If the guest is managing its own gdt, can't it also use lgdt instructions
etc. and expect the hypervisor to pull the GDT info out of the VMCS when
it needs it?

Presumably there is a reason why this doesn't work, perhaps because the
gdt is specified by virtual address? (This rings a bell from a past
conversation.) It would be good to elaborate in the commit log.

Ian.
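Pulling the GDT info out of the VMCS, as suggested, would amount to a couple of vmreads; a minimal sketch using the standard GUEST_GDTR_BASE/GUEST_GDTR_LIMIT fields (the helper name is made up for illustration):

    /* Sketch: fetch the guest's current GDTR back from its VMCS. */
    static void pvh_read_gdtr(struct vcpu *v, uint64_t *base, uint32_t *limit)
    {
        vmx_vmcs_enter(v);                   /* load v's VMCS on this pcpu */
        *base  = __vmread(GUEST_GDTR_BASE);
        *limit = __vmread(GUEST_GDTR_LIMIT);
        vmx_vmcs_exit(v);
    }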
George Dunlap
2013-Aug-06 13:50 UTC
Re: [PATCH 02/23] PVH xen: turn gdb_frames/gdt_ents into union.
On Tue, Aug 6, 2013 at 8:36 AM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Mon, 2013-08-05 at 14:51 -0700, Mukesh Rathor wrote:
>> On Mon, 5 Aug 2013 16:55:22 +0100
>> George Dunlap <George.Dunlap@eu.citrix.com> wrote:
>>
>> > On Sat, Jul 20, 2013 at 2:44 AM, Mukesh Rathor
>> > <mukesh.rathor@oracle.com> wrote:
>> > > Changes in V2:
>> > >    - Add __XEN_INTERFACE_VERSION__
>> > >
>> > > Changes in V3:
>> > >    - Rename union to 'gdt' and rename field names.
>> > >
>> > > Change in V9:
>> > >    - Update __XEN_LATEST_INTERFACE_VERSION__ to 0x00040400 for
>> > > compat.
>> > >
>> > > Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
>> > > Reviewed-by: Jan Beulich <jbeulich@suse.com>
>> >
>> > One thing that's missing here is a description of *why* this change is
>> > being made. Seeing that it's introducing a union isn't too difficult;
>> > harder is figuring out why that's necessary.
>> >
>> > Presumably this is to more closely reflect how an HVM guest's GDT is
>> > stored -- i.e., in the guest's memory and checked by the hardware on
>> > use, rather than in Xen's memory and checked by Xen on assignment?
>>
>> Right. Unlike PV, which passes its gdt pages to xen to be installed, a
>> PVH guest passes only the GDT base and size, as it manages its own GDT.
>> I'll add more to the comments.
>
> If the guest is managing its own gdt, can't it also use lgdt instructions
> etc. and expect the hypervisor to pull the GDT info out of the VMCS when
> it needs it?

If the toolstack wants to read the state, maybe?

 -George
Ian Campbell
2013-Aug-06 13:57 UTC
Re: [PATCH 02/23] PVH xen: turn gdb_frames/gdt_ents into union.
On Tue, 2013-08-06 at 14:50 +0100, George Dunlap wrote:
> On Tue, Aug 6, 2013 at 8:36 AM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > On Mon, 2013-08-05 at 14:51 -0700, Mukesh Rathor wrote:
> >> On Mon, 5 Aug 2013 16:55:22 +0100
> >> George Dunlap <George.Dunlap@eu.citrix.com> wrote:
> >>
> >> > On Sat, Jul 20, 2013 at 2:44 AM, Mukesh Rathor
> >> > <mukesh.rathor@oracle.com> wrote:
> >> > > Changes in V2:
> >> > >    - Add __XEN_INTERFACE_VERSION__
> >> > >
> >> > > Changes in V3:
> >> > >    - Rename union to 'gdt' and rename field names.
> >> > >
> >> > > Change in V9:
> >> > >    - Update __XEN_LATEST_INTERFACE_VERSION__ to 0x00040400 for
> >> > > compat.
> >> > >
> >> > > Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> >> > > Reviewed-by: Jan Beulich <jbeulich@suse.com>
> >> >
> >> > One thing that's missing here is a description of *why* this change is
> >> > being made. Seeing that it's introducing a union isn't too difficult;
> >> > harder is figuring out why that's necessary.
> >> >
> >> > Presumably this is to more closely reflect how an HVM guest's GDT is
> >> > stored -- i.e., in the guest's memory and checked by the hardware on
> >> > use, rather than in Xen's memory and checked by Xen on assignment?
> >>
> >> Right. Unlike PV, which passes its gdt pages to xen to be installed, a
> >> PVH guest passes only the GDT base and size, as it manages its own GDT.
> >> I'll add more to the comments.
> >
> > If the guest is managing its own gdt, can't it also use lgdt instructions
> > etc. and expect the hypervisor to pull the GDT info out of the VMCS when
> > it needs it?
>
> If the toolstack wants to read the state, maybe?

"when it needs to" would include when the toolstack uses whichever
domctl queries this stuff (if there even is one).

Ian.
Mukesh Rathor
2013-Aug-06 19:09 UTC
Re: [PATCH 02/23] PVH xen: turn gdb_frames/gdt_ents into union.
On Tue, 6 Aug 2013 08:36:52 +0100
Ian Campbell <Ian.Campbell@citrix.com> wrote:

> On Mon, 2013-08-05 at 14:51 -0700, Mukesh Rathor wrote:
> > On Mon, 5 Aug 2013 16:55:22 +0100
> > George Dunlap <George.Dunlap@eu.citrix.com> wrote:
> >
> > > On Sat, Jul 20, 2013 at 2:44 AM, Mukesh Rathor
> > > <mukesh.rathor@oracle.com> wrote:
> > > > Changes in V2:
> > > >    - Add __XEN_INTERFACE_VERSION__
> > > >
> > > > Changes in V3:
> > > >    - Rename union to 'gdt' and rename field names.
> > > >
> > > > Change in V9:
> > > >    - Update __XEN_LATEST_INTERFACE_VERSION__ to 0x00040400 for
> > > > compat.
> > > >
> > > > Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> > > > Reviewed-by: Jan Beulich <jbeulich@suse.com>
> > >
> > > One thing that's missing here is a description of *why* this
> > > change is being made. Seeing that it's introducing a union isn't
> > > too difficult; harder is figuring out why that's necessary.
> > >
> > > Presumably this is to more closely reflect how an HVM guest's GDT
> > > is stored -- i.e., in the guest's memory and checked by the
> > > hardware on use, rather than in Xen's memory and checked by Xen
> > > on assignment?
> >
> > Right. Unlike PV, which passes its gdt pages to xen to be
> > installed, a PVH guest passes only the GDT base and size, as it
> > manages its own GDT. I'll add more to the comments.
>
> If the guest is managing its own gdt, can't it also use lgdt
> instructions etc. and expect the hypervisor to pull the GDT info out
> of the VMCS when it needs it?

That's what a PVH guest does during its lifetime, i.e., it uses lgdt.
But like the comment says:

    "The boot vcpu calls this to set some context for the non boot smp vcpu"

..we are doing this to jump-start a secondary vcpu to a very advanced
known state. Once the vcpu is up and running, it just uses lgdt, loads
its own selectors, etc.

thanks
mukesh
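For illustration, the guest-side bring-up that this context feeds looks roughly like the following. This is a sketch only: the helper name is made up, and exactly which vcpu_guest_context fields a PVH guest must fill is defined by the patch series, not here:

    /* Sketch: boot vcpu jump-starts a secondary vcpu with a known context. */
    static int bring_up_secondary(int vcpuid, struct vcpu_guest_context *ctxt)
    {
        int rc;

        /* For PVH, ctxt describes the GDT by base/size (see union above). */
        rc = HYPERVISOR_vcpu_op(VCPUOP_initialise, vcpuid, ctxt);
        if ( rc == 0 )
            rc = HYPERVISOR_vcpu_op(VCPUOP_up, vcpuid, NULL);
        return rc;
    }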
Ian Campbell
2013-Aug-07 08:32 UTC
Re: [PATCH 02/23] PVH xen: turn gdb_frames/gdt_ents into union.
On Tue, 2013-08-06 at 12:09 -0700, Mukesh Rathor wrote:
> On Tue, 6 Aug 2013 08:36:52 +0100
> Ian Campbell <Ian.Campbell@citrix.com> wrote:
>
> > On Mon, 2013-08-05 at 14:51 -0700, Mukesh Rathor wrote:
> > > On Mon, 5 Aug 2013 16:55:22 +0100
> > > George Dunlap <George.Dunlap@eu.citrix.com> wrote:
> > >
> > > > On Sat, Jul 20, 2013 at 2:44 AM, Mukesh Rathor
> > > > <mukesh.rathor@oracle.com> wrote:
> > > > > Changes in V2:
> > > > >    - Add __XEN_INTERFACE_VERSION__
> > > > >
> > > > > Changes in V3:
> > > > >    - Rename union to 'gdt' and rename field names.
> > > > >
> > > > > Change in V9:
> > > > >    - Update __XEN_LATEST_INTERFACE_VERSION__ to 0x00040400 for
> > > > > compat.
> > > > >
> > > > > Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> > > > > Reviewed-by: Jan Beulich <jbeulich@suse.com>
> > > >
> > > > One thing that's missing here is a description of *why* this
> > > > change is being made. Seeing that it's introducing a union isn't
> > > > too difficult; harder is figuring out why that's necessary.
> > > >
> > > > Presumably this is to more closely reflect how an HVM guest's GDT
> > > > is stored -- i.e., in the guest's memory and checked by the
> > > > hardware on use, rather than in Xen's memory and checked by Xen
> > > > on assignment?
> > >
> > > Right. Unlike PV, which passes its gdt pages to xen to be
> > > installed, a PVH guest passes only the GDT base and size, as it
> > > manages its own GDT. I'll add more to the comments.
> >
> > If the guest is managing its own gdt, can't it also use lgdt
> > instructions etc. and expect the hypervisor to pull the GDT info out
> > of the VMCS when it needs it?
>
> That's what a PVH guest does during its lifetime, i.e., it uses lgdt.
> But like the comment says:
>
>     "The boot vcpu calls this to set some context for the non boot smp vcpu"

Oops, sorry, that's what I get for jumping in halfway and not going back
to the root of the thread.

> ..we are doing this to jump-start a secondary vcpu to a very advanced
> known state. Once the vcpu is up and running, it just uses lgdt, loads
> its own selectors, etc.

Sounds reasonable.

Ian