Mukesh Rathor
2013-Jul-18 02:32 UTC
[PATCH 00/24][V8] PVH xen: Phase I, Version 8 patches...
Hi all,

This is V8 of the PVH patches for xen: the xen changes to support boot of a
64-bit PVH domU guest. Built on top of unstable git c/s:
5d0ca62156d734a757656b9bcb6bf17ee76d37b4.

New in V8:
  - Add docs/misc/pvh-readme.txt per Konrad's suggestion.
  - Redo macros guest_kernel_mode and read_segment_register.
  - Reorg and break down HVM+VMX patches into HVM and VMX as suggested.

Patches 3/5/16 have already been "Reviewed-by". This patchset will also be
on a public git tree in less than 24 hours and I'll email the details as
soon as it's done.

Two more patchsets will follow once this is done: 1) tools changes and
2) dom0 changes.

Thanks for all the help,
Mukesh
Mukesh Rathor
2013-Jul-18 02:32 UTC
[PATCH 01/24] PVH xen: Add readme docs/misc/pvh-readme.txt
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 docs/misc/pvh-readme.txt | 40 ++++++++++++++++++++++++++++++++++++++++
 1 files changed, 40 insertions(+), 0 deletions(-)
 create mode 100644 docs/misc/pvh-readme.txt

diff --git a/docs/misc/pvh-readme.txt b/docs/misc/pvh-readme.txt
new file mode 100644
index 0000000..a813373
--- /dev/null
+++ b/docs/misc/pvh-readme.txt
@@ -0,0 +1,40 @@
+
+PVH : a pv guest running in an HVM container. HAP is required for PVH.
+
+See: http://blog.xen.org/index.php/2012/10/23/the-paravirtualization-spectrum-part-1-the-ends-of-the-spectrum/
+
+
+The initial phase targets the booting of a 64bit UP/SMP linux guest in PVH
+mode. This is done by adding: pvh=1 in the config file. xl, and not xm, is
+supported. Phase I patches are broken into three parts:
+   - xen changes for booting of 64bit PVH guest
+   - tools changes for creating a PVH guest
+   - boot of 64bit dom0 in PVH mode.
+
+The best way to find all the patches is to use "git log|grep -i PVH", both
+in xen and linux tree.
+
+Following fixme's exist in the code:
+   - Add support for more memory types in arch/x86/hvm/mtrr.c.
+   - arch/x86/time.c: support more tsc modes.
+   - check_guest_io_breakpoint(): check/add support for IO breakpoint.
+   - implement arch_get_info_guest() for pvh.
+   - vmxit_msr_read(): during AMD port go thru hvm_msr_read_intercept() again.
+   - verify bp matching on emulated instructions will work same as HVM for
+     PVH guest. see instruction_done() and check_guest_io_breakpoint().
+
+Following remain to be done for PVH:
+   - AMD port.
+   - 32bit PVH guest support in both linux and xen. Xen changes are tagged
+     "32bitfixme".
+   - Add support for monitoring guest behavior. See hvm_memory_event* functions
+     in hvm.c
+   - vcpu hotplug support
+   - Live migration of PVH guests.
+   - Avail PVH dom0 of posted interrupts. (This will be a big win).
+
+
+Note, any emails to me must be cc'd to Xen-devel@lists.xensource.com.
+
+Mukesh Rathor
+mukesh.rathor [at] oracle [dot] com
--
1.7.2.3
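The readme's "pvh=1 in the config file" is the only PVH-specific switch at
this stage. A minimal sketch of such an xl config, where the kernel path,
memory and vcpu values are purely illustrative and only the pvh line comes
from the readme:

    # Hypothetical xl config for a 64-bit PVH domU; only "pvh = 1" is
    # prescribed by the readme, the rest is ordinary domU boilerplate.
    name   = "pvh-guest"
    pvh    = 1
    kernel = "/boot/vmlinux-pvh"    # illustrative guest kernel path
    memory = 1024
    vcpus  = 2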
Mukesh Rathor
2013-Jul-18 02:32 UTC
[PATCH 02/24] PVH xen: update __XEN_LATEST_INTERFACE_VERSION__
Update __XEN_LATEST_INTERFACE_VERSION__ to 0x00040400 because of the gdt
union changes in the next patch titled "turn gdt_frames/gdt_ents into
union".

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/include/public/xen-compat.h | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/xen/include/public/xen-compat.h b/xen/include/public/xen-compat.h
index 69141c4..3eb80a0 100644
--- a/xen/include/public/xen-compat.h
+++ b/xen/include/public/xen-compat.h
@@ -27,7 +27,7 @@
 #ifndef __XEN_PUBLIC_XEN_COMPAT_H__
 #define __XEN_PUBLIC_XEN_COMPAT_H__

-#define __XEN_LATEST_INTERFACE_VERSION__ 0x00040300
+#define __XEN_LATEST_INTERFACE_VERSION__ 0x00040400

 #if defined(__XEN__) || defined(__XEN_TOOLS__)
 /* Xen is built with matching headers and implements the latest interface. */
--
1.7.2.3
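For readers outside the tools and hypervisor trees, this is the versioning
idiom the constant drives: a consumer pins the interface it was written
against by defining __XEN_INTERFACE_VERSION__ before including the public
headers, and layout changes are gated on that value. A simplified sketch,
assuming the gating that the next patch adds:

    /* Simplified sketch of the public-header versioning idiom; the real
     * defaulting logic lives in xen-compat.h. */
    #define __XEN_LATEST_INTERFACE_VERSION__ 0x00040400

    #ifndef __XEN_INTERFACE_VERSION__
    /* Hypervisor/tools builds effectively get the latest interface. */
    #define __XEN_INTERFACE_VERSION__ __XEN_LATEST_INTERFACE_VERSION__
    #endif

    #if __XEN_INTERFACE_VERSION__ < 0x00040400
    /* pre-4.4 consumers keep the flat gdt_frames[16]/gdt_ents layout */
    #else
    /* 4.4+ consumers see the gdt union introduced in the next patch */
    #endif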
Mukesh Rathor
2013-Jul-18 02:32 UTC
[PATCH 03/24] PVH xen: turn gdt_frames/gdt_ents into union
Changes in V2:
  - Add __XEN_INTERFACE_VERSION__

Changes in V3:
  - Rename union to 'gdt' and rename field names.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
 tools/libxc/xc_domain_restore.c   |  8 ++++----
 tools/libxc/xc_domain_save.c      |  6 +++---
 xen/arch/x86/domain.c             | 12 ++++++------
 xen/arch/x86/domctl.c             | 12 ++++++------
 xen/include/public/arch-x86/xen.h | 14 ++++++++++++++
 5 files changed, 33 insertions(+), 19 deletions(-)

diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c
index 63d36cd..47aaca0 100644
--- a/tools/libxc/xc_domain_restore.c
+++ b/tools/libxc/xc_domain_restore.c
@@ -2055,15 +2055,15 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
         munmap(start_info, PAGE_SIZE);
     }

     /* Uncanonicalise each GDT frame number. */
-    if ( GET_FIELD(ctxt, gdt_ents) > 8192 )
+    if ( GET_FIELD(ctxt, gdt.pv.num_ents) > 8192 )
     {
         ERROR("GDT entry count out of range");
         goto out;
     }

-    for ( j = 0; (512*j) < GET_FIELD(ctxt, gdt_ents); j++ )
+    for ( j = 0; (512*j) < GET_FIELD(ctxt, gdt.pv.num_ents); j++ )
     {
-        pfn = GET_FIELD(ctxt, gdt_frames[j]);
+        pfn = GET_FIELD(ctxt, gdt.pv.frames[j]);
         if ( (pfn >= dinfo->p2m_size) ||
              (pfn_type[pfn] != XEN_DOMCTL_PFINFO_NOTAB) )
         {
@@ -2071,7 +2071,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
                   j, (unsigned long)pfn);
             goto out;
         }
-        SET_FIELD(ctxt, gdt_frames[j], ctx->p2m[pfn]);
+        SET_FIELD(ctxt, gdt.pv.frames[j], ctx->p2m[pfn]);
     }

     /* Uncanonicalise the page table base pointer. */
     pfn = UNFOLD_CR3(GET_FIELD(ctxt, ctrlreg[3]));
diff --git a/tools/libxc/xc_domain_save.c b/tools/libxc/xc_domain_save.c
index fbc15e9..e938628 100644
--- a/tools/libxc/xc_domain_save.c
+++ b/tools/libxc/xc_domain_save.c
@@ -1907,15 +1907,15 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter
         }

         /* Canonicalise each GDT frame number. */
-        for ( j = 0; (512*j) < GET_FIELD(&ctxt, gdt_ents); j++ )
+        for ( j = 0; (512*j) < GET_FIELD(&ctxt, gdt.pv.num_ents); j++ )
         {
-            mfn = GET_FIELD(&ctxt, gdt_frames[j]);
+            mfn = GET_FIELD(&ctxt, gdt.pv.frames[j]);
             if ( !MFN_IS_IN_PSEUDOPHYS_MAP(mfn) )
             {
                 ERROR("GDT frame is not in range of pseudophys map");
                 goto out;
             }
-            SET_FIELD(&ctxt, gdt_frames[j], mfn_to_pfn(mfn));
+            SET_FIELD(&ctxt, gdt.pv.frames[j], mfn_to_pfn(mfn));
         }

         /* Canonicalise the page table base pointer. */
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 874742c..73ddad7 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -784,8 +784,8 @@ int arch_set_info_guest(
         }

         for ( i = 0; i < ARRAY_SIZE(v->arch.pv_vcpu.gdt_frames); ++i )
-            fail |= v->arch.pv_vcpu.gdt_frames[i] != c(gdt_frames[i]);
-        fail |= v->arch.pv_vcpu.gdt_ents != c(gdt_ents);
+            fail |= v->arch.pv_vcpu.gdt_frames[i] != c(gdt.pv.frames[i]);
+        fail |= v->arch.pv_vcpu.gdt_ents != c(gdt.pv.num_ents);

         fail |= v->arch.pv_vcpu.ldt_base != c(ldt_base);
         fail |= v->arch.pv_vcpu.ldt_ents != c(ldt_ents);
@@ -838,17 +838,17 @@ int arch_set_info_guest(
         return rc;

     if ( !compat )
-        rc = (int)set_gdt(v, c.nat->gdt_frames, c.nat->gdt_ents);
+        rc = (int)set_gdt(v, c.nat->gdt.pv.frames, c.nat->gdt.pv.num_ents);
     else
     {
         unsigned long gdt_frames[ARRAY_SIZE(v->arch.pv_vcpu.gdt_frames)];
-        unsigned int n = (c.cmp->gdt_ents + 511) / 512;
+        unsigned int n = (c.cmp->gdt.pv.num_ents + 511) / 512;

         if ( n > ARRAY_SIZE(v->arch.pv_vcpu.gdt_frames) )
             return -EINVAL;
         for ( i = 0; i < n; ++i )
-            gdt_frames[i] = c.cmp->gdt_frames[i];
-        rc = (int)set_gdt(v, gdt_frames, c.cmp->gdt_ents);
+            gdt_frames[i] = c.cmp->gdt.pv.frames[i];
+        rc = (int)set_gdt(v, gdt_frames, c.cmp->gdt.pv.num_ents);
     }
     if ( rc != 0 )
         return rc;
diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index c2a04c4..f87d6ab 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -1300,12 +1300,12 @@ void arch_get_info_guest(struct vcpu *v, vcpu_guest_context_u c)
         c(ldt_base = v->arch.pv_vcpu.ldt_base);
         c(ldt_ents = v->arch.pv_vcpu.ldt_ents);
         for ( i = 0; i < ARRAY_SIZE(v->arch.pv_vcpu.gdt_frames); ++i )
-            c(gdt_frames[i] = v->arch.pv_vcpu.gdt_frames[i]);
-        BUILD_BUG_ON(ARRAY_SIZE(c.nat->gdt_frames) !=
-                     ARRAY_SIZE(c.cmp->gdt_frames));
-        for ( ; i < ARRAY_SIZE(c.nat->gdt_frames); ++i )
-            c(gdt_frames[i] = 0);
-        c(gdt_ents = v->arch.pv_vcpu.gdt_ents);
+            c(gdt.pv.frames[i] = v->arch.pv_vcpu.gdt_frames[i]);
+        BUILD_BUG_ON(ARRAY_SIZE(c.nat->gdt.pv.frames) !=
+                     ARRAY_SIZE(c.cmp->gdt.pv.frames));
+        for ( ; i < ARRAY_SIZE(c.nat->gdt.pv.frames); ++i )
+            c(gdt.pv.frames[i] = 0);
+        c(gdt.pv.num_ents = v->arch.pv_vcpu.gdt_ents);
         c(kernel_ss = v->arch.pv_vcpu.kernel_ss);
         c(kernel_sp = v->arch.pv_vcpu.kernel_sp);
         for ( i = 0; i < ARRAY_SIZE(v->arch.pv_vcpu.ctrlreg); ++i )
diff --git a/xen/include/public/arch-x86/xen.h b/xen/include/public/arch-x86/xen.h
index b7f6a51..25c8519 100644
--- a/xen/include/public/arch-x86/xen.h
+++ b/xen/include/public/arch-x86/xen.h
@@ -170,7 +170,21 @@ struct vcpu_guest_context {
     struct cpu_user_regs user_regs;         /* User-level CPU registers     */
     struct trap_info trap_ctxt[256];        /* Virtual IDT                  */
     unsigned long ldt_base, ldt_ents;       /* LDT (linear address, # ents) */
+#if __XEN_INTERFACE_VERSION__ < 0x00040400
     unsigned long gdt_frames[16], gdt_ents; /* GDT (machine frames, # ents) */
+#else
+    union {
+        struct {
+            /* GDT (machine frames, # ents) */
+            unsigned long frames[16], num_ents;
+        } pv;
+        struct {
+            /* PVH: GDTR addr and size */
+            uint64_t addr;
+            uint16_t limit;
+        } pvh;
+    } gdt;
+#endif
     unsigned long kernel_ss, kernel_sp;     /* Virtual TSS (only SS1/SP1)   */
     /* NB. User pagetable on x86/64 is placed in ctrlreg[1]. */
     unsigned long ctrlreg[8];               /* CR0-CR7 (control registers)  */
--
1.7.2.3
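To make the two layouts concrete, here is a hedged sketch of a context
builder filling each arm of the new union; the field names come from the
hunk above, while the helper and its values are made up:

    #include <stdint.h>

    /* Illustrative only: populate the gdt union for either guest type.
     * struct vcpu_guest_context is the public/arch-x86/xen.h layout built
     * with __XEN_INTERFACE_VERSION__ >= 0x00040400. */
    static void fill_gdt(struct vcpu_guest_context *ctx, int pvh,
                         unsigned long frame0, unsigned long nents,
                         uint64_t gdtr_base, uint16_t gdtr_limit)
    {
        if ( !pvh )
        {
            /* PV: machine frames backing the GDT plus an entry count. */
            ctx->gdt.pv.frames[0] = frame0;
            ctx->gdt.pv.num_ents  = nents;
        }
        else
        {
            /* PVH: a native GDTR image, i.e. linear base and limit. */
            ctx->gdt.pvh.addr  = gdtr_base;
            ctx->gdt.pvh.limit = gdtr_limit;
        }
    }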
Mukesh Rathor
2013-Jul-18 02:32 UTC
[PATCH 04/24] PVH xen: add params to read_segment_register
In this preparatory patch, read_segment_register macro is changed to take vcpu and regs parameters. No functionality change. Changes in V2: None Changes in V3: - Replace read_sreg with read_segment_register Changes in V7: - Don''t make emulate_privileged_op() public here. Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> --- xen/arch/x86/domain.c | 8 ++++---- xen/arch/x86/traps.c | 26 ++++++++++++-------------- xen/arch/x86/x86_64/traps.c | 16 ++++++++-------- xen/include/asm-x86/system.h | 2 +- 4 files changed, 25 insertions(+), 27 deletions(-) diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index 73ddad7..5de5e49 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -1221,10 +1221,10 @@ static void save_segments(struct vcpu *v) struct cpu_user_regs *regs = &v->arch.user_regs; unsigned int dirty_segment_mask = 0; - regs->ds = read_segment_register(ds); - regs->es = read_segment_register(es); - regs->fs = read_segment_register(fs); - regs->gs = read_segment_register(gs); + regs->ds = read_segment_register(v, regs, ds); + regs->es = read_segment_register(v, regs, es); + regs->fs = read_segment_register(v, regs, fs); + regs->gs = read_segment_register(v, regs, gs); if ( regs->ds ) dirty_segment_mask |= DIRTY_DS; diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c index 57dbd0c..378ef0a 100644 --- a/xen/arch/x86/traps.c +++ b/xen/arch/x86/traps.c @@ -1831,8 +1831,6 @@ static inline uint64_t guest_misc_enable(uint64_t val) } \ (eip) += sizeof(_x); _x; }) -#define read_sreg(regs, sr) read_segment_register(sr) - static int is_cpufreq_controller(struct domain *d) { return ((cpufreq_controller == FREQCTL_dom0_kernel) && @@ -1877,7 +1875,7 @@ static int emulate_privileged_op(struct cpu_user_regs *regs) goto fail; /* emulating only opcodes not allowing SS to be default */ - data_sel = read_sreg(regs, ds); + data_sel = read_segment_register(v, regs, ds); /* Legacy prefixes. 
*/ for ( i = 0; i < 8; i++, rex == opcode || (rex = 0) ) @@ -1895,17 +1893,17 @@ static int emulate_privileged_op(struct cpu_user_regs *regs) data_sel = regs->cs; continue; case 0x3e: /* DS override */ - data_sel = read_sreg(regs, ds); + data_sel = read_segment_register(v, regs, ds); continue; case 0x26: /* ES override */ - data_sel = read_sreg(regs, es); + data_sel = read_segment_register(v, regs, es); continue; case 0x64: /* FS override */ - data_sel = read_sreg(regs, fs); + data_sel = read_segment_register(v, regs, fs); lm_ovr = lm_seg_fs; continue; case 0x65: /* GS override */ - data_sel = read_sreg(regs, gs); + data_sel = read_segment_register(v, regs, gs); lm_ovr = lm_seg_gs; continue; case 0x36: /* SS override */ @@ -1952,7 +1950,7 @@ static int emulate_privileged_op(struct cpu_user_regs *regs) if ( !(opcode & 2) ) { - data_sel = read_sreg(regs, es); + data_sel = read_segment_register(v, regs, es); lm_ovr = lm_seg_none; } @@ -2685,22 +2683,22 @@ static void emulate_gate_op(struct cpu_user_regs *regs) ASSERT(opnd_sel); continue; case 0x3e: /* DS override */ - opnd_sel = read_sreg(regs, ds); + opnd_sel = read_segment_register(v, regs, ds); if ( !opnd_sel ) opnd_sel = dpl; continue; case 0x26: /* ES override */ - opnd_sel = read_sreg(regs, es); + opnd_sel = read_segment_register(v, regs, es); if ( !opnd_sel ) opnd_sel = dpl; continue; case 0x64: /* FS override */ - opnd_sel = read_sreg(regs, fs); + opnd_sel = read_segment_register(v, regs, fs); if ( !opnd_sel ) opnd_sel = dpl; continue; case 0x65: /* GS override */ - opnd_sel = read_sreg(regs, gs); + opnd_sel = read_segment_register(v, regs, gs); if ( !opnd_sel ) opnd_sel = dpl; continue; @@ -2753,7 +2751,7 @@ static void emulate_gate_op(struct cpu_user_regs *regs) switch ( modrm & 7 ) { default: - opnd_sel = read_sreg(regs, ds); + opnd_sel = read_segment_register(v, regs, ds); break; case 4: case 5: opnd_sel = regs->ss; @@ -2781,7 +2779,7 @@ static void emulate_gate_op(struct cpu_user_regs *regs) break; } if ( !opnd_sel ) - opnd_sel = read_sreg(regs, ds); + opnd_sel = read_segment_register(v, regs, ds); switch ( modrm & 7 ) { case 0: case 2: case 4: diff --git a/xen/arch/x86/x86_64/traps.c b/xen/arch/x86/x86_64/traps.c index bcd7609..9e0571d 100644 --- a/xen/arch/x86/x86_64/traps.c +++ b/xen/arch/x86/x86_64/traps.c @@ -122,10 +122,10 @@ void show_registers(struct cpu_user_regs *regs) fault_crs[0] = read_cr0(); fault_crs[3] = read_cr3(); fault_crs[4] = read_cr4(); - fault_regs.ds = read_segment_register(ds); - fault_regs.es = read_segment_register(es); - fault_regs.fs = read_segment_register(fs); - fault_regs.gs = read_segment_register(gs); + fault_regs.ds = read_segment_register(v, regs, ds); + fault_regs.es = read_segment_register(v, regs, es); + fault_regs.fs = read_segment_register(v, regs, fs); + fault_regs.gs = read_segment_register(v, regs, gs); } print_xen_info(); @@ -240,10 +240,10 @@ void do_double_fault(struct cpu_user_regs *regs) crs[2] = read_cr2(); crs[3] = read_cr3(); crs[4] = read_cr4(); - regs->ds = read_segment_register(ds); - regs->es = read_segment_register(es); - regs->fs = read_segment_register(fs); - regs->gs = read_segment_register(gs); + regs->ds = read_segment_register(current, regs, ds); + regs->es = read_segment_register(current, regs, es); + regs->fs = read_segment_register(current, regs, fs); + regs->gs = read_segment_register(current, regs, gs); printk("CPU: %d\n", cpu); _show_registers(regs, crs, CTXT_hypervisor, NULL); diff --git a/xen/include/asm-x86/system.h b/xen/include/asm-x86/system.h index 
6ab7d56..9bb22cb 100644 --- a/xen/include/asm-x86/system.h +++ b/xen/include/asm-x86/system.h @@ -4,7 +4,7 @@ #include <xen/lib.h> #include <xen/bitops.h> -#define read_segment_register(name) \ +#define read_segment_register(vcpu, regs, name) \ ({ u16 __sel; \ asm volatile ( "movw %%" STR(name) ",%0" : "=r" (__sel) ); \ __sel; \ -- 1.7.2.3
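The new (vcpu, regs) parameters are deliberately unused by the macro body
at this point; they exist so a later patch in this series can dispatch PVH
selector reads to the VMCS without touching every call site again. A sketch
of the eventual shape, where the PVH branch and the read_selector hook are
assumptions about the later patches, not part of this one:

    /* Hedged sketch of where the extra parameters lead: PVH selector
     * reads must come from the VMCS, not the live CPU registers. */
    #define read_segment_register(vcpu, regs, name)                    \
    ({  u16 __sel;                                                     \
        if ( is_pvh_vcpu(vcpu) )                                       \
            __sel = hvm_funcs.read_selector(vcpu, x86_seg_##name);     \
        else                                                           \
            asm volatile ( "movw %%" STR(name) ",%0" : "=r" (__sel) ); \
        __sel;                                                         \
    })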
Mukesh Rathor
2013-Jul-18 02:32 UTC
[PATCH 05/24] PVH xen: Move e820 fields out of pv_domain struct
This patch moves fields out of the pv_domain struct as they are used by PVH also. Changes in V6: - Don''t base on guest type the initialization and cleanup. Changes in V7: - If statement doesn''t need to be split across lines anymore. Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> --- xen/arch/x86/domain.c | 10 ++++------ xen/arch/x86/mm.c | 26 ++++++++++++-------------- xen/include/asm-x86/domain.h | 10 +++++----- 3 files changed, 21 insertions(+), 25 deletions(-) diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index 5de5e49..c361abf 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -553,6 +553,7 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags) if ( (rc = iommu_domain_init(d)) != 0 ) goto fail; } + spin_lock_init(&d->arch.e820_lock); if ( is_hvm_domain(d) ) { @@ -563,13 +564,9 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags) } } else - { /* 64-bit PV guest by default. */ d->arch.is_32bit_pv = d->arch.has_32bit_shinfo = 0; - spin_lock_init(&d->arch.pv_domain.e820_lock); - } - /* initialize default tsc behavior in case tools don''t */ tsc_set_info(d, TSC_MODE_DEFAULT, 0UL, 0, 0); spin_lock_init(&d->arch.vtsc_lock); @@ -592,8 +589,9 @@ void arch_domain_destroy(struct domain *d) { if ( is_hvm_domain(d) ) hvm_domain_destroy(d); - else - xfree(d->arch.pv_domain.e820); + + if ( d->arch.e820 ) + xfree(d->arch.e820); free_domain_pirqs(d); if ( !is_idle_domain(d) ) diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c index 286e903..e980431 100644 --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -4763,11 +4763,11 @@ long arch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg) return -EFAULT; } - spin_lock(&d->arch.pv_domain.e820_lock); - xfree(d->arch.pv_domain.e820); - d->arch.pv_domain.e820 = e820; - d->arch.pv_domain.nr_e820 = fmap.map.nr_entries; - spin_unlock(&d->arch.pv_domain.e820_lock); + spin_lock(&d->arch.e820_lock); + xfree(d->arch.e820); + d->arch.e820 = e820; + d->arch.nr_e820 = fmap.map.nr_entries; + spin_unlock(&d->arch.e820_lock); rcu_unlock_domain(d); return rc; @@ -4781,26 +4781,24 @@ long arch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg) if ( copy_from_guest(&map, arg, 1) ) return -EFAULT; - spin_lock(&d->arch.pv_domain.e820_lock); + spin_lock(&d->arch.e820_lock); /* Backwards compatibility. */ - if ( (d->arch.pv_domain.nr_e820 == 0) || - (d->arch.pv_domain.e820 == NULL) ) + if ( (d->arch.nr_e820 == 0) || (d->arch.e820 == NULL) ) { - spin_unlock(&d->arch.pv_domain.e820_lock); + spin_unlock(&d->arch.e820_lock); return -ENOSYS; } - map.nr_entries = min(map.nr_entries, d->arch.pv_domain.nr_e820); - if ( copy_to_guest(map.buffer, d->arch.pv_domain.e820, - map.nr_entries) || + map.nr_entries = min(map.nr_entries, d->arch.nr_e820); + if ( copy_to_guest(map.buffer, d->arch.e820, map.nr_entries) || __copy_to_guest(arg, &map, 1) ) { - spin_unlock(&d->arch.pv_domain.e820_lock); + spin_unlock(&d->arch.e820_lock); return -EFAULT; } - spin_unlock(&d->arch.pv_domain.e820_lock); + spin_unlock(&d->arch.e820_lock); return 0; } diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h index d79464d..c3f9f8e 100644 --- a/xen/include/asm-x86/domain.h +++ b/xen/include/asm-x86/domain.h @@ -234,11 +234,6 @@ struct pv_domain /* map_domain_page() mapping cache. */ struct mapcache_domain mapcache; - - /* Pseudophysical e820 map (XENMEM_memory_map). 
*/ - spinlock_t e820_lock; - struct e820entry *e820; - unsigned int nr_e820; }; struct arch_domain @@ -313,6 +308,11 @@ struct arch_domain (possibly other cases in the future */ uint64_t vtsc_kerncount; /* for hvm, counts all vtsc */ uint64_t vtsc_usercount; /* not used for hvm */ + + /* Pseudophysical e820 map (XENMEM_memory_map). */ + spinlock_t e820_lock; + struct e820entry *e820; + unsigned int nr_e820; } __cacheline_aligned; #define has_arch_pdevs(d) (!list_empty(&(d)->arch.pdev_list)) -- 1.7.2.3
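For context, a hedged sketch of the guest-side consumer these fields serve:
querying the pseudophysical e820 that the toolstack stored via
XENMEM_set_memory_map. Error handling is elided and the array size is
arbitrary:

    #include <xen/memory.h>   /* illustrative include path */

    /* struct e820entry per the arch headers; returns -ENOSYS when no
     * map was set, matching the backwards-compatibility branch above. */
    static struct e820entry pvmap[128];

    static int fetch_pseudophys_e820(unsigned int *nr)
    {
        struct xen_memory_map memmap;
        int rc;

        memmap.nr_entries = ARRAY_SIZE(pvmap);
        set_xen_guest_handle(memmap.buffer, pvmap);

        rc = HYPERVISOR_memory_op(XENMEM_memory_map, &memmap);
        if ( rc == 0 )
            *nr = memmap.nr_entries;
        return rc;
    }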
Mukesh Rathor
2013-Jul-18 02:32 UTC
[PATCH 06/24] PVH xen: hvm related preparatory changes for PVH
This patch contains small changes to hvm.c because hvm_domain.params is not set/used/supported for PVH in the present series. Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> --- xen/arch/x86/hvm/hvm.c | 10 ++++++---- 1 files changed, 6 insertions(+), 4 deletions(-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 1fcaed0..8284b3b 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -1070,10 +1070,13 @@ int hvm_vcpu_initialise(struct vcpu *v) { int rc; struct domain *d = v->domain; - domid_t dm_domid = d->arch.hvm_domain.params[HVM_PARAM_DM_DOMAIN]; + domid_t dm_domid; hvm_asid_flush_vcpu(v); + spin_lock_init(&v->arch.hvm_vcpu.tm_lock); + INIT_LIST_HEAD(&v->arch.hvm_vcpu.tm_list); + if ( (rc = vlapic_init(v)) != 0 ) goto fail1; @@ -1084,6 +1087,8 @@ int hvm_vcpu_initialise(struct vcpu *v) && (rc = nestedhvm_vcpu_initialise(v)) < 0 ) goto fail3; + dm_domid = d->arch.hvm_domain.params[HVM_PARAM_DM_DOMAIN]; + /* Create ioreq event channel. */ rc = alloc_unbound_xen_event_channel(v, dm_domid, NULL); if ( rc < 0 ) @@ -1106,9 +1111,6 @@ int hvm_vcpu_initialise(struct vcpu *v) get_ioreq(v)->vp_eport = v->arch.hvm_vcpu.xen_port; spin_unlock(&d->arch.hvm_domain.ioreq.lock); - spin_lock_init(&v->arch.hvm_vcpu.tm_lock); - INIT_LIST_HEAD(&v->arch.hvm_vcpu.tm_list); - v->arch.hvm_vcpu.inject_trap.vector = -1; rc = setup_compat_arg_xlat(v); -- 1.7.2.3
Mukesh Rathor
2013-Jul-18 02:32 UTC
[PATCH 07/24] PVH xen: vmx related preparatory changes for PVH
This is another preparatory patch for PVH. In this patch, following functions are made available for general/public use: vmx_fpu_enter(), get_instruction_length(), update_guest_eip(), and vmx_dr_access(). There is no functionality change. Changes in V2: - prepend vmx_ to get_instruction_length and update_guest_eip. - Do not export/use vmr(). Changes in V3: - Do not change emulate_forced_invalid_op() in this patch. Changes in V7: - Drop pv_cpuid going public here. Changes in V8: - Move vmx_fpu_enter prototype from vmcs.h to vmx.h Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- xen/arch/x86/hvm/vmx/vmx.c | 72 +++++++++++++++--------------------- xen/arch/x86/hvm/vmx/vvmx.c | 2 +- xen/include/asm-x86/hvm/vmx/vmx.h | 17 ++++++++- 3 files changed, 47 insertions(+), 44 deletions(-) diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index d6540e3..195f9ed 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -577,7 +577,7 @@ static int vmx_load_vmcs_ctxt(struct vcpu *v, struct hvm_hw_cpu *ctxt) return 0; } -static void vmx_fpu_enter(struct vcpu *v) +void vmx_fpu_enter(struct vcpu *v) { vcpu_restore_fpu_lazy(v); v->arch.hvm_vmx.exception_bitmap &= ~(1u << TRAP_no_device); @@ -1597,24 +1597,12 @@ const struct hvm_function_table * __init start_vmx(void) return &vmx_function_table; } -/* - * Not all cases receive valid value in the VM-exit instruction length field. - * Callers must know what they''re doing! - */ -static int get_instruction_length(void) -{ - int len; - len = __vmread(VM_EXIT_INSTRUCTION_LEN); /* Safe: callers audited */ - BUG_ON((len < 1) || (len > 15)); - return len; -} - -void update_guest_eip(void) +void vmx_update_guest_eip(void) { struct cpu_user_regs *regs = guest_cpu_user_regs(); unsigned long x; - regs->eip += get_instruction_length(); /* Safe: callers audited */ + regs->eip += vmx_get_instruction_length(); /* Safe: callers audited */ regs->eflags &= ~X86_EFLAGS_RF; x = __vmread(GUEST_INTERRUPTIBILITY_INFO); @@ -1687,8 +1675,8 @@ static void vmx_do_cpuid(struct cpu_user_regs *regs) regs->edx = edx; } -static void vmx_dr_access(unsigned long exit_qualification, - struct cpu_user_regs *regs) +void vmx_dr_access(unsigned long exit_qualification, + struct cpu_user_regs *regs) { struct vcpu *v = current; @@ -2301,7 +2289,7 @@ static int vmx_handle_eoi_write(void) if ( (((exit_qualification >> 12) & 0xf) == 1) && ((exit_qualification & 0xfff) == APIC_EOI) ) { - update_guest_eip(); /* Safe: APIC data write */ + vmx_update_guest_eip(); /* Safe: APIC data write */ vlapic_EOI_set(vcpu_vlapic(current)); HVMTRACE_0D(VLAPIC); return 1; @@ -2514,7 +2502,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs) HVMTRACE_1D(TRAP, vector); if ( v->domain->debugger_attached ) { - update_guest_eip(); /* Safe: INT3 */ + vmx_update_guest_eip(); /* Safe: INT3 */ current->arch.gdbsx_vcpu_event = TRAP_int3; domain_pause_for_debugger(); break; @@ -2622,7 +2610,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs) */ inst_len = ((source != 3) || /* CALL, IRET, or JMP? */ (idtv_info & (1u<<10))) /* IntrType > 3? */ - ? get_instruction_length() /* Safe: SDM 3B 23.2.4 */ : 0; + ? 
vmx_get_instruction_length() /* Safe: SDM 3B 23.2.4 */ : 0; if ( (source == 3) && (idtv_info & INTR_INFO_DELIVER_CODE_MASK) ) ecode = __vmread(IDT_VECTORING_ERROR_CODE); regs->eip += inst_len; @@ -2630,15 +2618,15 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs) break; } case EXIT_REASON_CPUID: - update_guest_eip(); /* Safe: CPUID */ + vmx_update_guest_eip(); /* Safe: CPUID */ vmx_do_cpuid(regs); break; case EXIT_REASON_HLT: - update_guest_eip(); /* Safe: HLT */ + vmx_update_guest_eip(); /* Safe: HLT */ hvm_hlt(regs->eflags); break; case EXIT_REASON_INVLPG: - update_guest_eip(); /* Safe: INVLPG */ + vmx_update_guest_eip(); /* Safe: INVLPG */ exit_qualification = __vmread(EXIT_QUALIFICATION); vmx_invlpg_intercept(exit_qualification); break; @@ -2646,7 +2634,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs) regs->ecx = hvm_msr_tsc_aux(v); /* fall through */ case EXIT_REASON_RDTSC: - update_guest_eip(); /* Safe: RDTSC, RDTSCP */ + vmx_update_guest_eip(); /* Safe: RDTSC, RDTSCP */ hvm_rdtsc_intercept(regs); break; case EXIT_REASON_VMCALL: @@ -2656,7 +2644,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs) rc = hvm_do_hypercall(regs); if ( rc != HVM_HCALL_preempted ) { - update_guest_eip(); /* Safe: VMCALL */ + vmx_update_guest_eip(); /* Safe: VMCALL */ if ( rc == HVM_HCALL_invalidate ) send_invalidate_req(); } @@ -2666,7 +2654,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs) { exit_qualification = __vmread(EXIT_QUALIFICATION); if ( vmx_cr_access(exit_qualification) == X86EMUL_OKAY ) - update_guest_eip(); /* Safe: MOV Cn, LMSW, CLTS */ + vmx_update_guest_eip(); /* Safe: MOV Cn, LMSW, CLTS */ break; } case EXIT_REASON_DR_ACCESS: @@ -2680,7 +2668,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs) { regs->eax = (uint32_t)msr_content; regs->edx = (uint32_t)(msr_content >> 32); - update_guest_eip(); /* Safe: RDMSR */ + vmx_update_guest_eip(); /* Safe: RDMSR */ } break; } @@ -2689,63 +2677,63 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs) uint64_t msr_content; msr_content = ((uint64_t)regs->edx << 32) | (uint32_t)regs->eax; if ( hvm_msr_write_intercept(regs->ecx, msr_content) == X86EMUL_OKAY ) - update_guest_eip(); /* Safe: WRMSR */ + vmx_update_guest_eip(); /* Safe: WRMSR */ break; } case EXIT_REASON_VMXOFF: if ( nvmx_handle_vmxoff(regs) == X86EMUL_OKAY ) - update_guest_eip(); + vmx_update_guest_eip(); break; case EXIT_REASON_VMXON: if ( nvmx_handle_vmxon(regs) == X86EMUL_OKAY ) - update_guest_eip(); + vmx_update_guest_eip(); break; case EXIT_REASON_VMCLEAR: if ( nvmx_handle_vmclear(regs) == X86EMUL_OKAY ) - update_guest_eip(); + vmx_update_guest_eip(); break; case EXIT_REASON_VMPTRLD: if ( nvmx_handle_vmptrld(regs) == X86EMUL_OKAY ) - update_guest_eip(); + vmx_update_guest_eip(); break; case EXIT_REASON_VMPTRST: if ( nvmx_handle_vmptrst(regs) == X86EMUL_OKAY ) - update_guest_eip(); + vmx_update_guest_eip(); break; case EXIT_REASON_VMREAD: if ( nvmx_handle_vmread(regs) == X86EMUL_OKAY ) - update_guest_eip(); + vmx_update_guest_eip(); break; case EXIT_REASON_VMWRITE: if ( nvmx_handle_vmwrite(regs) == X86EMUL_OKAY ) - update_guest_eip(); + vmx_update_guest_eip(); break; case EXIT_REASON_VMLAUNCH: if ( nvmx_handle_vmlaunch(regs) == X86EMUL_OKAY ) - update_guest_eip(); + vmx_update_guest_eip(); break; case EXIT_REASON_VMRESUME: if ( nvmx_handle_vmresume(regs) == X86EMUL_OKAY ) - update_guest_eip(); + vmx_update_guest_eip(); break; case EXIT_REASON_INVEPT: if ( nvmx_handle_invept(regs) == X86EMUL_OKAY ) - update_guest_eip(); + vmx_update_guest_eip(); break; case 
EXIT_REASON_INVVPID: if ( nvmx_handle_invvpid(regs) == X86EMUL_OKAY ) - update_guest_eip(); + vmx_update_guest_eip(); break; case EXIT_REASON_MWAIT_INSTRUCTION: @@ -2793,14 +2781,14 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs) int bytes = (exit_qualification & 0x07) + 1; int dir = (exit_qualification & 0x08) ? IOREQ_READ : IOREQ_WRITE; if ( handle_pio(port, bytes, dir) ) - update_guest_eip(); /* Safe: IN, OUT */ + vmx_update_guest_eip(); /* Safe: IN, OUT */ } break; case EXIT_REASON_INVD: case EXIT_REASON_WBINVD: { - update_guest_eip(); /* Safe: INVD, WBINVD */ + vmx_update_guest_eip(); /* Safe: INVD, WBINVD */ vmx_wbinvd_intercept(); break; } @@ -2832,7 +2820,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs) case EXIT_REASON_XSETBV: if ( hvm_handle_xsetbv(regs->ecx, (regs->rdx << 32) | regs->_eax) == 0 ) - update_guest_eip(); /* Safe: XSETBV */ + vmx_update_guest_eip(); /* Safe: XSETBV */ break; case EXIT_REASON_APIC_WRITE: diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c index 5dfbc54..82be4cc 100644 --- a/xen/arch/x86/hvm/vmx/vvmx.c +++ b/xen/arch/x86/hvm/vmx/vvmx.c @@ -2139,7 +2139,7 @@ int nvmx_n2_vmexit_handler(struct cpu_user_regs *regs, tsc += __get_vvmcs(nvcpu->nv_vvmcx, TSC_OFFSET); regs->eax = (uint32_t)tsc; regs->edx = (uint32_t)(tsc >> 32); - update_guest_eip(); + vmx_update_guest_eip(); return 1; } diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h index c33b9f9..c21a303 100644 --- a/xen/include/asm-x86/hvm/vmx/vmx.h +++ b/xen/include/asm-x86/hvm/vmx/vmx.h @@ -446,6 +446,18 @@ static inline int __vmxon(u64 addr) return rc; } +/* + * Not all cases receive valid value in the VM-exit instruction length field. + * Callers must know what they''re doing! + */ +static inline int vmx_get_instruction_length(void) +{ + int len; + len = __vmread(VM_EXIT_INSTRUCTION_LEN); /* Safe: callers audited */ + BUG_ON((len < 1) || (len > 15)); + return len; +} + void vmx_get_segment_register(struct vcpu *, enum x86_segment, struct segment_register *); void vmx_inject_extint(int trap); @@ -457,7 +469,10 @@ void ept_p2m_uninit(struct p2m_domain *p2m); void ept_walk_table(struct domain *d, unsigned long gfn); void setup_ept_dump(void); -void update_guest_eip(void); +void vmx_update_guest_eip(void); +void vmx_dr_access(unsigned long exit_qualification, + struct cpu_user_regs *regs); +void vmx_fpu_enter(struct vcpu *v); int alloc_p2m_hap_data(struct p2m_domain *p2m); void free_p2m_hap_data(struct p2m_domain *p2m); -- 1.7.2.3
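A short sketch of the calling pattern the renamed helpers support.
pvh_vmexit_hlt() is hypothetical and only illustrates why the helpers go
public: RIP is advanced before hvm_hlt(), which may deschedule the vcpu,
exactly as in the HVM call site rewritten above:

    /* Hypothetical PVH consumer of the now-public helpers. */
    static void pvh_vmexit_hlt(struct cpu_user_regs *regs)
    {
        /* Safe: HLT exits carry a valid VM_EXIT_INSTRUCTION_LEN. */
        vmx_update_guest_eip();
        hvm_hlt(regs->eflags);   /* may block/deschedule the vcpu */
    }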
Mukesh Rathor
2013-Jul-18 02:32 UTC
[PATCH 08/24] PVH xen: vmcs related preparatory changes for PVH
In this patch, some common code is factored out of construct_vmcs() to create vmx_set_common_host_vmcs_fields() to be used by PVH. Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- xen/arch/x86/hvm/vmx/vmcs.c | 58 +++++++++++++++++++++++------------------- 1 files changed, 32 insertions(+), 26 deletions(-) diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c index de9f592..36f167f 100644 --- a/xen/arch/x86/hvm/vmx/vmcs.c +++ b/xen/arch/x86/hvm/vmx/vmcs.c @@ -825,11 +825,40 @@ void virtual_vmcs_vmwrite(void *vvmcs, u32 vmcs_encoding, u64 val) virtual_vmcs_exit(vvmcs); } -static int construct_vmcs(struct vcpu *v) +static void vmx_set_common_host_vmcs_fields(struct vcpu *v) { - struct domain *d = v->domain; uint16_t sysenter_cs; unsigned long sysenter_eip; + + /* Host data selectors. */ + __vmwrite(HOST_SS_SELECTOR, __HYPERVISOR_DS); + __vmwrite(HOST_DS_SELECTOR, __HYPERVISOR_DS); + __vmwrite(HOST_ES_SELECTOR, __HYPERVISOR_DS); + __vmwrite(HOST_FS_SELECTOR, 0); + __vmwrite(HOST_GS_SELECTOR, 0); + __vmwrite(HOST_FS_BASE, 0); + __vmwrite(HOST_GS_BASE, 0); + + /* Host control registers. */ + v->arch.hvm_vmx.host_cr0 = read_cr0() | X86_CR0_TS; + __vmwrite(HOST_CR0, v->arch.hvm_vmx.host_cr0); + __vmwrite(HOST_CR4, + mmu_cr4_features | (xsave_enabled(v) ? X86_CR4_OSXSAVE : 0)); + + /* Host CS:RIP. */ + __vmwrite(HOST_CS_SELECTOR, __HYPERVISOR_CS); + __vmwrite(HOST_RIP, (unsigned long)vmx_asm_vmexit_handler); + + /* Host SYSENTER CS:RIP. */ + rdmsrl(MSR_IA32_SYSENTER_CS, sysenter_cs); + __vmwrite(HOST_SYSENTER_CS, sysenter_cs); + rdmsrl(MSR_IA32_SYSENTER_EIP, sysenter_eip); + __vmwrite(HOST_SYSENTER_EIP, sysenter_eip); +} + +static int construct_vmcs(struct vcpu *v) +{ + struct domain *d = v->domain; u32 vmexit_ctl = vmx_vmexit_control; u32 vmentry_ctl = vmx_vmentry_control; @@ -932,30 +961,7 @@ static int construct_vmcs(struct vcpu *v) __vmwrite(POSTED_INTR_NOTIFICATION_VECTOR, posted_intr_vector); } - /* Host data selectors. */ - __vmwrite(HOST_SS_SELECTOR, __HYPERVISOR_DS); - __vmwrite(HOST_DS_SELECTOR, __HYPERVISOR_DS); - __vmwrite(HOST_ES_SELECTOR, __HYPERVISOR_DS); - __vmwrite(HOST_FS_SELECTOR, 0); - __vmwrite(HOST_GS_SELECTOR, 0); - __vmwrite(HOST_FS_BASE, 0); - __vmwrite(HOST_GS_BASE, 0); - - /* Host control registers. */ - v->arch.hvm_vmx.host_cr0 = read_cr0() | X86_CR0_TS; - __vmwrite(HOST_CR0, v->arch.hvm_vmx.host_cr0); - __vmwrite(HOST_CR4, - mmu_cr4_features | (xsave_enabled(v) ? X86_CR4_OSXSAVE : 0)); - - /* Host CS:RIP. */ - __vmwrite(HOST_CS_SELECTOR, __HYPERVISOR_CS); - __vmwrite(HOST_RIP, (unsigned long)vmx_asm_vmexit_handler); - - /* Host SYSENTER CS:RIP. */ - rdmsrl(MSR_IA32_SYSENTER_CS, sysenter_cs); - __vmwrite(HOST_SYSENTER_CS, sysenter_cs); - rdmsrl(MSR_IA32_SYSENTER_EIP, sysenter_eip); - __vmwrite(HOST_SYSENTER_EIP, sysenter_eip); + vmx_set_common_host_vmcs_fields(v); /* MSR intercepts. */ __vmwrite(VM_EXIT_MSR_LOAD_COUNT, 0); -- 1.7.2.3
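A sketch of the intended second caller; the real PVH VMCS constructor
arrives later in the series, so the function below is hypothetical and only
shows the sharing pattern (host state is identical for HVM and PVH):

    /* Hypothetical PVH counterpart of construct_vmcs(), reusing the
     * factored-out host state verbatim. */
    static int pvh_construct_vmcs_sketch(struct vcpu *v)
    {
        /* ... PVH-specific execution controls, EPT, MSR bitmap ... */
        vmx_set_common_host_vmcs_fields(v);
        /* ... PVH guest selectors, CR0/CR4, RIP/RSP ... */
        return 0;
    }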
Mukesh Rathor
2013-Jul-18 02:32 UTC
[PATCH 09/24] PVH xen: Introduce PVH guest type and some basic changes.
This patch introduces the concept of a pvh guest. There are other basic changes like creating macros to check for pv/pvh vcpu/domain, and also modifying copy-macros to account for pvh. Finally, guest_kernel_mode is changed to boast that a PVH doesn''t need to check for TF_kernel_mode flag since the kernel runs in ring 0. Chagnes in V2: - make is_pvh/is_hvm enum instead of adding is_pvh as a new flag. - fix indentation and spacing in guest_kernel_mode macro. - add debug only BUG() in GUEST_KERNEL_RPL macro as it should no longer be called in any PVH paths. Chagnes in V3: - Rename enum fields, and add is_pv to it. - Get rid if is_hvm_or_pvh_* macros. Chagnes in V4: - Move e820 fields out of pv_domain struct. Chagnes in V5: - Move e820 changes above in V4, to a separate patch. Chagnes in V5: - Rename enum guest_type from is_pv, ... to guest_type_pv, .... Chagnes in V8: - Got to VMCS for DPL check instead of checking the rpl in guest_kernel_mode. Note, we drop the const qualifier from vcpu_show_registers() to accomodate the hvm function call in guest_kernel_mode(). - Also, hvm_kernel_mode is put in hvm.c because it''s called from guest_kernel_mode in regs.h which is a pretty early header include. Hence, we can''t place it in hvm.h like other similar functions. Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> --- xen/arch/x86/debug.c | 2 +- xen/arch/x86/hvm/hvm.c | 8 ++++++++ xen/arch/x86/x86_64/traps.c | 2 +- xen/common/domain.c | 2 +- xen/include/asm-x86/desc.h | 4 +++- xen/include/asm-x86/domain.h | 2 +- xen/include/asm-x86/guest_access.h | 12 ++++++------ xen/include/asm-x86/x86_64/regs.h | 11 +++++++---- xen/include/public/domctl.h | 3 +++ xen/include/xen/sched.h | 21 ++++++++++++++++++--- 10 files changed, 49 insertions(+), 18 deletions(-) diff --git a/xen/arch/x86/debug.c b/xen/arch/x86/debug.c index e67473e..167421d 100644 --- a/xen/arch/x86/debug.c +++ b/xen/arch/x86/debug.c @@ -158,7 +158,7 @@ dbg_rw_guest_mem(dbgva_t addr, dbgbyte_t *buf, int len, struct domain *dp, pagecnt = min_t(long, PAGE_SIZE - (addr & ~PAGE_MASK), len); - mfn = (dp->is_hvm + mfn = (!is_pv_domain(dp) ? 
dbg_hvm_va2mfn(addr, dp, toaddr, &gfn) : dbg_pv_va2mfn(addr, dp, pgd3)); if ( mfn == INVALID_MFN ) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 8284b3b..bac4708 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -4642,6 +4642,14 @@ enum hvm_intblk nhvm_interrupt_blocked(struct vcpu *v) return hvm_funcs.nhvm_intr_blocked(v); } +bool_t hvm_kernel_mode(struct vcpu *v) +{ + struct segment_register seg; + + hvm_get_segment_register(v, x86_seg_ss, &seg); + return (seg.attr.fields.dpl == 0); +} + /* * Local variables: * mode: C diff --git a/xen/arch/x86/x86_64/traps.c b/xen/arch/x86/x86_64/traps.c index 9e0571d..feb50ff 100644 --- a/xen/arch/x86/x86_64/traps.c +++ b/xen/arch/x86/x86_64/traps.c @@ -141,7 +141,7 @@ void show_registers(struct cpu_user_regs *regs) } } -void vcpu_show_registers(const struct vcpu *v) +void vcpu_show_registers(struct vcpu *v) { const struct cpu_user_regs *regs = &v->arch.user_regs; unsigned long crs[8]; diff --git a/xen/common/domain.c b/xen/common/domain.c index 6c264a5..38b1bad 100644 --- a/xen/common/domain.c +++ b/xen/common/domain.c @@ -236,7 +236,7 @@ struct domain *domain_create( goto fail; if ( domcr_flags & DOMCRF_hvm ) - d->is_hvm = 1; + d->guest_type = guest_type_hvm; if ( domid == 0 ) { diff --git a/xen/include/asm-x86/desc.h b/xen/include/asm-x86/desc.h index 354b889..a4d4326 100644 --- a/xen/include/asm-x86/desc.h +++ b/xen/include/asm-x86/desc.h @@ -38,7 +38,9 @@ #ifndef __ASSEMBLY__ -#define GUEST_KERNEL_RPL(d) (is_pv_32bit_domain(d) ? 1 : 3) +/* PVH 32bitfixme : see emulate_gate_op call from do_general_protection */ +#define GUEST_KERNEL_RPL(d) ({ ASSERT(!is_pvh_domain(d)); \ + is_pv_32bit_domain(d) ? 1 : 3; }) /* Fix up the RPL of a guest segment selector. */ #define __fixup_guest_selector(d, sel) \ diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h index c3f9f8e..22a72df 100644 --- a/xen/include/asm-x86/domain.h +++ b/xen/include/asm-x86/domain.h @@ -447,7 +447,7 @@ struct arch_vcpu #define hvm_svm hvm_vcpu.u.svm void vcpu_show_execution_state(struct vcpu *); -void vcpu_show_registers(const struct vcpu *); +void vcpu_show_registers(struct vcpu *); /* Clean up CR4 bits that are not under guest control. */ unsigned long pv_guest_cr4_fixup(const struct vcpu *, unsigned long guest_cr4); diff --git a/xen/include/asm-x86/guest_access.h b/xen/include/asm-x86/guest_access.h index ca700c9..675dda1 100644 --- a/xen/include/asm-x86/guest_access.h +++ b/xen/include/asm-x86/guest_access.h @@ -14,27 +14,27 @@ /* Raw access functions: no type checking. */ #define raw_copy_to_guest(dst, src, len) \ - (is_hvm_vcpu(current) ? \ + (!is_pv_vcpu(current) ? \ copy_to_user_hvm((dst), (src), (len)) : \ copy_to_user((dst), (src), (len))) #define raw_copy_from_guest(dst, src, len) \ - (is_hvm_vcpu(current) ? \ + (!is_pv_vcpu(current) ? \ copy_from_user_hvm((dst), (src), (len)) : \ copy_from_user((dst), (src), (len))) #define raw_clear_guest(dst, len) \ - (is_hvm_vcpu(current) ? \ + (!is_pv_vcpu(current) ? \ clear_user_hvm((dst), (len)) : \ clear_user((dst), (len))) #define __raw_copy_to_guest(dst, src, len) \ - (is_hvm_vcpu(current) ? \ + (!is_pv_vcpu(current) ? \ copy_to_user_hvm((dst), (src), (len)) : \ __copy_to_user((dst), (src), (len))) #define __raw_copy_from_guest(dst, src, len) \ - (is_hvm_vcpu(current) ? \ + (!is_pv_vcpu(current) ? \ copy_from_user_hvm((dst), (src), (len)) : \ __copy_from_user((dst), (src), (len))) #define __raw_clear_guest(dst, len) \ - (is_hvm_vcpu(current) ? \ + (!is_pv_vcpu(current) ? 
\ clear_user_hvm((dst), (len)) : \ clear_user((dst), (len))) diff --git a/xen/include/asm-x86/x86_64/regs.h b/xen/include/asm-x86/x86_64/regs.h index 3cdc702..63cb51a 100644 --- a/xen/include/asm-x86/x86_64/regs.h +++ b/xen/include/asm-x86/x86_64/regs.h @@ -10,10 +10,13 @@ #define ring_2(r) (((r)->cs & 3) == 2) #define ring_3(r) (((r)->cs & 3) == 3) -#define guest_kernel_mode(v, r) \ - (!is_pv_32bit_vcpu(v) ? \ - (ring_3(r) && ((v)->arch.flags & TF_kernel_mode)) : \ - (ring_1(r))) +bool_t hvm_kernel_mode(struct vcpu *); + +#define guest_kernel_mode(v, r) \ + (is_pvh_vcpu(v) ? (hvm_kernel_mode(v)) : \ + (!is_pv_32bit_vcpu(v) ? \ + (ring_3(r) && ((v)->arch.flags & TF_kernel_mode)) : \ + (ring_1(r)))) #define permit_softint(dpl, v, r) \ ((dpl) >= (guest_kernel_mode(v, r) ? 1 : 3)) diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h index 4c5b2bb..6b1aa11 100644 --- a/xen/include/public/domctl.h +++ b/xen/include/public/domctl.h @@ -89,6 +89,9 @@ struct xen_domctl_getdomaininfo { /* Being debugged. */ #define _XEN_DOMINF_debugged 6 #define XEN_DOMINF_debugged (1U<<_XEN_DOMINF_debugged) +/* domain is PVH */ +#define _XEN_DOMINF_pvh_guest 7 +#define XEN_DOMINF_pvh_guest (1U<<_XEN_DOMINF_pvh_guest) /* XEN_DOMINF_shutdown guest-supplied code. */ #define XEN_DOMINF_shutdownmask 255 #define XEN_DOMINF_shutdownshift 16 diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index ae6a3b8..2d48d22 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -238,6 +238,14 @@ struct mem_event_per_domain struct mem_event_domain access; }; +/* + * PVH is a PV guest running in an HVM container. While is_hvm_* checks are + * false for it, it uses many of the HVM data structs. + */ +enum guest_type { + guest_type_pv, guest_type_pvh, guest_type_hvm +}; + struct domain { domid_t domain_id; @@ -285,8 +293,8 @@ struct domain struct rangeset *iomem_caps; struct rangeset *irq_caps; - /* Is this an HVM guest? */ - bool_t is_hvm; + enum guest_type guest_type; + #ifdef HAS_PASSTHROUGH /* Does this guest need iommu mappings? */ bool_t need_iommu; @@ -464,6 +472,9 @@ struct domain *domain_create( /* DOMCRF_oos_off: dont use out-of-sync optimization for shadow page tables */ #define _DOMCRF_oos_off 4 #define DOMCRF_oos_off (1U<<_DOMCRF_oos_off) + /* DOMCRF_pvh: Create PV domain in HVM container. */ +#define _DOMCRF_pvh 5 +#define DOMCRF_pvh (1U<<_DOMCRF_pvh) /* * rcu_lock_domain_by_id() is more efficient than get_domain_by_id(). @@ -732,8 +743,12 @@ void watchdog_domain_destroy(struct domain *d); #define VM_ASSIST(_d,_t) (test_bit((_t), &(_d)->vm_assist)) -#define is_hvm_domain(d) ((d)->is_hvm) +#define is_pv_domain(d) ((d)->guest_type == guest_type_pv) +#define is_pv_vcpu(v) (is_pv_domain((v)->domain)) +#define is_hvm_domain(d) ((d)->guest_type == guest_type_hvm) #define is_hvm_vcpu(v) (is_hvm_domain(v->domain)) +#define is_pvh_domain(d) ((d)->guest_type == guest_type_pvh) +#define is_pvh_vcpu(v) (is_pvh_domain((v)->domain)) #define is_pinned_vcpu(v) ((v)->domain->is_pinned || \ cpumask_weight((v)->cpu_affinity) == 1) #ifdef HAS_PASSTHROUGH -- 1.7.2.3
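An illustration, not from the patch, of how the three-way model is meant to
be used: exactly one predicate holds per domain, and PVH answers "no" to
both is_pv_* and is_hvm_*, so existing fast paths must opt in explicitly:

    /* Demo of the new predicates introduced above. */
    static const char *guest_type_name(const struct domain *d)
    {
        if ( is_pv_domain(d) )
            return "pv";
        if ( is_pvh_domain(d) )
            return "pvh";
        ASSERT(is_hvm_domain(d));
        return "hvm";
    }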
Mukesh Rathor
2013-Jul-18 02:32 UTC
[PATCH 10/24] PVH xen: introduce pvh_set_vcpu_info() and vmx_pvh_set_vcpu_info()
vmx_pvh_set_vcpu_info() is added to a new file pvh.c, to which more changes are added later, like pvh vmexit handler. Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> --- xen/arch/x86/hvm/vmx/Makefile | 1 + xen/arch/x86/hvm/vmx/pvh.c | 77 +++++++++++++++++++++++++++++++++++++ xen/arch/x86/hvm/vmx/vmx.c | 1 + xen/include/asm-x86/hvm/hvm.h | 8 ++++ xen/include/asm-x86/hvm/vmx/vmx.h | 1 + 5 files changed, 88 insertions(+), 0 deletions(-) create mode 100644 xen/arch/x86/hvm/vmx/pvh.c diff --git a/xen/arch/x86/hvm/vmx/Makefile b/xen/arch/x86/hvm/vmx/Makefile index 373b3d9..59fb5d4 100644 --- a/xen/arch/x86/hvm/vmx/Makefile +++ b/xen/arch/x86/hvm/vmx/Makefile @@ -1,5 +1,6 @@ obj-bin-y += entry.o obj-y += intr.o +obj-y += pvh.o obj-y += realmode.o obj-y += vmcs.o obj-y += vmx.o diff --git a/xen/arch/x86/hvm/vmx/pvh.c b/xen/arch/x86/hvm/vmx/pvh.c new file mode 100644 index 0000000..8638850 --- /dev/null +++ b/xen/arch/x86/hvm/vmx/pvh.c @@ -0,0 +1,77 @@ +/* + * Copyright (C) 2013, Mukesh Rathor, Oracle Corp. All rights reserved. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public + * License v2 as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + */ + +#include <xen/hypercall.h> +#include <xen/guest_access.h> +#include <asm/p2m.h> +#include <asm/traps.h> +#include <asm/hvm/vmx/vmx.h> +#include <public/sched.h> +#include <asm/hvm/nestedhvm.h> +#include <asm/xstate.h> + +/* + * Set vmcs fields in support of vcpu_op -> VCPUOP_initialise hcall. Called + * from arch_set_info_guest() which sets the (PVH relevant) non-vmcs fields. + * + * In case of linux: + * The boot vcpu calls this to set some context for the non boot smp vcpu. + * The call comes from cpu_initialize_context(). (boot vcpu 0 context is + * set by the tools via do_domctl -> vcpu_initialise). + * + * NOTE: In case of VMCS, loading a selector doesn''t cause the hidden fields + * to be automatically loaded. We load selectors here but not the hidden + * parts. 
This means we require the guest to have same hidden values + * as the default values loaded in the vmcs in pvh_construct_vmcs(), ie, + * the GDT the vcpu is coming up on should be something like following + * on linux (for 64bit, CS:0x10 DS/SS:0x18) : + * + * ffff88007f704000: 0000000000000000 00cf9b000000ffff + * ffff88007f704010: 00af9b000000ffff 00cf93000000ffff + * ffff88007f704020: 00cffb000000ffff 00cff3000000ffff + * + */ +int vmx_pvh_set_vcpu_info(struct vcpu *v, struct vcpu_guest_context *ctxtp) +{ + if ( v->vcpu_id == 0 ) + return 0; + + if ( !(ctxtp->flags & VGCF_in_kernel) ) + return -EINVAL; + + vmx_vmcs_enter(v); + __vmwrite(GUEST_GDTR_BASE, ctxtp->gdt.pvh.addr); + __vmwrite(GUEST_GDTR_LIMIT, ctxtp->gdt.pvh.limit); + __vmwrite(GUEST_LDTR_BASE, ctxtp->ldt_base); + __vmwrite(GUEST_LDTR_LIMIT, ctxtp->ldt_ents); + + __vmwrite(GUEST_FS_BASE, ctxtp->fs_base); + __vmwrite(GUEST_GS_BASE, ctxtp->gs_base_user); + + __vmwrite(GUEST_CS_SELECTOR, ctxtp->user_regs.cs); + __vmwrite(GUEST_SS_SELECTOR, ctxtp->user_regs.ss); + __vmwrite(GUEST_ES_SELECTOR, ctxtp->user_regs.es); + __vmwrite(GUEST_DS_SELECTOR, ctxtp->user_regs.ds); + __vmwrite(GUEST_FS_SELECTOR, ctxtp->user_regs.fs); + __vmwrite(GUEST_GS_SELECTOR, ctxtp->user_regs.gs); + + if ( vmx_add_guest_msr(MSR_SHADOW_GS_BASE) ) + { + vmx_vmcs_exit(v); + return -EINVAL; + } + vmx_write_guest_msr(MSR_SHADOW_GS_BASE, ctxtp->gs_base_kernel); + + vmx_vmcs_exit(v); + return 0; +} diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index 195f9ed..faf8b46 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -1558,6 +1558,7 @@ static struct hvm_function_table __initdata vmx_function_table = { .deliver_posted_intr = vmx_deliver_posted_intr, .sync_pir_to_irr = vmx_sync_pir_to_irr, .nhvm_hap_walk_L1_p2m = nvmx_hap_walk_L1_p2m, + .pvh_set_vcpu_info = vmx_pvh_set_vcpu_info, }; const struct hvm_function_table * __init start_vmx(void) diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h index 8408420..aee95f4 100644 --- a/xen/include/asm-x86/hvm/hvm.h +++ b/xen/include/asm-x86/hvm/hvm.h @@ -192,6 +192,8 @@ struct hvm_function_table { paddr_t *L1_gpa, unsigned int *page_order, uint8_t *p2m_acc, bool_t access_r, bool_t access_w, bool_t access_x); + + int (*pvh_set_vcpu_info)(struct vcpu *v, struct vcpu_guest_context *ctxtp); }; extern struct hvm_function_table hvm_funcs; @@ -325,6 +327,12 @@ static inline unsigned long hvm_get_shadow_gs_base(struct vcpu *v) return hvm_funcs.get_shadow_gs_base(v); } +static inline int pvh_set_vcpu_info(struct vcpu *v, + struct vcpu_guest_context *ctxtp) +{ + return hvm_funcs.pvh_set_vcpu_info(v, ctxtp); +} + #define is_viridian_domain(_d) \ (is_hvm_domain(_d) && ((_d)->arch.hvm_domain.params[HVM_PARAM_VIRIDIAN])) diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h index c21a303..9e6c481 100644 --- a/xen/include/asm-x86/hvm/vmx/vmx.h +++ b/xen/include/asm-x86/hvm/vmx/vmx.h @@ -473,6 +473,7 @@ void vmx_update_guest_eip(void); void vmx_dr_access(unsigned long exit_qualification, struct cpu_user_regs *regs); void vmx_fpu_enter(struct vcpu *v); +int vmx_pvh_set_vcpu_info(struct vcpu *v, struct vcpu_guest_context *ctxtp); int alloc_p2m_hap_data(struct p2m_domain *p2m); void free_p2m_hap_data(struct p2m_domain *p2m); -- 1.7.2.3
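A hedged sketch of the guest-side counterpart the comment above describes:
a PVH boot vcpu bringing up a secondary vcpu, in the spirit of Linux's
cpu_initialize_context(). The selector values are the 64-bit Linux layout
quoted in the comment (CS=0x10, DS/SS=0x18); the helper name and the GDT
address/limit parameters are illustrative:

    /* Hypothetical guest-side bring-up of a secondary PVH vcpu. */
    static int pvh_bringup_vcpu(int cpu, struct vcpu_guest_context *ctx,
                                uint64_t gdt_vaddr, uint16_t gdt_limit)
    {
        memset(ctx, 0, sizeof(*ctx));
        ctx->flags = VGCF_in_kernel;       /* rejected above if missing */
        ctx->gdt.pvh.addr  = gdt_vaddr;
        ctx->gdt.pvh.limit = gdt_limit;
        ctx->user_regs.cs = 0x10;
        ctx->user_regs.ds = ctx->user_regs.es = 0x18;
        ctx->user_regs.ss = 0x18;
        /* ... rip/rsp, gs_base_kernel (percpu base), etc. ... */
        return HYPERVISOR_vcpu_op(VCPUOP_initialise, cpu, ctx);
    }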
Mukesh Rathor
2013-Jul-18 02:32 UTC
[PATCH 11/24] PVH xen: domain create, scheduler related code changes
This patch mostly contains changes to arch/x86/domain.c to allow for a PVH domain creation. The new function pvh_set_vcpu_info(), introduced in the previous patch, is called here to set some guest context in the VMCS. This patch also changes the context_switch code in the same file to follow HVM behaviour for PVH. Changes in V2: - changes to read_segment_register() moved to this patch. - The other comment was to create NULL functions for pvh_set_vcpu_info and pvh_read_descriptor which are implemented in later patch, but since I disable PVH creation until all patches are checked in, it is not needed. But it helps breaking down of patches. Changes in V3: - Fix read_segment_register() macro to make sure args are evaluated once, and use # instead of STR for name in the macro. Changes in V4: - Remove pvh substruct in the hvm substruct, as the vcpu_info_mfn has been moved out of pv_vcpu struct. - rename hvm_pvh_* functions to hvm_*. Changes in V5: - remove pvh_read_descriptor(). Changes in V7: - remove hap_update_cr3() and read_segment_register changes from here. Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> --- xen/arch/x86/domain.c | 56 ++++++++++++++++++++++++++++++++---------------- xen/arch/x86/mm.c | 3 ++ 2 files changed, 40 insertions(+), 19 deletions(-) diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index c361abf..fccb4ee 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -385,7 +385,7 @@ int vcpu_initialise(struct vcpu *v) vmce_init_vcpu(v); - if ( is_hvm_domain(d) ) + if ( !is_pv_domain(d) ) { rc = hvm_vcpu_initialise(v); goto done; @@ -452,7 +452,7 @@ void vcpu_destroy(struct vcpu *v) vcpu_destroy_fpu(v); - if ( is_hvm_vcpu(v) ) + if ( !is_pv_vcpu(v) ) hvm_vcpu_destroy(v); else xfree(v->arch.pv_vcpu.trap_ctxt); @@ -464,7 +464,7 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags) int rc = -ENOMEM; d->arch.hvm_domain.hap_enabled - is_hvm_domain(d) && + !is_pv_domain(d) && hvm_funcs.hap_supported && (domcr_flags & DOMCRF_hap); d->arch.hvm_domain.mem_sharing_enabled = 0; @@ -512,7 +512,7 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags) mapcache_domain_init(d); HYPERVISOR_COMPAT_VIRT_START(d) - is_hvm_domain(d) ? ~0u : __HYPERVISOR_COMPAT_VIRT_START; + is_pv_domain(d) ? __HYPERVISOR_COMPAT_VIRT_START : ~0u; if ( (rc = paging_domain_init(d, domcr_flags)) != 0 ) goto fail; @@ -555,7 +555,7 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags) } spin_lock_init(&d->arch.e820_lock); - if ( is_hvm_domain(d) ) + if ( !is_pv_domain(d) ) { if ( (rc = hvm_domain_initialise(d)) != 0 ) { @@ -651,7 +651,7 @@ int arch_set_info_guest( #define c(fld) (compat ? 
(c.cmp->fld) : (c.nat->fld)) flags = c(flags); - if ( !is_hvm_vcpu(v) ) + if ( is_pv_vcpu(v) ) { if ( !compat ) { @@ -704,7 +704,7 @@ int arch_set_info_guest( v->fpu_initialised = !!(flags & VGCF_I387_VALID); v->arch.flags &= ~TF_kernel_mode; - if ( (flags & VGCF_in_kernel) || is_hvm_vcpu(v)/*???*/ ) + if ( (flags & VGCF_in_kernel) || !is_pv_vcpu(v)/*???*/ ) v->arch.flags |= TF_kernel_mode; v->arch.vgc_flags = flags; @@ -719,7 +719,7 @@ int arch_set_info_guest( if ( !compat ) { memcpy(&v->arch.user_regs, &c.nat->user_regs, sizeof(c.nat->user_regs)); - if ( !is_hvm_vcpu(v) ) + if ( is_pv_vcpu(v) ) memcpy(v->arch.pv_vcpu.trap_ctxt, c.nat->trap_ctxt, sizeof(c.nat->trap_ctxt)); } @@ -735,10 +735,13 @@ int arch_set_info_guest( v->arch.user_regs.eflags |= 2; - if ( is_hvm_vcpu(v) ) + if ( !is_pv_vcpu(v) ) { hvm_set_info_guest(v); - goto out; + if ( is_hvm_vcpu(v) || v->is_initialised ) + goto out; + else + goto pvh_skip_pv_stuff; } init_int80_direct_trap(v); @@ -853,6 +856,7 @@ int arch_set_info_guest( set_bit(_VPF_in_reset, &v->pause_flags); + pvh_skip_pv_stuff: if ( !compat ) cr3_gfn = xen_cr3_to_pfn(c.nat->ctrlreg[3]); else @@ -861,7 +865,7 @@ int arch_set_info_guest( if ( !cr3_page ) rc = -EINVAL; - else if ( paging_mode_refcounts(d) ) + else if ( paging_mode_refcounts(d) || is_pvh_vcpu(v) ) /* nothing */; else if ( cr3_page == v->arch.old_guest_table ) { @@ -893,8 +897,15 @@ int arch_set_info_guest( /* handled below */; else if ( !compat ) { + /* PVH 32bitfixme. */ + if ( is_pvh_vcpu(v) ) + { + v->arch.cr3 = page_to_mfn(cr3_page); + v->arch.hvm_vcpu.guest_cr[3] = c.nat->ctrlreg[3]; + } + v->arch.guest_table = pagetable_from_page(cr3_page); - if ( c.nat->ctrlreg[1] ) + if ( c.nat->ctrlreg[1] && !is_pvh_vcpu(v) ) { cr3_gfn = xen_cr3_to_pfn(c.nat->ctrlreg[1]); cr3_page = get_page_from_gfn(d, cr3_gfn, NULL, P2M_ALLOC); @@ -954,6 +965,13 @@ int arch_set_info_guest( update_cr3(v); + if ( is_pvh_vcpu(v) ) + { + /* Set VMCS fields. */ + if ( (rc = pvh_set_vcpu_info(v, c.nat)) != 0 ) + return rc; + } + out: if ( flags & VGCF_online ) clear_bit(_VPF_down, &v->pause_flags); @@ -1315,7 +1333,7 @@ static void update_runstate_area(struct vcpu *v) static inline int need_full_gdt(struct vcpu *v) { - return (!is_hvm_vcpu(v) && !is_idle_vcpu(v)); + return (is_pv_vcpu(v) && !is_idle_vcpu(v)); } static void __context_switch(void) @@ -1450,7 +1468,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next) /* Re-enable interrupts before restoring state which may fault. */ local_irq_enable(); - if ( !is_hvm_vcpu(next) ) + if ( is_pv_vcpu(next) ) { load_LDT(next); load_segments(next); @@ -1576,12 +1594,12 @@ unsigned long hypercall_create_continuation( regs->eax = op; /* Ensure the hypercall trap instruction is re-executed. */ - if ( !is_hvm_vcpu(current) ) + if ( is_pv_vcpu(current) ) regs->eip -= 2; /* re-execute ''syscall'' / ''int $xx'' */ else current->arch.hvm_vcpu.hcall_preempted = 1; - if ( !is_hvm_vcpu(current) ? + if ( is_pv_vcpu(current) ? 
!is_pv_32on64_vcpu(current) : (hvm_guest_x86_mode(current) == 8) ) { @@ -1849,7 +1867,7 @@ int domain_relinquish_resources(struct domain *d) return ret; } - if ( !is_hvm_domain(d) ) + if ( is_pv_domain(d) ) { for_each_vcpu ( d, v ) { @@ -1922,7 +1940,7 @@ int domain_relinquish_resources(struct domain *d) BUG(); } - if ( is_hvm_domain(d) ) + if ( !is_pv_domain(d) ) hvm_domain_relinquish_resources(d); return 0; @@ -2006,7 +2024,7 @@ void vcpu_mark_events_pending(struct vcpu *v) if ( already_pending ) return; - if ( is_hvm_vcpu(v) ) + if ( !is_pv_vcpu(v) ) hvm_assert_evtchn_irq(v); else vcpu_kick(v); diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c index e980431..de7ba45 100644 --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -4334,6 +4334,9 @@ void destroy_gdt(struct vcpu *v) int i; unsigned long pfn; + if ( is_pvh_vcpu(v) ) + return; + v->arch.pv_vcpu.gdt_ents = 0; pl1e = gdt_ldt_ptes(v->domain, v); for ( i = 0; i < FIRST_RESERVED_GDT_PAGE; i++ ) -- 1.7.2.3
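Distilled, the arch_set_info_guest() flow after this patch looks roughly
like the pseudologic below (simplified, not the literal function): HVM
returns early, PV takes the full trap/GDT/LDT path, and a first-time PVH
vcpu skips the PV context but still picks up cr3 and then programs the
VMCS through the hook from patch 10:

    /* Distilled pseudologic of arch_set_info_guest() (illustrative). */
    static int set_info_guest_flow(struct vcpu *v, vcpu_guest_context_u c)
    {
        if ( !is_pv_vcpu(v) )
        {
            hvm_set_info_guest(v);
            if ( is_hvm_vcpu(v) || v->is_initialised )
                return 0;             /* HVM (or re-init): done here */
            /* first-time PVH vcpu: fall through to cr3 + VMCS setup */
        }
        /* ... PV-only trap_ctxt/GDT/LDT handling, skipped for PVH ... */
        /* ... cr3_page lookup and update_cr3(v) for PV and PVH ...    */
        if ( is_pvh_vcpu(v) )
            return pvh_set_vcpu_info(v, c.nat);  /* program the VMCS */
        return 0;
    }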
Mukesh Rathor
2013-Jul-18 02:32 UTC
[PATCH 12/24] PVH xen: support invalid op emulation for PVH
This patch supports invalid op emulation for PVH by calling appropriate
copy macros and an HVM function to inject PF.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/arch/x86/traps.c        | 17 ++++++++++++++---
 xen/include/asm-x86/traps.h |  1 +
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 378ef0a..a3ca70b 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -459,6 +459,11 @@ static void instruction_done(
     struct cpu_user_regs *regs, unsigned long eip, unsigned int bpmatch)
 {
     regs->eip = eip;
+
+    /* PVH fixme: debug trap below */
+    if ( is_pvh_vcpu(current) )
+        return;
+
     regs->eflags &= ~X86_EFLAGS_RF;
     if ( bpmatch || (regs->eflags & X86_EFLAGS_TF) )
     {
@@ -913,7 +918,7 @@ static int emulate_invalid_rdtscp(struct cpu_user_regs *regs)
     return EXCRET_fault_fixed;
 }

-static int emulate_forced_invalid_op(struct cpu_user_regs *regs)
+int emulate_forced_invalid_op(struct cpu_user_regs *regs)
 {
     char sig[5], instr[2];
     unsigned long eip, rc;
@@ -921,7 +926,7 @@ static int emulate_forced_invalid_op(struct cpu_user_regs *regs)
     eip = regs->eip;

     /* Check for forced emulation signature: ud2 ; .ascii "xen". */
-    if ( (rc = copy_from_user(sig, (char *)eip, sizeof(sig))) != 0 )
+    if ( (rc = raw_copy_from_guest(sig, (char *)eip, sizeof(sig))) != 0 )
     {
         propagate_page_fault(eip + sizeof(sig) - rc, 0);
         return EXCRET_fault_fixed;
@@ -931,7 +936,7 @@ static int emulate_forced_invalid_op(struct cpu_user_regs *regs)
     eip += sizeof(sig);

     /* We only emulate CPUID. */
-    if ( ( rc = copy_from_user(instr, (char *)eip, sizeof(instr))) != 0 )
+    if ( ( rc = raw_copy_from_guest(instr, (char *)eip, sizeof(instr))) != 0 )
     {
         propagate_page_fault(eip + sizeof(instr) - rc, 0);
         return EXCRET_fault_fixed;
@@ -1076,6 +1081,12 @@ void propagate_page_fault(unsigned long addr, u16 error_code)
     struct vcpu *v = current;
     struct trap_bounce *tb = &v->arch.pv_vcpu.trap_bounce;

+    if ( is_pvh_vcpu(v) )
+    {
+        hvm_inject_page_fault(error_code, addr);
+        return;
+    }
+
     v->arch.pv_vcpu.ctrlreg[2] = addr;
     arch_set_cr2(v, addr);

diff --git a/xen/include/asm-x86/traps.h b/xen/include/asm-x86/traps.h
index 82cbcee..1d9b087 100644
--- a/xen/include/asm-x86/traps.h
+++ b/xen/include/asm-x86/traps.h
@@ -48,5 +48,6 @@ extern int guest_has_trap_callback(struct domain *d, uint16_t vcpuid,
  */
 extern int send_guest_trap(struct domain *d, uint16_t vcpuid,
                            unsigned int trap_nr);
+int emulate_forced_invalid_op(struct cpu_user_regs *regs);

 #endif /* ASM_TRAP_H */
--
1.7.2.3
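The guest-side sequence this path decodes is the classic PV forced-emulation
idiom: a ud2 followed by the ASCII bytes "xen", then the instruction to
emulate. The sig[5]/instr[2] checks above match it byte for byte; a sketch
of the guest inline asm (cf. the XEN_EMULATE_PREFIX used by PV Linux):

    /* Guest-side forced-emulation CPUID: "ud2a; .ascii 'xen'" is the
     * 5-byte signature checked above, cpuid the 2-byte instruction. */
    static inline void xen_forced_cpuid(uint32_t leaf, uint32_t *a,
                                        uint32_t *b, uint32_t *c,
                                        uint32_t *d)
    {
        asm volatile ( "ud2a ; .ascii \"xen\" ; cpuid"
                       : "=a" (*a), "=b" (*b), "=c" (*c), "=d" (*d)
                       : "0" (leaf) );
    }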
Mukesh Rathor
2013-Jul-18 02:32 UTC
[PATCH 13/24] PVH xen: Support privileged op emulation for PVH
This patch changes mostly traps.c to support privileged op emulation for PVH. A new function read_descriptor_sel() is introduced to read descriptor for PVH given a selector. Another new function vmx_read_selector() reads a selector from VMCS, to support read_segment_register() for PVH. Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> --- xen/arch/x86/hvm/vmx/vmx.c | 40 +++++++++++++++++++ xen/arch/x86/traps.c | 86 +++++++++++++++++++++++++++++++++++----- xen/include/asm-x86/hvm/hvm.h | 7 +++ xen/include/asm-x86/system.h | 19 +++++++-- 4 files changed, 137 insertions(+), 15 deletions(-) diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index faf8b46..9be321d 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -664,6 +664,45 @@ static void vmx_ctxt_switch_to(struct vcpu *v) .fields = { .type = 0xb, .s = 0, .dpl = 0, .p = 1, .avl = 0, \ .l = 0, .db = 0, .g = 0, .pad = 0 } }).bytes) +u16 vmx_read_selector(struct vcpu *v, enum x86_segment seg) +{ + u16 sel = 0; + + vmx_vmcs_enter(v); + switch ( seg ) + { + case x86_seg_cs: + sel = __vmread(GUEST_CS_SELECTOR); + break; + + case x86_seg_ss: + sel = __vmread(GUEST_SS_SELECTOR); + break; + + case x86_seg_es: + sel = __vmread(GUEST_ES_SELECTOR); + break; + + case x86_seg_ds: + sel = __vmread(GUEST_DS_SELECTOR); + break; + + case x86_seg_fs: + sel = __vmread(GUEST_FS_SELECTOR); + break; + + case x86_seg_gs: + sel = __vmread(GUEST_GS_SELECTOR); + break; + + default: + BUG(); + } + vmx_vmcs_exit(v); + + return sel; +} + void vmx_get_segment_register(struct vcpu *v, enum x86_segment seg, struct segment_register *reg) { @@ -1559,6 +1598,7 @@ static struct hvm_function_table __initdata vmx_function_table = { .sync_pir_to_irr = vmx_sync_pir_to_irr, .nhvm_hap_walk_L1_p2m = nvmx_hap_walk_L1_p2m, .pvh_set_vcpu_info = vmx_pvh_set_vcpu_info, + .read_selector = vmx_read_selector, }; const struct hvm_function_table * __init start_vmx(void) diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c index a3ca70b..fe8b94c 100644 --- a/xen/arch/x86/traps.c +++ b/xen/arch/x86/traps.c @@ -480,6 +480,10 @@ static unsigned int check_guest_io_breakpoint(struct vcpu *v, unsigned int width, i, match = 0; unsigned long start; + /* PVH fixme: support io breakpoint. */ + if ( is_pvh_vcpu(v) ) + return 0; + if ( !(v->arch.debugreg[5]) || !(v->arch.pv_vcpu.ctrlreg[4] & X86_CR4_DE) ) return 0; @@ -1525,6 +1529,49 @@ static int read_descriptor(unsigned int sel, return 1; } +static int read_descriptor_sel(unsigned int sel, + enum x86_segment which_sel, + struct vcpu *v, + const struct cpu_user_regs *regs, + unsigned long *base, + unsigned long *limit, + unsigned int *ar, + unsigned int vm86attr) +{ + struct segment_register seg; + bool_t long_mode; + + if ( !is_pvh_vcpu(v) ) + return read_descriptor(sel, v, regs, base, limit, ar, vm86attr); + + hvm_get_segment_register(v, x86_seg_cs, &seg); + long_mode = seg.attr.fields.l; + + if ( which_sel != x86_seg_cs ) + hvm_get_segment_register(v, which_sel, &seg); + + /* "ar" is returned packed as in segment_attributes_t. Fix it up. 
*/ + *ar = seg.attr.bytes; + *ar = (*ar & 0xff ) | ((*ar & 0xf00) << 4); + *ar <<= 8; + + if ( long_mode ) + { + *limit = ~0UL; + + if ( which_sel < x86_seg_fs ) + { + *base = 0UL; + return 1; + } + } + else + *limit = seg.limit; + + *base = seg.base; + return 1; +} + static int read_gate_descriptor(unsigned int gate_sel, const struct vcpu *v, unsigned int *sel,
@@ -1590,6 +1637,13 @@ static int guest_io_okay( int user_mode = !(v->arch.flags & TF_kernel_mode); #define TOGGLE_MODE() if ( user_mode ) toggle_guest_mode(v) + /* + * For PVH we check this in vmexit for EXIT_REASON_IO_INSTRUCTION + * and so don't need to check again here. + */ + if ( is_pvh_vcpu(v) ) + return 1; + if ( !vm86_mode(regs) && (v->arch.pv_vcpu.iopl >= (guest_kernel_mode(v, regs) ? 1 : 3)) ) return 1;
@@ -1835,7 +1889,7 @@ static inline uint64_t guest_misc_enable(uint64_t val) _ptr = (unsigned int)_ptr; \ if ( (limit) < sizeof(_x) - 1 || (eip) > (limit) - (sizeof(_x) - 1) ) \ goto fail; \ - if ( (_rc = copy_from_user(&_x, (type *)_ptr, sizeof(_x))) != 0 ) \ + if ( (_rc = raw_copy_from_guest(&_x, (type *)_ptr, sizeof(_x))) != 0 ) \ { \ propagate_page_fault(_ptr + sizeof(_x) - _rc, 0); \ goto skip; \
@@ -1852,6 +1906,7 @@ static int is_cpufreq_controller(struct domain *d) static int emulate_privileged_op(struct cpu_user_regs *regs) { + enum x86_segment which_sel; struct vcpu *v = current; unsigned long *reg, eip = regs->eip; u8 opcode, modrm_reg = 0, modrm_rm = 0, rep_prefix = 0, lock = 0, rex = 0;
@@ -1874,9 +1929,10 @@ static int emulate_privileged_op(struct cpu_user_regs *regs) void (*io_emul)(struct cpu_user_regs *) __attribute__((__regparm__(1))); uint64_t val, msr_content; - if ( !read_descriptor(regs->cs, v, regs, - &code_base, &code_limit, &ar, - _SEGMENT_CODE|_SEGMENT_S|_SEGMENT_DPL|_SEGMENT_P) ) + if ( !read_descriptor_sel(regs->cs, x86_seg_cs, v, regs, + &code_base, &code_limit, &ar, + _SEGMENT_CODE|_SEGMENT_S| + _SEGMENT_DPL|_SEGMENT_P) ) goto fail; op_default = op_bytes = (ar & (_SEGMENT_L|_SEGMENT_DB)) ? 4 : 2; ad_default = ad_bytes = (ar & _SEGMENT_L) ? 8 : op_default;
@@ -1887,6 +1943,7 @@ static int emulate_privileged_op(struct cpu_user_regs *regs) /* emulating only opcodes not allowing SS to be default */ data_sel = read_segment_register(v, regs, ds); + which_sel = x86_seg_ds; /* Legacy prefixes. 
*/ for ( i = 0; i < 8; i++, rex == opcode || (rex = 0) ) @@ -1902,23 +1959,29 @@ static int emulate_privileged_op(struct cpu_user_regs *regs) continue; case 0x2e: /* CS override */ data_sel = regs->cs; + which_sel = x86_seg_cs; continue; case 0x3e: /* DS override */ data_sel = read_segment_register(v, regs, ds); + which_sel = x86_seg_ds; continue; case 0x26: /* ES override */ data_sel = read_segment_register(v, regs, es); + which_sel = x86_seg_es; continue; case 0x64: /* FS override */ data_sel = read_segment_register(v, regs, fs); + which_sel = x86_seg_fs; lm_ovr = lm_seg_fs; continue; case 0x65: /* GS override */ data_sel = read_segment_register(v, regs, gs); + which_sel = x86_seg_gs; lm_ovr = lm_seg_gs; continue; case 0x36: /* SS override */ data_sel = regs->ss; + which_sel = x86_seg_ss; continue; case 0xf0: /* LOCK */ lock = 1; @@ -1962,15 +2025,16 @@ static int emulate_privileged_op(struct cpu_user_regs *regs) if ( !(opcode & 2) ) { data_sel = read_segment_register(v, regs, es); + which_sel = x86_seg_es; lm_ovr = lm_seg_none; } if ( !(ar & _SEGMENT_L) ) { - if ( !read_descriptor(data_sel, v, regs, - &data_base, &data_limit, &ar, - _SEGMENT_WR|_SEGMENT_S|_SEGMENT_DPL| - _SEGMENT_P) ) + if ( !read_descriptor_sel(data_sel, which_sel, v, regs, + &data_base, &data_limit, &ar, + _SEGMENT_WR|_SEGMENT_S|_SEGMENT_DPL| + _SEGMENT_P) ) goto fail; if ( !(ar & _SEGMENT_S) || !(ar & _SEGMENT_P) || @@ -2000,9 +2064,9 @@ static int emulate_privileged_op(struct cpu_user_regs *regs) } } else - read_descriptor(data_sel, v, regs, - &data_base, &data_limit, &ar, - 0); + read_descriptor_sel(data_sel, which_sel, v, regs, + &data_base, &data_limit, &ar, + 0); data_limit = ~0UL; ar = _SEGMENT_WR|_SEGMENT_S|_SEGMENT_DPL|_SEGMENT_P; } diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h index aee95f4..51ab230 100644 --- a/xen/include/asm-x86/hvm/hvm.h +++ b/xen/include/asm-x86/hvm/hvm.h @@ -194,6 +194,8 @@ struct hvm_function_table { bool_t access_w, bool_t access_x); int (*pvh_set_vcpu_info)(struct vcpu *v, struct vcpu_guest_context *ctxtp); + + u16 (*read_selector)(struct vcpu *v, enum x86_segment seg); }; extern struct hvm_function_table hvm_funcs; @@ -333,6 +335,11 @@ static inline int pvh_set_vcpu_info(struct vcpu *v, return hvm_funcs.pvh_set_vcpu_info(v, ctxtp); } +static inline u16 pvh_get_selector(struct vcpu *v, enum x86_segment seg) +{ + return hvm_funcs.read_selector(v, seg); +} + #define is_viridian_domain(_d) \ (is_hvm_domain(_d) && ((_d)->arch.hvm_domain.params[HVM_PARAM_VIRIDIAN])) diff --git a/xen/include/asm-x86/system.h b/xen/include/asm-x86/system.h index 9bb22cb..1242657 100644 --- a/xen/include/asm-x86/system.h +++ b/xen/include/asm-x86/system.h @@ -4,10 +4,21 @@ #include <xen/lib.h> #include <xen/bitops.h> -#define read_segment_register(vcpu, regs, name) \ -({ u16 __sel; \ - asm volatile ( "movw %%" STR(name) ",%0" : "=r" (__sel) ); \ - __sel; \ +/* + * We need vcpu because during context switch, going from PV to PVH, + * in save_segments() current has been updated to next, and no longer pointing + * to the PV, but the intention is to get selector for the PV. Checking + * is_pvh_vcpu(current) will yield incorrect results in such a case. + */ +#define read_segment_register(vcpu, regs, name) \ +({ u16 __sel; \ + struct cpu_user_regs *_regs = (regs); \ + \ + if ( is_pvh_vcpu(vcpu) && guest_mode(_regs) ) \ + __sel = pvh_get_selector(vcpu, x86_seg_##name); \ + else \ + asm volatile ( "movw %%" #name ",%0" : "=r" (__sel) ); \ + __sel; \ }) #define wbinvd() \ -- 1.7.2.3
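As a worked example of the attribute repacking in read_descriptor_sel() above (a standalone sketch, not part of the patch): the packed segment_attributes_t value for a 64-bit code segment is 0xa9b, and the two shift steps rebuild the layout read_descriptor() returns, with the attribute byte at bits 8-15 and the avl/l/db/g nibble at bits 20-23:

#include <assert.h>

int main(void)
{
    unsigned int ar = 0xa9b;  /* type=0xb, s=1, dpl=0, p=1, l=1, g=1 */

    ar = (ar & 0xff) | ((ar & 0xf00) << 4); /* move avl/l/db/g up a nibble */
    ar <<= 8;                               /* align with descriptor bits 8..23 */
    assert(ar == 0xa09b00);                 /* now matches the _SEGMENT_* layout */
    return 0;
}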
Mukesh Rathor
2013-Jul-18 02:32 UTC
[PATCH 14/24] PVH xen: interrupt/event-channel delivery to PVH
PVH uses HVMIRQ_callback_vector for interrupt delivery. Also, change hvm_vcpu_has_pending_irq() as PVH doesn't use vlapic emulation, so we can skip the vlapic checks in the function. Moreover, a PVH guest installs its IDT natively, and sets the callback via for interrupt delivery during boot. Once that is done, it receives interrupts via the callback.
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
--- xen/arch/x86/hvm/irq.c | 3 +++ xen/arch/x86/hvm/vmx/intr.c | 8 ++++++-- xen/include/asm-x86/domain.h | 2 +- xen/include/asm-x86/event.h | 2 +- 4 files changed, 11 insertions(+), 4 deletions(-)
diff --git a/xen/arch/x86/hvm/irq.c b/xen/arch/x86/hvm/irq.c index 9eae5de..92fb245 100644 --- a/xen/arch/x86/hvm/irq.c +++ b/xen/arch/x86/hvm/irq.c
@@ -405,6 +405,9 @@ struct hvm_intack hvm_vcpu_has_pending_irq(struct vcpu *v) && vcpu_info(v, evtchn_upcall_pending) ) return hvm_intack_vector(plat->irq.callback_via.vector); + if ( is_pvh_vcpu(v) ) + return hvm_intack_none; + if ( vlapic_accept_pic_intr(v) && plat->vpic[0].int_output ) return hvm_intack_pic(0);
diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c index e376f3c..ce42950 100644 --- a/xen/arch/x86/hvm/vmx/intr.c +++ b/xen/arch/x86/hvm/vmx/intr.c
@@ -165,6 +165,9 @@ static int nvmx_intr_intercept(struct vcpu *v, struct hvm_intack intack) { u32 ctrl; + if ( is_pvh_vcpu(v) ) + return 0; + if ( nvmx_intr_blocked(v) != hvm_intblk_none ) { enable_intr_window(v, intack);
@@ -219,8 +222,9 @@ void vmx_intr_assist(void) return; } - /* Crank the handle on interrupt state. */ - pt_vector = pt_update_irq(v); + if ( !is_pvh_vcpu(v) ) + /* Crank the handle on interrupt state. */ + pt_vector = pt_update_irq(v); do { intack = hvm_vcpu_has_pending_irq(v);
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h index 22a72df..21a9954 100644 --- a/xen/include/asm-x86/domain.h +++ b/xen/include/asm-x86/domain.h
@@ -16,7 +16,7 @@ #define is_pv_32on64_domain(d) (is_pv_32bit_domain(d)) #define is_pv_32on64_vcpu(v) (is_pv_32on64_domain((v)->domain)) -#define is_hvm_pv_evtchn_domain(d) (is_hvm_domain(d) && \ +#define is_hvm_pv_evtchn_domain(d) (!is_pv_domain(d) && \ d->arch.hvm_domain.irq.callback_via_type == HVMIRQ_callback_vector) #define is_hvm_pv_evtchn_vcpu(v) (is_hvm_pv_evtchn_domain(v->domain))
diff --git a/xen/include/asm-x86/event.h b/xen/include/asm-x86/event.h index 06057c7..7ed5812 100644 --- a/xen/include/asm-x86/event.h +++ b/xen/include/asm-x86/event.h
@@ -18,7 +18,7 @@ int hvm_local_events_need_delivery(struct vcpu *v); static inline int local_events_need_delivery(void) { struct vcpu *v = current; - return (is_hvm_vcpu(v) ? hvm_local_events_need_delivery(v) : + return (!is_pv_vcpu(v) ? hvm_local_events_need_delivery(v) : (vcpu_info(v, evtchn_upcall_pending) && !vcpu_info(v, evtchn_upcall_mask))); }
-- 1.7.2.3
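To illustrate the guest half of this, a hedged sketch of registering the callback vector. It follows the public HVM_PARAM_CALLBACK_IRQ encoding, where type 2 in bits 63:56 selects HVMIRQ_callback_vector; the HYPERVISOR_hvm_op() wrapper is assumed to exist in the guest:

static int register_callback_vector(uint8_t vector)
{
    struct xen_hvm_param a = {
        .domid = DOMID_SELF,
        .index = HVM_PARAM_CALLBACK_IRQ,
        .value = (2ULL << 56) | vector,   /* type 2 == vector callback */
    };

    /* The guest must already have installed a native IDT entry for 'vector'. */
    return HYPERVISOR_hvm_op(HVMOP_set_param, &a);
}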
Mukesh Rathor
2013-Jul-18 02:32 UTC
[PATCH 15/24] PVH xen: additional changes to support PVH guest creation and execution.
Fail creation of 32bit PVH guest. Change hap_update_cr3() to return long mode for PVH; this is called during domain creation from arch_set_info_guest(). Return the correct features for PVH to the guest during its boot.
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
--- xen/arch/x86/domain.c | 8 ++++++++ xen/arch/x86/mm/hap/hap.c | 4 +++- xen/common/domain.c | 10 ++++++++++ xen/common/domctl.c | 5 +++++ xen/common/kernel.c | 6 +++++- 5 files changed, 31 insertions(+), 2 deletions(-)
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index fccb4ee..288872a 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c
@@ -339,6 +339,14 @@ int switch_compat(struct domain *d) if ( d == NULL ) return -EINVAL; + + if ( is_pvh_domain(d) ) + { + printk(XENLOG_INFO + "Xen currently does not support 32bit PVH guests\n"); + return -EINVAL; + } + if ( !may_switch_mode(d) ) return -EACCES; if ( is_pv_32on64_domain(d) )
diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c index bff05d9..19a085c 100644 --- a/xen/arch/x86/mm/hap/hap.c +++ b/xen/arch/x86/mm/hap/hap.c
@@ -639,7 +639,9 @@ static void hap_update_cr3(struct vcpu *v, int do_locking) const struct paging_mode * hap_paging_get_mode(struct vcpu *v) { - return !hvm_paging_enabled(v) ? &hap_paging_real_mode : + /* PVH 32bitfixme. */ + return is_pvh_vcpu(v) ? &hap_paging_long_mode : + !hvm_paging_enabled(v) ? &hap_paging_real_mode : hvm_long_mode_enabled(v) ? &hap_paging_long_mode : hvm_pae_enabled(v) ? &hap_paging_pae_mode : &hap_paging_protected_mode;
diff --git a/xen/common/domain.c b/xen/common/domain.c index 38b1bad..3b4af4b 100644 --- a/xen/common/domain.c +++ b/xen/common/domain.c
@@ -237,6 +237,16 @@ struct domain *domain_create( if ( domcr_flags & DOMCRF_hvm ) d->guest_type = guest_type_hvm; + else if ( domcr_flags & DOMCRF_pvh ) + { + if ( !(domcr_flags & DOMCRF_hap) ) + { + err = -EOPNOTSUPP; + printk(XENLOG_INFO "PVH guest must have HAP on\n"); + goto fail; + } + d->guest_type = guest_type_pvh; + } if ( domid == 0 ) {
diff --git a/xen/common/domctl.c b/xen/common/domctl.c index c653efb..48e4c08 100644 --- a/xen/common/domctl.c +++ b/xen/common/domctl.c
@@ -187,6 +187,8 @@ void getdomaininfo(struct domain *d, struct xen_domctl_getdomaininfo *info) if ( is_hvm_domain(d) ) info->flags |= XEN_DOMINF_hvm_guest; + else if ( is_pvh_domain(d) ) + info->flags |= XEN_DOMINF_pvh_guest; xsm_security_domaininfo(d, info);
@@ -443,6 +445,9 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl) domcr_flags = 0; if ( op->u.createdomain.flags & XEN_DOMCTL_CDF_hvm_guest ) domcr_flags |= DOMCRF_hvm; + else if ( op->u.createdomain.flags & XEN_DOMCTL_CDF_hap ) + domcr_flags |= DOMCRF_pvh; /* PV with HAP is a PVH guest */ + if ( op->u.createdomain.flags & XEN_DOMCTL_CDF_hap ) domcr_flags |= DOMCRF_hap; if ( op->u.createdomain.flags & XEN_DOMCTL_CDF_s3_integrity )
diff --git a/xen/common/kernel.c b/xen/common/kernel.c index 72fb905..3bba758 100644 --- a/xen/common/kernel.c +++ b/xen/common/kernel.c
@@ -289,7 +289,11 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) if ( current->domain == dom0 ) fi.submap |= 1U << XENFEAT_dom0; #ifdef CONFIG_X86 - if ( !is_hvm_vcpu(current) ) + if ( is_pvh_vcpu(current) ) + fi.submap |= (1U << XENFEAT_hvm_safe_pvclock) | + (1U << XENFEAT_supervisor_mode_kernel) | + (1U << XENFEAT_hvm_callback_vector); + else if ( !is_hvm_vcpu(current) ) fi.submap |= (1U << XENFEAT_mmu_pt_update_preserve_ad) | (1U << XENFEAT_highmem_assist) | (1U << XENFEAT_gnttab_map_avail_bits);
-- 1.7.2.3
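For reference, a hedged sketch of what the domctl.c hunk above means for the toolstack: a createdomain request with XEN_DOMCTL_CDF_hap set but XEN_DOMCTL_CDF_hvm_guest clear now yields a PVH domain (error handling and the surrounding libxc plumbing omitted):

struct xen_domctl domctl = {
    .cmd = XEN_DOMCTL_createdomain,
    .interface_version = XEN_DOMCTL_INTERFACE_VERSION,
    /* hap without hvm_guest now selects DOMCRF_pvh in do_domctl(). */
    .u.createdomain.flags = XEN_DOMCTL_CDF_hap,
};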
PVH doesn't use map cache. show_registers() for PVH takes the HVM path.
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
--- xen/arch/x86/domain_page.c | 10 +++++----- xen/arch/x86/x86_64/traps.c | 6 +++--- 2 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c index bc18263..3903952 100644 --- a/xen/arch/x86/domain_page.c +++ b/xen/arch/x86/domain_page.c
@@ -35,7 +35,7 @@ static inline struct vcpu *mapcache_current_vcpu(void) * then it means we are running on the idle domain's page table and must * therefore use its mapcache. */ - if ( unlikely(pagetable_is_null(v->arch.guest_table)) && !is_hvm_vcpu(v) ) + if ( unlikely(pagetable_is_null(v->arch.guest_table)) && is_pv_vcpu(v) ) { /* If we really are idling, perform lazy context switch now. */ if ( (v = idle_vcpu[smp_processor_id()]) == current )
@@ -72,7 +72,7 @@ void *map_domain_page(unsigned long mfn) #endif v = mapcache_current_vcpu(); - if ( !v || is_hvm_vcpu(v) ) + if ( !v || !is_pv_vcpu(v) ) return mfn_to_virt(mfn); dcache = &v->domain->arch.pv_domain.mapcache;
@@ -177,7 +177,7 @@ void unmap_domain_page(const void *ptr) ASSERT(va >= MAPCACHE_VIRT_START && va < MAPCACHE_VIRT_END); v = mapcache_current_vcpu(); - ASSERT(v && !is_hvm_vcpu(v)); + ASSERT(v && is_pv_vcpu(v)); dcache = &v->domain->arch.pv_domain.mapcache; ASSERT(dcache->inuse);
@@ -244,7 +244,7 @@ int mapcache_domain_init(struct domain *d) struct mapcache_domain *dcache = &d->arch.pv_domain.mapcache; unsigned int bitmap_pages; - if ( is_hvm_domain(d) || is_idle_domain(d) ) + if ( !is_pv_domain(d) || is_idle_domain(d) ) return 0; #ifdef NDEBUG
@@ -275,7 +275,7 @@ int mapcache_vcpu_init(struct vcpu *v) unsigned int ents = d->max_vcpus * MAPCACHE_VCPU_ENTRIES; unsigned int nr = PFN_UP(BITS_TO_LONGS(ents) * sizeof(long)); - if ( is_hvm_vcpu(v) || !dcache->inuse ) + if ( !is_pv_vcpu(v) || !dcache->inuse ) return 0; if ( ents > dcache->entries )
diff --git a/xen/arch/x86/x86_64/traps.c b/xen/arch/x86/x86_64/traps.c index feb50ff..6ac7762 100644 --- a/xen/arch/x86/x86_64/traps.c +++ b/xen/arch/x86/x86_64/traps.c
@@ -85,7 +85,7 @@ void show_registers(struct cpu_user_regs *regs) enum context context; struct vcpu *v = current; - if ( is_hvm_vcpu(v) && guest_mode(regs) ) + if ( !is_pv_vcpu(v) && guest_mode(regs) ) { struct segment_register sreg; context = CTXT_hvm_guest;
@@ -146,8 +146,8 @@ void vcpu_show_registers(struct vcpu *v) const struct cpu_user_regs *regs = &v->arch.user_regs; unsigned long crs[8]; - /* No need to handle HVM for now. */ - if ( is_hvm_vcpu(v) ) + /* No need to handle HVM and PVH for now. */ + if ( !is_pv_vcpu(v) ) return; crs[0] = v->arch.pv_vcpu.ctrlreg[0];
-- 1.7.2.3
PVH only supports limited memory types in Phase I. The TSC is also limited to native mode for the moment. Finally, grant mapping of iomem for PVH hasn't been explored in Phase I.
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
--- xen/arch/x86/hvm/mtrr.c | 8 ++++++++ xen/arch/x86/time.c | 8 ++++++++ xen/common/grant_table.c | 4 ++-- 3 files changed, 18 insertions(+), 2 deletions(-)
diff --git a/xen/arch/x86/hvm/mtrr.c b/xen/arch/x86/hvm/mtrr.c index ef51a8d..b9d6411 100644 --- a/xen/arch/x86/hvm/mtrr.c +++ b/xen/arch/x86/hvm/mtrr.c
@@ -693,6 +693,14 @@ uint8_t epte_get_entry_emt(struct domain *d, unsigned long gfn, mfn_t mfn, ((d->vcpu == NULL) || ((v = d->vcpu[0]) == NULL)) ) return MTRR_TYPE_WRBACK; + /* PVH fixme: Add support for more memory types. */ + if ( is_pvh_domain(d) ) + { + if ( direct_mmio ) + return MTRR_TYPE_UNCACHABLE; + return MTRR_TYPE_WRBACK; + } + if ( !v->domain->arch.hvm_domain.params[HVM_PARAM_IDENT_PT] ) return MTRR_TYPE_WRBACK;
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c index cf8bc78..55f6f4b 100644 --- a/xen/arch/x86/time.c +++ b/xen/arch/x86/time.c
@@ -1890,6 +1890,14 @@ void tsc_set_info(struct domain *d, d->arch.vtsc = 0; return; } + if ( is_pvh_domain(d) && tsc_mode != TSC_MODE_NEVER_EMULATE ) + { + /* PVH fixme: support more tsc modes. */ + printk(XENLOG_WARNING + "PVH currently does not support tsc emulation. Setting timer_mode = native\n"); + d->arch.vtsc = 0; + return; + } switch ( d->arch.tsc_mode = tsc_mode ) {
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c index eb50288..c51da30 100644 --- a/xen/common/grant_table.c +++ b/xen/common/grant_table.c
@@ -721,7 +721,7 @@ __gnttab_map_grant_ref( double_gt_lock(lgt, rgt); - if ( !is_hvm_domain(ld) && need_iommu(ld) ) + if ( is_pv_domain(ld) && need_iommu(ld) ) { unsigned int wrc, rdc; int err = 0;
@@ -932,7 +932,7 @@ __gnttab_unmap_common( act->pin -= GNTPIN_hstw_inc; } - if ( !is_hvm_domain(ld) && need_iommu(ld) ) + if ( is_pv_domain(ld) && need_iommu(ld) ) { unsigned int wrc, rdc; int err = 0;
-- 1.7.2.3
Mukesh Rathor
2013-Jul-18 02:33 UTC
[PATCH 18/24] PVH xen: Checks, asserts, and limitations for PVH
This patch adds some precautionary checks and debug asserts for PVH. Also, PVH doesn't support any HVM type guest monitoring at present.
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
--- xen/arch/x86/hvm/hvm.c | 13 +++++++++++++ xen/arch/x86/hvm/mtrr.c | 4 ++++ xen/arch/x86/physdev.c | 13 +++++++++++++ xen/arch/x86/traps.c | 4 ++++ 4 files changed, 34 insertions(+), 0 deletions(-)
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index bac4708..3b47e6f 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c
@@ -4526,8 +4526,11 @@ static int hvm_memory_event_traps(long p, uint32_t reason, return 1; } +/* PVH fixme: add support for monitoring guest behaviour in below functions. */ void hvm_memory_event_cr0(unsigned long value, unsigned long old) { + if ( is_pvh_vcpu(current) ) + return; hvm_memory_event_traps(current->domain->arch.hvm_domain .params[HVM_PARAM_MEMORY_EVENT_CR0], MEM_EVENT_REASON_CR0,
@@ -4536,6 +4539,8 @@ void hvm_memory_event_cr0(unsigned long value, unsigned long old) void hvm_memory_event_cr3(unsigned long value, unsigned long old) { + if ( is_pvh_vcpu(current) ) + return; hvm_memory_event_traps(current->domain->arch.hvm_domain .params[HVM_PARAM_MEMORY_EVENT_CR3], MEM_EVENT_REASON_CR3,
@@ -4544,6 +4549,8 @@ void hvm_memory_event_cr4(unsigned long value, unsigned long old) { + if ( is_pvh_vcpu(current) ) + return; hvm_memory_event_traps(current->domain->arch.hvm_domain .params[HVM_PARAM_MEMORY_EVENT_CR4], MEM_EVENT_REASON_CR4,
@@ -4552,6 +4559,8 @@ void hvm_memory_event_msr(unsigned long msr, unsigned long value) { + if ( is_pvh_vcpu(current) ) + return; hvm_memory_event_traps(current->domain->arch.hvm_domain .params[HVM_PARAM_MEMORY_EVENT_MSR], MEM_EVENT_REASON_MSR,
@@ -4564,6 +4573,8 @@ int hvm_memory_event_int3(unsigned long gla) unsigned long gfn; gfn = paging_gva_to_gfn(current, gla, &pfec); + if ( is_pvh_vcpu(current) ) + return 0; return hvm_memory_event_traps(current->domain->arch.hvm_domain .params[HVM_PARAM_MEMORY_EVENT_INT3], MEM_EVENT_REASON_INT3,
@@ -4576,6 +4587,8 @@ int hvm_memory_event_single_step(unsigned long gla) unsigned long gfn; gfn = paging_gva_to_gfn(current, gla, &pfec); + if ( is_pvh_vcpu(current) ) + return 0; return hvm_memory_event_traps(current->domain->arch.hvm_domain .params[HVM_PARAM_MEMORY_EVENT_SINGLE_STEP], MEM_EVENT_REASON_SINGLESTEP,
diff --git a/xen/arch/x86/hvm/mtrr.c b/xen/arch/x86/hvm/mtrr.c index b9d6411..6706af6 100644 --- a/xen/arch/x86/hvm/mtrr.c +++ b/xen/arch/x86/hvm/mtrr.c
@@ -578,6 +578,10 @@ int32_t hvm_set_mem_pinned_cacheattr( { struct hvm_mem_pinned_cacheattr_range *range; + /* Side note: A PVH guest writes to MSR_IA32_CR_PAT natively. 
*/ + if ( is_pvh_domain(d) ) + return -EOPNOTSUPP; + if ( !((type == PAT_TYPE_UNCACHABLE) || (type == PAT_TYPE_WRCOMB) || (type == PAT_TYPE_WRTHROUGH) || diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c index 3733c7a..73c8d2a 100644 --- a/xen/arch/x86/physdev.c +++ b/xen/arch/x86/physdev.c @@ -475,6 +475,13 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) case PHYSDEVOP_set_iopl: { struct physdev_set_iopl set_iopl; + + if ( is_pvh_vcpu(current) ) + { + ret = -EPERM; + break; + } + ret = -EFAULT; if ( copy_from_guest(&set_iopl, arg, 1) != 0 ) break; @@ -488,6 +495,12 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) case PHYSDEVOP_set_iobitmap: { struct physdev_set_iobitmap set_iobitmap; + + if ( is_pvh_vcpu(current) ) + { + ret = -EPERM; + break; + } ret = -EFAULT; if ( copy_from_guest(&set_iobitmap, arg, 1) != 0 ) break; diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c index fe8b94c..5325e92 100644 --- a/xen/arch/x86/traps.c +++ b/xen/arch/x86/traps.c @@ -2710,6 +2710,8 @@ static void emulate_gate_op(struct cpu_user_regs *regs) unsigned long off, eip, opnd_off, base, limit; int jump; + ASSERT(!is_pvh_vcpu(v)); + /* Check whether this fault is due to the use of a call gate. */ if ( !read_gate_descriptor(regs->error_code, v, &sel, &off, &ar) || (((ar >> 13) & 3) < (regs->cs & 3)) || @@ -3326,6 +3328,8 @@ void do_device_not_available(struct cpu_user_regs *regs) BUG_ON(!guest_mode(regs)); + ASSERT(!is_pvh_vcpu(curr)); + vcpu_restore_fpu_lazy(curr); if ( curr->arch.pv_vcpu.ctrlreg[0] & X86_CR0_TS ) -- 1.7.2.3
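Since PHYSDEVOP_set_iopl and PHYSDEVOP_set_iobitmap now fail with -EPERM for PVH, a PVH kernel is expected to manage IOPL natively in EFLAGS, which the IO-instruction vmexit path added later in this series then honors. A minimal sketch, assuming execution at CPL 0:

#define X86_EFLAGS_IOPL 0x3000UL   /* both bits set => IOPL 3 */

static inline void set_iopl_3(void)
{
    unsigned long flags;

    asm volatile ( "pushfq ; popq %0" : "=r" (flags) );
    flags |= X86_EFLAGS_IOPL;            /* permitted at CPL 0 */
    asm volatile ( "pushq %0 ; popfq" : : "r" (flags) : "cc" );
}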
This patch expands HVM hcall support to include PVH. Changes in v8: - Carve out PVH support of hvm_op to a small function. Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> --- xen/arch/x86/hvm/hvm.c | 80 +++++++++++++++++++++++++++++++++++++------ xen/arch/x86/x86_64/traps.c | 2 +- 2 files changed, 70 insertions(+), 12 deletions(-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 3b47e6f..3d930eb 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -3188,6 +3188,16 @@ static long hvm_vcpu_op( case VCPUOP_register_vcpu_time_memory_area: rc = do_vcpu_op(cmd, vcpuid, arg); break; + + case VCPUOP_is_up: + case VCPUOP_up: + case VCPUOP_initialise: + if ( is_pvh_vcpu(current) ) + rc = do_vcpu_op(cmd, vcpuid, arg); + else + rc = -ENOSYS; + break; + default: rc = -ENOSYS; break; @@ -3308,6 +3318,24 @@ static hvm_hypercall_t *const hvm_hypercall32_table[NR_hypercalls] = { HYPERCALL(tmem_op) }; +/* PVH 32bitfixme. */ +static hvm_hypercall_t *const pvh_hypercall64_table[NR_hypercalls] = { + HYPERCALL(platform_op), + HYPERCALL(memory_op), + HYPERCALL(xen_version), + HYPERCALL(console_io), + [ __HYPERVISOR_grant_table_op ] = (hvm_hypercall_t *)hvm_grant_table_op, + [ __HYPERVISOR_vcpu_op ] = (hvm_hypercall_t *)hvm_vcpu_op, + HYPERCALL(mmuext_op), + HYPERCALL(xsm_op), + HYPERCALL(sched_op), + HYPERCALL(event_channel_op), + [ __HYPERVISOR_physdev_op ] = (hvm_hypercall_t *)hvm_physdev_op, + HYPERCALL(hvm_op), + HYPERCALL(sysctl), + HYPERCALL(domctl) +}; + int hvm_do_hypercall(struct cpu_user_regs *regs) { struct vcpu *curr = current; @@ -3334,7 +3362,9 @@ int hvm_do_hypercall(struct cpu_user_regs *regs) if ( (eax & 0x80000000) && is_viridian_domain(curr->domain) ) return viridian_hypercall(regs); - if ( (eax >= NR_hypercalls) || !hvm_hypercall32_table[eax] ) + if ( (eax >= NR_hypercalls) || + (is_pvh_vcpu(curr) && !pvh_hypercall64_table[eax]) || + (is_hvm_vcpu(curr) && !hvm_hypercall32_table[eax]) ) { regs->eax = -ENOSYS; return HVM_HCALL_completed; @@ -3349,16 +3379,21 @@ int hvm_do_hypercall(struct cpu_user_regs *regs) regs->r10, regs->r8, regs->r9); curr->arch.hvm_vcpu.hcall_64bit = 1; - regs->rax = hvm_hypercall64_table[eax](regs->rdi, - regs->rsi, - regs->rdx, - regs->r10, - regs->r8, - regs->r9); + if ( is_pvh_vcpu(curr) ) + regs->rax = pvh_hypercall64_table[eax](regs->rdi, regs->rsi, + regs->rdx, regs->r10, + regs->r8, regs->r9); + else + regs->rax = hvm_hypercall64_table[eax](regs->rdi, regs->rsi, + regs->rdx, regs->r10, + regs->r8, regs->r9); curr->arch.hvm_vcpu.hcall_64bit = 0; + } else { + ASSERT(!is_pvh_vcpu(curr)); /* PVH 32bitfixme. 
*/ + HVM_DBG_LOG(DBG_LEVEL_HCALL, "hcall%u(%x, %x, %x, %x, %x, %x)", eax, (uint32_t)regs->ebx, (uint32_t)regs->ecx, (uint32_t)regs->edx, (uint32_t)regs->esi, @@ -3756,6 +3791,23 @@ static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid, return 0; } +static long pvh_hvm_op(unsigned long op, struct domain *d, + struct xen_hvm_param *harg) +{ + long rc = -ENOSYS; + + if ( op == HVMOP_set_param ) + { + if ( harg->index == HVM_PARAM_CALLBACK_IRQ ) + { + hvm_set_callback_via(d, harg->value); + hvm_latch_shinfo_size(d); + rc = 0; + } + } + return rc; +} + long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg) { @@ -3783,12 +3835,18 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg) return -ESRCH; rc = -EINVAL; - if ( !is_hvm_domain(d) ) - goto param_fail; + if ( is_pv_domain(d) ) + goto param_done; rc = xsm_hvm_param(XSM_TARGET, d, op); if ( rc ) - goto param_fail; + goto param_done; + + if ( is_pvh_domain(d) ) + { + rc = pvh_hvm_op(op, d, &a); + goto param_done; + } if ( op == HVMOP_set_param ) { @@ -3997,7 +4055,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg) op == HVMOP_set_param ? "set" : "get", a.index, a.value); - param_fail: + param_done: rcu_unlock_domain(d); break; } diff --git a/xen/arch/x86/x86_64/traps.c b/xen/arch/x86/x86_64/traps.c index 6ac7762..af727f9 100644 --- a/xen/arch/x86/x86_64/traps.c +++ b/xen/arch/x86/x86_64/traps.c @@ -623,7 +623,7 @@ static void hypercall_page_initialise_ring3_kernel(void *hypercall_page) void hypercall_page_initialise(struct domain *d, void *hypercall_page) { memset(hypercall_page, 0xCC, PAGE_SIZE); - if ( is_hvm_domain(d) ) + if ( !is_pv_domain(d) ) hvm_hypercall_page_initialise(d, hypercall_page); else if ( !is_pv_32bit_domain(d) ) hypercall_page_initialise_ring3_kernel(hypercall_page); -- 1.7.2.3
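With hypercall_page_initialise() above taking the HVM path for PVH, the page is filled with VMCALL stubs and a guest-side hypercall is an indirect call into entry nr at byte offset nr*32, with arguments in rdi/rsi/rdx/r10/r8/r9 to match the dispatch in hvm_do_hypercall(). A hedged two-argument sketch; hypercall_page is assumed to be the guest's mapping of the page:

extern char hypercall_page[];

static inline long hypercall2(unsigned int nr, unsigned long a1,
                              unsigned long a2)
{
    long ret;
    register unsigned long _a1 asm("rdi") = a1;
    register unsigned long _a2 asm("rsi") = a2;

    asm volatile ( "call *%[entry]"      /* each stub is 32 bytes */
                   : "=a" (ret), "+r" (_a1), "+r" (_a2)
                   : [entry] "r" (hypercall_page + nr * 32)
                   : "memory" );
    return ret;
}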
This patch contains vmcs changes related to PVH, mainly creating a VMCS for PVH guest.
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
--- xen/arch/x86/hvm/vmx/vmcs.c | 247 ++++++++++++++++++++++++++++++++++++++++++- 1 files changed, 245 insertions(+), 2 deletions(-)
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c index 36f167f..8d35370 100644 --- a/xen/arch/x86/hvm/vmx/vmcs.c +++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -634,7 +634,7 @@ void vmx_vmcs_exit(struct vcpu *v) { /* Don't confuse vmx_do_resume (for @v or @current!) */ vmx_clear_vmcs(v); - if ( is_hvm_vcpu(current) ) + if ( !is_pv_vcpu(current) ) vmx_load_vmcs(current); spin_unlock(&v->arch.hvm_vmx.vmcs_lock);
@@ -856,6 +856,239 @@ static void vmx_set_common_host_vmcs_fields(struct vcpu *v) __vmwrite(HOST_SYSENTER_EIP, sysenter_eip); } +static int pvh_check_requirements(struct vcpu *v) +{ + u64 required, tmpval = real_cr4_to_pv_guest_cr4(mmu_cr4_features); + + if ( !paging_mode_hap(v->domain) ) + { + printk(XENLOG_G_INFO "HAP is required for PVH guest.\n"); + return -EINVAL; + } + if ( !cpu_has_vmx_pat ) + { + printk(XENLOG_G_INFO "PVH: CPU does not have PAT support\n"); + return -ENOSYS; + } + if ( !cpu_has_vmx_msr_bitmap ) + { + printk(XENLOG_G_INFO "PVH: CPU does not have msr bitmap\n"); + return -ENOSYS; + } + if ( !cpu_has_vmx_vpid ) + { + printk(XENLOG_G_INFO "PVH: CPU doesn't have VPID support\n"); + return -ENOSYS; + } + if ( !cpu_has_vmx_secondary_exec_control ) + { + printk(XENLOG_G_INFO "CPU Secondary exec is required to run PVH\n"); + return -ENOSYS; + } + + if ( v->domain->arch.vtsc ) + { + printk(XENLOG_G_INFO + "At present PVH only supports the default timer mode\n"); + return -ENOSYS; + } + + required = X86_CR4_PAE | X86_CR4_VMXE | X86_CR4_OSFXSR; + if ( (tmpval & required) != required ) + { + printk(XENLOG_G_INFO "PVH: required CR4 features not available:%lx\n", + required); + return -ENOSYS; + } + + return 0; +} + +static int pvh_construct_vmcs(struct vcpu *v) +{ + int rc, msr_type; + unsigned long *msr_bitmap; + struct domain *d = v->domain; + struct p2m_domain *p2m = p2m_get_hostp2m(d); + struct ept_data *ept = &p2m->ept; + u32 vmexit_ctl = vmx_vmexit_control; + u32 vmentry_ctl = vmx_vmentry_control; + u64 host_pat, tmpval = -1; + + if ( (rc = pvh_check_requirements(v)) ) + return rc; + + msr_bitmap = alloc_xenheap_page(); + if ( msr_bitmap == NULL ) + return -ENOMEM; + + /* 1. Pin-Based Controls: */ + __vmwrite(PIN_BASED_VM_EXEC_CONTROL, vmx_pin_based_exec_control); + + v->arch.hvm_vmx.exec_control = vmx_cpu_based_exec_control; + + /* 2. Primary Processor-based controls: */ + /* + * If rdtsc exiting is turned on and it goes thru emulate_privileged_op, + * then pv_vcpu.ctrlreg must be added to the pvh struct. + */ + v->arch.hvm_vmx.exec_control &= ~CPU_BASED_RDTSC_EXITING; + v->arch.hvm_vmx.exec_control &= ~CPU_BASED_USE_TSC_OFFSETING; + + v->arch.hvm_vmx.exec_control &= ~(CPU_BASED_INVLPG_EXITING | + CPU_BASED_CR3_LOAD_EXITING | + CPU_BASED_CR3_STORE_EXITING); + v->arch.hvm_vmx.exec_control |= CPU_BASED_ACTIVATE_SECONDARY_CONTROLS; + v->arch.hvm_vmx.exec_control &= ~CPU_BASED_MONITOR_TRAP_FLAG; + v->arch.hvm_vmx.exec_control |= CPU_BASED_ACTIVATE_MSR_BITMAP; + v->arch.hvm_vmx.exec_control &= ~CPU_BASED_TPR_SHADOW; + v->arch.hvm_vmx.exec_control &= ~CPU_BASED_VIRTUAL_NMI_PENDING; + + __vmwrite(CPU_BASED_VM_EXEC_CONTROL, v->arch.hvm_vmx.exec_control); + + /* 3. 
Secondary Processor-based controls (Intel SDM: resvd bits are 0): */ + v->arch.hvm_vmx.secondary_exec_control = SECONDARY_EXEC_ENABLE_EPT; + v->arch.hvm_vmx.secondary_exec_control |= SECONDARY_EXEC_ENABLE_VPID; + v->arch.hvm_vmx.secondary_exec_control |= SECONDARY_EXEC_PAUSE_LOOP_EXITING; + + __vmwrite(SECONDARY_VM_EXEC_CONTROL, + v->arch.hvm_vmx.secondary_exec_control); + + __vmwrite(IO_BITMAP_A, virt_to_maddr((char *)hvm_io_bitmap + 0)); + __vmwrite(IO_BITMAP_B, virt_to_maddr((char *)hvm_io_bitmap + PAGE_SIZE)); + + /* MSR bitmap for intercepts. */ + memset(msr_bitmap, ~0, PAGE_SIZE); + v->arch.hvm_vmx.msr_bitmap = msr_bitmap; + __vmwrite(MSR_BITMAP, virt_to_maddr(msr_bitmap)); + + msr_type = MSR_TYPE_R | MSR_TYPE_W; + /* Disable intercepts for MSRs that have corresponding VMCS fields. */ + vmx_disable_intercept_for_msr(v, MSR_FS_BASE, msr_type); + vmx_disable_intercept_for_msr(v, MSR_GS_BASE, msr_type); + vmx_disable_intercept_for_msr(v, MSR_IA32_SYSENTER_CS, msr_type); + vmx_disable_intercept_for_msr(v, MSR_IA32_SYSENTER_ESP, msr_type); + vmx_disable_intercept_for_msr(v, MSR_IA32_SYSENTER_EIP, msr_type); + vmx_disable_intercept_for_msr(v, MSR_SHADOW_GS_BASE, msr_type); + vmx_disable_intercept_for_msr(v, MSR_IA32_CR_PAT, msr_type); + + /* + * We don't disable intercepts for MSRs: MSR_STAR, MSR_LSTAR, MSR_CSTAR, + * and MSR_SYSCALL_MASK because we need to specify save/restore area to + * save/restore at every VM exit and entry. Instead, let the intercept + * functions save them into vmx_msr_state fields. See comment in + * vmx_restore_host_msrs(). See also vmx_restore_guest_msrs(). + */ + __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, 0); + __vmwrite(VM_EXIT_MSR_LOAD_COUNT, 0); + __vmwrite(VM_EXIT_MSR_STORE_COUNT, 0); + + __vmwrite(VM_EXIT_CONTROLS, vmexit_ctl); + + /* + * Note: we run with default VM_ENTRY_LOAD_DEBUG_CTLS of 1, which means + * upon vmentry, the cpu reads/loads VMCS.DR7 and VMCS.DEBUGCTLS, and does + * not use the host values. 0 would cause it to not use the VMCS values. + */ + vmentry_ctl &= ~VM_ENTRY_LOAD_GUEST_EFER; + vmentry_ctl &= ~VM_ENTRY_SMM; + vmentry_ctl &= ~VM_ENTRY_DEACT_DUAL_MONITOR; + /* PVH 32bitfixme. */ + vmentry_ctl |= VM_ENTRY_IA32E_MODE; /* GUEST_EFER.LME/LMA ignored */ + + __vmwrite(VM_ENTRY_CONTROLS, vmentry_ctl); + + vmx_set_common_host_vmcs_fields(v); + + __vmwrite(VM_ENTRY_INTR_INFO, 0); + __vmwrite(CR3_TARGET_COUNT, 0); + __vmwrite(GUEST_ACTIVITY_STATE, 0); + + /* These are sorta irrelevant as we load the descriptors directly. */ + __vmwrite(GUEST_CS_SELECTOR, 0); + __vmwrite(GUEST_DS_SELECTOR, 0); + __vmwrite(GUEST_SS_SELECTOR, 0); + __vmwrite(GUEST_ES_SELECTOR, 0); + __vmwrite(GUEST_FS_SELECTOR, 0); + __vmwrite(GUEST_GS_SELECTOR, 0); + + __vmwrite(GUEST_CS_BASE, 0); + __vmwrite(GUEST_CS_LIMIT, ~0u); + /* CS.L == 1, exec, read/write, accessed. PVH 32bitfixme. 
*/ + __vmwrite(GUEST_CS_AR_BYTES, 0xa09b); + + __vmwrite(GUEST_DS_BASE, 0); + __vmwrite(GUEST_DS_LIMIT, ~0u); + __vmwrite(GUEST_DS_AR_BYTES, 0xc093); /* read/write, accessed */ + + __vmwrite(GUEST_SS_BASE, 0); + __vmwrite(GUEST_SS_LIMIT, ~0u); + __vmwrite(GUEST_SS_AR_BYTES, 0xc093); /* read/write, accessed */ + + __vmwrite(GUEST_ES_BASE, 0); + __vmwrite(GUEST_ES_LIMIT, ~0u); + __vmwrite(GUEST_ES_AR_BYTES, 0xc093); /* read/write, accessed */ + + __vmwrite(GUEST_FS_BASE, 0); + __vmwrite(GUEST_FS_LIMIT, ~0u); + __vmwrite(GUEST_FS_AR_BYTES, 0xc093); /* read/write, accessed */ + + __vmwrite(GUEST_GS_BASE, 0); + __vmwrite(GUEST_GS_LIMIT, ~0u); + __vmwrite(GUEST_GS_AR_BYTES, 0xc093); /* read/write, accessed */ + + __vmwrite(GUEST_GDTR_BASE, 0); + __vmwrite(GUEST_GDTR_LIMIT, 0); + + __vmwrite(GUEST_LDTR_BASE, 0); + __vmwrite(GUEST_LDTR_LIMIT, 0); + __vmwrite(GUEST_LDTR_AR_BYTES, 0x82); /* LDT */ + __vmwrite(GUEST_LDTR_SELECTOR, 0); + + /* Guest TSS. */ + __vmwrite(GUEST_TR_BASE, 0); + __vmwrite(GUEST_TR_LIMIT, 0xff); + __vmwrite(GUEST_TR_AR_BYTES, 0x8b); /* 32-bit TSS (busy) */ + + __vmwrite(GUEST_INTERRUPTIBILITY_INFO, 0); + __vmwrite(GUEST_DR7, 0); + __vmwrite(VMCS_LINK_POINTER, ~0UL); + + __vmwrite(PAGE_FAULT_ERROR_CODE_MASK, 0); + __vmwrite(PAGE_FAULT_ERROR_CODE_MATCH, 0); + + v->arch.hvm_vmx.exception_bitmap = HVM_TRAP_MASK | (1U << TRAP_debug) | + (1U << TRAP_int3) | (1U << TRAP_no_device); + __vmwrite(EXCEPTION_BITMAP, v->arch.hvm_vmx.exception_bitmap); + + /* Set WP bit so rdonly pages are not written from CPL 0. */ + tmpval = X86_CR0_PG | X86_CR0_NE | X86_CR0_PE | X86_CR0_WP; + __vmwrite(GUEST_CR0, tmpval); + __vmwrite(CR0_READ_SHADOW, tmpval); + v->arch.hvm_vcpu.hw_cr[0] = v->arch.hvm_vcpu.guest_cr[0] = tmpval; + + tmpval = real_cr4_to_pv_guest_cr4(mmu_cr4_features); + __vmwrite(GUEST_CR4, tmpval); + __vmwrite(CR4_READ_SHADOW, tmpval); + v->arch.hvm_vcpu.guest_cr[4] = tmpval; + + __vmwrite(CR0_GUEST_HOST_MASK, ~0UL); + __vmwrite(CR4_GUEST_HOST_MASK, ~0UL); + + v->arch.hvm_vmx.vmx_realmode = 0; + + ept->asr = pagetable_get_pfn(p2m_get_pagetable(p2m)); + __vmwrite(EPT_POINTER, ept_get_eptp(ept)); + + rdmsrl(MSR_IA32_CR_PAT, host_pat); + __vmwrite(HOST_PAT, host_pat); + __vmwrite(GUEST_PAT, MSR_IA32_CR_PAT_RESET); + + /* The paging mode is updated for PVH by arch_set_info_guest(). */ + + return 0; +} + static int construct_vmcs(struct vcpu *v) { struct domain *d = v->domain; @@ -864,6 +1097,13 @@ static int construct_vmcs(struct vcpu *v) vmx_vmcs_enter(v); + if ( is_pvh_vcpu(v) ) + { + int rc = pvh_construct_vmcs(v); + vmx_vmcs_exit(v); + return rc; + } + /* VMCS controls. */ __vmwrite(PIN_BASED_VM_EXEC_CONTROL, vmx_pin_based_exec_control); @@ -1294,6 +1534,9 @@ void vmx_do_resume(struct vcpu *v) hvm_asid_flush_vcpu(v); } + if ( is_pvh_vcpu(v) ) + reset_stack_and_jump(vmx_asm_do_vmentry); + debug_state = v->domain->debugger_attached || v->domain->arch.hvm_domain.params[HVM_PARAM_MEMORY_EVENT_INT3] || v->domain->arch.hvm_domain.params[HVM_PARAM_MEMORY_EVENT_SINGLE_STEP]; @@ -1477,7 +1720,7 @@ static void vmcs_dump(unsigned char ch) for_each_domain ( d ) { - if ( !is_hvm_domain(d) ) + if ( is_pv_domain(d) ) continue; printk("\n>>> Domain %d <<<\n", d->domain_id); for_each_vcpu ( d, v ) -- 1.7.2.3
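To make the AR-byte constants above easier to audit, here is a standalone decode sketch using the Intel SDM guest access-rights layout (type in bits 3:0, S=4, DPL=6:5, P=7, AVL=12, L=13, D/B=14, G=15). 0xa09b decodes to a 64-bit accessed code segment and 0xc093 to a read/write accessed data segment:

#include <stdio.h>

static void decode_vmx_ar(unsigned int ar)
{
    printf("type=%#x S=%u DPL=%u P=%u AVL=%u L=%u D/B=%u G=%u\n",
           ar & 0xf, (ar >> 4) & 1, (ar >> 5) & 3, (ar >> 7) & 1,
           (ar >> 12) & 1, (ar >> 13) & 1, (ar >> 14) & 1, (ar >> 15) & 1);
}

int main(void)
{
    decode_vmx_ar(0xa09b);  /* type=0xb S=1 DPL=0 P=1 AVL=0 L=1 D/B=0 G=1 */
    decode_vmx_ar(0xc093);  /* type=0x3 S=1 DPL=0 P=1 AVL=0 L=0 D/B=1 G=1 */
    return 0;
}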
Mukesh Rathor
2013-Jul-18 02:33 UTC
[PATCH 21/24] PVH xen: HVM support of PVH guest creation/destruction
This patch implements the HVM portion of guest creation, i.e. vcpu and domain initialization. It also contains some changes to support the destroy path.
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
--- xen/arch/x86/hvm/hvm.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++- 1 files changed, 65 insertions(+), 2 deletions(-)
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 3d930eb..7066d7b 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c
@@ -510,6 +510,30 @@ static int hvm_print_line( return X86EMUL_OKAY; } +static int pvh_dom_initialise(struct domain *d) +{ + int rc; + + if ( !d->arch.hvm_domain.hap_enabled ) + return -EINVAL; + + spin_lock_init(&d->arch.hvm_domain.irq_lock); + + hvm_init_cacheattr_region_list(d); + + if ( (rc = paging_enable(d, PG_refcounts|PG_translate|PG_external)) != 0 ) + goto pvh_dominit_fail; + + if ( (rc = hvm_funcs.domain_initialise(d)) != 0 ) + goto pvh_dominit_fail; + + return 0; + +pvh_dominit_fail: + hvm_destroy_cacheattr_region_list(d); + return rc; +} + int hvm_domain_initialise(struct domain *d) { int rc;
@@ -520,6 +544,8 @@ int hvm_domain_initialise(struct domain *d) "on a non-VT/AMDV platform.\n"); return -EINVAL; } + if ( is_pvh_domain(d) ) + return pvh_dom_initialise(d); spin_lock_init(&d->arch.hvm_domain.pbuf_lock); spin_lock_init(&d->arch.hvm_domain.irq_lock);
@@ -584,6 +610,9 @@ int hvm_domain_initialise(struct domain *d) void hvm_domain_relinquish_resources(struct domain *d) { + if ( is_pvh_domain(d) ) + return; + if ( hvm_funcs.nhvm_domain_relinquish_resources ) hvm_funcs.nhvm_domain_relinquish_resources(d);
@@ -609,10 +638,14 @@ void hvm_domain_relinquish_resources(struct domain *d) void hvm_domain_destroy(struct domain *d) { hvm_funcs.domain_destroy(d); + hvm_destroy_cacheattr_region_list(d); + + if ( is_pvh_domain(d) ) + return; + rtc_deinit(d); stdvga_deinit(d); vioapic_deinit(d); - hvm_destroy_cacheattr_region_list(d); } static int hvm_save_tsc_adjust(struct domain *d, hvm_domain_context_t *h)
@@ -1066,6 +1099,30 @@ static int __init __hvm_register_CPU_XSAVE_save_and_restore(void) } __initcall(__hvm_register_CPU_XSAVE_save_and_restore); +static int pvh_vcpu_initialise(struct vcpu *v) +{ + int rc; + + if ( (rc = hvm_funcs.vcpu_initialise(v)) != 0 ) + return rc; + + softirq_tasklet_init(&v->arch.hvm_vcpu.assert_evtchn_irq_tasklet, + (void(*)(unsigned long))hvm_assert_evtchn_irq, + (unsigned long)v); + + v->arch.hvm_vcpu.hcall_64bit = 1; /* PVH 32bitfixme. */ + v->arch.user_regs.eflags = 2; + v->arch.hvm_vcpu.inject_trap.vector = -1; + + if ( (rc = hvm_vcpu_cacheattr_init(v)) != 0 ) + { + hvm_funcs.vcpu_destroy(v); + return rc; + } + + return 0; +} + int hvm_vcpu_initialise(struct vcpu *v) { int rc;
@@ -1077,6 +1134,9 @@ int hvm_vcpu_initialise(struct vcpu *v) spin_lock_init(&v->arch.hvm_vcpu.tm_lock); INIT_LIST_HEAD(&v->arch.hvm_vcpu.tm_list); + if ( is_pvh_vcpu(v) ) + return pvh_vcpu_initialise(v); + if ( (rc = vlapic_init(v)) != 0 ) goto fail1;
@@ -1165,7 +1225,10 @@ void hvm_vcpu_destroy(struct vcpu *v) tasklet_kill(&v->arch.hvm_vcpu.assert_evtchn_irq_tasklet); hvm_vcpu_cacheattr_destroy(v); - vlapic_destroy(v); + + if ( !is_pvh_vcpu(v) ) + vlapic_destroy(v); + hvm_funcs.vcpu_destroy(v); /* Event channel is already freed by evtchn_destroy(). */
-- 1.7.2.3
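Together with the VCPUOP_initialise/VCPUOP_up cases allowed for PVH in the hypercall patch earlier in this series, the vcpu initialization above lets a PVH guest bring up secondary vcpus PV-style. A hedged sketch of the guest side, assuming the usual HYPERVISOR_vcpu_op() wrapper and a vcpu_guest_context prepared for the new vcpu:

static int bring_up_vcpu(unsigned int cpu, struct vcpu_guest_context *ctxt)
{
    int rc = HYPERVISOR_vcpu_op(VCPUOP_initialise, cpu, ctxt);

    if ( rc == 0 )
        rc = HYPERVISOR_vcpu_op(VCPUOP_up, cpu, NULL);
    return rc;
}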
Mukesh Rathor
2013-Jul-18 02:33 UTC
[PATCH 22/24] PVH xen: VMX support of PVH guest creation/destruction
This patch implements the vmx portion of guest creation, i.e. vcpu and domain initialization. It also contains some changes to support the destroy path.
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
--- xen/arch/x86/hvm/vmx/vmx.c | 40 ++++++++++++++++++++++++++++++++++++++++ 1 files changed, 40 insertions(+), 0 deletions(-)
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index 9be321d..8f08253 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -82,6 +82,9 @@ static int vmx_domain_initialise(struct domain *d) { int rc; + if ( is_pvh_domain(d) ) + return 0; + if ( (rc = vmx_alloc_vlapic_mapping(d)) != 0 ) return rc;
@@ -90,6 +93,9 @@ static int vmx_domain_initialise(struct domain *d) static void vmx_domain_destroy(struct domain *d) { + if ( is_pvh_domain(d) ) + return; + vmx_free_vlapic_mapping(d); }
@@ -113,6 +119,12 @@ static int vmx_vcpu_initialise(struct vcpu *v) vpmu_initialise(v); + if ( is_pvh_vcpu(v) ) + { + /* This is for hvm_long_mode_enabled(v). */ + v->arch.hvm_vcpu.guest_efer = EFER_SCE | EFER_LMA | EFER_LME; + return 0; + } vmx_install_vlapic_mapping(v); /* %eax == 1 signals full real-mode support to the guest loader. */
@@ -1076,6 +1088,28 @@ static void vmx_update_host_cr3(struct vcpu *v) vmx_vmcs_exit(v); } +/* + * A PVH guest never causes a CR3 write vmexit. This is called during the + * guest setup. + */ +static void vmx_update_pvh_cr(struct vcpu *v, unsigned int cr) +{ + vmx_vmcs_enter(v); + switch ( cr ) + { + case 3: + __vmwrite(GUEST_CR3, v->arch.hvm_vcpu.guest_cr[3]); + hvm_asid_flush_vcpu(v); + break; + + default: + printk(XENLOG_ERR + "PVH: d%d v%d unexpected cr%d update at rip:%lx\n", + v->domain->domain_id, v->vcpu_id, cr, __vmread(GUEST_RIP)); + } + vmx_vmcs_exit(v); +} + void vmx_update_debug_state(struct vcpu *v) { unsigned long mask;
@@ -1095,6 +1129,12 @@ void vmx_update_debug_state(struct vcpu *v) static void vmx_update_guest_cr(struct vcpu *v, unsigned int cr) { + if ( is_pvh_vcpu(v) ) + { + vmx_update_pvh_cr(v, cr); + return; + } + vmx_vmcs_enter(v); switch ( cr )
-- 1.7.2.3
Mukesh Rathor
2013-Jul-18 02:33 UTC
[PATCH 23/24] PVH xen: preparatory patch for the pvh vmexit handler patch
This is a preparatory patch for the next pvh vmexit handler patch. Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> --- xen/arch/x86/hvm/vmx/pvh.c | 5 +++++ xen/arch/x86/hvm/vmx/vmx.c | 6 ++++++ xen/arch/x86/traps.c | 4 ++-- xen/include/asm-x86/hvm/vmx/vmx.h | 1 + xen/include/asm-x86/processor.h | 2 ++ xen/include/asm-x86/traps.h | 2 ++ 6 files changed, 18 insertions(+), 2 deletions(-) diff --git a/xen/arch/x86/hvm/vmx/pvh.c b/xen/arch/x86/hvm/vmx/pvh.c index 8638850..fb55ac8 100644 --- a/xen/arch/x86/hvm/vmx/pvh.c +++ b/xen/arch/x86/hvm/vmx/pvh.c @@ -20,6 +20,11 @@ #include <asm/hvm/nestedhvm.h> #include <asm/xstate.h> +/* Implemented in the next patch */ +void vmx_pvh_vmexit_handler(struct cpu_user_regs *regs) +{ +} + /* * Set vmcs fields in support of vcpu_op -> VCPUOP_initialise hcall. Called * from arch_set_info_guest() which sets the (PVH relevant) non-vmcs fields. diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index 8f08253..59070a8 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -2491,6 +2491,12 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs) if ( unlikely(exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) ) return vmx_failed_vmentry(exit_reason, regs); + if ( is_pvh_vcpu(v) ) + { + vmx_pvh_vmexit_handler(regs); + return; + } + if ( v->arch.hvm_vmx.vmx_realmode ) { /* Put RFLAGS back the way the guest wants it */ diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c index 5325e92..1e8cf60 100644 --- a/xen/arch/x86/traps.c +++ b/xen/arch/x86/traps.c @@ -745,7 +745,7 @@ int cpuid_hypervisor_leaves( uint32_t idx, uint32_t sub_idx, return 1; } -static void pv_cpuid(struct cpu_user_regs *regs) +void pv_cpuid(struct cpu_user_regs *regs) { uint32_t a, b, c, d; @@ -1904,7 +1904,7 @@ static int is_cpufreq_controller(struct domain *d) #include "x86_64/mmconfig.h" -static int emulate_privileged_op(struct cpu_user_regs *regs) +int emulate_privileged_op(struct cpu_user_regs *regs) { enum x86_segment which_sel; struct vcpu *v = current; diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h index 9e6c481..44e4136 100644 --- a/xen/include/asm-x86/hvm/vmx/vmx.h +++ b/xen/include/asm-x86/hvm/vmx/vmx.h @@ -474,6 +474,7 @@ void vmx_dr_access(unsigned long exit_qualification, struct cpu_user_regs *regs); void vmx_fpu_enter(struct vcpu *v); int vmx_pvh_set_vcpu_info(struct vcpu *v, struct vcpu_guest_context *ctxtp); +void vmx_pvh_vmexit_handler(struct cpu_user_regs *regs); int alloc_p2m_hap_data(struct p2m_domain *p2m); void free_p2m_hap_data(struct p2m_domain *p2m); diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h index 5cdacc7..22a9653 100644 --- a/xen/include/asm-x86/processor.h +++ b/xen/include/asm-x86/processor.h @@ -566,6 +566,8 @@ void microcode_set_module(unsigned int); int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void), unsigned long len); int microcode_resume_cpu(int cpu); +void pv_cpuid(struct cpu_user_regs *regs); + #endif /* !__ASSEMBLY__ */ #endif /* __ASM_X86_PROCESSOR_H */ diff --git a/xen/include/asm-x86/traps.h b/xen/include/asm-x86/traps.h index 1d9b087..8c3540a 100644 --- a/xen/include/asm-x86/traps.h +++ b/xen/include/asm-x86/traps.h @@ -50,4 +50,6 @@ extern int send_guest_trap(struct domain *d, uint16_t vcpuid, unsigned int trap_nr); int emulate_forced_invalid_op(struct cpu_user_regs *regs); +int emulate_privileged_op(struct cpu_user_regs *regs); + #endif /* ASM_TRAP_H */ -- 1.7.2.3
Mukesh Rathor
2013-Jul-18 02:33 UTC
[PATCH 24/24] PVH xen: introduce vmexit handler for PVH
This patch contains the vmx exit handler for PVH guests. Note it contains a macro dbgp1 to print vmexit reasons and a lot of other data to go with it. It can be enabled by setting pvhdbg to 1. This can be very useful for debugging during the first few months of testing, after which it can be removed at the maintainer's discretion.
Changes in V2: - Move non VMX generic code to arch/x86/hvm/pvh.c - Remove get_gpr_ptr() and use existing decode_register() instead. - Defer call to pvh vmx exit handler until interrupts are enabled. So the caller vmx_vmexit_handler() handles the NMI/EXT-INT/TRIPLE_FAULT now. - Fix the CPUID (wrongly) clearing bit 24. No need to do this now, set the correct feature bits in CR4 during vmcs creation. - Fix a few hard tabs.
Changes in V3: - Lot of cleanup and rework in PVH vm exit handler. - Add a parameter to emulate_forced_invalid_op().
Changes in V5: - Move pvh.c and emulate_forced_invalid_op related changes to another patch. - Formatting. - Remove vmx_pvh_read_descriptor(). - Use SS DPL instead of CS.RPL for CPL. - Remove pvh_user_cpuid() and call pv_cpuid for user mode also.
Changes in V6: - Replace domain_crash_synchronous() with domain_crash().
Changes in V7: - Don't read all selectors on every vmexit. Do that only for the IO instruction vmexit. - Add a couple checks and set guest_cr[4] in access_cr4(). - Add period after all comments in case that's an issue. - Move making pv_cpuid and emulate_privileged_op public here.
Changes in V8: - Mainly, don't read selectors on vmexit. The macros now come to VMCS to read selectors on demand.
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
--- xen/arch/x86/hvm/vmx/pvh.c | 451 +++++++++++++++++++++++++++++++++++++++++++- 1 files changed, 450 insertions(+), 1 deletions(-)
diff --git a/xen/arch/x86/hvm/vmx/pvh.c b/xen/arch/x86/hvm/vmx/pvh.c index fb55ac8..508b740 100644 --- a/xen/arch/x86/hvm/vmx/pvh.c +++ b/xen/arch/x86/hvm/vmx/pvh.c
@@ -20,9 +20,458 @@ #include <asm/hvm/nestedhvm.h> #include <asm/xstate.h> -/* Implemented in the next patch */ +#ifndef NDEBUG +static int pvhdbg = 0; +#define dbgp1(...) do { (pvhdbg == 1) ? printk(__VA_ARGS__) : 0; } while ( 0 ) +#else +#define dbgp1(...) ((void)0) +#endif + +/* Returns: 0 == msr read successfully. */ +static int vmxit_msr_read(struct cpu_user_regs *regs) +{ + u64 msr_content = 0; + + switch ( regs->ecx ) + { + case MSR_IA32_MISC_ENABLE: + rdmsrl(MSR_IA32_MISC_ENABLE, msr_content); + msr_content |= MSR_IA32_MISC_ENABLE_BTS_UNAVAIL | + MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL; + break; + + default: + /* PVH fixme: see hvm_msr_read_intercept(). */ + rdmsrl(regs->ecx, msr_content); + break; + } + regs->eax = (uint32_t)msr_content; + regs->edx = (uint32_t)(msr_content >> 32); + vmx_update_guest_eip(); + + dbgp1("msr read c:%lx a:%lx d:%lx RIP:%lx RSP:%lx\n", regs->ecx, regs->eax, + regs->edx, regs->rip, regs->rsp); + + return 0; +} + +/* Returns: 0 == msr written successfully. */ +static int vmxit_msr_write(struct cpu_user_regs *regs) +{ + uint64_t msr_content = (uint32_t)regs->eax | ((uint64_t)regs->edx << 32); + + dbgp1("PVH: msr write:0x%lx. eax:0x%lx edx:0x%lx\n", regs->ecx, + regs->eax, regs->edx); + + if ( hvm_msr_write_intercept(regs->ecx, msr_content) == X86EMUL_OKAY ) + { + vmx_update_guest_eip(); + return 0; + } + return 1; +} + +static int vmxit_debug(struct cpu_user_regs *regs) +{ + struct vcpu *vp = current; + unsigned long exit_qualification = __vmread(EXIT_QUALIFICATION); + + write_debugreg(6, exit_qualification | 0xffff0ff0); + + /* gdbsx or another debugger. 
Never pause dom0. */ + if ( vp->domain->domain_id != 0 && vp->domain->debugger_attached ) + domain_pause_for_debugger(); + else + hvm_inject_hw_exception(TRAP_debug, HVM_DELIVER_NO_ERROR_CODE); + + return 0; +} + +/* Returns: rc == 0: handled the MTF vmexit. */ +static int vmxit_mtf(struct cpu_user_regs *regs) +{ + struct vcpu *vp = current; + int rc = -EINVAL, ss = vp->arch.hvm_vcpu.single_step; + + vp->arch.hvm_vmx.exec_control &= ~CPU_BASED_MONITOR_TRAP_FLAG; + __vmwrite(CPU_BASED_VM_EXEC_CONTROL, vp->arch.hvm_vmx.exec_control); + vp->arch.hvm_vcpu.single_step = 0; + + if ( vp->domain->debugger_attached && ss ) + { + domain_pause_for_debugger(); + rc = 0; + } + return rc; +} + +static int vmxit_int3(struct cpu_user_regs *regs) +{ + int ilen = vmx_get_instruction_length(); + struct vcpu *vp = current; + struct hvm_trap trap_info = { + .vector = TRAP_int3, + .type = X86_EVENTTYPE_SW_EXCEPTION, + .error_code = HVM_DELIVER_NO_ERROR_CODE, + .insn_len = ilen + }; + + /* gdbsx or another debugger. Never pause dom0. */ + if ( vp->domain->domain_id != 0 && vp->domain->debugger_attached ) + { + regs->eip += ilen; + dbgp1("[%d]PVH: domain pause for debugger\n", smp_processor_id()); + current->arch.gdbsx_vcpu_event = TRAP_int3; + domain_pause_for_debugger(); + return 0; + } + hvm_inject_trap(&trap_info); + + return 0; +} + +/* Just like HVM, PVH should be using "cpuid" from kernel mode. */ +static int vmxit_invalid_op(struct cpu_user_regs *regs) +{ + if ( guest_kernel_mode(current, regs) || !emulate_forced_invalid_op(regs) ) + hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE); + + return 0; +} + +/* Returns: rc == 0: handled the exception. */ +static int vmxit_exception(struct cpu_user_regs *regs) +{ + int vector = (__vmread(VM_EXIT_INTR_INFO)) & INTR_INFO_VECTOR_MASK; + int rc = -ENOSYS; + + dbgp1(" EXCPT: vec:%d cs:%lx r.IP:%lx\n", vector, + __vmread(GUEST_CS_SELECTOR), regs->eip); + + switch ( vector ) + { + case TRAP_debug: + rc = vmxit_debug(regs); + break; + + case TRAP_int3: + rc = vmxit_int3(regs); + break; + + case TRAP_invalid_op: + rc = vmxit_invalid_op(regs); + break; + + case TRAP_no_device: + hvm_funcs.fpu_dirty_intercept(); + rc = 0; + break; + + default: + printk(XENLOG_G_WARNING + "PVH: Unhandled trap:%d. IP:%lx\n", vector, regs->eip); + } + return rc; +} + +static int vmxit_vmcall(struct cpu_user_regs *regs) +{ + if ( hvm_do_hypercall(regs) != HVM_HCALL_preempted ) + vmx_update_guest_eip(); + return 0; +} + +/* Returns: rc == 0: success. */ +static int access_cr0(struct cpu_user_regs *regs, uint acc_typ, uint64_t *regp) +{ + struct vcpu *vp = current; + + if ( acc_typ == VMX_CONTROL_REG_ACCESS_TYPE_MOV_TO_CR ) + { + unsigned long new_cr0 = *regp; + unsigned long old_cr0 = __vmread(GUEST_CR0); + + dbgp1("PVH:writing to CR0. RIP:%lx val:0x%lx\n", regs->rip, *regp); + if ( (u32)new_cr0 != new_cr0 ) + { + printk(XENLOG_G_WARNING + "Guest setting upper 32 bits in CR0: %lx", new_cr0); + return -EPERM; + } + + new_cr0 &= ~HVM_CR0_GUEST_RESERVED_BITS; + /* ET is reserved and should always be 1. */ + new_cr0 |= X86_CR0_ET; + + /* A pvh is not expected to change to real mode. */ + if ( (new_cr0 & (X86_CR0_PE | X86_CR0_PG)) != (X86_CR0_PG | X86_CR0_PE) ) + { + printk(XENLOG_G_WARNING + "PVH attempting to turn off PE/PG. 
CR0:%lx\n", new_cr0); + return -EPERM; + } + /* TS going from 1 to 0 */ + if ( (old_cr0 & X86_CR0_TS) && ((new_cr0 & X86_CR0_TS) == 0) ) + vmx_fpu_enter(vp); + + vp->arch.hvm_vcpu.hw_cr[0] = vp->arch.hvm_vcpu.guest_cr[0] = new_cr0; + __vmwrite(GUEST_CR0, new_cr0); + __vmwrite(CR0_READ_SHADOW, new_cr0); + } + else + *regp = __vmread(GUEST_CR0); + + return 0; +} + +/* Returns: rc == 0: success. */ +static int access_cr4(struct cpu_user_regs *regs, uint acc_typ, uint64_t *regp) +{ + if ( acc_typ == VMX_CONTROL_REG_ACCESS_TYPE_MOV_TO_CR ) + { + struct vcpu *vp = current; + u64 old_val = __vmread(GUEST_CR4); + u64 new = *regp; + + if ( new & HVM_CR4_GUEST_RESERVED_BITS(vp) ) + { + printk(XENLOG_G_WARNING + "PVH guest attempts to set reserved bit in CR4: %lx", new); + hvm_inject_hw_exception(TRAP_gp_fault, 0); + return 0; + } + + if ( !(new & X86_CR4_PAE) && hvm_long_mode_enabled(vp) ) + { + printk(XENLOG_G_WARNING "Guest cleared CR4.PAE while " + "EFER.LMA is set"); + hvm_inject_hw_exception(TRAP_gp_fault, 0); + return 0; + } + + vp->arch.hvm_vcpu.guest_cr[4] = new; + + if ( (old_val ^ new) & (X86_CR4_PSE | X86_CR4_PGE | X86_CR4_PAE) ) + vpid_sync_all(); + + __vmwrite(CR4_READ_SHADOW, new); + + new &= ~X86_CR4_PAE; /* PVH always runs with hap enabled. */ + new |= X86_CR4_VMXE | X86_CR4_MCE; + __vmwrite(GUEST_CR4, new); + } + else + *regp = __vmread(CR4_READ_SHADOW); + + return 0; +} + +/* Returns: rc == 0: success, else -errno. */ +static int vmxit_cr_access(struct cpu_user_regs *regs) +{ + unsigned long exit_qualification = __vmread(EXIT_QUALIFICATION); + uint acc_typ = VMX_CONTROL_REG_ACCESS_TYPE(exit_qualification); + int cr, rc = -EINVAL; + + switch ( acc_typ ) + { + case VMX_CONTROL_REG_ACCESS_TYPE_MOV_TO_CR: + case VMX_CONTROL_REG_ACCESS_TYPE_MOV_FROM_CR: + { + uint gpr = VMX_CONTROL_REG_ACCESS_GPR(exit_qualification); + uint64_t *regp = decode_register(gpr, regs, 0); + cr = VMX_CONTROL_REG_ACCESS_NUM(exit_qualification); + + if ( regp == NULL ) + break; + + switch ( cr ) + { + case 0: + rc = access_cr0(regs, acc_typ, regp); + break; + + case 3: + printk(XENLOG_G_ERR "PVH: unexpected cr3 vmexit. rip:%lx\n", + regs->rip); + domain_crash(current->domain); + break; + + case 4: + rc = access_cr4(regs, acc_typ, regp); + break; + } + if ( rc == 0 ) + vmx_update_guest_eip(); + break; + } + + case VMX_CONTROL_REG_ACCESS_TYPE_CLTS: + { + struct vcpu *vp = current; + unsigned long cr0 = vp->arch.hvm_vcpu.guest_cr[0] & ~X86_CR0_TS; + vp->arch.hvm_vcpu.hw_cr[0] = vp->arch.hvm_vcpu.guest_cr[0] = cr0; + + vmx_fpu_enter(vp); + __vmwrite(GUEST_CR0, cr0); + __vmwrite(CR0_READ_SHADOW, cr0); + vmx_update_guest_eip(); + rc = 0; + } + } + return rc; +} + +/* + * Note: A PVH guest sets IOPL natively by setting bits in the eflags, and not + * via hypercalls used by a PV. + */ +static int vmxit_io_instr(struct cpu_user_regs *regs) +{ + struct segment_register seg; + int requested = (regs->rflags & X86_EFLAGS_IOPL) >> 12; + int curr_lvl = (regs->rflags & X86_EFLAGS_VM) ? 
+
+    if ( curr_lvl == 0 )
+    {
+        hvm_get_segment_register(current, x86_seg_ss, &seg);
+        curr_lvl = seg.attr.fields.dpl;
+    }
+    if ( requested >= curr_lvl && emulate_privileged_op(regs) )
+        return 0;
+
+    hvm_inject_hw_exception(TRAP_gp_fault, regs->error_code);
+    return 0;
+}
+
+static int pvh_ept_handle_violation(unsigned long qualification,
+                                    paddr_t gpa, struct cpu_user_regs *regs)
+{
+    unsigned long gla, gfn = gpa >> PAGE_SHIFT;
+    p2m_type_t p2mt;
+    mfn_t mfn = get_gfn_query_unlocked(current->domain, gfn, &p2mt);
+
+    printk(XENLOG_G_ERR "EPT violation %#lx (%c%c%c/%c%c%c), "
+           "gpa %#"PRIpaddr", mfn %#lx, type %i. IP:0x%lx RSP:0x%lx\n",
+           qualification,
+           (qualification & EPT_READ_VIOLATION) ? 'r' : '-',
+           (qualification & EPT_WRITE_VIOLATION) ? 'w' : '-',
+           (qualification & EPT_EXEC_VIOLATION) ? 'x' : '-',
+           (qualification & EPT_EFFECTIVE_READ) ? 'r' : '-',
+           (qualification & EPT_EFFECTIVE_WRITE) ? 'w' : '-',
+           (qualification & EPT_EFFECTIVE_EXEC) ? 'x' : '-',
+           gpa, mfn_x(mfn), p2mt, regs->rip, regs->rsp);
+
+    ept_walk_table(current->domain, gfn);
+
+    if ( qualification & EPT_GLA_VALID )
+    {
+        gla = __vmread(GUEST_LINEAR_ADDRESS);
+        printk(XENLOG_G_ERR " --- GLA %#lx\n", gla);
+    }
+    hvm_inject_hw_exception(TRAP_gp_fault, 0);
+    return 0;
+}
+
+/*
+ * Main vm exit handler for PVH. Called from vmx_vmexit_handler().
+ * Note: vmx_asm_vmexit_handler updates rip/rsp/eflags in regs{} struct.
+ */
 void vmx_pvh_vmexit_handler(struct cpu_user_regs *regs)
 {
+    unsigned long exit_qualification;
+    unsigned int exit_reason = __vmread(VM_EXIT_REASON);
+    int rc=0, ccpu = smp_processor_id();
+    struct vcpu *v = current;
+
+    dbgp1("PVH:[%d]left VMCS exitreas:%d RIP:%lx RSP:%lx EFLAGS:%lx CR0:%lx\n",
+          ccpu, exit_reason, regs->rip, regs->rsp, regs->rflags,
+          __vmread(GUEST_CR0));
+
+    switch ( (uint16_t)exit_reason )
+    {
+    /* NMI and machine_check are handled by the caller, we handle rest here */
+    case EXIT_REASON_EXCEPTION_NMI:      /* 0 */
+        rc = vmxit_exception(regs);
+        break;
+
+    case EXIT_REASON_EXTERNAL_INTERRUPT: /* 1 */
+        break;              /* handled in vmx_vmexit_handler() */
+
+    case EXIT_REASON_PENDING_VIRT_INTR:  /* 7 */
+        /* Disable the interrupt window. */
+        v->arch.hvm_vmx.exec_control &= ~CPU_BASED_VIRTUAL_INTR_PENDING;
+        __vmwrite(CPU_BASED_VM_EXEC_CONTROL, v->arch.hvm_vmx.exec_control);
+        break;
+
+    case EXIT_REASON_CPUID:              /* 10 */
+        pv_cpuid(regs);
+        vmx_update_guest_eip();
+        break;
+
+    case EXIT_REASON_HLT:                /* 12 */
+        vmx_update_guest_eip();
+        hvm_hlt(regs->eflags);
+        break;
+
+    case EXIT_REASON_VMCALL:             /* 18 */
+        rc = vmxit_vmcall(regs);
+        break;
+
+    case EXIT_REASON_CR_ACCESS:          /* 28 */
+        rc = vmxit_cr_access(regs);
+        break;
+
+    case EXIT_REASON_DR_ACCESS:          /* 29 */
+        exit_qualification = __vmread(EXIT_QUALIFICATION);
+        vmx_dr_access(exit_qualification, regs);
+        break;
+
+    case EXIT_REASON_IO_INSTRUCTION:     /* 30 */
+        vmxit_io_instr(regs);
+        break;
+
+    case EXIT_REASON_MSR_READ:           /* 31 */
+        rc = vmxit_msr_read(regs);
+        break;
+
+    case EXIT_REASON_MSR_WRITE:          /* 32 */
+        rc = vmxit_msr_write(regs);
+        break;
+
+    case EXIT_REASON_MONITOR_TRAP_FLAG:  /* 37 */
+        rc = vmxit_mtf(regs);
+        break;
+
+    case EXIT_REASON_MCE_DURING_VMENTRY: /* 41 */
+        break;              /* handled in vmx_vmexit_handler() */
+
+    case EXIT_REASON_EPT_VIOLATION:      /* 48 */
+    {
+        paddr_t gpa = __vmread(GUEST_PHYSICAL_ADDRESS);
+        exit_qualification = __vmread(EXIT_QUALIFICATION);
+        rc = pvh_ept_handle_violation(exit_qualification, gpa, regs);
+        break;
+    }
+
+    default:
+        rc = 1;
+        printk(XENLOG_G_ERR
+               "PVH: Unexpected exit reason:%#x\n", exit_reason);
+    }
+
+    if ( rc )
+    {
+        exit_qualification = __vmread(EXIT_QUALIFICATION);
+        printk(XENLOG_G_WARNING
+               "PVH: [%d] exit_reas:%d %#x qual:%ld 0x%lx cr0:0x%016lx\n",
+               ccpu, exit_reason, exit_reason, exit_qualification,
+               exit_qualification, __vmread(GUEST_CR0));
+        printk(XENLOG_G_WARNING "PVH: RIP:%lx RSP:%lx EFLAGS:%lx CR3:%lx\n",
+               regs->rip, regs->rsp, regs->rflags, __vmread(GUEST_CR3));
+        domain_crash(v->domain);
+    }
 }

 /*
-- 
1.7.2.3
Ian Campbell
2013-Jul-18 10:09 UTC
Re: [PATCH 01/24] PVH xen: Add readme docs/misc/pvh-readme.txt
On Wed, 2013-07-17 at 19:32 -0700, Mukesh Rathor wrote:

> +Note, any emails to must be cc'd to Xen-devel@lists.xensource.com.

lists.xen.org please, let's not get two domain names behind ;-)

Is there a description somewhere in the series of what PVH means in
terms of the guest visible ABI? i.e. documentation of the delta from
the regular PV mode? I had a skim through and didn't spot it.
Jan Beulich
2013-Jul-18 10:32 UTC
Re: [PATCH 01/24] PVH xen: Add readme docs/misc/pvh-readme.txt
>>> On 18.07.13 at 04:32, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> +The best way to find all the patches is to use "git log|grep -i PVH", both
> +in xen and linux tree.

Which doesn't really say which tree. Do you mean the upstream ones,
or some private ones you maintain?

> +Note, any emails to must be cc'd to Xen-devel@lists.xensource.com.

Please let's not add more references to this super stale mailing
list. It had been @lists.xen.org for a couple of years, and recently
changed to @lists.xenproject.org.

Also there must be something missing between "to" and "must"...

Jan
Jan Beulich
2013-Jul-18 10:33 UTC
Re: [PATCH 02/24] PVH xen: update __XEN_LATEST_INTERFACE_VERSION__
>>> On 18.07.13 at 04:32, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> Update __XEN_LATEST_INTERFACE_VERSION__ to 0x00040400 because of the gdb
> union changes in the next patch titled "turn gdb_frames/gdt_ents into
> union".
>
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>

I would prefer this to be part of the next patch, but there's no need
to re-do the series just because of this.

Jan
Jan Beulich
2013-Jul-18 10:37 UTC
Re: [PATCH 06/24] PVH xen: hvm related preparatory changes for PVH
>>> On 18.07.13 at 04:32, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> This patch contains small changes to hvm.c because hvm_domain.params is
> not set/used/supported for PVH in the present series.
>
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
Ian Campbell
2013-Jul-18 10:40 UTC
Re: [PATCH 01/24] PVH xen: Add readme docs/misc/pvh-readme.txt
On Thu, 2013-07-18 at 11:32 +0100, Jan Beulich wrote:
> >>> On 18.07.13 at 04:32, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> > +Note, any emails to must be cc'd to Xen-devel@lists.xensource.com.
>
> Please let's not add more references to this super stale mailing
> list. It had been @lists.xen.org for a couple of years, and recently
> changed to @lists.xenproject.org.

For some reason I thought xen-devel@lists.xenproject.org wasn't active
yet when I suggested lists.xen.org, but you sent your mail via it so I
must be wrong, so I agree: let's use the newest address.

Ian.
Roger Pau Monné
2013-Jul-18 10:47 UTC
Re: [PATCH 00/24][V8]PVH xen: Phase I, Version 8 patches...
On 18/07/13 03:32, Mukesh Rathor wrote:
> This patchset will also be on a public git tree in less than 24 hours and
> I'll email the details as soon as its done.

Could you also push the necessary toolstack changes to the public git
tree? It would make testing much easier.
Jan Beulich
2013-Jul-18 12:29 UTC
Re: [PATCH 07/24] PVH xen: vmx related preparatory changes for PVH
>>> On 18.07.13 at 04:32, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> This is another preparatory patch for PVH. In this patch, following
> functions are made available for general/public use:
>     vmx_fpu_enter(), get_instruction_length(), update_guest_eip(),
>     and vmx_dr_access().
>
> There is no functionality change.
>
> Changes in V2:
>   - prepend vmx_ to get_instruction_length and update_guest_eip.
>   - Do not export/use vmr().
>
> Changes in V3:
>   - Do not change emulate_forced_invalid_op() in this patch.
>
> Changes in V7:
>   - Drop pv_cpuid going public here.
>
> Changes in V8:
>   - Move vmx_fpu_enter prototype from vmcs.h to vmx.h
>
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich
2013-Jul-18 12:32 UTC
Re: [PATCH 08/24] PVH xen: vmcs related preparatory changes for PVH
>>> On 18.07.13 at 04:32, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> In this patch, some common code is factored out of construct_vmcs() to
> create vmx_set_common_host_vmcs_fields() to be used by PVH.
>
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich
2013-Jul-18 12:43 UTC
Re: [PATCH 09/24] PVH xen: Introduce PVH guest type and some basic changes.
>>> On 18.07.13 at 04:32, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> Chagnes in V8:

Same spelling typo in all of the change title lines...

>   - Got to VMCS for DPL check instead of checking the rpl in
>     guest_kernel_mode. Note, we drop the const qualifier from
>     vcpu_show_registers() to accomodate the hvm function call in
>     guest_kernel_mode().
>   - Also, hvm_kernel_mode is put in hvm.c because it's called from
>     guest_kernel_mode in regs.h which is a pretty early header include.
>     Hence, we can't place it in hvm.h like other similar functions.

Are you saying that because you tried it, or just because it looks
like so? The use of the function is in a macro, and hence if the
macro isn't used too early this could still work out. I say this
because the function would clearly benefit from getting inlined.

> --- a/xen/include/asm-x86/desc.h
> +++ b/xen/include/asm-x86/desc.h
> @@ -38,7 +38,9 @@
>
>  #ifndef __ASSEMBLY__
>
> -#define GUEST_KERNEL_RPL(d) (is_pv_32bit_domain(d) ? 1 : 3)
> +/* PVH 32bitfixme : see emulate_gate_op call from do_general_protection */
> +#define GUEST_KERNEL_RPL(d) ({ ASSERT(!is_pvh_domain(d));      \
> +                               is_pv_32bit_domain(d) ? 1 : 3; })

Sorry for having overlooked this earlier - this really ought to assert
for is_pv_domain(), if any assertion is to be added here at all.

> --- a/xen/include/asm-x86/x86_64/regs.h
> +++ b/xen/include/asm-x86/x86_64/regs.h
> @@ -10,10 +10,13 @@
>  #define ring_2(r)    (((r)->cs & 3) == 2)
>  #define ring_3(r)    (((r)->cs & 3) == 3)
>
> -#define guest_kernel_mode(v, r)                                 \
> -    (!is_pv_32bit_vcpu(v) ?                                     \
> -     (ring_3(r) && ((v)->arch.flags & TF_kernel_mode)) :        \
> -     (ring_1(r)))
> +bool_t hvm_kernel_mode(struct vcpu *);
> +
> +#define guest_kernel_mode(v, r)                                 \
> +    (is_pvh_vcpu(v) ? (hvm_kernel_mode(v)) :                    \
> +     (!is_pv_32bit_vcpu(v) ?                                    \
> +      (ring_3(r) && ((v)->arch.flags & TF_kernel_mode)) :       \
> +      (ring_1(r))))

While I see that you attempt to follow the style used here before,
it has been inconsistent and shouldn't be made worse: Please drop
the extra parentheses around function calls / function-like macro
invocations.

Jan
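For illustration, the hunk above with the parentheses Jan objects to
dropped might read as follows (a sketch based only on the quoted V8
macro, not on any posted follow-up):

    #define guest_kernel_mode(v, r)                                 \
        (is_pvh_vcpu(v) ? hvm_kernel_mode(v) :                      \
         !is_pv_32bit_vcpu(v) ?                                     \
         (ring_3(r) && ((v)->arch.flags & TF_kernel_mode)) :        \
         ring_1(r))

The parentheses around the ring_3()/TF_kernel_mode conjunction stay,
since that is a compound expression rather than a single invocation.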
Jan Beulich
2013-Jul-18 13:14 UTC
Re: [PATCH 10/24] PVH xen: introduce pvh_set_vcpu_info() and vmx_pvh_set_vcpu_info()
>>> On 18.07.13 at 04:32, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> +/*
> + * Set vmcs fields in support of vcpu_op -> VCPUOP_initialise hcall. Called
> + * from arch_set_info_guest() which sets the (PVH relevant) non-vmcs fields.
> + *
> + * In case of linux:
> + *     The boot vcpu calls this to set some context for the non boot smp vcpu.
> + *     The call comes from cpu_initialize_context(). (boot vcpu 0 context is
> + *     set by the tools via do_domctl -> vcpu_initialise).
> + *
> + * NOTE: In case of VMCS, loading a selector doesn't cause the hidden fields
> + *       to be automatically loaded. We load selectors here but not the hidden
> + *       parts. This means we require the guest to have same hidden values
> + *       as the default values loaded in the vmcs in pvh_construct_vmcs(), ie,
> + *       the GDT the vcpu is coming up on should be something like following
> + *       on linux (for 64bit, CS:0x10 DS/SS:0x18) :
> + *
> + *   ffff88007f704000: 0000000000000000 00cf9b000000ffff
> + *   ffff88007f704010: 00af9b000000ffff 00cf93000000ffff
> + *   ffff88007f704020: 00cffb000000ffff 00cff3000000ffff
> + *
> + */

This comment should reflect reality as closely as possible, or else
it'll just cause confusion rather than clarifying things. In
particular, the hidden base fields of FS and GS get set below, and
hence the comment should say so.

> +int vmx_pvh_set_vcpu_info(struct vcpu *v, struct vcpu_guest_context *ctxtp)
> +{
> +    if ( v->vcpu_id == 0 )
> +        return 0;
> +
> +    if ( !(ctxtp->flags & VGCF_in_kernel) )
> +        return -EINVAL;

So you check for kernel mode now, ...

> +
> +    vmx_vmcs_enter(v);
> +    __vmwrite(GUEST_GDTR_BASE, ctxtp->gdt.pvh.addr);
> +    __vmwrite(GUEST_GDTR_LIMIT, ctxtp->gdt.pvh.limit);
> +    __vmwrite(GUEST_LDTR_BASE, ctxtp->ldt_base);
> +    __vmwrite(GUEST_LDTR_LIMIT, ctxtp->ldt_ents);
> +
> +    __vmwrite(GUEST_FS_BASE, ctxtp->fs_base);
> +    __vmwrite(GUEST_GS_BASE, ctxtp->gs_base_user);

... but then write the user GS base here ...

> +
> +    __vmwrite(GUEST_CS_SELECTOR, ctxtp->user_regs.cs);
> +    __vmwrite(GUEST_SS_SELECTOR, ctxtp->user_regs.ss);
> +    __vmwrite(GUEST_ES_SELECTOR, ctxtp->user_regs.es);
> +    __vmwrite(GUEST_DS_SELECTOR, ctxtp->user_regs.ds);
> +    __vmwrite(GUEST_FS_SELECTOR, ctxtp->user_regs.fs);
> +    __vmwrite(GUEST_GS_SELECTOR, ctxtp->user_regs.gs);
> +
> +    if ( vmx_add_guest_msr(MSR_SHADOW_GS_BASE) )
> +    {
> +        vmx_vmcs_exit(v);
> +        return -EINVAL;
> +    }
> +    vmx_write_guest_msr(MSR_SHADOW_GS_BASE, ctxtp->gs_base_kernel);

... and the kernel one here? That looks the wrong way round to me.

Jan
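A possible rewording of that NOTE, covering the FS/GS bases as Jan asks
(a suggestion only, not a posted hunk):

     * NOTE: Loading a selector into the VMCS does not load the
     *       corresponding hidden fields. We load the selectors and the
     *       FS/GS base fields below, but no other hidden parts, so the
     *       guest must come up on a GDT whose descriptors match the
     *       defaults written by pvh_construct_vmcs(), e.g. on 64bit
     *       linux CS:0x10 DS/SS:0x18.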
Jan Beulich
2013-Jul-18 13:16 UTC
Re: [PATCH 11/24] PVH xen: domain create, schedular related code changes
>>> On 18.07.13 at 04:32, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> This patch mostly contains changes to arch/x86/domain.c to allow for a PVH
> domain creation. The new function pvh_set_vcpu_info(), introduced in the
> previous patch, is called here to set some guest context in the VMCS.
> This patch also changes the context_switch code in the same file to follow
> HVM behaviour for PVH.
>
> Changes in V2:
>   - changes to read_segment_register() moved to this patch.
>   - The other comment was to create NULL functions for pvh_set_vcpu_info
>     and pvh_read_descriptor which are implemented in later patch, but since
>     I disable PVH creation until all patches are checked in, it is not
>     needed. But it helps breaking down of patches.
>
> Changes in V3:
>   - Fix read_segment_register() macro to make sure args are evaluated once,
>     and use # instead of STR for name in the macro.
>
> Changes in V4:
>   - Remove pvh substruct in the hvm substruct, as the vcpu_info_mfn has been
>     moved out of pv_vcpu struct.
>   - rename hvm_pvh_* functions to hvm_*.
>
> Changes in V5:
>   - remove pvh_read_descriptor().
>
> Changes in V7:
>   - remove hap_update_cr3() and read_segment_register changes from here.
>
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>

Re the title: I don't see anything scheduler related in here.

Re the contents: Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich
2013-Jul-18 13:17 UTC
Re: [PATCH 12/24] PVH xen: support invalid op emulation for PVH
>>> On 18.07.13 at 04:32, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> This patch supports invalid op emulation for PVH by calling appropriate
> copy macros and an HVM function to inject PF.
>
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich
2013-Jul-18 13:29 UTC
Re: [PATCH 13/24] PVH xen: Support privileged op emulation for PVH
>>> On 18.07.13 at 04:32, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> This patch changes mostly traps.c to support privileged op emulation for
> PVH. A new function read_descriptor_sel() is introduced to read descriptor
> for PVH given a selector. Another new function vmx_read_selector() reads a
> selector from VMCS, to support read_segment_register() for PVH.
>
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich
2013-Jul-18 13:49 UTC
Re: [PATCH 18/24] PVH xen: Checks, asserts, and limitations for PVH
>>> On 18.07.13 at 04:33, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> --- a/xen/arch/x86/traps.c
> +++ b/xen/arch/x86/traps.c
> @@ -2710,6 +2710,8 @@ static void emulate_gate_op(struct cpu_user_regs *regs)
>      unsigned long off, eip, opnd_off, base, limit;
>      int jump;
>
> +    ASSERT(!is_pvh_vcpu(v));
> +
>      /* Check whether this fault is due to the use of a call gate. */
>      if ( !read_gate_descriptor(regs->error_code, v, &sel, &off, &ar) ||
>           (((ar >> 13) & 3) < (regs->cs & 3)) ||
> @@ -3326,6 +3328,8 @@ void do_device_not_available(struct cpu_user_regs *regs)
>
>      BUG_ON(!guest_mode(regs));
>
> +    ASSERT(!is_pvh_vcpu(curr));
> +
>      vcpu_restore_fpu_lazy(curr);
>
>      if ( curr->arch.pv_vcpu.ctrlreg[0] & X86_CR0_TS )

I'm pretty sure I said this before: These assertions are bogus:
Either drop them, or make them ASSERT(is_pv_vcpu(...));

Jan
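To make the objection concrete: with three guest types in the series,
!is_pvh_vcpu() still passes for HVM vCPUs, so these PV-only paths want
the positive predicate (a sketch against the hunks above, assuming the
usual is_pv_vcpu() helper; not a posted hunk):

    /* In emulate_gate_op(), reached only for PV guests: */
    ASSERT(is_pv_vcpu(v));

    /* Likewise in do_device_not_available(): */
    ASSERT(is_pv_vcpu(curr));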
Jan Beulich
2013-Jul-18 13:56 UTC
Re: [PATCH 19/24] PVH xen: add hypercall support for PVH
>>> On 18.07.13 at 04:33, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -3188,6 +3188,16 @@ static long hvm_vcpu_op(
>      case VCPUOP_register_vcpu_time_memory_area:
>          rc = do_vcpu_op(cmd, vcpuid, arg);
>          break;
> +
> +    case VCPUOP_is_up:
> +    case VCPUOP_up:
> +    case VCPUOP_initialise:
> +        if ( is_pvh_vcpu(current) )
> +            rc = do_vcpu_op(cmd, vcpuid, arg);
> +        else
> +            rc = -ENOSYS;
> +        break;
> +

As said before, this white listing has to be a temporary thing, and
hence ought to have a fixme note.

> @@ -3349,16 +3379,21 @@ int hvm_do_hypercall(struct cpu_user_regs *regs)
>                                                  regs->r10, regs->r8, regs->r9);
>
>          curr->arch.hvm_vcpu.hcall_64bit = 1;
> -        regs->rax = hvm_hypercall64_table[eax](regs->rdi,
> -                                               regs->rsi,
> -                                               regs->rdx,
> -                                               regs->r10,
> -                                               regs->r8,
> -                                               regs->r9);
> +        if ( is_pvh_vcpu(curr) )
> +            regs->rax = pvh_hypercall64_table[eax](regs->rdi, regs->rsi,
> +                                                   regs->rdx, regs->r10,
> +                                                   regs->r8, regs->r9);
> +        else
> +            regs->rax = hvm_hypercall64_table[eax](regs->rdi, regs->rsi,
> +                                                   regs->rdx, regs->r10,
> +                                                   regs->r8, regs->r9);
>          curr->arch.hvm_vcpu.hcall_64bit = 0;
> +

Adding a stray blank line.

>      }
>      else

Jan
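One way the requested fixme note might look (the wording is mine,
purely illustrative, against the first hunk quoted above):

    case VCPUOP_is_up:
    case VCPUOP_up:
    case VCPUOP_initialise:
        /*
         * PVH fixme: this white list is temporary until the remaining
         * vcpu ops have been audited and made to work for PVH.
         */
        if ( is_pvh_vcpu(current) )
            rc = do_vcpu_op(cmd, vcpuid, arg);
        else
            rc = -ENOSYS;
        break;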
Mukesh Rathor
2013-Jul-18 18:21 UTC
Re: [PATCH 01/24] PVH xen: Add readme docs/misc/pvh-readme.txt
On Thu, 18 Jul 2013 11:09:03 +0100
Ian Campbell <Ian.Campbell@citrix.com> wrote:

> On Wed, 2013-07-17 at 19:32 -0700, Mukesh Rathor wrote:
> > +Note, any emails to must be cc'd to Xen-devel@lists.xensource.com.
>
> lists.xen.org please, let's not get two domain names behind ;-)

I'll just say xen devel mailing list. The reason is I often get emails
about the debugger, and when I cc xen-devel some people get upset about
it! I want to say this so people can send from an email address that
they are OK with being public.

> Is there a description somewhere in the series of what PVH means in
> terms of the guest visible ABI? i.e. documentation of the delta from
> the regular PV mode? I had a skim through and didn't spot it.

No. The ABIs are not as affected, thanks to pre-existing auto translate
mode. But, I suppose after all the patches are checked in, I can write
up something.

thanks
Mukesh
Mukesh Rathor
2013-Jul-18 18:28 UTC
Re: [PATCH 09/24] PVH xen: Introduce PVH guest type and some basic changes.
On Thu, 18 Jul 2013 13:43:31 +0100 "Jan Beulich" <JBeulich@suse.com> wrote:
> >>> On 18.07.13 at 04:32, Mukesh Rathor <mukesh.rathor@oracle.com>
> >>> wrote:
> > Chagnes in V8:
>
> Same spelling typo in all of the change title lines...
>
> > - Got to VMCS for DPL check instead of checking the rpl in
> > guest_kernel_mode. Note, we drop the const qualifier from
> > vcpu_show_registers() to accommodate the hvm function call in
> > guest_kernel_mode().
> > - Also, hvm_kernel_mode is put in hvm.c because it's called from
> > guest_kernel_mode in regs.h which is a pretty early header
> > include. Hence, we can't place it in hvm.h like other similar
> > functions.
>
> Are you saying that because you tried it, or just because it looks
> like so? The use of the function is in a macro, and hence if the
> macro isn't used too early this could still work out. I say this
> because the function would clearly benefit from getting inlined.

I tried a lot. I tried putting the function in hvm.h, but then that
needs to be included in regs.h, which won't work at all since regs.h is
a very early header. The other alternative was to put hvm_kernel_mode in
regs.h itself, but it calls hvm_get_segment_register(), for which I'd
either need to include hvm.h in regs.h, which is not possible, or add a
prototype for hvm_get_segment_register(). But then the args to
hvm_get_segment_register() also need their headers. So, in the end this
seemed to be the best/only way.

thanks
Mukesh
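To make the header constraint concrete, the arrangement described ends
up shaped roughly like this (a sketch under the constraints above;
pv_guest_kernel_mode() is a hypothetical stand-in for the existing PV
check, and the exact prototypes in the posted series may differ):

    /* regs.h: very early header; only a forward declaration fits here,
     * since hvm.h (and the types it pulls in) cannot be included. */
    int hvm_kernel_mode(struct vcpu *v);

    #define guest_kernel_mode(v, r) \
        (is_pvh_vcpu(v) ? hvm_kernel_mode(v) : pv_guest_kernel_mode(v, r))

    /* hvm.c: out of line, because hvm_get_segment_register() and
     * struct segment_register are only visible via hvm.h. */
    int hvm_kernel_mode(struct vcpu *v)
    {
        struct segment_register seg;

        hvm_get_segment_register(v, x86_seg_ss, &seg);
        return (seg.attr.fields.dpl == 0);   /* DPL 0 means kernel mode */
    }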
Mukesh Rathor
2013-Jul-18 18:37 UTC
Re: [PATCH 10/24] PVH xen: introduce pvh_set_vcpu_info() and vmx_pvh_set_vcpu_info()
On Thu, 18 Jul 2013 14:14:32 +0100 "Jan Beulich" <JBeulich@suse.com> wrote:
> >>> On 18.07.13 at 04:32, Mukesh Rathor <mukesh.rathor@oracle.com>
> >>> wrote:
> > +/*
> > + * Set vmcs fields in support of vcpu_op -> VCPUOP_initialise
> > hcall. Called
> > + * from arch_set_info_guest() which sets the (PVH relevant)
> > non-vmcs fields.
> > + *
> > + * In case of linux:
> > + * The boot vcpu calls this to set some context for the non
> > boot smp vcpu.
> > + * The call comes from cpu_initialize_context(). (boot vcpu 0
> > context is
> > + * set by the tools via do_domctl -> vcpu_initialise).
> > + *
> > + * NOTE: In case of VMCS, loading a selector doesn't cause the
> > hidden fields
> > + * to be automatically loaded. We load selectors here but
> > not the hidden
> > + * parts. This means we require the guest to have same
> > hidden values
> > + * as the default values loaded in the vmcs in
> > pvh_construct_vmcs(), ie,
> > + * the GDT the vcpu is coming up on should be something like
> > following
> > + * on linux (for 64bit, CS:0x10 DS/SS:0x18) :
> > + *
> > + * ffff88007f704000: 0000000000000000 00cf9b000000ffff
> > + * ffff88007f704010: 00af9b000000ffff 00cf93000000ffff
> > + * ffff88007f704020: 00cffb000000ffff 00cff3000000ffff
> > + *
> > + */
>
> This comment should reflect reality as closely as possible, or else
> it'll just cause confusion rather than clarifying things. In
> particular, the hidden base fields of FS and GS get set below, and
> hence the comment should say so.

Ah, right, the FS and GS are different that way. I'll change the comment.

> > +int vmx_pvh_set_vcpu_info(struct vcpu *v, struct
> > vcpu_guest_context *ctxtp) +{
> > + if ( v->vcpu_id == 0 )
> > + return 0;
> > +
> > + if ( !(ctxtp->flags & VGCF_in_kernel) )
> > + return -EINVAL;
>
> So you check for kernel mode now, ...
>
> > +
> > + vmx_vmcs_enter(v);
> > + __vmwrite(GUEST_GDTR_BASE, ctxtp->gdt.pvh.addr);
> > + __vmwrite(GUEST_GDTR_LIMIT, ctxtp->gdt.pvh.limit);
> > + __vmwrite(GUEST_LDTR_BASE, ctxtp->ldt_base);
> > + __vmwrite(GUEST_LDTR_LIMIT, ctxtp->ldt_ents);
> > +
> > + __vmwrite(GUEST_FS_BASE, ctxtp->fs_base);
> > + __vmwrite(GUEST_GS_BASE, ctxtp->gs_base_user);
>
> ... but then write the user GS base here ...
>
> > +
> > + __vmwrite(GUEST_CS_SELECTOR, ctxtp->user_regs.cs);
> > + __vmwrite(GUEST_SS_SELECTOR, ctxtp->user_regs.ss);
> > + __vmwrite(GUEST_ES_SELECTOR, ctxtp->user_regs.es);
> > + __vmwrite(GUEST_DS_SELECTOR, ctxtp->user_regs.ds);
> > + __vmwrite(GUEST_FS_SELECTOR, ctxtp->user_regs.fs);
> > + __vmwrite(GUEST_GS_SELECTOR, ctxtp->user_regs.gs);
> > +
> > + if ( vmx_add_guest_msr(MSR_SHADOW_GS_BASE) )
> > + {
> > + vmx_vmcs_exit(v);
> > + return -EINVAL;
> > + }
> > + vmx_write_guest_msr(MSR_SHADOW_GS_BASE, ctxtp->gs_base_kernel);
>
> ... and the kernel one here? That looks the wrong way round to me.

Yeah, I struggled with that one a lot, and had added it to my list of
things to talk to Konrad about. I think the PV code in linux has it
backwards. Both values are the same when the hcall is made, btw. But in
linux baremetal/HVM, the value put in gs_base_user is the value written
to MSR_GS_BASE. But, in the PV part of the linux code, I think it should
be switched. Since you also agree, I'll change this code here.

thanks
Mukesh
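So the agreed fix swaps the two GS writes; the relevant part of
vmx_pvh_set_vcpu_info() would become roughly (a sketch against the code
quoted above; only the gs_base lines change):

    __vmwrite(GUEST_FS_BASE, ctxtp->fs_base);
    /* The vcpu enters in kernel mode (VGCF_in_kernel was checked above),
     * so the active GS base must be the kernel one... */
    __vmwrite(GUEST_GS_BASE, ctxtp->gs_base_kernel);

    /* ...and the shadow (swapgs) slot then carries the user GS base. */
    if ( vmx_add_guest_msr(MSR_SHADOW_GS_BASE) )
    {
        vmx_vmcs_exit(v);
        return -EINVAL;
    }
    vmx_write_guest_msr(MSR_SHADOW_GS_BASE, ctxtp->gs_base_user);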
Mukesh Rathor
2013-Jul-19 01:23 UTC
Re: [PATCH 00/24][V8]PVH xen: Phase I, Version 8 patches...
On Thu, 18 Jul 2013 11:47:44 +0100 Roger Pau Monné <roger.pau@citrix.com> wrote:
> On 18/07/13 03:32, Mukesh Rathor wrote:
> > Hi all,
> >
> > This is V8 of PVH patches for xen. These are xen changes to support
> > boot of a 64bit PVH domU guest. Built on top of unstable git c/s:
> > 5d0ca62156d734a757656b9bcb6bf17ee76d37b4.
> >
> > New in V8:
> > - Add docs/misc/pvh-readme.txt per Konrad's suggestion.
> > - Redo macros guest_kernel_mode and read_segment_register.
> > - Reorg and break down HVM+VMX patches to HVM and VMX as
> > suggested.
> >
> > Patches 3/5/16 have already been "Reviewed-by".
> >
> > This patchset will also be on a public git tree in less than 24
> > hours and I'll email the details as soon as its done.
>
> Could you also push the necessary toolstack changes to the public git
> tree? It would make testing much easier.
>

Inlined below. Still working on getting the tree up there :) ... until
then, I'm still using a bit older tree for tools, c/s:
4de97462d34f7b74c748ab67600fe2386131b778

---
docs/man/xl.cfg.pod.5 | 3 +++
tools/debugger/gdbsx/xg/xg_main.c | 4 +++-
tools/libxc/xc_dom.h | 1 +
tools/libxc/xc_dom_x86.c | 7 ++++---
tools/libxl/libxl_create.c | 2 ++
tools/libxl/libxl_dom.c | 18 +++++++++++++++++-
tools/libxl/libxl_types.idl | 2 ++
tools/libxl/libxl_x86.c | 4 +++-
tools/libxl/xl_cmdimpl.c | 11 +++++++++++
tools/xenstore/xenstored_domain.c | 12 +++++++-----
10 files changed, 53 insertions(+), 11 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index f8b4576..17c5679 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -620,6 +620,9 @@
if your particular guest kernel does not require this behaviour then
it is safe to allow this to be enabled but you may wish to disable it
anyway.

+=item B<pvh=BOOLEAN>
+Selects whether to run this guest in an HVM container. Default is 0.
+
=back

=head2 Fully-virtualised (HVM) Guest Specific Options

diff --git a/tools/debugger/gdbsx/xg/xg_main.c b/tools/debugger/gdbsx/xg/xg_main.c
index 64c7484..5736b86 100644
--- a/tools/debugger/gdbsx/xg/xg_main.c
+++ b/tools/debugger/gdbsx/xg/xg_main.c
@@ -81,6 +81,7 @@ int xgtrc_on = 0;
struct xen_domctl domctl; /* just use a global domctl */

static int _hvm_guest; /* hvm guest? 32bit HVMs have 64bit context */
+static int _pvh_guest; /* PV guest in HVM container */
static domid_t _dom_id; /* guest domid */
static int _max_vcpu_id; /* thus max_vcpu_id+1 VCPUs */
static int _dom0_fd; /* fd of /dev/privcmd */
@@ -309,6 +310,7 @@ xg_attach(int domid, int guest_bitness)

_max_vcpu_id = domctl.u.getdomaininfo.max_vcpu_id;
_hvm_guest = (domctl.u.getdomaininfo.flags & XEN_DOMINF_hvm_guest);
+ _pvh_guest = (domctl.u.getdomaininfo.flags & XEN_DOMINF_pvh_guest);

return _max_vcpu_id;
}
@@ -369,7 +371,7 @@ _change_TF(vcpuid_t which_vcpu, int guest_bitness, int setit)
int sz = sizeof(anyc);

/* first try the MTF for hvm guest. otherwise do manually */
- if (_hvm_guest) {
+ if (_hvm_guest || _pvh_guest) {
domctl.u.debug_op.vcpu = which_vcpu;
domctl.u.debug_op.op = setit ? 
XEN_DOMCTL_DEBUG_OP_SINGLE_STEP_ON : XEN_DOMCTL_DEBUG_OP_SINGLE_STEP_OFF;
diff --git a/tools/libxc/xc_dom.h b/tools/libxc/xc_dom.h
index ac36600..8b43d2b 100644
--- a/tools/libxc/xc_dom.h
+++ b/tools/libxc/xc_dom.h
@@ -130,6 +130,7 @@ struct xc_dom_image {
domid_t console_domid;
domid_t xenstore_domid;
xen_pfn_t shared_info_mfn;
+ int pvh_enabled;

xc_interface *xch;
domid_t guest_domid;
diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index f1be43b..24f6759 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -389,7 +389,8 @@ static int setup_pgtables_x86_64(struct xc_dom_image *dom)
pgpfn = (addr - dom->parms.virt_base) >> PAGE_SHIFT_X86;
l1tab[l1off] = pfn_to_paddr(xc_dom_p2m_guest(dom, pgpfn)) | L1_PROT;
- if ( (addr >= dom->pgtables_seg.vstart) &&
+ if ( (!dom->pvh_enabled) &&
+ (addr >= dom->pgtables_seg.vstart) &&
(addr < dom->pgtables_seg.vend) )
l1tab[l1off] &= ~_PAGE_RW; /* page tables are r/o */
if ( l1off == (L1_PAGETABLE_ENTRIES_X86_64 - 1) )
@@ -706,7 +707,7 @@ int arch_setup_meminit(struct xc_dom_image *dom)
rc = x86_compat(dom->xch, dom->guest_domid, dom->guest_type);
if ( rc )
return rc;
- if ( xc_dom_feature_translated(dom) )
+ if ( xc_dom_feature_translated(dom) && !dom->pvh_enabled )
{
dom->shadow_enabled = 1;
rc = x86_shadow(dom->xch, dom->guest_domid);
@@ -832,7 +833,7 @@ int arch_setup_bootlate(struct xc_dom_image *dom)
}

/* Map grant table frames into guest physmap. */
- for ( i = 0; ; i++ )
+ for ( i = 0; !dom->pvh_enabled; i++ )
{
rc = xc_domain_add_to_physmap(dom->xch, dom->guest_domid,
XENMAPSPACE_grant_table,
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index cb9c822..83e2d5b 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -421,6 +421,8 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_create_info *info,
flags |= XEN_DOMCTL_CDF_hvm_guest;
flags |= libxl_defbool_val(info->hap) ? XEN_DOMCTL_CDF_hap : 0;
flags |= libxl_defbool_val(info->oos) ? 
0 : XEN_DOMCTL_CDF_oos_off; + } else if ( libxl_defbool_val(info->pvh) ) { + flags |= XEN_DOMCTL_CDF_hap; } *domid = -1; diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c index b38d0a7..cefbf76 100644 --- a/tools/libxl/libxl_dom.c +++ b/tools/libxl/libxl_dom.c @@ -329,9 +329,23 @@ int libxl__build_pv(libxl__gc *gc, uint32_t domid, struct xc_dom_image *dom; int ret; int flags = 0; + int is_pvh = libxl_defbool_val(info->pvh); xc_dom_loginit(ctx->xch); + if (is_pvh) { + char *pv_feats = "writable_descriptor_tables|auto_translated_physmap" + "|supervisor_mode_kernel|hvm_callback_vector"; + + if (info->u.pv.features && info->u.pv.features[0] != '\0') + { + LOG(ERROR, "Didn't expect info->u.pv.features to contain string\n"); + LOG(ERROR, "String: %s\n", info->u.pv.features); + return ERROR_FAIL; + } + info->u.pv.features = strdup(pv_feats); + } + dom = xc_dom_allocate(ctx->xch, state->pv_cmdline, info->u.pv.features); if (!dom) { LOGE(ERROR, "xc_dom_allocate failed"); @@ -370,6 +384,7 @@ int libxl__build_pv(libxl__gc *gc, uint32_t domid, } dom->flags = flags; + dom->pvh_enabled = is_pvh; dom->console_evtchn = state->console_port; dom->console_domid = state->console_domid; dom->xenstore_evtchn = state->store_port; @@ -400,7 +415,8 @@ int libxl__build_pv(libxl__gc *gc, uint32_t domid, LOGE(ERROR, "xc_dom_boot_image failed"); goto out; } - if ( (ret = xc_dom_gnttab_init(dom)) != 0 ) { + /* PVH sets up its own grant during boot via hvm mechanisms */ + if ( !is_pvh && (ret = xc_dom_gnttab_init(dom)) != 0 ) { LOGE(ERROR, "xc_dom_gnttab_init failed"); goto out; } diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl index ecf1f0b..2599e01 100644 --- a/tools/libxl/libxl_types.idl +++ b/tools/libxl/libxl_types.idl @@ -245,6 +245,7 @@ libxl_domain_create_info = Struct("domain_create_info",[ ("platformdata", libxl_key_value_list), ("poolid", uint32), ("run_hotplug_scripts",libxl_defbool), + ("pvh", libxl_defbool), ], dir=DIR_IN) MemKB = UInt(64, init_val = "LIBXL_MEMKB_DEFAULT") @@ -346,6 +347,7 @@ libxl_domain_build_info = Struct("domain_build_info",[ ])), ("invalid", Struct(None, [])), ], keyvar_init_val = "LIBXL_DOMAIN_TYPE_INVALID")), + ("pvh", libxl_defbool), ], dir=DIR_IN ) diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c index a17f6ae..424bc68 100644 --- a/tools/libxl/libxl_x86.c +++ b/tools/libxl/libxl_x86.c @@ -290,7 +290,9 @@ int libxl__arch_domain_create(libxl__gc *gc, libxl_domain_config *d_config, if (rtc_timeoffset) xc_domain_set_time_offset(ctx->xch, domid, rtc_timeoffset); - if (d_config->b_info.type == LIBXL_DOMAIN_TYPE_HVM) { + if (d_config->b_info.type == LIBXL_DOMAIN_TYPE_HVM || + libxl_defbool_val(d_config->b_info.pvh)) { + unsigned long shadow; shadow = (d_config->b_info.shadow_memkb + 1023) / 1024; xc_shadow_control(ctx->xch, domid, XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION, NULL, 0, &shadow, 0, NULL); diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c index c1a969b..3ee7593 100644 --- a/tools/libxl/xl_cmdimpl.c +++ b/tools/libxl/xl_cmdimpl.c @@ -610,8 +610,18 @@ static void parse_config_data(const char *config_source, !strncmp(buf, "hvm", strlen(buf))) c_info->type = LIBXL_DOMAIN_TYPE_HVM; + libxl_defbool_setdefault(&c_info->pvh, false); + libxl_defbool_setdefault(&c_info->hap, false); + xlu_cfg_get_defbool(config, "pvh", &c_info->pvh, 0); xlu_cfg_get_defbool(config, "hap", &c_info->hap, 0); + if (libxl_defbool_val(c_info->pvh) && + !libxl_defbool_val(c_info->hap)) { + + fprintf(stderr, "hap is required for PVH domain\n"); + 
exit(1);
+ }
+
if (xlu_cfg_replace_string (config, "name", &c_info->name, 0)) {
fprintf(stderr, "Domain name must be specified.\n");
exit(1);
@@ -918,6 +928,7 @@ static void parse_config_data(const char *config_source,
b_info->u.pv.cmdline = cmdline;
xlu_cfg_replace_string (config, "ramdisk", &b_info->u.pv.ramdisk, 0);
+ libxl_defbool_set(&b_info->pvh, libxl_defbool_val(c_info->pvh));
break;
}
default:
diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
index bf83d58..10c23a1 100644
--- a/tools/xenstore/xenstored_domain.c
+++ b/tools/xenstore/xenstored_domain.c
@@ -168,13 +168,15 @@ static int readchn(struct connection *conn, void *data, unsigned int len)
static void *map_interface(domid_t domid, unsigned long mfn)
{
if (*xcg_handle != NULL) {
- /* this is the preferred method */
- return xc_gnttab_map_grant_ref(*xcg_handle, domid,
+ void *addr;
+ /* this is the preferred method */
+ addr = xc_gnttab_map_grant_ref(*xcg_handle, domid,
GNTTAB_RESERVED_XENSTORE,
PROT_READ|PROT_WRITE);
- } else {
- return xc_map_foreign_range(*xc_handle, domid,
- getpagesize(), PROT_READ|PROT_WRITE, mfn);
+ if (addr)
+ return addr;
}
+ return xc_map_foreign_range(*xc_handle, domid,
+ getpagesize(), PROT_READ|PROT_WRITE, mfn);
}

static void unmap_interface(void *interface)
--
1.7.2.3
Ian Campbell
2013-Jul-19 09:16 UTC
Re: [PATCH 01/24] PVH xen: Add readme docs/misc/pvh-readme.txt
On Thu, 2013-07-18 at 11:21 -0700, Mukesh Rathor wrote:
> > Is there a description somewhere in the series of what PVH means in
> > terms of the guest visible ABI? i.e. documentation of the delta from
> > the regular PV mode? I had a skim through and didn't spot it.
>
> No. The ABIs are not as affected, thanks to pre-existing auto translate
> mode.

Not even a little bit of variation from that?

> But, I suppose after all the patches are checked in, I can write
> up something.

I could live with that, but I have heard mutterings that some people
are finding it hard to review the patches without knowing the interface
they are supposed to be implementing, which is pretty fair I think.

Could you perhaps enumerate the exact set of XENFEAT flags which
must/must not be used/supported by a PVH guest in a document somewhere?
That would hopefully explain the vast majority of the differences
between trad-PV and PVH and be pretty succinct I think. Anything which
isn't explained away by a particular feature flag might need additional
explanation.

Ian.
Mukesh Rathor
2013-Jul-19 21:33 UTC
Re: [PATCH 01/24] PVH xen: Add readme docs/misc/pvh-readme.txt
On Fri, 19 Jul 2013 10:16:00 +0100 Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Thu, 2013-07-18 at 11:21 -0700, Mukesh Rathor wrote:
> > > Is there a description somewhere in the series of what PVH means
> > > in terms of the guest visible ABI? i.e. documentation of the
> > > delta from the regular PV mode? I had a skim through and didn't
> > > spot it.
> >
> > No. The ABIs are not as affected, thanks to pre-existing auto
> > translate mode.
>
> Not even a little bit of variation from that?

In the current series, it's a PV domU guest with auto translate, so not
really. There are some changes to the implementation, like in the case
of VCPUOP_initialise or XEN_DOMCTL_setvcpucontext for PVH, we must set
context in VMCS also. The upcoming dom0 patch will introduce a new ABI,
(unless you already did it for ARM and PVH will just piggyback on it).

BTW, is there such a doc for ARM I can look at for reference?

> > But, I suppose after all the patches are checked in, I can write
> > up something.
>
> I could live with that, but I have heard mutterings that some people
> are finding it hard to review the patches without knowing the
> interface they are supposed to be implementing, which is pretty fair I
> think.

Ah, I see. I will try to enhance the patch comment prolog in the next
version. Hopefully, that will help.

> Could you perhaps enumerate the exact set of XENFEAT flags which
> must/must not be used/supported by a PVH guest in a document
> somewhere? That would hopefully explain the vast majority of the
> differences between trad-PV and PVH and be pretty succinct I think.
> Anything which isn't explained away by a particular feature flag
> might need additional explanation.

Ok, done. I put that in the pvh-readme.

thanks
Mukesh
Ian Campbell
2013-Jul-22 18:21 UTC
Re: [PATCH 01/24] PVH xen: Add readme docs/misc/pvh-readme.txt
On Fri, 2013-07-19 at 14:33 -0700, Mukesh Rathor wrote:
> On Fri, 19 Jul 2013 10:16:00 +0100
> Ian Campbell <Ian.Campbell@citrix.com> wrote:
>
> > On Thu, 2013-07-18 at 11:21 -0700, Mukesh Rathor wrote:
> > > > Is there a description somewhere in the series of what PVH means
> > > > in terms of the guest visible ABI? i.e. documentation of the
> > > > delta from the regular PV mode? I had a skim through and didn't
> > > > spot it.
> > >
> > > No. The ABIs are not as affected, thanks to pre-existing auto
> > > translate mode.
> >
> > Not even a little bit of variation from that?
>
> In the current series, it's a PV domU guest with auto translate, so not
> really. There are some changes to the implementation, like in the case
> of VCPUOP_initialise or XEN_DOMCTL_setvcpucontext for PVH, we must set
> context in VMCS also. The upcoming dom0 patch will introduce a new ABI,
> (unless you already did it for ARM and PVH will just piggyback on it).
>
> BTW, is there such a doc for ARM I can look at for reference?

Being a whole new arch we had the luxury of being pretty fluid about
how things were going to work and were correspondingly slack about
writing them down. Things have settled down now and it really is about
time we had something, I've just posted
<1374517040-10822-1-git-send-email-ijc@hellion.org.uk>. It's a bit lame
but it is a start.

> > > But, I suppose after all the patches are checked in, I can write
> > > up something.
> >
> > I could live with that, but I have heard mutterings that some people
> > are finding it hard to review the patches without knowing the
> > interface they are supposed to be implementing, which is pretty fair I
> > think.
>
> Ah, I see. I will try to enhance the patch comment prolog in the next
> version. Hopefully, that will help.

Thanks.

> > Could you perhaps enumerate the exact set of XENFEAT flags which
> > must/must not be used/supported by a PVH guest in a document
> > somewhere? That would hopefully explain the vast majority of the
> > differences between trad-PV and PVH and be pretty succinct I think.
> > Anything which isn't explained away by a particular feature flag
> > might need additional explanation.
>
> Ok, done. I put that in the pvh-readme.

Thanks.
Konrad Rzeszutek Wilk
2013-Jul-22 19:15 UTC
Re: [PATCH 22/24] PVH xen: VMX support of PVH guest creation/destruction
On Wed, Jul 17, 2013 at 07:33:06PM -0700, Mukesh Rathor wrote:
> This patch implements the vmx portion of the guest create, ie
> vcpu and domain initilization. Some changes to support the destroy path.

initialization.

>
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> ---
> xen/arch/x86/hvm/vmx/vmx.c | 40 ++++++++++++++++++++++++++++++++++++++++
> 1 files changed, 40 insertions(+), 0 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 9be321d..8f08253 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -82,6 +82,9 @@ static int vmx_domain_initialise(struct domain *d)
> {
> int rc;
>
> + if ( is_pvh_domain(d) )
> + return 0;
> +
> if ( (rc = vmx_alloc_vlapic_mapping(d)) != 0 )
> return rc;
>
> @@ -90,6 +93,9 @@ static int vmx_domain_initialise(struct domain *d)
>
> static void vmx_domain_destroy(struct domain *d)
> {
> + if ( is_pvh_domain(d) )
> + return;
> +
> vmx_free_vlapic_mapping(d);
> }
>
> @@ -113,6 +119,12 @@ static int vmx_vcpu_initialise(struct vcpu *v)
>
> vpmu_initialise(v);
>
> + if ( is_pvh_vcpu(v) )
> + {
> + /* This for hvm_long_mode_enabled(v). */
> + v->arch.hvm_vcpu.guest_efer = EFER_SCE | EFER_LMA | EFER_LME;
> + return 0;
> + }
> vmx_install_vlapic_mapping(v);
>
> /* %eax == 1 signals full real-mode support to the guest loader. */
> @@ -1076,6 +1088,28 @@ static void vmx_update_host_cr3(struct vcpu *v)
> vmx_vmcs_exit(v);
> }
>
> +/*
> + * PVH guest never causes CR3 write vmexit. This is called during the guest
> + * setup.

What do you mean 'guest setup'? Setup from the toolstack? Or from the
construct_dom0? I presume toolstack but it would be good to know from
the comment. Thanks!

> + */
> +static void vmx_update_pvh_cr(struct vcpu *v, unsigned int cr)
> +{
> + vmx_vmcs_enter(v);
> + switch ( cr )
> + {
> + case 3:
> + __vmwrite(GUEST_CR3, v->arch.hvm_vcpu.guest_cr[3]);
> + hvm_asid_flush_vcpu(v);
> + break;
> +
> + default:
> + printk(XENLOG_ERR
> + "PVH: d%d v%d unexpected cr%d update at rip:%lx\n",
> + v->domain->domain_id, v->vcpu_id, cr, __vmread(GUEST_RIP));
> + }
> + vmx_vmcs_exit(v);
> +}
> +
> void vmx_update_debug_state(struct vcpu *v)
> {
> unsigned long mask;
> @@ -1095,6 +1129,12 @@ void vmx_update_debug_state(struct vcpu *v)
>
> static void vmx_update_guest_cr(struct vcpu *v, unsigned int cr)
> {
> + if ( is_pvh_vcpu(v) )
> + {
> + vmx_update_pvh_cr(v, cr);
> + return;
> + }
> +
> vmx_vmcs_enter(v);
>
> switch ( cr )
> --
> 1.7.2.3
Konrad Rzeszutek Wilk
2013-Jul-22 19:21 UTC
Re: [PATCH 14/24] PVH xen: interrupt/event-channel delivery to PVH
On Wed, Jul 17, 2013 at 07:32:58PM -0700, Mukesh Rathor wrote:
> PVH uses HVMIRQ_callback_vector for interrupt delivery. Also, change
> hvm_vcpu_has_pending_irq() as PVH doesn't use vlapic emulation, so we

FYI. Yet. It could in the future if we want to use that and the callback
together. But that is a separate discussion and you can avoid that for now.

> can skip vlapic checks in the function. Moreover, a PVH guest installs IDT
> natively, and sets callback via for interrupt delivery during boot. Once

The 'sets callback via for' does not make much sense. Did you mean:
"and sets a callback vector for interrupt delivery during boot"?

> that is done, it receives interrupts via the callback.
>
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>

You can add
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
with the change I mentioned above.

> ---
> xen/arch/x86/hvm/irq.c | 3 +++
> xen/arch/x86/hvm/vmx/intr.c | 8 ++++++--
> xen/include/asm-x86/domain.h | 2 +-
> xen/include/asm-x86/event.h | 2 +-
> 4 files changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/irq.c b/xen/arch/x86/hvm/irq.c
> index 9eae5de..92fb245 100644
> --- a/xen/arch/x86/hvm/irq.c
> +++ b/xen/arch/x86/hvm/irq.c
> @@ -405,6 +405,9 @@ struct hvm_intack hvm_vcpu_has_pending_irq(struct vcpu *v)
> && vcpu_info(v, evtchn_upcall_pending) )
> return hvm_intack_vector(plat->irq.callback_via.vector);
>
> + if ( is_pvh_vcpu(v) )
> + return hvm_intack_none;
> +
> if ( vlapic_accept_pic_intr(v) && plat->vpic[0].int_output )
> return hvm_intack_pic(0);
>
> diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
> index e376f3c..ce42950 100644
> --- a/xen/arch/x86/hvm/vmx/intr.c
> +++ b/xen/arch/x86/hvm/vmx/intr.c
> @@ -165,6 +165,9 @@ static int nvmx_intr_intercept(struct vcpu *v, struct hvm_intack intack)
> {
> u32 ctrl;
>
> + if ( is_pvh_vcpu(v) )
> + return 0;
> +
> if ( nvmx_intr_blocked(v) != hvm_intblk_none )
> {
> enable_intr_window(v, intack);
> @@ -219,8 +222,9 @@ void vmx_intr_assist(void)
> return;
> }
>
> - /* Crank the handle on interrupt state. */
> - pt_vector = pt_update_irq(v);
> + if ( !is_pvh_vcpu(v) )
> + /* Crank the handle on interrupt state. */
> + pt_vector = pt_update_irq(v);
>
> do {
> intack = hvm_vcpu_has_pending_irq(v);
> diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
> index 22a72df..21a9954 100644
> --- a/xen/include/asm-x86/domain.h
> +++ b/xen/include/asm-x86/domain.h
> @@ -16,7 +16,7 @@
> #define is_pv_32on64_domain(d) (is_pv_32bit_domain(d))
> #define is_pv_32on64_vcpu(v) (is_pv_32on64_domain((v)->domain))
>
> -#define is_hvm_pv_evtchn_domain(d) (is_hvm_domain(d) && \
> +#define is_hvm_pv_evtchn_domain(d) (!is_pv_domain(d) && \
> d->arch.hvm_domain.irq.callback_via_type == HVMIRQ_callback_vector)
> #define is_hvm_pv_evtchn_vcpu(v) (is_hvm_pv_evtchn_domain(v->domain))
>
> diff --git a/xen/include/asm-x86/event.h b/xen/include/asm-x86/event.h
> index 06057c7..7ed5812 100644
> --- a/xen/include/asm-x86/event.h
> +++ b/xen/include/asm-x86/event.h
> @@ -18,7 +18,7 @@ int hvm_local_events_need_delivery(struct vcpu *v);
> static inline int local_events_need_delivery(void)
> {
> struct vcpu *v = current;
> - return (is_hvm_vcpu(v) ? hvm_local_events_need_delivery(v) :
> + return (!is_pv_vcpu(v) ? 
hvm_local_events_need_delivery(v) :
> (vcpu_info(v, evtchn_upcall_pending) &&
> !vcpu_info(v, evtchn_upcall_mask)));
> }
> --
> 1.7.2.3
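For context on the HVMIRQ_callback_vector path this patch relies on: the
guest registers its vector through HVM_PARAM_CALLBACK_IRQ, roughly as
below (a guest-side sketch; the encoding macro mirrors what Linux does,
and HYPERVISOR_hvm_op is the Linux-style hypercall wrapper, so treat the
details as an assumption rather than a quote from this series):

    /* Callback "via" encoding: delivery type in bits 63:56; type 2
     * means "deliver on a fixed vector through the guest IDT". */
    #define CALLBACK_VIA_TYPE_VECTOR  2ULL
    #define CALLBACK_VIA_VECTOR(vec)  ((CALLBACK_VIA_TYPE_VECTOR << 56) | (vec))

    static int register_callback_vector(uint8_t vec)
    {
        struct xen_hvm_param a = {
            .domid = DOMID_SELF,
            .index = HVM_PARAM_CALLBACK_IRQ,
            .value = CALLBACK_VIA_VECTOR(vec),
        };

        /* After this, event channel upcalls arrive on 'vec' through the
         * guest's native IDT, with no vlapic/vioapic emulation involved. */
        return HYPERVISOR_hvm_op(HVMOP_set_param, &a);
    }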
Konrad Rzeszutek Wilk
2013-Jul-22 19:22 UTC
Re: [PATCH 03/24] PVH xen: turn gdb_frames/gdt_ents into union
On Wed, Jul 17, 2013 at 07:32:47PM -0700, Mukesh Rathor wrote:> Changes in V2: > - Add __XEN_INTERFACE_VERSION__ > > Changes in V3: > - Rename union to ''gdt'' and rename field names. > > Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> > Reviewed-by: Jan Beulich <jbeulich@suse.com>Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>> --- > tools/libxc/xc_domain_restore.c | 8 ++++---- > tools/libxc/xc_domain_save.c | 6 +++--- > xen/arch/x86/domain.c | 12 ++++++------ > xen/arch/x86/domctl.c | 12 ++++++------ > xen/include/public/arch-x86/xen.h | 14 ++++++++++++++ > 5 files changed, 33 insertions(+), 19 deletions(-) > > diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c > index 63d36cd..47aaca0 100644 > --- a/tools/libxc/xc_domain_restore.c > +++ b/tools/libxc/xc_domain_restore.c > @@ -2055,15 +2055,15 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom, > munmap(start_info, PAGE_SIZE); > } > /* Uncanonicalise each GDT frame number. */ > - if ( GET_FIELD(ctxt, gdt_ents) > 8192 ) > + if ( GET_FIELD(ctxt, gdt.pv.num_ents) > 8192 ) > { > ERROR("GDT entry count out of range"); > goto out; > } > > - for ( j = 0; (512*j) < GET_FIELD(ctxt, gdt_ents); j++ ) > + for ( j = 0; (512*j) < GET_FIELD(ctxt, gdt.pv.num_ents); j++ ) > { > - pfn = GET_FIELD(ctxt, gdt_frames[j]); > + pfn = GET_FIELD(ctxt, gdt.pv.frames[j]); > if ( (pfn >= dinfo->p2m_size) || > (pfn_type[pfn] != XEN_DOMCTL_PFINFO_NOTAB) ) > { > @@ -2071,7 +2071,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom, > j, (unsigned long)pfn); > goto out; > } > - SET_FIELD(ctxt, gdt_frames[j], ctx->p2m[pfn]); > + SET_FIELD(ctxt, gdt.pv.frames[j], ctx->p2m[pfn]); > } > /* Uncanonicalise the page table base pointer. */ > pfn = UNFOLD_CR3(GET_FIELD(ctxt, ctrlreg[3])); > diff --git a/tools/libxc/xc_domain_save.c b/tools/libxc/xc_domain_save.c > index fbc15e9..e938628 100644 > --- a/tools/libxc/xc_domain_save.c > +++ b/tools/libxc/xc_domain_save.c > @@ -1907,15 +1907,15 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter > } > > /* Canonicalise each GDT frame number. */ > - for ( j = 0; (512*j) < GET_FIELD(&ctxt, gdt_ents); j++ ) > + for ( j = 0; (512*j) < GET_FIELD(&ctxt, gdt.pv.num_ents); j++ ) > { > - mfn = GET_FIELD(&ctxt, gdt_frames[j]); > + mfn = GET_FIELD(&ctxt, gdt.pv.frames[j]); > if ( !MFN_IS_IN_PSEUDOPHYS_MAP(mfn) ) > { > ERROR("GDT frame is not in range of pseudophys map"); > goto out; > } > - SET_FIELD(&ctxt, gdt_frames[j], mfn_to_pfn(mfn)); > + SET_FIELD(&ctxt, gdt.pv.frames[j], mfn_to_pfn(mfn)); > } > > /* Canonicalise the page table base pointer. 
*/
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index 874742c..73ddad7 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -784,8 +784,8 @@ int arch_set_info_guest(
> }
>
> for ( i = 0; i < ARRAY_SIZE(v->arch.pv_vcpu.gdt_frames); ++i )
> - fail |= v->arch.pv_vcpu.gdt_frames[i] != c(gdt_frames[i]);
> - fail |= v->arch.pv_vcpu.gdt_ents != c(gdt_ents);
> + fail |= v->arch.pv_vcpu.gdt_frames[i] != c(gdt.pv.frames[i]);
> + fail |= v->arch.pv_vcpu.gdt_ents != c(gdt.pv.num_ents);
>
> fail |= v->arch.pv_vcpu.ldt_base != c(ldt_base);
> fail |= v->arch.pv_vcpu.ldt_ents != c(ldt_ents);
> @@ -838,17 +838,17 @@ int arch_set_info_guest(
> return rc;
>
> if ( !compat )
> - rc = (int)set_gdt(v, c.nat->gdt_frames, c.nat->gdt_ents);
> + rc = (int)set_gdt(v, c.nat->gdt.pv.frames, c.nat->gdt.pv.num_ents);
> else
> {
> unsigned long gdt_frames[ARRAY_SIZE(v->arch.pv_vcpu.gdt_frames)];
> - unsigned int n = (c.cmp->gdt_ents + 511) / 512;
> + unsigned int n = (c.cmp->gdt.pv.num_ents + 511) / 512;
>
> if ( n > ARRAY_SIZE(v->arch.pv_vcpu.gdt_frames) )
> return -EINVAL;
> for ( i = 0; i < n; ++i )
> - gdt_frames[i] = c.cmp->gdt_frames[i];
> - rc = (int)set_gdt(v, gdt_frames, c.cmp->gdt_ents);
> + gdt_frames[i] = c.cmp->gdt.pv.frames[i];
> + rc = (int)set_gdt(v, gdt_frames, c.cmp->gdt.pv.num_ents);
> }
> if ( rc != 0 )
> return rc;
> diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
> index c2a04c4..f87d6ab 100644
> --- a/xen/arch/x86/domctl.c
> +++ b/xen/arch/x86/domctl.c
> @@ -1300,12 +1300,12 @@ void arch_get_info_guest(struct vcpu *v, vcpu_guest_context_u c)
> c(ldt_base = v->arch.pv_vcpu.ldt_base);
> c(ldt_ents = v->arch.pv_vcpu.ldt_ents);
> for ( i = 0; i < ARRAY_SIZE(v->arch.pv_vcpu.gdt_frames); ++i )
> - c(gdt_frames[i] = v->arch.pv_vcpu.gdt_frames[i]);
> - BUILD_BUG_ON(ARRAY_SIZE(c.nat->gdt_frames) !=
> - ARRAY_SIZE(c.cmp->gdt_frames));
> - for ( ; i < ARRAY_SIZE(c.nat->gdt_frames); ++i )
> - c(gdt_frames[i] = 0);
> - c(gdt_ents = v->arch.pv_vcpu.gdt_ents);
> + c(gdt.pv.frames[i] = v->arch.pv_vcpu.gdt_frames[i]);
> + BUILD_BUG_ON(ARRAY_SIZE(c.nat->gdt.pv.frames) !=
> + ARRAY_SIZE(c.cmp->gdt.pv.frames));
> + for ( ; i < ARRAY_SIZE(c.nat->gdt.pv.frames); ++i )
> + c(gdt.pv.frames[i] = 0);
> + c(gdt.pv.num_ents = v->arch.pv_vcpu.gdt_ents);
> c(kernel_ss = v->arch.pv_vcpu.kernel_ss);
> c(kernel_sp = v->arch.pv_vcpu.kernel_sp);
> for ( i = 0; i < ARRAY_SIZE(v->arch.pv_vcpu.ctrlreg); ++i )
> diff --git a/xen/include/public/arch-x86/xen.h b/xen/include/public/arch-x86/xen.h
> index b7f6a51..25c8519 100644
> --- a/xen/include/public/arch-x86/xen.h
> +++ b/xen/include/public/arch-x86/xen.h
> @@ -170,7 +170,21 @@ struct vcpu_guest_context {
> struct cpu_user_regs user_regs; /* User-level CPU registers */
> struct trap_info trap_ctxt[256]; /* Virtual IDT */
> unsigned long ldt_base, ldt_ents; /* LDT (linear address, # ents) */
> +#if __XEN_INTERFACE_VERSION__ < 0x00040400
> unsigned long gdt_frames[16], gdt_ents; /* GDT (machine frames, # ents) */
> +#else
> + union {
> + struct {
> + /* GDT (machine frames, # ents) */
> + unsigned long frames[16], num_ents;
> + } pv;
> + struct {
> + /* PVH: GDTR addr and size */
> + uint64_t addr;
> + uint16_t limit;
> + } pvh;
> + } gdt;
> +#endif
> unsigned long kernel_ss, kernel_sp; /* Virtual TSS (only SS1/SP1) */
> /* NB. User pagetable on x86/64 is placed in ctrlreg[1]. 
*/
> unsigned long ctrlreg[8]; /* CR0-CR7 (control registers) */
> --
> 1.7.2.3
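As a usage note for the new union: with __XEN_INTERFACE_VERSION__ >=
0x00040400 a caller fills one arm or the other, roughly (a sketch only;
the variable names are illustrative):

    struct vcpu_guest_context ctxt;

    if ( pvh )
    {
        /* PVH: pass the virtual GDTR directly; no machine frames. */
        ctxt.gdt.pvh.addr  = gdt_vaddr;
        ctxt.gdt.pvh.limit = nr_ents * sizeof(uint64_t) - 1;
    }
    else
    {
        /* PV: machine frames holding the GDT, plus the entry count. */
        ctxt.gdt.pv.frames[0] = gdt_mfn;
        ctxt.gdt.pv.num_ents  = nr_ents;
    }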
Konrad Rzeszutek Wilk
2013-Jul-22 19:22 UTC
Re: [PATCH 21/24] PVH xen: HVM support of PVH guest creation/destruction
On Wed, Jul 17, 2013 at 07:33:05PM -0700, Mukesh Rathor wrote:> This patch implements the HVM portion of the guest create, ie > vcpu and domain initilization. Some changes to support the destroy path. > > Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Thanks for splitting this out of the cleanup patch.> --- > xen/arch/x86/hvm/hvm.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++- > 1 files changed, 65 insertions(+), 2 deletions(-) > > diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c > index 3d930eb..7066d7b 100644 > --- a/xen/arch/x86/hvm/hvm.c > +++ b/xen/arch/x86/hvm/hvm.c > @@ -510,6 +510,30 @@ static int hvm_print_line( > return X86EMUL_OKAY; > } > > +static int pvh_dom_initialise(struct domain *d) > +{ > + int rc; > + > + if ( !d->arch.hvm_domain.hap_enabled ) > + return -EINVAL; > + > + spin_lock_init(&d->arch.hvm_domain.irq_lock); > + > + hvm_init_cacheattr_region_list(d); > + > + if ( (rc = paging_enable(d, PG_refcounts|PG_translate|PG_external)) != 0 ) > + goto pvh_dominit_fail; > + > + if ( (rc = hvm_funcs.domain_initialise(d)) != 0 ) > + goto pvh_dominit_fail; > + > + return 0; > + > +pvh_dominit_fail: > + hvm_destroy_cacheattr_region_list(d); > + return rc; > +} > + > int hvm_domain_initialise(struct domain *d) > { > int rc; > @@ -520,6 +544,8 @@ int hvm_domain_initialise(struct domain *d) > "on a non-VT/AMDV platform.\n"); > return -EINVAL; > } > + if ( is_pvh_domain(d) ) > + return pvh_dom_initialise(d); > > spin_lock_init(&d->arch.hvm_domain.pbuf_lock); > spin_lock_init(&d->arch.hvm_domain.irq_lock); > @@ -584,6 +610,9 @@ int hvm_domain_initialise(struct domain *d) > > void hvm_domain_relinquish_resources(struct domain *d) > { > + if ( is_pvh_domain(d) ) > + return; > + > if ( hvm_funcs.nhvm_domain_relinquish_resources ) > hvm_funcs.nhvm_domain_relinquish_resources(d); > > @@ -609,10 +638,14 @@ void hvm_domain_relinquish_resources(struct domain *d) > void hvm_domain_destroy(struct domain *d) > { > hvm_funcs.domain_destroy(d); > + hvm_destroy_cacheattr_region_list(d); > + > + if ( is_pvh_domain(d) ) > + return; > + > rtc_deinit(d); > stdvga_deinit(d); > vioapic_deinit(d); > - hvm_destroy_cacheattr_region_list(d); > } > > static int hvm_save_tsc_adjust(struct domain *d, hvm_domain_context_t *h) > @@ -1066,6 +1099,30 @@ static int __init __hvm_register_CPU_XSAVE_save_and_restore(void) > } > __initcall(__hvm_register_CPU_XSAVE_save_and_restore); > > +static int pvh_vcpu_initialise(struct vcpu *v) > +{ > + int rc; > + > + if ( (rc = hvm_funcs.vcpu_initialise(v)) != 0 ) > + return rc; > + > + softirq_tasklet_init(&v->arch.hvm_vcpu.assert_evtchn_irq_tasklet, > + (void(*)(unsigned long))hvm_assert_evtchn_irq, > + (unsigned long)v); > + > + v->arch.hvm_vcpu.hcall_64bit = 1; /* PVH 32bitfixme. 
*/
> + v->arch.user_regs.eflags = 2;
> + v->arch.hvm_vcpu.inject_trap.vector = -1;
> +
> + if ( (rc = hvm_vcpu_cacheattr_init(v)) != 0 )
> + {
> + hvm_funcs.vcpu_destroy(v);
> + return rc;
> + }
> +
> + return 0;
> +}
> +
> int hvm_vcpu_initialise(struct vcpu *v)
> {
> int rc;
> @@ -1077,6 +1134,9 @@ int hvm_vcpu_initialise(struct vcpu *v)
> spin_lock_init(&v->arch.hvm_vcpu.tm_lock);
> INIT_LIST_HEAD(&v->arch.hvm_vcpu.tm_list);
>
> + if ( is_pvh_vcpu(v) )
> + return pvh_vcpu_initialise(v);
> +
> if ( (rc = vlapic_init(v)) != 0 )
> goto fail1;
>
> @@ -1165,7 +1225,10 @@ void hvm_vcpu_destroy(struct vcpu *v)
>
> tasklet_kill(&v->arch.hvm_vcpu.assert_evtchn_irq_tasklet);
> hvm_vcpu_cacheattr_destroy(v);
> - vlapic_destroy(v);
> +
> + if ( !is_pvh_vcpu(v) )
> + vlapic_destroy(v);
> +
> hvm_funcs.vcpu_destroy(v);
>
> /* Event channel is already freed by evtchn_destroy(). */
> --
> 1.7.2.3
Konrad Rzeszutek Wilk
2013-Jul-22 19:24 UTC
Re: [PATCH 23/24] PVH xen: preparatory patch for the pvh vmexit handler patch
On Wed, Jul 17, 2013 at 07:33:07PM -0700, Mukesh Rathor wrote:> This is a preparatory patch for the next pvh vmexit handler patch. > > Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>> --- > xen/arch/x86/hvm/vmx/pvh.c | 5 +++++ > xen/arch/x86/hvm/vmx/vmx.c | 6 ++++++ > xen/arch/x86/traps.c | 4 ++-- > xen/include/asm-x86/hvm/vmx/vmx.h | 1 + > xen/include/asm-x86/processor.h | 2 ++ > xen/include/asm-x86/traps.h | 2 ++ > 6 files changed, 18 insertions(+), 2 deletions(-) > > diff --git a/xen/arch/x86/hvm/vmx/pvh.c b/xen/arch/x86/hvm/vmx/pvh.c > index 8638850..fb55ac8 100644 > --- a/xen/arch/x86/hvm/vmx/pvh.c > +++ b/xen/arch/x86/hvm/vmx/pvh.c > @@ -20,6 +20,11 @@ > #include <asm/hvm/nestedhvm.h> > #include <asm/xstate.h> > > +/* Implemented in the next patch */ > +void vmx_pvh_vmexit_handler(struct cpu_user_regs *regs) > +{ > +} > + > /* > * Set vmcs fields in support of vcpu_op -> VCPUOP_initialise hcall. Called > * from arch_set_info_guest() which sets the (PVH relevant) non-vmcs fields. > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c > index 8f08253..59070a8 100644 > --- a/xen/arch/x86/hvm/vmx/vmx.c > +++ b/xen/arch/x86/hvm/vmx/vmx.c > @@ -2491,6 +2491,12 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs) > if ( unlikely(exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) ) > return vmx_failed_vmentry(exit_reason, regs); > > + if ( is_pvh_vcpu(v) ) > + { > + vmx_pvh_vmexit_handler(regs); > + return; > + } > + > if ( v->arch.hvm_vmx.vmx_realmode ) > { > /* Put RFLAGS back the way the guest wants it */ > diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c > index 5325e92..1e8cf60 100644 > --- a/xen/arch/x86/traps.c > +++ b/xen/arch/x86/traps.c > @@ -745,7 +745,7 @@ int cpuid_hypervisor_leaves( uint32_t idx, uint32_t sub_idx, > return 1; > } > > -static void pv_cpuid(struct cpu_user_regs *regs) > +void pv_cpuid(struct cpu_user_regs *regs) > { > uint32_t a, b, c, d; > > @@ -1904,7 +1904,7 @@ static int is_cpufreq_controller(struct domain *d) > > #include "x86_64/mmconfig.h" > > -static int emulate_privileged_op(struct cpu_user_regs *regs) > +int emulate_privileged_op(struct cpu_user_regs *regs) > { > enum x86_segment which_sel; > struct vcpu *v = current; > diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h > index 9e6c481..44e4136 100644 > --- a/xen/include/asm-x86/hvm/vmx/vmx.h > +++ b/xen/include/asm-x86/hvm/vmx/vmx.h > @@ -474,6 +474,7 @@ void vmx_dr_access(unsigned long exit_qualification, > struct cpu_user_regs *regs); > void vmx_fpu_enter(struct vcpu *v); > int vmx_pvh_set_vcpu_info(struct vcpu *v, struct vcpu_guest_context *ctxtp); > +void vmx_pvh_vmexit_handler(struct cpu_user_regs *regs); > > int alloc_p2m_hap_data(struct p2m_domain *p2m); > void free_p2m_hap_data(struct p2m_domain *p2m); > diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h > index 5cdacc7..22a9653 100644 > --- a/xen/include/asm-x86/processor.h > +++ b/xen/include/asm-x86/processor.h > @@ -566,6 +566,8 @@ void microcode_set_module(unsigned int); > int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void), unsigned long len); > int microcode_resume_cpu(int cpu); > > +void pv_cpuid(struct cpu_user_regs *regs); > + > #endif /* !__ASSEMBLY__ */ > > #endif /* __ASM_X86_PROCESSOR_H */ > diff --git a/xen/include/asm-x86/traps.h b/xen/include/asm-x86/traps.h > index 1d9b087..8c3540a 100644 > --- a/xen/include/asm-x86/traps.h > +++ b/xen/include/asm-x86/traps.h > @@ -50,4 +50,6 
@@ extern int send_guest_trap(struct domain *d, uint16_t vcpuid,
> unsigned int trap_nr);
> int emulate_forced_invalid_op(struct cpu_user_regs *regs);
>
> +int emulate_privileged_op(struct cpu_user_regs *regs);
> +
> #endif /* ASM_TRAP_H */
> --
> 1.7.2.3
Konrad Rzeszutek Wilk
2013-Jul-22 19:25 UTC
Re: [PATCH 15/24] PVH xen: additional changes to support PVH guest creation and execution.
On Wed, Jul 17, 2013 at 07:32:59PM -0700, Mukesh Rathor wrote:
> Fail creation of 32bit PVH guest. Change hap_update_cr3() to return long
> mode for PVH, this is called during domain creation from arch_set_info_guest().
> Return correct features for PVH to guest during its boot.
>
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>

Looks ok to me, aka you can slap on a Reviewed-by tag from me.

> ---
> xen/arch/x86/domain.c | 8 ++++++++
> xen/arch/x86/mm/hap/hap.c | 4 +++-
> xen/common/domain.c | 10 ++++++++++
> xen/common/domctl.c | 5 +++++
> xen/common/kernel.c | 6 +++++-
> 5 files changed, 31 insertions(+), 2 deletions(-)
>
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index fccb4ee..288872a 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -339,6 +339,14 @@ int switch_compat(struct domain *d)
>
> if ( d == NULL )
> return -EINVAL;
> +
> + if ( is_pvh_domain(d) )
> + {
> + printk(XENLOG_INFO
> + "Xen currently does not support 32bit PVH guests\n");
> + return -EINVAL;
> + }
> +
> if ( !may_switch_mode(d) )
> return -EACCES;
> if ( is_pv_32on64_domain(d) )
> diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
> index bff05d9..19a085c 100644
> --- a/xen/arch/x86/mm/hap/hap.c
> +++ b/xen/arch/x86/mm/hap/hap.c
> @@ -639,7 +639,9 @@ static void hap_update_cr3(struct vcpu *v, int do_locking)
> const struct paging_mode *
> hap_paging_get_mode(struct vcpu *v)
> {
> - return !hvm_paging_enabled(v) ? &hap_paging_real_mode :
> + /* PVH 32bitfixme. */
> + return is_pvh_vcpu(v) ? &hap_paging_long_mode :
> + !hvm_paging_enabled(v) ? &hap_paging_real_mode :
> hvm_long_mode_enabled(v) ? &hap_paging_long_mode :
> hvm_pae_enabled(v) ? &hap_paging_pae_mode :
> &hap_paging_protected_mode;
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index 38b1bad..3b4af4b 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -237,6 +237,16 @@ struct domain *domain_create(
>
> if ( domcr_flags & DOMCRF_hvm )
> d->guest_type = guest_type_hvm;
> + else if ( domcr_flags & DOMCRF_pvh )
> + {
> + if ( !(domcr_flags & DOMCRF_hap) )
> + {
> + err = -EOPNOTSUPP;
> + printk(XENLOG_INFO "PVH guest must have HAP on\n");
> + goto fail;
> + }
> + d->guest_type = guest_type_pvh;
> + }
>
> if ( domid == 0 )
> {
> diff --git a/xen/common/domctl.c b/xen/common/domctl.c
> index c653efb..48e4c08 100644
> --- a/xen/common/domctl.c
> +++ b/xen/common/domctl.c
> @@ -187,6 +187,8 @@ void getdomaininfo(struct domain *d, struct xen_domctl_getdomaininfo *info)
>
> if ( is_hvm_domain(d) )
> info->flags |= XEN_DOMINF_hvm_guest;
> + else if ( is_pvh_domain(d) )
> + info->flags |= XEN_DOMINF_pvh_guest;
>
> xsm_security_domaininfo(d, info);
>
> @@ -443,6 +445,9 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
> domcr_flags = 0;
> if ( op->u.createdomain.flags & XEN_DOMCTL_CDF_hvm_guest )
> domcr_flags |= DOMCRF_hvm;
> + else if ( op->u.createdomain.flags & XEN_DOMCTL_CDF_hap )
> + domcr_flags |= DOMCRF_pvh; /* PV with HAP is a PVH guest */
> +
> if ( op->u.createdomain.flags & XEN_DOMCTL_CDF_hap )
> domcr_flags |= DOMCRF_hap;
> if ( op->u.createdomain.flags & XEN_DOMCTL_CDF_s3_integrity )
> diff --git a/xen/common/kernel.c b/xen/common/kernel.c
> index 72fb905..3bba758 100644
> --- a/xen/common/kernel.c
> +++ b/xen/common/kernel.c
> @@ -289,7 +289,11 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
> if ( current->domain == dom0 )
> fi.submap |= 1U << XENFEAT_dom0;
> #ifdef CONFIG_X86
> - if ( !is_hvm_vcpu(current) )
> + if ( is_pvh_vcpu(current) 
)
> + fi.submap |= (1U << XENFEAT_hvm_safe_pvclock) |
> + (1U << XENFEAT_supervisor_mode_kernel) |
> + (1U << XENFEAT_hvm_callback_vector);
> + else if ( !is_hvm_vcpu(current) )
> fi.submap |= (1U << XENFEAT_mmu_pt_update_preserve_ad) |
> (1U << XENFEAT_highmem_assist) |
> (1U << XENFEAT_gnttab_map_avail_bits);
> --
> 1.7.2.3
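Tying this back to Ian's earlier request for an enumeration: together
with the writable_descriptor_tables/auto_translated_physmap feature
string in the toolstack patch, these are exactly the bits a PVH guest
sees, and it can probe them itself via XENVER_get_features, roughly (a
sketch; HYPERVISOR_xen_version is the Linux-style hypercall wrapper and
error handling is elided):

    xen_feature_info_t fi = { .submap_idx = 0 };

    if ( HYPERVISOR_xen_version(XENVER_get_features, &fi) == 0 &&
         (fi.submap & (1U << XENFEAT_hvm_callback_vector)) &&
         (fi.submap & (1U << XENFEAT_supervisor_mode_kernel)) &&
         (fi.submap & (1U << XENFEAT_hvm_safe_pvclock)) )
    {
        /* The PVH feature set advertised above is present. */
    }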
Konrad Rzeszutek Wilk
2013-Jul-22 19:29 UTC
Re: [PATCH 01/24] PVH xen: Add readme docs/misc/pvh-readme.txt
On Thu, Jul 18, 2013 at 11:32:00AM +0100, Jan Beulich wrote:
> >>> On 18.07.13 at 04:32, Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> > Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> > ---
> > docs/misc/pvh-readme.txt | 40 ++++++++++++++++++++++++++++++++++++++++
> > 1 files changed, 40 insertions(+), 0 deletions(-)
> > create mode 100644 docs/misc/pvh-readme.txt
> >
> > diff --git a/docs/misc/pvh-readme.txt b/docs/misc/pvh-readme.txt
> > new file mode 100644
> > index 0000000..a813373
> > --- /dev/null
> > +++ b/docs/misc/pvh-readme.txt
> > @@ -0,0 +1,40 @@
> > +
> > +PVH : a pv guest running in an HVM container. HAP is required for PVH.
> > +
> > +See:
> > http://blog.xen.org/index.php/2012/10/23/the-paravirtualization-spectrum-p
> > art-1-the-ends-of-the-spectrum/
> > +
> > +
> > +The initial phase targets the booting of a 64bit UP/SMP linux guest in PVH
> > +mode. This is done by adding: pvh=1 in the config file. xl, and not xm, is
> > +supported. Phase I patches are broken into three parts:
> > + - xen changes for booting of 64bit PVH guest
> > + - tools changes for creating a PVH guest
> > + - boot of 64bit dom0 in PVH mode.
> > +
> > +The best way to find all the patches is to use "git log|grep -i PVH", both
> > +in xen and linux tree.
>
> Which doesn't really say which tree. Do you mean the upstream
> ones or some private ones you maintain?

It is my upstream one:

git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git stable/pvh.v8

or just follow the #linux-next branch which has them as well. I am
waiting for the trigger when I should push it to Linus.

> > +
> > +Following fixme's exist in the code:
> > + - Add support for more memory types in arch/x86/hvm/mtrr.c.
> > + - arch/x86/time.c: support more tsc modes.
> > + - check_guest_io_breakpoint(): check/add support for IO breakpoint.
> > + - implement arch_get_info_guest() for pvh.
> > + - vmxit_msr_read(): during AMD port go thru hvm_msr_read_intercept()
> > again.
> > + - verify bp matching on emulated instructions will work same as HVM for
> > + PVH guest. see instruction_done() and check_guest_io_breakpoint().
> > +
> > +Following remain to be done for PVH:
> > + - AMD port.
> > + - 32bit PVH guest support in both linux and xen. Xen changes are tagged
> > + "32bitfixme".
> > + - Add support for monitoring guest behavior. See hvm_memory_event*
> > functions
> > + in hvm.c
> > + - vcpu hotplug support
> > + - Live migration of PVH guests.
> > + - Avail PVH dom0 of posted interrupts. (This will be a big win).
> > +
> > +
> > +Note, any emails to must be cc'd to Xen-devel@lists.xensource.com.
>
> Please let's not add more references to this super stale mailing
> list. It had been @lists.xen.org for a couple of years, and recently
> changed to @lists.xenproject.org.
>
> Also there must be something missing between "to" and "must"...
>
> Jan
>
> > +
> > +Mukesh Rathor
> > +mukesh.rathor [at] oracle [dot] com
> > --
> > 1.7.2.3