== Status ==

01  a   Allow vmx_update_debug_state to be called when v!=current
02  A   libxc: Move temporary grant table mapping to end of memory
03  a   pvh prep: code motion
04  n   pvh: Tolerate HVM guests having no ioreq page
05  a   Introduce pv guest type and has_hvm_container macros
06  *   pvh: Introduce PVH guest type
07      pvh: Disable unneeded features of HVM containers
08  * ! pvh: vmx-specific changes
09  ra  pvh: Do not allow PVH guests to change paging modes
10  a   pvh: PVH access to hypercalls
11  Ra  pvh: Use PV e820
12  * ! pvh: Set up more PV stuff in set_info_guest
13  * ! pvh: PV cpuid
14  * ! pvh: Use PV handlers for IO
15  A   pvh: Disable 32-bit guest support for now
16  a   pvh: Restrict tsc_mode to NEVER_EMULATE for now
17  a   pvh: Documentation
18  a   PVH xen tools: libxc changes to build a PVH guest.
19  a   PVH xen tools: libxl changes to create a PVH guest.

Key:
 *:   Non-trivial changes in v15
 n:   New in v15
 a/r: acked / reviewed (lowercase for 1, capitals for >1)
 !:   Still missing any review from necessary maintainers (VMX maintainers)

This series addresses review comments from versions 13 and 14.
Additionally, it generalizes the PVH IO path from the previous series,
by allowing HVM guests in general to tolerate not having a backing
device model, and then registering a PIO handler for PVH guests which
will call into the PV IO handlers.

Major "to-do" or "open issues" (which need not stop the series going
in) include:

- Get rid of the extra mode, and make PVH just HVM with some flags
- Implement full PV set_info_guest, to make the cpu bring-up code the same
- Whether to support forced invalid ops.  At the moment the only effect
  of not having this is that xen-detect claims to be in an HVM xen guest
  rather than a PV xen guest.

== Summary ==

This patch series is a reworking of a series developed by Mukesh Rathor
at Oracle.  The entirety of the design and development was done by him;
I have only reworked, reorganized, and simplified things in a way that
I think makes more sense.  The vast majority of the credit for this
effort therefore goes to him.  This version is labelled v15 because it
is based on his most recent series, v11.

Because this is based on his work, I retain the "Signed-off-by" in
patches which are based on his code.  This is not meant to imply that
he supports the modified version, only that he is involved in
certifying the origin of the code for copyright purposes.
This patch series is broken down into several broad strokes:
* Miscellaneous fixes or tweaks
* Code motion, so future patches are simpler
* Introduction of the "hvm_container" concept, which will form the
  basis for sharing codepaths between hvm and pvh
* Start with PVH as an HVM container
* Disable unneeded HVM functionality
* Enable PV functionality
* Disable not-yet-implemented functionality
* Enable toolstack changes required to make PVH guests

This patch series can also be pulled from this git tree:
 git://xenbits.xen.org/people/gdunlap/xen.git out/pvh-v15

The kernel code for PVH guests can be found here:
 git://oss.oracle.com/git/mrathor/linux.git pvh.v9-muk-1
(That repo/branch also contains a config file, pvh-config-file)

Changes in v14 can be found inline; major changes since v13 include:
* Various bug fixes
* Use HVM emulation for IO instructions
* ...thus removing many of the changes required to allow the PV
  emulation codepath to work for PVH guests

Changes in v13 can be found inline; major changes since v12 include:
* Include Mukesh's toolstack patches (v4)
* Allocate hvm_param struct for PVH domains; remove patch disabling
  memevents

For those who have been following the series as it develops, here is a
summary of the major changes from Mukesh's series (v11->v12):
* Introduction of "has_hvm_container_*()" macros, rather than using
  "!is_pv_*".  The patch which introduces this also does the vast
  majority of the "heavy lifting" in terms of defining PVH.
* Effort is made to use as much common code as possible.  No separate
  vmcs constructor, no separate vmexit handlers.  More of a "start with
  everything and disable if necessary" approach rather than a "start
  with nothing and enable as needed" approach.
* One exception is arch_set_info_guest(), where a small amount of code
  duplication meant a lot fewer "if(!is_pvh_domain())"s in awkward
  places
* I rely on things being disabled at a higher level and passed down.
  For instance, I no longer explicitly disable rdtsc exiting in
  construct_vmcs(), since that will happen automatically when we're in
  NEVER_EMULATE mode (which is currently enforced for PVH).  Similarly
  for nested vmx and things relating to HAP mode.
* I have also done a slightly more extensive audit of is_pv_* and
  is_hvm_* and tried to do more restrictions.
* I changed the "enable PVH by setting PV + HAP", replacing it instead
  with a separate flag, just like the HVM case, since it makes sense to
  plan on using shadow in the future (although it is

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
CC: Mukesh Rathor <mukesh.rathor@oracle.com>
CC: Jan Beulich <beulich@suse.com>
CC: Tim Deegan <tim@xen.org>
CC: Keir Fraser <keir@xen.org>
CC: Ian Jackson <ian.jackson@citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Jun Nakajima <jun.nakajima@intel.com>
CC: Eddie Dong <eddie.dong@intel.com>
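For orientation, the predicate scheme the series converges on,
condensed from the macros introduced in patches 5 and 6 (a sketch, not
a patch in its own right):

    /* Condensed sketch of the guest-type predicates from patches 5 and 6. */
    enum guest_type {
        guest_type_pv, guest_type_pvh, guest_type_hvm
    };

    #define is_pv_domain(d)             ((d)->guest_type == guest_type_pv)
    #define is_pvh_domain(d)            ((d)->guest_type == guest_type_pvh)
    #define is_hvm_domain(d)            ((d)->guest_type == guest_type_hvm)
    /* PVH and HVM guests both live in an "HVM container". */
    #define has_hvm_container_domain(d) ((d)->guest_type != guest_type_pv)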
George Dunlap
2013-Nov-11 14:57 UTC
[PATCH v15 01/19] Allow vmx_update_debug_state to be called when v!=current
Removing the assert allows the PVH code to call this during vmcs
construction in a later patch, making the code more robust by removing
duplicate code.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
---
v13: Add vmx_vmcs_{enter,exit}

CC: Mukesh Rathor <mukesh.rathor@oracle.com>
CC: Jan Beulich <jbeulich@suse.com>
CC: Tim Deegan <tim@xen.org>
CC: Keir Fraser <keir@xen.org>
CC: Jun Nakajima <jun.nakajima@intel.com>
CC: Eddie Dong <eddie.dong@intel.com>
---
 xen/arch/x86/hvm/vmx/vmx.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index a0ad37a..f4e5bcb 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1104,8 +1104,6 @@ void vmx_update_debug_state(struct vcpu *v)
 {
     unsigned long mask;
 
-    ASSERT(v == current);
-
     mask = 1u << TRAP_int3;
     if ( !cpu_has_monitor_trap_flag )
         mask |= 1u << TRAP_debug;
@@ -1114,7 +1112,10 @@ void vmx_update_debug_state(struct vcpu *v)
         v->arch.hvm_vmx.exception_bitmap |= mask;
     else
         v->arch.hvm_vmx.exception_bitmap &= ~mask;
+
+    vmx_vmcs_enter(v);
     vmx_update_exception_bitmap(v);
+    vmx_vmcs_exit(v);
 }
 
 static void vmx_update_guest_cr(struct vcpu *v, unsigned int cr)
-- 
1.7.9.5
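For context, the enter/exit pair is what makes the v != current case
safe: the eventual VMWRITE must target v's VMCS even when v is not the
vCPU we are running on.  A condensed sketch of the general pattern
(vmx_update_debug_state() now follows it internally):

    /* Pattern for touching another vCPU's VMCS (sketch): */
    vmx_vmcs_enter(v);      /* pause v if necessary and load its VMCS */
    __vmwrite(EXCEPTION_BITMAP, v->arch.hvm_vmx.exception_bitmap);
    vmx_vmcs_exit(v);       /* restore the previously loaded VMCS */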
George Dunlap
2013-Nov-11 14:57 UTC
[PATCH v15 02/19] libxc: Move temporary grant table mapping to end of memory
From: Roger Pau Monné <roger.pau@citrix.com>

In order to set up the grant table for HVM guests, libxc needs to map
the grant table temporarily.  At the moment, it does this by adding the
grant page to the HVM guest's p2m table in the MMIO hole (at gfn
0xFFFFE), then mapping that gfn, setting up the table, then unmapping
the gfn and removing it from the p2m table.

This breaks with PVH guests with 4G or more of ram, because there is no
MMIO hole; so it ends up clobbering a valid RAM p2m entry, then leaving
a "hole" when it removes the grant map from the p2m table.  Since the
guest thinks this is normal ram, when it maps it and tries to access
the page, it crashes.

This patch maps the page at max_gfn+1 instead.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
 tools/libxc/xc_dom.h      |    3 ---
 tools/libxc/xc_dom_boot.c |   14 ++++++++++++--
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/tools/libxc/xc_dom.h b/tools/libxc/xc_dom.h
index 86e23ee..935b49e 100644
--- a/tools/libxc/xc_dom.h
+++ b/tools/libxc/xc_dom.h
@@ -18,9 +18,6 @@
 
 #define INVALID_P2M_ENTRY ((xen_pfn_t)-1)
 
-/* Scrach PFN for temporary mappings in HVM */
-#define SCRATCH_PFN_GNTTAB 0xFFFFE
-
 /* --- typedefs and structs ---------------------------------------- */
 
 typedef uint64_t xen_vaddr_t;

diff --git a/tools/libxc/xc_dom_boot.c b/tools/libxc/xc_dom_boot.c
index 71e1897..fdfeaf8 100644
--- a/tools/libxc/xc_dom_boot.c
+++ b/tools/libxc/xc_dom_boot.c
@@ -361,17 +361,27 @@ int xc_dom_gnttab_hvm_seed(xc_interface *xch, domid_t domid,
                            domid_t xenstore_domid)
 {
     int rc;
+    xen_pfn_t max_gfn;
     struct xen_add_to_physmap xatp = {
         .domid = domid,
         .space = XENMAPSPACE_grant_table,
         .idx = 0,
-        .gpfn = SCRATCH_PFN_GNTTAB
     };
     struct xen_remove_from_physmap xrfp = {
         .domid = domid,
-        .gpfn = SCRATCH_PFN_GNTTAB
     };
 
+    max_gfn = xc_domain_maximum_gpfn(xch, domid);
+    if ( max_gfn <= 0 ) {
+        xc_dom_panic(xch, XC_INTERNAL_ERROR,
+                     "%s: failed to get max gfn "
+                     "[errno=%d]\n",
+                     __FUNCTION__, errno);
+        return -1;
+    }
+    xatp.gpfn = max_gfn + 1;
+    xrfp.gpfn = max_gfn + 1;
+
     rc = do_memory_op(xch, XENMEM_add_to_physmap, &xatp, sizeof(xatp));
     if ( rc != 0 )
     {
-- 
1.7.9.5
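In outline, the round-trip the patch implements looks like the sketch
below.  This is illustrative only: it uses public libxc calls
(xc_domain_maximum_gpfn, xc_domain_add_to_physmap,
xc_map_foreign_range); error handling and the final
XENMEM_remove_from_physmap plumbing are elided.

    /* Sketch: temporary grant-table mapping just above guest RAM. */
    static void seed_gnttab_sketch(xc_interface *xch, domid_t domid)
    {
        xen_pfn_t scratch = xc_domain_maximum_gpfn(xch, domid) + 1;
        grant_entry_v1_t *gnt;

        /* Wire grant-table frame 0 into the guest p2m above RAM... */
        xc_domain_add_to_physmap(xch, domid, XENMAPSPACE_grant_table,
                                 0, scratch);

        /* ...map it from the toolstack and initialise the entries... */
        gnt = xc_map_foreign_range(xch, domid, PAGE_SIZE,
                                   PROT_READ | PROT_WRITE, scratch);
        /* gnt[...] = ...; */
        munmap(gnt, PAGE_SIZE);

        /* ...then remove the scratch entry again.  No hole is left
         * behind, because 'scratch' was never guest RAM to begin with. */
    }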
George Dunlap
2013-Nov-11 14:57 UTC
[PATCH v15 03/19] pvh prep: code motion

There are many functions where PVH requires some code in common with
HVM.  Rearrange some of these functions so that the code is together.

In general, the HVM code that PVH also uses includes:
- cacheattr functionality
- paging
- hvm_funcs
- hvm_assert_evtchn_irq tasklet
- tm_list
- hvm_params

And code that PVH shares with PV but not with HVM:
- updating the domain wallclock
- setting v->is_initialized

There should be no end-to-end changes in behavior.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Acked-by: Tim Deegan <tim@xen.org>
---
v15:
 - Nuke unnecessary movement.
v14:
 - Remove changes in arch_set_info_guest (more of the code is unified)
 - hvm_funcs.vcpu_initialise() must be called after vlapic_init()
v13:
 - Don't bother calling tasklet_kill in failure path of hvm_vcpu_initialize
 - Allocate hvm_params for PVH domains

CC: Jan Beulich <jbeulich@suse.com>
CC: Tim Deegan <tim@xen.org>
CC: Keir Fraser <keir@xen.org>
---
 xen/arch/x86/hvm/hvm.c |   89 +++++++++++++++++++++++++-----------------------
 1 file changed, 47 insertions(+), 42 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 36699b8..85f9857 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -525,11 +525,17 @@ int hvm_domain_initialise(struct domain *d)
     INIT_LIST_HEAD(&d->arch.hvm_domain.msixtbl_list);
     spin_lock_init(&d->arch.hvm_domain.msixtbl_list_lock);
 
+    hvm_init_cacheattr_region_list(d);
+
+    rc = paging_enable(d, PG_refcounts|PG_translate|PG_external);
+    if ( rc != 0 )
+        goto fail0;
+
     d->arch.hvm_domain.params = xzalloc_array(uint64_t, HVM_NR_PARAMS);
     d->arch.hvm_domain.io_handler = xmalloc(struct hvm_io_handler);
     rc = -ENOMEM;
     if ( !d->arch.hvm_domain.params || !d->arch.hvm_domain.io_handler )
-        goto fail0;
+        goto fail1;
     d->arch.hvm_domain.io_handler->num_slot = 0;
 
     hvm_init_guest_time(d);
@@ -537,12 +543,6 @@ int hvm_domain_initialise(struct domain *d)
     d->arch.hvm_domain.params[HVM_PARAM_HPET_ENABLED] = 1;
     d->arch.hvm_domain.params[HVM_PARAM_TRIPLE_FAULT_REASON] = SHUTDOWN_reboot;
 
-    hvm_init_cacheattr_region_list(d);
-
-    rc = paging_enable(d, PG_refcounts|PG_translate|PG_external);
-    if ( rc != 0 )
-        goto fail1;
-
     vpic_init(d);
 
     rc = vioapic_init(d);
@@ -569,10 +569,10 @@ int hvm_domain_initialise(struct domain *d)
     stdvga_deinit(d);
     vioapic_deinit(d);
  fail1:
-    hvm_destroy_cacheattr_region_list(d);
- fail0:
     xfree(d->arch.hvm_domain.io_handler);
     xfree(d->arch.hvm_domain.params);
+ fail0:
+    hvm_destroy_cacheattr_region_list(d);
     return rc;
 }
 
@@ -601,11 +601,11 @@ void hvm_domain_relinquish_resources(struct domain *d)
 
 void hvm_domain_destroy(struct domain *d)
 {
+    hvm_destroy_cacheattr_region_list(d);
     hvm_funcs.domain_destroy(d);
     rtc_deinit(d);
     stdvga_deinit(d);
     vioapic_deinit(d);
-    hvm_destroy_cacheattr_region_list(d);
 }
 
 static int hvm_save_tsc_adjust(struct domain *d, hvm_domain_context_t *h)
@@ -1091,24 +1091,47 @@ int hvm_vcpu_initialise(struct vcpu *v)
 {
     int rc;
     struct domain *d = v->domain;
-    domid_t dm_domid = d->arch.hvm_domain.params[HVM_PARAM_DM_DOMAIN];
+    domid_t dm_domid;
 
     hvm_asid_flush_vcpu(v);
 
-    if ( (rc = vlapic_init(v)) != 0 )
+    spin_lock_init(&v->arch.hvm_vcpu.tm_lock);
+    INIT_LIST_HEAD(&v->arch.hvm_vcpu.tm_list);
+
+    rc = hvm_vcpu_cacheattr_init(v); /* teardown: vcpu_cacheattr_destroy */
+    if ( rc != 0 )
         goto fail1;
 
-    if ( (rc = hvm_funcs.vcpu_initialise(v)) != 0 )
+    /* NB: vlapic_init must be called before hvm_funcs.vcpu_initialise */
+    if ( (rc = vlapic_init(v)) != 0 ) /* teardown: vlapic_destroy */
        goto fail2;
 
-    if ( nestedhvm_enabled(d)
-         && (rc = nestedhvm_vcpu_initialise(v)) < 0 )
+    if ( (rc = hvm_funcs.vcpu_initialise(v)) != 0 ) /* teardown: hvm_funcs.vcpu_destroy */
         goto fail3;
 
+    softirq_tasklet_init(
+        &v->arch.hvm_vcpu.assert_evtchn_irq_tasklet,
+        (void(*)(unsigned long))hvm_assert_evtchn_irq,
+        (unsigned long)v);
+
+    v->arch.user_regs.eflags = 2;
+
+    v->arch.hvm_vcpu.inject_trap.vector = -1;
+
+    rc = setup_compat_arg_xlat(v); /* teardown: free_compat_arg_xlat() */
+    if ( rc != 0 )
+        goto fail4;
+
+    if ( nestedhvm_enabled(d)
+         && (rc = nestedhvm_vcpu_initialise(v)) < 0 ) /* teardown: nestedhvm_vcpu_destroy */
+        goto fail5;
+
+    dm_domid = d->arch.hvm_domain.params[HVM_PARAM_DM_DOMAIN];
+
     /* Create ioreq event channel. */
-    rc = alloc_unbound_xen_event_channel(v, dm_domid, NULL);
+    rc = alloc_unbound_xen_event_channel(v, dm_domid, NULL); /* teardown: none */
     if ( rc < 0 )
-        goto fail4;
+        goto fail6;
 
     /* Register ioreq event channel. */
     v->arch.hvm_vcpu.xen_port = rc;
@@ -1116,9 +1139,9 @@ int hvm_vcpu_initialise(struct vcpu *v)
     if ( v->vcpu_id == 0 )
     {
         /* Create bufioreq event channel. */
-        rc = alloc_unbound_xen_event_channel(v, dm_domid, NULL);
+        rc = alloc_unbound_xen_event_channel(v, dm_domid, NULL); /* teardown: none */
         if ( rc < 0 )
-            goto fail4;
+            goto fail6;
         d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN] = rc;
     }
 
@@ -1127,26 +1150,6 @@ int hvm_vcpu_initialise(struct vcpu *v)
     get_ioreq(v)->vp_eport = v->arch.hvm_vcpu.xen_port;
     spin_unlock(&d->arch.hvm_domain.ioreq.lock);
 
-    spin_lock_init(&v->arch.hvm_vcpu.tm_lock);
-    INIT_LIST_HEAD(&v->arch.hvm_vcpu.tm_list);
-
-    v->arch.hvm_vcpu.inject_trap.vector = -1;
-
-    rc = setup_compat_arg_xlat(v);
-    if ( rc != 0 )
-        goto fail4;
-
-    rc = hvm_vcpu_cacheattr_init(v);
-    if ( rc != 0 )
-        goto fail5;
-
-    softirq_tasklet_init(
-        &v->arch.hvm_vcpu.assert_evtchn_irq_tasklet,
-        (void(*)(unsigned long))hvm_assert_evtchn_irq,
-        (unsigned long)v);
-
-    v->arch.user_regs.eflags = 2;
-
     if ( v->vcpu_id == 0 )
     {
         /* NB. All these really belong in hvm_domain_initialise(). */
@@ -1164,14 +1167,16 @@ int hvm_vcpu_initialise(struct vcpu *v)
 
     return 0;
 
+ fail6:
+    nestedhvm_vcpu_destroy(v);
  fail5:
     free_compat_arg_xlat(v);
  fail4:
-    nestedhvm_vcpu_destroy(v);
- fail3:
     hvm_funcs.vcpu_destroy(v);
- fail2:
+ fail3:
     vlapic_destroy(v);
+ fail2:
+    hvm_vcpu_cacheattr_destroy(v);
  fail1:
     return rc;
 }
-- 
1.7.9.5
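The reshuffle follows the usual construct-in-order, tear-down-in-reverse
idiom, with each init step annotated with its teardown label.
Schematically, with hypothetical init_a()/destroy_a() style names (not
the literal patch):

    rc = init_a(v);  if ( rc != 0 ) goto fail1;
    rc = init_b(v);  if ( rc != 0 ) goto fail2;  /* teardown: destroy_a */
    rc = init_c(v);  if ( rc != 0 ) goto fail3;  /* teardown: destroy_b */
    return 0;

 fail3: destroy_b(v);
 fail2: destroy_a(v);
 fail1: return rc;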
George Dunlap
2013-Nov-11 14:57 UTC
[PATCH v15 04/19] pvh: Tolerate HVM guests having no ioreq page
PVH guests don't have a backing device model emulator (qemu); just
tolerate this situation explicitly, rather than special-casing PVH.

For unhandled IO, hvmemul_do_io() will now return X86EMUL_OKAY, which
is, I believe, the effect we would get if qemu didn't have a handler
for the IO.

This also fixes a potential DoS in the host from the reworked series:
if the guest made a hypercall which sent an invalidate request, it
would have crashed the host.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
---
CC: Mukesh Rathor <mukesh.rathor@oracle.com>
CC: Jan Beulich <beulich@suse.com>
CC: Tim Deegan <tim@xen.org>
CC: Keir Fraser <keir@xen.org>
---
 xen/arch/x86/hvm/emulate.c        |   31 ++++++++++++++++++++++++++-----
 xen/arch/x86/hvm/hvm.c            |   11 ++++++++---
 xen/arch/x86/hvm/io.c             |    8 +++++---
 xen/include/asm-x86/hvm/io.h      |    2 +-
 xen/include/asm-x86/hvm/support.h |    3 +--
 5 files changed, 41 insertions(+), 14 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index f39c173..590dcfe 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -58,10 +58,22 @@ static int hvmemul_do_io(
     struct vcpu *curr = current;
     struct hvm_vcpu_io *vio;
     ioreq_t *p = get_ioreq(curr);
+    ioreq_t _ioreq;
     unsigned long ram_gfn = paddr_to_pfn(ram_gpa);
     p2m_type_t p2mt;
     struct page_info *ram_page;
     int rc;
+    int has_dm = 1;
+
+    /* Domains without a backing DM, don't have an ioreq page.  Just
+     * point to a struct on the stack, initialising the state as
+     * needed. */
+    if ( !p )
+    {
+        has_dm = 0;
+        p = &_ioreq;
+        p->state = STATE_IOREQ_NONE;
+    }
 
     /* Check for paged out page */
     ram_page = get_page_from_gfn(curr->domain, ram_gfn, &p2mt, P2M_UNSHARE);
@@ -211,7 +223,7 @@ static int hvmemul_do_io(
         p->state = STATE_IORESP_READY;
         if ( !vio->mmio_retry )
         {
-            hvm_io_assist();
+            hvm_io_assist(p);
             vio->io_state = HVMIO_none;
         }
         else
@@ -219,11 +231,20 @@ static int hvmemul_do_io(
             vio->io_state = HVMIO_handle_mmio_awaiting_completion;
         break;
     case X86EMUL_UNHANDLEABLE:
-        rc = X86EMUL_RETRY;
-        if ( !hvm_send_assist_req(curr) )
-            vio->io_state = HVMIO_none;
-        else if ( p_data == NULL )
+        /* If there is no backing DM, just ignore accesses */
+        if ( !has_dm )
+        {
             rc = X86EMUL_OKAY;
+            vio->io_state = HVMIO_none;
+        }
+        else
+        {
+            rc = X86EMUL_RETRY;
+            if ( !hvm_send_assist_req(curr) )
+                vio->io_state = HVMIO_none;
+            else if ( p_data == NULL )
+                rc = X86EMUL_OKAY;
+        }
         break;
     default:
         BUG();

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 85f9857..7aa1e8a 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -347,13 +347,15 @@ void hvm_do_resume(struct vcpu *v)
     check_wakeup_from_wait();
 
     /* NB. Optimised for common case (p->state == STATE_IOREQ_NONE). */
-    p = get_ioreq(v);
+    if ( !(p = get_ioreq(v)) )
+        goto check_inject_trap;
+
     while ( p->state != STATE_IOREQ_NONE )
     {
         switch ( p->state )
         {
         case STATE_IORESP_READY: /* IORESP_READY -> NONE */
-            hvm_io_assist();
+            hvm_io_assist(p);
             break;
         case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} -> IORESP_READY */
         case STATE_IOREQ_INPROCESS:
@@ -368,6 +370,7 @@ void hvm_do_resume(struct vcpu *v)
         }
     }
 
+ check_inject_trap:
     /* Inject pending hw/sw trap */
     if ( v->arch.hvm_vcpu.inject_trap.vector != -1 )
     {
@@ -1227,7 +1230,9 @@ bool_t hvm_send_assist_req(struct vcpu *v)
     if ( unlikely(!vcpu_start_shutdown_deferral(v)) )
         return 0; /* implicitly bins the i/o operation */
 
-    p = get_ioreq(v);
+    if ( !(p = get_ioreq(v)) )
+        return 0;
+
     if ( unlikely(p->state != STATE_IOREQ_NONE) )
     {
         /* This indicates a bug in the device model.  Crash the domain. */

diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index feb0406..0389c56 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -150,7 +150,10 @@ void send_timeoffset_req(unsigned long timeoff)
 void send_invalidate_req(void)
 {
     struct vcpu *v = current;
-    ioreq_t *p = get_ioreq(v);
+    ioreq_t *p;
+
+    if ( !(p = get_ioreq(v)) )
+        return;
 
     if ( p->state != STATE_IOREQ_NONE )
     {
@@ -262,11 +265,10 @@ int handle_pio(uint16_t port, unsigned int size, int dir)
     return 1;
 }
 
-void hvm_io_assist(void)
+void hvm_io_assist(ioreq_t *p)
 {
     struct vcpu *curr = current;
     struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
-    ioreq_t *p = get_ioreq(curr);
     enum hvm_io_state io_state;
 
     rmb(); /* see IORESP_READY /then/ read contents of ioreq */

diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index b0718b8..6f4cb96 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -121,7 +121,7 @@ int handle_mmio(void);
 int handle_mmio_with_translation(unsigned long gva, unsigned long gpfn);
 int handle_pio(uint16_t port, unsigned int size, int dir);
 void hvm_interrupt_post(struct vcpu *v, int vector, int type);
-void hvm_io_assist(void);
+void hvm_io_assist(ioreq_t *p);
 void hvm_dpci_eoi(struct domain *d, unsigned int guest_irq,
                   union vioapic_redir_entry *ent);

diff --git a/xen/include/asm-x86/hvm/support.h b/xen/include/asm-x86/hvm/support.h
index 52aef1f..3529499 100644
--- a/xen/include/asm-x86/hvm/support.h
+++ b/xen/include/asm-x86/hvm/support.h
@@ -32,8 +32,7 @@ static inline ioreq_t *get_ioreq(struct vcpu *v)
     struct domain *d = v->domain;
     shared_iopage_t *p = d->arch.hvm_domain.ioreq.va;
     ASSERT((v == current) || spin_is_locked(&d->arch.hvm_domain.ioreq.lock));
-    ASSERT(d->arch.hvm_domain.ioreq.va != NULL);
-    return &p->vcpu_ioreq[v->vcpu_id];
+    return p ? &p->vcpu_ioreq[v->vcpu_id] : NULL;
 }
 
 #define HVM_DELIVER_NO_ERROR_CODE  -1
-- 
1.7.9.5
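The enabling change is get_ioreq() returning NULL instead of asserting;
callers then follow one of two patterns, condensed from the hunks above:

    /* Most callers simply bail out when there is no device model... */
    if ( !(p = get_ioreq(v)) )
        return;

    /* ...while hvmemul_do_io() emulates against a throw-away ioreq on
     * the stack, so the rest of the function needs no special casing. */
    if ( !p )
    {
        has_dm = 0;
        p = &_ioreq;
        p->state = STATE_IOREQ_NONE;
    }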
George Dunlap
2013-Nov-11 14:57 UTC
[PATCH v15 05/19] pvh prep: Introduce pv guest type and has_hvm_container macros
The goal of this patch is to classify conditionals more clearly, as to
whether they relate to pv guests, hvm-only guests, or guests with an
"hvm container" (which will eventually include PVH).

This patch introduces an enum for guest type, as well as two new macros
for switching behavior on and off: is_pv_* and has_hvm_container_*.  At
the moment is_pv_* <=> !has_hvm_container_*.  The purpose of having two
is that taking a path because something does *not* have PV structures
seems to me different from taking a path because it *does* have HVM
structures, even if the two happen to coincide 100% at the moment.  The
exact usage is occasionally a bit fuzzy though, and a judgement call
just needs to be made on which is clearer.

In general, a switch should use is_pv_* (or !is_pv_*) if the code in
question relates directly to a PV guest.  Examples include use of
pv_vcpu structs or other behavior directly related to PV domains.

hvm_container is more of a fuzzy concept, but in general:

* Most core HVM behavior will be included in this.  Behavior not
  appropriate for PVH mode will be disabled in later patches

* Hypercalls related to HVM guests will *not* be included by default;
  functionality needed by PVH guests will be enabled in future patches

* The following functionality is not considered part of the HVM
  container, and PVH will end up behaving like PV by default: event
  channel, vtsc offset, code related to emulated timers, nested HVM,
  emuirq, PoD

* Some features are left to implement for PVH later: vpmu, shadow mode

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Acked-by: Tim Deegan <tim@xen.org>
---
v13:
 - Readjust where we choose to use has_hvm_container or !is_pv (and
   vice versa)
 - Do the memset in arch_set_info_guest unconditionally
 - Check for is_pv in do_page_walk

CC: Jan Beulich <jbeulich@suse.com>
CC: Tim Deegan <tim@xen.org>
CC: Keir Fraser <keir@xen.org>
---
 xen/arch/x86/acpi/suspend.c        |    2 +-
 xen/arch/x86/cpu/mcheck/vmce.c     |    6 ++--
 xen/arch/x86/debug.c               |    2 +-
 xen/arch/x86/domain.c              |   54 ++++++++++++++++++------------------
 xen/arch/x86/domain_page.c         |   10 +++----
 xen/arch/x86/domctl.c              |   11 ++++----
 xen/arch/x86/efi/runtime.c         |    4 +--
 xen/arch/x86/hvm/vmx/vmcs.c        |    4 +--
 xen/arch/x86/mm.c                  |    6 ++--
 xen/arch/x86/mm/shadow/common.c    |    6 ++--
 xen/arch/x86/mm/shadow/multi.c     |    7 +++--
 xen/arch/x86/physdev.c             |    4 +--
 xen/arch/x86/traps.c               |    5 ++--
 xen/arch/x86/x86_64/mm.c           |    2 +-
 xen/arch/x86/x86_64/traps.c        |    8 +++---
 xen/common/domain.c                |    2 +-
 xen/common/grant_table.c           |    4 +--
 xen/common/kernel.c                |    2 +-
 xen/include/asm-x86/domain.h       |    2 +-
 xen/include/asm-x86/event.h        |    2 +-
 xen/include/asm-x86/guest_access.h |   12 ++++----
 xen/include/asm-x86/guest_pt.h     |    4 +--
 xen/include/xen/sched.h            |   14 ++++++++--
 xen/include/xen/tmem_xen.h         |    2 +-
 24 files changed, 92 insertions(+), 83 deletions(-)

diff --git a/xen/arch/x86/acpi/suspend.c b/xen/arch/x86/acpi/suspend.c
index 6fdd876..1718930 100644
--- a/xen/arch/x86/acpi/suspend.c
+++ b/xen/arch/x86/acpi/suspend.c
@@ -85,7 +85,7 @@ void restore_rest_processor_state(void)
         BUG();
 
     /* Maybe load the debug registers. */
-    BUG_ON(is_hvm_vcpu(curr));
+    BUG_ON(!is_pv_vcpu(curr));
     if ( !is_idle_vcpu(curr) && curr->arch.debugreg[7] )
     {
         write_debugreg(0, curr->arch.debugreg[0]);

diff --git a/xen/arch/x86/cpu/mcheck/vmce.c b/xen/arch/x86/cpu/mcheck/vmce.c
index af3b491..f6c35db 100644
--- a/xen/arch/x86/cpu/mcheck/vmce.c
+++ b/xen/arch/x86/cpu/mcheck/vmce.c
@@ -83,7 +83,7 @@ int vmce_restore_vcpu(struct vcpu *v, const struct hvm_vmce_vcpu *ctxt)
     {
         dprintk(XENLOG_G_ERR, "%s restore: unsupported MCA capabilities"
                 " %#" PRIx64 " for d%d:v%u (supported: %#Lx)\n",
-                is_hvm_vcpu(v) ? "HVM" : "PV", ctxt->caps,
+                has_hvm_container_vcpu(v) ? "HVM" : "PV", ctxt->caps,
                 v->domain->domain_id, v->vcpu_id,
                 guest_mcg_cap & ~MCG_CAP_COUNT);
         return -EPERM;
@@ -357,7 +357,7 @@ int inject_vmce(struct domain *d, int vcpu)
         if ( vcpu != VMCE_INJECT_BROADCAST && vcpu != v->vcpu_id )
             continue;
 
-        if ( (is_hvm_domain(d) ||
+        if ( (has_hvm_container_domain(d) ||
               guest_has_trap_callback(d, v->vcpu_id, TRAP_machine_check)) &&
              !test_and_set_bool(v->mce_pending) )
         {
@@ -439,7 +439,7 @@ int unmmap_broken_page(struct domain *d, mfn_t mfn, unsigned long gfn)
     if (!mfn_valid(mfn_x(mfn)))
         return -EINVAL;
 
-    if ( !is_hvm_domain(d) || !paging_mode_hap(d) )
+    if ( !has_hvm_container_domain(d) || !paging_mode_hap(d) )
         return -ENOSYS;
 
     rc = -1;

diff --git a/xen/arch/x86/debug.c b/xen/arch/x86/debug.c
index e67473e..3e21ca8 100644
--- a/xen/arch/x86/debug.c
+++ b/xen/arch/x86/debug.c
@@ -158,7 +158,7 @@ dbg_rw_guest_mem(dbgva_t addr, dbgbyte_t *buf, int len, struct domain *dp,
 
         pagecnt = min_t(long, PAGE_SIZE - (addr & ~PAGE_MASK), len);
 
-        mfn = (dp->is_hvm
+        mfn = (has_hvm_container_domain(dp)
                ? dbg_hvm_va2mfn(addr, dp, toaddr, &gfn)
                : dbg_pv_va2mfn(addr, dp, pgd3));
         if ( mfn == INVALID_MFN )

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index b67fcb8..358616c 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -167,7 +167,7 @@ void dump_pageframe_info(struct domain *d)
         spin_unlock(&d->page_alloc_lock);
     }
 
-    if ( is_hvm_domain(d) )
+    if ( has_hvm_container_domain(d) )
         p2m_pod_dump_data(d);
 
     spin_lock(&d->page_alloc_lock);
@@ -385,7 +385,7 @@ int vcpu_initialise(struct vcpu *v)
 
     vmce_init_vcpu(v);
 
-    if ( is_hvm_domain(d) )
+    if ( has_hvm_container_domain(d) )
     {
         rc = hvm_vcpu_initialise(v);
         goto done;
@@ -438,7 +438,7 @@ int vcpu_initialise(struct vcpu *v)
     {
         vcpu_destroy_fpu(v);
 
-        if ( !is_hvm_domain(d) )
+        if ( is_pv_domain(d) )
             xfree(v->arch.pv_vcpu.trap_ctxt);
     }
 
@@ -452,7 +452,7 @@ void vcpu_destroy(struct vcpu *v)
 
     vcpu_destroy_fpu(v);
 
-    if ( is_hvm_vcpu(v) )
+    if ( has_hvm_container_vcpu(v) )
         hvm_vcpu_destroy(v);
     else
         xfree(v->arch.pv_vcpu.trap_ctxt);
@@ -464,7 +464,7 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
     int rc = -ENOMEM;
 
     d->arch.hvm_domain.hap_enabled =
-        is_hvm_domain(d) &&
+        has_hvm_container_domain(d) &&
         hvm_funcs.hap_supported && (domcr_flags & DOMCRF_hap);
     d->arch.hvm_domain.mem_sharing_enabled = 0;
@@ -490,7 +490,7 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
                d->domain_id);
     }
 
-    if ( is_hvm_domain(d) )
+    if ( has_hvm_container_domain(d) )
         rc = create_perdomain_mapping(d, PERDOMAIN_VIRT_START, 0, NULL, NULL);
     else if ( is_idle_domain(d) )
         rc = 0;
@@ -512,7 +512,7 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
         mapcache_domain_init(d);
 
         HYPERVISOR_COMPAT_VIRT_START(d) =
-            is_hvm_domain(d) ? ~0u : __HYPERVISOR_COMPAT_VIRT_START;
+            is_pv_domain(d) ? __HYPERVISOR_COMPAT_VIRT_START : ~0u;
 
     if ( (rc = paging_domain_init(d, domcr_flags)) != 0 )
         goto fail;
@@ -554,7 +554,7 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
             goto fail;
     }
 
-    if ( is_hvm_domain(d) )
+    if ( has_hvm_container_domain(d) )
     {
         if ( (rc = hvm_domain_initialise(d)) != 0 )
         {
@@ -583,14 +583,14 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags)
     if ( paging_initialised )
         paging_final_teardown(d);
     free_perdomain_mappings(d);
-    if ( !is_hvm_domain(d) )
+    if ( is_pv_domain(d) )
         free_xenheap_page(d->arch.pv_domain.gdt_ldt_l1tab);
     return rc;
 }
 
 void arch_domain_destroy(struct domain *d)
 {
-    if ( is_hvm_domain(d) )
+    if ( has_hvm_container_domain(d) )
         hvm_domain_destroy(d);
     else
         xfree(d->arch.pv_domain.e820);
@@ -602,7 +602,7 @@ void arch_domain_destroy(struct domain *d)
 
     paging_final_teardown(d);
     free_perdomain_mappings(d);
-    if ( !is_hvm_domain(d) )
+    if ( is_pv_domain(d) )
         free_xenheap_page(d->arch.pv_domain.gdt_ldt_l1tab);
 
     free_xenheap_page(d->shared_info);
@@ -653,7 +653,7 @@ int arch_set_info_guest(
 #define c(fld) (compat ? (c.cmp->fld) : (c.nat->fld))
     flags = c(flags);
 
-    if ( !is_hvm_vcpu(v) )
+    if ( is_pv_vcpu(v) )
     {
         if ( !compat )
         {
@@ -698,7 +698,7 @@ int arch_set_info_guest(
     v->fpu_initialised = !!(flags & VGCF_I387_VALID);
 
     v->arch.flags &= ~TF_kernel_mode;
-    if ( (flags & VGCF_in_kernel) || is_hvm_vcpu(v)/*???*/ )
+    if ( (flags & VGCF_in_kernel) || has_hvm_container_vcpu(v)/*???*/ )
         v->arch.flags |= TF_kernel_mode;
 
     v->arch.vgc_flags = flags;
@@ -713,7 +713,7 @@ int arch_set_info_guest(
     if ( !compat )
     {
         memcpy(&v->arch.user_regs, &c.nat->user_regs, sizeof(c.nat->user_regs));
-        if ( !is_hvm_vcpu(v) )
+        if ( is_pv_vcpu(v) )
             memcpy(v->arch.pv_vcpu.trap_ctxt, c.nat->trap_ctxt,
                    sizeof(c.nat->trap_ctxt));
     }
@@ -729,7 +729,7 @@ int arch_set_info_guest(
 
     v->arch.user_regs.eflags |= 2;
 
-    if ( is_hvm_vcpu(v) )
+    if ( has_hvm_container_vcpu(v) )
     {
         hvm_set_info_guest(v);
         goto out;
@@ -959,7 +959,7 @@ int arch_set_info_guest(
 
 int arch_vcpu_reset(struct vcpu *v)
 {
-    if ( !is_hvm_vcpu(v) )
+    if ( is_pv_vcpu(v) )
     {
         destroy_gdt(v);
         return vcpu_destroy_pagetables(v);
@@ -1309,7 +1309,7 @@ static void update_runstate_area(struct vcpu *v)
 
 static inline int need_full_gdt(struct vcpu *v)
 {
-    return (!is_hvm_vcpu(v) && !is_idle_vcpu(v));
+    return (is_pv_vcpu(v) && !is_idle_vcpu(v));
 }
 
 static void __context_switch(void)
@@ -1435,9 +1435,9 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
     {
         __context_switch();
 
-        if ( !is_hvm_vcpu(next) &&
+        if ( is_pv_vcpu(next) &&
              (is_idle_vcpu(prev) ||
-              is_hvm_vcpu(prev) ||
+              has_hvm_container_vcpu(prev) ||
               is_pv_32on64_vcpu(prev) != is_pv_32on64_vcpu(next)) )
         {
             uint64_t efer = read_efer();
@@ -1448,13 +1448,13 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
         /* Re-enable interrupts before restoring state which may fault. */
         local_irq_enable();
 
-        if ( !is_hvm_vcpu(next) )
+        if ( is_pv_vcpu(next) )
         {
             load_LDT(next);
             load_segments(next);
         }
 
-        set_cpuid_faulting(!is_hvm_vcpu(next) &&
+        set_cpuid_faulting(is_pv_vcpu(next) &&
                            (next->domain->domain_id != 0));
     }
 
@@ -1537,7 +1537,7 @@ void hypercall_cancel_continuation(void)
     }
     else
     {
-        if ( !is_hvm_vcpu(current) )
+        if ( is_pv_vcpu(current) )
             regs->eip += 2; /* skip re-execute 'syscall' / 'int $xx' */
        else
            current->arch.hvm_vcpu.hcall_preempted = 0;
@@ -1574,12 +1574,12 @@ unsigned long hypercall_create_continuation(
         regs->eax  = op;
 
         /* Ensure the hypercall trap instruction is re-executed. */
-        if ( !is_hvm_vcpu(current) )
+        if ( is_pv_vcpu(current) )
             regs->eip -= 2;  /* re-execute 'syscall' / 'int $xx' */
         else
             current->arch.hvm_vcpu.hcall_preempted = 1;
 
-        if ( !is_hvm_vcpu(current) ?
+        if ( is_pv_vcpu(current) ?
              !is_pv_32on64_vcpu(current) :
              (hvm_guest_x86_mode(current) == 8) )
         {
@@ -1851,7 +1851,7 @@ int domain_relinquish_resources(struct domain *d)
                 return ret;
         }
 
-        if ( !is_hvm_domain(d) )
+        if ( is_pv_domain(d) )
         {
             for_each_vcpu ( d, v )
             {
@@ -1924,7 +1924,7 @@ int domain_relinquish_resources(struct domain *d)
         BUG();
     }
 
-    if ( is_hvm_domain(d) )
+    if ( has_hvm_container_domain(d) )
         hvm_domain_relinquish_resources(d);
 
     return 0;
@@ -2008,7 +2008,7 @@ void vcpu_mark_events_pending(struct vcpu *v)
     if ( already_pending )
         return;
 
-    if ( is_hvm_vcpu(v) )
+    if ( has_hvm_container_vcpu(v) )
         hvm_assert_evtchn_irq(v);
     else
         vcpu_kick(v);

diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c
index bc18263..3903952 100644
--- a/xen/arch/x86/domain_page.c
+++ b/xen/arch/x86/domain_page.c
@@ -35,7 +35,7 @@ static inline struct vcpu *mapcache_current_vcpu(void)
      * then it means we are running on the idle domain's page table and must
      * therefore use its mapcache.
      */
-    if ( unlikely(pagetable_is_null(v->arch.guest_table)) && !is_hvm_vcpu(v) )
+    if ( unlikely(pagetable_is_null(v->arch.guest_table)) && is_pv_vcpu(v) )
     {
         /* If we really are idling, perform lazy context switch now. */
         if ( (v = idle_vcpu[smp_processor_id()]) == current )
@@ -72,7 +72,7 @@ void *map_domain_page(unsigned long mfn)
 #endif
 
     v = mapcache_current_vcpu();
-    if ( !v || is_hvm_vcpu(v) )
+    if ( !v || !is_pv_vcpu(v) )
         return mfn_to_virt(mfn);
 
     dcache = &v->domain->arch.pv_domain.mapcache;
@@ -177,7 +177,7 @@ void unmap_domain_page(const void *ptr)
     ASSERT(va >= MAPCACHE_VIRT_START && va < MAPCACHE_VIRT_END);
 
     v = mapcache_current_vcpu();
-    ASSERT(v && !is_hvm_vcpu(v));
+    ASSERT(v && is_pv_vcpu(v));
 
     dcache = &v->domain->arch.pv_domain.mapcache;
     ASSERT(dcache->inuse);
@@ -244,7 +244,7 @@ int mapcache_domain_init(struct domain *d)
     struct mapcache_domain *dcache = &d->arch.pv_domain.mapcache;
     unsigned int bitmap_pages;
 
-    if ( is_hvm_domain(d) || is_idle_domain(d) )
+    if ( !is_pv_domain(d) || is_idle_domain(d) )
         return 0;
 
 #ifdef NDEBUG
@@ -275,7 +275,7 @@ int mapcache_vcpu_init(struct vcpu *v)
     unsigned int ents = d->max_vcpus * MAPCACHE_VCPU_ENTRIES;
     unsigned int nr = PFN_UP(BITS_TO_LONGS(ents) * sizeof(long));
 
-    if ( is_hvm_vcpu(v) || !dcache->inuse )
+    if ( !is_pv_vcpu(v) || !dcache->inuse )
         return 0;
 
     if ( ents > dcache->entries )

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index e75918a..9531a16 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -800,7 +800,7 @@ long arch_do_domctl(
         if ( domctl->cmd == XEN_DOMCTL_get_ext_vcpucontext )
         {
             evc->size = sizeof(*evc);
-            if ( !is_hvm_domain(d) )
+            if ( is_pv_domain(d) )
             {
                 evc->sysenter_callback_cs =
                     v->arch.pv_vcpu.sysenter_callback_cs;
@@ -833,7 +833,7 @@ long arch_do_domctl(
             ret = -EINVAL;
             if ( evc->size < offsetof(typeof(*evc), vmce) )
                 goto ext_vcpucontext_out;
-            if ( !is_hvm_domain(d) )
+            if ( is_pv_domain(d) )
             {
                 if ( !is_canonical_address(evc->sysenter_callback_eip) ||
                      !is_canonical_address(evc->syscall32_callback_eip) )
@@ -1246,8 +1246,7 @@ void arch_get_info_guest(struct vcpu *v, vcpu_guest_context_u c)
     bool_t compat = is_pv_32on64_domain(v->domain);
 #define c(fld) (!compat ? (c.nat->fld) : (c.cmp->fld))
 
-    if ( is_hvm_vcpu(v) )
-        memset(c.nat, 0, sizeof(*c.nat));
+    memset(c.nat, 0, sizeof(*c.nat));
     memcpy(&c.nat->fpu_ctxt, v->arch.fpu_ctxt, sizeof(c.nat->fpu_ctxt));
     c(flags = v->arch.vgc_flags & ~(VGCF_i387_valid|VGCF_in_kernel));
     if ( v->fpu_initialised )
@@ -1257,7 +1256,7 @@ void arch_get_info_guest(struct vcpu *v, vcpu_guest_context_u c)
     if ( !compat )
     {
         memcpy(&c.nat->user_regs, &v->arch.user_regs, sizeof(c.nat->user_regs));
-        if ( !is_hvm_vcpu(v) )
+        if ( is_pv_vcpu(v) )
             memcpy(c.nat->trap_ctxt, v->arch.pv_vcpu.trap_ctxt,
                    sizeof(c.nat->trap_ctxt));
     }
@@ -1272,7 +1271,7 @@ void arch_get_info_guest(struct vcpu *v, vcpu_guest_context_u c)
     for ( i = 0; i < ARRAY_SIZE(v->arch.debugreg); ++i )
         c(debugreg[i] = v->arch.debugreg[i]);
 
-    if ( is_hvm_vcpu(v) )
+    if ( has_hvm_container_vcpu(v) )
     {
         struct segment_register sreg;

diff --git a/xen/arch/x86/efi/runtime.c b/xen/arch/x86/efi/runtime.c
index 37bb535..d7c884b 100644
--- a/xen/arch/x86/efi/runtime.c
+++ b/xen/arch/x86/efi/runtime.c
@@ -52,7 +52,7 @@ unsigned long efi_rs_enter(void)
     /* prevent fixup_page_fault() from doing anything */
     irq_enter();
 
-    if ( !is_hvm_vcpu(current) && !is_idle_vcpu(current) )
+    if ( is_pv_vcpu(current) && !is_idle_vcpu(current) )
     {
         struct desc_ptr gdt_desc = {
             .limit = LAST_RESERVED_GDT_BYTE,
@@ -71,7 +71,7 @@ unsigned long efi_rs_enter(void)
 void efi_rs_leave(unsigned long cr3)
 {
     write_cr3(cr3);
-    if ( !is_hvm_vcpu(current) && !is_idle_vcpu(current) )
+    if ( is_pv_vcpu(current) && !is_idle_vcpu(current) )
     {
         struct desc_ptr gdt_desc = {
             .limit = LAST_RESERVED_GDT_BYTE,

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 7a01b1f..fa92855 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -650,7 +650,7 @@ void vmx_vmcs_exit(struct vcpu *v)
     {
         /* Don't confuse vmx_do_resume (for @v or @current!) */
         vmx_clear_vmcs(v);
-        if ( is_hvm_vcpu(current) )
+        if ( has_hvm_container_vcpu(current) )
             vmx_load_vmcs(current);
 
         spin_unlock(&v->arch.hvm_vmx.vmcs_lock);
@@ -1479,7 +1479,7 @@ static void vmcs_dump(unsigned char ch)
 
     for_each_domain ( d )
     {
-        if ( !is_hvm_domain(d) )
+        if ( !has_hvm_container_domain(d) )
             continue;
         printk("\n>>> Domain %d <<<\n", d->domain_id);
         for_each_vcpu ( d, v )

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 43aaceb..9621e22 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -181,7 +181,7 @@ static uint32_t base_disallow_mask;
      (rangeset_is_empty((d)->iomem_caps) &&                   \
       rangeset_is_empty((d)->arch.ioport_caps) &&             \
       !has_arch_pdevs(d) &&                                   \
-      !is_hvm_domain(d)) ?                                    \
+      is_pv_domain(d)) ?                                      \
      L1_DISALLOW_MASK : (L1_DISALLOW_MASK & ~PAGE_CACHE_ATTRS))
 
 static void __init init_frametable_chunk(void *start, void *end)
@@ -433,7 +433,7 @@ int page_is_ram_type(unsigned long mfn, unsigned long mem_type)
 
 unsigned long domain_get_maximum_gpfn(struct domain *d)
 {
-    if ( is_hvm_domain(d) )
+    if ( has_hvm_container_domain(d) )
         return p2m_get_hostp2m(d)->max_mapped_pfn;
     /* NB. PV guests specify nr_pfns rather than max_pfn so we adjust here. */
     return (arch_get_max_pfn(d) ?: 1) - 1;
@@ -2381,7 +2381,7 @@ static int __get_page_type(struct page_info *page, unsigned long type,
     {
         /* Special pages should not be accessible from devices. */
         struct domain *d = page_get_owner(page);
-        if ( d && !is_hvm_domain(d) && unlikely(need_iommu(d)) )
+        if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
         {
             if ( (x & PGT_type_mask) == PGT_writable_page )
                 iommu_unmap_page(d, mfn_to_gmfn(d, page_to_mfn(page)));

diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index adffa06..0bfa595 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -367,7 +367,7 @@ const struct x86_emulate_ops *shadow_init_emulation(
     sh_ctxt->ctxt.regs = regs;
     sh_ctxt->ctxt.force_writeback = 0;
 
-    if ( !is_hvm_vcpu(v) )
+    if ( is_pv_vcpu(v) )
     {
         sh_ctxt->ctxt.addr_size = sh_ctxt->ctxt.sp_size = BITS_PER_LONG;
         return &pv_shadow_emulator_ops;
@@ -964,7 +964,7 @@ int sh_unsync(struct vcpu *v, mfn_t gmfn)
     if ( pg->shadow_flags &
          ((SHF_page_type_mask & ~SHF_L1_ANY) | SHF_out_of_sync)
          || sh_page_has_multiple_shadows(pg)
-         || !is_hvm_domain(v->domain)
+         || is_pv_domain(v->domain)
         || !v->domain->arch.paging.shadow.oos_active )
         return 0;
 
@@ -2753,7 +2753,7 @@ static void sh_update_paging_modes(struct vcpu *v)
     if ( v->arch.paging.mode )
         v->arch.paging.mode->shadow.detach_old_tables(v);
 
-    if ( !is_hvm_domain(d) )
+    if ( is_pv_domain(d) )
     {
         ///
         /// PV guest

diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index 3fed0b6..d3fa25c 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -711,8 +711,9 @@ _sh_propagate(struct vcpu *v,
     // PV guests in 64-bit mode use two different page tables for user vs
     // supervisor permissions, making the guest's _PAGE_USER bit irrelevant.
     // It is always shadowed as present...
-    if ( (GUEST_PAGING_LEVELS == 4) && !is_pv_32on64_domain(d)
-         && !is_hvm_domain(d) )
+    if ( (GUEST_PAGING_LEVELS == 4)
+         && !is_pv_32on64_domain(d)
+         && is_pv_domain(d) )
     {
         sflags |= _PAGE_USER;
     }
@@ -3922,7 +3923,7 @@ sh_update_cr3(struct vcpu *v, int do_locking)
 #endif
 
     /* Don't do anything on an uninitialised vcpu */
-    if ( !is_hvm_domain(d) && !v->is_initialised )
+    if ( is_pv_domain(d) && !v->is_initialised )
     {
         ASSERT(v->arch.cr3 == 0);
         return;

diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index 4835ed7..dab6213 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -310,10 +310,10 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
             spin_unlock(&v->domain->event_lock);
             break;
         }
-        if ( !is_hvm_domain(v->domain) &&
+        if ( is_pv_domain(v->domain) &&
             v->domain->arch.pv_domain.auto_unmask )
             evtchn_unmask(pirq->evtchn);
-        if ( !is_hvm_domain(v->domain) ||
+        if ( is_pv_domain(v->domain) ||
             domain_pirq_to_irq(v->domain, eoi.irq) > 0 )
             pirq_guest_eoi(pirq);
         if ( is_hvm_domain(v->domain) &&

diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 77c200b..edb7a6a 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -120,6 +120,7 @@ static void show_guest_stack(struct vcpu *v, struct cpu_user_regs *regs)
     unsigned long *stack, addr;
     unsigned long mask = STACK_SIZE;
 
+    /* Avoid HVM as we don't know what the stack looks like. */
     if ( is_hvm_vcpu(v) )
         return;
 
@@ -557,7 +558,7 @@ static inline void do_trap(
     }
 
     if ( ((trapnr == TRAP_copro_error) || (trapnr == TRAP_simd_error)) &&
-         is_hvm_vcpu(curr) && curr->arch.hvm_vcpu.fpu_exception_callback )
+         has_hvm_container_vcpu(curr) && curr->arch.hvm_vcpu.fpu_exception_callback )
     {
         curr->arch.hvm_vcpu.fpu_exception_callback(
             curr->arch.hvm_vcpu.fpu_exception_callback_arg, regs);
@@ -712,7 +713,7 @@ int cpuid_hypervisor_leaves( uint32_t idx, uint32_t sub_idx,
         *ebx = 0x40000200;
         *ecx = 0;          /* Features 1 */
         *edx = 0;          /* Features 2 */
-        if ( !is_hvm_vcpu(current) )
+        if ( is_pv_vcpu(current) )
             *ecx |= XEN_CPUID_FEAT1_MMU_PT_UPDATE_PRESERVE_AD;
         break;

diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index 2bdbad0..4a3b3f1 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -73,7 +73,7 @@ void *do_page_walk(struct vcpu *v, unsigned long addr)
     l2_pgentry_t l2e, *l2t;
     l1_pgentry_t l1e, *l1t;
 
-    if ( is_hvm_vcpu(v) || !is_canonical_address(addr) )
+    if ( !is_pv_vcpu(v) || !is_canonical_address(addr) )
         return NULL;
 
     l4t = map_domain_page(mfn);

diff --git a/xen/arch/x86/x86_64/traps.c b/xen/arch/x86/x86_64/traps.c
index 0316d7c..f66ab5a 100644
--- a/xen/arch/x86/x86_64/traps.c
+++ b/xen/arch/x86/x86_64/traps.c
@@ -86,7 +86,7 @@ void show_registers(struct cpu_user_regs *regs)
     enum context context;
     struct vcpu *v = current;
 
-    if ( is_hvm_vcpu(v) && guest_mode(regs) )
+    if ( has_hvm_container_vcpu(v) && guest_mode(regs) )
     {
         struct segment_register sreg;
         context = CTXT_hvm_guest;
@@ -147,8 +147,8 @@ void vcpu_show_registers(const struct vcpu *v)
     const struct cpu_user_regs *regs = &v->arch.user_regs;
     unsigned long crs[8];
 
-    /* No need to handle HVM for now. */
-    if ( is_hvm_vcpu(v) )
+    /* Only handle PV guests for now */
+    if ( !is_pv_vcpu(v) )
         return;
 
     crs[0] = v->arch.pv_vcpu.ctrlreg[0];
@@ -624,7 +624,7 @@ static void hypercall_page_initialise_ring3_kernel(void *hypercall_page)
 void hypercall_page_initialise(struct domain *d, void *hypercall_page)
 {
     memset(hypercall_page, 0xCC, PAGE_SIZE);
-    if ( is_hvm_domain(d) )
+    if ( has_hvm_container_domain(d) )
         hvm_hypercall_page_initialise(d, hypercall_page);
     else if ( !is_pv_32bit_domain(d) )
         hypercall_page_initialise_ring3_kernel(hypercall_page);

diff --git a/xen/common/domain.c b/xen/common/domain.c
index 8c9b813..f39b28d 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -238,7 +238,7 @@ struct domain *domain_create(
         goto fail;
 
     if ( domcr_flags & DOMCRF_hvm )
-        d->is_hvm = 1;
+        d->guest_type = guest_type_hvm;
 
     if ( domid == 0 )
     {

diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index 21c6a14..107b000 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -721,7 +721,7 @@ __gnttab_map_grant_ref(
 
     double_gt_lock(lgt, rgt);
 
-    if ( !is_hvm_domain(ld) && need_iommu(ld) )
+    if ( is_pv_domain(ld) && need_iommu(ld) )
     {
         unsigned int wrc, rdc;
         int err = 0;
@@ -931,7 +931,7 @@ __gnttab_unmap_common(
             act->pin -= GNTPIN_hstw_inc;
     }
 
-    if ( !is_hvm_domain(ld) && need_iommu(ld) )
+    if ( is_pv_domain(ld) && need_iommu(ld) )
     {
         unsigned int wrc, rdc;
         int err = 0;

diff --git a/xen/common/kernel.c b/xen/common/kernel.c
index 4ca50c4..97d9050 100644
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -306,7 +306,7 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
             if ( current->domain == dom0 )
                 fi.submap |= 1U << XENFEAT_dom0;
 #ifdef CONFIG_X86
-            if ( !is_hvm_vcpu(current) )
+            if ( is_pv_vcpu(current) )
                 fi.submap |= (1U << XENFEAT_mmu_pt_update_preserve_ad) |
                              (1U << XENFEAT_highmem_assist) |
                              (1U << XENFEAT_gnttab_map_avail_bits);

diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index e42651e..07e21f7 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -16,7 +16,7 @@
 #define is_pv_32on64_domain(d) (is_pv_32bit_domain(d))
 #define is_pv_32on64_vcpu(v)   (is_pv_32on64_domain((v)->domain))
 
-#define is_hvm_pv_evtchn_domain(d) (is_hvm_domain(d) && \
+#define is_hvm_pv_evtchn_domain(d) (has_hvm_container_domain(d) && \
         d->arch.hvm_domain.irq.callback_via_type == HVMIRQ_callback_vector)
 #define is_hvm_pv_evtchn_vcpu(v) (is_hvm_pv_evtchn_domain(v->domain))

diff --git a/xen/include/asm-x86/event.h b/xen/include/asm-x86/event.h
index 7edeb5b..a82062e 100644
--- a/xen/include/asm-x86/event.h
+++ b/xen/include/asm-x86/event.h
@@ -23,7 +23,7 @@ int hvm_local_events_need_delivery(struct vcpu *v);
 static inline int local_events_need_delivery(void)
 {
     struct vcpu *v = current;
-    return (is_hvm_vcpu(v) ? hvm_local_events_need_delivery(v) :
+    return (has_hvm_container_vcpu(v) ? hvm_local_events_need_delivery(v) :
             (vcpu_info(v, evtchn_upcall_pending) &&
              !vcpu_info(v, evtchn_upcall_mask)));
 }

diff --git a/xen/include/asm-x86/guest_access.h b/xen/include/asm-x86/guest_access.h
index ca700c9..88edb3f 100644
--- a/xen/include/asm-x86/guest_access.h
+++ b/xen/include/asm-x86/guest_access.h
@@ -14,27 +14,27 @@
 
 /* Raw access functions: no type checking. */
 #define raw_copy_to_guest(dst, src, len)        \
-    (is_hvm_vcpu(current) ?                     \
+    (has_hvm_container_vcpu(current) ?          \
      copy_to_user_hvm((dst), (src), (len)) :    \
      copy_to_user((dst), (src), (len)))
 #define raw_copy_from_guest(dst, src, len)      \
-    (is_hvm_vcpu(current) ?                     \
+    (has_hvm_container_vcpu(current) ?          \
      copy_from_user_hvm((dst), (src), (len)) :  \
      copy_from_user((dst), (src), (len)))
 #define raw_clear_guest(dst, len)               \
-    (is_hvm_vcpu(current) ?                     \
+    (has_hvm_container_vcpu(current) ?          \
      clear_user_hvm((dst), (len)) :             \
      clear_user((dst), (len)))
 #define __raw_copy_to_guest(dst, src, len)      \
-    (is_hvm_vcpu(current) ?                     \
+    (has_hvm_container_vcpu(current) ?          \
      copy_to_user_hvm((dst), (src), (len)) :    \
     __copy_to_user((dst), (src), (len)))
 #define __raw_copy_from_guest(dst, src, len)    \
-    (is_hvm_vcpu(current) ?                     \
+    (has_hvm_container_vcpu(current) ?          \
      copy_from_user_hvm((dst), (src), (len)) :  \
      __copy_from_user((dst), (src), (len)))
 #define __raw_clear_guest(dst, len)             \
-    (is_hvm_vcpu(current) ?                     \
+    (has_hvm_container_vcpu(current) ?          \
      clear_user_hvm((dst), (len)) :             \
      clear_user((dst), (len)))

diff --git a/xen/include/asm-x86/guest_pt.h b/xen/include/asm-x86/guest_pt.h
index b62bc6a..d2a8250 100644
--- a/xen/include/asm-x86/guest_pt.h
+++ b/xen/include/asm-x86/guest_pt.h
@@ -196,7 +196,7 @@ guest_supports_superpages(struct vcpu *v)
     /* The _PAGE_PSE bit must be honoured in HVM guests, whenever
      * CR4.PSE is set or the guest is in PAE or long mode.
      * It's also used in the dummy PT for vcpus with CR4.PG cleared. */
-    return (!is_hvm_vcpu(v)
+    return (is_pv_vcpu(v)
             ? opt_allow_superpage
             : (GUEST_PAGING_LEVELS != 2
                || !hvm_paging_enabled(v)
@@ -214,7 +214,7 @@ guest_supports_nx(struct vcpu *v)
 {
     if ( GUEST_PAGING_LEVELS == 2 || !cpu_has_nx )
         return 0;
-    if ( !is_hvm_vcpu(v) )
+    if ( is_pv_vcpu(v) )
         return cpu_has_nx;
     return hvm_nx_enabled(v);
 }

diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 2397537..a9dd15f 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -259,6 +259,10 @@ struct mem_event_per_domain
 
 struct evtchn_port_ops;
 
+enum guest_type {
+    guest_type_pv, guest_type_hvm
+};
+
 struct domain
 {
     domid_t          domain_id;
@@ -311,8 +315,8 @@ struct domain
     struct rangeset *iomem_caps;
     struct rangeset *irq_caps;
 
-    /* Is this an HVM guest? */
-    bool_t           is_hvm;
+    enum guest_type guest_type;
+
 #ifdef HAS_PASSTHROUGH
     /* Does this guest need iommu mappings? */
     bool_t           need_iommu;
@@ -772,8 +776,12 @@ void watchdog_domain_destroy(struct domain *d);
 
 #define VM_ASSIST(_d,_t) (test_bit((_t), &(_d)->vm_assist))
 
-#define is_hvm_domain(d) ((d)->is_hvm)
+#define is_pv_domain(d) ((d)->guest_type == guest_type_pv)
+#define is_pv_vcpu(v)   (is_pv_domain((v)->domain))
+#define is_hvm_domain(d) ((d)->guest_type == guest_type_hvm)
 #define is_hvm_vcpu(v)   (is_hvm_domain(v->domain))
+#define has_hvm_container_domain(d) ((d)->guest_type != guest_type_pv)
+#define has_hvm_container_vcpu(v)   (has_hvm_container_domain((v)->domain))
 #define is_pinned_vcpu(v) ((v)->domain->is_pinned || \
                            cpumask_weight((v)->cpu_affinity) == 1)
 #ifdef HAS_PASSTHROUGH

diff --git a/xen/include/xen/tmem_xen.h b/xen/include/xen/tmem_xen.h
index ad1ddd5..9fb7446 100644
--- a/xen/include/xen/tmem_xen.h
+++ b/xen/include/xen/tmem_xen.h
@@ -442,7 +442,7 @@ typedef XEN_GUEST_HANDLE_PARAM(char) tmem_cli_va_param_t;
 static inline int tmh_get_tmemop_from_client(tmem_op_t *op, tmem_cli_op_t uops)
 {
 #ifdef CONFIG_COMPAT
-    if ( is_hvm_vcpu(current) ?
+    if ( has_hvm_container_vcpu(current) ?
          hvm_guest_x86_mode(current) != 8 :
          is_pv_32on64_vcpu(current) )
     {
-- 
1.7.9.5
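As a rule of thumb from the commit message, in code form (both examples
taken from hunks above):

    /* Touching PV-specific state: gate on is_pv_*. */
    if ( is_pv_vcpu(v) )
        xfree(v->arch.pv_vcpu.trap_ctxt);

    /* Touching HVM structures (later shared with PVH): gate on
     * has_hvm_container_*. */
    if ( has_hvm_container_vcpu(v) )
        hvm_vcpu_destroy(v);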
George Dunlap
2013-Nov-11 14:57 UTC
[PATCH v15 06/19] pvh: Introduce PVH guest type

Introduce new PVH guest type, flags to create it, and ways to identify
it.  To begin with, it will inherit functionality marked hvm_container.

Code to actually check for hardware support, in the VMX case, will be
added in future patches.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
v15:
 - Check for hardware support for PVH domains in hvm_domain_initialise
 - Move HAP check to hvm_domain_initialise as well
v13: Changed if/else in getdomaininfo into a switch statement as
     requested.

CC: Jan Beulich <jbeulich@suse.com>
CC: Tim Deegan <tim@xen.org>
CC: Keir Fraser <keir@xen.org>
---
 xen/arch/x86/hvm/hvm.c        |   19 +++++++++++++++++++
 xen/common/domain.c           |    2 ++
 xen/common/domctl.c           |   24 +++++++++++++++++++++---
 xen/include/asm-x86/hvm/hvm.h |    3 +++
 xen/include/public/domctl.h   |    8 +++++++-
 xen/include/xen/sched.h       |   11 ++++++++++-
 6 files changed, 62 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 7aa1e8a..0a9c922 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -145,6 +145,9 @@ static int __init hvm_enable(void)
         printk("\n");
     }
 
+    if ( !fns->pvh_supported )
+        printk("HVM: PVH mode not supported on this platform\n");
+
     /*
      * Allow direct access to the PC debug ports 0x80 and 0xed (they are
      * often used for I/O delays, but the vmexits simply slow things down).
@@ -522,6 +525,22 @@ int hvm_domain_initialise(struct domain *d)
         return -EINVAL;
     }
 
+    if ( is_pvh_domain(d) )
+    {
+        if ( !hvm_funcs.pvh_supported )
+        {
+            gdprintk(XENLOG_WARNING, "Attempt to create a PVH guest "
+                     "on a system without necessary hardware support.\n");
+            return -EINVAL;
+        }
+        if ( !hap_enabled(d) )
+        {
+            printk(XENLOG_INFO "PVH guest must have HAP on\n");
+            return -EINVAL;
+        }
+
+    }
+
     spin_lock_init(&d->arch.hvm_domain.irq_lock);
     spin_lock_init(&d->arch.hvm_domain.uc_lock);

diff --git a/xen/common/domain.c b/xen/common/domain.c
index f39b28d..52b7cea 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -239,6 +239,8 @@ struct domain *domain_create(
 
     if ( domcr_flags & DOMCRF_hvm )
         d->guest_type = guest_type_hvm;
+    else if ( domcr_flags & DOMCRF_pvh )
+        d->guest_type = guest_type_pvh;
 
     if ( domid == 0 )
     {

diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 870eef1..552669d 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -185,8 +185,17 @@ void getdomaininfo(struct domain *d, struct xen_domctl_getdomaininfo *info)
         (d->debugger_attached ? XEN_DOMINF_debugged : 0) |
         d->shutdown_code << XEN_DOMINF_shutdownshift;
 
-    if ( is_hvm_domain(d) )
+    switch(d->guest_type)
+    {
+    case guest_type_hvm:
         info->flags |= XEN_DOMINF_hvm_guest;
+        break;
+    case guest_type_pvh:
+        info->flags |= XEN_DOMINF_pvh_guest;
+        break;
+    default:
+        break;
+    }
 
     xsm_security_domaininfo(d, info);
@@ -412,8 +421,11 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
         ret = -EINVAL;
         if ( supervisor_mode_kernel ||
              (op->u.createdomain.flags &
-             ~(XEN_DOMCTL_CDF_hvm_guest | XEN_DOMCTL_CDF_hap |
-               XEN_DOMCTL_CDF_s3_integrity | XEN_DOMCTL_CDF_oos_off)) )
+             ~(XEN_DOMCTL_CDF_hvm_guest
+               | XEN_DOMCTL_CDF_pvh_guest
+               | XEN_DOMCTL_CDF_hap
+               | XEN_DOMCTL_CDF_s3_integrity
+               | XEN_DOMCTL_CDF_oos_off)) )
             break;
 
         dom = op->domain;
@@ -440,9 +452,15 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
             rover = dom;
         }
 
+        if ( (op->u.createdomain.flags & XEN_DOMCTL_CDF_hvm_guest)
+             && (op->u.createdomain.flags & XEN_DOMCTL_CDF_pvh_guest) )
+            return -EINVAL;
+
         domcr_flags = 0;
         if ( op->u.createdomain.flags & XEN_DOMCTL_CDF_hvm_guest )
             domcr_flags |= DOMCRF_hvm;
+        if ( op->u.createdomain.flags & XEN_DOMCTL_CDF_pvh_guest )
+            domcr_flags |= DOMCRF_pvh;
         if ( op->u.createdomain.flags & XEN_DOMCTL_CDF_hap )
             domcr_flags |= DOMCRF_hap;
         if ( op->u.createdomain.flags & XEN_DOMCTL_CDF_s3_integrity )

diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index c9afb56..ccca5df 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -90,6 +90,9 @@ struct hvm_function_table {
     /* Support Hardware-Assisted Paging? */
     int hap_supported;
 
+    /* Necessary hardware support for PVH mode? */
+    int pvh_supported;
+
     /* Indicate HAP capabilities. */
     int hap_capabilities;

diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index d4e479f..c79dabe 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -47,7 +47,7 @@ struct xen_domctl_createdomain {
     /* IN parameters */
     uint32_t ssidref;
     xen_domain_handle_t handle;
- /* Is this an HVM guest (as opposed to a PV guest)? */
+ /* Is this an HVM guest (as opposed to a PVH or PV guest)? */
 #define _XEN_DOMCTL_CDF_hvm_guest     0
 #define XEN_DOMCTL_CDF_hvm_guest      (1U<<_XEN_DOMCTL_CDF_hvm_guest)
  /* Use hardware-assisted paging if available? */
@@ -59,6 +59,9 @@ struct xen_domctl_createdomain {
  /* Disable out-of-sync shadow page tables? */
 #define _XEN_DOMCTL_CDF_oos_off       3
 #define XEN_DOMCTL_CDF_oos_off        (1U<<_XEN_DOMCTL_CDF_oos_off)
+ /* Is this a PVH guest (as opposed to an HVM or PV guest)? */
+#define _XEN_DOMCTL_CDF_pvh_guest     4
+#define XEN_DOMCTL_CDF_pvh_guest      (1U<<_XEN_DOMCTL_CDF_pvh_guest)
     uint32_t flags;
 };
 typedef struct xen_domctl_createdomain xen_domctl_createdomain_t;
@@ -89,6 +92,9 @@ struct xen_domctl_getdomaininfo {
 /* Being debugged.  */
 #define _XEN_DOMINF_debugged  6
 #define XEN_DOMINF_debugged   (1U<<_XEN_DOMINF_debugged)
+/* domain is PVH */
+#define _XEN_DOMINF_pvh_guest 7
+#define XEN_DOMINF_pvh_guest  (1U<<_XEN_DOMINF_pvh_guest)
 /* XEN_DOMINF_shutdown guest-supplied code.  */
 #define XEN_DOMINF_shutdownmask 255
 #define XEN_DOMINF_shutdownshift 16

diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index a9dd15f..6d78bf9 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -259,8 +259,12 @@ struct mem_event_per_domain
 
 struct evtchn_port_ops;
 
+/*
+ * PVH is a PV guest running in an HVM container.  is_hvm_* checks
+ * will be false, but has_hvm_container_* checks will be true.
+ */
 enum guest_type {
-    guest_type_pv, guest_type_hvm
+    guest_type_pv, guest_type_pvh, guest_type_hvm
 };
 
 struct domain
@@ -500,6 +504,9 @@ struct domain *domain_create(
  /* DOMCRF_oos_off: dont use out-of-sync optimization for shadow page tables */
 #define _DOMCRF_oos_off         4
 #define DOMCRF_oos_off          (1U<<_DOMCRF_oos_off)
+ /* DOMCRF_pvh: Create PV domain in HVM container. */
+#define _DOMCRF_pvh             5
+#define DOMCRF_pvh              (1U<<_DOMCRF_pvh)
 
 /*
  * rcu_lock_domain_by_id() is more efficient than get_domain_by_id().
@@ -778,6 +785,8 @@ void watchdog_domain_destroy(struct domain *d);
 
 #define is_pv_domain(d) ((d)->guest_type == guest_type_pv)
 #define is_pv_vcpu(v)   (is_pv_domain((v)->domain))
+#define is_pvh_domain(d) ((d)->guest_type == guest_type_pvh)
+#define is_pvh_vcpu(v)   (is_pvh_domain((v)->domain))
 #define is_hvm_domain(d) ((d)->guest_type == guest_type_hvm)
 #define is_hvm_vcpu(v)   (is_hvm_domain(v->domain))
 #define has_hvm_container_domain(d) ((d)->guest_type != guest_type_pv)
-- 
1.7.9.5
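With the PVH type in place, the predicates relate as follows (a summary
derived from the macros above; 1 = true, 0 = false):

    /*
     *                        PV   PVH  HVM
     * is_pv_*                1    0    0
     * is_pvh_*               0    1    0
     * is_hvm_*               0    0    1
     * has_hvm_container_*    0    1    1
     */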
George Dunlap
2013-Nov-11 14:57 UTC
[PATCH v15 07/19] pvh: Disable unneeded features of HVM containers
Things kept: * cacheattr_region lists * irq-related structures * paging * tm_list * hvm params * hvm_domaim.io_handler (for handling PV io) Things disabled for now: * compat xlation Things disabled: * Emulated timers and clock sources * IO/MMIO ioreq pages, event channels * msix tables * hvm_funcs * nested HVM * Fast-path for emulated lapic accesses Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> --- v15: - Fix typo in comment - Allocate io_handler (used later when using HVM io path for PV io) - Port hvm_do_resume to "hvm no ioreq" patch v14: - Also free the params struct for pvh domains, since we''ve allocated it - Fail io for pvh VMs further down the stack, as we will be using the emulation code before calling into the pv pio handlers v13: - Removed unnecessary comment - Allocate params for pvh domains; remove null checks necessary in last patch - Add ASSERT(!is_pvh) to handle_pio CC: Jan Beulich <jbeulich@suse.com> CC: Tim Deegan <tim@xen.org> CC: Keir Fraser <keir@xen.org> CC: Jun Nakajima <jun.nakajima@intel.com> CC: Eddie Dong <eddie.dong@intel.com> --- xen/arch/x86/hvm/hvm.c | 43 ++++++++++++++++++++++++++++++++++++------- xen/arch/x86/hvm/irq.c | 3 +++ xen/arch/x86/hvm/vmx/intr.c | 3 ++- 3 files changed, 41 insertions(+), 8 deletions(-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 0a9c922..10ddc34 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -304,6 +304,10 @@ u64 hvm_get_guest_tsc_adjust(struct vcpu *v) void hvm_migrate_timers(struct vcpu *v) { + /* PVH doesn''t use rtc and emulated timers, it uses pvclock mechanism. */ + if ( is_pvh_vcpu(v) ) + return; + rtc_migrate_timers(v); pt_migrate(v); } @@ -345,10 +349,11 @@ void hvm_do_resume(struct vcpu *v) { ioreq_t *p; - pt_restore_timer(v); - check_wakeup_from_wait(); + if ( is_hvm_vcpu(v) ) + pt_restore_timer(v); + /* NB. Optimised for common case (p->state == STATE_IOREQ_NONE). 
*/ if ( !(p = get_ioreq(v)) ) goto check_inject_trap; @@ -560,6 +565,9 @@ int hvm_domain_initialise(struct domain *d) goto fail1; d->arch.hvm_domain.io_handler->num_slot = 0; + if ( is_pvh_domain(d) ) + return 0; + hvm_init_guest_time(d); d->arch.hvm_domain.params[HVM_PARAM_HPET_ENABLED] = 1; @@ -600,6 +608,12 @@ int hvm_domain_initialise(struct domain *d) void hvm_domain_relinquish_resources(struct domain *d) { + xfree(d->arch.hvm_domain.io_handler); + xfree(d->arch.hvm_domain.params); + + if ( is_pvh_domain(d) ) + return; + if ( hvm_funcs.nhvm_domain_relinquish_resources ) hvm_funcs.nhvm_domain_relinquish_resources(d); @@ -616,14 +630,15 @@ void hvm_domain_relinquish_resources(struct domain *d) pmtimer_deinit(d); hpet_deinit(d); } - - xfree(d->arch.hvm_domain.io_handler); - xfree(d->arch.hvm_domain.params); } void hvm_domain_destroy(struct domain *d) { hvm_destroy_cacheattr_region_list(d); + + if ( is_pvh_domain(d) ) + return; + hvm_funcs.domain_destroy(d); rtc_deinit(d); stdvga_deinit(d); @@ -1125,7 +1140,9 @@ int hvm_vcpu_initialise(struct vcpu *v) goto fail1; /* NB: vlapic_init must be called before hvm_funcs.vcpu_initialise */ - if ( (rc = vlapic_init(v)) != 0 ) /* teardown: vlapic_destroy */ + if ( is_hvm_vcpu(v) ) + rc = vlapic_init(v); + if ( rc != 0 ) /* teardown: vlapic_destroy */ goto fail2; if ( (rc = hvm_funcs.vcpu_initialise(v)) != 0 ) /* teardown: hvm_funcs.vcpu_destroy */ @@ -1140,6 +1157,14 @@ int hvm_vcpu_initialise(struct vcpu *v) v->arch.hvm_vcpu.inject_trap.vector = -1; + if ( is_pvh_vcpu(v) ) + { + v->arch.hvm_vcpu.hcall_64bit = 1; /* PVH 32bitfixme. */ + /* This is for hvm_long_mode_enabled(v). */ + v->arch.hvm_vcpu.guest_efer = EFER_SCE | EFER_LMA | EFER_LME; + return 0; + } + rc = setup_compat_arg_xlat(v); /* teardown: free_compat_arg_xlat() */ if ( rc != 0 ) goto fail4; @@ -1211,7 +1236,10 @@ void hvm_vcpu_destroy(struct vcpu *v) tasklet_kill(&v->arch.hvm_vcpu.assert_evtchn_irq_tasklet); hvm_vcpu_cacheattr_destroy(v); - vlapic_destroy(v); + + if ( is_hvm_vcpu(v) ) + vlapic_destroy(v); + hvm_funcs.vcpu_destroy(v); /* Event channel is already freed by evtchn_destroy(). */ @@ -1414,6 +1442,7 @@ int hvm_hap_nested_page_fault(paddr_t gpa, /* For the benefit of 32-bit WinXP (& older Windows) on AMD CPUs, * a fast path for LAPIC accesses, skipping the p2m lookup. */ if ( !nestedhvm_vcpu_in_guestmode(v) + && is_hvm_vcpu(v) && gfn == PFN_DOWN(vlapic_base_address(vcpu_vlapic(v))) ) { if ( !handle_mmio() ) diff --git a/xen/arch/x86/hvm/irq.c b/xen/arch/x86/hvm/irq.c index 6a6fb68..677fbcd 100644 --- a/xen/arch/x86/hvm/irq.c +++ b/xen/arch/x86/hvm/irq.c @@ -405,6 +405,9 @@ struct hvm_intack hvm_vcpu_has_pending_irq(struct vcpu *v) && vcpu_info(v, evtchn_upcall_pending) ) return hvm_intack_vector(plat->irq.callback_via.vector); + if ( is_pvh_vcpu(v) ) + return hvm_intack_none; + if ( vlapic_accept_pic_intr(v) && plat->vpic[0].int_output ) return hvm_intack_pic(0); diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c index 1942e31..7757910 100644 --- a/xen/arch/x86/hvm/vmx/intr.c +++ b/xen/arch/x86/hvm/vmx/intr.c @@ -236,7 +236,8 @@ void vmx_intr_assist(void) } /* Crank the handle on interrupt state. */ - pt_vector = pt_update_irq(v); + if ( is_hvm_vcpu(v) ) + pt_vector = pt_update_irq(v); do { unsigned long intr_info; -- 1.7.9.5
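The hvm.c changes above all follow one shape: set up the state shared by every HVM container first, then bail out before the emulated-device work. A schematic sketch of the pattern -- both the function and the helper names here are invented for illustration:

    /* Schematic of the gating pattern used in hvm_domain_initialise()
     * and friends; the container_setup_*() helpers are hypothetical. */
    int container_initialise(struct domain *d)
    {
        container_setup_shared_state(d);    /* params, io_handler, ... */
        if ( is_pvh_domain(d) )
            return 0;                       /* no timers/vlapic/ioreq */
        container_setup_emulated_devices(d);
        return 0;
    }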
Changes: * Enforce HAP mode for now * Disable exits related to virtual interrupts or emulated APICs * Disable changing paging mode - "unrestricted guest" (i.e., real mode for EPT) disabled - write guest EFER disabled * Start in 64-bit mode * Paging mode update to happen in arch_set_info_guest Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> --- v15: - Move check for VMX-related bits required for PVH mode to start_vmx - NB this removes the check for the relevant cr4 bits - Remove check here for tsc mode - Don't special-case SHADOW_GS_BASE exiting; fix this HVM-wide in another patch - Don't special-case WBINVD exiting (i.e., keep it enabled) v14: - Mask out bits of cr4 that the guest is not allowed to set v13: - Fix up default cr0 settings - Get rid of some unnecessary PVH-related changes - Return EOPNOTSUPP instead of ENOSYS if hardware features are not present - Remove an unnecessary variable from pvh_check_requirements CC: Jan Beulich <jbeulich@suse.com> CC: Tim Deegan <tim@xen.org> CC: Keir Fraser <keir@xen.org> CC: Jun Nakajima <jun.nakajima@intel.com> CC: Eddie Dong <eddie.dong@intel.com> --- xen/arch/x86/hvm/vmx/vmcs.c | 55 +++++++++++++++++++++++++++++++++++++++---- xen/arch/x86/hvm/vmx/vmx.c | 6 +++++ 2 files changed, 57 insertions(+), 4 deletions(-) diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c index fa92855..bda97f6 100644 --- a/xen/arch/x86/hvm/vmx/vmcs.c +++ b/xen/arch/x86/hvm/vmx/vmcs.c @@ -28,6 +28,7 @@ #include <asm/msr.h> #include <asm/xstate.h> #include <asm/hvm/hvm.h> +#include <asm/hvm/nestedhvm.h> #include <asm/hvm/io.h> #include <asm/hvm/support.h> #include <asm/hvm/vmx/vmx.h> @@ -887,7 +888,30 @@ static int construct_vmcs(struct vcpu *v) /* Do not enable Monitor Trap Flag unless start single step debug */ v->arch.hvm_vmx.exec_control &= ~CPU_BASED_MONITOR_TRAP_FLAG; + if ( is_pvh_domain(d) ) + { + /* Disable virtual apics, TPR */ + v->arch.hvm_vmx.secondary_exec_control &= + ~(SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES + | SECONDARY_EXEC_APIC_REGISTER_VIRT + | SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY); + v->arch.hvm_vmx.exec_control &= ~CPU_BASED_TPR_SHADOW; + + /* Unrestricted guest (real mode for EPT) */ + v->arch.hvm_vmx.secondary_exec_control &= + ~SECONDARY_EXEC_UNRESTRICTED_GUEST; + + /* Start in 64-bit mode. + * PVH 32bitfixme. */ + vmentry_ctl |= VM_ENTRY_IA32E_MODE; /* GUEST_EFER.LME/LMA ignored */ + + ASSERT(v->arch.hvm_vmx.exec_control & CPU_BASED_ACTIVATE_SECONDARY_CONTROLS); + ASSERT(v->arch.hvm_vmx.exec_control & CPU_BASED_ACTIVATE_MSR_BITMAP); + ASSERT(!(v->arch.hvm_vmx.exec_control & CPU_BASED_RDTSC_EXITING)); + } + vmx_update_cpu_exec_control(v); + __vmwrite(VM_EXIT_CONTROLS, vmexit_ctl); __vmwrite(VM_ENTRY_CONTROLS, vmentry_ctl); @@ -1011,7 +1035,11 @@ static int construct_vmcs(struct vcpu *v) __vmwrite(GUEST_DS_AR_BYTES, 0xc093); __vmwrite(GUEST_FS_AR_BYTES, 0xc093); __vmwrite(GUEST_GS_AR_BYTES, 0xc093); - __vmwrite(GUEST_CS_AR_BYTES, 0xc09b); /* exec/read, accessed */ + if ( is_pvh_domain(d) ) + /* CS.L == 1, exec, read/write, accessed. PVH 32bitfixme. */ + __vmwrite(GUEST_CS_AR_BYTES, 0xa09b); + else + __vmwrite(GUEST_CS_AR_BYTES, 0xc09b); /* exec/read, accessed */ /* Guest IDT. */ __vmwrite(GUEST_IDTR_BASE, 0); @@ -1041,10 +1069,24 @@ static int construct_vmcs(struct vcpu *v) | (1U << TRAP_no_device); vmx_update_exception_bitmap(v); + /* In HVM domains, this happens on the realmode->paging + * transition.
Since PVH never goes through this transition, we + * need to do it at start-of-day. */ + if ( is_pvh_domain(d) ) + vmx_update_debug_state(v); + v->arch.hvm_vcpu.guest_cr[0] = X86_CR0_PE | X86_CR0_ET; + + /* PVH domains always start in paging mode */ + if ( is_pvh_domain(d) ) + v->arch.hvm_vcpu.guest_cr[0] |= X86_CR0_PG | X86_CR0_NE | X86_CR0_WP; + hvm_update_guest_cr(v, 0); - v->arch.hvm_vcpu.guest_cr[4] = 0; + v->arch.hvm_vcpu.guest_cr[4] = is_pvh_domain(d) ? + (real_cr4_to_pv_guest_cr4(mmu_cr4_features) + & ~HVM_CR4_GUEST_RESERVED_BITS(v)) + : 0; hvm_update_guest_cr(v, 4); if ( cpu_has_vmx_tpr_shadow ) @@ -1076,9 +1118,14 @@ static int construct_vmcs(struct vcpu *v) vmx_vmcs_exit(v); - paging_update_paging_modes(v); /* will update HOST & GUEST_CR3 as reqd */ + /* PVH: paging mode is updated by arch_set_info_guest(). */ + if ( is_hvm_vcpu(v) ) + { + /* will update HOST & GUEST_CR3 as reqd */ + paging_update_paging_modes(v); - vmx_vlapic_msr_changed(v); + vmx_vlapic_msr_changed(v); + } return 0; } diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index f4e5bcb..8a60680 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -1670,6 +1670,12 @@ const struct hvm_function_table * __init start_vmx(void) vmx_function_table.sync_pir_to_irr = NULL; } + if ( cpu_has_vmx_ept + && cpu_has_vmx_pat + && cpu_has_vmx_msr_bitmap + && cpu_has_vmx_secondary_exec_control ) + vmx_function_table.pvh_supported = 1; + setup_vmcs_dump(); return &vmx_function_table; -- 1.7.9.5
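For reference, the pvh_supported flag set in start_vmx() above pairs with the field added to struct hvm_function_table earlier in the series. A sketch of the kind of check a domain-creation caller would make -- the exact call site lives elsewhere in the series, and per the v13 note the error code is EOPNOTSUPP:

    /* Sketch: refuse a PVH domain when the hardware lacks the VMX
     * features required above (EPT, PAT, MSR bitmaps, secondary
     * controls). */
    if ( (domcr_flags & DOMCRF_pvh) && !hvm_funcs.pvh_supported )
        return -EOPNOTSUPP;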
George Dunlap
2013-Nov-11 14:57 UTC
[PATCH v15 09/19] pvh: Do not allow PVH guests to change paging modes
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org> --- v15: Fix up whitespace issue v13: Removed unnecessary else. CC: Jan Beulich <jbeulich@suse.com> CC: Tim Deegan <tim@xen.org> CC: Keir Fraser <keir@xen.org> --- xen/arch/x86/hvm/hvm.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 10ddc34..6992106 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -1818,6 +1818,16 @@ int hvm_set_cr0(unsigned long value) (value & (X86_CR0_PE | X86_CR0_PG)) == X86_CR0_PG ) goto gpf; + /* A pvh is not expected to change to real mode. */ + if ( is_pvh_vcpu(v) + && (value & (X86_CR0_PE | X86_CR0_PG)) != (X86_CR0_PG | X86_CR0_PE) ) + { + printk(XENLOG_G_WARNING + "PVH attempting to turn off PE/PG. CR0:%lx\n", value); + goto gpf; + } + + if ( (value & X86_CR0_PG) && !(old_value & X86_CR0_PG) ) { if ( v->arch.hvm_vcpu.guest_efer & EFER_LME ) -- 1.7.9.5
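Concretely, the check above means a PVH kernel can no longer drop out of paged protected mode. A guest-side sketch of the now-forbidden operation (gcc inline assembly; X86_CR0_PG as in the usual x86 headers):

    /* Inside a PVH guest: clearing CR0.PG now raises #GP via the gpf
     * path above instead of switching modes. Sketch only. */
    unsigned long cr0;
    asm volatile ( "mov %%cr0, %0" : "=r" (cr0) );
    cr0 &= ~X86_CR0_PG;
    asm volatile ( "mov %0, %%cr0" : : "r" (cr0) );  /* faults */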
Hypercalls where we now have unrestricted access: * memory_op * console_io * vcpu_op * mmuext_op We also restrict PVH domain access to HVMOP_*_param to reading and writing HVM_PARAM_CALLBACK_IRQ. Most hvm_op functions require "is_hvm_domain()" and will default to -EINVAL; exceptions are HVMOP_get_time and HVMOP_xentrace. Finally, we restrict setting IOPL permissions for a PVH domain. Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Acked-by: Tim Deegan <tim@xen.org> --- v15: - Remove unnecessary comment v14: - Get rid of (now) spurious null check for hvm_domain.params v13: - Minor code tweaks, as suggested during review - return -ENOSYS for set_iopl and set_iobitmap calls - Allow HVMOP_set_param for HVM_PARAM_CALLBACK_IRQ. We still don't allow other values to be written. CC: Jan Beulich <jbeulich@suse.com> CC: Tim Deegan <tim@xen.org> CC: Keir Fraser <keir@xen.org> --- xen/arch/x86/hvm/hvm.c | 44 ++++++++++++++++++++++++++++++++++++-------- xen/arch/x86/physdev.c | 10 ++++++++++ xen/common/kernel.c | 13 +++++++++++-- 3 files changed, 57 insertions(+), 10 deletions(-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 6992106..3f245ba 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -3414,6 +3414,24 @@ static hvm_hypercall_t *const hvm_hypercall32_table[NR_hypercalls] = { HYPERCALL(tmem_op) }; +/* PVH 32bitfixme. */ +static hvm_hypercall_t *const pvh_hypercall64_table[NR_hypercalls] = { + HYPERCALL(platform_op), + HYPERCALL(memory_op), + HYPERCALL(xen_version), + HYPERCALL(console_io), + [ __HYPERVISOR_grant_table_op ] = (hvm_hypercall_t *)hvm_grant_table_op, + HYPERCALL(vcpu_op), + HYPERCALL(mmuext_op), + HYPERCALL(xsm_op), + HYPERCALL(sched_op), + HYPERCALL(event_channel_op), + [ __HYPERVISOR_physdev_op ] = (hvm_hypercall_t *)hvm_physdev_op, + HYPERCALL(hvm_op), + HYPERCALL(sysctl), + HYPERCALL(domctl) +}; + int hvm_do_hypercall(struct cpu_user_regs *regs) { struct vcpu *curr = current; @@ -3440,7 +3458,9 @@ int hvm_do_hypercall(struct cpu_user_regs *regs) if ( (eax & 0x80000000) && is_viridian_domain(curr->domain) ) return viridian_hypercall(regs); - if ( (eax >= NR_hypercalls) || !hvm_hypercall32_table[eax] ) + if ( (eax >= NR_hypercalls) || + (is_pvh_vcpu(curr) ? !pvh_hypercall64_table[eax] + : !hvm_hypercall32_table[eax]) ) { regs->eax = -ENOSYS; return HVM_HCALL_completed; @@ -3455,16 +3475,20 @@ int hvm_do_hypercall(struct cpu_user_regs *regs) regs->r10, regs->r8, regs->r9); curr->arch.hvm_vcpu.hcall_64bit = 1; - regs->rax = hvm_hypercall64_table[eax](regs->rdi, - regs->rsi, - regs->rdx, - regs->r10, - regs->r8, - regs->r9); + if ( is_pvh_vcpu(curr) ) + regs->rax = pvh_hypercall64_table[eax](regs->rdi, regs->rsi, + regs->rdx, regs->r10, + regs->r8, regs->r9); + else + regs->rax = hvm_hypercall64_table[eax](regs->rdi, regs->rsi, + regs->rdx, regs->r10, + regs->r8, regs->r9); curr->arch.hvm_vcpu.hcall_64bit = 0; } else { + ASSERT(!is_pvh_vcpu(curr)); /* PVH 32bitfixme.
*/ + HVM_DBG_LOG(DBG_LEVEL_HCALL, "hcall%u(%x, %x, %x, %x, %x, %x)", eax, (uint32_t)regs->ebx, (uint32_t)regs->ecx, (uint32_t)regs->edx, (uint32_t)regs->esi, @@ -3889,7 +3913,11 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg) return -ESRCH; rc = -EINVAL; - if ( !is_hvm_domain(d) ) + if ( !has_hvm_container_domain(d) ) + goto param_fail; + + if ( is_pvh_domain(d) + && (a.index != HVM_PARAM_CALLBACK_IRQ) ) goto param_fail; rc = xsm_hvm_param(XSM_TARGET, d, op); diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c index dab6213..7d787dc 100644 --- a/xen/arch/x86/physdev.c +++ b/xen/arch/x86/physdev.c @@ -519,6 +519,11 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) case PHYSDEVOP_set_iopl: { struct physdev_set_iopl set_iopl; + + ret = -ENOSYS; + if ( is_pvh_vcpu(current) ) + break; + ret = -EFAULT; if ( copy_from_guest(&set_iopl, arg, 1) != 0 ) break; @@ -532,6 +537,11 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) case PHYSDEVOP_set_iobitmap: { struct physdev_set_iobitmap set_iobitmap; + + ret = -ENOSYS; + if ( is_pvh_vcpu(current) ) + break; + ret = -EFAULT; if ( copy_from_guest(&set_iobitmap, arg, 1) != 0 ) break; diff --git a/xen/common/kernel.c b/xen/common/kernel.c index 97d9050..cc1f743 100644 --- a/xen/common/kernel.c +++ b/xen/common/kernel.c @@ -306,14 +306,23 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) if ( current->domain == dom0 ) fi.submap |= 1U << XENFEAT_dom0; #ifdef CONFIG_X86 - if ( is_pv_vcpu(current) ) + switch(d->guest_type) { + case guest_type_pv: fi.submap |= (1U << XENFEAT_mmu_pt_update_preserve_ad) | (1U << XENFEAT_highmem_assist) | (1U << XENFEAT_gnttab_map_avail_bits); - else + break; + case guest_type_pvh: + fi.submap |= (1U << XENFEAT_hvm_safe_pvclock) | + (1U << XENFEAT_supervisor_mode_kernel) | + (1U << XENFEAT_hvm_callback_vector); + break; + case guest_type_hvm: fi.submap |= (1U << XENFEAT_hvm_safe_pvclock) | (1U << XENFEAT_hvm_callback_vector) | (1U << XENFEAT_hvm_pirqs); + break; + } #endif break; default: -- 1.7.9.5
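Stripped of the surrounding checks, the dispatch change in hvm_do_hypercall() reduces to selecting a table per vcpu type; condensed from the hunk above:

    /* Condensed from the patch: PVH vcpus index the 64-bit PVH table,
     * plain HVM keeps its existing tables. */
    hvm_hypercall_t *fn = is_pvh_vcpu(curr) ? pvh_hypercall64_table[eax]
                                            : hvm_hypercall64_table[eax];
    regs->rax = fn(regs->rdi, regs->rsi, regs->rdx,
                   regs->r10, regs->r8, regs->r9);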
Allow PV e820 map to be set and read from a PVH domain. This requires moving the pv e820 struct out from the pv-specific domain struct and into the arch domain struct. Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- CC: Jan Beulich <jbeulich@suse.com> CC: Tim Deegan <tim@xen.org> CC: Keir Fraser <keir@xen.org> --- xen/arch/x86/domain.c | 9 +++------ xen/arch/x86/mm.c | 26 ++++++++++++-------------- xen/include/asm-x86/domain.h | 10 +++++----- 3 files changed, 20 insertions(+), 25 deletions(-) diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index 358616c..8c2a57f 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -553,6 +553,7 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags) if ( (rc = iommu_domain_init(d)) != 0 ) goto fail; } + spin_lock_init(&d->arch.e820_lock); if ( has_hvm_container_domain(d) ) { @@ -563,13 +564,9 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags) } } else - { /* 64-bit PV guest by default. */ d->arch.is_32bit_pv = d->arch.has_32bit_shinfo = 0; - spin_lock_init(&d->arch.pv_domain.e820_lock); - } - /* initialize default tsc behavior in case tools don't */ tsc_set_info(d, TSC_MODE_DEFAULT, 0UL, 0, 0); spin_lock_init(&d->arch.vtsc_lock); @@ -592,8 +589,8 @@ void arch_domain_destroy(struct domain *d) { if ( has_hvm_container_domain(d) ) hvm_domain_destroy(d); - else - xfree(d->arch.pv_domain.e820); + + xfree(d->arch.e820); free_domain_pirqs(d); if ( !is_idle_domain(d) ) diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c index 9621e22..6c26026 100644 --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -4759,11 +4759,11 @@ long arch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg) return -EFAULT; } - spin_lock(&d->arch.pv_domain.e820_lock); - xfree(d->arch.pv_domain.e820); - d->arch.pv_domain.e820 = e820; - d->arch.pv_domain.nr_e820 = fmap.map.nr_entries; - spin_unlock(&d->arch.pv_domain.e820_lock); + spin_lock(&d->arch.e820_lock); + xfree(d->arch.e820); + d->arch.e820 = e820; + d->arch.nr_e820 = fmap.map.nr_entries; + spin_unlock(&d->arch.e820_lock); rcu_unlock_domain(d); return rc; @@ -4777,26 +4777,24 @@ long arch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg) if ( copy_from_guest(&map, arg, 1) ) return -EFAULT; - spin_lock(&d->arch.pv_domain.e820_lock); + spin_lock(&d->arch.e820_lock); /* Backwards compatibility. */ - if ( (d->arch.pv_domain.nr_e820 == 0) || - (d->arch.pv_domain.e820 == NULL) ) + if ( (d->arch.nr_e820 == 0) || (d->arch.e820 == NULL) ) { - spin_unlock(&d->arch.pv_domain.e820_lock); + spin_unlock(&d->arch.e820_lock); return -ENOSYS; } - map.nr_entries = min(map.nr_entries, d->arch.pv_domain.nr_e820); - if ( copy_to_guest(map.buffer, d->arch.pv_domain.e820, - map.nr_entries) || + map.nr_entries = min(map.nr_entries, d->arch.nr_e820); + if ( copy_to_guest(map.buffer, d->arch.e820, map.nr_entries) || __copy_to_guest(arg, &map, 1) ) { - spin_unlock(&d->arch.pv_domain.e820_lock); + spin_unlock(&d->arch.e820_lock); return -EFAULT; } - spin_unlock(&d->arch.pv_domain.e820_lock); + spin_unlock(&d->arch.e820_lock); return 0; } diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h index 07e21f7..67c3d4b 100644 --- a/xen/include/asm-x86/domain.h +++ b/xen/include/asm-x86/domain.h @@ -234,11 +234,6 @@ struct pv_domain /* map_domain_page() mapping cache.
*/ struct mapcache_domain mapcache; - - /* Pseudophysical e820 map (XENMEM_memory_map). */ - spinlock_t e820_lock; - struct e820entry *e820; - unsigned int nr_e820; }; struct arch_domain @@ -313,6 +308,11 @@ struct arch_domain (possibly other cases in the future */ uint64_t vtsc_kerncount; /* for hvm, counts all vtsc */ uint64_t vtsc_usercount; /* not used for hvm */ + + /* Pseudophysical e820 map (XENMEM_memory_map). */ + spinlock_t e820_lock; + struct e820entry *e820; + unsigned int nr_e820; } __cacheline_aligned; #define has_arch_pdevs(d) (!list_empty(&(d)->arch.pdev_list)) -- 1.7.9.5
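With the e820 state moved into struct arch_domain, the guest-visible interface is unchanged: a PVH kernel retrieves its pseudophysical map exactly as a PV kernel does. An illustrative guest-side sketch, assuming the usual guest interface headers (E820MAX and struct e820entry as in the Linux/Xen headers; the buffer name is invented):

    /* Guest-side sketch: read the pseudophysical e820 map. */
    static struct e820entry e820_buf[E820MAX];
    struct xen_memory_map map = { .nr_entries = E820MAX };
    set_xen_guest_handle(map.buffer, e820_buf);
    if ( HYPERVISOR_memory_op(XENMEM_memory_map, &map) < 0 )
        ;  /* -ENOSYS: no map was set by the toolstack */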
George Dunlap
2013-Nov-11 14:57 UTC
[PATCH v15 12/19] pvh: Set up more PV stuff in set_info_guest
Allow the guest to set up a few more things when bringing up a vcpu. This includes cr3 and gs_base. Also set up wallclock, and only initialize a vcpu once. Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> --- v15: - Eliminate stray blank line - Add more comments / FIXMEs - Don't use PV cr3 unpickling macros to read guest cr3 v14: - Share more of the codepath, removing a potential bug that might happen if paging functions are called with "is_initialised" set to zero. - Put cr3 in v->arch.guest_table, so the ref counting happens properly. This should fix the "zombie domains" problem. v13: - Get rid of separate pvh call, and fold gs_base write into hvm_set_info_guest - Check pvh parameters for validity at the top of arch_set_info_guest - Fix comment about PVH and set_info_guest CC: Jan Beulich <jbeulich@suse.com> CC: Tim Deegan <tim@xen.org> CC: Keir Fraser <keir@xen.org> CC: Jun Nakajima <jun.nakajima@intel.com> CC: Eddie Dong <eddie.dong@intel.com> --- xen/arch/x86/domain.c | 37 +++++++++++++++++++++++++++++++++++-- xen/arch/x86/hvm/vmx/vmx.c | 6 +++++- xen/include/asm-x86/hvm/hvm.h | 6 +++--- xen/include/public/arch-x86/xen.h | 11 +++++++++++ 4 files changed, 54 insertions(+), 6 deletions(-) diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index 8c2a57f..4d82993 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -691,6 +691,18 @@ int arch_set_info_guest( (c(ldt_ents) > 8192) ) return -EINVAL; } + else if ( is_pvh_vcpu(v) ) + { + /* PVH 32bitfixme */ + ASSERT(!compat); + + if ( c(ctrlreg[1]) || c(ldt_base) || c(ldt_ents) || + c(user_regs.cs) || c(user_regs.ss) || c(user_regs.es) || + c(user_regs.ds) || c(user_regs.fs) || c(user_regs.gs) || + c.nat->gdt_ents || c.nat->fs_base || c.nat->gs_base_user ) + return -EINVAL; + + } v->fpu_initialised = !!(flags & VGCF_I387_VALID); @@ -728,8 +740,28 @@ int arch_set_info_guest( if ( has_hvm_container_vcpu(v) ) { - hvm_set_info_guest(v); - goto out; + /* + * NB: TF_kernel_mode is set unconditionally for HVM guests, + * so we always use the gs_base_kernel here. If we change this + * function to imitate the PV functionality, we'll need to + * make it pay attention to the kernel bit. + */ + hvm_set_info_guest(v, compat ?
0 : c.nat->gs_base_kernel); + + if ( is_hvm_vcpu(v) || v->is_initialised ) + goto out; + + /* NB: No need to use PV cr3 un-pickling macros */ + cr3_gfn = c(ctrlreg[3]) >> 12; + cr3_page = get_page_from_gfn(d, cr3_gfn, NULL, P2M_ALLOC); + + v->arch.cr3 = page_to_maddr(cr3_page); + v->arch.hvm_vcpu.guest_cr[3] = c.nat->ctrlreg[3]; + v->arch.guest_table = pagetable_from_page(cr3_page); + + ASSERT(paging_mode_enabled(d)); + + goto pvh_skip_pv_stuff; } init_int80_direct_trap(v); @@ -934,6 +966,7 @@ int arch_set_info_guest( clear_bit(_VPF_in_reset, &v->pause_flags); + pvh_skip_pv_stuff: if ( v->vcpu_id == 0 ) update_domain_wallclock_time(d); diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index 8a60680..8289e3a 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -1446,7 +1446,7 @@ static int vmx_event_pending(struct vcpu *v) return intr_info & INTR_INFO_VALID_MASK; } -static void vmx_set_info_guest(struct vcpu *v) +static void vmx_set_info_guest(struct vcpu *v, uint64_t gs_base_kernel) { unsigned long intr_shadow; @@ -1471,6 +1471,10 @@ static void vmx_set_info_guest(struct vcpu *v) __vmwrite(GUEST_INTERRUPTIBILITY_INFO, intr_shadow); } + /* PVH 32bitfixme */ + if ( is_pvh_vcpu(v) ) + __vmwrite(GUEST_GS_BASE, gs_base_kernel); + vmx_vmcs_exit(v); } diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h index ccca5df..a8ba06d 100644 --- a/xen/include/asm-x86/hvm/hvm.h +++ b/xen/include/asm-x86/hvm/hvm.h @@ -160,7 +160,7 @@ struct hvm_function_table { int (*msr_write_intercept)(unsigned int msr, uint64_t msr_content); void (*invlpg_intercept)(unsigned long vaddr); void (*handle_cd)(struct vcpu *v, unsigned long value); - void (*set_info_guest)(struct vcpu *v); + void (*set_info_guest)(struct vcpu *v, uint64_t gs_base_kernel); void (*set_rdtsc_exiting)(struct vcpu *v, bool_t); /* Nested HVM */ @@ -434,10 +434,10 @@ void *hvm_map_guest_frame_rw(unsigned long gfn, bool_t permanent); void *hvm_map_guest_frame_ro(unsigned long gfn, bool_t permanent); void hvm_unmap_guest_frame(void *p, bool_t permanent); -static inline void hvm_set_info_guest(struct vcpu *v) +static inline void hvm_set_info_guest(struct vcpu *v, uint64_t gs_base_kernel) { if ( hvm_funcs.set_info_guest ) - return hvm_funcs.set_info_guest(v); + return hvm_funcs.set_info_guest(v, gs_base_kernel); } int hvm_debug_op(struct vcpu *v, int32_t op); diff --git a/xen/include/public/arch-x86/xen.h b/xen/include/public/arch-x86/xen.h index 908ef87..5d220ce 100644 --- a/xen/include/public/arch-x86/xen.h +++ b/xen/include/public/arch-x86/xen.h @@ -154,6 +154,17 @@ typedef uint64_t tsc_timestamp_t; /* RDTSC timestamp */ /* * The following is all CPU context. Note that the fpu_ctxt block is filled * in by FXSAVE if the CPU has feature FXSR; otherwise FSAVE is used. + * + * Also note that when calling DOMCTL_setvcpucontext and VCPU_initialise + * for HVM and PVH guests, not all information in this structure is updated: + * + * - For HVM guests, the structures read include: fpu_ctxt (if + * VGCT_I387_VALID is set), flags, user_regs, debugreg[*] + * + * - PVH guests are the same as HVM guests, but additionally set cr3, + * and for 64-bit guests, gs_base_kernel. Additionally, the following + * entries must be 0: ctrlreg[1], ldt_base, ldt_ents, user_regs.{cs, + * ss, es, ds, fs, gs), gdt_ents, fs_base, and gs_base_user. */ struct vcpu_guest_context { /* FPU registers come first so they can be aligned for FXSAVE/FXRSTOR. */ -- 1.7.9.5
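Together with the interface comment added to public/arch-x86/xen.h, secondary-vcpu bring-up from a PVH kernel reduces to filling a handful of fields. An illustrative guest-side sketch (entry point, stack, cr3 value and per-cpu base are placeholders; everything else must stay zero, as the validity check at the top of arch_set_info_guest() now enforces):

    /* Sketch: minimal VCPU_initialise context for a PVH secondary cpu. */
    struct vcpu_guest_context ctxt;
    memset(&ctxt, 0, sizeof(ctxt));
    ctxt.flags = VGCF_in_kernel;
    ctxt.user_regs.rip = (unsigned long)cpu_bringup_entry;  /* placeholder */
    ctxt.user_regs.rsp = (unsigned long)boot_stack_top;     /* placeholder */
    ctxt.ctrlreg[3] = init_cr3;          /* guest-physical cr3, placeholder */
    ctxt.gs_base_kernel = (unsigned long)percpu_base;       /* placeholder */
    rc = HYPERVISOR_vcpu_op(VCPU_initialise, cpu, &ctxt);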
Call the PV version of CPUID. NB at the moment we do not handle forced emulated ops. This means, for example, that xen-detect will report an HVM Xen guest instead of a PV one. Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> --- v15: - Break pvcpuid into a separate patch CC: Jan Beulich <jbeulich@suse.com> CC: Tim Deegan <tim@xen.org> CC: Keir Fraser <keir@xen.org> CC: Jun Nakajima <jun.nakajima@intel.com> CC: Eddie Dong <eddie.dong@intel.com> --- xen/arch/x86/hvm/vmx/vmx.c | 2 +- xen/arch/x86/traps.c | 2 +- xen/include/asm-x86/processor.h | 2 ++ 3 files changed, 4 insertions(+), 2 deletions(-) diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index 8289e3a..bc86f93 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -2748,8 +2748,8 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs) break; } case EXIT_REASON_CPUID: + is_pvh_vcpu(v) ? pv_cpuid(regs) : vmx_do_cpuid(regs); update_guest_eip(); /* Safe: CPUID */ - vmx_do_cpuid(regs); break; case EXIT_REASON_HLT: update_guest_eip(); /* Safe: HLT */ diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c index edb7a6a..852b8e6 100644 --- a/xen/arch/x86/traps.c +++ b/xen/arch/x86/traps.c @@ -729,7 +729,7 @@ int cpuid_hypervisor_leaves( uint32_t idx, uint32_t sub_idx, return 1; } -static void pv_cpuid(struct cpu_user_regs *regs) +void pv_cpuid(struct cpu_user_regs *regs) { uint32_t a, b, c, d; diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h index 893afa3..551036d 100644 --- a/xen/include/asm-x86/processor.h +++ b/xen/include/asm-x86/processor.h @@ -567,6 +567,8 @@ void microcode_set_module(unsigned int); int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void), unsigned long len); int microcode_resume_cpu(int cpu); +void pv_cpuid(struct cpu_user_regs *regs); + #endif /* !__ASSEMBLY__ */ #endif /* __ASM_X86_PROCESSOR_H */ -- 1.7.9.5
Register an IO handler for the entire PIO range, and have it call the PV PIO handlers. NB at this point this won't do the full "copy and execute on the stack with full GPRs" work-around; this may need to be sorted out for dom0 to allow these instructions to happen in guest context. Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> --- Note: PVH seems to work just fine without this patch (i.e., all PIOs are noops). Presumably something like this will be needed for dom0 at some point. CC: Mukesh Rathor <mukesh.rathor@oracle.com> CC: Jan Beulich <jbeulich@suse.com> CC: Tim Deegan <tim@xen.org> CC: Keir Fraser <keir@xen.org> CC: Jun Nakajima <jun.nakajima@intel.com> CC: Eddie Dong <eddie.dong@intel.com> --- xen/arch/x86/hvm/hvm.c | 18 ++++++++++++++++++ xen/arch/x86/traps.c | 4 ++-- xen/include/asm-x86/traps.h | 8 ++++++++ 3 files changed, 28 insertions(+), 2 deletions(-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 3f245ba..fedbcde 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -519,6 +519,21 @@ static int hvm_print_line( return X86EMUL_OKAY; } +static int handle_pvh_io( + int dir, uint32_t port, uint32_t bytes, uint32_t *val) +{ + struct vcpu *curr = current; + struct cpu_user_regs *regs = guest_cpu_user_regs(); + + if ( dir == IOREQ_WRITE ) + guest_io_write(port, bytes, *val, curr, regs); + else + *val = guest_io_read(port, bytes, curr, regs); + + return X86EMUL_OKAY; +} + + int hvm_domain_initialise(struct domain *d) { int rc; @@ -566,7 +581,10 @@ int hvm_domain_initialise(struct domain *d) d->arch.hvm_domain.io_handler->num_slot = 0; if ( is_pvh_domain(d) ) + { + register_portio_handler(d, 0, 0xffffffff, handle_pvh_io); return 0; + } hvm_init_guest_time(d); diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c index 852b8e6..6c278bc 100644 --- a/xen/arch/x86/traps.c +++ b/xen/arch/x86/traps.c @@ -1681,7 +1681,7 @@ static int pci_cfg_ok(struct domain *d, int write, int size) return 1; } -static uint32_t guest_io_read( +uint32_t guest_io_read( unsigned int port, unsigned int bytes, struct vcpu *v, struct cpu_user_regs *regs) { @@ -1748,7 +1748,7 @@ static uint32_t guest_io_read( return data; } -static void guest_io_write( +void guest_io_write( unsigned int port, unsigned int bytes, uint32_t data, struct vcpu *v, struct cpu_user_regs *regs) { diff --git a/xen/include/asm-x86/traps.h b/xen/include/asm-x86/traps.h index 82cbcee..a26b318 100644 --- a/xen/include/asm-x86/traps.h +++ b/xen/include/asm-x86/traps.h @@ -49,4 +49,12 @@ extern int guest_has_trap_callback(struct domain *d, uint16_t vcpuid, extern int send_guest_trap(struct domain *d, uint16_t vcpuid, unsigned int trap_nr); +uint32_t guest_io_read( + unsigned int port, unsigned int bytes, + struct vcpu *v, struct cpu_user_regs *regs); +void guest_io_write( + unsigned int port, unsigned int bytes, uint32_t data, + struct vcpu *v, struct cpu_user_regs *regs); + + #endif /* ASM_TRAP_H */ -- 1.7.9.5
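From inside a PVH domU the effect is that ordinary port I/O just works, with no ioreq server behind it. A guest-side sketch (using Linux's outb/inb helpers; for an unprivileged guest the access is safely absorbed by guest_io_write()/guest_io_read() rather than faulting for lack of a device model):

    /* Sketch: legacy PIC accesses from a PVH kernel now round-trip
     * through handle_pvh_io() above. */
    outb(0xff, 0x21);              /* mask all irqs on the master PIC */
    unsigned char imr = inb(0x21); /* comes back via guest_io_read() */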
George Dunlap
2013-Nov-11 14:57 UTC
[PATCH v15 15/19] pvh: Disable 32-bit guest support for now
To be implemented. Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> --- Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org> CC: Jan Beulich <jbeulich@suse.com> CC: Tim Deegan <tim@xen.org> CC: Keir Fraser <keir@xen.org> --- xen/arch/x86/domain.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index 4d82993..ff4238c 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -339,6 +339,14 @@ int switch_compat(struct domain *d) if ( d == NULL ) return -EINVAL; + + if ( is_pvh_domain(d) ) + { + printk(XENLOG_INFO + "Xen currently does not support 32bit PVH guests\n"); + return -EINVAL; + } + if ( !may_switch_mode(d) ) return -EACCES; if ( is_pv_32on64_domain(d) ) -- 1.7.9.5
George Dunlap
2013-Nov-11 14:57 UTC
[PATCH v15 16/19] pvh: Restrict tsc_mode to NEVER_EMULATE for now
The reason for this restriction in the first place, given in one of the comments checking for PVH requirements, had to do with additional infrastructure required to allow PV RDTSC emulation for PVH guests. Since we don't use the PV emulation path at all anymore, we may be able to remove this restriction. Experiments show that pvh will boot without apparent issues in "default", "native", and "native_paravirt" mode, but not in "always_emulate" mode. We'll leave this restriction in until we can sort out what's going on. Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Acked-by: Tim Deegan <tim@xen.org> --- v15: - Indicate a reason why this mode was disabled in the first place. v13: - Only print a warning if tsc_mode != TSC_MODE_DEFAULT CC: Jan Beulich <jbeulich@suse.com> CC: Tim Deegan <tim@xen.org> CC: Keir Fraser <keir@xen.org> --- xen/arch/x86/time.c | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+) diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c index c1bbd50..d7ea2ff 100644 --- a/xen/arch/x86/time.c +++ b/xen/arch/x86/time.c @@ -1827,6 +1827,33 @@ void tsc_set_info(struct domain *d, d->arch.vtsc = 0; return; } + if ( is_pvh_domain(d) ) + { + /* + * PVH fixme: support more tsc modes. + * + * NB: The reason this is disabled here appears to be with + * additional support required to do the PV RDTSC emulation. + * Since we're no longer taking the PV emulation path for + * anything, we may be able to remove this restriction. + * + * pvhfixme: Experiments show that "default" works for PVH, + * but "always_emulate" does not for some reason. Figure out + * why. + */ + switch ( tsc_mode ) + { + case TSC_MODE_NEVER_EMULATE: + break; + default: + printk(XENLOG_WARNING + "PVH currently does not support tsc emulation. Setting timer_mode = never_emulate\n"); + /* FALLTHRU */ + case TSC_MODE_DEFAULT: + tsc_mode = TSC_MODE_NEVER_EMULATE; + break; + } + } switch ( d->arch.tsc_mode = tsc_mode ) { -- 1.7.9.5
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Acked-by: Tim Deegan <tim@xen.org> --- v14 - Update interface description - Update list of outstanding fixmes CC: Jan Beulich <jan.beulich@suse.com> CC: Tim Deegan <tim@xen.org> CC: Keir Fraser <keir@xen.org> --- docs/misc/pvh-readme.txt | 61 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 61 insertions(+) create mode 100644 docs/misc/pvh-readme.txt diff --git a/docs/misc/pvh-readme.txt b/docs/misc/pvh-readme.txt new file mode 100644 index 0000000..37db277 --- /dev/null +++ b/docs/misc/pvh-readme.txt @@ -0,0 +1,61 @@ + +PVH : an x86 PV guest running in an HVM container. + +See: http://blog.xen.org/index.php/2012/10/23/the-paravirtualization-spectrum-part-1-the-ends-of-the-spectrum/ + +At the moment HAP is required for PVH. + +At present the only PVH guest is an x86 64bit PV linux. Patches are at: + git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git + +A PVH guest kernel must support the following features, as defined for linux +in arch/x86/xen/xen-head.S: + + #define FEATURES_PVH "|writable_descriptor_tables" \ + "|auto_translated_physmap" \ + "|supervisor_mode_kernel" \ + "|hvm_callback_vector" + +In a nutshell: +* the guest uses auto translate: + - p2m is managed by xen + - pagetables are owned by the guest + - mmu_update hypercall not available +* it uses event callback and not vlapic emulation, +* IDT is native, so set_trap_table hcall is also N/A for a PVH guest. +For a full list of hcalls supported for PVH, see pvh_hypercall64_table +in arch/x86/hvm/hvm.c in xen. From the ABI perspective, it's mostly a +PV guest with auto translate, although it does use hvm_op for setting +callback vector, and has a special version of arch_set_info_guest for bringing +up secondary cpus. + +The initial phase targets the booting of a 64bit UP/SMP linux guest in PVH +mode. This is done by adding: pvh=1 in the config file. xl, and not xm, is +supported. Phase I patches are broken into three parts: + - xen changes for booting of 64bit PVH guest + - tools changes for creating a PVH guest + - boot of 64bit dom0 in PVH mode. + +The following fixmes exist in the code: + - arch/x86/time.c: support more tsc modes. + +The following remain to be done for PVH: + - Get rid of PVH mode, make it just HVM with some flags set + - implement arch_get_info_guest() for pvh. + - Investigate what else needs to be done for VMI support. + - AMD port. + - 32bit PVH guest support in both linux and xen. Xen changes are tagged + "32bitfixme". + - Add support for monitoring guest behavior. See hvm_memory_event* functions + in hvm.c + - vcpu hotplug support + - Live migration of PVH guests. + - Make posted interrupts available to PVH dom0. (This will be a big win). + + +Note, any emails to me must be cc'd to xen devel mailing list. OTOH, please +cc me on PVH emails to the xen devel mailing list. + +Mukesh Rathor +mukesh.rathor [at] oracle [dot] com -- 1.7.9.5
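For convenience, a minimal guest config matching the documentation above (paths, sizes and the domain name are illustrative; only the pvh=1 line is new with this series):

    # Illustrative xl config for a PVH domU
    kernel = "/boot/vmlinuz-pvh"
    ramdisk = "/boot/initrd-pvh.img"
    memory = 512
    vcpus = 2
    name = "pvh-test"
    pvh = 1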
George Dunlap
2013-Nov-11 14:57 UTC
[PATCH v15 18/19] pvh tools: libxc changes to build a PVH guest.
From: Mukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Acked-by: Tim Deegan <tim@xen.org> CC: Ian Jackson <ian.jackson@citrix.com> CC: Ian Campbell <ian.campbell@citrix.com> --- tools/libxc/xc_dom.h | 1 + tools/libxc/xc_dom_core.c | 9 +++++ tools/libxc/xc_dom_x86.c | 90 +++++++++++++++++++++++++++++---------------- 3 files changed, 69 insertions(+), 31 deletions(-) diff --git a/tools/libxc/xc_dom.h b/tools/libxc/xc_dom.h index 935b49e..90679da 100644 --- a/tools/libxc/xc_dom.h +++ b/tools/libxc/xc_dom.h @@ -127,6 +127,7 @@ struct xc_dom_image { domid_t console_domid; domid_t xenstore_domid; xen_pfn_t shared_info_mfn; + int pvh_enabled; xc_interface *xch; domid_t guest_domid; diff --git a/tools/libxc/xc_dom_core.c b/tools/libxc/xc_dom_core.c index 3bf51ef..9355fe8 100644 --- a/tools/libxc/xc_dom_core.c +++ b/tools/libxc/xc_dom_core.c @@ -766,6 +766,15 @@ int xc_dom_parse_image(struct xc_dom_image *dom) goto err; } + if ( dom->pvh_enabled ) + { + const char *pvh_features = "writable_descriptor_tables|" + "auto_translated_physmap|" + "supervisor_mode_kernel|" + "hvm_callback_vector"; + elf_xen_parse_features(pvh_features, dom->f_requested, NULL); + } + /* check features */ for ( i = 0; i < XENFEAT_NR_SUBMAPS; i++ ) { diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c index 60fc544..a30d9cc 100644 --- a/tools/libxc/xc_dom_x86.c +++ b/tools/libxc/xc_dom_x86.c @@ -407,7 +407,8 @@ static int setup_pgtables_x86_64(struct xc_dom_image *dom) pgpfn = (addr - dom->parms.virt_base) >> PAGE_SHIFT_X86; l1tab[l1off] = pfn_to_paddr(xc_dom_p2m_guest(dom, pgpfn)) | L1_PROT; - if ( (addr >= dom->pgtables_seg.vstart) && + if ( (!dom->pvh_enabled) && + (addr >= dom->pgtables_seg.vstart) && (addr < dom->pgtables_seg.vend) ) l1tab[l1off] &= ~_PAGE_RW; /* page tables are r/o */ @@ -588,6 +589,13 @@ static int vcpu_x86_32(struct xc_dom_image *dom, void *ptr) DOMPRINTF_CALLED(dom->xch); + if ( dom->pvh_enabled ) + { + xc_dom_panic(dom->xch, XC_INTERNAL_ERROR, + "%s: PVH not supported for 32bit guests.", __FUNCTION__); + return -1; + } + /* clear everything */ memset(ctxt, 0, sizeof(*ctxt)); @@ -630,12 +638,6 @@ static int vcpu_x86_64(struct xc_dom_image *dom, void *ptr) /* clear everything */ memset(ctxt, 0, sizeof(*ctxt)); - ctxt->user_regs.ds = FLAT_KERNEL_DS_X86_64; - ctxt->user_regs.es = FLAT_KERNEL_DS_X86_64; - ctxt->user_regs.fs = FLAT_KERNEL_DS_X86_64; - ctxt->user_regs.gs = FLAT_KERNEL_DS_X86_64; - ctxt->user_regs.ss = FLAT_KERNEL_SS_X86_64; - ctxt->user_regs.cs = FLAT_KERNEL_CS_X86_64; ctxt->user_regs.rip = dom->parms.virt_entry; ctxt->user_regs.rsp = dom->parms.virt_base + (dom->bootstack_pfn + 1) * PAGE_SIZE_X86; @@ -643,15 +645,25 @@ dom->parms.virt_base + (dom->start_info_pfn) * PAGE_SIZE_X86; ctxt->user_regs.rflags = 1 << 9; /* Interrupt Enable */ - ctxt->kernel_ss = ctxt->user_regs.ss; - ctxt->kernel_sp = ctxt->user_regs.esp; - ctxt->flags = VGCF_in_kernel_X86_64 | VGCF_online_X86_64; cr3_pfn = xc_dom_p2m_guest(dom, dom->pgtables_seg.pfn); ctxt->ctrlreg[3] = xen_pfn_to_cr3_x86_64(cr3_pfn); DOMPRINTF("%s: cr3: pfn 0x%" PRIpfn " mfn 0x%" PRIpfn "", __FUNCTION__, dom->pgtables_seg.pfn, cr3_pfn); + if ( dom->pvh_enabled ) + return 0; + + ctxt->user_regs.ds = FLAT_KERNEL_DS_X86_64; + ctxt->user_regs.es = FLAT_KERNEL_DS_X86_64; + ctxt->user_regs.fs = FLAT_KERNEL_DS_X86_64; + ctxt->user_regs.gs = FLAT_KERNEL_DS_X86_64; + ctxt->user_regs.ss = FLAT_KERNEL_SS_X86_64; + ctxt->user_regs.cs =
FLAT_KERNEL_CS_X86_64; + + ctxt->kernel_ss = ctxt->user_regs.ss; + ctxt->kernel_sp = ctxt->user_regs.esp; + return 0; } @@ -752,7 +764,7 @@ int arch_setup_meminit(struct xc_dom_image *dom) rc = x86_compat(dom->xch, dom->guest_domid, dom->guest_type); if ( rc ) return rc; - if ( xc_dom_feature_translated(dom) ) + if ( xc_dom_feature_translated(dom) && !dom->pvh_enabled ) { dom->shadow_enabled = 1; rc = x86_shadow(dom->xch, dom->guest_domid); @@ -828,6 +840,38 @@ int arch_setup_bootearly(struct xc_dom_image *dom) return 0; } +/* + * Map grant table frames into guest physmap. PVH manages grant during boot + * via HVM mechanisms. + */ +static int map_grant_table_frames(struct xc_dom_image *dom) +{ + int i, rc; + + if ( dom->pvh_enabled ) + return 0; + + for ( i = 0; ; i++ ) + { + rc = xc_domain_add_to_physmap(dom->xch, dom->guest_domid, + XENMAPSPACE_grant_table, + i, dom->total_pages + i); + if ( rc != 0 ) + { + if ( (i > 0) && (errno == EINVAL) ) + { + DOMPRINTF("%s: %d grant tables mapped", __FUNCTION__, i); + break; + } + xc_dom_panic(dom->xch, XC_INTERNAL_ERROR, + "%s: mapping grant tables failed " "(pfn=0x%" PRIpfn + ", rc=%d)", __FUNCTION__, dom->total_pages + i, rc); + return rc; + } + } + return 0; +} + int arch_setup_bootlate(struct xc_dom_image *dom) { static const struct { @@ -866,7 +910,6 @@ int arch_setup_bootlate(struct xc_dom_image *dom) else { /* paravirtualized guest with auto-translation */ - int i; /* Map shared info frame into guest physmap. */ rc = xc_domain_add_to_physmap(dom->xch, dom->guest_domid, @@ -880,25 +923,10 @@ int arch_setup_bootlate(struct xc_dom_image *dom) return rc; } - /* Map grant table frames into guest physmap. */ - for ( i = 0; ; i++ ) - { - rc = xc_domain_add_to_physmap(dom->xch, dom->guest_domid, - XENMAPSPACE_grant_table, - i, dom->total_pages + i); - if ( rc != 0 ) - { - if ( (i > 0) && (errno == EINVAL) ) - { - DOMPRINTF("%s: %d grant tables mapped", __FUNCTION__, i); - break; - } - xc_dom_panic(dom->xch, XC_INTERNAL_ERROR, - "%s: mapping grant tables failed " "(pfn=0x%" - PRIpfn ", rc=%d)", __FUNCTION__, dom->total_pages + i, rc); - return rc; - } - } + rc = map_grant_table_frames(dom); + if ( rc != 0 ) + return rc; + shinfo = dom->shared_info_pfn; } -- 1.7.9.5
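A sketch of how a caller above libxc would drive the new field (libxl does this via pvh_enabled in the next patch); error handling is elided and the kernel path is illustrative:

    /* Sketch: request a PVH build through the xc_dom interface. */
    struct xc_dom_image *dom = xc_dom_allocate(xch, cmdline, features);
    dom->pvh_enabled = 1;                  /* the field added above */
    xc_dom_kernel_file(dom, "/boot/vmlinuz-pvh");
    xc_dom_boot_xen_init(dom, xch, domid);
    xc_dom_parse_image(dom);  /* picks up the forced PVH feature string */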
George Dunlap
2013-Nov-11 14:57 UTC
[PATCH v15 19/19] pvh tools: libxl changes to create a PVH guest.
From: Mukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Tim Deegan <tim@xen.org> --- v13 (gwd): - Added XEN_DOMCTL_CDF_pvh_guest flag CC: Ian Jackson <ian.jackson@citrix.com> CC: Ian Campbell <ian.campbell@citrix.com> --- docs/man/xl.cfg.pod.5 | 3 +++ tools/libxl/libxl.h | 6 ++++++ tools/libxl/libxl_create.c | 13 +++++++++++++ tools/libxl/libxl_dom.c | 3 +++ tools/libxl/libxl_internal.h | 1 + tools/libxl/libxl_types.idl | 1 + tools/libxl/libxl_x86.c | 4 +++- tools/libxl/xl_cmdimpl.c | 1 + 8 files changed, 31 insertions(+), 1 deletion(-) diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5 index d2d8921..e6fc83f 100644 --- a/docs/man/xl.cfg.pod.5 +++ b/docs/man/xl.cfg.pod.5 @@ -653,6 +653,9 @@ if your particular guest kernel does not require this behaviour then it is safe to allow this to be enabled but you may wish to disable it anyway. +=item B<pvh=BOOLEAN> +Selects whether to run this PV guest in an HVM container. Default is 0. + =back =head2 Fully-virtualised (HVM) Guest Specific Options diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h index 1c6675d..a214d77 100644 --- a/tools/libxl/libxl.h +++ b/tools/libxl/libxl.h @@ -369,6 +369,12 @@ */ #define LIBXL_HAVE_DOMAIN_CREATE_RESTORE_PARAMS 1 +/* + * LIBXL_HAVE_CREATEINFO_PVH + * If this is defined, then libxl supports creation of a PVH guest. + */ +#define LIBXL_HAVE_CREATEINFO_PVH 1 + /* Functions annotated with LIBXL_EXTERNAL_CALLERS_ONLY may not be * called from within libxl itself. Callers outside libxl, who * do not #include libxl_internal.h, are fine. */ diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c index 9d793ba..1442c7b 100644 --- a/tools/libxl/libxl_create.c +++ b/tools/libxl/libxl_create.c @@ -33,6 +33,9 @@ int libxl__domain_create_info_setdefault(libxl__gc *gc, if (c_info->type == LIBXL_DOMAIN_TYPE_HVM) { libxl_defbool_setdefault(&c_info->hap, true); libxl_defbool_setdefault(&c_info->oos, true); + } else { + libxl_defbool_setdefault(&c_info->pvh, false); + libxl_defbool_setdefault(&c_info->hap, libxl_defbool_val(c_info->pvh)); } libxl_defbool_setdefault(&c_info->run_hotplug_scripts, true); @@ -381,6 +384,8 @@ int libxl__domain_build(libxl__gc *gc, break; case LIBXL_DOMAIN_TYPE_PV: + state->pvh_enabled = libxl_defbool_val(d_config->c_info.pvh); + ret = libxl__build_pv(gc, domid, info, state); if (ret) goto out; @@ -440,6 +445,14 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_create_info *info, flags |= XEN_DOMCTL_CDF_hvm_guest; flags |= libxl_defbool_val(info->hap) ? XEN_DOMCTL_CDF_hap : 0; flags |= libxl_defbool_val(info->oos) ? 
0 : XEN_DOMCTL_CDF_oos_off; + } else if ( libxl_defbool_val(info->pvh) ) { + flags |= XEN_DOMCTL_CDF_pvh_guest; + if ( !libxl_defbool_val(info->hap) ) { + LOG(ERROR, "HAP must be on for PVH"); + rc = ERROR_INVAL; + goto out; + } + flags |= XEN_DOMCTL_CDF_hap; } *domid = -1; diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c index 1812bdc..301d29b 100644 --- a/tools/libxl/libxl_dom.c +++ b/tools/libxl/libxl_dom.c @@ -348,7 +348,10 @@ int libxl__build_pv(libxl__gc *gc, uint32_t domid, return ERROR_FAIL; } + dom->pvh_enabled = state->pvh_enabled; + LOG(DEBUG, "pv kernel mapped %d path %s", state->pv_kernel.mapped, state->pv_kernel.path); + if (state->pv_kernel.mapped) { ret = xc_dom_kernel_mem(dom, state->pv_kernel.data, diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index 4729566..bc0185b 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -887,6 +887,7 @@ typedef struct { libxl__file_reference pv_kernel; libxl__file_reference pv_ramdisk; const char * pv_cmdline; + bool pvh_enabled; } libxl__domain_build_state; _hidden int libxl__build_pre(libxl__gc *gc, uint32_t domid, diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl index 5c43d6f..a189db3 100644 --- a/tools/libxl/libxl_types.idl +++ b/tools/libxl/libxl_types.idl @@ -257,6 +257,7 @@ libxl_domain_create_info = Struct("domain_create_info",[ ("platformdata", libxl_key_value_list), ("poolid", uint32), ("run_hotplug_scripts",libxl_defbool), + ("pvh", libxl_defbool), ], dir=DIR_IN) libxl_domain_restore_params = Struct("domain_restore_params", [ diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c index a78c91d..87a8110 100644 --- a/tools/libxl/libxl_x86.c +++ b/tools/libxl/libxl_x86.c @@ -290,7 +290,9 @@ int libxl__arch_domain_create(libxl__gc *gc, libxl_domain_config *d_config, if (rtc_timeoffset) xc_domain_set_time_offset(ctx->xch, domid, rtc_timeoffset); - if (d_config->b_info.type == LIBXL_DOMAIN_TYPE_HVM) { + if (d_config->b_info.type == LIBXL_DOMAIN_TYPE_HVM || + libxl_defbool_val(d_config->c_info.pvh)) { + unsigned long shadow; shadow = (d_config->b_info.shadow_memkb + 1023) / 1024; xc_shadow_control(ctx->xch, domid, XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION, NULL, 0, &shadow, 0, NULL); diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c index 40feb7d..ba2b97f 100644 --- a/tools/libxl/xl_cmdimpl.c +++ b/tools/libxl/xl_cmdimpl.c @@ -642,6 +642,7 @@ static void parse_config_data(const char *config_source, !strncmp(buf, "hvm", strlen(buf))) c_info->type = LIBXL_DOMAIN_TYPE_HVM; + xlu_cfg_get_defbool(config, "pvh", &c_info->pvh, 0); xlu_cfg_get_defbool(config, "hap", &c_info->hap, 0); if (xlu_cfg_replace_string (config, "name", &c_info->name, 0)) { -- 1.7.9.5
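The equivalent from the libxl API, for callers not going through xl's config parser (sketch only; a complete domain build needs the usual b_info setup as well):

    /* Sketch: select PVH via libxl_domain_create_info. */
    libxl_domain_config d_config;
    libxl_domain_config_init(&d_config);
    d_config.c_info.type = LIBXL_DOMAIN_TYPE_PV;   /* PVH builds as PV here */
    libxl_defbool_set(&d_config.c_info.pvh, true);
    libxl_defbool_set(&d_config.c_info.hap, true); /* required, per the check above */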
On 11/11/13 14:57, George Dunlap wrote:
[...]
> 14 * ! pvh: Use PV handlers for IO

Actually, with the new way of handling PIO, this doesn't touch the vmx code anymore; so ignore the '!' here.

 -George
On 11/11/2013 14:57, "George Dunlap" <george.dunlap@eu.citrix.com> wrote:
> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
> [...]

Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Eddie Dong <eddie.dong@intel.com>

-----Original Message-----
From: George Dunlap [mailto:george.dunlap@eu.citrix.com]
Sent: Monday, November 11, 2013 10:57 PM
To: xen-devel@lists.xen.org
Cc: George Dunlap; Mukesh Rathor; Jan Beulich; Tim Deegan; Keir Fraser; Ian Jackson; Ian Campbell; Nakajima, Jun; Dong, Eddie
Subject: [PATCH v15 00/18] Introduce PVH domU support

[...]
Ian Jackson
2013-Nov-12 11:33 UTC
Re: [PATCH v15 18/19] pvh tools: libxc changes to build a PVH guest.
George Dunlap writes ("[PATCH v15 18/19] pvh tools: libxc changes to build a PVH guest."):
> From: Mukesh Rathor <mukesh.rathor@oracle.com>
>
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> Acked-by: Tim Deegan <tim@xen.org>

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

This looks like a reasonable implementation of something plausible, although I haven't second-guessed exactly what.

Ian.
Ian Jackson
2013-Nov-12 11:38 UTC
Re: [PATCH v15 19/19] pvh tools: libxl changes to create a PVH guest.
George Dunlap writes ("[PATCH v15 19/19] pvh tools: libxl changes to create a PVH guest."):
> From: Mukesh Rathor <mukesh.rathor@oracle.com>
>
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
> Acked-by: Tim Deegan <tim@xen.org>

This looks good to me, assuming that the lower layers do what I think they do. I have one style quibble:

> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> index 9d793ba..1442c7b 100644
> --- a/tools/libxl/libxl_create.c
> +++ b/tools/libxl/libxl_create.c
...
> @@ -440,6 +445,14 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_create_info *info,
>         flags |= XEN_DOMCTL_CDF_hvm_guest;
>         flags |= libxl_defbool_val(info->hap) ? XEN_DOMCTL_CDF_hap : 0;
>         flags |= libxl_defbool_val(info->oos) ? 0 : XEN_DOMCTL_CDF_oos_off;
> +    } else if ( libxl_defbool_val(info->pvh) ) {
...
> +        if ( !libxl_defbool_val(info->hap) ) {

In libxl there should be no spaces inside the if's ( ).

Thanks,
Ian.
Jan Beulich
2013-Nov-12 13:34 UTC
Re: [PATCH v15 05/19] pvh prep: Introduce pv guest type and has_hvm_container macros
>>> On 11.11.13 at 15:57, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> --- a/xen/arch/x86/acpi/suspend.c
> +++ b/xen/arch/x86/acpi/suspend.c
> @@ -85,7 +85,7 @@ void restore_rest_processor_state(void)
>         BUG();
>
>     /* Maybe load the debug registers. */
> -    BUG_ON(is_hvm_vcpu(curr));
> +    BUG_ON(!is_pv_vcpu(curr));
>     if ( !is_idle_vcpu(curr) && curr->arch.debugreg[7] )
>     {
>         write_debugreg(0, curr->arch.debugreg[0]);

This is certainly fine for now, but I guess in the long run (i.e. when supporting Dom0) this will need changing from BUG_ON() to integrating the condition with the if().

> @@ -1246,8 +1246,7 @@ void arch_get_info_guest(struct vcpu *v, vcpu_guest_context_u c)
>     bool_t compat = is_pv_32on64_domain(v->domain);
> #define c(fld) (!compat ? (c.nat->fld) : (c.cmp->fld))
>
> -    if ( is_hvm_vcpu(v) )
> -        memset(c.nat, 0, sizeof(*c.nat));
> +    memset(c.nat, 0, sizeof(*c.nat));

Removing the conditional is both dangerous and unmotivated: Dangerous because you now assume that the space allocated is always covering the (larger) native size (this currently happens to be that way in the only caller, but isn't guaranteed). And unmotivated because switching just the conditional (as done throughout the rest of the patch) would have sufficed here.

I'm intending to adjust this once I get to committing the first so many patches from this series. (And in the unlikely case I might need to defer committing until another rev of the series, this means Reviewed-by as long as the above gets fixed.)

Jan
Jan Beulich
2013-Nov-12 13:51 UTC
Re: [PATCH v15 07/19] pvh: Disable unneeded features of HVM containers
>>> On 11.11.13 at 15:57, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> @@ -1414,6 +1442,7 @@ int hvm_hap_nested_page_fault(paddr_t gpa,
>      /* For the benefit of 32-bit WinXP (& older Windows) on AMD CPUs,
>       * a fast path for LAPIC accesses, skipping the p2m lookup. */
>      if ( !nestedhvm_vcpu_in_guestmode(v)
> +         && is_hvm_vcpu(v)

This change looks unnecessary - nestedhvm_vcpu_enter_guestmode() (the only way by which nestedhvm_vcpu_in_guestmode() could return other than zero) must obviously never be used on a PVH guest.

Jan
>>> On 11.11.13 at 15:57, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -28,6 +28,7 @@
>  #include <asm/msr.h>
>  #include <asm/xstate.h>
>  #include <asm/hvm/hvm.h>
> +#include <asm/hvm/nestedhvm.h>

Going through the changes below I can't see why you need this, and it also looks pretty far off for a PVH-related change.

Jan
>>> On 11.11.13 at 15:57, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> @@ -566,7 +581,10 @@ int hvm_domain_initialise(struct domain *d)
>      d->arch.hvm_domain.io_handler->num_slot = 0;
>
>      if ( is_pvh_domain(d) )
> +    {
> +        register_portio_handler(d, 0, 0xffffffff, handle_pvh_io);

Nice idea, but I'm afraid it'll need a little more than this long term (i.e. I'm fine for this to go in as is, with the second paragraph below addressed in some way): hvm_io_intercept() first tries dpci_ioport_intercept(), and without a device model I think that's not correct. If indeed it isn't, a simple solution might be to just disallow the registration of translated I/O port ranges for PVH guests.

The other thing here is the use of 0xffffffff: the parameter is a size, not an end of range. And considering that we're talking about I/O ports, 0x10000 would seem to be the right value (or, if taking the potential wrapping into account, 0x10003).

Jan
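A minimal sketch of the corrected registration, assuming (as Jan states) that the third parameter of register_portio_handler() is a range size rather than an end of range:

    /* As a size, 0x10000 covers I/O ports 0x0000..0xffff; 0x10003 would
     * additionally catch a 4-byte access wrapping at port 0xffff. */
    if ( is_pvh_domain(d) )
        register_portio_handler(d, 0, 0x10000, handle_pvh_io);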
George Dunlap
2013-Nov-12 14:56 UTC
Re: [PATCH v15 07/19] pvh: Disable unneeded features of HVM containers
On 11/12/2013 01:51 PM, Jan Beulich wrote:
>>>> On 11.11.13 at 15:57, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>> @@ -1414,6 +1442,7 @@ int hvm_hap_nested_page_fault(paddr_t gpa,
>>      /* For the benefit of 32-bit WinXP (& older Windows) on AMD CPUs,
>>       * a fast path for LAPIC accesses, skipping the p2m lookup. */
>>      if ( !nestedhvm_vcpu_in_guestmode(v)
>> +         && is_hvm_vcpu(v)
>
> This change looks unnecessary - nestedhvm_vcpu_enter_guestmode()
> (the only way by which nestedhvm_vcpu_in_guestmode() could return
> other than zero) must obviously never be used on a PVH guest.

You've got the polarity of that check wrong -- PVH guests are never in guest mode, therefore !nestedhvm_vcpu_in_guestmode() is always true.

-George
Jan Beulich
2013-Nov-12 15:03 UTC
Re: [PATCH v15 07/19] pvh: Disable unneeded features of HVM containers
>>> On 12.11.13 at 15:56, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> On 11/12/2013 01:51 PM, Jan Beulich wrote:
>>>>> On 11.11.13 at 15:57, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>>> @@ -1414,6 +1442,7 @@ int hvm_hap_nested_page_fault(paddr_t gpa,
>>>      /* For the benefit of 32-bit WinXP (& older Windows) on AMD CPUs,
>>>       * a fast path for LAPIC accesses, skipping the p2m lookup. */
>>>      if ( !nestedhvm_vcpu_in_guestmode(v)
>>> +         && is_hvm_vcpu(v)
>>
>> This change looks unnecessary - nestedhvm_vcpu_enter_guestmode()
>> (the only way by which nestedhvm_vcpu_in_guestmode() could return
>> other than zero) must obviously never be used on a PVH guest.
>
> You've got the polarity of that check wrong -- PVH guests are never in
> guest mode, therefore !nestedhvm_vcpu_in_guestmode() is always true.

Oh, indeed. In which case I'd slightly prefer !is_pvh_vcpu() here, but I guess it's not really significant anyway.

Jan
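For illustration, the variant Jan says he would slightly prefer, sketched against the quoted hunk (equivalent on the assumption that only HVM-container guests -- plain HVM or PVH -- reach this path):

    /* Sketch: exclude PVH explicitly instead of requiring plain HVM. */
    if ( !nestedhvm_vcpu_in_guestmode(v)
         && !is_pvh_vcpu(v)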
On 11/12/2013 02:03 PM, Jan Beulich wrote:
>>>> On 11.11.13 at 15:57, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>> @@ -28,6 +28,7 @@
>>  #include <asm/msr.h>
>>  #include <asm/xstate.h>
>>  #include <asm/hvm/hvm.h>
>> +#include <asm/hvm/nestedhvm.h>
>
> Going through the changes below I can't see why you need this,
> and it also looks pretty far off for a PVH-related change.

It's needed for the HVM_CR4_RESERVED_BITS macro, which calls nestedhvm_enabled(). I couldn't figure out a clean way to include nestedhvm.h from hvm.h (ran into weird compilation errors), so I just gave up and added it here.

-George
George Dunlap
2013-Nov-12 15:08 UTC
Re: [PATCH v15 07/19] pvh: Disable unneeded features of HVM containers
On 11/12/2013 03:03 PM, Jan Beulich wrote:
>>>> On 12.11.13 at 15:56, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>> On 11/12/2013 01:51 PM, Jan Beulich wrote:
>>>>>> On 11.11.13 at 15:57, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>>>> @@ -1414,6 +1442,7 @@ int hvm_hap_nested_page_fault(paddr_t gpa,
>>>>      /* For the benefit of 32-bit WinXP (& older Windows) on AMD CPUs,
>>>>       * a fast path for LAPIC accesses, skipping the p2m lookup. */
>>>>      if ( !nestedhvm_vcpu_in_guestmode(v)
>>>> +         && is_hvm_vcpu(v)
>>>
>>> This change looks unnecessary - nestedhvm_vcpu_enter_guestmode()
>>> (the only way by which nestedhvm_vcpu_in_guestmode() could return
>>> other than zero) must obviously never be used on a PVH guest.
>>
>> You've got the polarity of that check wrong -- PVH guests are never in
>> guest mode, therefore !nestedhvm_vcpu_in_guestmode() is always true.
>
> Oh, indeed. In which case I'd slightly prefer !is_pvh_vcpu() here,
> but I guess it's not really significant anyway.

The plan, I think, is for is_pvh to go away, and instead to have a flag indicating whether the guest has / is using a LAPIC.

-George
George Dunlap
2013-Nov-12 15:12 UTC
Re: [PATCH v15 05/19] pvh prep: Introduce pv guest type and has_hvm_container macros
On 11/12/2013 01:34 PM, Jan Beulich wrote:
>>>> On 11.11.13 at 15:57, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>> --- a/xen/arch/x86/acpi/suspend.c
>> +++ b/xen/arch/x86/acpi/suspend.c
>> @@ -85,7 +85,7 @@ void restore_rest_processor_state(void)
>>          BUG();
>>
>>      /* Maybe load the debug registers. */
>> -    BUG_ON(is_hvm_vcpu(curr));
>> +    BUG_ON(!is_pv_vcpu(curr));
>>      if ( !is_idle_vcpu(curr) && curr->arch.debugreg[7] )
>>      {
>>          write_debugreg(0, curr->arch.debugreg[0]);
>
> This is certainly fine for now, but I guess in the long run (i.e. when
> supporting Dom0) this will need changing from BUG_ON() to
> integrating the condition with the if().

Yes -- the BUG_ON() was to make sure someone thinks carefully about what the right thing is to do here. I suppose there should have been a fixme here as well, however.

>> @@ -1246,8 +1246,7 @@ void arch_get_info_guest(struct vcpu *v, vcpu_guest_context_u c)
>>      bool_t compat = is_pv_32on64_domain(v->domain);
>>  #define c(fld) (!compat ? (c.nat->fld) : (c.cmp->fld))
>>
>> -    if ( is_hvm_vcpu(v) )
>> -        memset(c.nat, 0, sizeof(*c.nat));
>> +    memset(c.nat, 0, sizeof(*c.nat));
>
> Removing the conditional is both dangerous and unmotivated: dangerous
> because you now assume that the space allocated always covers the
> (larger) native size (this currently happens to be the case in the only
> caller, but isn't guaranteed), and unmotivated because switching just
> the conditional (as done throughout the rest of the patch) would have
> sufficed here.

Indeed, I can't think of why I didn't change it to "!is_pv_vcpu()" here.

-George
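A minimal sketch of the fix George confirms here, restoring the guard with the new predicate (hypothetical, pending Jan's adjustment on commit):

    /* Clear the native context only for non-PV vcpus, matching the
     * predicate conversion done throughout the rest of the patch. */
    if ( !is_pv_vcpu(v) )
        memset(c.nat, 0, sizeof(*c.nat));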
>>> On 12.11.13 at 16:06, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> On 11/12/2013 02:03 PM, Jan Beulich wrote:
>>>>> On 11.11.13 at 15:57, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>>> @@ -28,6 +28,7 @@
>>>  #include <asm/msr.h>
>>>  #include <asm/xstate.h>
>>>  #include <asm/hvm/hvm.h>
>>> +#include <asm/hvm/nestedhvm.h>
>>
>> Going through the changes below I can't see why you need this,
>> and it also looks pretty far off for a PVH-related change.
>
> It's needed for the HVM_CR4_RESERVED_BITS macro, which calls
> nestedhvm_enabled(). I couldn't figure out a clean way to include
> nestedhvm.h from hvm.h (ran into weird compilation errors), so I just
> gave up and added it here.

Fine then for now, albeit the proper change would be to move the declaration of nestedhvm_enabled() into hvm.h.

Jan
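A minimal sketch of that proper change, assuming nestedhvm_enabled() is an ordinary function so a plain declaration in hvm.h would suffice:

    /* In asm-x86/hvm/hvm.h (sketch): declare the function here so that
     * users of HVM_CR4_RESERVED_BITS need not pull in nestedhvm.h. */
    bool_t nestedhvm_enabled(struct domain *d);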
On 11/12/2013 02:33 PM, Jan Beulich wrote:
>>>> On 11.11.13 at 15:57, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>> @@ -566,7 +581,10 @@ int hvm_domain_initialise(struct domain *d)
>>      d->arch.hvm_domain.io_handler->num_slot = 0;
>>
>>      if ( is_pvh_domain(d) )
>> +    {
>> +        register_portio_handler(d, 0, 0xffffffff, handle_pvh_io);
>
> Nice idea, but I'm afraid it'll need a little more than this long term
> (i.e. I'm fine for this to go in as is, with the second paragraph below
> addressed in some way): hvm_io_intercept() first tries
> dpci_ioport_intercept(), and without a device model I think that's not
> correct. If indeed it isn't, a simple solution might be to just disallow
> the registration of translated I/O port ranges for PVH guests.
>
> The other thing here is the use of 0xffffffff: the parameter is a
> size, not an end of range. And considering that we're talking
> about I/O ports, 0x10000 would seem to be the right value (or,
> if taking the potential wrapping into account, 0x10003).

Oh, right -- I was looking at a trace of the guest, but confusing the values written with the address. Do you just want to change it to 0x10000? (Or 0x10003?)

-George
>>> On 12.11.13 at 17:54, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> On 11/12/2013 02:33 PM, Jan Beulich wrote:
>>>>> On 11.11.13 at 15:57, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>>> @@ -566,7 +581,10 @@ int hvm_domain_initialise(struct domain *d)
>>>      d->arch.hvm_domain.io_handler->num_slot = 0;
>>>
>>>      if ( is_pvh_domain(d) )
>>> +    {
>>> +        register_portio_handler(d, 0, 0xffffffff, handle_pvh_io);
>>
>> Nice idea, but I'm afraid it'll need a little more than this long term
>> (i.e. I'm fine for this to go in as is, with the second paragraph below
>> addressed in some way): hvm_io_intercept() first tries
>> dpci_ioport_intercept(), and without a device model I think that's not
>> correct. If indeed it isn't, a simple solution might be to just disallow
>> the registration of translated I/O port ranges for PVH guests.
>>
>> The other thing here is the use of 0xffffffff: the parameter is a
>> size, not an end of range. And considering that we're talking
>> about I/O ports, 0x10000 would seem to be the right value (or,
>> if taking the potential wrapping into account, 0x10003).
>
> Oh, right -- I was looking at a trace of the guest, but confusing the
> values written with the address. Do you just want to change it to
> 0x10000? (Or 0x10003?)

Yes, I will (but not today anymore).

Jan