The following patch series adds PMU support in Xen for PV guests. There is
a companion patchset for the Linux kernel. In addition, another set of
changes will be provided (later) for userland perf code.

This version has the following limitations:
* For accurate profiling of dom0/Xen, dom0 VCPUs should be pinned.
* Hypervisor code is only profiled on processors that have running dom0
  VCPUs on them.
* No backtrace support.
* Will fail to load under XSM: we ran out of bits in the permissions vector
  and this needs to be fixed separately.

A few notes that may help reviewing:
* A shared data structure (xenpmu_data_t) between each PV VCPU and the
  hypervisor CPU is used for passing registers' values as well as PMU state
  at the time of the PMU interrupt.
* PMU interrupts are taken by the hypervisor at NMI level for both HVM and
  PV.
* The guest's interrupt handler does not read/write PMU MSRs directly.
  Instead, it accesses xenpmu_data_t and flushes it to HW before returning.
* PMU mode is controlled at runtime via
  /sys/hypervisor/pmu/pmu/{pmu_mode,pmu_flags} in addition to the 'vpmu'
  boot option (which is preserved for backward compatibility). The
  following modes are provided:
  * disable: VPMU is off.
  * enable: VPMU is on. Guests can profile themselves; dom0 profiles itself
    and Xen.
  * priv_enable: dom0-only profiling. dom0 collects samples for everyone;
    sampling in guests is suspended.
* The /proc/xen/xensyms file exports the hypervisor's symbols to dom0
  (similar to /proc/kallsyms).
* The VPMU infrastructure is now used for both HVM and PV and has therefore
  been moved up from the hvm subtree.

Boris Ostrovsky (13):
  Export hypervisor symbols
  Set VCPU's is_running flag closer to when the VCPU is dispatched
  x86/PMU: Stop AMD counters when called from vpmu_save_force()
  x86/VPMU: Minor VPMU cleanup
  intel/VPMU: Clean up Intel VPMU code
  x86/PMU: Add public xenpmu.h
  x86/PMU: Make vpmu not HVM-specific
  x86/PMU: Interface for setting PMU mode and flags
  x86/PMU: Initialize PMU for PV guests
  x86/PMU: Add support for PMU registers handling on PV guests
  x86/PMU: Handle PMU interrupts for PV guests
  x86/PMU: Save VPMU state for PV guests during context switch
  x86/PMU: Move vpmu files up from hvm directory

 xen/arch/x86/Makefile                          |   9 +-
 xen/arch/x86/apic.c                            |  13 -
 xen/arch/x86/domain.c                          |  18 +-
 xen/arch/x86/hvm/Makefile                      |   1 -
 xen/arch/x86/hvm/svm/Makefile                  |   1 -
 xen/arch/x86/hvm/svm/entry.S                   |   2 +
 xen/arch/x86/hvm/svm/vpmu.c                    | 494 ------------
 xen/arch/x86/hvm/vmx/Makefile                  |   1 -
 xen/arch/x86/hvm/vmx/entry.S                   |   1 +
 xen/arch/x86/hvm/vmx/vmcs.c                    |  59 ++
 xen/arch/x86/hvm/vmx/vpmu_core2.c              | 894 -----------------
 xen/arch/x86/hvm/vpmu.c                        | 266 -------
 xen/arch/x86/oprofile/op_model_ppro.c          |   3 +-
 xen/arch/x86/platform_hypercall.c              |  14 +
 xen/arch/x86/traps.c                           |  38 +-
 xen/arch/x86/vpmu.c                            | 545 ++++++++++++++
 xen/arch/x86/vpmu_amd.c                        | 486 +++++++++++++
 xen/arch/x86/vpmu_intel.c                      | 938 +++++++++++++++++++++++++
 xen/arch/x86/x86_64/asm-offsets.c              |   1 +
 xen/arch/x86/x86_64/compat/entry.S             |   4 +
 xen/arch/x86/x86_64/entry.S                    |   4 +
 xen/arch/x86/x86_64/platform_hypercall.c       |   2 +-
 xen/common/event_channel.c                     |   1 +
 xen/common/schedule.c                          |  10 +-
 xen/common/symbols-dummy.c                     |   1 +
 xen/common/symbols.c                           |  78 +-
 xen/include/asm-x86/domain.h                   |   3 +
 xen/include/asm-x86/hvm/vcpu.h                 |   3 -
 xen/include/asm-x86/hvm/vmx/vmcs.h             |   3 +-
 xen/include/asm-x86/hvm/vmx/vpmu_core2.h       |  51 --
 xen/include/asm-x86/hvm/vpmu.h                 | 104 ---
 xen/include/asm-x86/irq.h                      |   1 -
 xen/include/asm-x86/mach-default/irq_vectors.h |   1 -
 xen/include/asm-x86/vpmu.h                     |  97 +++
 xen/include/public/platform.h                  |  21 +
 xen/include/public/xen.h                       |   2 +
 xen/include/public/xenpmu.h                    | 138 ++++
 xen/include/xen/hypercall.h                    |   4 +
 xen/include/xen/softirq.h                      |   1 +
 xen/include/xen/symbols.h                      |   4 +
 xen/tools/symbols.c                            |   4 +
 41 files changed, 2465 insertions(+), 1856 deletions(-)
 delete mode 100644 xen/arch/x86/hvm/svm/vpmu.c
 delete mode 100644 xen/arch/x86/hvm/vmx/vpmu_core2.c
 delete mode 100644 xen/arch/x86/hvm/vpmu.c
 create mode 100644 xen/arch/x86/vpmu.c
 create mode 100644 xen/arch/x86/vpmu_amd.c
 create mode 100644 xen/arch/x86/vpmu_intel.c
 delete mode 100644 xen/include/asm-x86/hvm/vmx/vpmu_core2.h
 delete mode 100644 xen/include/asm-x86/hvm/vpmu.h
 create mode 100644 xen/include/asm-x86/vpmu.h
 create mode 100644 xen/include/public/xenpmu.h

-- 
1.8.1.4
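To make the runtime control described in the cover letter concrete, here is
a minimal dom0 user-space sketch that switches the mode through the sysfs
node mentioned above. The sysfs plumbing itself lives in the companion Linux
patches (not part of this series), so the exact strings accepted by pmu_mode
are an assumption based on the mode names listed here (disable, enable,
priv_enable).

/* Minimal sketch, not part of this series: switch the system-wide PMU
 * mode from dom0 by writing to the sysfs node exposed by the companion
 * Linux patches.  The accepted strings are assumed from the mode names
 * given in the cover letter. */
#include <stdio.h>

static int set_pmu_mode(const char *mode)
{
    FILE *f = fopen("/sys/hypervisor/pmu/pmu/pmu_mode", "w");
    int ret;

    if ( !f )
        return -1;
    ret = (fputs(mode, f) == EOF) ? -1 : 0;
    if ( fclose(f) )
        ret = -1;
    return ret;
}

int main(void)
{
    /* Have dom0 collect samples for everyone (guest sampling suspended). */
    return set_pmu_mode("priv_enable") ? 1 : 0;
}

An equivalent "echo priv_enable > /sys/hypervisor/pmu/pmu/pmu_mode" from a
shell would do the same thing.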
Export Xen''s symbols in format similar to Linux'' /proc/kallsyms. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> --- xen/arch/x86/Makefile | 8 ++-- xen/arch/x86/platform_hypercall.c | 14 ++++++ xen/arch/x86/x86_64/platform_hypercall.c | 2 +- xen/common/symbols-dummy.c | 1 + xen/common/symbols.c | 78 ++++++++++++++++++++++++++++++-- xen/include/public/platform.h | 21 +++++++++ xen/include/xen/symbols.h | 4 ++ xen/tools/symbols.c | 4 ++ 8 files changed, 123 insertions(+), 9 deletions(-) diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile index d502bdf..a27ac44 100644 --- a/xen/arch/x86/Makefile +++ b/xen/arch/x86/Makefile @@ -102,11 +102,11 @@ $(BASEDIR)/common/symbols-dummy.o: $(TARGET)-syms: prelink.o xen.lds $(BASEDIR)/common/symbols-dummy.o $(LD) $(LDFLAGS) -T xen.lds -N prelink.o \ $(BASEDIR)/common/symbols-dummy.o -o $(@D)/.$(@F).0 - $(NM) -n $(@D)/.$(@F).0 | $(BASEDIR)/tools/symbols >$(@D)/.$(@F).0.S + $(NM) -n $(@D)/.$(@F).0 | $(BASEDIR)/tools/symbols --all-symbols >$(@D)/.$(@F).0.S $(MAKE) -f $(BASEDIR)/Rules.mk $(@D)/.$(@F).0.o $(LD) $(LDFLAGS) -T xen.lds -N prelink.o \ $(@D)/.$(@F).0.o -o $(@D)/.$(@F).1 - $(NM) -n $(@D)/.$(@F).1 | $(BASEDIR)/tools/symbols >$(@D)/.$(@F).1.S + $(NM) -n $(@D)/.$(@F).1 | $(BASEDIR)/tools/symbols --all-symbols >$(@D)/.$(@F).1.S $(MAKE) -f $(BASEDIR)/Rules.mk $(@D)/.$(@F).1.o $(LD) $(LDFLAGS) -T xen.lds -N prelink.o \ $(@D)/.$(@F).1.o -o $@ @@ -129,13 +129,13 @@ $(TARGET).efi: prelink-efi.o efi.lds efi/relocs-dummy.o $(BASEDIR)/common/symbol $(guard) $(LD) $(call EFI_LDFLAGS,$(base)) -T efi.lds -N $< efi/relocs-dummy.o \ $(BASEDIR)/common/symbols-dummy.o -o $(@D)/.$(@F).$(base).0 &&) : $(guard) efi/mkreloc $(foreach base,$(VIRT_BASE) $(ALT_BASE),$(@D)/.$(@F).$(base).0) >$(@D)/.$(@F).0r.S - $(guard) $(NM) -n $(@D)/.$(@F).$(VIRT_BASE).0 | $(guard) $(BASEDIR)/tools/symbols >$(@D)/.$(@F).0s.S + $(guard) $(NM) -n $(@D)/.$(@F).$(VIRT_BASE).0 | $(guard) $(BASEDIR)/tools/symbols --all-symbols >$(@D)/.$(@F).0s.S $(guard) $(MAKE) -f $(BASEDIR)/Rules.mk $(@D)/.$(@F).0r.o $(@D)/.$(@F).0s.o $(foreach base, $(VIRT_BASE) $(ALT_BASE), \ $(guard) $(LD) $(call EFI_LDFLAGS,$(base)) -T efi.lds -N $< \ $(@D)/.$(@F).0r.o $(@D)/.$(@F).0s.o -o $(@D)/.$(@F).$(base).1 &&) : $(guard) efi/mkreloc $(foreach base,$(VIRT_BASE) $(ALT_BASE),$(@D)/.$(@F).$(base).1) >$(@D)/.$(@F).1r.S - $(guard) $(NM) -n $(@D)/.$(@F).$(VIRT_BASE).1 | $(guard) $(BASEDIR)/tools/symbols >$(@D)/.$(@F).1s.S + $(guard) $(NM) -n $(@D)/.$(@F).$(VIRT_BASE).1 | $(guard) $(BASEDIR)/tools/symbols --all-symbols >$(@D)/.$(@F).1s.S $(guard) $(MAKE) -f $(BASEDIR)/Rules.mk $(@D)/.$(@F).1r.o $(@D)/.$(@F).1s.o $(guard) $(LD) $(call EFI_LDFLAGS,$(VIRT_BASE)) -T efi.lds -N $< \ $(@D)/.$(@F).1r.o $(@D)/.$(@F).1s.o -o $@ diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c index 7175a82..492dc98 100644 --- a/xen/arch/x86/platform_hypercall.c +++ b/xen/arch/x86/platform_hypercall.c @@ -23,6 +23,7 @@ #include <xen/cpu.h> #include <xen/pmstat.h> #include <xen/irq.h> +#include <xen/symbols.h> #include <asm/current.h> #include <public/platform.h> #include <acpi/cpufreq/processor_perf.h> @@ -597,6 +598,19 @@ ret_t do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op) } break; + case XENPF_get_symbols: + { + XEN_GUEST_HANDLE(char) gbuf; + + /* Buffer that holds the symbols */ + guest_from_compat_handle(gbuf, op->u.symdata.buf); + + ret = xensyms_read(guest_handle_to_param(gbuf, char), &op->u.symdata); + if ( ret >= 0 && __copy_field_to_guest(u_xenpf_op, op, u.symdata) 
) + ret = -EFAULT; + } + break; + default: ret = -ENOSYS; break; diff --git a/xen/arch/x86/x86_64/platform_hypercall.c b/xen/arch/x86/x86_64/platform_hypercall.c index aa2ad54..9ef705a 100644 --- a/xen/arch/x86/x86_64/platform_hypercall.c +++ b/xen/arch/x86/x86_64/platform_hypercall.c @@ -35,7 +35,7 @@ CHECK_pf_pcpu_version; #undef xen_pf_pcpu_version #define xenpf_enter_acpi_sleep compat_pf_enter_acpi_sleep - +#define xenpf_symdata compat_pf_symdata #define COMPAT #define _XEN_GUEST_HANDLE(t) XEN_GUEST_HANDLE(t) #define _XEN_GUEST_HANDLE_PARAM(t) XEN_GUEST_HANDLE_PARAM(t) diff --git a/xen/common/symbols-dummy.c b/xen/common/symbols-dummy.c index 5090c3b..52a86c7 100644 --- a/xen/common/symbols-dummy.c +++ b/xen/common/symbols-dummy.c @@ -12,6 +12,7 @@ const unsigned int symbols_offsets[1]; const unsigned long symbols_addresses[1]; #endif const unsigned int symbols_num_syms; +const unsigned long symbols_names_bytes; const u8 symbols_names[1]; const u8 symbols_token_table[1]; diff --git a/xen/common/symbols.c b/xen/common/symbols.c index 83b2b58..85d90b0 100644 --- a/xen/common/symbols.c +++ b/xen/common/symbols.c @@ -17,6 +17,8 @@ #include <xen/lib.h> #include <xen/string.h> #include <xen/spinlock.h> +#include <public/platform.h> +#include <xen/guest_access.h> #ifdef SYMBOLS_ORIGIN extern const unsigned int symbols_offsets[1]; @@ -26,6 +28,7 @@ extern const unsigned long symbols_addresses[]; #define symbols_address(n) symbols_addresses[n] #endif extern const unsigned int symbols_num_syms; +extern const unsigned long symbols_names_bytes; extern const u8 symbols_names[]; extern const u8 symbols_token_table[]; @@ -110,10 +113,7 @@ const char *symbols_lookup(unsigned long addr, namebuf[KSYM_NAME_LEN] = 0; namebuf[0] = 0; - if (!is_active_kernel_text(addr)) - return NULL; - - /* do a binary search on the sorted symbols_addresses array */ + /* do a binary search on the sorted symbols_addresses array */ low = 0; high = symbols_num_syms; @@ -174,3 +174,73 @@ void __print_symbol(const char *fmt, unsigned long address) spin_unlock_irqrestore(&lock, flags); } + +/* + * Get symbol type information. This is encoded as a single char at the + * beginning of the symbol name. + */ +static char symbols_get_symbol_type(unsigned int off) +{ + /* + * Get just the first code, look it up in the token table, + * and return the first char from this token. 
+ */ + return symbols_token_table[symbols_token_index[symbols_names[off + 1]]]; +} + +/* + * Returns XENSYMS_SZ bytes worth of symbols to dom0 + */ +int xensyms_read(XEN_GUEST_HANDLE_PARAM(char) gbuf, + struct xenpf_symdata *symdata) +{ + int ret, len = 0, str_len; + unsigned long next_off; + char namebuf[KSYM_NAME_LEN + 1]; + char type; + char *buf; + + buf = xzalloc_bytes(XENSYMS_SZ); + if ( !buf ) + return -ENOMEM; + + if ( symdata->xen_symnum > symbols_num_syms || + symdata->xen_offset > symbols_names_bytes ) + return -EINVAL; + + /* + * Go symbol by symbol, until either reach end of symbol table or fill + * whole XENSYMS_SZ worth of buffer + */ + while ( 1 ) + { + type = symbols_get_symbol_type(symdata->xen_offset); + next_off = symbols_expand_symbol(symdata->xen_offset, namebuf); + + if ( namebuf[0] == ''\0'' ) + break; + + /* "%016lx %c %s\n" */ + str_len = 16 + 1 + 1 + 1 + strlen(namebuf) + 1 + 1; + + if ( len + str_len >= XENSYMS_SZ ) + break; + + snprintf(&buf[len], str_len, "%016lx %c %s\n", + symbols_offsets[symdata->xen_symnum] + SYMBOLS_ORIGIN, + type, namebuf); + + len += str_len; + symdata->xen_offset = next_off; + symdata->xen_symnum++; + } + + ret = copy_to_guest(gbuf, buf, XENSYMS_SZ) ? -EFAULT : 0; + + xfree(buf); + + if ( ret == 0 ) + return len; + else + return ret; +} diff --git a/xen/include/public/platform.h b/xen/include/public/platform.h index 4341f54..f1782be 100644 --- a/xen/include/public/platform.h +++ b/xen/include/public/platform.h @@ -527,6 +527,26 @@ struct xenpf_core_parking { typedef struct xenpf_core_parking xenpf_core_parking_t; DEFINE_XEN_GUEST_HANDLE(xenpf_core_parking_t); +#define XENPF_get_symbols 61 + +#define XENSYMS_SZ 4096 +struct xenpf_symdata { + /* + * offset into Xen''s symbol data and symbol number from + * last call. Used only by Xen. + */ + uint64_t xen_offset; + uint64_t xen_symnum; + + /* + * Symbols data, formatted similar to /proc/kallsyms: + * <address> <type> <name> + */ + XEN_GUEST_HANDLE(char) buf; +}; +typedef struct xenpf_symdata xenpf_symdata_t; +DEFINE_XEN_GUEST_HANDLE(xenpf_symdata_t); + /* * ` enum neg_errnoval * ` HYPERVISOR_platform_op(const struct xen_platform_op*); @@ -553,6 +573,7 @@ struct xen_platform_op { struct xenpf_cpu_hotadd cpu_add; struct xenpf_mem_hotadd mem_add; struct xenpf_core_parking core_parking; + struct xenpf_symdata symdata; uint8_t pad[128]; } u; }; diff --git a/xen/include/xen/symbols.h b/xen/include/xen/symbols.h index 37cf6bf..a0e36d0 100644 --- a/xen/include/xen/symbols.h +++ b/xen/include/xen/symbols.h @@ -2,6 +2,8 @@ #define _XEN_SYMBOLS_H #include <xen/types.h> +#include <public/xen.h> +#include <public/platform.h> #define KSYM_NAME_LEN 127 @@ -34,4 +36,6 @@ do { \ __print_symbol(fmt, addr); \ } while(0) +extern int xensyms_read(XEN_GUEST_HANDLE_PARAM(char) gbuf, struct xenpf_symdata *symdata); + #endif /*_XEN_SYMBOLS_H*/ diff --git a/xen/tools/symbols.c b/xen/tools/symbols.c index f39c906..818204d 100644 --- a/xen/tools/symbols.c +++ b/xen/tools/symbols.c @@ -272,6 +272,10 @@ static void write_src(void) } printf("\n"); + output_label("symbols_names_bytes"); + printf("\t.long\t%d\n", off); + printf("\n"); + output_label("symbols_markers"); for (i = 0; i < ((table_cnt + 255) >> 8); i++) printf("\t.long\t%d\n", markers[i]); -- 1.8.1.4
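For reviewers who want to see what a consumer of the new interface looks
like, below is an illustrative parser (not part of the series) for one
XENSYMS_SZ buffer filled by xensyms_read(). It relies only on the
"%016lx %c %s\n" record layout produced above; how the buffer actually
reaches user space (via XENPF_get_symbols and, eventually, the
/proc/xen/xensyms file from the companion Linux patches) is outside this
patch.

/* Illustrative only: walk the kallsyms-style records in a buffer filled
 * by xensyms_read().  Each record is "<address> <type> <name>\n" and the
 * unused tail of the 4096-byte buffer is zeroed. */
#include <stdio.h>
#include <string.h>

static void parse_xensyms(const char *buf, size_t len)
{
    const char *p = buf;
    unsigned long addr;
    char type, name[128];

    while ( p < buf + len && *p != '\0' &&
            sscanf(p, "%lx %c %127s", &addr, &type, name) == 3 )
    {
        printf("symbol %s of type '%c' at %#lx\n", name, type, addr);
        p = strchr(p, '\n');
        if ( !p )
            break;
        p++;
    }
}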
Boris Ostrovsky
2013-Sep-10 15:20 UTC
[PATCH v1 02/13] Set VCPU's is_running flag closer to when the VCPU is dispatched
An interrupt handler happening during new VCPU scheduling may want to know who was on the (physical) processor at the point of the interrupt. Just looking at ''current'' may not be accurate since there is a window of time when ''current'' points to new VCPU and its is_running flag is set but the VCPU has not been dispatched yet. More importantly, on Intel processors, if the handler wants to examine certain state of an HVM VCPU (such as segment registers) the VMCS pointer is not set yet. This patch will move setting the is_running flag to a later point. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> --- xen/arch/x86/domain.c | 1 + xen/arch/x86/hvm/svm/entry.S | 2 ++ xen/arch/x86/hvm/vmx/entry.S | 1 + xen/arch/x86/x86_64/asm-offsets.c | 1 + xen/common/schedule.c | 10 ++++++++-- 5 files changed, 13 insertions(+), 2 deletions(-) I am not particularly happy about changes to common/schedule.c. I could define an arch-specific macro in an include file but I don''t see a good place to do this. Perhaps someone could suggest a better solution. Or maybe the ifdef is not needed at all (it was added in case something breaks on ARM). diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index 874742c..e119d7b 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -142,6 +142,7 @@ static void continue_nonidle_domain(struct vcpu *v) { check_wakeup_from_wait(); mark_regs_dirty(guest_cpu_user_regs()); + v->is_running = 1; reset_stack_and_jump(ret_from_intr); } diff --git a/xen/arch/x86/hvm/svm/entry.S b/xen/arch/x86/hvm/svm/entry.S index 1969629..728e773 100644 --- a/xen/arch/x86/hvm/svm/entry.S +++ b/xen/arch/x86/hvm/svm/entry.S @@ -74,6 +74,8 @@ UNLIKELY_END(svm_trace) mov VCPU_svm_vmcb_pa(%rbx),%rax + movb $1,VCPU_is_running(%rbx) + pop %r15 pop %r14 pop %r13 diff --git a/xen/arch/x86/hvm/vmx/entry.S b/xen/arch/x86/hvm/vmx/entry.S index 496a62c..9e33f45 100644 --- a/xen/arch/x86/hvm/vmx/entry.S +++ b/xen/arch/x86/hvm/vmx/entry.S @@ -125,6 +125,7 @@ UNLIKELY_END(realmode) mov $GUEST_RFLAGS,%eax VMWRITE(UREGS_eflags) + movb $1,VCPU_is_running(%rbx) cmpb $0,VCPU_vmx_launched(%rbx) pop %r15 pop %r14 diff --git a/xen/arch/x86/x86_64/asm-offsets.c b/xen/arch/x86/x86_64/asm-offsets.c index b0098b3..9fa06c0 100644 --- a/xen/arch/x86/x86_64/asm-offsets.c +++ b/xen/arch/x86/x86_64/asm-offsets.c @@ -86,6 +86,7 @@ void __dummy__(void) OFFSET(VCPU_kernel_sp, struct vcpu, arch.pv_vcpu.kernel_sp); OFFSET(VCPU_kernel_ss, struct vcpu, arch.pv_vcpu.kernel_ss); OFFSET(VCPU_guest_context_flags, struct vcpu, arch.vgc_flags); + OFFSET(VCPU_is_running, struct vcpu, is_running); OFFSET(VCPU_nmi_pending, struct vcpu, nmi_pending); OFFSET(VCPU_mce_pending, struct vcpu, mce_pending); OFFSET(VCPU_nmi_old_mask, struct vcpu, nmi_state.old_mask); diff --git a/xen/common/schedule.c b/xen/common/schedule.c index a8398bd..af3edbc 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -1219,8 +1219,14 @@ static void schedule(void) * switch, else lost_records resume will not work properly. */ - ASSERT(!next->is_running); - next->is_running = 1; +#ifdef CONFIG_X86 + if ( is_idle_vcpu(next) ) + /* On x86 guests will set is_running right before they start running. */ +#endif + { + ASSERT(!next->is_running); + next->is_running = 1; + } pcpu_schedule_unlock_irq(cpu); -- 1.8.1.4
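The race being closed here is easier to see in isolation. The toy program
below is not Xen code; it only models the ordering problem described in the
commit message: if the "running" flag is published before the dispatch path
has actually loaded the guest state (e.g. the VMCS pointer), an NMI-level
PMU handler can be fooled.

#include <stdbool.h>

struct toy_vcpu {
    bool is_running;
    bool state_loaded;    /* stands in for "VMCS loaded, segments valid" */
};

/* Old ordering: schedule() sets the flag, state is loaded later.  An NMI
 * taken between the two lines sees is_running with unusable state. */
static void old_dispatch(struct toy_vcpu *v)
{
    v->is_running = true;
    v->state_loaded = true;
}

/* New ordering (this patch): the entry paths set is_running last, so the
 * flag now implies the state an interrupt handler wants is in place. */
static void new_dispatch(struct toy_vcpu *v)
{
    v->state_loaded = true;
    v->is_running = true;
}

int main(void)
{
    struct toy_vcpu v = { false, false };

    new_dispatch(&v);
    old_dispatch(&v);   /* referenced only so both variants compile */
    return !(v.is_running && v.state_loaded);
}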
Boris Ostrovsky
2013-Sep-10 15:21 UTC
[PATCH v1 03/13] x86/PMU: Stop AMD counters when called from vpmu_save_force()
Change amd_vpmu_save() algorithm to accommodate cases when we need to stop counters from vpmu_save_force() (needed by subsequent PMU patches). Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- xen/arch/x86/hvm/svm/vpmu.c | 14 ++++---------- xen/arch/x86/hvm/vpmu.c | 12 ++++++------ 2 files changed, 10 insertions(+), 16 deletions(-) diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c index 4d1fbc8..5d9c3f5 100644 --- a/xen/arch/x86/hvm/svm/vpmu.c +++ b/xen/arch/x86/hvm/svm/vpmu.c @@ -223,22 +223,16 @@ static int amd_vpmu_save(struct vcpu *v) struct amd_vpmu_context *ctx = vpmu->context; unsigned int i; - /* - * Stop the counters. If we came here via vpmu_save_force (i.e. - * when VPMU_CONTEXT_SAVE is set) counters are already stopped. - */ - if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_SAVE) ) + if ( !vpmu_is_set(vpmu, VPMU_FROZEN) ) { - vpmu_set(vpmu, VPMU_FROZEN); - for ( i = 0; i < num_counters; i++ ) wrmsrl(ctrls[i], 0); - return 0; + vpmu_set(vpmu, VPMU_FROZEN); } - if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) ) - return 0; + if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_SAVE) ) + return 0; context_save(v); diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c index 21fbaba..a4e3664 100644 --- a/xen/arch/x86/hvm/vpmu.c +++ b/xen/arch/x86/hvm/vpmu.c @@ -127,13 +127,19 @@ static void vpmu_save_force(void *arg) struct vcpu *v = (struct vcpu *)arg; struct vpmu_struct *vpmu = vcpu_vpmu(v); + if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) + return; + if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) ) return; + vpmu_set(vpmu, VPMU_CONTEXT_SAVE); + if ( vpmu->arch_vpmu_ops ) (void)vpmu->arch_vpmu_ops->arch_vpmu_save(v); vpmu_reset(vpmu, VPMU_CONTEXT_SAVE); + vpmu_reset(vpmu, VPMU_CONTEXT_LOADED); per_cpu(last_vcpu, smp_processor_id()) = NULL; } @@ -177,12 +183,8 @@ void vpmu_load(struct vcpu *v) * before saving the context. */ if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) ) - { - vpmu_set(vpmu, VPMU_CONTEXT_SAVE); on_selected_cpus(cpumask_of(vpmu->last_pcpu), vpmu_save_force, (void *)v, 1); - vpmu_reset(vpmu, VPMU_CONTEXT_LOADED); - } } /* Prevent forced context save from remote CPU */ @@ -195,9 +197,7 @@ void vpmu_load(struct vcpu *v) vpmu = vcpu_vpmu(prev); /* Someone ran here before us */ - vpmu_set(vpmu, VPMU_CONTEXT_SAVE); vpmu_save_force(prev); - vpmu_reset(vpmu, VPMU_CONTEXT_LOADED); vpmu = vcpu_vpmu(v); } -- 1.8.1.4
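As a cross-check of the new control flow, here is a toy model (not the
hypervisor code) of the reworked save path: counters are stopped at most
once per save cycle, guarded by VPMU_FROZEN, and the register context is
written out only when VPMU_CONTEXT_SAVE was requested, which
vpmu_save_force() now sets itself.

#include <stdbool.h>
#include <stdio.h>

struct toy_vpmu {
    bool frozen;        /* VPMU_FROZEN       */
    bool context_save;  /* VPMU_CONTEXT_SAVE */
};

/* Mirrors the reworked amd_vpmu_save(): stop counters once, then save the
 * context only if a full save was asked for. */
static int toy_amd_vpmu_save(struct toy_vpmu *v)
{
    if ( !v->frozen )
    {
        printf("stop counters (wrmsrl(ctrls[i], 0))\n");
        v->frozen = true;
    }

    if ( !v->context_save )
        return 0;              /* lazy path: leave context in hardware */

    printf("context_save()\n");
    return 1;
}

int main(void)
{
    struct toy_vpmu v = { false, false };

    toy_amd_vpmu_save(&v);     /* context-switch path: counters frozen only */
    v.context_save = true;     /* vpmu_save_force() sets CONTEXT_SAVE ...   */
    toy_amd_vpmu_save(&v);     /* ... and the second call saves the context */
    return 0;
}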
Update macros that modify VPMU flags to allow changing multiple bits at once. Make sure that we only touch MSR bitmap on HVM guests (both VMX and SVM). This is needed by subsequent PMU patches. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- xen/arch/x86/hvm/svm/vpmu.c | 14 +++++++++----- xen/arch/x86/hvm/vmx/vpmu_core2.c | 9 +++------ xen/arch/x86/hvm/vpmu.c | 11 +++-------- xen/include/asm-x86/hvm/vpmu.h | 9 +++++---- 4 files changed, 20 insertions(+), 23 deletions(-) diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c index 5d9c3f5..a09930e 100644 --- a/xen/arch/x86/hvm/svm/vpmu.c +++ b/xen/arch/x86/hvm/svm/vpmu.c @@ -236,7 +236,8 @@ static int amd_vpmu_save(struct vcpu *v) context_save(v); - if ( !vpmu_is_set(vpmu, VPMU_RUNNING) && ctx->msr_bitmap_set ) + if ( is_hvm_domain(v->domain) && + !vpmu_is_set(vpmu, VPMU_RUNNING) && ctx->msr_bitmap_set ) amd_vpmu_unset_msr_bitmap(v); return 1; @@ -276,7 +277,7 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) struct vpmu_struct *vpmu = vcpu_vpmu(v); /* For all counters, enable guest only mode for HVM guest */ - if ( (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) && + if ( is_hvm_domain(v->domain) && (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) && !(is_guest_mode(msr_content)) ) { set_guest_mode(msr_content); @@ -292,7 +293,8 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) apic_write(APIC_LVTPC, PMU_APIC_VECTOR); vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR; - if ( !((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set ) + if ( is_hvm_domain(v->domain) && + !((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set ) amd_vpmu_set_msr_bitmap(v); } @@ -303,7 +305,8 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) apic_write(APIC_LVTPC, PMU_APIC_VECTOR | APIC_LVT_MASKED); vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | APIC_LVT_MASKED; vpmu_reset(vpmu, VPMU_RUNNING); - if ( ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set ) + if ( is_hvm_domain(v->domain) && + ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set ) amd_vpmu_unset_msr_bitmap(v); release_pmu_ownship(PMU_OWNER_HVM); } @@ -395,7 +398,8 @@ static void amd_vpmu_destroy(struct vcpu *v) if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) return; - if ( ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set ) + if ( is_hvm_domain(v->domain) && + ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set ) amd_vpmu_unset_msr_bitmap(v); xfree(vpmu->context); diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c index 15b2036..101888d 100644 --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c +++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c @@ -305,10 +305,7 @@ static int core2_vpmu_save(struct vcpu *v) { struct vpmu_struct *vpmu = vcpu_vpmu(v); - if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_SAVE) ) - return 0; - - if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) ) + if ( !vpmu_is_set_all(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED) ) return 0; __core2_vpmu_save(v); @@ -420,7 +417,7 @@ static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index) { __core2_vpmu_load(current); vpmu_set(vpmu, VPMU_CONTEXT_LOADED); - if ( cpu_has_vmx_msr_bitmap ) + if ( cpu_has_vmx_msr_bitmap && is_hvm_domain(current->domain) ) core2_vpmu_set_msr_bitmap(current->arch.hvm_vmx.msr_bitmap); } return 1; @@ -786,7 +783,7 @@ static void core2_vpmu_destroy(struct vcpu *v) return; xfree(core2_vpmu_cxt->pmu_enable); xfree(vpmu->context); - if ( 
cpu_has_vmx_msr_bitmap ) + if ( cpu_has_vmx_msr_bitmap && is_hvm_domain(v->domain) ) core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap); release_pmu_ownship(PMU_OWNER_HVM); vpmu_reset(vpmu, VPMU_CONTEXT_ALLOCATED); diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c index a4e3664..d6a9ff6 100644 --- a/xen/arch/x86/hvm/vpmu.c +++ b/xen/arch/x86/hvm/vpmu.c @@ -127,10 +127,7 @@ static void vpmu_save_force(void *arg) struct vcpu *v = (struct vcpu *)arg; struct vpmu_struct *vpmu = vcpu_vpmu(v); - if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) - return; - - if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) ) + if ( !vpmu_is_set_all(vpmu, VPMU_CONTEXT_ALLOCATED | VPMU_CONTEXT_LOADED) ) return; vpmu_set(vpmu, VPMU_CONTEXT_SAVE); @@ -138,8 +135,7 @@ static void vpmu_save_force(void *arg) if ( vpmu->arch_vpmu_ops ) (void)vpmu->arch_vpmu_ops->arch_vpmu_save(v); - vpmu_reset(vpmu, VPMU_CONTEXT_SAVE); - vpmu_reset(vpmu, VPMU_CONTEXT_LOADED); + vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED); per_cpu(last_vcpu, smp_processor_id()) = NULL; } @@ -149,8 +145,7 @@ void vpmu_save(struct vcpu *v) struct vpmu_struct *vpmu = vcpu_vpmu(v); int pcpu = smp_processor_id(); - if ( !(vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) && - vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED)) ) + if ( !vpmu_is_set_all(vpmu, VPMU_CONTEXT_ALLOCATED | VPMU_CONTEXT_LOADED) ) return; vpmu->last_pcpu = pcpu; diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h index 03b9462..674cdad 100644 --- a/xen/include/asm-x86/hvm/vpmu.h +++ b/xen/include/asm-x86/hvm/vpmu.h @@ -81,10 +81,11 @@ struct vpmu_struct { #define VPMU_CPU_HAS_BTS 0x200 /* Has Branch Trace Store */ -#define vpmu_set(_vpmu, _x) ((_vpmu)->flags |= (_x)) -#define vpmu_reset(_vpmu, _x) ((_vpmu)->flags &= ~(_x)) -#define vpmu_is_set(_vpmu, _x) ((_vpmu)->flags & (_x)) -#define vpmu_clear(_vpmu) ((_vpmu)->flags = 0) +#define vpmu_set(_vpmu, _x) ((_vpmu)->flags |= (_x)) +#define vpmu_reset(_vpmu, _x) ((_vpmu)->flags &= ~(_x)) +#define vpmu_is_set(_vpmu, _x) ((_vpmu)->flags & (_x)) +#define vpmu_is_set_all(_vpmu, _x) (((_vpmu)->flags & (_x)) == (_x)) +#define vpmu_clear(_vpmu) ((_vpmu)->flags = 0) int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content); int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content); -- 1.8.1.4
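A standalone illustration of the semantic point behind vpmu_is_set_all()
(the flag values below are placeholders, not necessarily those in vpmu.h):
the old idiom of ANDing two vpmu_is_set() calls tested that all bits are
set, whereas a single vpmu_is_set() on an OR'ed mask only tests that at
least one of them is.

#include <assert.h>

#define CONTEXT_ALLOCATED 0x1   /* placeholder values */
#define CONTEXT_LOADED    0x2

#define is_set(flags, x)      ((flags) & (x))
#define is_set_all(flags, x)  (((flags) & (x)) == (x))

int main(void)
{
    unsigned int flags = CONTEXT_ALLOCATED;      /* allocated, not loaded */

    /* "any bit" check passes even though the context is not loaded ... */
    assert(is_set(flags, CONTEXT_ALLOCATED | CONTEXT_LOADED));
    /* ... while the "all bits" check correctly fails. */
    assert(!is_set_all(flags, CONTEXT_ALLOCATED | CONTEXT_LOADED));
    return 0;
}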
Boris Ostrovsky
2013-Sep-10 15:21 UTC
[PATCH v1 05/13] intel/VPMU: Clean up Intel VPMU code
Remove struct pmumsr and convert core2_fix_counters and core2_ctrls into arrays of u32 (MSR offsets). Call core2_get_pmc_count() once, during initialization. Properly clean up when core2_vpmu_alloc_resource() fails and add routines to remove MSRs from VMCS. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> --- xen/arch/x86/hvm/vmx/vmcs.c | 59 +++++++++ xen/arch/x86/hvm/vmx/vpmu_core2.c | 218 ++++++++++++++++--------------- xen/include/asm-x86/hvm/vmx/vmcs.h | 2 + xen/include/asm-x86/hvm/vmx/vpmu_core2.h | 19 --- 4 files changed, 171 insertions(+), 127 deletions(-) diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c index de9f592..756bc13 100644 --- a/xen/arch/x86/hvm/vmx/vmcs.c +++ b/xen/arch/x86/hvm/vmx/vmcs.c @@ -1136,6 +1136,36 @@ int vmx_add_guest_msr(u32 msr) return 0; } +void vmx_rm_guest_msr(u32 msr) +{ + struct vcpu *curr = current; + unsigned int i, idx, msr_count = curr->arch.hvm_vmx.msr_count; + struct vmx_msr_entry *msr_area = curr->arch.hvm_vmx.msr_area; + + if ( msr_area == NULL ) + return; + + for ( idx = 0; idx < msr_count; idx++ ) + if ( msr_area[idx].index == msr ) + break; + + if ( idx == msr_count ) + return; + + for ( i = idx; i < msr_count - 1; i++ ) + { + msr_area[i].index = msr_area[i + 1].index; + rdmsrl(msr_area[i].index, msr_area[i].data); + } + msr_area[msr_count - 1].index = 0; + + curr->arch.hvm_vmx.msr_count = --msr_count; + __vmwrite(VM_EXIT_MSR_STORE_COUNT, msr_count); + __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, msr_count); + + return; +} + int vmx_add_host_load_msr(u32 msr) { struct vcpu *curr = current; @@ -1166,6 +1196,35 @@ int vmx_add_host_load_msr(u32 msr) return 0; } +void vmx_rm_host_load_msr(u32 msr) +{ + struct vcpu *curr = current; + unsigned int i, idx, msr_count = curr->arch.hvm_vmx.host_msr_count; + struct vmx_msr_entry *msr_area = curr->arch.hvm_vmx.host_msr_area; + + if ( msr_area == NULL ) + return; + + for ( idx = 0; idx < msr_count; idx++ ) + if ( msr_area[idx].index == msr ) + break; + + if ( idx == msr_count ) + return; + + for ( i = idx; i < msr_count - 1; i++ ) + { + msr_area[i].index = msr_area[i + 1].index; + rdmsrl(msr_area[i].index, msr_area[i].data); + } + msr_area[msr_count - 1].index = 0; + + curr->arch.hvm_vmx.host_msr_count = --msr_count; + __vmwrite(VM_EXIT_MSR_LOAD_COUNT, msr_count); + + return; +} + void vmx_set_eoi_exit_bitmap(struct vcpu *v, u8 vector) { int index, offset, changed; diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c index 101888d..30a948e 100644 --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c +++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c @@ -64,6 +64,47 @@ #define PMU_FIXED_WIDTH_BITS 8 /* 8 bits 5..12 */ #define PMU_FIXED_WIDTH_MASK (((1 << PMU_FIXED_WIDTH_BITS) -1) << PMU_FIXED_WIDTH_SHIFT) +static const u32 core2_fix_counters_msr[] = { + MSR_CORE_PERF_FIXED_CTR0, + MSR_CORE_PERF_FIXED_CTR1, + MSR_CORE_PERF_FIXED_CTR2 +}; +#define VPMU_CORE2_NUM_FIXED (sizeof(core2_fix_counters_msr) / sizeof(u32)) + +/* + * MSR_CORE_PERF_FIXED_CTR_CTRL contains the configuration of all fixed + * counters. 4 bits for every counter. + */ +#define FIXED_CTR_CTRL_BITS 4 +#define FIXED_CTR_CTRL_MASK ((1 << FIXED_CTR_CTRL_BITS) - 1) + +/* The index into the core2_ctrls_msr[] of this MSR used in core2_vpmu_dump() */ +#define MSR_CORE_PERF_FIXED_CTR_CTRL_IDX 0 + +/* Core 2 Non-architectual Performance Control MSRs. 
*/ +static const u32 core2_ctrls_msr[] = { + MSR_CORE_PERF_FIXED_CTR_CTRL, + MSR_IA32_PEBS_ENABLE, + MSR_IA32_DS_AREA +}; +#define VPMU_CORE2_NUM_CTRLS (sizeof(core2_ctrls_msr) / sizeof(u32)) + +struct core2_pmu_enable { + char ds_area_enable; + char fixed_ctr_enable[VPMU_CORE2_NUM_FIXED]; + char arch_pmc_enable[1]; +}; + +struct core2_vpmu_context { + struct core2_pmu_enable *pmu_enable; + u64 fix_counters[VPMU_CORE2_NUM_FIXED]; + u64 ctrls[VPMU_CORE2_NUM_CTRLS]; + u64 global_ovf_status; + struct arch_msr_pair arch_msr_pair[1]; +}; + +static int arch_pmc_cnt; /* Number of general-purpose performance counters */ + /* * QUIRK to workaround an issue on various family 6 cpus. * The issue leads to endless PMC interrupt loops on the processor. @@ -84,11 +125,8 @@ static void check_pmc_quirk(void) is_pmc_quirk = 0; } -static int core2_get_pmc_count(void); static void handle_pmc_quirk(u64 msr_content) { - int num_gen_pmc = core2_get_pmc_count(); - int num_fix_pmc = 3; int i; u64 val; @@ -96,7 +134,7 @@ static void handle_pmc_quirk(u64 msr_content) return; val = msr_content; - for ( i = 0; i < num_gen_pmc; i++ ) + for ( i = 0; i < arch_pmc_cnt; i++ ) { if ( val & 0x1 ) { @@ -108,7 +146,7 @@ static void handle_pmc_quirk(u64 msr_content) val >>= 1; } val = msr_content >> 32; - for ( i = 0; i < num_fix_pmc; i++ ) + for ( i = 0; i < VPMU_CORE2_NUM_FIXED; i++ ) { if ( val & 0x1 ) { @@ -121,45 +159,6 @@ static void handle_pmc_quirk(u64 msr_content) } } -static const u32 core2_fix_counters_msr[] = { - MSR_CORE_PERF_FIXED_CTR0, - MSR_CORE_PERF_FIXED_CTR1, - MSR_CORE_PERF_FIXED_CTR2 -}; - -/* - * MSR_CORE_PERF_FIXED_CTR_CTRL contains the configuration of all fixed - * counters. 4 bits for every counter. - */ -#define FIXED_CTR_CTRL_BITS 4 -#define FIXED_CTR_CTRL_MASK ((1 << FIXED_CTR_CTRL_BITS) - 1) - -/* The index into the core2_ctrls_msr[] of this MSR used in core2_vpmu_dump() */ -#define MSR_CORE_PERF_FIXED_CTR_CTRL_IDX 0 - -/* Core 2 Non-architectual Performance Control MSRs. 
*/ -static const u32 core2_ctrls_msr[] = { - MSR_CORE_PERF_FIXED_CTR_CTRL, - MSR_IA32_PEBS_ENABLE, - MSR_IA32_DS_AREA -}; - -struct pmumsr { - unsigned int num; - const u32 *msr; -}; - -static const struct pmumsr core2_fix_counters = { - VPMU_CORE2_NUM_FIXED, - core2_fix_counters_msr -}; - -static const struct pmumsr core2_ctrls = { - VPMU_CORE2_NUM_CTRLS, - core2_ctrls_msr -}; -static int arch_pmc_cnt; - /* * Read the number of general counters via CPUID.EAX[0xa].EAX[8..15] */ @@ -167,19 +166,14 @@ static int core2_get_pmc_count(void) { u32 eax, ebx, ecx, edx; - if ( arch_pmc_cnt == 0 ) - { - cpuid(0xa, &eax, &ebx, &ecx, &edx); - arch_pmc_cnt = (eax & PMU_GENERAL_NR_MASK) >> PMU_GENERAL_NR_SHIFT; - } - - return arch_pmc_cnt; + cpuid(0xa, &eax, &ebx, &ecx, &edx); + return ( (eax & PMU_GENERAL_NR_MASK) >> PMU_GENERAL_NR_SHIFT ); } static u64 core2_calc_intial_glb_ctrl_msr(void) { - int arch_pmc_bits = (1 << core2_get_pmc_count()) - 1; - u64 fix_pmc_bits = (1 << 3) - 1; + int arch_pmc_bits = (1 << arch_pmc_cnt) - 1; + u64 fix_pmc_bits = (1 << VPMU_CORE2_NUM_FIXED) - 1; return ((fix_pmc_bits << 32) | arch_pmc_bits); } @@ -196,9 +190,9 @@ static int is_core2_vpmu_msr(u32 msr_index, int *type, int *index) { int i; - for ( i = 0; i < core2_fix_counters.num; i++ ) + for ( i = 0; i < VPMU_CORE2_NUM_FIXED; i++ ) { - if ( core2_fix_counters.msr[i] == msr_index ) + if ( core2_fix_counters_msr[i] == msr_index ) { *type = MSR_TYPE_COUNTER; *index = i; @@ -206,9 +200,9 @@ static int is_core2_vpmu_msr(u32 msr_index, int *type, int *index) } } - for ( i = 0; i < core2_ctrls.num; i++ ) + for ( i = 0; i < VPMU_CORE2_NUM_CTRLS; i++ ) { - if ( core2_ctrls.msr[i] == msr_index ) + if ( core2_ctrls_msr[i] == msr_index ) { *type = MSR_TYPE_CTRL; *index = i; @@ -225,7 +219,7 @@ static int is_core2_vpmu_msr(u32 msr_index, int *type, int *index) } if ( (msr_index >= MSR_IA32_PERFCTR0) && - (msr_index < (MSR_IA32_PERFCTR0 + core2_get_pmc_count())) ) + (msr_index < (MSR_IA32_PERFCTR0 + arch_pmc_cnt)) ) { *type = MSR_TYPE_ARCH_COUNTER; *index = msr_index - MSR_IA32_PERFCTR0; @@ -233,7 +227,7 @@ static int is_core2_vpmu_msr(u32 msr_index, int *type, int *index) } if ( (msr_index >= MSR_P6_EVNTSEL0) && - (msr_index < (MSR_P6_EVNTSEL0 + core2_get_pmc_count())) ) + (msr_index < (MSR_P6_EVNTSEL0 + arch_pmc_cnt)) ) { *type = MSR_TYPE_ARCH_CTRL; *index = msr_index - MSR_P6_EVNTSEL0; @@ -248,13 +242,13 @@ static void core2_vpmu_set_msr_bitmap(unsigned long *msr_bitmap) int i; /* Allow Read/Write PMU Counters MSR Directly. */ - for ( i = 0; i < core2_fix_counters.num; i++ ) + for ( i = 0; i < VPMU_CORE2_NUM_FIXED; i++ ) { - clear_bit(msraddr_to_bitpos(core2_fix_counters.msr[i]), msr_bitmap); - clear_bit(msraddr_to_bitpos(core2_fix_counters.msr[i]), + clear_bit(msraddr_to_bitpos(core2_fix_counters_msr[i]), msr_bitmap); + clear_bit(msraddr_to_bitpos(core2_fix_counters_msr[i]), msr_bitmap + 0x800/BYTES_PER_LONG); } - for ( i = 0; i < core2_get_pmc_count(); i++ ) + for ( i = 0; i < arch_pmc_cnt; i++ ) { clear_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i), msr_bitmap); clear_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i), @@ -262,9 +256,9 @@ static void core2_vpmu_set_msr_bitmap(unsigned long *msr_bitmap) } /* Allow Read PMU Non-global Controls Directly. 
*/ - for ( i = 0; i < core2_ctrls.num; i++ ) - clear_bit(msraddr_to_bitpos(core2_ctrls.msr[i]), msr_bitmap); - for ( i = 0; i < core2_get_pmc_count(); i++ ) + for ( i = 0; i < VPMU_CORE2_NUM_CTRLS; i++ ) + clear_bit(msraddr_to_bitpos(core2_ctrls_msr[i]), msr_bitmap); + for ( i = 0; i < arch_pmc_cnt; i++ ) clear_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL0+i), msr_bitmap); } @@ -272,21 +266,21 @@ static void core2_vpmu_unset_msr_bitmap(unsigned long *msr_bitmap) { int i; - for ( i = 0; i < core2_fix_counters.num; i++ ) + for ( i = 0; i < VPMU_CORE2_NUM_FIXED; i++ ) { - set_bit(msraddr_to_bitpos(core2_fix_counters.msr[i]), msr_bitmap); - set_bit(msraddr_to_bitpos(core2_fix_counters.msr[i]), + set_bit(msraddr_to_bitpos(core2_fix_counters_msr[i]), msr_bitmap); + set_bit(msraddr_to_bitpos(core2_fix_counters_msr[i]), msr_bitmap + 0x800/BYTES_PER_LONG); } - for ( i = 0; i < core2_get_pmc_count(); i++ ) + for ( i = 0; i < arch_pmc_cnt; i++ ) { set_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i), msr_bitmap); set_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i), msr_bitmap + 0x800/BYTES_PER_LONG); } - for ( i = 0; i < core2_ctrls.num; i++ ) - set_bit(msraddr_to_bitpos(core2_ctrls.msr[i]), msr_bitmap); - for ( i = 0; i < core2_get_pmc_count(); i++ ) + for ( i = 0; i < VPMU_CORE2_NUM_CTRLS; i++ ) + set_bit(msraddr_to_bitpos(core2_ctrls_msr[i]), msr_bitmap); + for ( i = 0; i < arch_pmc_cnt; i++ ) set_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL0+i), msr_bitmap); } @@ -295,9 +289,9 @@ static inline void __core2_vpmu_save(struct vcpu *v) int i; struct core2_vpmu_context *core2_vpmu_cxt = vcpu_vpmu(v)->context; - for ( i = 0; i < core2_fix_counters.num; i++ ) - rdmsrl(core2_fix_counters.msr[i], core2_vpmu_cxt->fix_counters[i]); - for ( i = 0; i < core2_get_pmc_count(); i++ ) + for ( i = 0; i < VPMU_CORE2_NUM_FIXED; i++ ) + rdmsrl(core2_fix_counters_msr[i], core2_vpmu_cxt->fix_counters[i]); + for ( i = 0; i < arch_pmc_cnt; i++ ) rdmsrl(MSR_IA32_PERFCTR0+i, core2_vpmu_cxt->arch_msr_pair[i].counter); } @@ -322,14 +316,14 @@ static inline void __core2_vpmu_load(struct vcpu *v) int i; struct core2_vpmu_context *core2_vpmu_cxt = vcpu_vpmu(v)->context; - for ( i = 0; i < core2_fix_counters.num; i++ ) - wrmsrl(core2_fix_counters.msr[i], core2_vpmu_cxt->fix_counters[i]); - for ( i = 0; i < core2_get_pmc_count(); i++ ) + for ( i = 0; i < VPMU_CORE2_NUM_FIXED; i++ ) + wrmsrl(core2_fix_counters_msr[i], core2_vpmu_cxt->fix_counters[i]); + for ( i = 0; i < arch_pmc_cnt; i++ ) wrmsrl(MSR_IA32_PERFCTR0+i, core2_vpmu_cxt->arch_msr_pair[i].counter); - for ( i = 0; i < core2_ctrls.num; i++ ) - wrmsrl(core2_ctrls.msr[i], core2_vpmu_cxt->ctrls[i]); - for ( i = 0; i < core2_get_pmc_count(); i++ ) + for ( i = 0; i < VPMU_CORE2_NUM_CTRLS; i++ ) + wrmsrl(core2_ctrls_msr[i], core2_vpmu_cxt->ctrls[i]); + for ( i = 0; i < arch_pmc_cnt; i++ ) wrmsrl(MSR_P6_EVNTSEL0+i, core2_vpmu_cxt->arch_msr_pair[i].control); } @@ -347,39 +341,46 @@ static int core2_vpmu_alloc_resource(struct vcpu *v) { struct vpmu_struct *vpmu = vcpu_vpmu(v); struct core2_vpmu_context *core2_vpmu_cxt; - struct core2_pmu_enable *pmu_enable; + struct core2_pmu_enable *pmu_enable = NULL; if ( !acquire_pmu_ownership(PMU_OWNER_HVM) ) return 0; wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0); if ( vmx_add_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL) ) - return 0; + goto out_err; if ( vmx_add_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL) ) - return 0; + goto out_err; vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, core2_calc_intial_glb_ctrl_msr()); pmu_enable = xzalloc_bytes(sizeof(struct core2_pmu_enable) + - 
core2_get_pmc_count() - 1); + arch_pmc_cnt - 1); if ( !pmu_enable ) - goto out1; + goto out_err; core2_vpmu_cxt = xzalloc_bytes(sizeof(struct core2_vpmu_context) + - (core2_get_pmc_count()-1)*sizeof(struct arch_msr_pair)); + (arch_pmc_cnt-1)*sizeof(struct arch_msr_pair)); if ( !core2_vpmu_cxt ) - goto out2; + goto out_err; + core2_vpmu_cxt->pmu_enable = pmu_enable; vpmu->context = (void *)core2_vpmu_cxt; + vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED); + return 1; - out2: + +out_err: xfree(pmu_enable); - out1: - gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, PMU feature is " - "unavailable on domain %d vcpu %d.\n", - v->vcpu_id, v->domain->domain_id); + vmx_rm_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL); + vmx_rm_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL); + release_pmu_ownship(PMU_OWNER_HVM); + + printk("Failed to allocate VPMU resources for domain %u vcpu %u\n", + v->vcpu_id, v->domain->domain_id); + return 0; } @@ -407,10 +408,8 @@ static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index) return 0; if ( unlikely(!vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED)) && - (vpmu->context != NULL || - !core2_vpmu_alloc_resource(current)) ) + !core2_vpmu_alloc_resource(current) ) return 0; - vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED); /* Do the lazy load staff. */ if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) ) @@ -490,7 +489,7 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) return 1; case MSR_CORE_PERF_GLOBAL_CTRL: global_ctrl = msr_content; - for ( i = 0; i < core2_get_pmc_count(); i++ ) + for ( i = 0; i < arch_pmc_cnt; i++ ) { rdmsrl(MSR_P6_EVNTSEL0+i, non_global_ctrl); core2_vpmu_cxt->pmu_enable->arch_pmc_enable[i] @@ -500,7 +499,7 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) rdmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, non_global_ctrl); global_ctrl = msr_content >> 32; - for ( i = 0; i < core2_fix_counters.num; i++ ) + for ( i = 0; i < VPMU_CORE2_NUM_FIXED; i++ ) { core2_vpmu_cxt->pmu_enable->fixed_ctr_enable[i] (global_ctrl & 1) & ((non_global_ctrl & 0x3)? 1: 0); @@ -512,7 +511,7 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) non_global_ctrl = msr_content; vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl); global_ctrl >>= 32; - for ( i = 0; i < core2_fix_counters.num; i++ ) + for ( i = 0; i < VPMU_CORE2_NUM_FIXED; i++ ) { core2_vpmu_cxt->pmu_enable->fixed_ctr_enable[i] (global_ctrl & 1) & ((non_global_ctrl & 0x3)? 
1: 0); @@ -523,14 +522,14 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) default: tmp = msr - MSR_P6_EVNTSEL0; vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl); - if ( tmp >= 0 && tmp < core2_get_pmc_count() ) + if ( tmp >= 0 && tmp < arch_pmc_cnt ) core2_vpmu_cxt->pmu_enable->arch_pmc_enable[tmp] (global_ctrl >> tmp) & (msr_content >> 22) & 1; } - for ( i = 0; i < core2_fix_counters.num; i++ ) + for ( i = 0; i < VPMU_CORE2_NUM_FIXED; i++ ) pmu_enable |= core2_vpmu_cxt->pmu_enable->fixed_ctr_enable[i]; - for ( i = 0; i < core2_get_pmc_count(); i++ ) + for ( i = 0; i < arch_pmc_cnt; i++ ) pmu_enable |= core2_vpmu_cxt->pmu_enable->arch_pmc_enable[i]; pmu_enable |= core2_vpmu_cxt->pmu_enable->ds_area_enable; if ( pmu_enable ) @@ -652,7 +651,7 @@ static void core2_vpmu_do_cpuid(unsigned int input, static void core2_vpmu_dump(struct vcpu *v) { struct vpmu_struct *vpmu = vcpu_vpmu(v); - int i, num; + int i; struct core2_vpmu_context *core2_vpmu_cxt = NULL; u64 val; @@ -670,9 +669,9 @@ static void core2_vpmu_dump(struct vcpu *v) printk(" vPMU running\n"); core2_vpmu_cxt = vpmu->context; - num = core2_get_pmc_count(); + /* Print the contents of the counter and its configuration msr. */ - for ( i = 0; i < num; i++ ) + for ( i = 0; i < arch_pmc_cnt; i++ ) { struct arch_msr_pair* msr_pair = core2_vpmu_cxt->arch_msr_pair; if ( core2_vpmu_cxt->pmu_enable->arch_pmc_enable[i] ) @@ -684,7 +683,7 @@ static void core2_vpmu_dump(struct vcpu *v) * MSR_CORE_PERF_FIXED_CTR_CTRL. */ val = core2_vpmu_cxt->ctrls[MSR_CORE_PERF_FIXED_CTR_CTRL_IDX]; - for ( i = 0; i < core2_fix_counters.num; i++ ) + for ( i = 0; i < VPMU_CORE2_NUM_FIXED; i++ ) { if ( core2_vpmu_cxt->pmu_enable->fixed_ctr_enable[i] ) printk(" fixed_%d: 0x%016lx ctrl: 0x%lx\n", @@ -707,7 +706,7 @@ static int core2_vpmu_do_interrupt(struct cpu_user_regs *regs) if ( is_pmc_quirk ) handle_pmc_quirk(msr_content); core2_vpmu_cxt->global_ovf_status |= msr_content; - msr_content = 0xC000000700000000 | ((1 << core2_get_pmc_count()) - 1); + msr_content = 0xC000000700000000 | ((1 << arch_pmc_cnt) - 1); wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content); } else @@ -770,7 +769,10 @@ static int core2_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags) } } func_out: + + arch_pmc_cnt = core2_get_pmc_count(); check_pmc_quirk(); + return 0; } diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h index f30e5ac..5971613 100644 --- a/xen/include/asm-x86/hvm/vmx/vmcs.h +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h @@ -470,7 +470,9 @@ void vmx_enable_intercept_for_msr(struct vcpu *v, u32 msr, int type); int vmx_read_guest_msr(u32 msr, u64 *val); int vmx_write_guest_msr(u32 msr, u64 val); int vmx_add_guest_msr(u32 msr); +void vmx_rm_guest_msr(u32 msr); int vmx_add_host_load_msr(u32 msr); +void vmx_rm_host_load_msr(u32 msr); void vmx_vmcs_switch(struct vmcs_struct *from, struct vmcs_struct *to); void vmx_set_eoi_exit_bitmap(struct vcpu *v, u8 vector); void vmx_clear_eoi_exit_bitmap(struct vcpu *v, u8 vector); diff --git a/xen/include/asm-x86/hvm/vmx/vpmu_core2.h b/xen/include/asm-x86/hvm/vmx/vpmu_core2.h index 60b05fd..410372d 100644 --- a/xen/include/asm-x86/hvm/vmx/vpmu_core2.h +++ b/xen/include/asm-x86/hvm/vmx/vpmu_core2.h @@ -23,29 +23,10 @@ #ifndef __ASM_X86_HVM_VPMU_CORE_H_ #define __ASM_X86_HVM_VPMU_CORE_H_ -/* Currently only 3 fixed counters are supported. 
*/ -#define VPMU_CORE2_NUM_FIXED 3 -/* Currently only 3 Non-architectual Performance Control MSRs */ -#define VPMU_CORE2_NUM_CTRLS 3 - struct arch_msr_pair { u64 counter; u64 control; }; -struct core2_pmu_enable { - char ds_area_enable; - char fixed_ctr_enable[VPMU_CORE2_NUM_FIXED]; - char arch_pmc_enable[1]; -}; - -struct core2_vpmu_context { - struct core2_pmu_enable *pmu_enable; - u64 fix_counters[VPMU_CORE2_NUM_FIXED]; - u64 ctrls[VPMU_CORE2_NUM_CTRLS]; - u64 global_ovf_status; - struct arch_msr_pair arch_msr_pair[1]; -}; - #endif /* __ASM_X86_HVM_VPMU_CORE_H_ */ -- 1.8.1.4
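Since the patch now reads the general-purpose counter count once at
initialization, here is a self-contained user-space equivalent of that
CPUID query (using the compiler's cpuid.h rather than Xen's cpuid()
helper); it is only meant to show which bits core2_get_pmc_count()
extracts.

/* CPUID leaf 0xa: EAX[15:8] = number of general-purpose counters,
 * EDX[4:0]  = number of fixed-function counters. */
#include <cpuid.h>
#include <stdio.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    if ( !__get_cpuid(0xa, &eax, &ebx, &ecx, &edx) )
        return 1;

    printf("arch_pmc_cnt   = %u\n", (eax >> 8) & 0xff);
    printf("fixed counters = %u\n", edx & 0x1f);
    return 0;
}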
Add xenpmu.h header file, move various macros and structures that will be shared between hypervisor and PV guests to it. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> --- xen/arch/x86/hvm/svm/vpmu.c | 12 +--- xen/arch/x86/hvm/vmx/vpmu_core2.c | 43 +++---------- xen/arch/x86/hvm/vpmu.c | 5 +- xen/arch/x86/oprofile/op_model_ppro.c | 1 - xen/include/asm-x86/hvm/vmx/vpmu_core2.h | 32 ---------- xen/include/asm-x86/hvm/vpmu.h | 14 +---- xen/include/public/xenpmu.h | 101 +++++++++++++++++++++++++++++++ 7 files changed, 115 insertions(+), 93 deletions(-) delete mode 100644 xen/include/asm-x86/hvm/vmx/vpmu_core2.h create mode 100644 xen/include/public/xenpmu.h diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c index a09930e..9f9c9ea 100644 --- a/xen/arch/x86/hvm/svm/vpmu.c +++ b/xen/arch/x86/hvm/svm/vpmu.c @@ -30,10 +30,7 @@ #include <asm/apic.h> #include <asm/hvm/vlapic.h> #include <asm/hvm/vpmu.h> - -#define F10H_NUM_COUNTERS 4 -#define F15H_NUM_COUNTERS 6 -#define MAX_NUM_COUNTERS F15H_NUM_COUNTERS +#include <public/xenpmu.h> #define MSR_F10H_EVNTSEL_GO_SHIFT 40 #define MSR_F10H_EVNTSEL_EN_SHIFT 22 @@ -83,13 +80,6 @@ static const u32 AMD_F15H_CTRLS[] = { MSR_AMD_FAM15H_EVNTSEL5 }; -/* storage for context switching */ -struct amd_vpmu_context { - u64 counters[MAX_NUM_COUNTERS]; - u64 ctrls[MAX_NUM_COUNTERS]; - bool_t msr_bitmap_set; -}; - static inline int get_pmu_reg_type(u32 addr) { if ( (addr >= MSR_K7_EVNTSEL0) && (addr <= MSR_K7_EVNTSEL3) ) diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c index 30a948e..f3b6de0 100644 --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c +++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c @@ -35,8 +35,8 @@ #include <asm/hvm/vmx/vmcs.h> #include <public/sched.h> #include <public/hvm/save.h> +#include <public/xenpmu.h> #include <asm/hvm/vpmu.h> -#include <asm/hvm/vmx/vpmu_core2.h> /* * See Intel SDM Vol 2a Instruction Set Reference chapter 3 for CPUID @@ -64,12 +64,10 @@ #define PMU_FIXED_WIDTH_BITS 8 /* 8 bits 5..12 */ #define PMU_FIXED_WIDTH_MASK (((1 << PMU_FIXED_WIDTH_BITS) -1) << PMU_FIXED_WIDTH_SHIFT) -static const u32 core2_fix_counters_msr[] = { - MSR_CORE_PERF_FIXED_CTR0, - MSR_CORE_PERF_FIXED_CTR1, - MSR_CORE_PERF_FIXED_CTR2 -}; -#define VPMU_CORE2_NUM_FIXED (sizeof(core2_fix_counters_msr) / sizeof(u32)) + +/* Intel-specific VPMU features */ +#define VPMU_CPU_HAS_DS 0x100 /* Has Debug Store */ +#define VPMU_CPU_HAS_BTS 0x200 /* Has Branch Trace Store */ /* * MSR_CORE_PERF_FIXED_CTR_CTRL contains the configuration of all fixed @@ -81,28 +79,6 @@ static const u32 core2_fix_counters_msr[] = { /* The index into the core2_ctrls_msr[] of this MSR used in core2_vpmu_dump() */ #define MSR_CORE_PERF_FIXED_CTR_CTRL_IDX 0 -/* Core 2 Non-architectual Performance Control MSRs. 
*/ -static const u32 core2_ctrls_msr[] = { - MSR_CORE_PERF_FIXED_CTR_CTRL, - MSR_IA32_PEBS_ENABLE, - MSR_IA32_DS_AREA -}; -#define VPMU_CORE2_NUM_CTRLS (sizeof(core2_ctrls_msr) / sizeof(u32)) - -struct core2_pmu_enable { - char ds_area_enable; - char fixed_ctr_enable[VPMU_CORE2_NUM_FIXED]; - char arch_pmc_enable[1]; -}; - -struct core2_vpmu_context { - struct core2_pmu_enable *pmu_enable; - u64 fix_counters[VPMU_CORE2_NUM_FIXED]; - u64 ctrls[VPMU_CORE2_NUM_CTRLS]; - u64 global_ovf_status; - struct arch_msr_pair arch_msr_pair[1]; -}; - static int arch_pmc_cnt; /* Number of general-purpose performance counters */ /* @@ -237,6 +213,7 @@ static int is_core2_vpmu_msr(u32 msr_index, int *type, int *index) return 0; } +#define msraddr_to_bitpos(x) (((x)&0xffff) + ((x)>>31)*0x2000) static void core2_vpmu_set_msr_bitmap(unsigned long *msr_bitmap) { int i; @@ -355,13 +332,11 @@ static int core2_vpmu_alloc_resource(struct vcpu *v) vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, core2_calc_intial_glb_ctrl_msr()); - pmu_enable = xzalloc_bytes(sizeof(struct core2_pmu_enable) + - arch_pmc_cnt - 1); + pmu_enable = xzalloc_bytes(sizeof(struct core2_pmu_enable)); if ( !pmu_enable ) goto out_err; - core2_vpmu_cxt = xzalloc_bytes(sizeof(struct core2_vpmu_context) + - (arch_pmc_cnt-1)*sizeof(struct arch_msr_pair)); + core2_vpmu_cxt = xzalloc_bytes(sizeof(struct core2_vpmu_context)); if ( !core2_vpmu_cxt ) goto out_err; @@ -730,7 +705,7 @@ static int core2_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags) u64 msr_content; struct cpuinfo_x86 *c = ¤t_cpu_data; - if ( !(vpmu_flags & VPMU_BOOT_BTS) ) + if ( !(vpmu_flags & VPMU_INTEL_BTS) ) goto func_out; /* Check the ''Debug Store'' feature in the CPUID.EAX[1]:EDX[21] */ if ( cpu_has(c, X86_FEATURE_DS) ) diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c index d6a9ff6..768f766 100644 --- a/xen/arch/x86/hvm/vpmu.c +++ b/xen/arch/x86/hvm/vpmu.c @@ -31,6 +31,7 @@ #include <asm/hvm/svm/svm.h> #include <asm/hvm/svm/vmcb.h> #include <asm/apic.h> +#include <public/xenpmu.h> /* * "vpmu" : vpmu generally enabled @@ -51,7 +52,7 @@ static void __init parse_vpmu_param(char *s) break; default: if ( !strcmp(s, "bts") ) - opt_vpmu_enabled |= VPMU_BOOT_BTS; + opt_vpmu_enabled |= VPMU_INTEL_BTS; else if ( *s ) { printk("VPMU: unknown flag: %s - vpmu disabled!\n", s); @@ -59,7 +60,7 @@ static void __init parse_vpmu_param(char *s) } /* fall through */ case 1: - opt_vpmu_enabled |= VPMU_BOOT_ENABLED; + opt_vpmu_enabled |= VPMU_ON; break; } } diff --git a/xen/arch/x86/oprofile/op_model_ppro.c b/xen/arch/x86/oprofile/op_model_ppro.c index 3225937..2939a40 100644 --- a/xen/arch/x86/oprofile/op_model_ppro.c +++ b/xen/arch/x86/oprofile/op_model_ppro.c @@ -20,7 +20,6 @@ #include <asm/regs.h> #include <asm/current.h> #include <asm/hvm/vpmu.h> -#include <asm/hvm/vmx/vpmu_core2.h> #include "op_x86_model.h" #include "op_counter.h" diff --git a/xen/include/asm-x86/hvm/vmx/vpmu_core2.h b/xen/include/asm-x86/hvm/vmx/vpmu_core2.h deleted file mode 100644 index 410372d..0000000 --- a/xen/include/asm-x86/hvm/vmx/vpmu_core2.h +++ /dev/null @@ -1,32 +0,0 @@ - -/* - * vpmu_core2.h: CORE 2 specific PMU virtualization for HVM domain. - * - * Copyright (c) 2007, Intel Corporation. - * - * This program is free software; you can redistribute it and/or modify it - * under the terms and conditions of the GNU General Public License, - * version 2, as published by the Free Software Foundation. 
- * - * This program is distributed in the hope it will be useful, but WITHOUT - * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or - * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for - * more details. - * - * You should have received a copy of the GNU General Public License along with - * this program; if not, write to the Free Software Foundation, Inc., 59 Temple - * Place - Suite 330, Boston, MA 02111-1307 USA. - * - * Author: Haitao Shan <haitao.shan@intel.com> - */ - -#ifndef __ASM_X86_HVM_VPMU_CORE_H_ -#define __ASM_X86_HVM_VPMU_CORE_H_ - -struct arch_msr_pair { - u64 counter; - u64 control; -}; - -#endif /* __ASM_X86_HVM_VPMU_CORE_H_ */ - diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h index 674cdad..410ad12 100644 --- a/xen/include/asm-x86/hvm/vpmu.h +++ b/xen/include/asm-x86/hvm/vpmu.h @@ -22,19 +22,12 @@ #ifndef __ASM_X86_HVM_VPMU_H_ #define __ASM_X86_HVM_VPMU_H_ -/* - * Flag bits given as a string on the hypervisor boot parameter ''vpmu''. - * See arch/x86/hvm/vpmu.c. - */ -#define VPMU_BOOT_ENABLED 0x1 /* vpmu generally enabled. */ -#define VPMU_BOOT_BTS 0x2 /* Intel BTS feature wanted. */ +#include <public/xenpmu.h> -#define msraddr_to_bitpos(x) (((x)&0xffff) + ((x)>>31)*0x2000) #define vcpu_vpmu(vcpu) (&((vcpu)->arch.hvm_vcpu.vpmu)) #define vpmu_vcpu(vpmu) (container_of((vpmu), struct vcpu, \ arch.hvm_vcpu.vpmu)) -#define vpmu_domain(vpmu) (vpmu_vcpu(vpmu)->domain) #define MSR_TYPE_COUNTER 0 #define MSR_TYPE_CTRL 1 @@ -76,11 +69,6 @@ struct vpmu_struct { #define VPMU_FROZEN 0x10 /* Stop counters while VCPU is not running */ #define VPMU_PASSIVE_DOMAIN_ALLOCATED 0x20 -/* VPMU features */ -#define VPMU_CPU_HAS_DS 0x100 /* Has Debug Store */ -#define VPMU_CPU_HAS_BTS 0x200 /* Has Branch Trace Store */ - - #define vpmu_set(_vpmu, _x) ((_vpmu)->flags |= (_x)) #define vpmu_reset(_vpmu, _x) ((_vpmu)->flags &= ~(_x)) #define vpmu_is_set(_vpmu, _x) ((_vpmu)->flags & (_x)) diff --git a/xen/include/public/xenpmu.h b/xen/include/public/xenpmu.h new file mode 100644 index 0000000..420b674 --- /dev/null +++ b/xen/include/public/xenpmu.h @@ -0,0 +1,101 @@ +#ifndef __XEN_PUBLIC_XENPMU_H__ +#define __XEN_PUBLIC_XENPMU_H__ + +#include <asm/msr.h> + +#include "xen.h" + +#define XENPMU_VER_MAJ 0 +#define XENPMU_VER_MIN 0 + +/* VPMU modes */ +#define VPMU_MODE_MASK 0xff +#define VPMU_OFF 0 +/* guests can profile themselves, (dom0 profiles itself and Xen) */ +#define VPMU_ON (1<<0) +/* + * Only dom0 has access to VPMU and it profiles everyone: itself, + * the hypervisor and the guests. + */ +#define VPMU_PRIV (1<<1) + +/* VPMU flags */ +#define VPMU_FLAGS_MASK ((uint32_t)(~VPMU_MODE_MASK)) +#define VPMU_INTEL_BTS (1<<8) /* Ignored on AMD */ + + +/* AMD PMU registers and structures */ +#define F10H_NUM_COUNTERS 4 +#define F15H_NUM_COUNTERS 6 +/* To accommodate more counters in the future (e.g. NB counters) */ +#define MAX_NUM_COUNTERS 16 +struct amd_vpmu_context { + uint64_t counters[MAX_NUM_COUNTERS]; + uint64_t ctrls[MAX_NUM_COUNTERS]; + uint8_t msr_bitmap_set; +}; + + +/* Intel PMU registers and structures */ +static const uint32_t core2_fix_counters_msr[] = { + MSR_CORE_PERF_FIXED_CTR0, + MSR_CORE_PERF_FIXED_CTR1, + MSR_CORE_PERF_FIXED_CTR2 +}; +#define VPMU_CORE2_NUM_FIXED (sizeof(core2_fix_counters_msr) / sizeof(uint32_t)) + +/* Core 2 Non-architectual Performance Control MSRs. 
*/ +static const uint32_t core2_ctrls_msr[] = { + MSR_CORE_PERF_FIXED_CTR_CTRL, + MSR_IA32_PEBS_ENABLE, + MSR_IA32_DS_AREA +}; +#define VPMU_CORE2_NUM_CTRLS (sizeof(core2_ctrls_msr) / sizeof(uint32_t)) + +#define VPMU_CORE2_MAX_ARCH_PMCS 16 +struct core2_pmu_enable { + char ds_area_enable; + char fixed_ctr_enable[VPMU_CORE2_NUM_FIXED]; + char arch_pmc_enable[VPMU_CORE2_MAX_ARCH_PMCS]; +}; + +struct arch_msr_pair { + uint64_t counter; + uint64_t control; +}; +struct core2_vpmu_context { + struct core2_pmu_enable *pmu_enable; + uint64_t fix_counters[VPMU_CORE2_NUM_FIXED]; + uint64_t ctrls[VPMU_CORE2_NUM_CTRLS]; + uint64_t global_ovf_status; + struct arch_msr_pair arch_msr_pair[VPMU_CORE2_MAX_ARCH_PMCS]; +}; + +/* PMU flags */ +#define PMU_CACHED 1 + +/* Shared between hypervisor and PV domain */ +typedef struct xenpmu_data { + struct cpu_user_regs regs; + uint16_t domain_id; + uint32_t vcpu_id; + uint32_t pcpu_id; + uint32_t pmu_flags; + union { + struct amd_vpmu_context amd; + struct core2_vpmu_context intel; + } pmu; +} xenpmu_data_t; + + +#endif /* __XEN_PUBLIC_XENPMU_H__ */ + +/* + * Local variables: + * mode: C + * c-file-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ -- 1.8.1.4
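To show how the new shared structure is meant to be consumed, here is a
hedged sketch of a PV guest-side handler reading it. The real handler
belongs to the companion Linux series and is not shown here;
'xenpmu_shared' (a per-VCPU mapping of xenpmu_data_t registered with Xen)
and the function name are assumptions for illustration, and the fragment
assumes public/xenpmu.h plus the usual Xen public headers are included.

/* Sketch only: consume one sample from the page shared with Xen. */
static void toy_guest_pmu_handler(struct xenpmu_data *xenpmu_shared)
{
    /* Interrupted context captured by Xen at PMU-interrupt time. */
    unsigned long ip  = xenpmu_shared->regs.rip;
    uint16_t dom      = xenpmu_shared->domain_id;
    uint32_t vcpu     = xenpmu_shared->vcpu_id;

    /* Vendor-specific counter snapshot lives in the union ... */
    uint64_t ovf = xenpmu_shared->pmu.intel.global_ovf_status;

    /* ... and, per the cover letter, the guest updates this structure
     * instead of touching PMU MSRs directly; Xen flushes it back to
     * hardware when the handler returns. */
    (void)ip; (void)dom; (void)vcpu; (void)ovf;
}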
Boris Ostrovsky
2013-Sep-10 15:21 UTC
[PATCH v1 07/13] x86/PMU: Make vpmu not HVM-specific
vpmu structure will be used for both HVM and PV guests. Move it from hvm_vcpu to arch_vcpu. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> --- xen/include/asm-x86/domain.h | 2 ++ xen/include/asm-x86/hvm/vcpu.h | 3 --- xen/include/asm-x86/hvm/vpmu.h | 4 ++-- 3 files changed, 4 insertions(+), 5 deletions(-) diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h index d79464d..4f2247e 100644 --- a/xen/include/asm-x86/domain.h +++ b/xen/include/asm-x86/domain.h @@ -397,6 +397,8 @@ struct arch_vcpu void (*ctxt_switch_from) (struct vcpu *); void (*ctxt_switch_to) (struct vcpu *); + struct vpmu_struct vpmu; + /* Virtual Machine Extensions */ union { struct pv_vcpu pv_vcpu; diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h index e8b8cd7..207f65d 100644 --- a/xen/include/asm-x86/hvm/vcpu.h +++ b/xen/include/asm-x86/hvm/vcpu.h @@ -139,9 +139,6 @@ struct hvm_vcpu { u32 msr_tsc_aux; u64 msr_tsc_adjust; - /* VPMU */ - struct vpmu_struct vpmu; - union { struct arch_vmx_struct vmx; struct arch_svm_struct svm; diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h index 410ad12..f5f8c9c 100644 --- a/xen/include/asm-x86/hvm/vpmu.h +++ b/xen/include/asm-x86/hvm/vpmu.h @@ -25,9 +25,9 @@ #include <public/xenpmu.h> -#define vcpu_vpmu(vcpu) (&((vcpu)->arch.hvm_vcpu.vpmu)) +#define vcpu_vpmu(vcpu) (&((vcpu)->arch.vpmu)) #define vpmu_vcpu(vpmu) (container_of((vpmu), struct vcpu, \ - arch.hvm_vcpu.vpmu)) + arch.vpmu)) #define MSR_TYPE_COUNTER 0 #define MSR_TYPE_CTRL 1 -- 1.8.1.4
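The macro change is mechanical, but the reverse mapping may be worth
spelling out. The self-contained toy below (local struct definitions, not
Xen's) shows that once vpmu is embedded in arch_vcpu, vpmu_vcpu() is just a
container_of() on arch.vpmu and works identically for PV and HVM vcpus.

#include <stddef.h>
#include <assert.h>

struct vpmu_struct { int flags; };
struct arch_vcpu   { struct vpmu_struct vpmu; };
struct vcpu        { int vcpu_id; struct arch_vcpu arch; };

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

#define vcpu_vpmu(v)    (&(v)->arch.vpmu)
#define vpmu_vcpu(vpmu) container_of(vpmu, struct vcpu, arch.vpmu)

int main(void)
{
    struct vcpu v = { .vcpu_id = 3 };
    struct vpmu_struct *p = vcpu_vpmu(&v);

    assert(vpmu_vcpu(p) == &v);
    return 0;
}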
Boris Ostrovsky
2013-Sep-10 15:21 UTC
[PATCH v1 08/13] x86/PMU: Interface for setting PMU mode and flags
Add runtime interface for setting PMU mode and flags. Three main modes are provided: * PMU off * PMU on: Guests can access PMU MSRs and receive PMU interrupts. dom0 profiles itself and the hypervisor. * dom0-only PMU: dom0 collects samples for both itself and guests. For feature flagso only Intel''s BTS is currently supported. Mode and flags are set via new PMU hypercall. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> --- xen/arch/x86/hvm/svm/vpmu.c | 2 +- xen/arch/x86/hvm/vmx/vpmu_core2.c | 2 +- xen/arch/x86/hvm/vpmu.c | 77 ++++++++++++++++++++++++++++++++++---- xen/arch/x86/x86_64/compat/entry.S | 4 ++ xen/arch/x86/x86_64/entry.S | 4 ++ xen/include/asm-x86/hvm/vpmu.h | 2 + xen/include/public/xen.h | 1 + xen/include/public/xenpmu.h | 19 ++++++++++ xen/include/xen/hypercall.h | 4 ++ 9 files changed, 105 insertions(+), 10 deletions(-) diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c index 9f9c9ea..4477f63 100644 --- a/xen/arch/x86/hvm/svm/vpmu.c +++ b/xen/arch/x86/hvm/svm/vpmu.c @@ -458,7 +458,7 @@ int svm_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags) int ret = 0; /* vpmu enabled? */ - if ( !vpmu_flags ) + if ( vpmu_flags == VPMU_OFF ) return 0; switch ( family ) diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c index f3b6de0..66325d5 100644 --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c +++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c @@ -824,7 +824,7 @@ int vmx_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags) int ret = 0; vpmu->arch_vpmu_ops = &core2_no_vpmu_ops; - if ( !vpmu_flags ) + if ( vpmu_flags == VPMU_OFF ) return 0; if ( family == 6 ) diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c index 768f766..820576e 100644 --- a/xen/arch/x86/hvm/vpmu.c +++ b/xen/arch/x86/hvm/vpmu.c @@ -21,6 +21,7 @@ #include <xen/config.h> #include <xen/sched.h> #include <xen/xenoprof.h> +#include <xen/guest_access.h> #include <asm/regs.h> #include <asm/types.h> #include <asm/msr.h> @@ -38,7 +39,7 @@ * "vpmu=off" : vpmu generally disabled * "vpmu=bts" : vpmu enabled and Intel BTS feature switched on. */ -static unsigned int __read_mostly opt_vpmu_enabled; +uint32_t __read_mostly vpmu_mode = VPMU_OFF; static void parse_vpmu_param(char *s); custom_param("vpmu", parse_vpmu_param); @@ -52,7 +53,7 @@ static void __init parse_vpmu_param(char *s) break; default: if ( !strcmp(s, "bts") ) - opt_vpmu_enabled |= VPMU_INTEL_BTS; + vpmu_mode |= VPMU_INTEL_BTS; else if ( *s ) { printk("VPMU: unknown flag: %s - vpmu disabled!\n", s); @@ -60,7 +61,7 @@ static void __init parse_vpmu_param(char *s) } /* fall through */ case 1: - opt_vpmu_enabled |= VPMU_ON; + vpmu_mode |= VPMU_ON; break; } } @@ -226,19 +227,19 @@ void vpmu_initialise(struct vcpu *v) switch ( vendor ) { case X86_VENDOR_AMD: - if ( svm_vpmu_initialise(v, opt_vpmu_enabled) != 0 ) - opt_vpmu_enabled = 0; + if ( svm_vpmu_initialise(v, vpmu_mode) != 0 ) + vpmu_mode = VPMU_OFF; break; case X86_VENDOR_INTEL: - if ( vmx_vpmu_initialise(v, opt_vpmu_enabled) != 0 ) - opt_vpmu_enabled = 0; + if ( vmx_vpmu_initialise(v, vpmu_mode) != 0 ) + vpmu_mode = VPMU_OFF; break; default: printk("VPMU: Initialization failed. 
" "Unknown CPU vendor %d\n", vendor); - opt_vpmu_enabled = 0; + vpmu_mode = VPMU_OFF; break; } } @@ -260,3 +261,63 @@ void vpmu_dump(struct vcpu *v) vpmu->arch_vpmu_ops->arch_vpmu_dump(v); } +long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg) +{ + int ret = -EINVAL; + xenpmu_params_t pmu_params; + uint32_t mode, flags; + + switch ( op ) + { + case XENPMU_mode_set: + if ( !is_control_domain(current->domain) ) + return -EPERM; + + if ( copy_from_guest(&pmu_params, arg, 1) ) + return -EFAULT; + + mode = (uint32_t)pmu_params.control & VPMU_MODE_MASK; + if ( (mode & ~(VPMU_ON | VPMU_PRIV)) || + ((mode & VPMU_ON) && (mode & VPMU_PRIV)) ) + return -EINVAL; + + vpmu_mode &= ~VPMU_MODE_MASK; + vpmu_mode |= mode; + + ret = 0; + break; + + case XENPMU_mode_get: + pmu_params.control = vpmu_mode & VPMU_MODE_MASK; + if ( copy_to_guest(arg, &pmu_params, 1) ) + return -EFAULT; + ret = 0; + break; + + case XENPMU_flags_set: + if ( !is_control_domain(current->domain) ) + return -EPERM; + + if ( copy_from_guest(&pmu_params, arg, 1) ) + return -EFAULT; + + flags = (uint64_t)pmu_params.control & VPMU_FLAGS_MASK; + if ( flags & ~VPMU_INTEL_BTS ) + return -EINVAL; + + vpmu_mode &= ~VPMU_FLAGS_MASK; + vpmu_mode |= flags; + + ret = 0; + break; + + case XENPMU_flags_get: + pmu_params.control = vpmu_mode & VPMU_FLAGS_MASK; + if ( copy_to_guest(arg, &pmu_params, 1) ) + return -EFAULT; + ret = 0; + break; + } + + return ret; +} diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S index c0afe2c..bc03ffe 100644 --- a/xen/arch/x86/x86_64/compat/entry.S +++ b/xen/arch/x86/x86_64/compat/entry.S @@ -413,6 +413,8 @@ ENTRY(compat_hypercall_table) .quad do_domctl .quad compat_kexec_op .quad do_tmem_op + .quad do_ni_hypercall /* reserved for XenClient */ + .quad do_xenpmu_op /* 40 */ .rept __HYPERVISOR_arch_0-((.-compat_hypercall_table)/8) .quad compat_ni_hypercall .endr @@ -461,6 +463,8 @@ ENTRY(compat_hypercall_args_table) .byte 1 /* do_domctl */ .byte 2 /* compat_kexec_op */ .byte 1 /* do_tmem_op */ + .byte 0 /* reserved for XenClient */ + .byte 2 /* do_xenpmu_op */ /* 40 */ .rept __HYPERVISOR_arch_0-(.-compat_hypercall_args_table) .byte 0 /* compat_ni_hypercall */ .endr diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S index 5beeccb..2944427 100644 --- a/xen/arch/x86/x86_64/entry.S +++ b/xen/arch/x86/x86_64/entry.S @@ -762,6 +762,8 @@ ENTRY(hypercall_table) .quad do_domctl .quad do_kexec_op .quad do_tmem_op + .quad do_ni_hypercall /* reserved for XenClient */ + .quad do_xenpmu_op /* 40 */ .rept __HYPERVISOR_arch_0-((.-hypercall_table)/8) .quad do_ni_hypercall .endr @@ -810,6 +812,8 @@ ENTRY(hypercall_args_table) .byte 1 /* do_domctl */ .byte 2 /* do_kexec */ .byte 1 /* do_tmem_op */ + .byte 0 /* reserved for XenClient */ + .byte 2 /* do_xenpmu_op */ /* 40 */ .rept __HYPERVISOR_arch_0-(.-hypercall_args_table) .byte 0 /* do_ni_hypercall */ .endr diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h index f5f8c9c..cc45c70 100644 --- a/xen/include/asm-x86/hvm/vpmu.h +++ b/xen/include/asm-x86/hvm/vpmu.h @@ -89,5 +89,7 @@ void vpmu_dump(struct vcpu *v); extern int acquire_pmu_ownership(int pmu_ownership); extern void release_pmu_ownership(int pmu_ownership); +extern uint32_t vpmu_mode; + #endif /* __ASM_X86_HVM_VPMU_H_*/ diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h index 3cab74f..7f56560 100644 --- a/xen/include/public/xen.h +++ b/xen/include/public/xen.h @@ -101,6 +101,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t); #define 
__HYPERVISOR_kexec_op 37 #define __HYPERVISOR_tmem_op 38 #define __HYPERVISOR_xc_reserved_op 39 /* reserved for XenClient */ +#define __HYPERVISOR_xenpmu_op 40 /* Architecture-specific hypercall definitions. */ #define __HYPERVISOR_arch_0 48 diff --git a/xen/include/public/xenpmu.h b/xen/include/public/xenpmu.h index 420b674..7240a30 100644 --- a/xen/include/public/xenpmu.h +++ b/xen/include/public/xenpmu.h @@ -8,6 +8,25 @@ #define XENPMU_VER_MAJ 0 #define XENPMU_VER_MIN 0 +/* HYPERVISOR_xenpmu_op commands */ +#define XENPMU_mode_get 0 +#define XENPMU_mode_set 1 +#define XENPMU_flags_get 2 +#define XENPMU_flags_set 3 + +/* Parameters structure for HYPERVISOR_xenpmu_op call */ +typedef struct xenpmu_params { + union { + struct version { + uint8_t maj; + uint8_t min; + } version; + uint64_t pad; + }; + uint64_t control; +} xenpmu_params_t; + + /* VPMU modes */ #define VPMU_MODE_MASK 0xff #define VPMU_OFF 0 diff --git a/xen/include/xen/hypercall.h b/xen/include/xen/hypercall.h index a9e5229..ad3d3de 100644 --- a/xen/include/xen/hypercall.h +++ b/xen/include/xen/hypercall.h @@ -14,6 +14,7 @@ #include <public/event_channel.h> #include <public/tmem.h> #include <public/version.h> +#include <public/xenpmu.h> #include <asm/hypercall.h> #include <xsm/xsm.h> @@ -139,6 +140,9 @@ do_tmem_op( extern long do_xenoprof_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg); +extern long +do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg); + #ifdef CONFIG_COMPAT extern int -- 1.8.1.4
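For reviewers looking at the guest side, here is a hedged sketch of how dom0 might drive the new hypercall to switch into dom0-only profiling. HYPERVISOR_xenpmu_op() stands in for whatever hypercall glue the guest kernel provides, and the VPMU_PRIV bit value is assumed rather than copied from xenpmu.h.

#include <stdint.h>

#define XENPMU_mode_set  1
#define VPMU_MODE_MASK   0xff
#define VPMU_PRIV        0x2    /* bit value assumed, see xenpmu.h */

typedef struct xenpmu_params {
    union {
        struct { uint8_t maj, min; } version;
        uint64_t pad;
    };
    uint64_t control;
} xenpmu_params_t;

extern long HYPERVISOR_xenpmu_op(int op, void *arg);   /* assumed hypercall glue */

long pmu_set_priv_mode(void)
{
    xenpmu_params_t p = { .control = VPMU_PRIV & VPMU_MODE_MASK };

    /* Xen rejects this with -EPERM unless called from the control domain. */
    return HYPERVISOR_xenpmu_op(XENPMU_mode_set, &p);
}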
Boris Ostrovsky
2013-Sep-10 15:21 UTC
[PATCH v1 09/13] x86/PMU: Initialize PMU for PV guests
Code for initializing/deinitializing PMU, including setting up interrupt handlers. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> --- xen/arch/x86/hvm/svm/vpmu.c | 34 ++++++------ xen/arch/x86/hvm/vmx/vpmu_core2.c | 58 +++++++++++++------- xen/arch/x86/hvm/vpmu.c | 110 +++++++++++++++++++++++++++++++++++++- xen/common/event_channel.c | 1 + xen/include/asm-x86/hvm/vpmu.h | 1 + xen/include/public/xen.h | 1 + xen/include/public/xenpmu.h | 23 ++++++-- xen/include/xen/softirq.h | 1 + 8 files changed, 189 insertions(+), 40 deletions(-) diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c index 4477f63..c39c7a2 100644 --- a/xen/arch/x86/hvm/svm/vpmu.c +++ b/xen/arch/x86/hvm/svm/vpmu.c @@ -367,14 +367,19 @@ static int amd_vpmu_initialise(struct vcpu *v) } } - ctxt = xzalloc(struct amd_vpmu_context); - if ( !ctxt ) + if ( is_hvm_domain(v->domain) ) { - gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, " - " PMU feature is unavailable on domain %d vcpu %d.\n", - v->vcpu_id, v->domain->domain_id); - return -ENOMEM; + ctxt = xzalloc(struct amd_vpmu_context); + if ( !ctxt ) + { + gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, " + " PMU feature is unavailable on domain %d vcpu %d.\n", + v->vcpu_id, v->domain->domain_id); + return -ENOMEM; + } } + else + ctxt = &v->arch.vpmu.xenpmu_data->pmu.amd; vpmu->context = ctxt; vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED); @@ -388,18 +393,17 @@ static void amd_vpmu_destroy(struct vcpu *v) if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) return; - if ( is_hvm_domain(v->domain) && - ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set ) - amd_vpmu_unset_msr_bitmap(v); - - xfree(vpmu->context); - vpmu_reset(vpmu, VPMU_CONTEXT_ALLOCATED); - - if ( vpmu_is_set(vpmu, VPMU_RUNNING) ) + if ( is_hvm_domain(v->domain) ) { - vpmu_reset(vpmu, VPMU_RUNNING); + if ( ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set ) + amd_vpmu_unset_msr_bitmap(v); + + xfree(vpmu->context); release_pmu_ownship(PMU_OWNER_HVM); } + + vpmu->context = NULL; + vpmu_clear(vpmu); } /* VPMU part of the ''q'' keyhandler */ diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c index 66325d5..ecaa799 100644 --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c +++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c @@ -320,25 +320,33 @@ static int core2_vpmu_alloc_resource(struct vcpu *v) struct core2_vpmu_context *core2_vpmu_cxt; struct core2_pmu_enable *pmu_enable = NULL; - if ( !acquire_pmu_ownership(PMU_OWNER_HVM) ) + pmu_enable = xzalloc_bytes(sizeof(struct core2_pmu_enable)); + if ( !pmu_enable ) return 0; + + if ( is_hvm_domain(v->domain) ) + { + if ( !acquire_pmu_ownership(PMU_OWNER_HVM) ) + goto out_err; - wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0); - if ( vmx_add_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL) ) - goto out_err; + wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0); + if ( vmx_add_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL) ) + goto out_err; - if ( vmx_add_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL) ) - goto out_err; - vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, - core2_calc_intial_glb_ctrl_msr()); + if ( vmx_add_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL) ) + goto out_err; + vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, + core2_calc_intial_glb_ctrl_msr()); - pmu_enable = xzalloc_bytes(sizeof(struct core2_pmu_enable)); - if ( !pmu_enable ) - goto out_err; - - core2_vpmu_cxt = xzalloc_bytes(sizeof(struct core2_vpmu_context)); - if ( !core2_vpmu_cxt ) - goto out_err; + core2_vpmu_cxt = xzalloc_bytes(sizeof(struct core2_vpmu_context)); + if ( !core2_vpmu_cxt ) + goto 
out_err; + } + else + { + core2_vpmu_cxt = &v->arch.vpmu.xenpmu_data->pmu.intel; + vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED); + } core2_vpmu_cxt->pmu_enable = pmu_enable; vpmu->context = (void *)core2_vpmu_cxt; @@ -748,6 +756,11 @@ func_out: arch_pmc_cnt = core2_get_pmc_count(); check_pmc_quirk(); + /* PV domains can allocate resources immediately */ + if ( !is_hvm_domain(v->domain) ) + if ( !core2_vpmu_alloc_resource(v) ) + return 1; + return 0; } @@ -758,12 +771,17 @@ static void core2_vpmu_destroy(struct vcpu *v) if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) return; - xfree(core2_vpmu_cxt->pmu_enable); - xfree(vpmu->context); - if ( cpu_has_vmx_msr_bitmap && is_hvm_domain(v->domain) ) - core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap); + + if ( is_hvm_domain(v->domain) ) + { + xfree(core2_vpmu_cxt->pmu_enable); + xfree(vpmu->context); + if ( cpu_has_vmx_msr_bitmap ) + core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap); + } + release_pmu_ownship(PMU_OWNER_HVM); - vpmu_reset(vpmu, VPMU_CONTEXT_ALLOCATED); + vpmu_clear(vpmu); } struct arch_vpmu_ops core2_vpmu_ops = { diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c index 820576e..04cc114 100644 --- a/xen/arch/x86/hvm/vpmu.c +++ b/xen/arch/x86/hvm/vpmu.c @@ -21,6 +21,9 @@ #include <xen/config.h> #include <xen/sched.h> #include <xen/xenoprof.h> +#include <xen/event.h> +#include <xen/softirq.h> +#include <xen/hypercall.h> #include <xen/guest_access.h> #include <asm/regs.h> #include <asm/types.h> @@ -32,6 +35,7 @@ #include <asm/hvm/svm/svm.h> #include <asm/hvm/svm/vmcb.h> #include <asm/apic.h> +#include <asm/nmi.h> #include <public/xenpmu.h> /* @@ -249,7 +253,13 @@ void vpmu_destroy(struct vcpu *v) struct vpmu_struct *vpmu = vcpu_vpmu(v); if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_destroy ) + { + /* Unload VPMU first. This will stop counters from running */ + on_selected_cpus(cpumask_of(vcpu_vpmu(v)->last_pcpu), + vpmu_save_force, (void *)v, 1); + vpmu->arch_vpmu_ops->arch_vpmu_destroy(v); + } } /* Dump some vpmu informations on console. Used in keyhandler dump_domains(). 
*/ @@ -261,6 +271,92 @@ void vpmu_dump(struct vcpu *v) vpmu->arch_vpmu_ops->arch_vpmu_dump(v); } +int pmu_nmi_interrupt(struct cpu_user_regs *regs, int cpu) +{ + return vpmu_do_interrupt(regs); +} + +/* Process the softirq set by PMU NMI handler */ +void pmu_virq(void) +{ + struct vcpu *v = current; + + if ( (vpmu_mode & VPMU_PRIV) || + (v->domain->domain_id >= DOMID_FIRST_RESERVED) ) + { + if ( smp_processor_id() >= dom0->max_vcpus ) + { + printk(KERN_WARNING "PMU softirq on unexpected processor %d\n", + smp_processor_id()); + return; + } + v = dom0->vcpu[smp_processor_id()]; + } + + send_guest_vcpu_virq(v, VIRQ_XENPMU); +} + +static int pvpmu_init(struct domain *d, xenpmu_params_t *params) +{ + struct vcpu *v; + static int pvpmu_initted = 0; + + if ( params->vcpu < 0 || params->vcpu >= d->max_vcpus ) + return -EINVAL; + + if ( !pvpmu_initted ) + { + if (reserve_lapic_nmi() == 0) + set_nmi_callback(pmu_nmi_interrupt); + else + { + printk("Failed to reserve PMU NMI\n"); + return -EBUSY; + } + open_softirq(PMU_SOFTIRQ, pmu_virq); + pvpmu_initted = 1; + } + + if ( !mfn_valid(params->mfn) || + !get_page_and_type(mfn_to_page(params->mfn), d, PGT_writable_page) ) + return -EINVAL; + + v = d->vcpu[params->vcpu]; + v->arch.vpmu.xenpmu_data = map_domain_page_global(params->mfn); + memset(v->arch.vpmu.xenpmu_data, 0, PAGE_SIZE); + + vpmu_initialise(v); + + return 0; +} + +static void pvpmu_finish(struct domain *d, xenpmu_params_t *params) +{ + struct vcpu *v; + uint64_t mfn; + + if ( params->vcpu < 0 || params->vcpu >= d->max_vcpus ) + return; + + v = d->vcpu[params->vcpu]; + if (v != current) + vcpu_pause(v); + + if ( v->arch.vpmu.xenpmu_data ) + { + mfn = domain_page_map_to_mfn(v->arch.vpmu.xenpmu_data); + if ( mfn_valid(mfn) ) + { + unmap_domain_page_global(v->arch.vpmu.xenpmu_data); + put_page_and_type(mfn_to_page(mfn)); + } + } + vpmu_destroy(v); + + if (v != current) + vcpu_unpause(v); +} + long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg) { int ret = -EINVAL; @@ -317,7 +413,19 @@ long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg) return -EFAULT; ret = 0; break; - } + + case XENPMU_init: + if ( copy_from_guest(&pmu_params, arg, 1) ) + return -EFAULT; + ret = pvpmu_init(current->domain, &pmu_params); + break; + + case XENPMU_finish: + if ( copy_from_guest(&pmu_params, arg, 1) ) + return -EFAULT; + pvpmu_finish(current->domain, &pmu_params); + break; + } return ret; } diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c index 64c976b..9ee6e5a 100644 --- a/xen/common/event_channel.c +++ b/xen/common/event_channel.c @@ -107,6 +107,7 @@ static int virq_is_global(uint32_t virq) case VIRQ_TIMER: case VIRQ_DEBUG: case VIRQ_XENOPROF: + case VIRQ_XENPMU: rc = 0; break; case VIRQ_ARCH_0 ... VIRQ_ARCH_7: diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h index cc45c70..46dfbc6 100644 --- a/xen/include/asm-x86/hvm/vpmu.h +++ b/xen/include/asm-x86/hvm/vpmu.h @@ -59,6 +59,7 @@ struct vpmu_struct { u32 hw_lapic_lvtpc; void *context; struct arch_vpmu_ops *arch_vpmu_ops; + xenpmu_data_t *xenpmu_data; }; /* VPMU states */ diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h index 7f56560..91d3db2 100644 --- a/xen/include/public/xen.h +++ b/xen/include/public/xen.h @@ -161,6 +161,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t); #define VIRQ_MEM_EVENT 10 /* G. (DOM0) A memory event has occured */ #define VIRQ_XC_RESERVED 11 /* G. Reserved for XenClient */ #define VIRQ_ENOMEM 12 /* G. (DOM0) Low on heap memory */ +#define VIRQ_XENPMU 13 /* V. 
PMC interrupt */ /* Architecture-specific VIRQ definitions. */ #define VIRQ_ARCH_0 16 diff --git a/xen/include/public/xenpmu.h b/xen/include/public/xenpmu.h index 7240a30..ffaf3fe 100644 --- a/xen/include/public/xenpmu.h +++ b/xen/include/public/xenpmu.h @@ -13,6 +13,8 @@ #define XENPMU_mode_set 1 #define XENPMU_flags_get 2 #define XENPMU_flags_set 3 +#define XENPMU_init 4 +#define XENPMU_finish 5 /* Parameters structure for HYPERVISOR_xenpmu_op call */ typedef struct xenpmu_params { @@ -24,6 +26,8 @@ typedef struct xenpmu_params { uint64_t pad; }; uint64_t control; + uint64_t mfn; + uint64_t vcpu; } xenpmu_params_t; @@ -83,11 +87,14 @@ struct arch_msr_pair { uint64_t control; }; struct core2_vpmu_context { - struct core2_pmu_enable *pmu_enable; + uint64_t global_ctrl; + uint64_t global_ovf_ctrl; + uint64_t global_status; + uint64_t global_ovf_status; uint64_t fix_counters[VPMU_CORE2_NUM_FIXED]; uint64_t ctrls[VPMU_CORE2_NUM_CTRLS]; - uint64_t global_ovf_status; struct arch_msr_pair arch_msr_pair[VPMU_CORE2_MAX_ARCH_PMCS]; + struct core2_pmu_enable *pmu_enable; }; /* PMU flags */ @@ -95,14 +102,22 @@ struct core2_vpmu_context { /* Shared between hypervisor and PV domain */ typedef struct xenpmu_data { - struct cpu_user_regs regs; - uint16_t domain_id; + union { + struct cpu_user_regs regs; + uint8_t pad[256]; + }; + uint32_t domain_id; uint32_t vcpu_id; uint32_t pcpu_id; uint32_t pmu_flags; union { struct amd_vpmu_context amd; struct core2_vpmu_context intel; +#define MAX(x,y) ((x) > (y) ? (x) : (y)) +#define MAX_CTXT_SZ MAX(sizeof(struct amd_vpmu_context),\ + sizeof(struct core2_vpmu_context)) +#define PMU_PAD_SIZE ((MAX_CTXT_SZ + 64) & ~63) + 128 + uint8_t pad[PMU_PAD_SIZE]; /* a bit more than necessary */ } pmu; } xenpmu_data_t; diff --git a/xen/include/xen/softirq.h b/xen/include/xen/softirq.h index 0c0d481..5829fa4 100644 --- a/xen/include/xen/softirq.h +++ b/xen/include/xen/softirq.h @@ -8,6 +8,7 @@ enum { NEW_TLBFLUSH_CLOCK_PERIOD_SOFTIRQ, RCU_SOFTIRQ, TASKLET_SOFTIRQ, + PMU_SOFTIRQ, NR_COMMON_SOFTIRQS }; -- 1.8.1.4
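A hedged guest-side sketch of what pvpmu_init() expects: one writable, zeroed page per VCPU, registered by MFN together with the VCPU id. alloc_pmu_page(), virt_to_mfn() and HYPERVISOR_xenpmu_op() are placeholders for guest-kernel facilities and are not part of this series.

#include <stdint.h>

#define XENPMU_init 4

typedef struct xenpmu_params {
    union {
        struct { uint8_t maj, min; } version;
        uint64_t pad;
    };
    uint64_t control;
    uint64_t mfn;
    uint64_t vcpu;
} xenpmu_params_t;

extern void *alloc_pmu_page(void);                    /* assumed: returns a zeroed page */
extern uint64_t virt_to_mfn(void *va);                /* assumed */
extern long HYPERVISOR_xenpmu_op(int op, void *arg);  /* assumed wrapper */

long pvpmu_register_vcpu(unsigned int vcpu_id)
{
    xenpmu_params_t p = {
        .vcpu = vcpu_id,
        .mfn  = virt_to_mfn(alloc_pmu_page()),
    };

    /* Xen maps this page globally and uses it as the VCPU's xenpmu_data. */
    return HYPERVISOR_xenpmu_op(XENPMU_init, &p);
}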
Boris Ostrovsky
2013-Sep-10 15:21 UTC
[PATCH v1 10/13] x86/PMU: Add support for PMU registers handling on PV guests
Intercept accesses to PMU MSRs and LVTPC APIC vector (only APIC_LVT_MASKED bit is processed) and process them in VPMU module. Dump VPMU state for all domains (HVM and PV) when requested. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> --- xen/arch/x86/domain.c | 3 +- xen/arch/x86/hvm/vmx/vpmu_core2.c | 94 ++++++++++++++++++++++++++++++--------- xen/arch/x86/hvm/vpmu.c | 16 +++++++ xen/arch/x86/traps.c | 38 +++++++++++++++- xen/include/public/xenpmu.h | 2 + 5 files changed, 128 insertions(+), 25 deletions(-) diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index e119d7b..36f4192 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -1940,8 +1940,7 @@ void arch_dump_vcpu_info(struct vcpu *v) { paging_dump_vcpu_info(v); - if ( is_hvm_vcpu(v) ) - vpmu_dump(v); + vpmu_dump(v); } void domain_cpuid( diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c index ecaa799..489dc49 100644 --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c +++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c @@ -27,6 +27,7 @@ #include <asm/regs.h> #include <asm/types.h> #include <asm/apic.h> +#include <asm/traps.h> #include <asm/msr.h> #include <asm/msr-index.h> #include <asm/hvm/support.h> @@ -270,6 +271,9 @@ static inline void __core2_vpmu_save(struct vcpu *v) rdmsrl(core2_fix_counters_msr[i], core2_vpmu_cxt->fix_counters[i]); for ( i = 0; i < arch_pmc_cnt; i++ ) rdmsrl(MSR_IA32_PERFCTR0+i, core2_vpmu_cxt->arch_msr_pair[i].counter); + + if ( !is_hvm_domain(v->domain) ) + rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, core2_vpmu_cxt->global_status); } static int core2_vpmu_save(struct vcpu *v) @@ -279,10 +283,14 @@ static int core2_vpmu_save(struct vcpu *v) if ( !vpmu_is_set_all(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED) ) return 0; + if ( !is_hvm_domain(v->domain) ) + wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0); + __core2_vpmu_save(v); /* Unset PMU MSR bitmap to trap lazy load. 
*/ - if ( !vpmu_is_set(vpmu, VPMU_RUNNING) && cpu_has_vmx_msr_bitmap ) + if ( !vpmu_is_set(vpmu, VPMU_RUNNING) && cpu_has_vmx_msr_bitmap + && is_hvm_domain(v->domain) ) core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap); return 1; @@ -302,6 +310,12 @@ static inline void __core2_vpmu_load(struct vcpu *v) wrmsrl(core2_ctrls_msr[i], core2_vpmu_cxt->ctrls[i]); for ( i = 0; i < arch_pmc_cnt; i++ ) wrmsrl(MSR_P6_EVNTSEL0+i, core2_vpmu_cxt->arch_msr_pair[i].control); + + if ( !is_hvm_domain(v->domain) ) + { + wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, core2_vpmu_cxt->global_ovf_ctrl); + wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl); + } } static void core2_vpmu_load(struct vcpu *v) @@ -431,7 +445,12 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) ) return 1; gdprintk(XENLOG_WARNING, "Debug Store is not supported on this cpu\n"); - hvm_inject_hw_exception(TRAP_gp_fault, 0); + + if ( is_hvm_domain(v->domain) ) + hvm_inject_hw_exception(TRAP_gp_fault, 0); + else + send_guest_trap(v->domain, v->vcpu_id, TRAP_gp_fault); + return 0; } } @@ -443,11 +462,15 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) { case MSR_CORE_PERF_GLOBAL_OVF_CTRL: core2_vpmu_cxt->global_ovf_status &= ~msr_content; + core2_vpmu_cxt->global_ovf_ctrl = msr_content; return 1; case MSR_CORE_PERF_GLOBAL_STATUS: gdprintk(XENLOG_INFO, "Can not write readonly MSR: " "MSR_PERF_GLOBAL_STATUS(0x38E)!\n"); - hvm_inject_hw_exception(TRAP_gp_fault, 0); + if ( is_hvm_domain(v->domain) ) + hvm_inject_hw_exception(TRAP_gp_fault, 0); + else + send_guest_trap(v->domain, v->vcpu_id, TRAP_gp_fault); return 1; case MSR_IA32_PEBS_ENABLE: if ( msr_content & 1 ) @@ -462,7 +485,10 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) gdprintk(XENLOG_WARNING, "Illegal address for IA32_DS_AREA: %#" PRIx64 "x\n", msr_content); - hvm_inject_hw_exception(TRAP_gp_fault, 0); + if ( is_hvm_domain(v->domain) ) + hvm_inject_hw_exception(TRAP_gp_fault, 0); + else + send_guest_trap(v->domain, v->vcpu_id, TRAP_gp_fault); return 1; } core2_vpmu_cxt->pmu_enable->ds_area_enable = msr_content ? 
1 : 0; @@ -492,7 +518,10 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) break; case MSR_CORE_PERF_FIXED_CTR_CTRL: non_global_ctrl = msr_content; - vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl); + if ( is_hvm_domain(v->domain) ) + vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl); + else + rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl); global_ctrl >>= 32; for ( i = 0; i < VPMU_CORE2_NUM_FIXED; i++ ) { @@ -504,7 +533,10 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) break; default: tmp = msr - MSR_P6_EVNTSEL0; - vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl); + if ( is_hvm_domain(v->domain) ) + vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl); + else + rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl); if ( tmp >= 0 && tmp < arch_pmc_cnt ) core2_vpmu_cxt->pmu_enable->arch_pmc_enable[tmp] (global_ctrl >> tmp) & (msr_content >> 22) & 1; @@ -520,17 +552,20 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) else vpmu_reset(vpmu, VPMU_RUNNING); - /* Setup LVTPC in local apic */ - if ( vpmu_is_set(vpmu, VPMU_RUNNING) && - is_vlapic_lvtpc_enabled(vcpu_vlapic(v)) ) - { - apic_write_around(APIC_LVTPC, PMU_APIC_VECTOR); - vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR; - } - else + if ( is_hvm_domain(v->domain) ) { - apic_write_around(APIC_LVTPC, PMU_APIC_VECTOR | APIC_LVT_MASKED); - vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | APIC_LVT_MASKED; + /* Setup LVTPC in local apic */ + if ( vpmu_is_set(vpmu, VPMU_RUNNING) && + is_vlapic_lvtpc_enabled(vcpu_vlapic(v)) ) + { + apic_write_around(APIC_LVTPC, PMU_APIC_VECTOR); + vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR; + } + else + { + apic_write_around(APIC_LVTPC, PMU_APIC_VECTOR | APIC_LVT_MASKED); + vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | APIC_LVT_MASKED; + } } core2_vpmu_save_msr_context(v, type, index, msr_content); @@ -559,13 +594,27 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) inject_gp = 1; break; } - if (inject_gp) - hvm_inject_hw_exception(TRAP_gp_fault, 0); + + if (inject_gp) + { + if ( is_hvm_domain(v->domain) ) + hvm_inject_hw_exception(TRAP_gp_fault, 0); + else + send_guest_trap(v->domain, v->vcpu_id, TRAP_gp_fault); + } else wrmsrl(msr, msr_content); } - else - vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content); + else + { + if ( is_hvm_domain(v->domain) ) + vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content); + else + { + wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, msr_content); + core2_vpmu_cxt->global_ctrl = msr_content; + } + } return 1; } @@ -589,7 +638,10 @@ static int core2_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content) *msr_content = core2_vpmu_cxt->global_ovf_status; break; case MSR_CORE_PERF_GLOBAL_CTRL: - vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content); + if ( is_hvm_domain(v->domain) ) + vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content); + else + rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, *msr_content); break; default: rdmsrl(msr, *msr_content); diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c index 04cc114..0adacce 100644 --- a/xen/arch/x86/hvm/vpmu.c +++ b/xen/arch/x86/hvm/vpmu.c @@ -70,6 +70,14 @@ static void __init parse_vpmu_param(char *s) } } +static void vpmu_lvtpc_update(uint32_t val) +{ + struct vpmu_struct *vpmu = vcpu_vpmu(current); + + vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | (val & APIC_LVT_MASKED); + apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc); +} + int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) { struct vpmu_struct *vpmu = 
vcpu_vpmu(current); @@ -425,6 +433,14 @@ long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg) return -EFAULT; pvpmu_finish(current->domain, &pmu_params); break; + + case XENPMU_lvtpc_set: + if ( copy_from_guest(&pmu_params, arg, 1) ) + return -EFAULT; + + vpmu_lvtpc_update((uint32_t)pmu_params.lvtpc); + ret = 0; + break; } return ret; diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c index 57dbd0c..64c9c25 100644 --- a/xen/arch/x86/traps.c +++ b/xen/arch/x86/traps.c @@ -71,6 +71,7 @@ #include <asm/apic.h> #include <asm/mc146818rtc.h> #include <asm/hpet.h> +#include <asm/hvm/vpmu.h> #include <public/arch-x86/cpuid.h> #include <xsm/xsm.h> @@ -871,7 +872,6 @@ static void pv_cpuid(struct cpu_user_regs *regs) break; case 0x00000005: /* MONITOR/MWAIT */ - case 0x0000000a: /* Architectural Performance Monitor Features */ case 0x0000000b: /* Extended Topology Enumeration */ case 0x8000000a: /* SVM revision and features */ case 0x8000001b: /* Instruction Based Sampling */ @@ -880,7 +880,8 @@ static void pv_cpuid(struct cpu_user_regs *regs) unsupported: a = b = c = d = 0; break; - + case 0x0000000a: /* Architectural Performance Monitor Features (Intel) */ + break; default: (void)cpuid_hypervisor_leaves(regs->eax, 0, &a, &b, &c, &d); break; @@ -2486,6 +2487,17 @@ static int emulate_privileged_op(struct cpu_user_regs *regs) if ( wrmsr_safe(regs->ecx, msr_content) != 0 ) goto fail; break; + case MSR_P6_PERFCTR0...MSR_P6_PERFCTR1: + case MSR_P6_EVNTSEL0...MSR_P6_EVNTSEL1: + case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2: + case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL: + case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5: + if ( !vpmu_do_wrmsr(regs->ecx, msr_content) ) + { + if ( (vpmu_mode & VPMU_PRIV) && (v->domain == dom0) ) + goto invalid; + } + break; default: if ( wrmsr_hypervisor_regs(regs->ecx, msr_content) == 1 ) break; @@ -2574,6 +2586,24 @@ static int emulate_privileged_op(struct cpu_user_regs *regs) regs->eax = (uint32_t)msr_content; regs->edx = (uint32_t)(msr_content >> 32); break; + case MSR_IA32_PERF_CAPABILITIES: + if ( rdmsr_safe(regs->ecx, msr_content) ) + goto fail; + /* Full-Width Writes not supported */ + regs->eax = (uint32_t)msr_content & ~(1 << 13); + regs->edx = (uint32_t)(msr_content >> 32); + break; + case MSR_P6_PERFCTR0...MSR_P6_PERFCTR1: + case MSR_P6_EVNTSEL0...MSR_P6_EVNTSEL1: + case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2: + case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL: + case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5: + if ( vpmu_do_rdmsr(regs->ecx, &msr_content) ) { + regs->eax = (uint32_t)msr_content; + regs->edx = (uint32_t)(msr_content >> 32); + break; + } + goto rdmsr_normal; default: if ( rdmsr_hypervisor_regs(regs->ecx, &val) ) { @@ -2606,6 +2636,10 @@ static int emulate_privileged_op(struct cpu_user_regs *regs) pv_cpuid(regs); break; + case 0x33: /* RDPMC */ + rdpmc(regs->ecx, regs->eax, regs->edx); + break; + default: goto fail; } diff --git a/xen/include/public/xenpmu.h b/xen/include/public/xenpmu.h index ffaf3fe..dc8bad2 100644 --- a/xen/include/public/xenpmu.h +++ b/xen/include/public/xenpmu.h @@ -15,6 +15,7 @@ #define XENPMU_flags_set 3 #define XENPMU_init 4 #define XENPMU_finish 5 +#define XENPMU_lvtpc_set 6 /* Parameters structure for HYPERVISOR_xenpmu_op call */ typedef struct xenpmu_params { @@ -28,6 +29,7 @@ typedef struct xenpmu_params { uint64_t control; uint64_t mfn; uint64_t vcpu; + uint64_t lvtpc; } xenpmu_params_t; -- 1.8.1.4
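A hedged sketch of the guest-side use of XENPMU_lvtpc_set: a PV guest no longer writes LVTPC directly but passes the intended value to Xen, which honours only the mask bit. The parameter layout is abbreviated, and APIC_LVT_MASKED's value and the hypercall wrapper are assumptions of this example.

#include <stdint.h>

#define XENPMU_lvtpc_set 6
#define APIC_LVT_MASKED  (1u << 16)   /* mask bit of an APIC LVT entry */

typedef struct xenpmu_params {
    uint64_t pad, control, mfn, vcpu, lvtpc;   /* abbreviated layout */
} xenpmu_params_t;

extern long HYPERVISOR_xenpmu_op(int op, void *arg);   /* assumed wrapper */

long pv_mask_pmu_interrupt(int mask)
{
    xenpmu_params_t p = { .lvtpc = mask ? APIC_LVT_MASKED : 0 };

    /* Only the mask bit is taken from this value; the vector is Xen's. */
    return HYPERVISOR_xenpmu_op(XENPMU_lvtpc_set, &p);
}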
Boris Ostrovsky
2013-Sep-10 15:21 UTC
[PATCH v1 11/13] x86/PMU: Handle PMU interrupts for PV guests
Add support for handling PMU interrupts for PV guests, make these interrupts NMI instead of PMU_APIC_VECTOR vector. Depending on vpmu_mode forward the interrupts to appropriate guest (mode is VPMU_ON) or to dom0 (VPMU_DOM0). VPMU for the interrupted VCPU is unloaded until the guest issues XENPMU_flush hypercall. This allows the guest to access PMU MSR values that are stored in VPMU context which is shared between hypervisor and domain, thus avoiding traps to hypervisor. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> --- xen/arch/x86/apic.c | 13 --- xen/arch/x86/hvm/svm/vpmu.c | 8 +- xen/arch/x86/hvm/vmx/vpmu_core2.c | 8 +- xen/arch/x86/hvm/vpmu.c | 110 +++++++++++++++++++++++-- xen/include/asm-x86/hvm/vpmu.h | 1 + xen/include/asm-x86/irq.h | 1 - xen/include/asm-x86/mach-default/irq_vectors.h | 1 - xen/include/public/xenpmu.h | 1 + 8 files changed, 114 insertions(+), 29 deletions(-) diff --git a/xen/arch/x86/apic.c b/xen/arch/x86/apic.c index a52a0e8..9675e76 100644 --- a/xen/arch/x86/apic.c +++ b/xen/arch/x86/apic.c @@ -125,9 +125,6 @@ void __init apic_intr_init(void) /* IPI vectors for APIC spurious and error interrupts */ set_direct_apic_vector(SPURIOUS_APIC_VECTOR, spurious_interrupt); set_direct_apic_vector(ERROR_APIC_VECTOR, error_interrupt); - - /* Performance Counters Interrupt */ - set_direct_apic_vector(PMU_APIC_VECTOR, pmu_apic_interrupt); } /* Using APIC to generate smp_local_timer_interrupt? */ @@ -1368,16 +1365,6 @@ void error_interrupt(struct cpu_user_regs *regs) } /* - * This interrupt handles performance counters interrupt - */ - -void pmu_apic_interrupt(struct cpu_user_regs *regs) -{ - ack_APIC_irq(); - vpmu_do_interrupt(regs); -} - -/* * This initializes the IO-APIC and APIC hardware if this is * a UP kernel. */ diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c index c39c7a2..1815674 100644 --- a/xen/arch/x86/hvm/svm/vpmu.c +++ b/xen/arch/x86/hvm/svm/vpmu.c @@ -280,8 +280,8 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) if ( !acquire_pmu_ownership(PMU_OWNER_HVM) ) return 1; vpmu_set(vpmu, VPMU_RUNNING); - apic_write(APIC_LVTPC, PMU_APIC_VECTOR); - vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR; + apic_write(APIC_LVTPC, APIC_DM_NMI); + vpmu->hw_lapic_lvtpc = APIC_DM_NMI; if ( is_hvm_domain(v->domain) && !((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set ) @@ -292,8 +292,8 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) if ( (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) && (is_pmu_enabled(msr_content) == 0) && vpmu_is_set(vpmu, VPMU_RUNNING) ) { - apic_write(APIC_LVTPC, PMU_APIC_VECTOR | APIC_LVT_MASKED); - vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | APIC_LVT_MASKED; + apic_write(APIC_LVTPC, APIC_DM_NMI | APIC_LVT_MASKED); + vpmu->hw_lapic_lvtpc = APIC_DM_NMI | APIC_LVT_MASKED; vpmu_reset(vpmu, VPMU_RUNNING); if ( is_hvm_domain(v->domain) && ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set ) diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c index 489dc49..3f5941a 100644 --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c +++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c @@ -558,13 +558,13 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) if ( vpmu_is_set(vpmu, VPMU_RUNNING) && is_vlapic_lvtpc_enabled(vcpu_vlapic(v)) ) { - apic_write_around(APIC_LVTPC, PMU_APIC_VECTOR); - vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR; + apic_write_around(APIC_LVTPC, APIC_DM_NMI); + vpmu->hw_lapic_lvtpc = APIC_DM_NMI; } else { - apic_write_around(APIC_LVTPC, PMU_APIC_VECTOR | 
APIC_LVT_MASKED); - vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | APIC_LVT_MASKED; + apic_write_around(APIC_LVTPC, APIC_DM_NMI | APIC_LVT_MASKED); + vpmu->hw_lapic_lvtpc = APIC_DM_NMI | APIC_LVT_MASKED; } } diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c index 0adacce..f28b7af 100644 --- a/xen/arch/x86/hvm/vpmu.c +++ b/xen/arch/x86/hvm/vpmu.c @@ -47,6 +47,7 @@ uint32_t __read_mostly vpmu_mode = VPMU_OFF; static void parse_vpmu_param(char *s); custom_param("vpmu", parse_vpmu_param); +static void vpmu_save_force(void *arg); static DEFINE_PER_CPU(struct vcpu *, last_vcpu); static void __init parse_vpmu_param(char *s) @@ -74,7 +75,7 @@ static void vpmu_lvtpc_update(uint32_t val) { struct vpmu_struct *vpmu = vcpu_vpmu(current); - vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | (val & APIC_LVT_MASKED); + vpmu->hw_lapic_lvtpc = APIC_DM_NMI | (val & APIC_LVT_MASKED); apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc); } @@ -82,6 +83,9 @@ int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) { struct vpmu_struct *vpmu = vcpu_vpmu(current); + if ( (vpmu_mode & VPMU_PRIV) && (current->domain != dom0) ) + return 0; + if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_wrmsr ) return vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content); return 0; @@ -91,6 +95,9 @@ int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content) { struct vpmu_struct *vpmu = vcpu_vpmu(current); + if ( vpmu_mode & VPMU_PRIV && current->domain != dom0 ) + return 0; + if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_rdmsr ) return vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content); return 0; @@ -99,17 +106,96 @@ int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content) int vpmu_do_interrupt(struct cpu_user_regs *regs) { struct vcpu *v = current; - struct vpmu_struct *vpmu = vcpu_vpmu(v); + struct vpmu_struct *vpmu; - if ( vpmu->arch_vpmu_ops ) + + /* dom0 will handle this interrupt */ + if ( (vpmu_mode & VPMU_PRIV) || + (v->domain->domain_id >= DOMID_FIRST_RESERVED) ) + { + if ( smp_processor_id() >= dom0->max_vcpus ) + return 0; + v = dom0->vcpu[smp_processor_id()]; + } + + vpmu = vcpu_vpmu(v); + if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) + return 0; + + if ( !is_hvm_domain(v->domain) || vpmu_mode & VPMU_PRIV ) + { + /* PV guest or dom0 is doing system profiling */ + void *p; + struct cpu_user_regs *gregs; + + p = v->arch.vpmu.xenpmu_data; + + /* PV guest will be reading PMU MSRs from xenpmu_data */ + vpmu_save_force(v); + + /* Store appropriate registers in xenpmu_data + * + * Note: ''!current->is_running'' is possible when ''set_current(next)'' + * for the (HVM) guest has been called but ''reset_stack_and_jump()'' + * has not (i.e. the guest is not actually running yet). + */ + if ( !is_hvm_domain(current->domain) || + ((vpmu_mode & VPMU_PRIV) && !current->is_running) ) + { + /* + * 32-bit dom0 cannot process Xen''s addresses (which are 64 bit) + * and therefore we treat it the same way as a non-priviledged + * PV 32-bit domain. 
+ */ + if ( is_pv_32bit_domain(current->domain) ) + { + struct compat_cpu_user_regs cmp; + + gregs = guest_cpu_user_regs(); + XLAT_cpu_user_regs(&cmp, gregs); + memcpy(p, &cmp, sizeof(struct compat_cpu_user_regs)); + } + else if ( (current->domain != dom0) && !is_idle_vcpu(current) && + !(vpmu_mode & VPMU_PRIV) ) + { + /* PV guest */ + gregs = guest_cpu_user_regs(); + memcpy(p, gregs, sizeof(struct cpu_user_regs)); + } + else + memcpy(p, regs, sizeof(struct cpu_user_regs)); + } + else + { + /* HVM guest */ + struct segment_register cs; + + gregs = guest_cpu_user_regs(); + hvm_get_segment_register(current, x86_seg_cs, &cs); + gregs->cs = cs.attr.fields.dpl; + + memcpy(p, gregs, sizeof(struct cpu_user_regs)); + } + + v->arch.vpmu.xenpmu_data->domain_id = current->domain->domain_id; + v->arch.vpmu.xenpmu_data->vcpu_id = current->vcpu_id; + + raise_softirq(PMU_SOFTIRQ); + vpmu_set(vpmu, VPMU_WAIT_FOR_FLUSH); + + return 1; + } + else if ( vpmu->arch_vpmu_ops ) { - struct vlapic *vlapic = vcpu_vlapic(v); + /* HVM guest */ + struct vlapic *vlapic; u32 vlapic_lvtpc; unsigned char int_vec; if ( !vpmu->arch_vpmu_ops->do_interrupt(regs) ) return 0; + vlapic = vcpu_vlapic(v); if ( !is_vlapic_lvtpc_enabled(vlapic) ) return 1; @@ -169,7 +255,7 @@ void vpmu_save(struct vcpu *v) if ( vpmu->arch_vpmu_ops->arch_vpmu_save(v) ) vpmu_reset(vpmu, VPMU_CONTEXT_LOADED); - apic_write(APIC_LVTPC, PMU_APIC_VECTOR | APIC_LVT_MASKED); + apic_write(APIC_LVTPC, APIC_DM_NMI | APIC_LVT_MASKED); } void vpmu_load(struct vcpu *v) @@ -223,7 +309,13 @@ void vpmu_load(struct vcpu *v) vpmu->arch_vpmu_ops->arch_vpmu_load(v); } - vpmu_set(vpmu, VPMU_CONTEXT_LOADED); + /* + * PMU interrupt may happen while loading the context above. That + * may cause vpmu_save_force() in the handler so we we don''t + * want to mark the context as loaded. 
+ */ + if ( !vpmu_is_set(vpmu, VPMU_WAIT_FOR_FLUSH) ) + vpmu_set(vpmu, VPMU_CONTEXT_LOADED); } void vpmu_initialise(struct vcpu *v) @@ -441,6 +533,12 @@ long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg) vpmu_lvtpc_update((uint32_t)pmu_params.lvtpc); ret = 0; break; + + case XENPMU_flush: + vpmu_reset(vcpu_vpmu(current), VPMU_WAIT_FOR_FLUSH); + vpmu_load(current); + ret = 0; + break; } return ret; diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h index 46dfbc6..f7f507f 100644 --- a/xen/include/asm-x86/hvm/vpmu.h +++ b/xen/include/asm-x86/hvm/vpmu.h @@ -69,6 +69,7 @@ struct vpmu_struct { #define VPMU_CONTEXT_SAVE 0x8 /* Force context save */ #define VPMU_FROZEN 0x10 /* Stop counters while VCPU is not running */ #define VPMU_PASSIVE_DOMAIN_ALLOCATED 0x20 +#define VPMU_WAIT_FOR_FLUSH 0x40 /* PV guest waits for XENPMU_flush */ #define vpmu_set(_vpmu, _x) ((_vpmu)->flags |= (_x)) #define vpmu_reset(_vpmu, _x) ((_vpmu)->flags &= ~(_x)) diff --git a/xen/include/asm-x86/irq.h b/xen/include/asm-x86/irq.h index 7f5da06..e582a72 100644 --- a/xen/include/asm-x86/irq.h +++ b/xen/include/asm-x86/irq.h @@ -88,7 +88,6 @@ void invalidate_interrupt(struct cpu_user_regs *regs); void call_function_interrupt(struct cpu_user_regs *regs); void apic_timer_interrupt(struct cpu_user_regs *regs); void error_interrupt(struct cpu_user_regs *regs); -void pmu_apic_interrupt(struct cpu_user_regs *regs); void spurious_interrupt(struct cpu_user_regs *regs); void irq_move_cleanup_interrupt(struct cpu_user_regs *regs); diff --git a/xen/include/asm-x86/mach-default/irq_vectors.h b/xen/include/asm-x86/mach-default/irq_vectors.h index 992e00c..46dcfaf 100644 --- a/xen/include/asm-x86/mach-default/irq_vectors.h +++ b/xen/include/asm-x86/mach-default/irq_vectors.h @@ -8,7 +8,6 @@ #define EVENT_CHECK_VECTOR 0xfc #define CALL_FUNCTION_VECTOR 0xfb #define LOCAL_TIMER_VECTOR 0xfa -#define PMU_APIC_VECTOR 0xf9 /* * High-priority dynamically-allocated vectors. For interrupts that * must be higher priority than any guest-bound interrupt. diff --git a/xen/include/public/xenpmu.h b/xen/include/public/xenpmu.h index dc8bad2..e70e55e 100644 --- a/xen/include/public/xenpmu.h +++ b/xen/include/public/xenpmu.h @@ -16,6 +16,7 @@ #define XENPMU_init 4 #define XENPMU_finish 5 #define XENPMU_lvtpc_set 6 +#define XENPMU_flush 7 /* Write cached MSR values to HW */ /* Parameters structure for HYPERVISOR_xenpmu_op call */ typedef struct xenpmu_params { -- 1.8.1.4
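A hedged sketch of the guest half of the flush protocol added here: handle VIRQ_XENPMU, consume the sample that vpmu_do_interrupt() copied into the shared page, then issue XENPMU_flush so Xen clears VPMU_WAIT_FOR_FLUSH and reloads the context. process_sample(), the per-CPU pointer and the hypercall wrapper are placeholders, and the xenpmu_data view below is abbreviated.

#include <stdint.h>

#define XENPMU_flush 7

struct xenpmu_data_abbrev {
    uint8_t  regs[256];                  /* saved cpu_user_regs of the sample */
    uint32_t domain_id, vcpu_id, pcpu_id, pmu_flags;
    /* architectural PMU context follows in the real layout */
};

extern struct xenpmu_data_abbrev *this_cpu_xenpmu_data;  /* assumed, set up at XENPMU_init */
extern void process_sample(const void *saved_regs);      /* assumed consumer */
extern long HYPERVISOR_xenpmu_op(int op, void *arg);     /* assumed wrapper */

void xenpmu_virq_handler(void)
{
    /* While VPMU_WAIT_FOR_FLUSH is set, Xen keeps this VCPU's counters
     * unloaded, so the shared context can be read without racing hardware. */
    process_sample(this_cpu_xenpmu_data->regs);

    /* Write the cached MSR values back to hardware and resume counting. */
    HYPERVISOR_xenpmu_op(XENPMU_flush, 0);
}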
Boris Ostrovsky
2013-Sep-10 15:21 UTC
[PATCH v1 12/13] x86/PMU: Save VPMU state for PV guests during context switch
Save VPMU state during context switch for both HVM and PV guests unless we are in VPMU_DOM0 vpmu mode (i.e. dom0 is doing all profiling). Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> --- xen/arch/x86/domain.c | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index 36f4192..e74ad5c 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -1416,17 +1416,15 @@ void context_switch(struct vcpu *prev, struct vcpu *next) } if (prev != next) - update_runstate_area(prev); - - if ( is_hvm_vcpu(prev) ) { - if (prev != next) + update_runstate_area(prev); + if ( !(vpmu_mode & VPMU_PRIV) || prev->domain != dom0 ) vpmu_save(prev); - - if ( !list_empty(&prev->arch.hvm_vcpu.tm_list) ) - pt_save_timer(prev); } + if ( is_hvm_vcpu(prev) && !list_empty(&prev->arch.hvm_vcpu.tm_list) ) + pt_save_timer(prev); + local_irq_disable(); set_current(next); @@ -1463,7 +1461,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next) (next->domain->domain_id != 0)); } - if (is_hvm_vcpu(next) && (prev != next) ) + if ( prev != next && !(vpmu_mode & VPMU_PRIV) ) /* Must be done with interrupts enabled */ vpmu_load(next); -- 1.8.1.4
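The gating this patch adds, restated as a stand-alone sketch (stub types, assumed VPMU_PRIV value) so the save/load conditions can be read in isolation from the scheduler code.

#include <stdio.h>
#include <stdint.h>

#define VPMU_PRIV 0x2                          /* bit value assumed */

struct domain { int id; };
struct vcpu   { struct domain *domain; };

static uint32_t vpmu_mode = VPMU_PRIV;         /* dom0-only profiling */
static struct domain dom0_d = { 0 };
static struct domain *dom0 = &dom0_d;

static void vpmu_save(struct vcpu *v) { (void)v; puts("save outgoing VPMU"); }
static void vpmu_load(struct vcpu *v) { (void)v; puts("load incoming VPMU"); }

static void pmu_on_context_switch(struct vcpu *prev, struct vcpu *next)
{
    if ( prev == next )
        return;

    /* In dom0-only mode, dom0's own counters stay live across the switch. */
    if ( !(vpmu_mode & VPMU_PRIV) || prev->domain != dom0 )
        vpmu_save(prev);

    /* ...set_current(next) and the stack switch happen here in Xen... */

    if ( !(vpmu_mode & VPMU_PRIV) )
        vpmu_load(next);                       /* with interrupts enabled */
}

int main(void)
{
    struct domain du = { 1 };
    struct vcpu guest = { &du }, d0 = { dom0 };

    pmu_on_context_switch(&guest, &d0);        /* guest's counters get saved */
    return 0;
}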
Boris Ostrovsky
2013-Sep-10 15:21 UTC
[PATCH v1 13/13] x86/PMU: Move vpmu files up from hvm directory
Since VPMU is now used by both HVM and PV we should move it up from HVM subtree: xen/arch/x86/hvm/vpmu.c => xen/arch/x86/vpmu.c xen/arch/x86/hvm/vmx/vpmu_core2.c => xen/arch/x86/vpmu_intel.c xen/arch/x86/hvm/svm/vpmu.c => xen/arch/x86/vpmu_amd.c xen/include/asm-x86/hvm/vpmu.h => xen/include/asm-x86/vpmu.h No code changes (except for adjusting Makefiles and paths for #includes). Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> --- xen/arch/x86/Makefile | 1 + xen/arch/x86/hvm/Makefile | 1 - xen/arch/x86/hvm/svm/Makefile | 1 - xen/arch/x86/hvm/svm/vpmu.c | 486 ------------------ xen/arch/x86/hvm/vmx/Makefile | 1 - xen/arch/x86/hvm/vmx/vpmu_core2.c | 938 ---------------------------------- xen/arch/x86/hvm/vpmu.c | 545 -------------------- xen/arch/x86/oprofile/op_model_ppro.c | 2 +- xen/arch/x86/traps.c | 2 +- xen/arch/x86/vpmu.c | 545 ++++++++++++++++++++ xen/arch/x86/vpmu_amd.c | 486 ++++++++++++++++++ xen/arch/x86/vpmu_intel.c | 938 ++++++++++++++++++++++++++++++++++ xen/include/asm-x86/domain.h | 1 + xen/include/asm-x86/hvm/vmx/vmcs.h | 1 - xen/include/asm-x86/hvm/vpmu.h | 97 ---- xen/include/asm-x86/vpmu.h | 97 ++++ 16 files changed, 2070 insertions(+), 2072 deletions(-) delete mode 100644 xen/arch/x86/hvm/svm/vpmu.c delete mode 100644 xen/arch/x86/hvm/vmx/vpmu_core2.c delete mode 100644 xen/arch/x86/hvm/vpmu.c create mode 100644 xen/arch/x86/vpmu.c create mode 100644 xen/arch/x86/vpmu_amd.c create mode 100644 xen/arch/x86/vpmu_intel.c delete mode 100644 xen/include/asm-x86/hvm/vpmu.h create mode 100644 xen/include/asm-x86/vpmu.h diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile index a27ac44..47d067d 100644 --- a/xen/arch/x86/Makefile +++ b/xen/arch/x86/Makefile @@ -58,6 +58,7 @@ obj-y += crash.o obj-y += tboot.o obj-y += hpet.o obj-y += xstate.o +obj-y += vpmu.o vpmu_intel.o vpmu_amd.o obj-$(crash_debug) += gdbstub.o diff --git a/xen/arch/x86/hvm/Makefile b/xen/arch/x86/hvm/Makefile index eea5555..742b83b 100644 --- a/xen/arch/x86/hvm/Makefile +++ b/xen/arch/x86/hvm/Makefile @@ -22,4 +22,3 @@ obj-y += vlapic.o obj-y += vmsi.o obj-y += vpic.o obj-y += vpt.o -obj-y += vpmu.o \ No newline at end of file diff --git a/xen/arch/x86/hvm/svm/Makefile b/xen/arch/x86/hvm/svm/Makefile index a10a55e..760d295 100644 --- a/xen/arch/x86/hvm/svm/Makefile +++ b/xen/arch/x86/hvm/svm/Makefile @@ -6,4 +6,3 @@ obj-y += nestedsvm.o obj-y += svm.o obj-y += svmdebug.o obj-y += vmcb.o -obj-y += vpmu.o diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c deleted file mode 100644 index 1815674..0000000 --- a/xen/arch/x86/hvm/svm/vpmu.c +++ /dev/null @@ -1,486 +0,0 @@ -/* - * vpmu.c: PMU virtualization for HVM domain. - * - * Copyright (c) 2010, Advanced Micro Devices, Inc. - * Parts of this code are Copyright (c) 2007, Intel Corporation - * - * Author: Wei Wang <wei.wang2@amd.com> - * Tested by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> - * - * This program is free software; you can redistribute it and/or modify it - * under the terms and conditions of the GNU General Public License, - * version 2, as published by the Free Software Foundation. - * - * This program is distributed in the hope it will be useful, but WITHOUT - * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or - * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for - * more details. 
- * - * You should have received a copy of the GNU General Public License along with - * this program; if not, write to the Free Software Foundation, Inc., 59 Temple - * Place - Suite 330, Boston, MA 02111-1307 USA. - * - */ - -#include <xen/config.h> -#include <xen/xenoprof.h> -#include <xen/hvm/save.h> -#include <xen/sched.h> -#include <xen/irq.h> -#include <asm/apic.h> -#include <asm/hvm/vlapic.h> -#include <asm/hvm/vpmu.h> -#include <public/xenpmu.h> - -#define MSR_F10H_EVNTSEL_GO_SHIFT 40 -#define MSR_F10H_EVNTSEL_EN_SHIFT 22 -#define MSR_F10H_COUNTER_LENGTH 48 - -#define is_guest_mode(msr) ((msr) & (1ULL << MSR_F10H_EVNTSEL_GO_SHIFT)) -#define is_pmu_enabled(msr) ((msr) & (1ULL << MSR_F10H_EVNTSEL_EN_SHIFT)) -#define set_guest_mode(msr) (msr |= (1ULL << MSR_F10H_EVNTSEL_GO_SHIFT)) -#define is_overflowed(msr) (!((msr) & (1ULL << (MSR_F10H_COUNTER_LENGTH-1)))) - -static unsigned int __read_mostly num_counters; -static const u32 __read_mostly *counters; -static const u32 __read_mostly *ctrls; -static bool_t __read_mostly k7_counters_mirrored; - -/* PMU Counter MSRs. */ -static const u32 AMD_F10H_COUNTERS[] = { - MSR_K7_PERFCTR0, - MSR_K7_PERFCTR1, - MSR_K7_PERFCTR2, - MSR_K7_PERFCTR3 -}; - -/* PMU Control MSRs. */ -static const u32 AMD_F10H_CTRLS[] = { - MSR_K7_EVNTSEL0, - MSR_K7_EVNTSEL1, - MSR_K7_EVNTSEL2, - MSR_K7_EVNTSEL3 -}; - -static const u32 AMD_F15H_COUNTERS[] = { - MSR_AMD_FAM15H_PERFCTR0, - MSR_AMD_FAM15H_PERFCTR1, - MSR_AMD_FAM15H_PERFCTR2, - MSR_AMD_FAM15H_PERFCTR3, - MSR_AMD_FAM15H_PERFCTR4, - MSR_AMD_FAM15H_PERFCTR5 -}; - -static const u32 AMD_F15H_CTRLS[] = { - MSR_AMD_FAM15H_EVNTSEL0, - MSR_AMD_FAM15H_EVNTSEL1, - MSR_AMD_FAM15H_EVNTSEL2, - MSR_AMD_FAM15H_EVNTSEL3, - MSR_AMD_FAM15H_EVNTSEL4, - MSR_AMD_FAM15H_EVNTSEL5 -}; - -static inline int get_pmu_reg_type(u32 addr) -{ - if ( (addr >= MSR_K7_EVNTSEL0) && (addr <= MSR_K7_EVNTSEL3) ) - return MSR_TYPE_CTRL; - - if ( (addr >= MSR_K7_PERFCTR0) && (addr <= MSR_K7_PERFCTR3) ) - return MSR_TYPE_COUNTER; - - if ( (addr >= MSR_AMD_FAM15H_EVNTSEL0) && - (addr <= MSR_AMD_FAM15H_PERFCTR5 ) ) - { - if (addr & 1) - return MSR_TYPE_COUNTER; - else - return MSR_TYPE_CTRL; - } - - /* unsupported registers */ - return -1; -} - -static inline u32 get_fam15h_addr(u32 addr) -{ - switch ( addr ) - { - case MSR_K7_PERFCTR0: - return MSR_AMD_FAM15H_PERFCTR0; - case MSR_K7_PERFCTR1: - return MSR_AMD_FAM15H_PERFCTR1; - case MSR_K7_PERFCTR2: - return MSR_AMD_FAM15H_PERFCTR2; - case MSR_K7_PERFCTR3: - return MSR_AMD_FAM15H_PERFCTR3; - case MSR_K7_EVNTSEL0: - return MSR_AMD_FAM15H_EVNTSEL0; - case MSR_K7_EVNTSEL1: - return MSR_AMD_FAM15H_EVNTSEL1; - case MSR_K7_EVNTSEL2: - return MSR_AMD_FAM15H_EVNTSEL2; - case MSR_K7_EVNTSEL3: - return MSR_AMD_FAM15H_EVNTSEL3; - default: - break; - } - - return addr; -} - -static void amd_vpmu_set_msr_bitmap(struct vcpu *v) -{ - unsigned int i; - struct vpmu_struct *vpmu = vcpu_vpmu(v); - struct amd_vpmu_context *ctxt = vpmu->context; - - for ( i = 0; i < num_counters; i++ ) - { - svm_intercept_msr(v, counters[i], MSR_INTERCEPT_NONE); - svm_intercept_msr(v, ctrls[i], MSR_INTERCEPT_WRITE); - } - - ctxt->msr_bitmap_set = 1; -} - -static void amd_vpmu_unset_msr_bitmap(struct vcpu *v) -{ - unsigned int i; - struct vpmu_struct *vpmu = vcpu_vpmu(v); - struct amd_vpmu_context *ctxt = vpmu->context; - - for ( i = 0; i < num_counters; i++ ) - { - svm_intercept_msr(v, counters[i], MSR_INTERCEPT_RW); - svm_intercept_msr(v, ctrls[i], MSR_INTERCEPT_RW); - } - - ctxt->msr_bitmap_set = 0; -} - -static int 
amd_vpmu_do_interrupt(struct cpu_user_regs *regs) -{ - return 1; -} - -static inline void context_load(struct vcpu *v) -{ - unsigned int i; - struct vpmu_struct *vpmu = vcpu_vpmu(v); - struct amd_vpmu_context *ctxt = vpmu->context; - - for ( i = 0; i < num_counters; i++ ) - { - wrmsrl(counters[i], ctxt->counters[i]); - wrmsrl(ctrls[i], ctxt->ctrls[i]); - } -} - -static void amd_vpmu_load(struct vcpu *v) -{ - struct vpmu_struct *vpmu = vcpu_vpmu(v); - struct amd_vpmu_context *ctxt = vpmu->context; - - vpmu_reset(vpmu, VPMU_FROZEN); - - if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) ) - { - unsigned int i; - - for ( i = 0; i < num_counters; i++ ) - wrmsrl(ctrls[i], ctxt->ctrls[i]); - - return; - } - - context_load(v); -} - -static inline void context_save(struct vcpu *v) -{ - unsigned int i; - struct vpmu_struct *vpmu = vcpu_vpmu(v); - struct amd_vpmu_context *ctxt = vpmu->context; - - /* No need to save controls -- they are saved in amd_vpmu_do_wrmsr */ - for ( i = 0; i < num_counters; i++ ) - rdmsrl(counters[i], ctxt->counters[i]); -} - -static int amd_vpmu_save(struct vcpu *v) -{ - struct vpmu_struct *vpmu = vcpu_vpmu(v); - struct amd_vpmu_context *ctx = vpmu->context; - unsigned int i; - - if ( !vpmu_is_set(vpmu, VPMU_FROZEN) ) - { - for ( i = 0; i < num_counters; i++ ) - wrmsrl(ctrls[i], 0); - - vpmu_set(vpmu, VPMU_FROZEN); - } - - if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_SAVE) ) - return 0; - - context_save(v); - - if ( is_hvm_domain(v->domain) && - !vpmu_is_set(vpmu, VPMU_RUNNING) && ctx->msr_bitmap_set ) - amd_vpmu_unset_msr_bitmap(v); - - return 1; -} - -static void context_update(unsigned int msr, u64 msr_content) -{ - unsigned int i; - struct vcpu *v = current; - struct vpmu_struct *vpmu = vcpu_vpmu(v); - struct amd_vpmu_context *ctxt = vpmu->context; - - if ( k7_counters_mirrored && - ((msr >= MSR_K7_EVNTSEL0) && (msr <= MSR_K7_PERFCTR3)) ) - { - msr = get_fam15h_addr(msr); - } - - for ( i = 0; i < num_counters; i++ ) - { - if ( msr == ctrls[i] ) - { - ctxt->ctrls[i] = msr_content; - return; - } - else if (msr == counters[i] ) - { - ctxt->counters[i] = msr_content; - return; - } - } -} - -static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) -{ - struct vcpu *v = current; - struct vpmu_struct *vpmu = vcpu_vpmu(v); - - /* For all counters, enable guest only mode for HVM guest */ - if ( is_hvm_domain(v->domain) && (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) && - !(is_guest_mode(msr_content)) ) - { - set_guest_mode(msr_content); - } - - /* check if the first counter is enabled */ - if ( (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) && - is_pmu_enabled(msr_content) && !vpmu_is_set(vpmu, VPMU_RUNNING) ) - { - if ( !acquire_pmu_ownership(PMU_OWNER_HVM) ) - return 1; - vpmu_set(vpmu, VPMU_RUNNING); - apic_write(APIC_LVTPC, APIC_DM_NMI); - vpmu->hw_lapic_lvtpc = APIC_DM_NMI; - - if ( is_hvm_domain(v->domain) && - !((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set ) - amd_vpmu_set_msr_bitmap(v); - } - - /* stop saving & restore if guest stops first counter */ - if ( (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) && - (is_pmu_enabled(msr_content) == 0) && vpmu_is_set(vpmu, VPMU_RUNNING) ) - { - apic_write(APIC_LVTPC, APIC_DM_NMI | APIC_LVT_MASKED); - vpmu->hw_lapic_lvtpc = APIC_DM_NMI | APIC_LVT_MASKED; - vpmu_reset(vpmu, VPMU_RUNNING); - if ( is_hvm_domain(v->domain) && - ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set ) - amd_vpmu_unset_msr_bitmap(v); - release_pmu_ownship(PMU_OWNER_HVM); - } - - if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) - || vpmu_is_set(vpmu, 
VPMU_FROZEN) ) - { - context_load(v); - vpmu_set(vpmu, VPMU_CONTEXT_LOADED); - vpmu_reset(vpmu, VPMU_FROZEN); - } - - /* Update vpmu context immediately */ - context_update(msr, msr_content); - - /* Write to hw counters */ - wrmsrl(msr, msr_content); - return 1; -} - -static int amd_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content) -{ - struct vcpu *v = current; - struct vpmu_struct *vpmu = vcpu_vpmu(v); - - if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) - || vpmu_is_set(vpmu, VPMU_FROZEN) ) - { - context_load(v); - vpmu_set(vpmu, VPMU_CONTEXT_LOADED); - vpmu_reset(vpmu, VPMU_FROZEN); - } - - rdmsrl(msr, *msr_content); - - return 1; -} - -static int amd_vpmu_initialise(struct vcpu *v) -{ - struct amd_vpmu_context *ctxt; - struct vpmu_struct *vpmu = vcpu_vpmu(v); - uint8_t family = current_cpu_data.x86; - - if ( vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) - return 0; - - if ( counters == NULL ) - { - switch ( family ) - { - case 0x15: - num_counters = F15H_NUM_COUNTERS; - counters = AMD_F15H_COUNTERS; - ctrls = AMD_F15H_CTRLS; - k7_counters_mirrored = 1; - break; - case 0x10: - case 0x12: - case 0x14: - case 0x16: - default: - num_counters = F10H_NUM_COUNTERS; - counters = AMD_F10H_COUNTERS; - ctrls = AMD_F10H_CTRLS; - k7_counters_mirrored = 0; - break; - } - } - - if ( is_hvm_domain(v->domain) ) - { - ctxt = xzalloc(struct amd_vpmu_context); - if ( !ctxt ) - { - gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, " - " PMU feature is unavailable on domain %d vcpu %d.\n", - v->vcpu_id, v->domain->domain_id); - return -ENOMEM; - } - } - else - ctxt = &v->arch.vpmu.xenpmu_data->pmu.amd; - - vpmu->context = ctxt; - vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED); - return 0; -} - -static void amd_vpmu_destroy(struct vcpu *v) -{ - struct vpmu_struct *vpmu = vcpu_vpmu(v); - - if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) - return; - - if ( is_hvm_domain(v->domain) ) - { - if ( ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set ) - amd_vpmu_unset_msr_bitmap(v); - - xfree(vpmu->context); - release_pmu_ownship(PMU_OWNER_HVM); - } - - vpmu->context = NULL; - vpmu_clear(vpmu); -} - -/* VPMU part of the ''q'' keyhandler */ -static void amd_vpmu_dump(struct vcpu *v) -{ - struct vpmu_struct *vpmu = vcpu_vpmu(v); - struct amd_vpmu_context *ctxt = vpmu->context; - unsigned int i; - - printk(" VPMU state: 0x%x ", vpmu->flags); - if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) - { - printk("\n"); - return; - } - - printk("("); - if ( vpmu_is_set(vpmu, VPMU_PASSIVE_DOMAIN_ALLOCATED) ) - printk("PASSIVE_DOMAIN_ALLOCATED, "); - if ( vpmu_is_set(vpmu, VPMU_FROZEN) ) - printk("FROZEN, "); - if ( vpmu_is_set(vpmu, VPMU_CONTEXT_SAVE) ) - printk("SAVE, "); - if ( vpmu_is_set(vpmu, VPMU_RUNNING) ) - printk("RUNNING, "); - if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) ) - printk("LOADED, "); - printk("ALLOCATED)\n"); - - for ( i = 0; i < num_counters; i++ ) - { - uint64_t ctrl, cntr; - - rdmsrl(ctrls[i], ctrl); - rdmsrl(counters[i], cntr); - printk(" 0x%08x: 0x%lx (0x%lx in HW) 0x%08x: 0x%lx (0x%lx in HW)\n", - ctrls[i], ctxt->ctrls[i], ctrl, - counters[i], ctxt->counters[i], cntr); - } -} - -struct arch_vpmu_ops amd_vpmu_ops = { - .do_wrmsr = amd_vpmu_do_wrmsr, - .do_rdmsr = amd_vpmu_do_rdmsr, - .do_interrupt = amd_vpmu_do_interrupt, - .arch_vpmu_destroy = amd_vpmu_destroy, - .arch_vpmu_save = amd_vpmu_save, - .arch_vpmu_load = amd_vpmu_load, - .arch_vpmu_dump = amd_vpmu_dump -}; - -int svm_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags) -{ - struct vpmu_struct *vpmu = vcpu_vpmu(v); - uint8_t 
family = current_cpu_data.x86; - int ret = 0; - - /* vpmu enabled? */ - if ( vpmu_flags == VPMU_OFF ) - return 0; - - switch ( family ) - { - case 0x10: - case 0x12: - case 0x14: - case 0x15: - case 0x16: - ret = amd_vpmu_initialise(v); - if ( !ret ) - vpmu->arch_vpmu_ops = &amd_vpmu_ops; - return ret; - } - - printk("VPMU: Initialization failed. " - "AMD processor family %d has not " - "been supported\n", family); - return -EINVAL; -} - diff --git a/xen/arch/x86/hvm/vmx/Makefile b/xen/arch/x86/hvm/vmx/Makefile index 373b3d9..04a29ce 100644 --- a/xen/arch/x86/hvm/vmx/Makefile +++ b/xen/arch/x86/hvm/vmx/Makefile @@ -3,5 +3,4 @@ obj-y += intr.o obj-y += realmode.o obj-y += vmcs.o obj-y += vmx.o -obj-y += vpmu_core2.o obj-y += vvmx.o diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c deleted file mode 100644 index 3f5941a..0000000 --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c +++ /dev/null @@ -1,938 +0,0 @@ -/* - * vpmu_core2.c: CORE 2 specific PMU virtualization for HVM domain. - * - * Copyright (c) 2007, Intel Corporation. - * - * This program is free software; you can redistribute it and/or modify it - * under the terms and conditions of the GNU General Public License, - * version 2, as published by the Free Software Foundation. - * - * This program is distributed in the hope it will be useful, but WITHOUT - * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or - * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for - * more details. - * - * You should have received a copy of the GNU General Public License along with - * this program; if not, write to the Free Software Foundation, Inc., 59 Temple - * Place - Suite 330, Boston, MA 02111-1307 USA. - * - * Author: Haitao Shan <haitao.shan@intel.com> - */ - -#include <xen/config.h> -#include <xen/sched.h> -#include <xen/xenoprof.h> -#include <xen/irq.h> -#include <asm/system.h> -#include <asm/regs.h> -#include <asm/types.h> -#include <asm/apic.h> -#include <asm/traps.h> -#include <asm/msr.h> -#include <asm/msr-index.h> -#include <asm/hvm/support.h> -#include <asm/hvm/vlapic.h> -#include <asm/hvm/vmx/vmx.h> -#include <asm/hvm/vmx/vmcs.h> -#include <public/sched.h> -#include <public/hvm/save.h> -#include <public/xenpmu.h> -#include <asm/hvm/vpmu.h> - -/* - * See Intel SDM Vol 2a Instruction Set Reference chapter 3 for CPUID - * instruction. 
- * cpuid 0xa - Architectural Performance Monitoring Leaf - * Register eax - */ -#define PMU_VERSION_SHIFT 0 /* Version ID */ -#define PMU_VERSION_BITS 8 /* 8 bits 0..7 */ -#define PMU_VERSION_MASK (((1 << PMU_VERSION_BITS) - 1) << PMU_VERSION_SHIFT) - -#define PMU_GENERAL_NR_SHIFT 8 /* Number of general pmu registers */ -#define PMU_GENERAL_NR_BITS 8 /* 8 bits 8..15 */ -#define PMU_GENERAL_NR_MASK (((1 << PMU_GENERAL_NR_BITS) - 1) << PMU_GENERAL_NR_SHIFT) - -#define PMU_GENERAL_WIDTH_SHIFT 16 /* Width of general pmu registers */ -#define PMU_GENERAL_WIDTH_BITS 8 /* 8 bits 16..23 */ -#define PMU_GENERAL_WIDTH_MASK (((1 << PMU_GENERAL_WIDTH_BITS) - 1) << PMU_GENERAL_WIDTH_SHIFT) -/* Register edx */ -#define PMU_FIXED_NR_SHIFT 0 /* Number of fixed pmu registers */ -#define PMU_FIXED_NR_BITS 5 /* 5 bits 0..4 */ -#define PMU_FIXED_NR_MASK (((1 << PMU_FIXED_NR_BITS) -1) << PMU_FIXED_NR_SHIFT) - -#define PMU_FIXED_WIDTH_SHIFT 5 /* Width of fixed pmu registers */ -#define PMU_FIXED_WIDTH_BITS 8 /* 8 bits 5..12 */ -#define PMU_FIXED_WIDTH_MASK (((1 << PMU_FIXED_WIDTH_BITS) -1) << PMU_FIXED_WIDTH_SHIFT) - - -/* Intel-specific VPMU features */ -#define VPMU_CPU_HAS_DS 0x100 /* Has Debug Store */ -#define VPMU_CPU_HAS_BTS 0x200 /* Has Branch Trace Store */ - -/* - * MSR_CORE_PERF_FIXED_CTR_CTRL contains the configuration of all fixed - * counters. 4 bits for every counter. - */ -#define FIXED_CTR_CTRL_BITS 4 -#define FIXED_CTR_CTRL_MASK ((1 << FIXED_CTR_CTRL_BITS) - 1) - -/* The index into the core2_ctrls_msr[] of this MSR used in core2_vpmu_dump() */ -#define MSR_CORE_PERF_FIXED_CTR_CTRL_IDX 0 - -static int arch_pmc_cnt; /* Number of general-purpose performance counters */ - -/* - * QUIRK to workaround an issue on various family 6 cpus. - * The issue leads to endless PMC interrupt loops on the processor. - * If the interrupt handler is running and a pmc reaches the value 0, this - * value remains forever and it triggers immediately a new interrupt after - * finishing the handler. - * A workaround is to read all flagged counters and if the value is 0 write - * 1 (or another value != 0) into it. - * There exist no errata and the real cause of this behaviour is unknown. 
- */ -bool_t __read_mostly is_pmc_quirk; - -static void check_pmc_quirk(void) -{ - if ( current_cpu_data.x86 == 6 ) - is_pmc_quirk = 1; - else - is_pmc_quirk = 0; -} - -static void handle_pmc_quirk(u64 msr_content) -{ - int i; - u64 val; - - if ( !is_pmc_quirk ) - return; - - val = msr_content; - for ( i = 0; i < arch_pmc_cnt; i++ ) - { - if ( val & 0x1 ) - { - u64 cnt; - rdmsrl(MSR_P6_PERFCTR0 + i, cnt); - if ( cnt == 0 ) - wrmsrl(MSR_P6_PERFCTR0 + i, 1); - } - val >>= 1; - } - val = msr_content >> 32; - for ( i = 0; i < VPMU_CORE2_NUM_FIXED; i++ ) - { - if ( val & 0x1 ) - { - u64 cnt; - rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, cnt); - if ( cnt == 0 ) - wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, 1); - } - val >>= 1; - } -} - -/* - * Read the number of general counters via CPUID.EAX[0xa].EAX[8..15] - */ -static int core2_get_pmc_count(void) -{ - u32 eax, ebx, ecx, edx; - - cpuid(0xa, &eax, &ebx, &ecx, &edx); - return ( (eax & PMU_GENERAL_NR_MASK) >> PMU_GENERAL_NR_SHIFT ); -} - -static u64 core2_calc_intial_glb_ctrl_msr(void) -{ - int arch_pmc_bits = (1 << arch_pmc_cnt) - 1; - u64 fix_pmc_bits = (1 << VPMU_CORE2_NUM_FIXED) - 1; - return ((fix_pmc_bits << 32) | arch_pmc_bits); -} - -/* edx bits 5-12: Bit width of fixed-function performance counters */ -static int core2_get_bitwidth_fix_count(void) -{ - u32 eax, ebx, ecx, edx; - - cpuid(0xa, &eax, &ebx, &ecx, &edx); - return ((edx & PMU_FIXED_WIDTH_MASK) >> PMU_FIXED_WIDTH_SHIFT); -} - -static int is_core2_vpmu_msr(u32 msr_index, int *type, int *index) -{ - int i; - - for ( i = 0; i < VPMU_CORE2_NUM_FIXED; i++ ) - { - if ( core2_fix_counters_msr[i] == msr_index ) - { - *type = MSR_TYPE_COUNTER; - *index = i; - return 1; - } - } - - for ( i = 0; i < VPMU_CORE2_NUM_CTRLS; i++ ) - { - if ( core2_ctrls_msr[i] == msr_index ) - { - *type = MSR_TYPE_CTRL; - *index = i; - return 1; - } - } - - if ( (msr_index == MSR_CORE_PERF_GLOBAL_CTRL) || - (msr_index == MSR_CORE_PERF_GLOBAL_STATUS) || - (msr_index == MSR_CORE_PERF_GLOBAL_OVF_CTRL) ) - { - *type = MSR_TYPE_GLOBAL; - return 1; - } - - if ( (msr_index >= MSR_IA32_PERFCTR0) && - (msr_index < (MSR_IA32_PERFCTR0 + arch_pmc_cnt)) ) - { - *type = MSR_TYPE_ARCH_COUNTER; - *index = msr_index - MSR_IA32_PERFCTR0; - return 1; - } - - if ( (msr_index >= MSR_P6_EVNTSEL0) && - (msr_index < (MSR_P6_EVNTSEL0 + arch_pmc_cnt)) ) - { - *type = MSR_TYPE_ARCH_CTRL; - *index = msr_index - MSR_P6_EVNTSEL0; - return 1; - } - - return 0; -} - -#define msraddr_to_bitpos(x) (((x)&0xffff) + ((x)>>31)*0x2000) -static void core2_vpmu_set_msr_bitmap(unsigned long *msr_bitmap) -{ - int i; - - /* Allow Read/Write PMU Counters MSR Directly. */ - for ( i = 0; i < VPMU_CORE2_NUM_FIXED; i++ ) - { - clear_bit(msraddr_to_bitpos(core2_fix_counters_msr[i]), msr_bitmap); - clear_bit(msraddr_to_bitpos(core2_fix_counters_msr[i]), - msr_bitmap + 0x800/BYTES_PER_LONG); - } - for ( i = 0; i < arch_pmc_cnt; i++ ) - { - clear_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i), msr_bitmap); - clear_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i), - msr_bitmap + 0x800/BYTES_PER_LONG); - } - - /* Allow Read PMU Non-global Controls Directly. 
*/ - for ( i = 0; i < VPMU_CORE2_NUM_CTRLS; i++ ) - clear_bit(msraddr_to_bitpos(core2_ctrls_msr[i]), msr_bitmap); - for ( i = 0; i < arch_pmc_cnt; i++ ) - clear_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL0+i), msr_bitmap); -} - -static void core2_vpmu_unset_msr_bitmap(unsigned long *msr_bitmap) -{ - int i; - - for ( i = 0; i < VPMU_CORE2_NUM_FIXED; i++ ) - { - set_bit(msraddr_to_bitpos(core2_fix_counters_msr[i]), msr_bitmap); - set_bit(msraddr_to_bitpos(core2_fix_counters_msr[i]), - msr_bitmap + 0x800/BYTES_PER_LONG); - } - for ( i = 0; i < arch_pmc_cnt; i++ ) - { - set_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i), msr_bitmap); - set_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i), - msr_bitmap + 0x800/BYTES_PER_LONG); - } - for ( i = 0; i < VPMU_CORE2_NUM_CTRLS; i++ ) - set_bit(msraddr_to_bitpos(core2_ctrls_msr[i]), msr_bitmap); - for ( i = 0; i < arch_pmc_cnt; i++ ) - set_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL0+i), msr_bitmap); -} - -static inline void __core2_vpmu_save(struct vcpu *v) -{ - int i; - struct core2_vpmu_context *core2_vpmu_cxt = vcpu_vpmu(v)->context; - - for ( i = 0; i < VPMU_CORE2_NUM_FIXED; i++ ) - rdmsrl(core2_fix_counters_msr[i], core2_vpmu_cxt->fix_counters[i]); - for ( i = 0; i < arch_pmc_cnt; i++ ) - rdmsrl(MSR_IA32_PERFCTR0+i, core2_vpmu_cxt->arch_msr_pair[i].counter); - - if ( !is_hvm_domain(v->domain) ) - rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, core2_vpmu_cxt->global_status); -} - -static int core2_vpmu_save(struct vcpu *v) -{ - struct vpmu_struct *vpmu = vcpu_vpmu(v); - - if ( !vpmu_is_set_all(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED) ) - return 0; - - if ( !is_hvm_domain(v->domain) ) - wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0); - - __core2_vpmu_save(v); - - /* Unset PMU MSR bitmap to trap lazy load. */ - if ( !vpmu_is_set(vpmu, VPMU_RUNNING) && cpu_has_vmx_msr_bitmap - && is_hvm_domain(v->domain) ) - core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap); - - return 1; -} - -static inline void __core2_vpmu_load(struct vcpu *v) -{ - int i; - struct core2_vpmu_context *core2_vpmu_cxt = vcpu_vpmu(v)->context; - - for ( i = 0; i < VPMU_CORE2_NUM_FIXED; i++ ) - wrmsrl(core2_fix_counters_msr[i], core2_vpmu_cxt->fix_counters[i]); - for ( i = 0; i < arch_pmc_cnt; i++ ) - wrmsrl(MSR_IA32_PERFCTR0+i, core2_vpmu_cxt->arch_msr_pair[i].counter); - - for ( i = 0; i < VPMU_CORE2_NUM_CTRLS; i++ ) - wrmsrl(core2_ctrls_msr[i], core2_vpmu_cxt->ctrls[i]); - for ( i = 0; i < arch_pmc_cnt; i++ ) - wrmsrl(MSR_P6_EVNTSEL0+i, core2_vpmu_cxt->arch_msr_pair[i].control); - - if ( !is_hvm_domain(v->domain) ) - { - wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, core2_vpmu_cxt->global_ovf_ctrl); - wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl); - } -} - -static void core2_vpmu_load(struct vcpu *v) -{ - struct vpmu_struct *vpmu = vcpu_vpmu(v); - - if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) ) - return; - - __core2_vpmu_load(v); -} - -static int core2_vpmu_alloc_resource(struct vcpu *v) -{ - struct vpmu_struct *vpmu = vcpu_vpmu(v); - struct core2_vpmu_context *core2_vpmu_cxt; - struct core2_pmu_enable *pmu_enable = NULL; - - pmu_enable = xzalloc_bytes(sizeof(struct core2_pmu_enable)); - if ( !pmu_enable ) - return 0; - - if ( is_hvm_domain(v->domain) ) - { - if ( !acquire_pmu_ownership(PMU_OWNER_HVM) ) - goto out_err; - - wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0); - if ( vmx_add_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL) ) - goto out_err; - - if ( vmx_add_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL) ) - goto out_err; - vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, - core2_calc_intial_glb_ctrl_msr()); - - core2_vpmu_cxt 
= xzalloc_bytes(sizeof(struct core2_vpmu_context)); - if ( !core2_vpmu_cxt ) - goto out_err; - } - else - { - core2_vpmu_cxt = &v->arch.vpmu.xenpmu_data->pmu.intel; - vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED); - } - - core2_vpmu_cxt->pmu_enable = pmu_enable; - vpmu->context = (void *)core2_vpmu_cxt; - - vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED); - - return 1; - -out_err: - xfree(pmu_enable); - vmx_rm_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL); - vmx_rm_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL); - release_pmu_ownship(PMU_OWNER_HVM); - - printk("Failed to allocate VPMU resources for domain %u vcpu %u\n", - v->vcpu_id, v->domain->domain_id); - - return 0; -} - -static void core2_vpmu_save_msr_context(struct vcpu *v, int type, - int index, u64 msr_data) -{ - struct core2_vpmu_context *core2_vpmu_cxt = vcpu_vpmu(v)->context; - - switch ( type ) - { - case MSR_TYPE_CTRL: - core2_vpmu_cxt->ctrls[index] = msr_data; - break; - case MSR_TYPE_ARCH_CTRL: - core2_vpmu_cxt->arch_msr_pair[index].control = msr_data; - break; - } -} - -static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index) -{ - struct vpmu_struct *vpmu = vcpu_vpmu(current); - - if ( !is_core2_vpmu_msr(msr_index, type, index) ) - return 0; - - if ( unlikely(!vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED)) && - !core2_vpmu_alloc_resource(current) ) - return 0; - - /* Do the lazy load staff. */ - if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) ) - { - __core2_vpmu_load(current); - vpmu_set(vpmu, VPMU_CONTEXT_LOADED); - if ( cpu_has_vmx_msr_bitmap && is_hvm_domain(current->domain) ) - core2_vpmu_set_msr_bitmap(current->arch.hvm_vmx.msr_bitmap); - } - return 1; -} - -static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) -{ - u64 global_ctrl, non_global_ctrl; - char pmu_enable = 0; - int i, tmp; - int type = -1, index = -1; - struct vcpu *v = current; - struct vpmu_struct *vpmu = vcpu_vpmu(v); - struct core2_vpmu_context *core2_vpmu_cxt = NULL; - - if ( !core2_vpmu_msr_common_check(msr, &type, &index) ) - { - /* Special handling for BTS */ - if ( msr == MSR_IA32_DEBUGCTLMSR ) - { - uint64_t supported = IA32_DEBUGCTLMSR_TR | IA32_DEBUGCTLMSR_BTS | - IA32_DEBUGCTLMSR_BTINT; - - if ( cpu_has(¤t_cpu_data, X86_FEATURE_DSCPL) ) - supported |= IA32_DEBUGCTLMSR_BTS_OFF_OS | - IA32_DEBUGCTLMSR_BTS_OFF_USR; - if ( msr_content & supported ) - { - if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) ) - return 1; - gdprintk(XENLOG_WARNING, "Debug Store is not supported on this cpu\n"); - - if ( is_hvm_domain(v->domain) ) - hvm_inject_hw_exception(TRAP_gp_fault, 0); - else - send_guest_trap(v->domain, v->vcpu_id, TRAP_gp_fault); - - return 0; - } - } - return 0; - } - - core2_vpmu_cxt = vpmu->context; - switch ( msr ) - { - case MSR_CORE_PERF_GLOBAL_OVF_CTRL: - core2_vpmu_cxt->global_ovf_status &= ~msr_content; - core2_vpmu_cxt->global_ovf_ctrl = msr_content; - return 1; - case MSR_CORE_PERF_GLOBAL_STATUS: - gdprintk(XENLOG_INFO, "Can not write readonly MSR: " - "MSR_PERF_GLOBAL_STATUS(0x38E)!\n"); - if ( is_hvm_domain(v->domain) ) - hvm_inject_hw_exception(TRAP_gp_fault, 0); - else - send_guest_trap(v->domain, v->vcpu_id, TRAP_gp_fault); - return 1; - case MSR_IA32_PEBS_ENABLE: - if ( msr_content & 1 ) - gdprintk(XENLOG_WARNING, "Guest is trying to enable PEBS, " - "which is not supported.\n"); - return 1; - case MSR_IA32_DS_AREA: - if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_DS) ) - { - if ( !is_canonical_address(msr_content) ) - { - gdprintk(XENLOG_WARNING, - "Illegal address for IA32_DS_AREA: %#" PRIx64 "x\n", - msr_content); - if ( 
is_hvm_domain(v->domain) ) - hvm_inject_hw_exception(TRAP_gp_fault, 0); - else - send_guest_trap(v->domain, v->vcpu_id, TRAP_gp_fault); - return 1; - } - core2_vpmu_cxt->pmu_enable->ds_area_enable = msr_content ? 1 : 0; - break; - } - gdprintk(XENLOG_WARNING, "Guest setting of DTS is ignored.\n"); - return 1; - case MSR_CORE_PERF_GLOBAL_CTRL: - global_ctrl = msr_content; - for ( i = 0; i < arch_pmc_cnt; i++ ) - { - rdmsrl(MSR_P6_EVNTSEL0+i, non_global_ctrl); - core2_vpmu_cxt->pmu_enable->arch_pmc_enable[i] - global_ctrl & (non_global_ctrl >> 22) & 1; - global_ctrl >>= 1; - } - - rdmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, non_global_ctrl); - global_ctrl = msr_content >> 32; - for ( i = 0; i < VPMU_CORE2_NUM_FIXED; i++ ) - { - core2_vpmu_cxt->pmu_enable->fixed_ctr_enable[i] - (global_ctrl & 1) & ((non_global_ctrl & 0x3)? 1: 0); - non_global_ctrl >>= FIXED_CTR_CTRL_BITS; - global_ctrl >>= 1; - } - break; - case MSR_CORE_PERF_FIXED_CTR_CTRL: - non_global_ctrl = msr_content; - if ( is_hvm_domain(v->domain) ) - vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl); - else - rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl); - global_ctrl >>= 32; - for ( i = 0; i < VPMU_CORE2_NUM_FIXED; i++ ) - { - core2_vpmu_cxt->pmu_enable->fixed_ctr_enable[i] - (global_ctrl & 1) & ((non_global_ctrl & 0x3)? 1: 0); - non_global_ctrl >>= 4; - global_ctrl >>= 1; - } - break; - default: - tmp = msr - MSR_P6_EVNTSEL0; - if ( is_hvm_domain(v->domain) ) - vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl); - else - rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl); - if ( tmp >= 0 && tmp < arch_pmc_cnt ) - core2_vpmu_cxt->pmu_enable->arch_pmc_enable[tmp] - (global_ctrl >> tmp) & (msr_content >> 22) & 1; - } - - for ( i = 0; i < VPMU_CORE2_NUM_FIXED; i++ ) - pmu_enable |= core2_vpmu_cxt->pmu_enable->fixed_ctr_enable[i]; - for ( i = 0; i < arch_pmc_cnt; i++ ) - pmu_enable |= core2_vpmu_cxt->pmu_enable->arch_pmc_enable[i]; - pmu_enable |= core2_vpmu_cxt->pmu_enable->ds_area_enable; - if ( pmu_enable ) - vpmu_set(vpmu, VPMU_RUNNING); - else - vpmu_reset(vpmu, VPMU_RUNNING); - - if ( is_hvm_domain(v->domain) ) - { - /* Setup LVTPC in local apic */ - if ( vpmu_is_set(vpmu, VPMU_RUNNING) && - is_vlapic_lvtpc_enabled(vcpu_vlapic(v)) ) - { - apic_write_around(APIC_LVTPC, APIC_DM_NMI); - vpmu->hw_lapic_lvtpc = APIC_DM_NMI; - } - else - { - apic_write_around(APIC_LVTPC, APIC_DM_NMI | APIC_LVT_MASKED); - vpmu->hw_lapic_lvtpc = APIC_DM_NMI | APIC_LVT_MASKED; - } - } - - core2_vpmu_save_msr_context(v, type, index, msr_content); - if ( type != MSR_TYPE_GLOBAL ) - { - u64 mask; - int inject_gp = 0; - switch ( type ) - { - case MSR_TYPE_ARCH_CTRL: /* MSR_P6_EVNTSEL[0,...] */ - mask = ~((1ull << 32) - 1); - if (msr_content & mask) - inject_gp = 1; - break; - case MSR_TYPE_CTRL: /* IA32_FIXED_CTR_CTRL */ - if ( msr == MSR_IA32_DS_AREA ) - break; - /* 4 bits per counter, currently 3 fixed counters implemented. 
*/ - mask = ~((1ull << (VPMU_CORE2_NUM_FIXED * FIXED_CTR_CTRL_BITS)) - 1); - if (msr_content & mask) - inject_gp = 1; - break; - case MSR_TYPE_COUNTER: /* IA32_FIXED_CTR[0-2] */ - mask = ~((1ull << core2_get_bitwidth_fix_count()) - 1); - if (msr_content & mask) - inject_gp = 1; - break; - } - - if (inject_gp) - { - if ( is_hvm_domain(v->domain) ) - hvm_inject_hw_exception(TRAP_gp_fault, 0); - else - send_guest_trap(v->domain, v->vcpu_id, TRAP_gp_fault); - } - else - wrmsrl(msr, msr_content); - } - else - { - if ( is_hvm_domain(v->domain) ) - vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content); - else - { - wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, msr_content); - core2_vpmu_cxt->global_ctrl = msr_content; - } - } - - return 1; -} - -static int core2_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content) -{ - int type = -1, index = -1; - struct vcpu *v = current; - struct vpmu_struct *vpmu = vcpu_vpmu(v); - struct core2_vpmu_context *core2_vpmu_cxt = NULL; - - if ( core2_vpmu_msr_common_check(msr, &type, &index) ) - { - core2_vpmu_cxt = vpmu->context; - switch ( msr ) - { - case MSR_CORE_PERF_GLOBAL_OVF_CTRL: - *msr_content = 0; - break; - case MSR_CORE_PERF_GLOBAL_STATUS: - *msr_content = core2_vpmu_cxt->global_ovf_status; - break; - case MSR_CORE_PERF_GLOBAL_CTRL: - if ( is_hvm_domain(v->domain) ) - vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content); - else - rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, *msr_content); - break; - default: - rdmsrl(msr, *msr_content); - } - } - else - { - /* Extension for BTS */ - if ( msr == MSR_IA32_MISC_ENABLE ) - { - if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) ) - *msr_content &= ~MSR_IA32_MISC_ENABLE_BTS_UNAVAIL; - } - else - return 0; - } - - return 1; -} - -static void core2_vpmu_do_cpuid(unsigned int input, - unsigned int *eax, unsigned int *ebx, - unsigned int *ecx, unsigned int *edx) -{ - if (input == 0x1) - { - struct vpmu_struct *vpmu = vcpu_vpmu(current); - - if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_DS) ) - { - /* Switch on the ''Debug Store'' feature in CPUID.EAX[1]:EDX[21] */ - *edx |= cpufeat_mask(X86_FEATURE_DS); - if ( cpu_has(¤t_cpu_data, X86_FEATURE_DTES64) ) - *ecx |= cpufeat_mask(X86_FEATURE_DTES64); - if ( cpu_has(¤t_cpu_data, X86_FEATURE_DSCPL) ) - *ecx |= cpufeat_mask(X86_FEATURE_DSCPL); - } - } -} - -/* Dump vpmu info on console, called in the context of keyhandler ''q''. */ -static void core2_vpmu_dump(struct vcpu *v) -{ - struct vpmu_struct *vpmu = vcpu_vpmu(v); - int i; - struct core2_vpmu_context *core2_vpmu_cxt = NULL; - u64 val; - - if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) - return; - - if ( !vpmu_is_set(vpmu, VPMU_RUNNING) ) - { - if ( vpmu_set(vpmu, VPMU_CONTEXT_LOADED) ) - printk(" vPMU loaded\n"); - else - printk(" vPMU allocated\n"); - return; - } - - printk(" vPMU running\n"); - core2_vpmu_cxt = vpmu->context; - - /* Print the contents of the counter and its configuration msr. */ - for ( i = 0; i < arch_pmc_cnt; i++ ) - { - struct arch_msr_pair* msr_pair = core2_vpmu_cxt->arch_msr_pair; - if ( core2_vpmu_cxt->pmu_enable->arch_pmc_enable[i] ) - printk(" general_%d: 0x%016lx ctrl: 0x%016lx\n", - i, msr_pair[i].counter, msr_pair[i].control); - } - /* - * The configuration of the fixed counter is 4 bits each in the - * MSR_CORE_PERF_FIXED_CTR_CTRL. 
- */ - val = core2_vpmu_cxt->ctrls[MSR_CORE_PERF_FIXED_CTR_CTRL_IDX]; - for ( i = 0; i < VPMU_CORE2_NUM_FIXED; i++ ) - { - if ( core2_vpmu_cxt->pmu_enable->fixed_ctr_enable[i] ) - printk(" fixed_%d: 0x%016lx ctrl: 0x%lx\n", - i, core2_vpmu_cxt->fix_counters[i], - val & FIXED_CTR_CTRL_MASK); - val >>= FIXED_CTR_CTRL_BITS; - } -} - -static int core2_vpmu_do_interrupt(struct cpu_user_regs *regs) -{ - struct vcpu *v = current; - u64 msr_content; - struct vpmu_struct *vpmu = vcpu_vpmu(v); - struct core2_vpmu_context *core2_vpmu_cxt = vpmu->context; - - rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content); - if ( msr_content ) - { - if ( is_pmc_quirk ) - handle_pmc_quirk(msr_content); - core2_vpmu_cxt->global_ovf_status |= msr_content; - msr_content = 0xC000000700000000 | ((1 << arch_pmc_cnt) - 1); - wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content); - } - else - { - /* No PMC overflow but perhaps a Trace Message interrupt. */ - msr_content = __vmread(GUEST_IA32_DEBUGCTL); - if ( !(msr_content & IA32_DEBUGCTLMSR_TR) ) - return 0; - } - - /* HW sets the MASK bit when performance counter interrupt occurs*/ - vpmu->hw_lapic_lvtpc = apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED; - apic_write_around(APIC_LVTPC, vpmu->hw_lapic_lvtpc); - - return 1; -} - -static int core2_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags) -{ - struct vpmu_struct *vpmu = vcpu_vpmu(v); - u64 msr_content; - struct cpuinfo_x86 *c = ¤t_cpu_data; - - if ( !(vpmu_flags & VPMU_INTEL_BTS) ) - goto func_out; - /* Check the ''Debug Store'' feature in the CPUID.EAX[1]:EDX[21] */ - if ( cpu_has(c, X86_FEATURE_DS) ) - { - if ( !cpu_has(c, X86_FEATURE_DTES64) ) - { - printk(XENLOG_G_WARNING "CPU doesn''t support 64-bit DS Area" - " - Debug Store disabled for d%d:v%d\n", - v->domain->domain_id, v->vcpu_id); - goto func_out; - } - vpmu_set(vpmu, VPMU_CPU_HAS_DS); - rdmsrl(MSR_IA32_MISC_ENABLE, msr_content); - if ( msr_content & MSR_IA32_MISC_ENABLE_BTS_UNAVAIL ) - { - /* If BTS_UNAVAIL is set reset the DS feature. */ - vpmu_reset(vpmu, VPMU_CPU_HAS_DS); - printk(XENLOG_G_WARNING "CPU has set BTS_UNAVAIL" - " - Debug Store disabled for d%d:v%d\n", - v->domain->domain_id, v->vcpu_id); - } - else - { - vpmu_set(vpmu, VPMU_CPU_HAS_BTS); - if ( !cpu_has(c, X86_FEATURE_DSCPL) ) - printk(XENLOG_G_INFO - "vpmu: CPU doesn''t support CPL-Qualified BTS\n"); - printk("******************************************************\n"); - printk("** WARNING: Emulation of BTS Feature is switched on **\n"); - printk("** Using this processor feature in a virtualized **\n"); - printk("** environment is not 100%% safe. **\n"); - printk("** Setting the DS buffer address with wrong values **\n"); - printk("** may lead to hypervisor hangs or crashes. **\n"); - printk("** It is NOT recommended for production use! 
**\n"); - printk("******************************************************\n"); - } - } -func_out: - - arch_pmc_cnt = core2_get_pmc_count(); - check_pmc_quirk(); - - /* PV domains can allocate resources immediately */ - if ( !is_hvm_domain(v->domain) ) - if ( !core2_vpmu_alloc_resource(v) ) - return 1; - - return 0; -} - -static void core2_vpmu_destroy(struct vcpu *v) -{ - struct vpmu_struct *vpmu = vcpu_vpmu(v); - struct core2_vpmu_context *core2_vpmu_cxt = vpmu->context; - - if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) - return; - - if ( is_hvm_domain(v->domain) ) - { - xfree(core2_vpmu_cxt->pmu_enable); - xfree(vpmu->context); - if ( cpu_has_vmx_msr_bitmap ) - core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap); - } - - release_pmu_ownship(PMU_OWNER_HVM); - vpmu_clear(vpmu); -} - -struct arch_vpmu_ops core2_vpmu_ops = { - .do_wrmsr = core2_vpmu_do_wrmsr, - .do_rdmsr = core2_vpmu_do_rdmsr, - .do_interrupt = core2_vpmu_do_interrupt, - .do_cpuid = core2_vpmu_do_cpuid, - .arch_vpmu_destroy = core2_vpmu_destroy, - .arch_vpmu_save = core2_vpmu_save, - .arch_vpmu_load = core2_vpmu_load, - .arch_vpmu_dump = core2_vpmu_dump -}; - -static void core2_no_vpmu_do_cpuid(unsigned int input, - unsigned int *eax, unsigned int *ebx, - unsigned int *ecx, unsigned int *edx) -{ - /* - * As in this case the vpmu is not enabled reset some bits in the - * architectural performance monitoring related part. - */ - if ( input == 0xa ) - { - *eax &= ~PMU_VERSION_MASK; - *eax &= ~PMU_GENERAL_NR_MASK; - *eax &= ~PMU_GENERAL_WIDTH_MASK; - - *edx &= ~PMU_FIXED_NR_MASK; - *edx &= ~PMU_FIXED_WIDTH_MASK; - } -} - -/* - * If its a vpmu msr set it to 0. - */ -static int core2_no_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content) -{ - int type = -1, index = -1; - if ( !is_core2_vpmu_msr(msr, &type, &index) ) - return 0; - *msr_content = 0; - return 1; -} - -/* - * These functions are used in case vpmu is not enabled. - */ -struct arch_vpmu_ops core2_no_vpmu_ops = { - .do_rdmsr = core2_no_vpmu_do_rdmsr, - .do_cpuid = core2_no_vpmu_do_cpuid, -}; - -int vmx_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags) -{ - struct vpmu_struct *vpmu = vcpu_vpmu(v); - uint8_t family = current_cpu_data.x86; - uint8_t cpu_model = current_cpu_data.x86_model; - int ret = 0; - - vpmu->arch_vpmu_ops = &core2_no_vpmu_ops; - if ( vpmu_flags == VPMU_OFF ) - return 0; - - if ( family == 6 ) - { - switch ( cpu_model ) - { - /* Core2: */ - case 0x0f: /* original 65 nm celeron/pentium/core2/xeon, "Merom"/"Conroe" */ - case 0x16: /* single-core 65 nm celeron/core2solo "Merom-L"/"Conroe-L" */ - case 0x17: /* 45 nm celeron/core2/xeon "Penryn"/"Wolfdale" */ - case 0x1d: /* six-core 45 nm xeon "Dunnington" */ - - case 0x2a: /* SandyBridge */ - case 0x2d: /* SandyBridge, "Romley-EP" */ - - /* Nehalem: */ - case 0x1a: /* 45 nm nehalem, "Bloomfield" */ - case 0x1e: /* 45 nm nehalem, "Lynnfield", "Clarksfield", "Jasper Forest" */ - case 0x2e: /* 45 nm nehalem-ex, "Beckton" */ - - /* Westmere: */ - case 0x25: /* 32 nm nehalem, "Clarkdale", "Arrandale" */ - case 0x2c: /* 32 nm nehalem, "Gulftown", "Westmere-EP" */ - case 0x27: /* 32 nm Westmere-EX */ - - case 0x3a: /* IvyBridge */ - case 0x3e: /* IvyBridge EP */ - case 0x3c: /* Haswell */ - ret = core2_vpmu_initialise(v, vpmu_flags); - if ( !ret ) - vpmu->arch_vpmu_ops = &core2_vpmu_ops; - return ret; - } - } - - printk("VPMU: Initialization failed. 
" - "Intel processor family %d model %d has not " - "been supported\n", family, cpu_model); - return -EINVAL; -} - diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c deleted file mode 100644 index f28b7af..0000000 --- a/xen/arch/x86/hvm/vpmu.c +++ /dev/null @@ -1,545 +0,0 @@ -/* - * vpmu.c: PMU virtualization for HVM domain. - * - * Copyright (c) 2007, Intel Corporation. - * - * This program is free software; you can redistribute it and/or modify it - * under the terms and conditions of the GNU General Public License, - * version 2, as published by the Free Software Foundation. - * - * This program is distributed in the hope it will be useful, but WITHOUT - * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or - * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for - * more details. - * - * You should have received a copy of the GNU General Public License along with - * this program; if not, write to the Free Software Foundation, Inc., 59 Temple - * Place - Suite 330, Boston, MA 02111-1307 USA. - * - * Author: Haitao Shan <haitao.shan@intel.com> - */ -#include <xen/config.h> -#include <xen/sched.h> -#include <xen/xenoprof.h> -#include <xen/event.h> -#include <xen/softirq.h> -#include <xen/hypercall.h> -#include <xen/guest_access.h> -#include <asm/regs.h> -#include <asm/types.h> -#include <asm/msr.h> -#include <asm/hvm/support.h> -#include <asm/hvm/vmx/vmx.h> -#include <asm/hvm/vmx/vmcs.h> -#include <asm/hvm/vpmu.h> -#include <asm/hvm/svm/svm.h> -#include <asm/hvm/svm/vmcb.h> -#include <asm/apic.h> -#include <asm/nmi.h> -#include <public/xenpmu.h> - -/* - * "vpmu" : vpmu generally enabled - * "vpmu=off" : vpmu generally disabled - * "vpmu=bts" : vpmu enabled and Intel BTS feature switched on. - */ -uint32_t __read_mostly vpmu_mode = VPMU_OFF; -static void parse_vpmu_param(char *s); -custom_param("vpmu", parse_vpmu_param); - -static void vpmu_save_force(void *arg); -static DEFINE_PER_CPU(struct vcpu *, last_vcpu); - -static void __init parse_vpmu_param(char *s) -{ - switch ( parse_bool(s) ) - { - case 0: - break; - default: - if ( !strcmp(s, "bts") ) - vpmu_mode |= VPMU_INTEL_BTS; - else if ( *s ) - { - printk("VPMU: unknown flag: %s - vpmu disabled!\n", s); - break; - } - /* fall through */ - case 1: - vpmu_mode |= VPMU_ON; - break; - } -} - -static void vpmu_lvtpc_update(uint32_t val) -{ - struct vpmu_struct *vpmu = vcpu_vpmu(current); - - vpmu->hw_lapic_lvtpc = APIC_DM_NMI | (val & APIC_LVT_MASKED); - apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc); -} - -int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) -{ - struct vpmu_struct *vpmu = vcpu_vpmu(current); - - if ( (vpmu_mode & VPMU_PRIV) && (current->domain != dom0) ) - return 0; - - if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_wrmsr ) - return vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content); - return 0; -} - -int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content) -{ - struct vpmu_struct *vpmu = vcpu_vpmu(current); - - if ( vpmu_mode & VPMU_PRIV && current->domain != dom0 ) - return 0; - - if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_rdmsr ) - return vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content); - return 0; -} - -int vpmu_do_interrupt(struct cpu_user_regs *regs) -{ - struct vcpu *v = current; - struct vpmu_struct *vpmu; - - - /* dom0 will handle this interrupt */ - if ( (vpmu_mode & VPMU_PRIV) || - (v->domain->domain_id >= DOMID_FIRST_RESERVED) ) - { - if ( smp_processor_id() >= dom0->max_vcpus ) - return 0; - v = dom0->vcpu[smp_processor_id()]; - } - - vpmu = 
vcpu_vpmu(v); - if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) - return 0; - - if ( !is_hvm_domain(v->domain) || vpmu_mode & VPMU_PRIV ) - { - /* PV guest or dom0 is doing system profiling */ - void *p; - struct cpu_user_regs *gregs; - - p = v->arch.vpmu.xenpmu_data; - - /* PV guest will be reading PMU MSRs from xenpmu_data */ - vpmu_save_force(v); - - /* Store appropriate registers in xenpmu_data - * - * Note: ''!current->is_running'' is possible when ''set_current(next)'' - * for the (HVM) guest has been called but ''reset_stack_and_jump()'' - * has not (i.e. the guest is not actually running yet). - */ - if ( !is_hvm_domain(current->domain) || - ((vpmu_mode & VPMU_PRIV) && !current->is_running) ) - { - /* - * 32-bit dom0 cannot process Xen''s addresses (which are 64 bit) - * and therefore we treat it the same way as a non-priviledged - * PV 32-bit domain. - */ - if ( is_pv_32bit_domain(current->domain) ) - { - struct compat_cpu_user_regs cmp; - - gregs = guest_cpu_user_regs(); - XLAT_cpu_user_regs(&cmp, gregs); - memcpy(p, &cmp, sizeof(struct compat_cpu_user_regs)); - } - else if ( (current->domain != dom0) && !is_idle_vcpu(current) && - !(vpmu_mode & VPMU_PRIV) ) - { - /* PV guest */ - gregs = guest_cpu_user_regs(); - memcpy(p, gregs, sizeof(struct cpu_user_regs)); - } - else - memcpy(p, regs, sizeof(struct cpu_user_regs)); - } - else - { - /* HVM guest */ - struct segment_register cs; - - gregs = guest_cpu_user_regs(); - hvm_get_segment_register(current, x86_seg_cs, &cs); - gregs->cs = cs.attr.fields.dpl; - - memcpy(p, gregs, sizeof(struct cpu_user_regs)); - } - - v->arch.vpmu.xenpmu_data->domain_id = current->domain->domain_id; - v->arch.vpmu.xenpmu_data->vcpu_id = current->vcpu_id; - - raise_softirq(PMU_SOFTIRQ); - vpmu_set(vpmu, VPMU_WAIT_FOR_FLUSH); - - return 1; - } - else if ( vpmu->arch_vpmu_ops ) - { - /* HVM guest */ - struct vlapic *vlapic; - u32 vlapic_lvtpc; - unsigned char int_vec; - - if ( !vpmu->arch_vpmu_ops->do_interrupt(regs) ) - return 0; - - vlapic = vcpu_vlapic(v); - if ( !is_vlapic_lvtpc_enabled(vlapic) ) - return 1; - - vlapic_lvtpc = vlapic_get_reg(vlapic, APIC_LVTPC); - int_vec = vlapic_lvtpc & APIC_VECTOR_MASK; - - if ( GET_APIC_DELIVERY_MODE(vlapic_lvtpc) == APIC_MODE_FIXED ) - vlapic_set_irq(vcpu_vlapic(v), int_vec, 0); - else - v->nmi_pending = 1; - return 1; - } - - return 0; -} - -void vpmu_do_cpuid(unsigned int input, - unsigned int *eax, unsigned int *ebx, - unsigned int *ecx, unsigned int *edx) -{ - struct vpmu_struct *vpmu = vcpu_vpmu(current); - - if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_cpuid ) - vpmu->arch_vpmu_ops->do_cpuid(input, eax, ebx, ecx, edx); -} - -static void vpmu_save_force(void *arg) -{ - struct vcpu *v = (struct vcpu *)arg; - struct vpmu_struct *vpmu = vcpu_vpmu(v); - - if ( !vpmu_is_set_all(vpmu, VPMU_CONTEXT_ALLOCATED | VPMU_CONTEXT_LOADED) ) - return; - - vpmu_set(vpmu, VPMU_CONTEXT_SAVE); - - if ( vpmu->arch_vpmu_ops ) - (void)vpmu->arch_vpmu_ops->arch_vpmu_save(v); - - vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED); - - per_cpu(last_vcpu, smp_processor_id()) = NULL; -} - -void vpmu_save(struct vcpu *v) -{ - struct vpmu_struct *vpmu = vcpu_vpmu(v); - int pcpu = smp_processor_id(); - - if ( !vpmu_is_set_all(vpmu, VPMU_CONTEXT_ALLOCATED | VPMU_CONTEXT_LOADED) ) - return; - - vpmu->last_pcpu = pcpu; - per_cpu(last_vcpu, pcpu) = v; - - if ( vpmu->arch_vpmu_ops ) - if ( vpmu->arch_vpmu_ops->arch_vpmu_save(v) ) - vpmu_reset(vpmu, VPMU_CONTEXT_LOADED); - - apic_write(APIC_LVTPC, APIC_DM_NMI | APIC_LVT_MASKED); 
-} - -void vpmu_load(struct vcpu *v) -{ - struct vpmu_struct *vpmu = vcpu_vpmu(v); - int pcpu = smp_processor_id(); - struct vcpu *prev = NULL; - - if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) - return; - - /* First time this VCPU is running here */ - if ( vpmu->last_pcpu != pcpu ) - { - /* - * Get the context from last pcpu that we ran on. Note that if another - * VCPU is running there it must have saved this VPCU''s context before - * startig to run (see below). - * There should be no race since remote pcpu will disable interrupts - * before saving the context. - */ - if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) ) - on_selected_cpus(cpumask_of(vpmu->last_pcpu), - vpmu_save_force, (void *)v, 1); - } - - /* Prevent forced context save from remote CPU */ - local_irq_disable(); - - prev = per_cpu(last_vcpu, pcpu); - - if ( prev != v && prev ) - { - vpmu = vcpu_vpmu(prev); - - /* Someone ran here before us */ - vpmu_save_force(prev); - - vpmu = vcpu_vpmu(v); - } - - local_irq_enable(); - - /* Only when PMU is counting, we load PMU context immediately. */ - if ( !vpmu_is_set(vpmu, VPMU_RUNNING) ) - return; - - if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_load ) - { - apic_write_around(APIC_LVTPC, vpmu->hw_lapic_lvtpc); - vpmu->arch_vpmu_ops->arch_vpmu_load(v); - } - - /* - * PMU interrupt may happen while loading the context above. That - * may cause vpmu_save_force() in the handler so we we don''t - * want to mark the context as loaded. - */ - if ( !vpmu_is_set(vpmu, VPMU_WAIT_FOR_FLUSH) ) - vpmu_set(vpmu, VPMU_CONTEXT_LOADED); -} - -void vpmu_initialise(struct vcpu *v) -{ - struct vpmu_struct *vpmu = vcpu_vpmu(v); - uint8_t vendor = current_cpu_data.x86_vendor; - - if ( vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) - vpmu_destroy(v); - vpmu_clear(vpmu); - vpmu->context = NULL; - - switch ( vendor ) - { - case X86_VENDOR_AMD: - if ( svm_vpmu_initialise(v, vpmu_mode) != 0 ) - vpmu_mode = VPMU_OFF; - break; - - case X86_VENDOR_INTEL: - if ( vmx_vpmu_initialise(v, vpmu_mode) != 0 ) - vpmu_mode = VPMU_OFF; - break; - - default: - printk("VPMU: Initialization failed. " - "Unknown CPU vendor %d\n", vendor); - vpmu_mode = VPMU_OFF; - break; - } -} - -void vpmu_destroy(struct vcpu *v) -{ - struct vpmu_struct *vpmu = vcpu_vpmu(v); - - if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_destroy ) - { - /* Unload VPMU first. This will stop counters from running */ - on_selected_cpus(cpumask_of(vcpu_vpmu(v)->last_pcpu), - vpmu_save_force, (void *)v, 1); - - vpmu->arch_vpmu_ops->arch_vpmu_destroy(v); - } -} - -/* Dump some vpmu informations on console. Used in keyhandler dump_domains(). 
*/ -void vpmu_dump(struct vcpu *v) -{ - struct vpmu_struct *vpmu = vcpu_vpmu(v); - - if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_dump ) - vpmu->arch_vpmu_ops->arch_vpmu_dump(v); -} - -int pmu_nmi_interrupt(struct cpu_user_regs *regs, int cpu) -{ - return vpmu_do_interrupt(regs); -} - -/* Process the softirq set by PMU NMI handler */ -void pmu_virq(void) -{ - struct vcpu *v = current; - - if ( (vpmu_mode & VPMU_PRIV) || - (v->domain->domain_id >= DOMID_FIRST_RESERVED) ) - { - if ( smp_processor_id() >= dom0->max_vcpus ) - { - printk(KERN_WARNING "PMU softirq on unexpected processor %d\n", - smp_processor_id()); - return; - } - v = dom0->vcpu[smp_processor_id()]; - } - - send_guest_vcpu_virq(v, VIRQ_XENPMU); -} - -static int pvpmu_init(struct domain *d, xenpmu_params_t *params) -{ - struct vcpu *v; - static int pvpmu_initted = 0; - - if ( params->vcpu < 0 || params->vcpu >= d->max_vcpus ) - return -EINVAL; - - if ( !pvpmu_initted ) - { - if (reserve_lapic_nmi() == 0) - set_nmi_callback(pmu_nmi_interrupt); - else - { - printk("Failed to reserve PMU NMI\n"); - return -EBUSY; - } - open_softirq(PMU_SOFTIRQ, pmu_virq); - pvpmu_initted = 1; - } - - if ( !mfn_valid(params->mfn) || - !get_page_and_type(mfn_to_page(params->mfn), d, PGT_writable_page) ) - return -EINVAL; - - v = d->vcpu[params->vcpu]; - v->arch.vpmu.xenpmu_data = map_domain_page_global(params->mfn); - memset(v->arch.vpmu.xenpmu_data, 0, PAGE_SIZE); - - vpmu_initialise(v); - - return 0; -} - -static void pvpmu_finish(struct domain *d, xenpmu_params_t *params) -{ - struct vcpu *v; - uint64_t mfn; - - if ( params->vcpu < 0 || params->vcpu >= d->max_vcpus ) - return; - - v = d->vcpu[params->vcpu]; - if (v != current) - vcpu_pause(v); - - if ( v->arch.vpmu.xenpmu_data ) - { - mfn = domain_page_map_to_mfn(v->arch.vpmu.xenpmu_data); - if ( mfn_valid(mfn) ) - { - unmap_domain_page_global(v->arch.vpmu.xenpmu_data); - put_page_and_type(mfn_to_page(mfn)); - } - } - vpmu_destroy(v); - - if (v != current) - vcpu_unpause(v); -} - -long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg) -{ - int ret = -EINVAL; - xenpmu_params_t pmu_params; - uint32_t mode, flags; - - switch ( op ) - { - case XENPMU_mode_set: - if ( !is_control_domain(current->domain) ) - return -EPERM; - - if ( copy_from_guest(&pmu_params, arg, 1) ) - return -EFAULT; - - mode = (uint32_t)pmu_params.control & VPMU_MODE_MASK; - if ( (mode & ~(VPMU_ON | VPMU_PRIV)) || - ((mode & VPMU_ON) && (mode & VPMU_PRIV)) ) - return -EINVAL; - - vpmu_mode &= ~VPMU_MODE_MASK; - vpmu_mode |= mode; - - ret = 0; - break; - - case XENPMU_mode_get: - pmu_params.control = vpmu_mode & VPMU_MODE_MASK; - if ( copy_to_guest(arg, &pmu_params, 1) ) - return -EFAULT; - ret = 0; - break; - - case XENPMU_flags_set: - if ( !is_control_domain(current->domain) ) - return -EPERM; - - if ( copy_from_guest(&pmu_params, arg, 1) ) - return -EFAULT; - - flags = (uint64_t)pmu_params.control & VPMU_FLAGS_MASK; - if ( flags & ~VPMU_INTEL_BTS ) - return -EINVAL; - - vpmu_mode &= ~VPMU_FLAGS_MASK; - vpmu_mode |= flags; - - ret = 0; - break; - - case XENPMU_flags_get: - pmu_params.control = vpmu_mode & VPMU_FLAGS_MASK; - if ( copy_to_guest(arg, &pmu_params, 1) ) - return -EFAULT; - ret = 0; - break; - - case XENPMU_init: - if ( copy_from_guest(&pmu_params, arg, 1) ) - return -EFAULT; - ret = pvpmu_init(current->domain, &pmu_params); - break; - - case XENPMU_finish: - if ( copy_from_guest(&pmu_params, arg, 1) ) - return -EFAULT; - pvpmu_finish(current->domain, &pmu_params); - break; - - case XENPMU_lvtpc_set: 
- if ( copy_from_guest(&pmu_params, arg, 1) ) - return -EFAULT; - - vpmu_lvtpc_update((uint32_t)pmu_params.lvtpc); - ret = 0; - break; - - case XENPMU_flush: - vpmu_reset(vcpu_vpmu(current), VPMU_WAIT_FOR_FLUSH); - vpmu_load(current); - ret = 0; - break; - } - - return ret; -} diff --git a/xen/arch/x86/oprofile/op_model_ppro.c b/xen/arch/x86/oprofile/op_model_ppro.c index 2939a40..9135801 100644 --- a/xen/arch/x86/oprofile/op_model_ppro.c +++ b/xen/arch/x86/oprofile/op_model_ppro.c @@ -19,7 +19,7 @@ #include <asm/processor.h> #include <asm/regs.h> #include <asm/current.h> -#include <asm/hvm/vpmu.h> +#include <asm/vpmu.h> #include "op_x86_model.h" #include "op_counter.h" diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c index 64c9c25..bca0a37 100644 --- a/xen/arch/x86/traps.c +++ b/xen/arch/x86/traps.c @@ -71,7 +71,7 @@ #include <asm/apic.h> #include <asm/mc146818rtc.h> #include <asm/hpet.h> -#include <asm/hvm/vpmu.h> +#include <asm/vpmu.h> #include <public/arch-x86/cpuid.h> #include <xsm/xsm.h> diff --git a/xen/arch/x86/vpmu.c b/xen/arch/x86/vpmu.c new file mode 100644 index 0000000..3f24903 --- /dev/null +++ b/xen/arch/x86/vpmu.c @@ -0,0 +1,545 @@ +/* + * vpmu.c: PMU virtualization for HVM domain. + * + * Copyright (c) 2007, Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple + * Place - Suite 330, Boston, MA 02111-1307 USA. + * + * Author: Haitao Shan <haitao.shan@intel.com> + */ +#include <xen/config.h> +#include <xen/sched.h> +#include <xen/xenoprof.h> +#include <xen/event.h> +#include <xen/softirq.h> +#include <xen/hypercall.h> +#include <xen/guest_access.h> +#include <asm/regs.h> +#include <asm/types.h> +#include <asm/msr.h> +#include <asm/hvm/support.h> +#include <asm/hvm/vmx/vmx.h> +#include <asm/hvm/vmx/vmcs.h> +#include <asm/vpmu.h> +#include <asm/hvm/svm/svm.h> +#include <asm/hvm/svm/vmcb.h> +#include <asm/apic.h> +#include <asm/nmi.h> +#include <public/xenpmu.h> + +/* + * "vpmu" : vpmu generally enabled + * "vpmu=off" : vpmu generally disabled + * "vpmu=bts" : vpmu enabled and Intel BTS feature switched on. 
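+ *
+ * The boot option only sets the initial state; the control domain can
+ * still switch the mode and flags at runtime through the XENPMU_mode_set
+ * and XENPMU_flags_set ops handled by do_xenpmu_op() below.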
+ */ +uint32_t __read_mostly vpmu_mode = VPMU_OFF; +static void parse_vpmu_param(char *s); +custom_param("vpmu", parse_vpmu_param); + +static void vpmu_save_force(void *arg); +static DEFINE_PER_CPU(struct vcpu *, last_vcpu); + +static void __init parse_vpmu_param(char *s) +{ + switch ( parse_bool(s) ) + { + case 0: + break; + default: + if ( !strcmp(s, "bts") ) + vpmu_mode |= VPMU_INTEL_BTS; + else if ( *s ) + { + printk("VPMU: unknown flag: %s - vpmu disabled!\n", s); + break; + } + /* fall through */ + case 1: + vpmu_mode |= VPMU_ON; + break; + } +} + +static void vpmu_lvtpc_update(uint32_t val) +{ + struct vpmu_struct *vpmu = vcpu_vpmu(current); + + vpmu->hw_lapic_lvtpc = APIC_DM_NMI | (val & APIC_LVT_MASKED); + apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc); +} + +int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) +{ + struct vpmu_struct *vpmu = vcpu_vpmu(current); + + if ( (vpmu_mode & VPMU_PRIV) && (current->domain != dom0) ) + return 0; + + if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_wrmsr ) + return vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content); + return 0; +} + +int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content) +{ + struct vpmu_struct *vpmu = vcpu_vpmu(current); + + if ( vpmu_mode & VPMU_PRIV && current->domain != dom0 ) + return 0; + + if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_rdmsr ) + return vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content); + return 0; +} + +int vpmu_do_interrupt(struct cpu_user_regs *regs) +{ + struct vcpu *v = current; + struct vpmu_struct *vpmu; + + + /* dom0 will handle this interrupt */ + if ( (vpmu_mode & VPMU_PRIV) || + (v->domain->domain_id >= DOMID_FIRST_RESERVED) ) + { + if ( smp_processor_id() >= dom0->max_vcpus ) + return 0; + v = dom0->vcpu[smp_processor_id()]; + } + + vpmu = vcpu_vpmu(v); + if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) + return 0; + + if ( !is_hvm_domain(v->domain) || vpmu_mode & VPMU_PRIV ) + { + /* PV guest or dom0 is doing system profiling */ + void *p; + struct cpu_user_regs *gregs; + + p = v->arch.vpmu.xenpmu_data; + + /* PV guest will be reading PMU MSRs from xenpmu_data */ + vpmu_save_force(v); + + /* Store appropriate registers in xenpmu_data + * + * Note: ''!current->is_running'' is possible when ''set_current(next)'' + * for the (HVM) guest has been called but ''reset_stack_and_jump()'' + * has not (i.e. the guest is not actually running yet). + */ + if ( !is_hvm_domain(current->domain) || + ((vpmu_mode & VPMU_PRIV) && !current->is_running) ) + { + /* + * 32-bit dom0 cannot process Xen''s addresses (which are 64 bit) + * and therefore we treat it the same way as a non-priviledged + * PV 32-bit domain. 
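+ * In practice this means such a dom0 is handed the (compat-translated)
+ * guest register state below rather than the raw interrupted context,
+ * which may contain Xen addresses.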
+ */ + if ( is_pv_32bit_domain(current->domain) ) + { + struct compat_cpu_user_regs cmp; + + gregs = guest_cpu_user_regs(); + XLAT_cpu_user_regs(&cmp, gregs); + memcpy(p, &cmp, sizeof(struct compat_cpu_user_regs)); + } + else if ( (current->domain != dom0) && !is_idle_vcpu(current) && + !(vpmu_mode & VPMU_PRIV) ) + { + /* PV guest */ + gregs = guest_cpu_user_regs(); + memcpy(p, gregs, sizeof(struct cpu_user_regs)); + } + else + memcpy(p, regs, sizeof(struct cpu_user_regs)); + } + else + { + /* HVM guest */ + struct segment_register cs; + + gregs = guest_cpu_user_regs(); + hvm_get_segment_register(current, x86_seg_cs, &cs); + gregs->cs = cs.attr.fields.dpl; + + memcpy(p, gregs, sizeof(struct cpu_user_regs)); + } + + v->arch.vpmu.xenpmu_data->domain_id = current->domain->domain_id; + v->arch.vpmu.xenpmu_data->vcpu_id = current->vcpu_id; + + raise_softirq(PMU_SOFTIRQ); + vpmu_set(vpmu, VPMU_WAIT_FOR_FLUSH); + + return 1; + } + else if ( vpmu->arch_vpmu_ops ) + { + /* HVM guest */ + struct vlapic *vlapic; + u32 vlapic_lvtpc; + unsigned char int_vec; + + if ( !vpmu->arch_vpmu_ops->do_interrupt(regs) ) + return 0; + + vlapic = vcpu_vlapic(v); + if ( !is_vlapic_lvtpc_enabled(vlapic) ) + return 1; + + vlapic_lvtpc = vlapic_get_reg(vlapic, APIC_LVTPC); + int_vec = vlapic_lvtpc & APIC_VECTOR_MASK; + + if ( GET_APIC_DELIVERY_MODE(vlapic_lvtpc) == APIC_MODE_FIXED ) + vlapic_set_irq(vcpu_vlapic(v), int_vec, 0); + else + v->nmi_pending = 1; + return 1; + } + + return 0; +} + +void vpmu_do_cpuid(unsigned int input, + unsigned int *eax, unsigned int *ebx, + unsigned int *ecx, unsigned int *edx) +{ + struct vpmu_struct *vpmu = vcpu_vpmu(current); + + if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_cpuid ) + vpmu->arch_vpmu_ops->do_cpuid(input, eax, ebx, ecx, edx); +} + +static void vpmu_save_force(void *arg) +{ + struct vcpu *v = (struct vcpu *)arg; + struct vpmu_struct *vpmu = vcpu_vpmu(v); + + if ( !vpmu_is_set_all(vpmu, VPMU_CONTEXT_ALLOCATED | VPMU_CONTEXT_LOADED) ) + return; + + vpmu_set(vpmu, VPMU_CONTEXT_SAVE); + + if ( vpmu->arch_vpmu_ops ) + (void)vpmu->arch_vpmu_ops->arch_vpmu_save(v); + + vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED); + + per_cpu(last_vcpu, smp_processor_id()) = NULL; +} + +void vpmu_save(struct vcpu *v) +{ + struct vpmu_struct *vpmu = vcpu_vpmu(v); + int pcpu = smp_processor_id(); + + if ( !vpmu_is_set_all(vpmu, VPMU_CONTEXT_ALLOCATED | VPMU_CONTEXT_LOADED) ) + return; + + vpmu->last_pcpu = pcpu; + per_cpu(last_vcpu, pcpu) = v; + + if ( vpmu->arch_vpmu_ops ) + if ( vpmu->arch_vpmu_ops->arch_vpmu_save(v) ) + vpmu_reset(vpmu, VPMU_CONTEXT_LOADED); + + apic_write(APIC_LVTPC, APIC_DM_NMI | APIC_LVT_MASKED); +} + +void vpmu_load(struct vcpu *v) +{ + struct vpmu_struct *vpmu = vcpu_vpmu(v); + int pcpu = smp_processor_id(); + struct vcpu *prev = NULL; + + if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) + return; + + /* First time this VCPU is running here */ + if ( vpmu->last_pcpu != pcpu ) + { + /* + * Get the context from last pcpu that we ran on. Note that if another + * VCPU is running there it must have saved this VPCU''s context before + * startig to run (see below). + * There should be no race since remote pcpu will disable interrupts + * before saving the context. 
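+ * The call below is synchronous (on_selected_cpus() is invoked with
+ * @wait set), so once it returns the remote pcpu has flushed this
+ * VCPU's counters into its context and they can safely be loaded here.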
+ */ + if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) ) + on_selected_cpus(cpumask_of(vpmu->last_pcpu), + vpmu_save_force, (void *)v, 1); + } + + /* Prevent forced context save from remote CPU */ + local_irq_disable(); + + prev = per_cpu(last_vcpu, pcpu); + + if ( prev != v && prev ) + { + vpmu = vcpu_vpmu(prev); + + /* Someone ran here before us */ + vpmu_save_force(prev); + + vpmu = vcpu_vpmu(v); + } + + local_irq_enable(); + + /* Only when PMU is counting, we load PMU context immediately. */ + if ( !vpmu_is_set(vpmu, VPMU_RUNNING) ) + return; + + if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_load ) + { + apic_write_around(APIC_LVTPC, vpmu->hw_lapic_lvtpc); + vpmu->arch_vpmu_ops->arch_vpmu_load(v); + } + + /* + * PMU interrupt may happen while loading the context above. That + * may cause vpmu_save_force() in the handler so we we don''t + * want to mark the context as loaded. + */ + if ( !vpmu_is_set(vpmu, VPMU_WAIT_FOR_FLUSH) ) + vpmu_set(vpmu, VPMU_CONTEXT_LOADED); +} + +void vpmu_initialise(struct vcpu *v) +{ + struct vpmu_struct *vpmu = vcpu_vpmu(v); + uint8_t vendor = current_cpu_data.x86_vendor; + + if ( vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) + vpmu_destroy(v); + vpmu_clear(vpmu); + vpmu->context = NULL; + + switch ( vendor ) + { + case X86_VENDOR_AMD: + if ( svm_vpmu_initialise(v, vpmu_mode) != 0 ) + vpmu_mode = VPMU_OFF; + break; + + case X86_VENDOR_INTEL: + if ( vmx_vpmu_initialise(v, vpmu_mode) != 0 ) + vpmu_mode = VPMU_OFF; + break; + + default: + printk("VPMU: Initialization failed. " + "Unknown CPU vendor %d\n", vendor); + vpmu_mode = VPMU_OFF; + break; + } +} + +void vpmu_destroy(struct vcpu *v) +{ + struct vpmu_struct *vpmu = vcpu_vpmu(v); + + if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_destroy ) + { + /* Unload VPMU first. This will stop counters from running */ + on_selected_cpus(cpumask_of(vcpu_vpmu(v)->last_pcpu), + vpmu_save_force, (void *)v, 1); + + vpmu->arch_vpmu_ops->arch_vpmu_destroy(v); + } +} + +/* Dump some vpmu informations on console. Used in keyhandler dump_domains(). 
*/ +void vpmu_dump(struct vcpu *v) +{ + struct vpmu_struct *vpmu = vcpu_vpmu(v); + + if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_dump ) + vpmu->arch_vpmu_ops->arch_vpmu_dump(v); +} + +int pmu_nmi_interrupt(struct cpu_user_regs *regs, int cpu) +{ + return vpmu_do_interrupt(regs); +} + +/* Process the softirq set by PMU NMI handler */ +void pmu_virq(void) +{ + struct vcpu *v = current; + + if ( (vpmu_mode & VPMU_PRIV) || + (v->domain->domain_id >= DOMID_FIRST_RESERVED) ) + { + if ( smp_processor_id() >= dom0->max_vcpus ) + { + printk(KERN_WARNING "PMU softirq on unexpected processor %d\n", + smp_processor_id()); + return; + } + v = dom0->vcpu[smp_processor_id()]; + } + + send_guest_vcpu_virq(v, VIRQ_XENPMU); +} + +static int pvpmu_init(struct domain *d, xenpmu_params_t *params) +{ + struct vcpu *v; + static int pvpmu_initted = 0; + + if ( params->vcpu < 0 || params->vcpu >= d->max_vcpus ) + return -EINVAL; + + if ( !pvpmu_initted ) + { + if (reserve_lapic_nmi() == 0) + set_nmi_callback(pmu_nmi_interrupt); + else + { + printk("Failed to reserve PMU NMI\n"); + return -EBUSY; + } + open_softirq(PMU_SOFTIRQ, pmu_virq); + pvpmu_initted = 1; + } + + if ( !mfn_valid(params->mfn) || + !get_page_and_type(mfn_to_page(params->mfn), d, PGT_writable_page) ) + return -EINVAL; + + v = d->vcpu[params->vcpu]; + v->arch.vpmu.xenpmu_data = map_domain_page_global(params->mfn); + memset(v->arch.vpmu.xenpmu_data, 0, PAGE_SIZE); + + vpmu_initialise(v); + + return 0; +} + +static void pvpmu_finish(struct domain *d, xenpmu_params_t *params) +{ + struct vcpu *v; + uint64_t mfn; + + if ( params->vcpu < 0 || params->vcpu >= d->max_vcpus ) + return; + + v = d->vcpu[params->vcpu]; + if (v != current) + vcpu_pause(v); + + if ( v->arch.vpmu.xenpmu_data ) + { + mfn = domain_page_map_to_mfn(v->arch.vpmu.xenpmu_data); + if ( mfn_valid(mfn) ) + { + unmap_domain_page_global(v->arch.vpmu.xenpmu_data); + put_page_and_type(mfn_to_page(mfn)); + } + } + vpmu_destroy(v); + + if (v != current) + vcpu_unpause(v); +} + +long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg) +{ + int ret = -EINVAL; + xenpmu_params_t pmu_params; + uint32_t mode, flags; + + switch ( op ) + { + case XENPMU_mode_set: + if ( !is_control_domain(current->domain) ) + return -EPERM; + + if ( copy_from_guest(&pmu_params, arg, 1) ) + return -EFAULT; + + mode = (uint32_t)pmu_params.control & VPMU_MODE_MASK; + if ( (mode & ~(VPMU_ON | VPMU_PRIV)) || + ((mode & VPMU_ON) && (mode & VPMU_PRIV)) ) + return -EINVAL; + + vpmu_mode &= ~VPMU_MODE_MASK; + vpmu_mode |= mode; + + ret = 0; + break; + + case XENPMU_mode_get: + pmu_params.control = vpmu_mode & VPMU_MODE_MASK; + if ( copy_to_guest(arg, &pmu_params, 1) ) + return -EFAULT; + ret = 0; + break; + + case XENPMU_flags_set: + if ( !is_control_domain(current->domain) ) + return -EPERM; + + if ( copy_from_guest(&pmu_params, arg, 1) ) + return -EFAULT; + + flags = (uint64_t)pmu_params.control & VPMU_FLAGS_MASK; + if ( flags & ~VPMU_INTEL_BTS ) + return -EINVAL; + + vpmu_mode &= ~VPMU_FLAGS_MASK; + vpmu_mode |= flags; + + ret = 0; + break; + + case XENPMU_flags_get: + pmu_params.control = vpmu_mode & VPMU_FLAGS_MASK; + if ( copy_to_guest(arg, &pmu_params, 1) ) + return -EFAULT; + ret = 0; + break; + + case XENPMU_init: + if ( copy_from_guest(&pmu_params, arg, 1) ) + return -EFAULT; + ret = pvpmu_init(current->domain, &pmu_params); + break; + + case XENPMU_finish: + if ( copy_from_guest(&pmu_params, arg, 1) ) + return -EFAULT; + pvpmu_finish(current->domain, &pmu_params); + break; + + case XENPMU_lvtpc_set: 
+ if ( copy_from_guest(&pmu_params, arg, 1) ) + return -EFAULT; + + vpmu_lvtpc_update((uint32_t)pmu_params.lvtpc); + ret = 0; + break; + + case XENPMU_flush: + vpmu_reset(vcpu_vpmu(current), VPMU_WAIT_FOR_FLUSH); + vpmu_load(current); + ret = 0; + break; + } + + return ret; +} diff --git a/xen/arch/x86/vpmu_amd.c b/xen/arch/x86/vpmu_amd.c new file mode 100644 index 0000000..f64ffc0 --- /dev/null +++ b/xen/arch/x86/vpmu_amd.c @@ -0,0 +1,486 @@ +/* + * vpmu.c: PMU virtualization for HVM domain. + * + * Copyright (c) 2010, Advanced Micro Devices, Inc. + * Parts of this code are Copyright (c) 2007, Intel Corporation + * + * Author: Wei Wang <wei.wang2@amd.com> + * Tested by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple + * Place - Suite 330, Boston, MA 02111-1307 USA. + * + */ + +#include <xen/config.h> +#include <xen/xenoprof.h> +#include <xen/hvm/save.h> +#include <xen/sched.h> +#include <xen/irq.h> +#include <asm/apic.h> +#include <asm/hvm/vlapic.h> +#include <asm/vpmu.h> +#include <public/xenpmu.h> + +#define MSR_F10H_EVNTSEL_GO_SHIFT 40 +#define MSR_F10H_EVNTSEL_EN_SHIFT 22 +#define MSR_F10H_COUNTER_LENGTH 48 + +#define is_guest_mode(msr) ((msr) & (1ULL << MSR_F10H_EVNTSEL_GO_SHIFT)) +#define is_pmu_enabled(msr) ((msr) & (1ULL << MSR_F10H_EVNTSEL_EN_SHIFT)) +#define set_guest_mode(msr) (msr |= (1ULL << MSR_F10H_EVNTSEL_GO_SHIFT)) +#define is_overflowed(msr) (!((msr) & (1ULL << (MSR_F10H_COUNTER_LENGTH-1)))) + +static unsigned int __read_mostly num_counters; +static const u32 __read_mostly *counters; +static const u32 __read_mostly *ctrls; +static bool_t __read_mostly k7_counters_mirrored; + +/* PMU Counter MSRs. */ +static const u32 AMD_F10H_COUNTERS[] = { + MSR_K7_PERFCTR0, + MSR_K7_PERFCTR1, + MSR_K7_PERFCTR2, + MSR_K7_PERFCTR3 +}; + +/* PMU Control MSRs. 
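+ *
+ * Note that the Fam15h event-select/counter MSRs below interleave (even
+ * MSR addresses are controls, odd ones are counters), which is what
+ * allows get_pmu_reg_type() to distinguish them with a parity check.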
*/ +static const u32 AMD_F10H_CTRLS[] = { + MSR_K7_EVNTSEL0, + MSR_K7_EVNTSEL1, + MSR_K7_EVNTSEL2, + MSR_K7_EVNTSEL3 +}; + +static const u32 AMD_F15H_COUNTERS[] = { + MSR_AMD_FAM15H_PERFCTR0, + MSR_AMD_FAM15H_PERFCTR1, + MSR_AMD_FAM15H_PERFCTR2, + MSR_AMD_FAM15H_PERFCTR3, + MSR_AMD_FAM15H_PERFCTR4, + MSR_AMD_FAM15H_PERFCTR5 +}; + +static const u32 AMD_F15H_CTRLS[] = { + MSR_AMD_FAM15H_EVNTSEL0, + MSR_AMD_FAM15H_EVNTSEL1, + MSR_AMD_FAM15H_EVNTSEL2, + MSR_AMD_FAM15H_EVNTSEL3, + MSR_AMD_FAM15H_EVNTSEL4, + MSR_AMD_FAM15H_EVNTSEL5 +}; + +static inline int get_pmu_reg_type(u32 addr) +{ + if ( (addr >= MSR_K7_EVNTSEL0) && (addr <= MSR_K7_EVNTSEL3) ) + return MSR_TYPE_CTRL; + + if ( (addr >= MSR_K7_PERFCTR0) && (addr <= MSR_K7_PERFCTR3) ) + return MSR_TYPE_COUNTER; + + if ( (addr >= MSR_AMD_FAM15H_EVNTSEL0) && + (addr <= MSR_AMD_FAM15H_PERFCTR5 ) ) + { + if (addr & 1) + return MSR_TYPE_COUNTER; + else + return MSR_TYPE_CTRL; + } + + /* unsupported registers */ + return -1; +} + +static inline u32 get_fam15h_addr(u32 addr) +{ + switch ( addr ) + { + case MSR_K7_PERFCTR0: + return MSR_AMD_FAM15H_PERFCTR0; + case MSR_K7_PERFCTR1: + return MSR_AMD_FAM15H_PERFCTR1; + case MSR_K7_PERFCTR2: + return MSR_AMD_FAM15H_PERFCTR2; + case MSR_K7_PERFCTR3: + return MSR_AMD_FAM15H_PERFCTR3; + case MSR_K7_EVNTSEL0: + return MSR_AMD_FAM15H_EVNTSEL0; + case MSR_K7_EVNTSEL1: + return MSR_AMD_FAM15H_EVNTSEL1; + case MSR_K7_EVNTSEL2: + return MSR_AMD_FAM15H_EVNTSEL2; + case MSR_K7_EVNTSEL3: + return MSR_AMD_FAM15H_EVNTSEL3; + default: + break; + } + + return addr; +} + +static void amd_vpmu_set_msr_bitmap(struct vcpu *v) +{ + unsigned int i; + struct vpmu_struct *vpmu = vcpu_vpmu(v); + struct amd_vpmu_context *ctxt = vpmu->context; + + for ( i = 0; i < num_counters; i++ ) + { + svm_intercept_msr(v, counters[i], MSR_INTERCEPT_NONE); + svm_intercept_msr(v, ctrls[i], MSR_INTERCEPT_WRITE); + } + + ctxt->msr_bitmap_set = 1; +} + +static void amd_vpmu_unset_msr_bitmap(struct vcpu *v) +{ + unsigned int i; + struct vpmu_struct *vpmu = vcpu_vpmu(v); + struct amd_vpmu_context *ctxt = vpmu->context; + + for ( i = 0; i < num_counters; i++ ) + { + svm_intercept_msr(v, counters[i], MSR_INTERCEPT_RW); + svm_intercept_msr(v, ctrls[i], MSR_INTERCEPT_RW); + } + + ctxt->msr_bitmap_set = 0; +} + +static int amd_vpmu_do_interrupt(struct cpu_user_regs *regs) +{ + return 1; +} + +static inline void context_load(struct vcpu *v) +{ + unsigned int i; + struct vpmu_struct *vpmu = vcpu_vpmu(v); + struct amd_vpmu_context *ctxt = vpmu->context; + + for ( i = 0; i < num_counters; i++ ) + { + wrmsrl(counters[i], ctxt->counters[i]); + wrmsrl(ctrls[i], ctxt->ctrls[i]); + } +} + +static void amd_vpmu_load(struct vcpu *v) +{ + struct vpmu_struct *vpmu = vcpu_vpmu(v); + struct amd_vpmu_context *ctxt = vpmu->context; + + vpmu_reset(vpmu, VPMU_FROZEN); + + if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) ) + { + unsigned int i; + + for ( i = 0; i < num_counters; i++ ) + wrmsrl(ctrls[i], ctxt->ctrls[i]); + + return; + } + + context_load(v); +} + +static inline void context_save(struct vcpu *v) +{ + unsigned int i; + struct vpmu_struct *vpmu = vcpu_vpmu(v); + struct amd_vpmu_context *ctxt = vpmu->context; + + /* No need to save controls -- they are saved in amd_vpmu_do_wrmsr */ + for ( i = 0; i < num_counters; i++ ) + rdmsrl(counters[i], ctxt->counters[i]); +} + +static int amd_vpmu_save(struct vcpu *v) +{ + struct vpmu_struct *vpmu = vcpu_vpmu(v); + struct amd_vpmu_context *ctx = vpmu->context; + unsigned int i; + + if ( !vpmu_is_set(vpmu, VPMU_FROZEN) ) + { + for 
( i = 0; i < num_counters; i++ ) + wrmsrl(ctrls[i], 0); + + vpmu_set(vpmu, VPMU_FROZEN); + } + + if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_SAVE) ) + return 0; + + context_save(v); + + if ( is_hvm_domain(v->domain) && + !vpmu_is_set(vpmu, VPMU_RUNNING) && ctx->msr_bitmap_set ) + amd_vpmu_unset_msr_bitmap(v); + + return 1; +} + +static void context_update(unsigned int msr, u64 msr_content) +{ + unsigned int i; + struct vcpu *v = current; + struct vpmu_struct *vpmu = vcpu_vpmu(v); + struct amd_vpmu_context *ctxt = vpmu->context; + + if ( k7_counters_mirrored && + ((msr >= MSR_K7_EVNTSEL0) && (msr <= MSR_K7_PERFCTR3)) ) + { + msr = get_fam15h_addr(msr); + } + + for ( i = 0; i < num_counters; i++ ) + { + if ( msr == ctrls[i] ) + { + ctxt->ctrls[i] = msr_content; + return; + } + else if (msr == counters[i] ) + { + ctxt->counters[i] = msr_content; + return; + } + } +} + +static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) +{ + struct vcpu *v = current; + struct vpmu_struct *vpmu = vcpu_vpmu(v); + + /* For all counters, enable guest only mode for HVM guest */ + if ( is_hvm_domain(v->domain) && (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) && + !(is_guest_mode(msr_content)) ) + { + set_guest_mode(msr_content); + } + + /* check if the first counter is enabled */ + if ( (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) && + is_pmu_enabled(msr_content) && !vpmu_is_set(vpmu, VPMU_RUNNING) ) + { + if ( !acquire_pmu_ownership(PMU_OWNER_HVM) ) + return 1; + vpmu_set(vpmu, VPMU_RUNNING); + apic_write(APIC_LVTPC, APIC_DM_NMI); + vpmu->hw_lapic_lvtpc = APIC_DM_NMI; + + if ( is_hvm_domain(v->domain) && + !((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set ) + amd_vpmu_set_msr_bitmap(v); + } + + /* stop saving & restore if guest stops first counter */ + if ( (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) && + (is_pmu_enabled(msr_content) == 0) && vpmu_is_set(vpmu, VPMU_RUNNING) ) + { + apic_write(APIC_LVTPC, APIC_DM_NMI | APIC_LVT_MASKED); + vpmu->hw_lapic_lvtpc = APIC_DM_NMI | APIC_LVT_MASKED; + vpmu_reset(vpmu, VPMU_RUNNING); + if ( is_hvm_domain(v->domain) && + ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set ) + amd_vpmu_unset_msr_bitmap(v); + release_pmu_ownship(PMU_OWNER_HVM); + } + + if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) + || vpmu_is_set(vpmu, VPMU_FROZEN) ) + { + context_load(v); + vpmu_set(vpmu, VPMU_CONTEXT_LOADED); + vpmu_reset(vpmu, VPMU_FROZEN); + } + + /* Update vpmu context immediately */ + context_update(msr, msr_content); + + /* Write to hw counters */ + wrmsrl(msr, msr_content); + return 1; +} + +static int amd_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content) +{ + struct vcpu *v = current; + struct vpmu_struct *vpmu = vcpu_vpmu(v); + + if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) + || vpmu_is_set(vpmu, VPMU_FROZEN) ) + { + context_load(v); + vpmu_set(vpmu, VPMU_CONTEXT_LOADED); + vpmu_reset(vpmu, VPMU_FROZEN); + } + + rdmsrl(msr, *msr_content); + + return 1; +} + +static int amd_vpmu_initialise(struct vcpu *v) +{ + struct amd_vpmu_context *ctxt; + struct vpmu_struct *vpmu = vcpu_vpmu(v); + uint8_t family = current_cpu_data.x86; + + if ( vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) + return 0; + + if ( counters == NULL ) + { + switch ( family ) + { + case 0x15: + num_counters = F15H_NUM_COUNTERS; + counters = AMD_F15H_COUNTERS; + ctrls = AMD_F15H_CTRLS; + k7_counters_mirrored = 1; + break; + case 0x10: + case 0x12: + case 0x14: + case 0x16: + default: + num_counters = F10H_NUM_COUNTERS; + counters = AMD_F10H_COUNTERS; + ctrls = AMD_F10H_CTRLS; + k7_counters_mirrored = 0; 
+ break; + } + } + + if ( is_hvm_domain(v->domain) ) + { + ctxt = xzalloc(struct amd_vpmu_context); + if ( !ctxt ) + { + gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, " + " PMU feature is unavailable on domain %d vcpu %d.\n", + v->vcpu_id, v->domain->domain_id); + return -ENOMEM; + } + } + else + ctxt = &v->arch.vpmu.xenpmu_data->pmu.amd; + + vpmu->context = ctxt; + vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED); + return 0; +} + +static void amd_vpmu_destroy(struct vcpu *v) +{ + struct vpmu_struct *vpmu = vcpu_vpmu(v); + + if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) + return; + + if ( is_hvm_domain(v->domain) ) + { + if ( ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set ) + amd_vpmu_unset_msr_bitmap(v); + + xfree(vpmu->context); + release_pmu_ownship(PMU_OWNER_HVM); + } + + vpmu->context = NULL; + vpmu_clear(vpmu); +} + +/* VPMU part of the ''q'' keyhandler */ +static void amd_vpmu_dump(struct vcpu *v) +{ + struct vpmu_struct *vpmu = vcpu_vpmu(v); + struct amd_vpmu_context *ctxt = vpmu->context; + unsigned int i; + + printk(" VPMU state: 0x%x ", vpmu->flags); + if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) + { + printk("\n"); + return; + } + + printk("("); + if ( vpmu_is_set(vpmu, VPMU_PASSIVE_DOMAIN_ALLOCATED) ) + printk("PASSIVE_DOMAIN_ALLOCATED, "); + if ( vpmu_is_set(vpmu, VPMU_FROZEN) ) + printk("FROZEN, "); + if ( vpmu_is_set(vpmu, VPMU_CONTEXT_SAVE) ) + printk("SAVE, "); + if ( vpmu_is_set(vpmu, VPMU_RUNNING) ) + printk("RUNNING, "); + if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) ) + printk("LOADED, "); + printk("ALLOCATED)\n"); + + for ( i = 0; i < num_counters; i++ ) + { + uint64_t ctrl, cntr; + + rdmsrl(ctrls[i], ctrl); + rdmsrl(counters[i], cntr); + printk(" 0x%08x: 0x%lx (0x%lx in HW) 0x%08x: 0x%lx (0x%lx in HW)\n", + ctrls[i], ctxt->ctrls[i], ctrl, + counters[i], ctxt->counters[i], cntr); + } +} + +struct arch_vpmu_ops amd_vpmu_ops = { + .do_wrmsr = amd_vpmu_do_wrmsr, + .do_rdmsr = amd_vpmu_do_rdmsr, + .do_interrupt = amd_vpmu_do_interrupt, + .arch_vpmu_destroy = amd_vpmu_destroy, + .arch_vpmu_save = amd_vpmu_save, + .arch_vpmu_load = amd_vpmu_load, + .arch_vpmu_dump = amd_vpmu_dump +}; + +int svm_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags) +{ + struct vpmu_struct *vpmu = vcpu_vpmu(v); + uint8_t family = current_cpu_data.x86; + int ret = 0; + + /* vpmu enabled? */ + if ( vpmu_flags == VPMU_OFF ) + return 0; + + switch ( family ) + { + case 0x10: + case 0x12: + case 0x14: + case 0x15: + case 0x16: + ret = amd_vpmu_initialise(v); + if ( !ret ) + vpmu->arch_vpmu_ops = &amd_vpmu_ops; + return ret; + } + + printk("VPMU: Initialization failed. " + "AMD processor family %d has not " + "been supported\n", family); + return -EINVAL; +} + diff --git a/xen/arch/x86/vpmu_intel.c b/xen/arch/x86/vpmu_intel.c new file mode 100644 index 0000000..d7570c5 --- /dev/null +++ b/xen/arch/x86/vpmu_intel.c @@ -0,0 +1,938 @@ +/* + * vpmu_core2.c: CORE 2 specific PMU virtualization for HVM domain. + * + * Copyright (c) 2007, Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. 
+ * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple + * Place - Suite 330, Boston, MA 02111-1307 USA. + * + * Author: Haitao Shan <haitao.shan@intel.com> + */ + +#include <xen/config.h> +#include <xen/sched.h> +#include <xen/xenoprof.h> +#include <xen/irq.h> +#include <asm/system.h> +#include <asm/regs.h> +#include <asm/types.h> +#include <asm/apic.h> +#include <asm/traps.h> +#include <asm/msr.h> +#include <asm/msr-index.h> +#include <asm/hvm/support.h> +#include <asm/hvm/vlapic.h> +#include <asm/hvm/vmx/vmx.h> +#include <asm/hvm/vmx/vmcs.h> +#include <public/sched.h> +#include <public/hvm/save.h> +#include <public/xenpmu.h> +#include <asm/vpmu.h> + +/* + * See Intel SDM Vol 2a Instruction Set Reference chapter 3 for CPUID + * instruction. + * cpuid 0xa - Architectural Performance Monitoring Leaf + * Register eax + */ +#define PMU_VERSION_SHIFT 0 /* Version ID */ +#define PMU_VERSION_BITS 8 /* 8 bits 0..7 */ +#define PMU_VERSION_MASK (((1 << PMU_VERSION_BITS) - 1) << PMU_VERSION_SHIFT) + +#define PMU_GENERAL_NR_SHIFT 8 /* Number of general pmu registers */ +#define PMU_GENERAL_NR_BITS 8 /* 8 bits 8..15 */ +#define PMU_GENERAL_NR_MASK (((1 << PMU_GENERAL_NR_BITS) - 1) << PMU_GENERAL_NR_SHIFT) + +#define PMU_GENERAL_WIDTH_SHIFT 16 /* Width of general pmu registers */ +#define PMU_GENERAL_WIDTH_BITS 8 /* 8 bits 16..23 */ +#define PMU_GENERAL_WIDTH_MASK (((1 << PMU_GENERAL_WIDTH_BITS) - 1) << PMU_GENERAL_WIDTH_SHIFT) +/* Register edx */ +#define PMU_FIXED_NR_SHIFT 0 /* Number of fixed pmu registers */ +#define PMU_FIXED_NR_BITS 5 /* 5 bits 0..4 */ +#define PMU_FIXED_NR_MASK (((1 << PMU_FIXED_NR_BITS) -1) << PMU_FIXED_NR_SHIFT) + +#define PMU_FIXED_WIDTH_SHIFT 5 /* Width of fixed pmu registers */ +#define PMU_FIXED_WIDTH_BITS 8 /* 8 bits 5..12 */ +#define PMU_FIXED_WIDTH_MASK (((1 << PMU_FIXED_WIDTH_BITS) -1) << PMU_FIXED_WIDTH_SHIFT) + + +/* Intel-specific VPMU features */ +#define VPMU_CPU_HAS_DS 0x100 /* Has Debug Store */ +#define VPMU_CPU_HAS_BTS 0x200 /* Has Branch Trace Store */ + +/* + * MSR_CORE_PERF_FIXED_CTR_CTRL contains the configuration of all fixed + * counters. 4 bits for every counter. + */ +#define FIXED_CTR_CTRL_BITS 4 +#define FIXED_CTR_CTRL_MASK ((1 << FIXED_CTR_CTRL_BITS) - 1) + +/* The index into the core2_ctrls_msr[] of this MSR used in core2_vpmu_dump() */ +#define MSR_CORE_PERF_FIXED_CTR_CTRL_IDX 0 + +static int arch_pmc_cnt; /* Number of general-purpose performance counters */ + +/* + * QUIRK to workaround an issue on various family 6 cpus. + * The issue leads to endless PMC interrupt loops on the processor. + * If the interrupt handler is running and a pmc reaches the value 0, this + * value remains forever and it triggers immediately a new interrupt after + * finishing the handler. + * A workaround is to read all flagged counters and if the value is 0 write + * 1 (or another value != 0) into it. + * There exist no errata and the real cause of this behaviour is unknown. 
+ */ +bool_t __read_mostly is_pmc_quirk; + +static void check_pmc_quirk(void) +{ + if ( current_cpu_data.x86 == 6 ) + is_pmc_quirk = 1; + else + is_pmc_quirk = 0; +} + +static void handle_pmc_quirk(u64 msr_content) +{ + int i; + u64 val; + + if ( !is_pmc_quirk ) + return; + + val = msr_content; + for ( i = 0; i < arch_pmc_cnt; i++ ) + { + if ( val & 0x1 ) + { + u64 cnt; + rdmsrl(MSR_P6_PERFCTR0 + i, cnt); + if ( cnt == 0 ) + wrmsrl(MSR_P6_PERFCTR0 + i, 1); + } + val >>= 1; + } + val = msr_content >> 32; + for ( i = 0; i < VPMU_CORE2_NUM_FIXED; i++ ) + { + if ( val & 0x1 ) + { + u64 cnt; + rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, cnt); + if ( cnt == 0 ) + wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, 1); + } + val >>= 1; + } +} + +/* + * Read the number of general counters via CPUID.EAX[0xa].EAX[8..15] + */ +static int core2_get_pmc_count(void) +{ + u32 eax, ebx, ecx, edx; + + cpuid(0xa, &eax, &ebx, &ecx, &edx); + return ( (eax & PMU_GENERAL_NR_MASK) >> PMU_GENERAL_NR_SHIFT ); +} + +static u64 core2_calc_intial_glb_ctrl_msr(void) +{ + int arch_pmc_bits = (1 << arch_pmc_cnt) - 1; + u64 fix_pmc_bits = (1 << VPMU_CORE2_NUM_FIXED) - 1; + return ((fix_pmc_bits << 32) | arch_pmc_bits); +} + +/* edx bits 5-12: Bit width of fixed-function performance counters */ +static int core2_get_bitwidth_fix_count(void) +{ + u32 eax, ebx, ecx, edx; + + cpuid(0xa, &eax, &ebx, &ecx, &edx); + return ((edx & PMU_FIXED_WIDTH_MASK) >> PMU_FIXED_WIDTH_SHIFT); +} + +static int is_core2_vpmu_msr(u32 msr_index, int *type, int *index) +{ + int i; + + for ( i = 0; i < VPMU_CORE2_NUM_FIXED; i++ ) + { + if ( core2_fix_counters_msr[i] == msr_index ) + { + *type = MSR_TYPE_COUNTER; + *index = i; + return 1; + } + } + + for ( i = 0; i < VPMU_CORE2_NUM_CTRLS; i++ ) + { + if ( core2_ctrls_msr[i] == msr_index ) + { + *type = MSR_TYPE_CTRL; + *index = i; + return 1; + } + } + + if ( (msr_index == MSR_CORE_PERF_GLOBAL_CTRL) || + (msr_index == MSR_CORE_PERF_GLOBAL_STATUS) || + (msr_index == MSR_CORE_PERF_GLOBAL_OVF_CTRL) ) + { + *type = MSR_TYPE_GLOBAL; + return 1; + } + + if ( (msr_index >= MSR_IA32_PERFCTR0) && + (msr_index < (MSR_IA32_PERFCTR0 + arch_pmc_cnt)) ) + { + *type = MSR_TYPE_ARCH_COUNTER; + *index = msr_index - MSR_IA32_PERFCTR0; + return 1; + } + + if ( (msr_index >= MSR_P6_EVNTSEL0) && + (msr_index < (MSR_P6_EVNTSEL0 + arch_pmc_cnt)) ) + { + *type = MSR_TYPE_ARCH_CTRL; + *index = msr_index - MSR_P6_EVNTSEL0; + return 1; + } + + return 0; +} + +#define msraddr_to_bitpos(x) (((x)&0xffff) + ((x)>>31)*0x2000) +static void core2_vpmu_set_msr_bitmap(unsigned long *msr_bitmap) +{ + int i; + + /* Allow Read/Write PMU Counters MSR Directly. */ + for ( i = 0; i < VPMU_CORE2_NUM_FIXED; i++ ) + { + clear_bit(msraddr_to_bitpos(core2_fix_counters_msr[i]), msr_bitmap); + clear_bit(msraddr_to_bitpos(core2_fix_counters_msr[i]), + msr_bitmap + 0x800/BYTES_PER_LONG); + } + for ( i = 0; i < arch_pmc_cnt; i++ ) + { + clear_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i), msr_bitmap); + clear_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i), + msr_bitmap + 0x800/BYTES_PER_LONG); + } + + /* Allow Read PMU Non-global Controls Directly. 
*/ + for ( i = 0; i < VPMU_CORE2_NUM_CTRLS; i++ ) + clear_bit(msraddr_to_bitpos(core2_ctrls_msr[i]), msr_bitmap); + for ( i = 0; i < arch_pmc_cnt; i++ ) + clear_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL0+i), msr_bitmap); +} + +static void core2_vpmu_unset_msr_bitmap(unsigned long *msr_bitmap) +{ + int i; + + for ( i = 0; i < VPMU_CORE2_NUM_FIXED; i++ ) + { + set_bit(msraddr_to_bitpos(core2_fix_counters_msr[i]), msr_bitmap); + set_bit(msraddr_to_bitpos(core2_fix_counters_msr[i]), + msr_bitmap + 0x800/BYTES_PER_LONG); + } + for ( i = 0; i < arch_pmc_cnt; i++ ) + { + set_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i), msr_bitmap); + set_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i), + msr_bitmap + 0x800/BYTES_PER_LONG); + } + for ( i = 0; i < VPMU_CORE2_NUM_CTRLS; i++ ) + set_bit(msraddr_to_bitpos(core2_ctrls_msr[i]), msr_bitmap); + for ( i = 0; i < arch_pmc_cnt; i++ ) + set_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL0+i), msr_bitmap); +} + +static inline void __core2_vpmu_save(struct vcpu *v) +{ + int i; + struct core2_vpmu_context *core2_vpmu_cxt = vcpu_vpmu(v)->context; + + for ( i = 0; i < VPMU_CORE2_NUM_FIXED; i++ ) + rdmsrl(core2_fix_counters_msr[i], core2_vpmu_cxt->fix_counters[i]); + for ( i = 0; i < arch_pmc_cnt; i++ ) + rdmsrl(MSR_IA32_PERFCTR0+i, core2_vpmu_cxt->arch_msr_pair[i].counter); + + if ( !is_hvm_domain(v->domain) ) + rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, core2_vpmu_cxt->global_status); +} + +static int core2_vpmu_save(struct vcpu *v) +{ + struct vpmu_struct *vpmu = vcpu_vpmu(v); + + if ( !vpmu_is_set_all(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED) ) + return 0; + + if ( !is_hvm_domain(v->domain) ) + wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0); + + __core2_vpmu_save(v); + + /* Unset PMU MSR bitmap to trap lazy load. */ + if ( !vpmu_is_set(vpmu, VPMU_RUNNING) && cpu_has_vmx_msr_bitmap + && is_hvm_domain(v->domain) ) + core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap); + + return 1; +} + +static inline void __core2_vpmu_load(struct vcpu *v) +{ + int i; + struct core2_vpmu_context *core2_vpmu_cxt = vcpu_vpmu(v)->context; + + for ( i = 0; i < VPMU_CORE2_NUM_FIXED; i++ ) + wrmsrl(core2_fix_counters_msr[i], core2_vpmu_cxt->fix_counters[i]); + for ( i = 0; i < arch_pmc_cnt; i++ ) + wrmsrl(MSR_IA32_PERFCTR0+i, core2_vpmu_cxt->arch_msr_pair[i].counter); + + for ( i = 0; i < VPMU_CORE2_NUM_CTRLS; i++ ) + wrmsrl(core2_ctrls_msr[i], core2_vpmu_cxt->ctrls[i]); + for ( i = 0; i < arch_pmc_cnt; i++ ) + wrmsrl(MSR_P6_EVNTSEL0+i, core2_vpmu_cxt->arch_msr_pair[i].control); + + if ( !is_hvm_domain(v->domain) ) + { + wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, core2_vpmu_cxt->global_ovf_ctrl); + wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl); + } +} + +static void core2_vpmu_load(struct vcpu *v) +{ + struct vpmu_struct *vpmu = vcpu_vpmu(v); + + if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) ) + return; + + __core2_vpmu_load(v); +} + +static int core2_vpmu_alloc_resource(struct vcpu *v) +{ + struct vpmu_struct *vpmu = vcpu_vpmu(v); + struct core2_vpmu_context *core2_vpmu_cxt; + struct core2_pmu_enable *pmu_enable = NULL; + + pmu_enable = xzalloc_bytes(sizeof(struct core2_pmu_enable)); + if ( !pmu_enable ) + return 0; + + if ( is_hvm_domain(v->domain) ) + { + if ( !acquire_pmu_ownership(PMU_OWNER_HVM) ) + goto out_err; + + wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0); + if ( vmx_add_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL) ) + goto out_err; + + if ( vmx_add_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL) ) + goto out_err; + vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, + core2_calc_intial_glb_ctrl_msr()); + + core2_vpmu_cxt 
= xzalloc_bytes(sizeof(struct core2_vpmu_context)); + if ( !core2_vpmu_cxt ) + goto out_err; + } + else + { + core2_vpmu_cxt = &v->arch.vpmu.xenpmu_data->pmu.intel; + vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED); + } + + core2_vpmu_cxt->pmu_enable = pmu_enable; + vpmu->context = (void *)core2_vpmu_cxt; + + vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED); + + return 1; + +out_err: + xfree(pmu_enable); + vmx_rm_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL); + vmx_rm_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL); + release_pmu_ownship(PMU_OWNER_HVM); + + printk("Failed to allocate VPMU resources for domain %u vcpu %u\n", + v->vcpu_id, v->domain->domain_id); + + return 0; +} + +static void core2_vpmu_save_msr_context(struct vcpu *v, int type, + int index, u64 msr_data) +{ + struct core2_vpmu_context *core2_vpmu_cxt = vcpu_vpmu(v)->context; + + switch ( type ) + { + case MSR_TYPE_CTRL: + core2_vpmu_cxt->ctrls[index] = msr_data; + break; + case MSR_TYPE_ARCH_CTRL: + core2_vpmu_cxt->arch_msr_pair[index].control = msr_data; + break; + } +} + +static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index) +{ + struct vpmu_struct *vpmu = vcpu_vpmu(current); + + if ( !is_core2_vpmu_msr(msr_index, type, index) ) + return 0; + + if ( unlikely(!vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED)) && + !core2_vpmu_alloc_resource(current) ) + return 0; + + /* Do the lazy load staff. */ + if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) ) + { + __core2_vpmu_load(current); + vpmu_set(vpmu, VPMU_CONTEXT_LOADED); + if ( cpu_has_vmx_msr_bitmap && is_hvm_domain(current->domain) ) + core2_vpmu_set_msr_bitmap(current->arch.hvm_vmx.msr_bitmap); + } + return 1; +} + +static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) +{ + u64 global_ctrl, non_global_ctrl; + char pmu_enable = 0; + int i, tmp; + int type = -1, index = -1; + struct vcpu *v = current; + struct vpmu_struct *vpmu = vcpu_vpmu(v); + struct core2_vpmu_context *core2_vpmu_cxt = NULL; + + if ( !core2_vpmu_msr_common_check(msr, &type, &index) ) + { + /* Special handling for BTS */ + if ( msr == MSR_IA32_DEBUGCTLMSR ) + { + uint64_t supported = IA32_DEBUGCTLMSR_TR | IA32_DEBUGCTLMSR_BTS | + IA32_DEBUGCTLMSR_BTINT; + + if ( cpu_has(¤t_cpu_data, X86_FEATURE_DSCPL) ) + supported |= IA32_DEBUGCTLMSR_BTS_OFF_OS | + IA32_DEBUGCTLMSR_BTS_OFF_USR; + if ( msr_content & supported ) + { + if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) ) + return 1; + gdprintk(XENLOG_WARNING, "Debug Store is not supported on this cpu\n"); + + if ( is_hvm_domain(v->domain) ) + hvm_inject_hw_exception(TRAP_gp_fault, 0); + else + send_guest_trap(v->domain, v->vcpu_id, TRAP_gp_fault); + + return 0; + } + } + return 0; + } + + core2_vpmu_cxt = vpmu->context; + switch ( msr ) + { + case MSR_CORE_PERF_GLOBAL_OVF_CTRL: + core2_vpmu_cxt->global_ovf_status &= ~msr_content; + core2_vpmu_cxt->global_ovf_ctrl = msr_content; + return 1; + case MSR_CORE_PERF_GLOBAL_STATUS: + gdprintk(XENLOG_INFO, "Can not write readonly MSR: " + "MSR_PERF_GLOBAL_STATUS(0x38E)!\n"); + if ( is_hvm_domain(v->domain) ) + hvm_inject_hw_exception(TRAP_gp_fault, 0); + else + send_guest_trap(v->domain, v->vcpu_id, TRAP_gp_fault); + return 1; + case MSR_IA32_PEBS_ENABLE: + if ( msr_content & 1 ) + gdprintk(XENLOG_WARNING, "Guest is trying to enable PEBS, " + "which is not supported.\n"); + return 1; + case MSR_IA32_DS_AREA: + if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_DS) ) + { + if ( !is_canonical_address(msr_content) ) + { + gdprintk(XENLOG_WARNING, + "Illegal address for IA32_DS_AREA: %#" PRIx64 "x\n", + msr_content); + if ( 
is_hvm_domain(v->domain) ) + hvm_inject_hw_exception(TRAP_gp_fault, 0); + else + send_guest_trap(v->domain, v->vcpu_id, TRAP_gp_fault); + return 1; + } + core2_vpmu_cxt->pmu_enable->ds_area_enable = msr_content ? 1 : 0; + break; + } + gdprintk(XENLOG_WARNING, "Guest setting of DTS is ignored.\n"); + return 1; + case MSR_CORE_PERF_GLOBAL_CTRL: + global_ctrl = msr_content; + for ( i = 0; i < arch_pmc_cnt; i++ ) + { + rdmsrl(MSR_P6_EVNTSEL0+i, non_global_ctrl); + core2_vpmu_cxt->pmu_enable->arch_pmc_enable[i] + global_ctrl & (non_global_ctrl >> 22) & 1; + global_ctrl >>= 1; + } + + rdmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, non_global_ctrl); + global_ctrl = msr_content >> 32; + for ( i = 0; i < VPMU_CORE2_NUM_FIXED; i++ ) + { + core2_vpmu_cxt->pmu_enable->fixed_ctr_enable[i] + (global_ctrl & 1) & ((non_global_ctrl & 0x3)? 1: 0); + non_global_ctrl >>= FIXED_CTR_CTRL_BITS; + global_ctrl >>= 1; + } + break; + case MSR_CORE_PERF_FIXED_CTR_CTRL: + non_global_ctrl = msr_content; + if ( is_hvm_domain(v->domain) ) + vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl); + else + rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl); + global_ctrl >>= 32; + for ( i = 0; i < VPMU_CORE2_NUM_FIXED; i++ ) + { + core2_vpmu_cxt->pmu_enable->fixed_ctr_enable[i] + (global_ctrl & 1) & ((non_global_ctrl & 0x3)? 1: 0); + non_global_ctrl >>= 4; + global_ctrl >>= 1; + } + break; + default: + tmp = msr - MSR_P6_EVNTSEL0; + if ( is_hvm_domain(v->domain) ) + vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl); + else + rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl); + if ( tmp >= 0 && tmp < arch_pmc_cnt ) + core2_vpmu_cxt->pmu_enable->arch_pmc_enable[tmp] + (global_ctrl >> tmp) & (msr_content >> 22) & 1; + } + + for ( i = 0; i < VPMU_CORE2_NUM_FIXED; i++ ) + pmu_enable |= core2_vpmu_cxt->pmu_enable->fixed_ctr_enable[i]; + for ( i = 0; i < arch_pmc_cnt; i++ ) + pmu_enable |= core2_vpmu_cxt->pmu_enable->arch_pmc_enable[i]; + pmu_enable |= core2_vpmu_cxt->pmu_enable->ds_area_enable; + if ( pmu_enable ) + vpmu_set(vpmu, VPMU_RUNNING); + else + vpmu_reset(vpmu, VPMU_RUNNING); + + if ( is_hvm_domain(v->domain) ) + { + /* Setup LVTPC in local apic */ + if ( vpmu_is_set(vpmu, VPMU_RUNNING) && + is_vlapic_lvtpc_enabled(vcpu_vlapic(v)) ) + { + apic_write_around(APIC_LVTPC, APIC_DM_NMI); + vpmu->hw_lapic_lvtpc = APIC_DM_NMI; + } + else + { + apic_write_around(APIC_LVTPC, APIC_DM_NMI | APIC_LVT_MASKED); + vpmu->hw_lapic_lvtpc = APIC_DM_NMI | APIC_LVT_MASKED; + } + } + + core2_vpmu_save_msr_context(v, type, index, msr_content); + if ( type != MSR_TYPE_GLOBAL ) + { + u64 mask; + int inject_gp = 0; + switch ( type ) + { + case MSR_TYPE_ARCH_CTRL: /* MSR_P6_EVNTSEL[0,...] */ + mask = ~((1ull << 32) - 1); + if (msr_content & mask) + inject_gp = 1; + break; + case MSR_TYPE_CTRL: /* IA32_FIXED_CTR_CTRL */ + if ( msr == MSR_IA32_DS_AREA ) + break; + /* 4 bits per counter, currently 3 fixed counters implemented. 
*/ + mask = ~((1ull << (VPMU_CORE2_NUM_FIXED * FIXED_CTR_CTRL_BITS)) - 1); + if (msr_content & mask) + inject_gp = 1; + break; + case MSR_TYPE_COUNTER: /* IA32_FIXED_CTR[0-2] */ + mask = ~((1ull << core2_get_bitwidth_fix_count()) - 1); + if (msr_content & mask) + inject_gp = 1; + break; + } + + if (inject_gp) + { + if ( is_hvm_domain(v->domain) ) + hvm_inject_hw_exception(TRAP_gp_fault, 0); + else + send_guest_trap(v->domain, v->vcpu_id, TRAP_gp_fault); + } + else + wrmsrl(msr, msr_content); + } + else + { + if ( is_hvm_domain(v->domain) ) + vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content); + else + { + wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, msr_content); + core2_vpmu_cxt->global_ctrl = msr_content; + } + } + + return 1; +} + +static int core2_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content) +{ + int type = -1, index = -1; + struct vcpu *v = current; + struct vpmu_struct *vpmu = vcpu_vpmu(v); + struct core2_vpmu_context *core2_vpmu_cxt = NULL; + + if ( core2_vpmu_msr_common_check(msr, &type, &index) ) + { + core2_vpmu_cxt = vpmu->context; + switch ( msr ) + { + case MSR_CORE_PERF_GLOBAL_OVF_CTRL: + *msr_content = 0; + break; + case MSR_CORE_PERF_GLOBAL_STATUS: + *msr_content = core2_vpmu_cxt->global_ovf_status; + break; + case MSR_CORE_PERF_GLOBAL_CTRL: + if ( is_hvm_domain(v->domain) ) + vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content); + else + rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, *msr_content); + break; + default: + rdmsrl(msr, *msr_content); + } + } + else + { + /* Extension for BTS */ + if ( msr == MSR_IA32_MISC_ENABLE ) + { + if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) ) + *msr_content &= ~MSR_IA32_MISC_ENABLE_BTS_UNAVAIL; + } + else + return 0; + } + + return 1; +} + +static void core2_vpmu_do_cpuid(unsigned int input, + unsigned int *eax, unsigned int *ebx, + unsigned int *ecx, unsigned int *edx) +{ + if (input == 0x1) + { + struct vpmu_struct *vpmu = vcpu_vpmu(current); + + if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_DS) ) + { + /* Switch on the ''Debug Store'' feature in CPUID.EAX[1]:EDX[21] */ + *edx |= cpufeat_mask(X86_FEATURE_DS); + if ( cpu_has(¤t_cpu_data, X86_FEATURE_DTES64) ) + *ecx |= cpufeat_mask(X86_FEATURE_DTES64); + if ( cpu_has(¤t_cpu_data, X86_FEATURE_DSCPL) ) + *ecx |= cpufeat_mask(X86_FEATURE_DSCPL); + } + } +} + +/* Dump vpmu info on console, called in the context of keyhandler ''q''. */ +static void core2_vpmu_dump(struct vcpu *v) +{ + struct vpmu_struct *vpmu = vcpu_vpmu(v); + int i; + struct core2_vpmu_context *core2_vpmu_cxt = NULL; + u64 val; + + if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) + return; + + if ( !vpmu_is_set(vpmu, VPMU_RUNNING) ) + { + if ( vpmu_set(vpmu, VPMU_CONTEXT_LOADED) ) + printk(" vPMU loaded\n"); + else + printk(" vPMU allocated\n"); + return; + } + + printk(" vPMU running\n"); + core2_vpmu_cxt = vpmu->context; + + /* Print the contents of the counter and its configuration msr. */ + for ( i = 0; i < arch_pmc_cnt; i++ ) + { + struct arch_msr_pair* msr_pair = core2_vpmu_cxt->arch_msr_pair; + if ( core2_vpmu_cxt->pmu_enable->arch_pmc_enable[i] ) + printk(" general_%d: 0x%016lx ctrl: 0x%016lx\n", + i, msr_pair[i].counter, msr_pair[i].control); + } + /* + * The configuration of the fixed counter is 4 bits each in the + * MSR_CORE_PERF_FIXED_CTR_CTRL. 
+ */ + val = core2_vpmu_cxt->ctrls[MSR_CORE_PERF_FIXED_CTR_CTRL_IDX]; + for ( i = 0; i < VPMU_CORE2_NUM_FIXED; i++ ) + { + if ( core2_vpmu_cxt->pmu_enable->fixed_ctr_enable[i] ) + printk(" fixed_%d: 0x%016lx ctrl: 0x%lx\n", + i, core2_vpmu_cxt->fix_counters[i], + val & FIXED_CTR_CTRL_MASK); + val >>= FIXED_CTR_CTRL_BITS; + } +} + +static int core2_vpmu_do_interrupt(struct cpu_user_regs *regs) +{ + struct vcpu *v = current; + u64 msr_content; + struct vpmu_struct *vpmu = vcpu_vpmu(v); + struct core2_vpmu_context *core2_vpmu_cxt = vpmu->context; + + rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content); + if ( msr_content ) + { + if ( is_pmc_quirk ) + handle_pmc_quirk(msr_content); + core2_vpmu_cxt->global_ovf_status |= msr_content; + msr_content = 0xC000000700000000 | ((1 << arch_pmc_cnt) - 1); + wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content); + } + else + { + /* No PMC overflow but perhaps a Trace Message interrupt. */ + msr_content = __vmread(GUEST_IA32_DEBUGCTL); + if ( !(msr_content & IA32_DEBUGCTLMSR_TR) ) + return 0; + } + + /* HW sets the MASK bit when performance counter interrupt occurs*/ + vpmu->hw_lapic_lvtpc = apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED; + apic_write_around(APIC_LVTPC, vpmu->hw_lapic_lvtpc); + + return 1; +} + +static int core2_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags) +{ + struct vpmu_struct *vpmu = vcpu_vpmu(v); + u64 msr_content; + struct cpuinfo_x86 *c = ¤t_cpu_data; + + if ( !(vpmu_flags & VPMU_INTEL_BTS) ) + goto func_out; + /* Check the ''Debug Store'' feature in the CPUID.EAX[1]:EDX[21] */ + if ( cpu_has(c, X86_FEATURE_DS) ) + { + if ( !cpu_has(c, X86_FEATURE_DTES64) ) + { + printk(XENLOG_G_WARNING "CPU doesn''t support 64-bit DS Area" + " - Debug Store disabled for d%d:v%d\n", + v->domain->domain_id, v->vcpu_id); + goto func_out; + } + vpmu_set(vpmu, VPMU_CPU_HAS_DS); + rdmsrl(MSR_IA32_MISC_ENABLE, msr_content); + if ( msr_content & MSR_IA32_MISC_ENABLE_BTS_UNAVAIL ) + { + /* If BTS_UNAVAIL is set reset the DS feature. */ + vpmu_reset(vpmu, VPMU_CPU_HAS_DS); + printk(XENLOG_G_WARNING "CPU has set BTS_UNAVAIL" + " - Debug Store disabled for d%d:v%d\n", + v->domain->domain_id, v->vcpu_id); + } + else + { + vpmu_set(vpmu, VPMU_CPU_HAS_BTS); + if ( !cpu_has(c, X86_FEATURE_DSCPL) ) + printk(XENLOG_G_INFO + "vpmu: CPU doesn''t support CPL-Qualified BTS\n"); + printk("******************************************************\n"); + printk("** WARNING: Emulation of BTS Feature is switched on **\n"); + printk("** Using this processor feature in a virtualized **\n"); + printk("** environment is not 100%% safe. **\n"); + printk("** Setting the DS buffer address with wrong values **\n"); + printk("** may lead to hypervisor hangs or crashes. **\n"); + printk("** It is NOT recommended for production use! 
**\n"); + printk("******************************************************\n"); + } + } +func_out: + + arch_pmc_cnt = core2_get_pmc_count(); + check_pmc_quirk(); + + /* PV domains can allocate resources immediately */ + if ( !is_hvm_domain(v->domain) ) + if ( !core2_vpmu_alloc_resource(v) ) + return 1; + + return 0; +} + +static void core2_vpmu_destroy(struct vcpu *v) +{ + struct vpmu_struct *vpmu = vcpu_vpmu(v); + struct core2_vpmu_context *core2_vpmu_cxt = vpmu->context; + + if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) + return; + + if ( is_hvm_domain(v->domain) ) + { + xfree(core2_vpmu_cxt->pmu_enable); + xfree(vpmu->context); + if ( cpu_has_vmx_msr_bitmap ) + core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap); + } + + release_pmu_ownship(PMU_OWNER_HVM); + vpmu_clear(vpmu); +} + +struct arch_vpmu_ops core2_vpmu_ops = { + .do_wrmsr = core2_vpmu_do_wrmsr, + .do_rdmsr = core2_vpmu_do_rdmsr, + .do_interrupt = core2_vpmu_do_interrupt, + .do_cpuid = core2_vpmu_do_cpuid, + .arch_vpmu_destroy = core2_vpmu_destroy, + .arch_vpmu_save = core2_vpmu_save, + .arch_vpmu_load = core2_vpmu_load, + .arch_vpmu_dump = core2_vpmu_dump +}; + +static void core2_no_vpmu_do_cpuid(unsigned int input, + unsigned int *eax, unsigned int *ebx, + unsigned int *ecx, unsigned int *edx) +{ + /* + * As in this case the vpmu is not enabled reset some bits in the + * architectural performance monitoring related part. + */ + if ( input == 0xa ) + { + *eax &= ~PMU_VERSION_MASK; + *eax &= ~PMU_GENERAL_NR_MASK; + *eax &= ~PMU_GENERAL_WIDTH_MASK; + + *edx &= ~PMU_FIXED_NR_MASK; + *edx &= ~PMU_FIXED_WIDTH_MASK; + } +} + +/* + * If its a vpmu msr set it to 0. + */ +static int core2_no_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content) +{ + int type = -1, index = -1; + if ( !is_core2_vpmu_msr(msr, &type, &index) ) + return 0; + *msr_content = 0; + return 1; +} + +/* + * These functions are used in case vpmu is not enabled. + */ +struct arch_vpmu_ops core2_no_vpmu_ops = { + .do_rdmsr = core2_no_vpmu_do_rdmsr, + .do_cpuid = core2_no_vpmu_do_cpuid, +}; + +int vmx_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags) +{ + struct vpmu_struct *vpmu = vcpu_vpmu(v); + uint8_t family = current_cpu_data.x86; + uint8_t cpu_model = current_cpu_data.x86_model; + int ret = 0; + + vpmu->arch_vpmu_ops = &core2_no_vpmu_ops; + if ( vpmu_flags == VPMU_OFF ) + return 0; + + if ( family == 6 ) + { + switch ( cpu_model ) + { + /* Core2: */ + case 0x0f: /* original 65 nm celeron/pentium/core2/xeon, "Merom"/"Conroe" */ + case 0x16: /* single-core 65 nm celeron/core2solo "Merom-L"/"Conroe-L" */ + case 0x17: /* 45 nm celeron/core2/xeon "Penryn"/"Wolfdale" */ + case 0x1d: /* six-core 45 nm xeon "Dunnington" */ + + case 0x2a: /* SandyBridge */ + case 0x2d: /* SandyBridge, "Romley-EP" */ + + /* Nehalem: */ + case 0x1a: /* 45 nm nehalem, "Bloomfield" */ + case 0x1e: /* 45 nm nehalem, "Lynnfield", "Clarksfield", "Jasper Forest" */ + case 0x2e: /* 45 nm nehalem-ex, "Beckton" */ + + /* Westmere: */ + case 0x25: /* 32 nm nehalem, "Clarkdale", "Arrandale" */ + case 0x2c: /* 32 nm nehalem, "Gulftown", "Westmere-EP" */ + case 0x27: /* 32 nm Westmere-EX */ + + case 0x3a: /* IvyBridge */ + case 0x3e: /* IvyBridge EP */ + case 0x3c: /* Haswell */ + ret = core2_vpmu_initialise(v, vpmu_flags); + if ( !ret ) + vpmu->arch_vpmu_ops = &core2_vpmu_ops; + return ret; + } + } + + printk("VPMU: Initialization failed. 
" + "Intel processor family %d model %d has not " + "been supported\n", family, cpu_model); + return -EINVAL; +} + diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h index 4f2247e..0b79d39 100644 --- a/xen/include/asm-x86/domain.h +++ b/xen/include/asm-x86/domain.h @@ -8,6 +8,7 @@ #include <asm/hvm/domain.h> #include <asm/e820.h> #include <asm/mce.h> +#include <asm/vpmu.h> #include <public/vcpu.h> #define has_32bit_shinfo(d) ((d)->arch.has_32bit_shinfo) diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h index 5971613..beb959f 100644 --- a/xen/include/asm-x86/hvm/vmx/vmcs.h +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h @@ -20,7 +20,6 @@ #define __ASM_X86_HVM_VMX_VMCS_H__ #include <asm/hvm/io.h> -#include <asm/hvm/vpmu.h> #include <irq_vectors.h> extern void vmcs_dump_vcpu(struct vcpu *v); diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h deleted file mode 100644 index f7f507f..0000000 --- a/xen/include/asm-x86/hvm/vpmu.h +++ /dev/null @@ -1,97 +0,0 @@ -/* - * vpmu.h: PMU virtualization for HVM domain. - * - * Copyright (c) 2007, Intel Corporation. - * - * This program is free software; you can redistribute it and/or modify it - * under the terms and conditions of the GNU General Public License, - * version 2, as published by the Free Software Foundation. - * - * This program is distributed in the hope it will be useful, but WITHOUT - * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or - * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for - * more details. - * - * You should have received a copy of the GNU General Public License along with - * this program; if not, write to the Free Software Foundation, Inc., 59 Temple - * Place - Suite 330, Boston, MA 02111-1307 USA. 
- * - * Author: Haitao Shan <haitao.shan@intel.com> - */ - -#ifndef __ASM_X86_HVM_VPMU_H_ -#define __ASM_X86_HVM_VPMU_H_ - -#include <public/xenpmu.h> - - -#define vcpu_vpmu(vcpu) (&((vcpu)->arch.vpmu)) -#define vpmu_vcpu(vpmu) (container_of((vpmu), struct vcpu, \ - arch.vpmu)) - -#define MSR_TYPE_COUNTER 0 -#define MSR_TYPE_CTRL 1 -#define MSR_TYPE_GLOBAL 2 -#define MSR_TYPE_ARCH_COUNTER 3 -#define MSR_TYPE_ARCH_CTRL 4 - - -/* Arch specific operations shared by all vpmus */ -struct arch_vpmu_ops { - int (*do_wrmsr)(unsigned int msr, uint64_t msr_content); - int (*do_rdmsr)(unsigned int msr, uint64_t *msr_content); - int (*do_interrupt)(struct cpu_user_regs *regs); - void (*do_cpuid)(unsigned int input, - unsigned int *eax, unsigned int *ebx, - unsigned int *ecx, unsigned int *edx); - void (*arch_vpmu_destroy)(struct vcpu *v); - int (*arch_vpmu_save)(struct vcpu *v); - void (*arch_vpmu_load)(struct vcpu *v); - void (*arch_vpmu_dump)(struct vcpu *v); -}; - -int vmx_vpmu_initialise(struct vcpu *, unsigned int flags); -int svm_vpmu_initialise(struct vcpu *, unsigned int flags); - -struct vpmu_struct { - u32 flags; - u32 last_pcpu; - u32 hw_lapic_lvtpc; - void *context; - struct arch_vpmu_ops *arch_vpmu_ops; - xenpmu_data_t *xenpmu_data; -}; - -/* VPMU states */ -#define VPMU_CONTEXT_ALLOCATED 0x1 -#define VPMU_CONTEXT_LOADED 0x2 -#define VPMU_RUNNING 0x4 -#define VPMU_CONTEXT_SAVE 0x8 /* Force context save */ -#define VPMU_FROZEN 0x10 /* Stop counters while VCPU is not running */ -#define VPMU_PASSIVE_DOMAIN_ALLOCATED 0x20 -#define VPMU_WAIT_FOR_FLUSH 0x40 /* PV guest waits for XENPMU_flush */ - -#define vpmu_set(_vpmu, _x) ((_vpmu)->flags |= (_x)) -#define vpmu_reset(_vpmu, _x) ((_vpmu)->flags &= ~(_x)) -#define vpmu_is_set(_vpmu, _x) ((_vpmu)->flags & (_x)) -#define vpmu_is_set_all(_vpmu, _x) (((_vpmu)->flags & (_x)) == (_x)) -#define vpmu_clear(_vpmu) ((_vpmu)->flags = 0) - -int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content); -int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content); -int vpmu_do_interrupt(struct cpu_user_regs *regs); -void vpmu_do_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx, - unsigned int *ecx, unsigned int *edx); -void vpmu_initialise(struct vcpu *v); -void vpmu_destroy(struct vcpu *v); -void vpmu_save(struct vcpu *v); -void vpmu_load(struct vcpu *v); -void vpmu_dump(struct vcpu *v); - -extern int acquire_pmu_ownership(int pmu_ownership); -extern void release_pmu_ownership(int pmu_ownership); - -extern uint32_t vpmu_mode; - -#endif /* __ASM_X86_HVM_VPMU_H_*/ - diff --git a/xen/include/asm-x86/vpmu.h b/xen/include/asm-x86/vpmu.h new file mode 100644 index 0000000..f7f507f --- /dev/null +++ b/xen/include/asm-x86/vpmu.h @@ -0,0 +1,97 @@ +/* + * vpmu.h: PMU virtualization for HVM domain. + * + * Copyright (c) 2007, Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple + * Place - Suite 330, Boston, MA 02111-1307 USA. 
+ * + * Author: Haitao Shan <haitao.shan@intel.com> + */ + +#ifndef __ASM_X86_HVM_VPMU_H_ +#define __ASM_X86_HVM_VPMU_H_ + +#include <public/xenpmu.h> + + +#define vcpu_vpmu(vcpu) (&((vcpu)->arch.vpmu)) +#define vpmu_vcpu(vpmu) (container_of((vpmu), struct vcpu, \ + arch.vpmu)) + +#define MSR_TYPE_COUNTER 0 +#define MSR_TYPE_CTRL 1 +#define MSR_TYPE_GLOBAL 2 +#define MSR_TYPE_ARCH_COUNTER 3 +#define MSR_TYPE_ARCH_CTRL 4 + + +/* Arch specific operations shared by all vpmus */ +struct arch_vpmu_ops { + int (*do_wrmsr)(unsigned int msr, uint64_t msr_content); + int (*do_rdmsr)(unsigned int msr, uint64_t *msr_content); + int (*do_interrupt)(struct cpu_user_regs *regs); + void (*do_cpuid)(unsigned int input, + unsigned int *eax, unsigned int *ebx, + unsigned int *ecx, unsigned int *edx); + void (*arch_vpmu_destroy)(struct vcpu *v); + int (*arch_vpmu_save)(struct vcpu *v); + void (*arch_vpmu_load)(struct vcpu *v); + void (*arch_vpmu_dump)(struct vcpu *v); +}; + +int vmx_vpmu_initialise(struct vcpu *, unsigned int flags); +int svm_vpmu_initialise(struct vcpu *, unsigned int flags); + +struct vpmu_struct { + u32 flags; + u32 last_pcpu; + u32 hw_lapic_lvtpc; + void *context; + struct arch_vpmu_ops *arch_vpmu_ops; + xenpmu_data_t *xenpmu_data; +}; + +/* VPMU states */ +#define VPMU_CONTEXT_ALLOCATED 0x1 +#define VPMU_CONTEXT_LOADED 0x2 +#define VPMU_RUNNING 0x4 +#define VPMU_CONTEXT_SAVE 0x8 /* Force context save */ +#define VPMU_FROZEN 0x10 /* Stop counters while VCPU is not running */ +#define VPMU_PASSIVE_DOMAIN_ALLOCATED 0x20 +#define VPMU_WAIT_FOR_FLUSH 0x40 /* PV guest waits for XENPMU_flush */ + +#define vpmu_set(_vpmu, _x) ((_vpmu)->flags |= (_x)) +#define vpmu_reset(_vpmu, _x) ((_vpmu)->flags &= ~(_x)) +#define vpmu_is_set(_vpmu, _x) ((_vpmu)->flags & (_x)) +#define vpmu_is_set_all(_vpmu, _x) (((_vpmu)->flags & (_x)) == (_x)) +#define vpmu_clear(_vpmu) ((_vpmu)->flags = 0) + +int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content); +int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content); +int vpmu_do_interrupt(struct cpu_user_regs *regs); +void vpmu_do_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx, + unsigned int *ecx, unsigned int *edx); +void vpmu_initialise(struct vcpu *v); +void vpmu_destroy(struct vcpu *v); +void vpmu_save(struct vcpu *v); +void vpmu_load(struct vcpu *v); +void vpmu_dump(struct vcpu *v); + +extern int acquire_pmu_ownership(int pmu_ownership); +extern void release_pmu_ownership(int pmu_ownership); + +extern uint32_t vpmu_mode; + +#endif /* __ASM_X86_HVM_VPMU_H_*/ + -- 1.8.1.4
>>> On 10.09.13 at 17:20, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
> This version has following limitations:
> * For accurate profiling of dom0/Xen dom0 VCPUs should be pinned.
> * Hypervisor code is only profiled on processors that have running dom0 VCPUs
>   on them.

With that I assume this is an RFC rather than full-fledged submission?

Jan
On 09/10/2013 11:34 AM, Jan Beulich wrote:
>>>> On 10.09.13 at 17:20, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
>> This version has following limitations:
>> * For accurate profiling of dom0/Xen dom0 VCPUs should be pinned.
>> * Hypervisor code is only profiled on processors that have running dom0 VCPUs
>>   on them.
> With that I assume this is an RFC rather than full-fledged submission?

I was thinking that this would be something like a stage 1 implementation
(and probably should have mentioned this in the cover letter).

For this stage I wanted to confine all changes on Linux side to xen
subtrees. Properly addressing the above limitation would likely require
changes in non-xen sources (change in perf file format, remote MSR
access etc.).

-boris
>>> On 10.09.13 at 17:20, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
> --- a/xen/arch/x86/Makefile
> +++ b/xen/arch/x86/Makefile
> @@ -102,11 +102,11 @@ $(BASEDIR)/common/symbols-dummy.o:
>  $(TARGET)-syms: prelink.o xen.lds $(BASEDIR)/common/symbols-dummy.o
>  	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o \
>  	    $(BASEDIR)/common/symbols-dummy.o -o $(@D)/.$(@F).0
> -	$(NM) -n $(@D)/.$(@F).0 | $(BASEDIR)/tools/symbols >$(@D)/.$(@F).0.S
> +	$(NM) -n $(@D)/.$(@F).0 | $(BASEDIR)/tools/symbols --all-symbols >$(@D)/.$(@F).0.S

For one I can't see what use data symbols have for performance
analysis. And then I'm opposed to growing the symbol table size
unconditionally for no good reason.

> --- a/xen/include/public/platform.h
> +++ b/xen/include/public/platform.h
> @@ -527,6 +527,26 @@ struct xenpf_core_parking {
>  typedef struct xenpf_core_parking xenpf_core_parking_t;
>  DEFINE_XEN_GUEST_HANDLE(xenpf_core_parking_t);
>
> +#define XENPF_get_symbols 61
> +
> +#define XENSYMS_SZ 4096

This doesn't appear to belong into the public interface.

> +struct xenpf_symdata {
> +    /*
> +     * offset into Xen's symbol data and symbol number from
> +     * last call. Used only by Xen.
> +     */
> +    uint64_t xen_offset;
> +    uint64_t xen_symnum;

I wonder whether that's really a suitable mechanism.

> +
> +    /*
> +     * Symbols data, formatted similar to /proc/kallsyms:
> +     *   <address> <type> <name>
> +     */
> +    XEN_GUEST_HANDLE(char) buf;

This is too simplistic: Please use a proper structure here, to allow
switching the internal symbol table representation (which I have on
my todo list) without having to mimic old behavior.

Jan
Jan Beulich
2013-Sep-11 07:58 UTC
Re: [PATCH v1 02/13] Set VCPU's is_running flag closer to when the VCPU is dispatched
>>> On 10.09.13 at 17:20, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -1219,8 +1219,14 @@ static void schedule(void)
>       * switch, else lost_records resume will not work properly.
>       */
>
> -    ASSERT(!next->is_running);
> -    next->is_running = 1;
> +#ifdef CONFIG_X86
> +    if ( is_idle_vcpu(next) )
> +    /* On x86 guests will set is_running right before they start running. */
> +#endif
> +    {
> +        ASSERT(!next->is_running);
> +        next->is_running = 1;
> +    }

I'm not sure the change as a whole is appropriate in the first place
(the patch description is all but assuring that this change doesn't
have unexpected side effects, namely in the individual schedulers),
but this clearly is a no-go: Either do this consistently for ARM too,
or drop the idea.

Jan
> --- /dev/null
> +++ b/xen/include/public/xenpmu.h

This new file is completely unacceptable as a public header.

> @@ -0,0 +1,101 @@
> +#ifndef __XEN_PUBLIC_XENPMU_H__
> +#define __XEN_PUBLIC_XENPMU_H__
> +
> +#include <asm/msr.h>

This is a no-go.

> +
> +#include "xen.h"
> +
> +#define XENPMU_VER_MAJ 0
> +#define XENPMU_VER_MIN 0
> +
> +/* VPMU modes */
> +#define VPMU_MODE_MASK 0xff

All of these defines would need XEN_ prefixes (and types would
similarly need xen_). And there ought to be some association in the
comment to the field(s) that these constants would actually go into:
From a cursory look I can't see this.

> +#define VPMU_OFF 0
> +/* guests can profile themselves, (dom0 profiles itself and Xen) */

Comment style.

> +#define VPMU_ON (1<<0)
> +/*
> + * Only dom0 has access to VPMU and it profiles everyone: itself,
> + * the hypervisor and the guests.
> + */
> +#define VPMU_PRIV (1<<1)
> +
> +/* VPMU flags */
> +#define VPMU_FLAGS_MASK ((uint32_t)(~VPMU_MODE_MASK))
> +#define VPMU_INTEL_BTS (1<<8) /* Ignored on AMD */
> +
> +
> +/* AMD PMU registers and structures */
> +#define F10H_NUM_COUNTERS 4
> +#define F15H_NUM_COUNTERS 6
> +/* To accommodate more counters in the future (e.g. NB counters) */
> +#define MAX_NUM_COUNTERS 16

Perhaps better to have the number of counters in the structure?

> +struct amd_vpmu_context {
> +    uint64_t counters[MAX_NUM_COUNTERS];
> +    uint64_t ctrls[MAX_NUM_COUNTERS];
> +    uint8_t msr_bitmap_set;
> +};
> +
> +
> +/* Intel PMU registers and structures */
> +static const uint32_t core2_fix_counters_msr[] = {

You're kidding, aren't you?

Jan
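For illustration only, one way to address these points would be to namespace
the constants and carry the counter count inside the structure instead of
hard-coding MAX_NUM_COUNTERS. The names below are hypothetical, a sketch
rather than anything taken from the series:

    /* Hypothetical, namespaced variant of the above (sketch only). */
    #define XEN_PMU_MODE_MASK   0xff
    #define XEN_PMU_MODE_OFF    0
    /* Guests can profile themselves; dom0 profiles itself and Xen. */
    #define XEN_PMU_MODE_ON     (1 << 0)
    /* dom0-only profiling; dom0 collects samples for everyone. */
    #define XEN_PMU_MODE_PRIV   (1 << 1)

    struct xen_pmu_amd_ctxt {
        uint32_t num_counters;   /* valid entries in counters[]/ctrls[] */
        uint32_t msr_bitmap_set;
        uint64_t counters[16];   /* only num_counters entries are used  */
        uint64_t ctrls[16];
    };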
On Tue, 2013-09-10 at 11:21 -0400, Boris Ostrovsky wrote:
>
> diff --git a/xen/include/public/xenpmu.h b/xen/include/public/xenpmu.h
> new file mode 100644
> index 0000000..420b674
> --- /dev/null
> +++ b/xen/include/public/xenpmu.h

When adding new public interfaces please can we mark them up such that
they get properly linked into the generated docs [1].

xen/include/public/event_channel.h is an example of a header which is
already marked up, and "make -C docs html/hypercall/index.htm" will
build you a copy in docs/html.

[1] http://xenbits.xen.org/docs/unstable/hypercall/index.html
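For reference, the markup being asked for consists of comment lines beginning
with a backquote that the docs build picks up; the snippet below is a
from-memory approximation with a hypothetical hypercall name, so
event_channel.h and docs/xen-headers should be consulted for the exact
syntax:

    /*
     * ` enum neg_errnoval
     * ` HYPERVISOR_xenpmu_op(enum xenpmu_op cmd, void *args)
     * `
     * @cmd  == XENPMU_* (PMU operation; name hypothetical here).
     * @args == operation-specific structure.
     */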
On 09/11/2013 03:51 AM, Jan Beulich wrote:
>>>> On 10.09.13 at 17:20, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
>> --- a/xen/arch/x86/Makefile
>> +++ b/xen/arch/x86/Makefile
>> @@ -102,11 +102,11 @@ $(BASEDIR)/common/symbols-dummy.o:
>>  $(TARGET)-syms: prelink.o xen.lds $(BASEDIR)/common/symbols-dummy.o
>>  	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o \
>>  	    $(BASEDIR)/common/symbols-dummy.o -o $(@D)/.$(@F).0
>> -	$(NM) -n $(@D)/.$(@F).0 | $(BASEDIR)/tools/symbols >$(@D)/.$(@F).0.S
>> +	$(NM) -n $(@D)/.$(@F).0 | $(BASEDIR)/tools/symbols --all-symbols >$(@D)/.$(@F).0.S
> For one I can't see what use data symbols have for performance
> analysis.

They are used by perf, similarly to kallsyms (usually when debug symbols
from the binary are not available).

> And then I'm opposed to growing the symbol table size
> unconditionally for no good reason.

I think I can remove --all-symbols, it is not strictly necessary for what
I plan now for perf. We may need to add it later, possibly with a config
option.

>> --- a/xen/include/public/platform.h
>> +++ b/xen/include/public/platform.h
>> @@ -527,6 +527,26 @@ struct xenpf_core_parking {
>>  typedef struct xenpf_core_parking xenpf_core_parking_t;
>>  DEFINE_XEN_GUEST_HANDLE(xenpf_core_parking_t);
>>
>> +#define XENPF_get_symbols 61
>> +
>> +#define XENSYMS_SZ 4096
> This doesn't appear to belong into the public interface.

The Linux driver needs to know the size of the buffer that is passed from
the hypervisor. I suppose I can just use PAGE_SIZE.

>> +struct xenpf_symdata {
>> +    /*
>> +     * offset into Xen's symbol data and symbol number from
>> +     * last call. Used only by Xen.
>> +     */
>> +    uint64_t xen_offset;
>> +    uint64_t xen_symnum;
> I wonder whether that's really a suitable mechanism.

Why do you think this is not suitable?

Linux needs to keep track of position in the symbol table while it is
walking over the file; otherwise we would need to keep the state in the
hypervisor, which is much less desirable.

>> +
>> +    /*
>> +     * Symbols data, formatted similar to /proc/kallsyms:
>> +     *   <address> <type> <name>
>> +     */
>> +    XEN_GUEST_HANDLE(char) buf;
> This is too simplistic: Please use a proper structure here, to allow
> switching the internal symbol table representation (which I have on
> my todo list) without having to mimic old behavior.

I don't think I know what you are referring to here.

-boris
On 09/11/2013 04:13 AM, Jan Beulich wrote:
>> --- /dev/null
>> +++ b/xen/include/public/xenpmu.h
> This new file is completely unacceptable as a public header.

Is this the same comment as IanC made? Mark up public interfaces for
doc generation?

Or something else?

>
>> @@ -0,0 +1,101 @@
>> +#ifndef __XEN_PUBLIC_XENPMU_H__
>> +#define __XEN_PUBLIC_XENPMU_H__
>> +
>> +#include <asm/msr.h>
> This is a no-go.
>
>> +
>> +#include "xen.h"
>> +
>> +#define XENPMU_VER_MAJ 0
>> +#define XENPMU_VER_MIN 0
>> +
>> +/* VPMU modes */
>> +#define VPMU_MODE_MASK 0xff
> All of these defines would need XEN_ prefixes (and types would
> similarly need xen_). And there ought to be some association in the
> comment to the field(s) that these constants would actually go into:
> From a cursory look I can't see this.
>
>> +#define VPMU_OFF 0
>> +/* guests can profile themselves, (dom0 profiles itself and Xen) */
> Comment style.
>
>> +#define VPMU_ON (1<<0)
>> +/*
>> + * Only dom0 has access to VPMU and it profiles everyone: itself,
>> + * the hypervisor and the guests.
>> + */
>> +#define VPMU_PRIV (1<<1)
>> +
>> +/* VPMU flags */
>> +#define VPMU_FLAGS_MASK ((uint32_t)(~VPMU_MODE_MASK))
>> +#define VPMU_INTEL_BTS (1<<8) /* Ignored on AMD */
>> +
>> +
>> +/* AMD PMU registers and structures */
>> +#define F10H_NUM_COUNTERS 4
>> +#define F15H_NUM_COUNTERS 6
>> +/* To accommodate more counters in the future (e.g. NB counters) */
>> +#define MAX_NUM_COUNTERS 16
> Perhaps better to have the number of counters in the structure?
>
>> +struct amd_vpmu_context {
>> +    uint64_t counters[MAX_NUM_COUNTERS];
>> +    uint64_t ctrls[MAX_NUM_COUNTERS];
>> +    uint8_t msr_bitmap_set;
>> +};
>> +
>> +
>> +/* Intel PMU registers and structures */
>> +static const uint32_t core2_fix_counters_msr[] = {
> You're kidding, aren't you?

This was moved from vpmu.h. I will change it to an enum (or remove it
altogether).

-boris
>>> On 11.09.13 at 15:55, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
> On 09/11/2013 03:51 AM, Jan Beulich wrote:
>>>>> On 10.09.13 at 17:20, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
>>> --- a/xen/arch/x86/Makefile
>>> +++ b/xen/arch/x86/Makefile
>>> @@ -102,11 +102,11 @@ $(BASEDIR)/common/symbols-dummy.o:
>>>  $(TARGET)-syms: prelink.o xen.lds $(BASEDIR)/common/symbols-dummy.o
>>>  	$(LD) $(LDFLAGS) -T xen.lds -N prelink.o \
>>>  	    $(BASEDIR)/common/symbols-dummy.o -o $(@D)/.$(@F).0
>>> -	$(NM) -n $(@D)/.$(@F).0 | $(BASEDIR)/tools/symbols >$(@D)/.$(@F).0.S
>>> +	$(NM) -n $(@D)/.$(@F).0 | $(BASEDIR)/tools/symbols --all-symbols >$(@D)/.$(@F).0.S
>> For one I can't see what use data symbols have for performance
>> analysis.
>
> They are used by perf, similarly to kallsyms (usually when debug symbols
> from the binary are not available).

Right, but I specifically said _data_ symbols. I can see what code ones
are going to be used for.

>> And then I'm opposed to growing the symbol table size
>> unconditionally for no good reason.
>
> I think I can remove --all-symbols, it is not strictly necessary for
> what I plan now for perf. We may need to add it later, possibly with a
> config option.

Yes, please, as long as they're not really useful.

>>> --- a/xen/include/public/platform.h
>>> +++ b/xen/include/public/platform.h
>>> @@ -527,6 +527,26 @@ struct xenpf_core_parking {
>>>  typedef struct xenpf_core_parking xenpf_core_parking_t;
>>>  DEFINE_XEN_GUEST_HANDLE(xenpf_core_parking_t);
>>>
>>> +#define XENPF_get_symbols 61
>>> +
>>> +#define XENSYMS_SZ 4096
>> This doesn't appear to belong into the public interface.
>
> The Linux driver needs to know the size of the buffer that is passed from
> the hypervisor. I suppose I can just use PAGE_SIZE.

Buffer? Passed from the hypervisor? And no, there's no PAGE_SIZE in
the public interface as far as I'm aware.

>>> +struct xenpf_symdata {
>>> +    /*
>>> +     * offset into Xen's symbol data and symbol number from
>>> +     * last call. Used only by Xen.
>>> +     */
>>> +    uint64_t xen_offset;
>>> +    uint64_t xen_symnum;
>> I wonder whether that's really a suitable mechanism.
>
> Why do you think this is not suitable?
>
> Linux needs to keep track of position in the symbol table while
> it is walking over the file, otherwise we will need to keep the state
> in hypervisor which is much less desirable.

This could be as simple as a "give me the n-th symbol" interface.
The handler in the hypervisor could cache the last symbol together
with the associated data (with the assumption that there's only ever
going to be one iteration in progress), invalidating the cache if the
incoming index isn't one greater than the last one processed. All the
caching of course is only necessary if otherwise lookup times aren't
acceptable.

>>> +
>>> +    /*
>>> +     * Symbols data, formatted similar to /proc/kallsyms:
>>> +     *   <address> <type> <name>
>>> +     */
>>> +    XEN_GUEST_HANDLE(char) buf;
>> This is too simplistic: Please use a proper structure here, to allow
>> switching the internal symbol table representation (which I have on
>> my todo list) without having to mimic old behavior.
>
> I don't think I know what you are referring to here.

Rather than having a handle to a simple byte array, you ought
to have a handle to a structure containing address, type, and
(pointer to/handle of) name.

Jan
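A minimal sketch of the kind of per-symbol record being described here, with
purely illustrative field names (not taken from the series):

    struct xenpf_symdata {
        uint64_t symnum;              /* IN:  index of the requested symbol  */
        uint64_t address;             /* OUT: symbol address                 */
        uint32_t type;                /* OUT: symbol type, e.g. 't' or 'T'   */
        uint32_t namelen;             /* IN:  size of the name buffer        */
        XEN_GUEST_HANDLE(char) name;  /* OUT: NUL-terminated symbol name     */
    };

With such a layout dom0 simply increments symnum on each call, and the
hypervisor-side cache described above only needs to remember the position
reached for the previous symnum.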
>>> On 11.09.13 at 16:03, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
> On 09/11/2013 04:13 AM, Jan Beulich wrote:
>>> --- /dev/null
>>> +++ b/xen/include/public/xenpmu.h
>> This new file is completely unacceptable as a public header.
>
> Is this the same comment as IanC made? Mark up public interfaces for
> doc generation?
>
> Or something else?

Something else - see the other comments I had given down to about the
middle of the file. I stopped there, assuming I provided enough examples
of what is bad.

Jan
On 09/11/2013 10:12 AM, Jan Beulich wrote:
>
>>>> --- a/xen/include/public/platform.h
>>>> +++ b/xen/include/public/platform.h
>>>> @@ -527,6 +527,26 @@ struct xenpf_core_parking {
>>>>  typedef struct xenpf_core_parking xenpf_core_parking_t;
>>>>  DEFINE_XEN_GUEST_HANDLE(xenpf_core_parking_t);
>>>>
>>>> +#define XENPF_get_symbols 61
>>>> +
>>>> +#define XENSYMS_SZ 4096
>>> This doesn't appear to belong into the public interface.
>> Linux driver needs to know size of the buffer that is passed from
>> the hypervisor. I suppose I can just use PAGE_SIZE.
> Buffer? Passed from the hypervisor?

As it is written now, we pass XENSYMS_SZ worth of (formatted) symbol
information to dom0.

> And no, there's no PAGE_SIZE in the public interface as far as I'm
> aware.
>
>>>> +struct xenpf_symdata {
>>>> +    /*
>>>> +     * offset into Xen's symbol data and symbol number from
>>>> +     * last call. Used only by Xen.
>>>> +     */
>>>> +    uint64_t xen_offset;
>>>> +    uint64_t xen_symnum;
>>> I wonder whether that's really a suitable mechanism.
>> Why do you think this is not suitable?
>>
>> Linux needs to keep track of position in the symbol table while
>> it is walking over the file, otherwise we will need to keep the state
>> in hypervisor which is much less desirable.
> This could be as simple as a "give me the n-th symbol" interface.
> The handler in the hypervisor could cache the last symbol
> together with the associated data (with the assumption that there's
> only ever going to be one iteration in progress), invalidating the
> cache if the incoming index isn't one greater than the last one
> processed. All the caching of course is only necessary if otherwise
> lookup times aren't acceptable.

That would be just having xen_symnum (and caching xen_offset in the
hypervisor).

>>>> +
>>>> +    /*
>>>> +     * Symbols data, formatted similar to /proc/kallsyms:
>>>> +     *   <address> <type> <name>
>>>> +     */
>>>> +    XEN_GUEST_HANDLE(char) buf;
>>> This is too simplistic: Please use a proper structure here, to allow
>>> switching the internal symbol table representation (which I have on
>>> my todo list) without having to mimic old behavior.
>> I don't think I know what you are referring to here.
> Rather than having a handle to a simple byte array, you ought
> to have a handle to a structure containing address, type, and
> (pointer to/handle of) name.

Are you suggesting passing symbols one per hypercall? That's over 4000
hypercalls per one file read. How about requesting N next symbols?

-boris
>>> On 11.09.13 at 16:57, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
> On 09/11/2013 10:12 AM, Jan Beulich wrote:
>>
>>>>> --- a/xen/include/public/platform.h
>>>>> +++ b/xen/include/public/platform.h
>>>>> @@ -527,6 +527,26 @@ struct xenpf_core_parking {
>>>>>  typedef struct xenpf_core_parking xenpf_core_parking_t;
>>>>>  DEFINE_XEN_GUEST_HANDLE(xenpf_core_parking_t);
>>>>>
>>>>> +#define XENPF_get_symbols 61
>>>>> +
>>>>> +#define XENSYMS_SZ 4096
>>>> This doesn't appear to belong into the public interface.
>>> Linux driver needs to know size of the buffer that is passed from
>>> the hypervisor. I suppose I can just use PAGE_SIZE.
>> Buffer? Passed from the hypervisor?
>
> As it is written now, we pass XENSYMS_SZ worth of (formatted) symbol
> information to dom0.

Right, that's what I understood, and that's what I want to avoid.

>>>>> +    /*
>>>>> +     * Symbols data, formatted similar to /proc/kallsyms:
>>>>> +     *   <address> <type> <name>
>>>>> +     */
>>>>> +    XEN_GUEST_HANDLE(char) buf;
>>>> This is too simplistic: Please use a proper structure here, to allow
>>>> switching the internal symbol table representation (which I have on
>>>> my todo list) without having to mimic old behavior.
>>> I don't think I know what you are referring to here.
>> Rather than having a handle to a simple byte array, you ought
>> to have a handle to a structure containing address, type, and
>> (pointer to/handle of) name.
>>
> Are you suggesting passing symbols one per hypercall? That's over 4000
> hypercalls per one file read. How about requesting N next symbols?

That'd be fine too, but could be almost equally achieved with multi-calls.

Jan
On 10/09/13 16:47, Boris Ostrovsky wrote:
> On 09/10/2013 11:34 AM, Jan Beulich wrote:
>>>>> On 10.09.13 at 17:20, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote:
>>> This version has following limitations:
>>> * For accurate profiling of dom0/Xen dom0 VCPUs should be pinned.
>>> * Hypervisor code is only profiled on processors that have running
>>>   dom0 VCPUs on them.
>> With that I assume this is an RFC rather than full-fledged submission?
>
> I was thinking that this would be something like a stage 1 implementation
> (and probably should have mentioned this in the cover letter).
>
> For this stage I wanted to confine all changes on Linux side to xen
> subtrees. Properly addressing the above limitation would likely require
> changes in non-xen sources (change in perf file format, remote MSR
> access etc.).

I think having the vpmu stuff for PV guests is a great idea, and from a
quick skim through I don't have any problems with the general approach.
(Obviously some more detailed review will be needed.)

However, I'm not a fan of this method of collecting perf stuff for Xen
and other VMs together in the cpu buffers for dom0. I think it's ugly,
fragile, and non-scalable, and I would prefer to see if we could
implement the same feature (allowing perf to analyze Xen and other
vcpus) some other way. And I would rather not use it as a "stage 1",
for fear that it would become entrenched.

I think at the hackathon we discussed the idea of having "fake" cpus --
each of which would correspond to either a pcpu with Xen, or a vcpu of
another domain. How problematic is that approach?

For phase 1 can we just do vpmu for PV guests (and add hooks to allow
domains to profile themselves), and look into how to profile Xen and
other VMs as a stage 2?

 -George
On 09/11/2013 01:01 PM, George Dunlap wrote:
> On 10/09/13 16:47, Boris Ostrovsky wrote:
>> On 09/10/2013 11:34 AM, Jan Beulich wrote:
>>>>>> On 10.09.13 at 17:20, Boris Ostrovsky
>>>>>> <boris.ostrovsky@oracle.com> wrote:
>>>> This version has following limitations:
>>>> * For accurate profiling of dom0/Xen dom0 VCPUs should be pinned.
>>>> * Hypervisor code is only profiled on processors that have running
>>>>   dom0 VCPUs on them.
>>> With that I assume this is an RFC rather than full-fledged submission?
>>
>> I was thinking that this would be something like stage 1
>> implementation (and probably should have mentioned this in the cover
>> letter).
>>
>> For this stage I wanted to confine all changes on Linux side to xen
>> subtrees. Properly addressing the above limitation would likely
>> require changes in non-xen sources (change in perf file format,
>> remote MSR access etc.).
>
> I think having the vpmu stuff for PV guests is a great idea, and from
> a quick skim through I don't have any problems with the general
> approach. (Obviously some more detailed review will be needed.)
>
> However, I'm not a fan of this method of collecting perf stuff for Xen
> and other VMs together in the cpu buffers for dom0. I think it's ugly,
> fragile, and non-scalable, and I would prefer to see if we could
> implement the same feature (allowing perf to analyze Xen and other
> vcpus) some other way. And I would rather not use it as a "stage 1",
> for fear that it would become entrenched.

I can see how collecting samples for other domains may be questionable
now (DOM0_PRIV mode), since at this stage there is no way to distinguish
between samples for non-privileged domains.

But why do you think that getting data for both dom0 and Xen is
problematic? Someone has to process Xen's samples, and who would do
that if not dom0? We could store samples in separate files (e.g.
perf.data.dom0 and perf.data.xen), but that's the toolstack's job.

> I think at the hackathon we discussed the idea of having "fake" cpus
> -- each of which would correspond to either a pcpu with Xen, or a vcpu
> of another domain. How problematic is that approach?

This is what I was planning to do later. Those would be "fake" CPUs in
the sense that their cpuids would be something like (vcpuID | domainID)
or (PCPU | <sometag>). But it would be a natural extension of what is
being done now.

> For phase 1 can we just do vpmu for PV guests (and add hooks to allow
> domains to profile themselves), and look into how to profile Xen and
> other VMs as a stage 2?

-boris
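As a minimal illustration of the cpuid encoding described above, i.e.
(vcpuID | domainID) or (PCPU | <sometag>): the 16-bit split and the macro
names below are made-up assumptions, not part of any posted patch.

    /* Hypothetical "fake" CPU id layout: domain id in the upper bits,
     * VCPU id in the lower bits, with a reserved tag for Xen/PCPU samples. */
    #define FAKE_CPU_VCPU_BITS  16
    #define FAKE_CPU_XEN_TAG    0xffff   /* lower bits mean "PCPU running Xen" */

    #define fake_cpu_id(domid, vcpu)  (((domid) << FAKE_CPU_VCPU_BITS) | (vcpu))
    #define fake_cpu_domid(id)        ((id) >> FAKE_CPU_VCPU_BITS)
    #define fake_cpu_vcpu(id)         ((id) & ((1u << FAKE_CPU_VCPU_BITS) - 1))

Whatever the exact layout, the point is that the toolstack could recover
both the domain and the VCPU (or PCPU) from the single CPU identifier that
perf already records with each sample.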
On 11/09/13 19:22, Boris Ostrovsky wrote:
> On 09/11/2013 01:01 PM, George Dunlap wrote:
>> On 10/09/13 16:47, Boris Ostrovsky wrote:
>>> On 09/10/2013 11:34 AM, Jan Beulich wrote:
>>>>>>> On 10.09.13 at 17:20, Boris Ostrovsky
>>>>>>> <boris.ostrovsky@oracle.com> wrote:
>>>>> This version has following limitations:
>>>>> * For accurate profiling of dom0/Xen dom0 VCPUs should be pinned.
>>>>> * Hypervisor code is only profiled on processors that have running
>>>>>   dom0 VCPUs on them.
>>>> With that I assume this is an RFC rather than full-fledged submission?
>>>
>>> I was thinking that this would be something like stage 1
>>> implementation (and probably should have mentioned this in the cover
>>> letter).
>>>
>>> For this stage I wanted to confine all changes on Linux side to xen
>>> subtrees. Properly addressing the above limitation would likely
>>> require changes in non-xen sources (change in perf file format,
>>> remote MSR access etc.).
>>
>> I think having the vpmu stuff for PV guests is a great idea, and from
>> a quick skim through I don't have any problems with the general
>> approach. (Obviously some more detailed review will be needed.)
>>
>> However, I'm not a fan of this method of collecting perf stuff for
>> Xen and other VMs together in the cpu buffers for dom0. I think it's
>> ugly, fragile, and non-scalable, and I would prefer to see if we
>> could implement the same feature (allowing perf to analyze Xen and
>> other vcpus) some other way. And I would rather not use it as a
>> "stage 1", for fear that it would become entrenched.
>
> I can see how collecting samples for other domains may be questionable
> now (DOM0_PRIV mode), since at this stage there is no way to
> distinguish between samples for non-privileged domains.
>
> But why do you think that getting data for both dom0 and Xen is
> problematic? Someone has to process Xen's samples, and who would do
> that if not dom0? We could store samples in separate files (e.g.
> perf.data.dom0 and perf.data.xen), but that's the toolstack's job.

It's not so much about dom0 collecting the samples and passing them on
to the analysis tools; this is already what xenalyze does, in essence.
It's about the requirement of having the dom0 vcpus pinned 1-1 to
physical cpus: both limiting the flexibility for scheduling, and
limiting the configuration flexibility wrt having dom0 vcpus < pcpus.
That is what seems an ugly hack to me -- having dom0 sort of try to do
something that requires hypervisor-level privileges and making a bit of
a mess of it.

I'm unfortunately not familiar enough with the perf system to know
exactly what it is that Linux needs to do (why, for example, you think
it would need remote MSR access if dom0 weren't pinned), and how hard
it would be for Xen just to do that work and provide an "adapter" that
would translate Xen-specific stuff into something perf could consume.
Would it be possible, for example, for dom0 to specify what needed to
be collected, for Xen to generate the samples in a Xen-specific format,
and then have something in dom0 that would separate the samples into
one file per domain, looking similar enough to a trace file that the
perf system could consume it?

 -George
On 09/12/2013 05:39 AM, George Dunlap wrote:
> On 11/09/13 19:22, Boris Ostrovsky wrote:
>> On 09/11/2013 01:01 PM, George Dunlap wrote:
>>> On 10/09/13 16:47, Boris Ostrovsky wrote:
>>>> On 09/10/2013 11:34 AM, Jan Beulich wrote:
>>>>>>>> On 10.09.13 at 17:20, Boris Ostrovsky
>>>>>>>> <boris.ostrovsky@oracle.com> wrote:
>>>>>> This version has following limitations:
>>>>>> * For accurate profiling of dom0/Xen dom0 VCPUs should be pinned.
>>>>>> * Hypervisor code is only profiled on processors that have
>>>>>>   running dom0 VCPUs on them.
>>>>> With that I assume this is an RFC rather than full-fledged
>>>>> submission?
>>>>
>>>> I was thinking that this would be something like stage 1
>>>> implementation (and probably should have mentioned this in the
>>>> cover letter).
>>>>
>>>> For this stage I wanted to confine all changes on Linux side to xen
>>>> subtrees. Properly addressing the above limitation would likely
>>>> require changes in non-xen sources (change in perf file format,
>>>> remote MSR access etc.).
>>>
>>> I think having the vpmu stuff for PV guests is a great idea, and
>>> from a quick skim through I don't have any problems with the general
>>> approach. (Obviously some more detailed review will be needed.)
>>>
>>> However, I'm not a fan of this method of collecting perf stuff for
>>> Xen and other VMs together in the cpu buffers for dom0. I think it's
>>> ugly, fragile, and non-scalable, and I would prefer to see if we
>>> could implement the same feature (allowing perf to analyze Xen and
>>> other vcpus) some other way. And I would rather not use it as a
>>> "stage 1", for fear that it would become entrenched.
>>
>> I can see how collecting samples for other domains may be
>> questionable now (DOM0_PRIV mode), since at this stage there is no
>> way to distinguish between samples for non-privileged domains.
>>
>> But why do you think that getting data for both dom0 and Xen is
>> problematic? Someone has to process Xen's samples, and who would do
>> that if not dom0? We could store samples in separate files (e.g.
>> perf.data.dom0 and perf.data.xen), but that's the toolstack's job.
>
> It's not so much about dom0 collecting the samples and passing them on
> to the analysis tools; this is already what xenalyze does, in essence.
> It's about the requirement of having the dom0 vcpus pinned 1-1 to
> physical cpus: both limiting the flexibility for scheduling, and
> limiting the configuration flexibility wrt having dom0 vcpus < pcpus.
> That is what seems an ugly hack to me -- having dom0 sort of try to do
> something that requires hypervisor-level privileges and making a bit
> of a mess of it.

I probably should have explained the limitations better in the original
message.

Pinning:

The only reason this version requires pinning is that I haven't
provided hooks in the Linux perf code to store both the PCPU and the
VCPU of a sample in perf_sample_data. I didn't do that because it would
need to be done outside of arch/x86/xen, and I decided not to go there
for this stage. So for now perf still only knows about CPUs, not PCPUs
or VCPUs.

Note that the hypervisor already provides information about both
P/VCPUs to dom0 (*), so when I fix what I described above in Linux
(kernel and perf toolstack) the right association of P/VCPUs will start
working.

And pinning is not really *required*. If you don't pin, you will not
get an accurate distribution of hypervisor samples in perf. For
instance, if Xen's foo() was sampled on PCPU0 and then on PCPU1 while
dom0's VCPU0 was running on each of them, perf will assume that both
samples were taken on CPU0 (note again: CPU0, not P- or VCPU0).

#VCPUs < #PCPUs:

This is different from pinning. The issue here is that tools (e.g.
perf) need to access the PMU's MSRs, and they do so with something like
wrmsr(msr, value), assuming that they are programming the PMU on the
current processor. So if a dom0 VCPU never runs on some PCPU, it
currently cannot program the PMU there. One way to address this could
be to have wrmsr_cpu(cpu, msr, value), which presumably would be
patched over with a regular wrmsr on bare metal.

(*) Well, it doesn't, because I forgot to add this to the code (it's
one line, really), but I will in the next version.

> I'm unfortunately not familiar enough with the perf system to know
> exactly what it is that Linux needs to do (why, for example, you think
> it would need remote MSR access if dom0 weren't pinned),

Remote MSR access is needed not because of pinning but because the tool
(perf, or any other tool for that matter) needs to program the PMU on
non-dom0 processors.

> and how hard it would be for Xen just to do that work and provide an
> "adapter" that would translate Xen-specific stuff into something perf
> could consume. Would it be possible, for example, for dom0 to specify
> what needed to be collected, for Xen to generate the samples in a
> Xen-specific format, and then have something in dom0 that would
> separate the samples into one file per domain, looking similar enough
> to a trace file that the perf system could consume it?

Perf calculates the sampling period on each sample and writes the
resulting value into the counter MSR (I haven't yet looked at how it
uses other performance facilities such as PEBS, IBS and the like).
Processing the sample data is done by the toolstack and is relatively
easy; we don't need a Xen-specific format (once we fix the pinning
issue so that we know to whom a sample belongs). Programming the PMU
hardware from the existing perf code is the challenge.

Thanks.

-boris
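For concreteness, the wrmsr_cpu() idea mentioned above could be built in
the Linux kernel on top of smp_call_function_single(). The sketch below is
illustrative only and is not part of the posted series (Linux's existing
wrmsr_on_cpu() helper works along similar lines):

    /* Illustrative sketch of a cross-CPU MSR write, not from the series. */
    #include <linux/types.h>
    #include <linux/smp.h>
    #include <asm/msr.h>

    struct msr_write_info {
        u32 msr;
        u64 value;
    };

    /* Runs on the target CPU via IPI. */
    static void do_wrmsr_local(void *info)
    {
        struct msr_write_info *m = info;

        wrmsrl(m->msr, m->value);
    }

    static int wrmsr_cpu(int cpu, u32 msr, u64 value)
    {
        struct msr_write_info m = { .msr = msr, .value = value };

        /* Send an IPI to 'cpu' and wait for the write to complete. */
        return smp_call_function_single(cpu, do_wrmsr_local, &m, 1);
    }

On bare metal such a helper reduces to an ordinary wrmsr when the target
is the current processor; under Xen the same call site could presumably be
routed through the shared xenpmu_data_t area or a hypercall instead.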