Changes in v2:
* Xen symbols are exported as a data structure (as opposed to a set of
  formatted strings in v1). Even though only one symbol is returned per
  hypercall, performance appears to be acceptable: reading the whole file
  from dom0 userland takes on average about twice as long as reading
  /proc/kallsyms
* More cleanup of Intel VPMU code to simplify publicly exported structures
* There are now architecture-independent and x86-specific public include
  files (ARM has a stub)
* General cleanup of public include files to make them more presentable
  (and to make auto doc generation better)
* Setting of vcpu->is_running is now done on ARM in schedule_tail as well
  (making changes to common/schedule.c architecture-independent). Note that
  this is not tested since I don't have access to ARM hardware.
* The PCPU ID of the interrupted processor is now passed to the PV guest

Linux patches will be updated later.

==================================================================

Boris Ostrovsky (13):
  Export hypervisor symbols
  Set VCPU's is_running flag closer to when the VCPU is dispatched
  x86/PMU: Stop AMD counters when called from vpmu_save_force()
  x86/VPMU: Minor VPMU cleanup
  intel/VPMU: Clean up Intel VPMU code
  x86/PMU: Add public xenpmu.h
  x86/PMU: Make vpmu not HVM-specific
  x86/PMU: Interface for setting PMU mode and flags
  x86/PMU: Initialize PMU for PV guests
  x86/PMU: Add support for PMU registers handling on PV guests
  x86/PMU: Handle PMU interrupts for PV guests
  x86/PMU: Save VPMU state for PV guests during context switch
  x86/PMU: Move vpmu files up from hvm directory

 xen/arch/arm/domain.c                          |   1 +
 xen/arch/x86/Makefile                          |   1 +
 xen/arch/x86/apic.c                            |  13 -
 xen/arch/x86/domain.c                          |  18 +-
 xen/arch/x86/hvm/Makefile                      |   1 -
 xen/arch/x86/hvm/svm/Makefile                  |   1 -
 xen/arch/x86/hvm/svm/entry.S                   |   2 +
 xen/arch/x86/hvm/svm/vpmu.c                    | 494 -------------
 xen/arch/x86/hvm/vmx/Makefile                  |   1 -
 xen/arch/x86/hvm/vmx/entry.S                   |   1 +
 xen/arch/x86/hvm/vmx/vmcs.c                    |  59 ++
 xen/arch/x86/hvm/vmx/vpmu_core2.c              | 894 -----------------------
 xen/arch/x86/hvm/vpmu.c                        | 266 -------
 xen/arch/x86/oprofile/op_model_ppro.c          |   8 +-
 xen/arch/x86/platform_hypercall.c              |   9 +
 xen/arch/x86/traps.c                           |  39 +-
 xen/arch/x86/vpmu.c                            | 549 +++++++++++++++
 xen/arch/x86/vpmu_amd.c                        | 489 +++++++++++++
 xen/arch/x86/vpmu_intel.c                      | 936 +++++++++++++++++++++++++
 xen/arch/x86/x86_64/asm-offsets.c              |   1 +
 xen/arch/x86/x86_64/compat/entry.S             |   4 +
 xen/arch/x86/x86_64/entry.S                    |   4 +
 xen/arch/x86/x86_64/platform_hypercall.c       |   2 +-
 xen/common/event_channel.c                     |   1 +
 xen/common/schedule.c                          |   8 +-
 xen/common/symbols-dummy.c                     |   1 +
 xen/common/symbols.c                           |  58 +-
 xen/include/asm-x86/domain.h                   |   3 +
 xen/include/asm-x86/hvm/vcpu.h                 |   3 -
 xen/include/asm-x86/hvm/vmx/vmcs.h             |   3 +-
 xen/include/asm-x86/hvm/vmx/vpmu_core2.h       |  51 --
 xen/include/asm-x86/hvm/vpmu.h                 | 104 ---
 xen/include/asm-x86/irq.h                      |   1 -
 xen/include/asm-x86/mach-default/irq_vectors.h |   1 -
 xen/include/asm-x86/vpmu.h                     |  96 +++
 xen/include/public/arch-x86/xenpmu-x86.h       |  62 ++
 xen/include/public/platform.h                  |  22 +
 xen/include/public/xen.h                       |   2 +
 xen/include/public/xenpmu.h                    |  95 +++
 xen/include/xen/hypercall.h                    |   4 +
 xen/include/xen/softirq.h                      |   1 +
 xen/include/xen/symbols.h                      |   4 +
 xen/tools/symbols.c                            |   4 +
 43 files changed, 2467 insertions(+), 1850 deletions(-)
 delete mode 100644 xen/arch/x86/hvm/svm/vpmu.c
 delete mode 100644 xen/arch/x86/hvm/vmx/vpmu_core2.c
 delete mode 100644 xen/arch/x86/hvm/vpmu.c
 create mode 100644 xen/arch/x86/vpmu.c
 create mode 100644 xen/arch/x86/vpmu_amd.c
 create mode 100644 xen/arch/x86/vpmu_intel.c
 delete mode 100644 xen/include/asm-x86/hvm/vmx/vpmu_core2.h
 delete mode 100644 xen/include/asm-x86/hvm/vpmu.h
 create mode 100644 xen/include/asm-x86/vpmu.h
 create mode 100644 xen/include/public/arch-x86/xenpmu-x86.h
 create mode 100644 xen/include/public/xenpmu.h
--
1.8.1.4
Export Xen''s symbols in format similar to Linux'' /proc/kallsyms. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> --- xen/arch/x86/platform_hypercall.c | 9 +++++ xen/arch/x86/x86_64/platform_hypercall.c | 2 +- xen/common/symbols-dummy.c | 1 + xen/common/symbols.c | 58 ++++++++++++++++++++++++++++++-- xen/include/public/platform.h | 22 ++++++++++++ xen/include/xen/symbols.h | 4 +++ xen/tools/symbols.c | 4 +++ 7 files changed, 97 insertions(+), 3 deletions(-) diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c index 7175a82..39376fe 100644 --- a/xen/arch/x86/platform_hypercall.c +++ b/xen/arch/x86/platform_hypercall.c @@ -23,6 +23,7 @@ #include <xen/cpu.h> #include <xen/pmstat.h> #include <xen/irq.h> +#include <xen/symbols.h> #include <asm/current.h> #include <public/platform.h> #include <acpi/cpufreq/processor_perf.h> @@ -597,6 +598,14 @@ ret_t do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op) } break; + case XENPF_get_symbols: + { + ret = xensyms_read(&op->u.symdata); + if ( ret >= 0 && __copy_field_to_guest(u_xenpf_op, op, u.symdata) ) + ret = -EFAULT; + } + break; + default: ret = -ENOSYS; break; diff --git a/xen/arch/x86/x86_64/platform_hypercall.c b/xen/arch/x86/x86_64/platform_hypercall.c index aa2ad54..9ef705a 100644 --- a/xen/arch/x86/x86_64/platform_hypercall.c +++ b/xen/arch/x86/x86_64/platform_hypercall.c @@ -35,7 +35,7 @@ CHECK_pf_pcpu_version; #undef xen_pf_pcpu_version #define xenpf_enter_acpi_sleep compat_pf_enter_acpi_sleep - +#define xenpf_symdata compat_pf_symdata #define COMPAT #define _XEN_GUEST_HANDLE(t) XEN_GUEST_HANDLE(t) #define _XEN_GUEST_HANDLE_PARAM(t) XEN_GUEST_HANDLE_PARAM(t) diff --git a/xen/common/symbols-dummy.c b/xen/common/symbols-dummy.c index 5090c3b..52a86c7 100644 --- a/xen/common/symbols-dummy.c +++ b/xen/common/symbols-dummy.c @@ -12,6 +12,7 @@ const unsigned int symbols_offsets[1]; const unsigned long symbols_addresses[1]; #endif const unsigned int symbols_num_syms; +const unsigned long symbols_names_bytes; const u8 symbols_names[1]; const u8 symbols_token_table[1]; diff --git a/xen/common/symbols.c b/xen/common/symbols.c index 83b2b58..e74a585 100644 --- a/xen/common/symbols.c +++ b/xen/common/symbols.c @@ -17,6 +17,8 @@ #include <xen/lib.h> #include <xen/string.h> #include <xen/spinlock.h> +#include <public/platform.h> +#include <xen/guest_access.h> #ifdef SYMBOLS_ORIGIN extern const unsigned int symbols_offsets[1]; @@ -26,6 +28,7 @@ extern const unsigned long symbols_addresses[]; #define symbols_address(n) symbols_addresses[n] #endif extern const unsigned int symbols_num_syms; +extern const unsigned long symbols_names_bytes; extern const u8 symbols_names[]; extern const u8 symbols_token_table[]; @@ -35,7 +38,8 @@ extern const unsigned int symbols_markers[]; /* expand a compressed symbol data into the resulting uncompressed string, given the offset to where the symbol is in the compressed stream */ -static unsigned int symbols_expand_symbol(unsigned int off, char *result) +static unsigned int symbols_expand_symbol(unsigned int off, char *result, + int maxlen) { int len, skipped_first = 0; const u8 *tptr, *data; @@ -49,6 +53,9 @@ static unsigned int symbols_expand_symbol(unsigned int off, char *result) * the compressed stream */ off += len + 1; + if (maxlen < len) + len = maxlen; + /* for every byte on the compressed symbol data, copy the table entry for that byte */ while(len) { @@ -129,7 +136,7 @@ const char *symbols_lookup(unsigned long addr, --low; /* Grab name */ - 
symbols_expand_symbol(get_symbol_offset(low), namebuf); + symbols_expand_symbol(get_symbol_offset(low), namebuf, sizeof(namebuf)); /* Search for next non-aliased symbol */ for (i = low + 1; i < symbols_num_syms; i++) { @@ -174,3 +181,50 @@ void __print_symbol(const char *fmt, unsigned long address) spin_unlock_irqrestore(&lock, flags); } + +/* + * Get symbol type information. This is encoded as a single char at the + * beginning of the symbol name. + */ +static char symbols_get_symbol_type(unsigned int off) +{ + /* + * Get just the first code, look it up in the token table, + * and return the first char from this token. + */ + return symbols_token_table[symbols_token_index[symbols_names[off + 1]]]; +} + +/* + * Symbols are most likely accessed sequentially so we remember position from + * previous read. This can help us avoid extra call to get_symbol_offset(). + */ +static uint64_t next_symbol, next_offset; +static DEFINE_SPINLOCK(symbols_mutex); + +int xensyms_read(struct xenpf_symdata *symdata) +{ + if ( symdata->xen_symnum > symbols_num_syms ) + return -EINVAL; + else if ( symdata->xen_symnum == symbols_num_syms ) + return 0; + + spin_lock(&symbols_mutex); + + if ( symdata->xen_symnum == 0 ) + next_offset = next_symbol = 0; + else if ( next_symbol != symdata->xen_symnum ) + /* Non-sequential access */ + next_offset = get_symbol_offset(symdata->xen_symnum); + + symdata->type = symbols_get_symbol_type(next_offset); + next_offset = symbols_expand_symbol(next_offset, symdata->name, + sizeof(symdata->name)); + symdata->address = symbols_offsets[symdata->xen_symnum] + SYMBOLS_ORIGIN; + + next_symbol = symdata->xen_symnum + 1; + + spin_unlock(&symbols_mutex); + + return strlen(symdata->name); +} diff --git a/xen/include/public/platform.h b/xen/include/public/platform.h index 4341f54..870e14b 100644 --- a/xen/include/public/platform.h +++ b/xen/include/public/platform.h @@ -527,6 +527,27 @@ struct xenpf_core_parking { typedef struct xenpf_core_parking xenpf_core_parking_t; DEFINE_XEN_GUEST_HANDLE(xenpf_core_parking_t); +#define XENPF_get_symbols 61 + +struct xenpf_symdata { + /* IN variables */ + uint64_t xen_symnum; + + /* OUT variables */ + uint64_t address; + uint64_t type; + /* + * KSYM_NAME_LEN is 128 bytes. However, we cannot be larger than pad in + * xen_platform_op below (which is 128 bytes as well). Since the largest + * symbol is around 50 bytes it''s probably more trouble than it''s worth + * to try to deal with symbols that are close to 128 bytes in length. 
+ */ +#define XEN_SYMS_MAX_LEN (128 - 3 * 8) + char name[XEN_SYMS_MAX_LEN]; +}; +typedef struct xenpf_symdata xenpf_symdata_t; +DEFINE_XEN_GUEST_HANDLE(xenpf_symdata_t); + /* * ` enum neg_errnoval * ` HYPERVISOR_platform_op(const struct xen_platform_op*); @@ -553,6 +574,7 @@ struct xen_platform_op { struct xenpf_cpu_hotadd cpu_add; struct xenpf_mem_hotadd mem_add; struct xenpf_core_parking core_parking; + struct xenpf_symdata symdata; uint8_t pad[128]; } u; }; diff --git a/xen/include/xen/symbols.h b/xen/include/xen/symbols.h index 37cf6bf..c8df28f 100644 --- a/xen/include/xen/symbols.h +++ b/xen/include/xen/symbols.h @@ -2,6 +2,8 @@ #define _XEN_SYMBOLS_H #include <xen/types.h> +#include <public/xen.h> +#include <public/platform.h> #define KSYM_NAME_LEN 127 @@ -34,4 +36,6 @@ do { \ __print_symbol(fmt, addr); \ } while(0) +extern int xensyms_read(struct xenpf_symdata *symdata); + #endif /*_XEN_SYMBOLS_H*/ diff --git a/xen/tools/symbols.c b/xen/tools/symbols.c index f39c906..818204d 100644 --- a/xen/tools/symbols.c +++ b/xen/tools/symbols.c @@ -272,6 +272,10 @@ static void write_src(void) } printf("\n"); + output_label("symbols_names_bytes"); + printf("\t.long\t%d\n", off); + printf("\n"); + output_label("symbols_markers"); for (i = 0; i < ((table_cnt + 255) >> 8); i++) printf("\t.long\t%d\n", markers[i]); -- 1.8.1.4
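As a usage illustration (not part of the patch): the intended way to consume
XENPF_get_symbols from dom0 is to issue the call repeatedly, incrementing
xen_symnum by one each time, until it returns 0 (past the last symbol).
The sketch below shows that loop; do_xen_platform_op_get_symbols() is a
hypothetical stand-in for a privcmd/libxc wrapper and is not introduced by
this series.

#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <inttypes.h>

/* Mirrors struct xenpf_symdata above (XEN_SYMS_MAX_LEN == 128 - 3*8). */
struct xenpf_symdata {
    uint64_t xen_symnum;     /* IN: index of the symbol to fetch */
    uint64_t address;        /* OUT: symbol address */
    uint64_t type;           /* OUT: nm(1)-style type character */
    char     name[128 - 3 * 8]; /* OUT: symbol name */
};

/* Hypothetical wrapper that issues XENPF_get_symbols via privcmd and
 * returns what xensyms_read() returns (strlen of the name; 0 when done). */
extern int do_xen_platform_op_get_symbols(struct xenpf_symdata *sd);

static void dump_xen_symbols(void)
{
    struct xenpf_symdata sd;
    uint64_t i;

    for ( i = 0; ; i++ )
    {
        memset(&sd, 0, sizeof(sd));
        sd.xen_symnum = i;

        if ( do_xen_platform_op_get_symbols(&sd) <= 0 )
            break;   /* 0: past the last symbol, < 0: error */

        /* Same layout as a /proc/kallsyms line: address, type, name. */
        printf("%016"PRIx64" %c %s\n", sd.address, (char)sd.type, sd.name);
    }
}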
Boris Ostrovsky
2013-Sep-20 09:42 UTC
[PATCH v2 02/13] Set VCPU's is_running flag closer to when the VCPU is dispatched
An interrupt handler happening during new VCPU scheduling may want to know who was on the (physical) processor at the point of the interrupt. Just looking at ''current'' may not be accurate since there is a window of time when ''current'' points to new VCPU and its is_running flag is set but the VCPU has not been dispatched yet. More importantly, on Intel processors, if the handler wants to examine certain state of an HVM VCPU (such as segment registers) the VMCS pointer is not set yet. This patch will move setting the is_running flag to a later point. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> --- xen/arch/arm/domain.c | 1 + xen/arch/x86/domain.c | 1 + xen/arch/x86/hvm/svm/entry.S | 2 ++ xen/arch/x86/hvm/vmx/entry.S | 1 + xen/arch/x86/x86_64/asm-offsets.c | 1 + xen/common/schedule.c | 8 ++++++-- 6 files changed, 12 insertions(+), 2 deletions(-) diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index 373c7b3..94a6bd4 100644 --- a/xen/arch/arm/domain.c +++ b/xen/arch/arm/domain.c @@ -229,6 +229,7 @@ static void schedule_tail(struct vcpu *prev) ctxt_switch_from(prev); ctxt_switch_to(current); + current->is_running = 1; local_irq_enable(); diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index 874742c..e119d7b 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -142,6 +142,7 @@ static void continue_nonidle_domain(struct vcpu *v) { check_wakeup_from_wait(); mark_regs_dirty(guest_cpu_user_regs()); + v->is_running = 1; reset_stack_and_jump(ret_from_intr); } diff --git a/xen/arch/x86/hvm/svm/entry.S b/xen/arch/x86/hvm/svm/entry.S index 1969629..728e773 100644 --- a/xen/arch/x86/hvm/svm/entry.S +++ b/xen/arch/x86/hvm/svm/entry.S @@ -74,6 +74,8 @@ UNLIKELY_END(svm_trace) mov VCPU_svm_vmcb_pa(%rbx),%rax + movb $1,VCPU_is_running(%rbx) + pop %r15 pop %r14 pop %r13 diff --git a/xen/arch/x86/hvm/vmx/entry.S b/xen/arch/x86/hvm/vmx/entry.S index 496a62c..9e33f45 100644 --- a/xen/arch/x86/hvm/vmx/entry.S +++ b/xen/arch/x86/hvm/vmx/entry.S @@ -125,6 +125,7 @@ UNLIKELY_END(realmode) mov $GUEST_RFLAGS,%eax VMWRITE(UREGS_eflags) + movb $1,VCPU_is_running(%rbx) cmpb $0,VCPU_vmx_launched(%rbx) pop %r15 pop %r14 diff --git a/xen/arch/x86/x86_64/asm-offsets.c b/xen/arch/x86/x86_64/asm-offsets.c index b0098b3..9fa06c0 100644 --- a/xen/arch/x86/x86_64/asm-offsets.c +++ b/xen/arch/x86/x86_64/asm-offsets.c @@ -86,6 +86,7 @@ void __dummy__(void) OFFSET(VCPU_kernel_sp, struct vcpu, arch.pv_vcpu.kernel_sp); OFFSET(VCPU_kernel_ss, struct vcpu, arch.pv_vcpu.kernel_ss); OFFSET(VCPU_guest_context_flags, struct vcpu, arch.vgc_flags); + OFFSET(VCPU_is_running, struct vcpu, is_running); OFFSET(VCPU_nmi_pending, struct vcpu, nmi_pending); OFFSET(VCPU_mce_pending, struct vcpu, mce_pending); OFFSET(VCPU_nmi_old_mask, struct vcpu, nmi_state.old_mask); diff --git a/xen/common/schedule.c b/xen/common/schedule.c index a8398bd..32c26e8 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -1219,8 +1219,12 @@ static void schedule(void) * switch, else lost_records resume will not work properly. */ - ASSERT(!next->is_running); - next->is_running = 1; + if ( is_idle_vcpu(next) ) + /* Non-idle cpus set is_running right before they start running. */ + { + ASSERT(!next->is_running); + next->is_running = 1; + } pcpu_schedule_unlock_irq(cpu); -- 1.8.1.4
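To make the motivation concrete, here is the kind of check an interrupt
handler (such as the PMU interrupt added later in this series) can make once
is_running is only set right before dispatch. This is an illustrative sketch
only; record_sample() is a placeholder, not a Xen function.

static void pmu_interrupt_sketch(struct cpu_user_regs *regs)
{
    struct vcpu *v = current;

    if ( !v->is_running )
        /*
         * We landed in the window where 'current' already points at the
         * next VCPU but it has not been dispatched yet (and, for HVM on
         * VMX, its VMCS is not loaded). Don't attribute the sample to it.
         */
        return;

    record_sample(v, regs);   /* placeholder for real sample handling */
}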
Boris Ostrovsky
2013-Sep-20 09:42 UTC
[PATCH v2 03/13] x86/PMU: Stop AMD counters when called from vpmu_save_force()
Change amd_vpmu_save() algorithm to accommodate cases when we need to stop counters from vpmu_save_force() (needed by subsequent PMU patches). Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- xen/arch/x86/hvm/svm/vpmu.c | 14 ++++---------- xen/arch/x86/hvm/vpmu.c | 12 ++++++------ 2 files changed, 10 insertions(+), 16 deletions(-) diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c index 4d1fbc8..5d9c3f5 100644 --- a/xen/arch/x86/hvm/svm/vpmu.c +++ b/xen/arch/x86/hvm/svm/vpmu.c @@ -223,22 +223,16 @@ static int amd_vpmu_save(struct vcpu *v) struct amd_vpmu_context *ctx = vpmu->context; unsigned int i; - /* - * Stop the counters. If we came here via vpmu_save_force (i.e. - * when VPMU_CONTEXT_SAVE is set) counters are already stopped. - */ - if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_SAVE) ) + if ( !vpmu_is_set(vpmu, VPMU_FROZEN) ) { - vpmu_set(vpmu, VPMU_FROZEN); - for ( i = 0; i < num_counters; i++ ) wrmsrl(ctrls[i], 0); - return 0; + vpmu_set(vpmu, VPMU_FROZEN); } - if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) ) - return 0; + if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_SAVE) ) + return 0; context_save(v); diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c index 21fbaba..a4e3664 100644 --- a/xen/arch/x86/hvm/vpmu.c +++ b/xen/arch/x86/hvm/vpmu.c @@ -127,13 +127,19 @@ static void vpmu_save_force(void *arg) struct vcpu *v = (struct vcpu *)arg; struct vpmu_struct *vpmu = vcpu_vpmu(v); + if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) + return; + if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) ) return; + vpmu_set(vpmu, VPMU_CONTEXT_SAVE); + if ( vpmu->arch_vpmu_ops ) (void)vpmu->arch_vpmu_ops->arch_vpmu_save(v); vpmu_reset(vpmu, VPMU_CONTEXT_SAVE); + vpmu_reset(vpmu, VPMU_CONTEXT_LOADED); per_cpu(last_vcpu, smp_processor_id()) = NULL; } @@ -177,12 +183,8 @@ void vpmu_load(struct vcpu *v) * before saving the context. */ if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) ) - { - vpmu_set(vpmu, VPMU_CONTEXT_SAVE); on_selected_cpus(cpumask_of(vpmu->last_pcpu), vpmu_save_force, (void *)v, 1); - vpmu_reset(vpmu, VPMU_CONTEXT_LOADED); - } } /* Prevent forced context save from remote CPU */ @@ -195,9 +197,7 @@ void vpmu_load(struct vcpu *v) vpmu = vcpu_vpmu(prev); /* Someone ran here before us */ - vpmu_set(vpmu, VPMU_CONTEXT_SAVE); vpmu_save_force(prev); - vpmu_reset(vpmu, VPMU_CONTEXT_LOADED); vpmu = vcpu_vpmu(v); } -- 1.8.1.4
Update macros that modify VPMU flags to allow changing multiple bits at once. Make sure that we only touch MSR bitmap on HVM guests (both VMX and SVM). This is needed by subsequent PMU patches. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- xen/arch/x86/hvm/svm/vpmu.c | 14 +++++++++----- xen/arch/x86/hvm/vmx/vpmu_core2.c | 9 +++------ xen/arch/x86/hvm/vpmu.c | 11 +++-------- xen/include/asm-x86/hvm/vpmu.h | 9 +++++---- 4 files changed, 20 insertions(+), 23 deletions(-) diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c index 5d9c3f5..a09930e 100644 --- a/xen/arch/x86/hvm/svm/vpmu.c +++ b/xen/arch/x86/hvm/svm/vpmu.c @@ -236,7 +236,8 @@ static int amd_vpmu_save(struct vcpu *v) context_save(v); - if ( !vpmu_is_set(vpmu, VPMU_RUNNING) && ctx->msr_bitmap_set ) + if ( is_hvm_domain(v->domain) && + !vpmu_is_set(vpmu, VPMU_RUNNING) && ctx->msr_bitmap_set ) amd_vpmu_unset_msr_bitmap(v); return 1; @@ -276,7 +277,7 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) struct vpmu_struct *vpmu = vcpu_vpmu(v); /* For all counters, enable guest only mode for HVM guest */ - if ( (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) && + if ( is_hvm_domain(v->domain) && (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) && !(is_guest_mode(msr_content)) ) { set_guest_mode(msr_content); @@ -292,7 +293,8 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) apic_write(APIC_LVTPC, PMU_APIC_VECTOR); vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR; - if ( !((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set ) + if ( is_hvm_domain(v->domain) && + !((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set ) amd_vpmu_set_msr_bitmap(v); } @@ -303,7 +305,8 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) apic_write(APIC_LVTPC, PMU_APIC_VECTOR | APIC_LVT_MASKED); vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | APIC_LVT_MASKED; vpmu_reset(vpmu, VPMU_RUNNING); - if ( ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set ) + if ( is_hvm_domain(v->domain) && + ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set ) amd_vpmu_unset_msr_bitmap(v); release_pmu_ownship(PMU_OWNER_HVM); } @@ -395,7 +398,8 @@ static void amd_vpmu_destroy(struct vcpu *v) if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) return; - if ( ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set ) + if ( is_hvm_domain(v->domain) && + ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set ) amd_vpmu_unset_msr_bitmap(v); xfree(vpmu->context); diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c index 15b2036..101888d 100644 --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c +++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c @@ -305,10 +305,7 @@ static int core2_vpmu_save(struct vcpu *v) { struct vpmu_struct *vpmu = vcpu_vpmu(v); - if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_SAVE) ) - return 0; - - if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) ) + if ( !vpmu_is_set_all(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED) ) return 0; __core2_vpmu_save(v); @@ -420,7 +417,7 @@ static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index) { __core2_vpmu_load(current); vpmu_set(vpmu, VPMU_CONTEXT_LOADED); - if ( cpu_has_vmx_msr_bitmap ) + if ( cpu_has_vmx_msr_bitmap && is_hvm_domain(current->domain) ) core2_vpmu_set_msr_bitmap(current->arch.hvm_vmx.msr_bitmap); } return 1; @@ -786,7 +783,7 @@ static void core2_vpmu_destroy(struct vcpu *v) return; xfree(core2_vpmu_cxt->pmu_enable); xfree(vpmu->context); - if ( 
cpu_has_vmx_msr_bitmap ) + if ( cpu_has_vmx_msr_bitmap && is_hvm_domain(v->domain) ) core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap); release_pmu_ownship(PMU_OWNER_HVM); vpmu_reset(vpmu, VPMU_CONTEXT_ALLOCATED); diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c index a4e3664..d6a9ff6 100644 --- a/xen/arch/x86/hvm/vpmu.c +++ b/xen/arch/x86/hvm/vpmu.c @@ -127,10 +127,7 @@ static void vpmu_save_force(void *arg) struct vcpu *v = (struct vcpu *)arg; struct vpmu_struct *vpmu = vcpu_vpmu(v); - if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) - return; - - if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) ) + if ( !vpmu_is_set_all(vpmu, VPMU_CONTEXT_ALLOCATED | VPMU_CONTEXT_LOADED) ) return; vpmu_set(vpmu, VPMU_CONTEXT_SAVE); @@ -138,8 +135,7 @@ static void vpmu_save_force(void *arg) if ( vpmu->arch_vpmu_ops ) (void)vpmu->arch_vpmu_ops->arch_vpmu_save(v); - vpmu_reset(vpmu, VPMU_CONTEXT_SAVE); - vpmu_reset(vpmu, VPMU_CONTEXT_LOADED); + vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED); per_cpu(last_vcpu, smp_processor_id()) = NULL; } @@ -149,8 +145,7 @@ void vpmu_save(struct vcpu *v) struct vpmu_struct *vpmu = vcpu_vpmu(v); int pcpu = smp_processor_id(); - if ( !(vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) && - vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED)) ) + if ( !vpmu_is_set_all(vpmu, VPMU_CONTEXT_ALLOCATED | VPMU_CONTEXT_LOADED) ) return; vpmu->last_pcpu = pcpu; diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h index 03b9462..674cdad 100644 --- a/xen/include/asm-x86/hvm/vpmu.h +++ b/xen/include/asm-x86/hvm/vpmu.h @@ -81,10 +81,11 @@ struct vpmu_struct { #define VPMU_CPU_HAS_BTS 0x200 /* Has Branch Trace Store */ -#define vpmu_set(_vpmu, _x) ((_vpmu)->flags |= (_x)) -#define vpmu_reset(_vpmu, _x) ((_vpmu)->flags &= ~(_x)) -#define vpmu_is_set(_vpmu, _x) ((_vpmu)->flags & (_x)) -#define vpmu_clear(_vpmu) ((_vpmu)->flags = 0) +#define vpmu_set(_vpmu, _x) ((_vpmu)->flags |= (_x)) +#define vpmu_reset(_vpmu, _x) ((_vpmu)->flags &= ~(_x)) +#define vpmu_is_set(_vpmu, _x) ((_vpmu)->flags & (_x)) +#define vpmu_is_set_all(_vpmu, _x) (((_vpmu)->flags & (_x)) == (_x)) +#define vpmu_clear(_vpmu) ((_vpmu)->flags = 0) int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content); int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content); -- 1.8.1.4
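A short note on the new macro, since the distinction is easy to miss:
vpmu_is_set() is true if any of the requested bits are set, while
vpmu_is_set_all() requires all of them. A minimal sketch (the include paths
are illustrative):

#include <xen/lib.h>        /* ASSERT() */
#include <asm/hvm/vpmu.h>   /* the macros and VPMU_* flags touched above */

static void vpmu_flag_macro_demo(struct vpmu_struct *vpmu)
{
    vpmu_clear(vpmu);
    vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);

    /* Any-bit test: true even though VPMU_CONTEXT_LOADED is still clear. */
    ASSERT(vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED | VPMU_CONTEXT_LOADED));

    /* All-bits test: false until both flags are set. */
    ASSERT(!vpmu_is_set_all(vpmu, VPMU_CONTEXT_ALLOCATED | VPMU_CONTEXT_LOADED));

    vpmu_set(vpmu, VPMU_CONTEXT_LOADED);
    ASSERT(vpmu_is_set_all(vpmu, VPMU_CONTEXT_ALLOCATED | VPMU_CONTEXT_LOADED));
}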
Boris Ostrovsky
2013-Sep-20 09:42 UTC
[PATCH v2 05/13] intel/VPMU: Clean up Intel VPMU code
Remove struct pmumsr and core2_pmu_enable. Replace static MSR structures with fields in core2_vpmu_context. Call core2_get_pmc_count() once, during initialization. Properly clean up when core2_vpmu_alloc_resource() fails and add routines to remove MSRs from VMCS. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> --- xen/arch/x86/hvm/vmx/vmcs.c | 59 +++++++ xen/arch/x86/hvm/vmx/vpmu_core2.c | 289 ++++++++++++++----------------- xen/include/asm-x86/hvm/vmx/vmcs.h | 2 + xen/include/asm-x86/hvm/vmx/vpmu_core2.h | 19 -- 4 files changed, 191 insertions(+), 178 deletions(-) diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c index de9f592..756bc13 100644 --- a/xen/arch/x86/hvm/vmx/vmcs.c +++ b/xen/arch/x86/hvm/vmx/vmcs.c @@ -1136,6 +1136,36 @@ int vmx_add_guest_msr(u32 msr) return 0; } +void vmx_rm_guest_msr(u32 msr) +{ + struct vcpu *curr = current; + unsigned int i, idx, msr_count = curr->arch.hvm_vmx.msr_count; + struct vmx_msr_entry *msr_area = curr->arch.hvm_vmx.msr_area; + + if ( msr_area == NULL ) + return; + + for ( idx = 0; idx < msr_count; idx++ ) + if ( msr_area[idx].index == msr ) + break; + + if ( idx == msr_count ) + return; + + for ( i = idx; i < msr_count - 1; i++ ) + { + msr_area[i].index = msr_area[i + 1].index; + rdmsrl(msr_area[i].index, msr_area[i].data); + } + msr_area[msr_count - 1].index = 0; + + curr->arch.hvm_vmx.msr_count = --msr_count; + __vmwrite(VM_EXIT_MSR_STORE_COUNT, msr_count); + __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, msr_count); + + return; +} + int vmx_add_host_load_msr(u32 msr) { struct vcpu *curr = current; @@ -1166,6 +1196,35 @@ int vmx_add_host_load_msr(u32 msr) return 0; } +void vmx_rm_host_load_msr(u32 msr) +{ + struct vcpu *curr = current; + unsigned int i, idx, msr_count = curr->arch.hvm_vmx.host_msr_count; + struct vmx_msr_entry *msr_area = curr->arch.hvm_vmx.host_msr_area; + + if ( msr_area == NULL ) + return; + + for ( idx = 0; idx < msr_count; idx++ ) + if ( msr_area[idx].index == msr ) + break; + + if ( idx == msr_count ) + return; + + for ( i = idx; i < msr_count - 1; i++ ) + { + msr_area[i].index = msr_area[i + 1].index; + rdmsrl(msr_area[i].index, msr_area[i].data); + } + msr_area[msr_count - 1].index = 0; + + curr->arch.hvm_vmx.host_msr_count = --msr_count; + __vmwrite(VM_EXIT_MSR_LOAD_COUNT, msr_count); + + return; +} + void vmx_set_eoi_exit_bitmap(struct vcpu *v, u8 vector) { int index, offset, changed; diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c index 101888d..50f784f 100644 --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c +++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c @@ -65,6 +65,26 @@ #define PMU_FIXED_WIDTH_MASK (((1 << PMU_FIXED_WIDTH_BITS) -1) << PMU_FIXED_WIDTH_SHIFT) /* + * MSR_CORE_PERF_FIXED_CTR_CTRL contains the configuration of all fixed + * counters. 4 bits for every counter. + */ +#define FIXED_CTR_CTRL_BITS 4 +#define FIXED_CTR_CTRL_MASK ((1 << FIXED_CTR_CTRL_BITS) - 1) + +#define VPMU_CORE2_MAX_FIXED_PMCS 4 +struct core2_vpmu_context { + u64 fixed_ctrl; + u64 ds_area; + u64 pebs_enable; + u64 global_ovf_status; + u64 fix_counters[VPMU_CORE2_MAX_FIXED_PMCS]; + struct arch_msr_pair arch_msr_pair[1]; +}; + +static int arch_pmc_cnt; /* Number of general-purpose performance counters */ +static int fixed_pmc_cnt; /* Number of fixed performance counters */ + +/* * QUIRK to workaround an issue on various family 6 cpus. * The issue leads to endless PMC interrupt loops on the processor. 
* If the interrupt handler is running and a pmc reaches the value 0, this @@ -84,11 +104,8 @@ static void check_pmc_quirk(void) is_pmc_quirk = 0; } -static int core2_get_pmc_count(void); static void handle_pmc_quirk(u64 msr_content) { - int num_gen_pmc = core2_get_pmc_count(); - int num_fix_pmc = 3; int i; u64 val; @@ -96,7 +113,7 @@ static void handle_pmc_quirk(u64 msr_content) return; val = msr_content; - for ( i = 0; i < num_gen_pmc; i++ ) + for ( i = 0; i < arch_pmc_cnt; i++ ) { if ( val & 0x1 ) { @@ -108,7 +125,7 @@ static void handle_pmc_quirk(u64 msr_content) val >>= 1; } val = msr_content >> 32; - for ( i = 0; i < num_fix_pmc; i++ ) + for ( i = 0; i < fixed_pmc_cnt; i++ ) { if ( val & 0x1 ) { @@ -121,65 +138,32 @@ static void handle_pmc_quirk(u64 msr_content) } } -static const u32 core2_fix_counters_msr[] = { - MSR_CORE_PERF_FIXED_CTR0, - MSR_CORE_PERF_FIXED_CTR1, - MSR_CORE_PERF_FIXED_CTR2 -}; - /* - * MSR_CORE_PERF_FIXED_CTR_CTRL contains the configuration of all fixed - * counters. 4 bits for every counter. + * Read the number of general counters via CPUID.EAX[0xa].EAX[8..15] */ -#define FIXED_CTR_CTRL_BITS 4 -#define FIXED_CTR_CTRL_MASK ((1 << FIXED_CTR_CTRL_BITS) - 1) - -/* The index into the core2_ctrls_msr[] of this MSR used in core2_vpmu_dump() */ -#define MSR_CORE_PERF_FIXED_CTR_CTRL_IDX 0 - -/* Core 2 Non-architectual Performance Control MSRs. */ -static const u32 core2_ctrls_msr[] = { - MSR_CORE_PERF_FIXED_CTR_CTRL, - MSR_IA32_PEBS_ENABLE, - MSR_IA32_DS_AREA -}; - -struct pmumsr { - unsigned int num; - const u32 *msr; -}; - -static const struct pmumsr core2_fix_counters = { - VPMU_CORE2_NUM_FIXED, - core2_fix_counters_msr -}; +static int core2_get_arch_pmc_count(void) +{ + u32 eax, ebx, ecx, edx; -static const struct pmumsr core2_ctrls = { - VPMU_CORE2_NUM_CTRLS, - core2_ctrls_msr -}; -static int arch_pmc_cnt; + cpuid(0xa, &eax, &ebx, &ecx, &edx); + return ( (eax & PMU_GENERAL_NR_MASK) >> PMU_GENERAL_NR_SHIFT ); +} /* - * Read the number of general counters via CPUID.EAX[0xa].EAX[8..15] + * Read the number of fixed counters via CPUID.EDX[0xa].EDX[0..4] */ -static int core2_get_pmc_count(void) +static int core2_get_fixed_pmc_count(void) { u32 eax, ebx, ecx, edx; - if ( arch_pmc_cnt == 0 ) - { - cpuid(0xa, &eax, &ebx, &ecx, &edx); - arch_pmc_cnt = (eax & PMU_GENERAL_NR_MASK) >> PMU_GENERAL_NR_SHIFT; - } - - return arch_pmc_cnt; + cpuid(0xa, &eax, &ebx, &ecx, &edx); + return ( (eax & PMU_FIXED_NR_MASK) >> PMU_FIXED_NR_SHIFT ); } static u64 core2_calc_intial_glb_ctrl_msr(void) { - int arch_pmc_bits = (1 << core2_get_pmc_count()) - 1; - u64 fix_pmc_bits = (1 << 3) - 1; + int arch_pmc_bits = (1 << arch_pmc_cnt) - 1; + u64 fix_pmc_bits = (1 << fixed_pmc_cnt) - 1; return ((fix_pmc_bits << 32) | arch_pmc_bits); } @@ -196,9 +180,9 @@ static int is_core2_vpmu_msr(u32 msr_index, int *type, int *index) { int i; - for ( i = 0; i < core2_fix_counters.num; i++ ) + for ( i = 0; i < fixed_pmc_cnt; i++ ) { - if ( core2_fix_counters.msr[i] == msr_index ) + if ( msr_index == MSR_CORE_PERF_FIXED_CTR0 + i ) { *type = MSR_TYPE_COUNTER; *index = i; @@ -206,14 +190,12 @@ static int is_core2_vpmu_msr(u32 msr_index, int *type, int *index) } } - for ( i = 0; i < core2_ctrls.num; i++ ) + if ( (msr_index == MSR_CORE_PERF_FIXED_CTR_CTRL ) || + (msr_index == MSR_IA32_DS_AREA) || + (msr_index == MSR_IA32_PEBS_ENABLE) ) { - if ( core2_ctrls.msr[i] == msr_index ) - { - *type = MSR_TYPE_CTRL; - *index = i; - return 1; - } + *type = MSR_TYPE_CTRL; + return 1; } if ( (msr_index == MSR_CORE_PERF_GLOBAL_CTRL) 
|| @@ -225,7 +207,7 @@ static int is_core2_vpmu_msr(u32 msr_index, int *type, int *index) } if ( (msr_index >= MSR_IA32_PERFCTR0) && - (msr_index < (MSR_IA32_PERFCTR0 + core2_get_pmc_count())) ) + (msr_index < (MSR_IA32_PERFCTR0 + arch_pmc_cnt)) ) { *type = MSR_TYPE_ARCH_COUNTER; *index = msr_index - MSR_IA32_PERFCTR0; @@ -233,7 +215,7 @@ static int is_core2_vpmu_msr(u32 msr_index, int *type, int *index) } if ( (msr_index >= MSR_P6_EVNTSEL0) && - (msr_index < (MSR_P6_EVNTSEL0 + core2_get_pmc_count())) ) + (msr_index < (MSR_P6_EVNTSEL0 + arch_pmc_cnt)) ) { *type = MSR_TYPE_ARCH_CTRL; *index = msr_index - MSR_P6_EVNTSEL0; @@ -248,13 +230,13 @@ static void core2_vpmu_set_msr_bitmap(unsigned long *msr_bitmap) int i; /* Allow Read/Write PMU Counters MSR Directly. */ - for ( i = 0; i < core2_fix_counters.num; i++ ) + for ( i = 0; i < fixed_pmc_cnt; i++ ) { - clear_bit(msraddr_to_bitpos(core2_fix_counters.msr[i]), msr_bitmap); - clear_bit(msraddr_to_bitpos(core2_fix_counters.msr[i]), + clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i), msr_bitmap); + clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i), msr_bitmap + 0x800/BYTES_PER_LONG); } - for ( i = 0; i < core2_get_pmc_count(); i++ ) + for ( i = 0; i < arch_pmc_cnt; i++ ) { clear_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i), msr_bitmap); clear_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i), @@ -262,32 +244,37 @@ static void core2_vpmu_set_msr_bitmap(unsigned long *msr_bitmap) } /* Allow Read PMU Non-global Controls Directly. */ - for ( i = 0; i < core2_ctrls.num; i++ ) - clear_bit(msraddr_to_bitpos(core2_ctrls.msr[i]), msr_bitmap); - for ( i = 0; i < core2_get_pmc_count(); i++ ) + for ( i = 0; i < arch_pmc_cnt; i++ ) clear_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL0+i), msr_bitmap); + + clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR_CTRL), msr_bitmap); + clear_bit(msraddr_to_bitpos(MSR_IA32_PEBS_ENABLE), msr_bitmap); + clear_bit(msraddr_to_bitpos(MSR_IA32_DS_AREA), msr_bitmap); } static void core2_vpmu_unset_msr_bitmap(unsigned long *msr_bitmap) { int i; - for ( i = 0; i < core2_fix_counters.num; i++ ) + for ( i = 0; i < fixed_pmc_cnt; i++ ) { - set_bit(msraddr_to_bitpos(core2_fix_counters.msr[i]), msr_bitmap); - set_bit(msraddr_to_bitpos(core2_fix_counters.msr[i]), + set_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i), msr_bitmap); + set_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i), msr_bitmap + 0x800/BYTES_PER_LONG); } - for ( i = 0; i < core2_get_pmc_count(); i++ ) + for ( i = 0; i < arch_pmc_cnt; i++ ) { set_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i), msr_bitmap); set_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i), msr_bitmap + 0x800/BYTES_PER_LONG); } - for ( i = 0; i < core2_ctrls.num; i++ ) - set_bit(msraddr_to_bitpos(core2_ctrls.msr[i]), msr_bitmap); - for ( i = 0; i < core2_get_pmc_count(); i++ ) + + for ( i = 0; i < arch_pmc_cnt; i++ ) set_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL0+i), msr_bitmap); + + set_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR_CTRL), msr_bitmap); + set_bit(msraddr_to_bitpos(MSR_IA32_PEBS_ENABLE), msr_bitmap); + set_bit(msraddr_to_bitpos(MSR_IA32_DS_AREA), msr_bitmap); } static inline void __core2_vpmu_save(struct vcpu *v) @@ -295,10 +282,10 @@ static inline void __core2_vpmu_save(struct vcpu *v) int i; struct core2_vpmu_context *core2_vpmu_cxt = vcpu_vpmu(v)->context; - for ( i = 0; i < core2_fix_counters.num; i++ ) - rdmsrl(core2_fix_counters.msr[i], core2_vpmu_cxt->fix_counters[i]); - for ( i = 0; i < core2_get_pmc_count(); i++ ) - rdmsrl(MSR_IA32_PERFCTR0+i, 
core2_vpmu_cxt->arch_msr_pair[i].counter); + for ( i = 0; i < fixed_pmc_cnt; i++ ) + rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, core2_vpmu_cxt->fix_counters[i]); + for ( i = 0; i < arch_pmc_cnt; i++ ) + rdmsrl(MSR_IA32_PERFCTR0 + i, core2_vpmu_cxt->arch_msr_pair[i].counter); } static int core2_vpmu_save(struct vcpu *v) @@ -322,14 +309,16 @@ static inline void __core2_vpmu_load(struct vcpu *v) int i; struct core2_vpmu_context *core2_vpmu_cxt = vcpu_vpmu(v)->context; - for ( i = 0; i < core2_fix_counters.num; i++ ) - wrmsrl(core2_fix_counters.msr[i], core2_vpmu_cxt->fix_counters[i]); - for ( i = 0; i < core2_get_pmc_count(); i++ ) - wrmsrl(MSR_IA32_PERFCTR0+i, core2_vpmu_cxt->arch_msr_pair[i].counter); + for ( i = 0; i < fixed_pmc_cnt; i++ ) + wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, core2_vpmu_cxt->fix_counters[i]); + for ( i = 0; i < arch_pmc_cnt; i++ ) + wrmsrl(MSR_IA32_PERFCTR0 + i, core2_vpmu_cxt->arch_msr_pair[i].counter); - for ( i = 0; i < core2_ctrls.num; i++ ) - wrmsrl(core2_ctrls.msr[i], core2_vpmu_cxt->ctrls[i]); - for ( i = 0; i < core2_get_pmc_count(); i++ ) + wrmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, core2_vpmu_cxt->fixed_ctrl); + wrmsrl(MSR_IA32_DS_AREA, core2_vpmu_cxt->ds_area); + wrmsrl(MSR_IA32_PEBS_ENABLE, core2_vpmu_cxt->pebs_enable); + + for ( i = 0; i < arch_pmc_cnt; i++ ) wrmsrl(MSR_P6_EVNTSEL0+i, core2_vpmu_cxt->arch_msr_pair[i].control); } @@ -347,56 +336,39 @@ static int core2_vpmu_alloc_resource(struct vcpu *v) { struct vpmu_struct *vpmu = vcpu_vpmu(v); struct core2_vpmu_context *core2_vpmu_cxt; - struct core2_pmu_enable *pmu_enable; if ( !acquire_pmu_ownership(PMU_OWNER_HVM) ) return 0; wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0); if ( vmx_add_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL) ) - return 0; + goto out_err; if ( vmx_add_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL) ) - return 0; + goto out_err; vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, core2_calc_intial_glb_ctrl_msr()); - pmu_enable = xzalloc_bytes(sizeof(struct core2_pmu_enable) + - core2_get_pmc_count() - 1); - if ( !pmu_enable ) - goto out1; - core2_vpmu_cxt = xzalloc_bytes(sizeof(struct core2_vpmu_context) + - (core2_get_pmc_count()-1)*sizeof(struct arch_msr_pair)); + (arch_pmc_cnt-1)*sizeof(struct arch_msr_pair)); if ( !core2_vpmu_cxt ) - goto out2; - core2_vpmu_cxt->pmu_enable = pmu_enable; + goto out_err; + vpmu->context = (void *)core2_vpmu_cxt; + vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED); + return 1; - out2: - xfree(pmu_enable); - out1: - gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, PMU feature is " - "unavailable on domain %d vcpu %d.\n", - v->vcpu_id, v->domain->domain_id); - return 0; -} -static void core2_vpmu_save_msr_context(struct vcpu *v, int type, - int index, u64 msr_data) -{ - struct core2_vpmu_context *core2_vpmu_cxt = vcpu_vpmu(v)->context; +out_err: + vmx_rm_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL); + vmx_rm_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL); + release_pmu_ownship(PMU_OWNER_HVM); - switch ( type ) - { - case MSR_TYPE_CTRL: - core2_vpmu_cxt->ctrls[index] = msr_data; - break; - case MSR_TYPE_ARCH_CTRL: - core2_vpmu_cxt->arch_msr_pair[index].control = msr_data; - break; - } + printk("Failed to allocate VPMU resources for domain %u vcpu %u\n", + v->vcpu_id, v->domain->domain_id); + + return 0; } static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index) @@ -407,10 +379,8 @@ static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index) return 0; if ( unlikely(!vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED)) && - (vpmu->context != NULL || - !core2_vpmu_alloc_resource(current)) ) 
+ !core2_vpmu_alloc_resource(current) ) return 0; - vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED); /* Do the lazy load staff. */ if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) ) @@ -426,7 +396,7 @@ static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index) static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) { u64 global_ctrl, non_global_ctrl; - char pmu_enable = 0; + unsigned pmu_enable = 0; int i, tmp; int type = -1, index = -1; struct vcpu *v = current; @@ -471,6 +441,7 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) if ( msr_content & 1 ) gdprintk(XENLOG_WARNING, "Guest is trying to enable PEBS, " "which is not supported.\n"); + core2_vpmu_cxt->pebs_enable = msr_content; return 1; case MSR_IA32_DS_AREA: if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_DS) ) @@ -483,27 +454,25 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) hvm_inject_hw_exception(TRAP_gp_fault, 0); return 1; } - core2_vpmu_cxt->pmu_enable->ds_area_enable = msr_content ? 1 : 0; + core2_vpmu_cxt->ds_area = msr_content; break; } gdprintk(XENLOG_WARNING, "Guest setting of DTS is ignored.\n"); return 1; case MSR_CORE_PERF_GLOBAL_CTRL: global_ctrl = msr_content; - for ( i = 0; i < core2_get_pmc_count(); i++ ) + for ( i = 0; i < arch_pmc_cnt; i++ ) { rdmsrl(MSR_P6_EVNTSEL0+i, non_global_ctrl); - core2_vpmu_cxt->pmu_enable->arch_pmc_enable[i] - global_ctrl & (non_global_ctrl >> 22) & 1; + pmu_enable += global_ctrl & (non_global_ctrl >> 22) & 1; global_ctrl >>= 1; } rdmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, non_global_ctrl); global_ctrl = msr_content >> 32; - for ( i = 0; i < core2_fix_counters.num; i++ ) + for ( i = 0; i < fixed_pmc_cnt; i++ ) { - core2_vpmu_cxt->pmu_enable->fixed_ctr_enable[i] - (global_ctrl & 1) & ((non_global_ctrl & 0x3)? 1: 0); + pmu_enable += (global_ctrl & 1) & ((non_global_ctrl & 0x3)? 1: 0); non_global_ctrl >>= FIXED_CTR_CTRL_BITS; global_ctrl >>= 1; } @@ -512,27 +481,27 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) non_global_ctrl = msr_content; vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl); global_ctrl >>= 32; - for ( i = 0; i < core2_fix_counters.num; i++ ) + for ( i = 0; i < fixed_pmc_cnt; i++ ) { - core2_vpmu_cxt->pmu_enable->fixed_ctr_enable[i] - (global_ctrl & 1) & ((non_global_ctrl & 0x3)? 1: 0); + pmu_enable += (global_ctrl & 1) & ((non_global_ctrl & 0x3)? 
1: 0); non_global_ctrl >>= 4; global_ctrl >>= 1; } + core2_vpmu_cxt->fixed_ctrl = msr_content; break; default: tmp = msr - MSR_P6_EVNTSEL0; - vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl); - if ( tmp >= 0 && tmp < core2_get_pmc_count() ) - core2_vpmu_cxt->pmu_enable->arch_pmc_enable[tmp] - (global_ctrl >> tmp) & (msr_content >> 22) & 1; + if ( tmp >= 0 && tmp < arch_pmc_cnt ) + { + vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl); + core2_vpmu_cxt->arch_msr_pair[tmp].control = msr_content; + for ( i = 0; i < arch_pmc_cnt && !pmu_enable; i++ ) + pmu_enable += (global_ctrl >> i) & + (core2_vpmu_cxt->arch_msr_pair[i].control >> 22) & 1; + } } - for ( i = 0; i < core2_fix_counters.num; i++ ) - pmu_enable |= core2_vpmu_cxt->pmu_enable->fixed_ctr_enable[i]; - for ( i = 0; i < core2_get_pmc_count(); i++ ) - pmu_enable |= core2_vpmu_cxt->pmu_enable->arch_pmc_enable[i]; - pmu_enable |= core2_vpmu_cxt->pmu_enable->ds_area_enable; + pmu_enable += (core2_vpmu_cxt->ds_area != 0); if ( pmu_enable ) vpmu_set(vpmu, VPMU_RUNNING); else @@ -551,7 +520,6 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | APIC_LVT_MASKED; } - core2_vpmu_save_msr_context(v, type, index, msr_content); if ( type != MSR_TYPE_GLOBAL ) { u64 mask; @@ -567,7 +535,7 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) if ( msr == MSR_IA32_DS_AREA ) break; /* 4 bits per counter, currently 3 fixed counters implemented. */ - mask = ~((1ull << (VPMU_CORE2_NUM_FIXED * FIXED_CTR_CTRL_BITS)) - 1); + mask = ~((1ull << (fixed_pmc_cnt * FIXED_CTR_CTRL_BITS)) - 1); if (msr_content & mask) inject_gp = 1; break; @@ -652,7 +620,7 @@ static void core2_vpmu_do_cpuid(unsigned int input, static void core2_vpmu_dump(struct vcpu *v) { struct vpmu_struct *vpmu = vcpu_vpmu(v); - int i, num; + int i; struct core2_vpmu_context *core2_vpmu_cxt = NULL; u64 val; @@ -670,26 +638,24 @@ static void core2_vpmu_dump(struct vcpu *v) printk(" vPMU running\n"); core2_vpmu_cxt = vpmu->context; - num = core2_get_pmc_count(); + /* Print the contents of the counter and its configuration msr. */ - for ( i = 0; i < num; i++ ) + for ( i = 0; i < arch_pmc_cnt; i++ ) { struct arch_msr_pair* msr_pair = core2_vpmu_cxt->arch_msr_pair; - if ( core2_vpmu_cxt->pmu_enable->arch_pmc_enable[i] ) - printk(" general_%d: 0x%016lx ctrl: 0x%016lx\n", - i, msr_pair[i].counter, msr_pair[i].control); + printk(" general_%d: 0x%016lx ctrl: 0x%016lx\n", + i, msr_pair[i].counter, msr_pair[i].control); } /* * The configuration of the fixed counter is 4 bits each in the * MSR_CORE_PERF_FIXED_CTR_CTRL. 
*/ - val = core2_vpmu_cxt->ctrls[MSR_CORE_PERF_FIXED_CTR_CTRL_IDX]; - for ( i = 0; i < core2_fix_counters.num; i++ ) + val = core2_vpmu_cxt->fixed_ctrl; + for ( i = 0; i < fixed_pmc_cnt; i++ ) { - if ( core2_vpmu_cxt->pmu_enable->fixed_ctr_enable[i] ) - printk(" fixed_%d: 0x%016lx ctrl: 0x%lx\n", - i, core2_vpmu_cxt->fix_counters[i], - val & FIXED_CTR_CTRL_MASK); + printk(" fixed_%d: 0x%016lx ctrl: 0x%lx\n", + i, core2_vpmu_cxt->fix_counters[i], + val & FIXED_CTR_CTRL_MASK); val >>= FIXED_CTR_CTRL_BITS; } } @@ -707,7 +673,7 @@ static int core2_vpmu_do_interrupt(struct cpu_user_regs *regs) if ( is_pmc_quirk ) handle_pmc_quirk(msr_content); core2_vpmu_cxt->global_ovf_status |= msr_content; - msr_content = 0xC000000700000000 | ((1 << core2_get_pmc_count()) - 1); + msr_content = 0xC000000700000000 | ((1 << arch_pmc_cnt) - 1); wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content); } else @@ -770,18 +736,23 @@ static int core2_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags) } } func_out: + + arch_pmc_cnt = core2_get_arch_pmc_count(); + fixed_pmc_cnt = core2_get_fixed_pmc_count(); + if ( fixed_pmc_cnt > VPMU_CORE2_MAX_FIXED_PMCS ) + fixed_pmc_cnt = VPMU_CORE2_MAX_FIXED_PMCS; check_pmc_quirk(); + return 0; } static void core2_vpmu_destroy(struct vcpu *v) { struct vpmu_struct *vpmu = vcpu_vpmu(v); - struct core2_vpmu_context *core2_vpmu_cxt = vpmu->context; if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) return; - xfree(core2_vpmu_cxt->pmu_enable); + xfree(vpmu->context); if ( cpu_has_vmx_msr_bitmap && is_hvm_domain(v->domain) ) core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap); diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h index f30e5ac..5971613 100644 --- a/xen/include/asm-x86/hvm/vmx/vmcs.h +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h @@ -470,7 +470,9 @@ void vmx_enable_intercept_for_msr(struct vcpu *v, u32 msr, int type); int vmx_read_guest_msr(u32 msr, u64 *val); int vmx_write_guest_msr(u32 msr, u64 val); int vmx_add_guest_msr(u32 msr); +void vmx_rm_guest_msr(u32 msr); int vmx_add_host_load_msr(u32 msr); +void vmx_rm_host_load_msr(u32 msr); void vmx_vmcs_switch(struct vmcs_struct *from, struct vmcs_struct *to); void vmx_set_eoi_exit_bitmap(struct vcpu *v, u8 vector); void vmx_clear_eoi_exit_bitmap(struct vcpu *v, u8 vector); diff --git a/xen/include/asm-x86/hvm/vmx/vpmu_core2.h b/xen/include/asm-x86/hvm/vmx/vpmu_core2.h index 60b05fd..410372d 100644 --- a/xen/include/asm-x86/hvm/vmx/vpmu_core2.h +++ b/xen/include/asm-x86/hvm/vmx/vpmu_core2.h @@ -23,29 +23,10 @@ #ifndef __ASM_X86_HVM_VPMU_CORE_H_ #define __ASM_X86_HVM_VPMU_CORE_H_ -/* Currently only 3 fixed counters are supported. */ -#define VPMU_CORE2_NUM_FIXED 3 -/* Currently only 3 Non-architectual Performance Control MSRs */ -#define VPMU_CORE2_NUM_CTRLS 3 - struct arch_msr_pair { u64 counter; u64 control; }; -struct core2_pmu_enable { - char ds_area_enable; - char fixed_ctr_enable[VPMU_CORE2_NUM_FIXED]; - char arch_pmc_enable[1]; -}; - -struct core2_vpmu_context { - struct core2_pmu_enable *pmu_enable; - u64 fix_counters[VPMU_CORE2_NUM_FIXED]; - u64 ctrls[VPMU_CORE2_NUM_CTRLS]; - u64 global_ovf_status; - struct arch_msr_pair arch_msr_pair[1]; -}; - #endif /* __ASM_X86_HVM_VPMU_CORE_H_ */ -- 1.8.1.4
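For reference, the counter enumeration the new helpers depend on comes from
CPUID leaf 0xa (Intel SDM, "Architectural Performance Monitoring"): EAX[15:8]
holds the number of general-purpose counters and EDX[4:0] the number of
fixed-function counters. A standalone userspace sketch of the same decode,
using GCC's __get_cpuid() purely for illustration:

#include <stdio.h>
#include <cpuid.h>   /* GCC builtin wrapper; illustration only */

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    if ( !__get_cpuid(0xa, &eax, &ebx, &ecx, &edx) )
        return 1;

    /* EAX[7:0]: architectural PMU version, EAX[15:8]: number of
     * general-purpose counters, EAX[23:16]: their bit width. */
    printf("version: %u, general counters: %u (width %u bits)\n",
           eax & 0xff, (eax >> 8) & 0xff, (eax >> 16) & 0xff);

    /* EDX[4:0]: number of fixed-function counters. */
    printf("fixed counters: %u\n", edx & 0x1f);

    return 0;
}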
Add xenpmu.h header file, move various macros and structures that will be shared between hypervisor and PV guests to it. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> --- xen/arch/x86/hvm/svm/vpmu.c | 15 +++----- xen/arch/x86/hvm/vmx/vpmu_core2.c | 43 ++++++++++++---------- xen/arch/x86/hvm/vpmu.c | 1 + xen/arch/x86/oprofile/op_model_ppro.c | 6 +++- xen/include/asm-x86/hvm/vmx/vpmu_core2.h | 32 ----------------- xen/include/asm-x86/hvm/vpmu.h | 10 ++---- xen/include/public/arch-x86/xenpmu-x86.h | 62 ++++++++++++++++++++++++++++++++ xen/include/public/xenpmu.h | 38 ++++++++++++++++++++ 8 files changed, 136 insertions(+), 71 deletions(-) delete mode 100644 xen/include/asm-x86/hvm/vmx/vpmu_core2.h create mode 100644 xen/include/public/arch-x86/xenpmu-x86.h create mode 100644 xen/include/public/xenpmu.h diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c index a09930e..25532d0 100644 --- a/xen/arch/x86/hvm/svm/vpmu.c +++ b/xen/arch/x86/hvm/svm/vpmu.c @@ -30,10 +30,7 @@ #include <asm/apic.h> #include <asm/hvm/vlapic.h> #include <asm/hvm/vpmu.h> - -#define F10H_NUM_COUNTERS 4 -#define F15H_NUM_COUNTERS 6 -#define MAX_NUM_COUNTERS F15H_NUM_COUNTERS +#include <public/xenpmu.h> #define MSR_F10H_EVNTSEL_GO_SHIFT 40 #define MSR_F10H_EVNTSEL_EN_SHIFT 22 @@ -49,6 +46,9 @@ static const u32 __read_mostly *counters; static const u32 __read_mostly *ctrls; static bool_t __read_mostly k7_counters_mirrored; +#define F10H_NUM_COUNTERS 4 +#define F15H_NUM_COUNTERS 6 + /* PMU Counter MSRs. */ static const u32 AMD_F10H_COUNTERS[] = { MSR_K7_PERFCTR0, @@ -83,13 +83,6 @@ static const u32 AMD_F15H_CTRLS[] = { MSR_AMD_FAM15H_EVNTSEL5 }; -/* storage for context switching */ -struct amd_vpmu_context { - u64 counters[MAX_NUM_COUNTERS]; - u64 ctrls[MAX_NUM_COUNTERS]; - bool_t msr_bitmap_set; -}; - static inline int get_pmu_reg_type(u32 addr) { if ( (addr >= MSR_K7_EVNTSEL0) && (addr <= MSR_K7_EVNTSEL3) ) diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c index 50f784f..7d1da3f 100644 --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c +++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c @@ -35,8 +35,8 @@ #include <asm/hvm/vmx/vmcs.h> #include <public/sched.h> #include <public/hvm/save.h> +#include <public/xenpmu.h> #include <asm/hvm/vpmu.h> -#include <asm/hvm/vmx/vpmu_core2.h> /* * See Intel SDM Vol 2a Instruction Set Reference chapter 3 for CPUID @@ -64,6 +64,10 @@ #define PMU_FIXED_WIDTH_BITS 8 /* 8 bits 5..12 */ #define PMU_FIXED_WIDTH_MASK (((1 << PMU_FIXED_WIDTH_BITS) -1) << PMU_FIXED_WIDTH_SHIFT) +/* Intel-specific VPMU features */ +#define VPMU_CPU_HAS_DS 0x100 /* Has Debug Store */ +#define VPMU_CPU_HAS_BTS 0x200 /* Has Branch Trace Store */ + /* * MSR_CORE_PERF_FIXED_CTR_CTRL contains the configuration of all fixed * counters. 4 bits for every counter. 
@@ -71,16 +75,6 @@ #define FIXED_CTR_CTRL_BITS 4 #define FIXED_CTR_CTRL_MASK ((1 << FIXED_CTR_CTRL_BITS) - 1) -#define VPMU_CORE2_MAX_FIXED_PMCS 4 -struct core2_vpmu_context { - u64 fixed_ctrl; - u64 ds_area; - u64 pebs_enable; - u64 global_ovf_status; - u64 fix_counters[VPMU_CORE2_MAX_FIXED_PMCS]; - struct arch_msr_pair arch_msr_pair[1]; -}; - static int arch_pmc_cnt; /* Number of general-purpose performance counters */ static int fixed_pmc_cnt; /* Number of fixed performance counters */ @@ -225,6 +219,7 @@ static int is_core2_vpmu_msr(u32 msr_index, int *type, int *index) return 0; } +#define msraddr_to_bitpos(x) (((x)&0xffff) + ((x)>>31)*0x2000) static void core2_vpmu_set_msr_bitmap(unsigned long *msr_bitmap) { int i; @@ -349,8 +344,7 @@ static int core2_vpmu_alloc_resource(struct vcpu *v) vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, core2_calc_intial_glb_ctrl_msr()); - core2_vpmu_cxt = xzalloc_bytes(sizeof(struct core2_vpmu_context) + - (arch_pmc_cnt-1)*sizeof(struct arch_msr_pair)); + core2_vpmu_cxt = xzalloc_bytes(sizeof(struct core2_vpmu_context)); if ( !core2_vpmu_cxt ) goto out_err; @@ -614,6 +608,18 @@ static void core2_vpmu_do_cpuid(unsigned int input, *ecx |= cpufeat_mask(X86_FEATURE_DSCPL); } } + else if ( input == 0xa ) + { + /* Limit number of counters to max that we support */ + if ( ((*eax & PMU_GENERAL_NR_MASK) >> PMU_GENERAL_NR_SHIFT) > + XENPMU_CORE2_MAX_ARCH_PMCS ) + *eax = (*eax & ~PMU_GENERAL_NR_MASK) | + (XENPMU_CORE2_MAX_ARCH_PMCS << PMU_GENERAL_NR_SHIFT); + if ( ((*edx & PMU_FIXED_NR_MASK) >> PMU_FIXED_NR_SHIFT) > + XENPMU_CORE2_MAX_FIXED_PMCS ) + *eax = (*eax & ~PMU_FIXED_NR_MASK) | + (XENPMU_CORE2_MAX_FIXED_PMCS << PMU_FIXED_NR_SHIFT); + } } /* Dump vpmu info on console, called in the context of keyhandler ''q''. */ @@ -641,11 +647,10 @@ static void core2_vpmu_dump(struct vcpu *v) /* Print the contents of the counter and its configuration msr. */ for ( i = 0; i < arch_pmc_cnt; i++ ) - { - struct arch_msr_pair* msr_pair = core2_vpmu_cxt->arch_msr_pair; printk(" general_%d: 0x%016lx ctrl: 0x%016lx\n", - i, msr_pair[i].counter, msr_pair[i].control); - } + i, core2_vpmu_cxt->arch_msr_pair[i].counter, + core2_vpmu_cxt->arch_msr_pair[i].control); + /* * The configuration of the fixed counter is 4 bits each in the * MSR_CORE_PERF_FIXED_CTR_CTRL. 
@@ -739,8 +744,8 @@ func_out: arch_pmc_cnt = core2_get_arch_pmc_count(); fixed_pmc_cnt = core2_get_fixed_pmc_count(); - if ( fixed_pmc_cnt > VPMU_CORE2_MAX_FIXED_PMCS ) - fixed_pmc_cnt = VPMU_CORE2_MAX_FIXED_PMCS; + if ( fixed_pmc_cnt > XENPMU_CORE2_MAX_FIXED_PMCS ) + fixed_pmc_cnt = XENPMU_CORE2_MAX_FIXED_PMCS; check_pmc_quirk(); return 0; diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c index d6a9ff6..fa8cfd7 100644 --- a/xen/arch/x86/hvm/vpmu.c +++ b/xen/arch/x86/hvm/vpmu.c @@ -31,6 +31,7 @@ #include <asm/hvm/svm/svm.h> #include <asm/hvm/svm/vmcb.h> #include <asm/apic.h> +#include <public/xenpmu.h> /* * "vpmu" : vpmu generally enabled diff --git a/xen/arch/x86/oprofile/op_model_ppro.c b/xen/arch/x86/oprofile/op_model_ppro.c index 3225937..5aae2e7 100644 --- a/xen/arch/x86/oprofile/op_model_ppro.c +++ b/xen/arch/x86/oprofile/op_model_ppro.c @@ -20,11 +20,15 @@ #include <asm/regs.h> #include <asm/current.h> #include <asm/hvm/vpmu.h> -#include <asm/hvm/vmx/vpmu_core2.h> #include "op_x86_model.h" #include "op_counter.h" +struct arch_msr_pair { + u64 counter; + u64 control; +}; + /* * Intel "Architectural Performance Monitoring" CPUID * detection/enumeration details: diff --git a/xen/include/asm-x86/hvm/vmx/vpmu_core2.h b/xen/include/asm-x86/hvm/vmx/vpmu_core2.h deleted file mode 100644 index 410372d..0000000 --- a/xen/include/asm-x86/hvm/vmx/vpmu_core2.h +++ /dev/null @@ -1,32 +0,0 @@ - -/* - * vpmu_core2.h: CORE 2 specific PMU virtualization for HVM domain. - * - * Copyright (c) 2007, Intel Corporation. - * - * This program is free software; you can redistribute it and/or modify it - * under the terms and conditions of the GNU General Public License, - * version 2, as published by the Free Software Foundation. - * - * This program is distributed in the hope it will be useful, but WITHOUT - * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or - * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for - * more details. - * - * You should have received a copy of the GNU General Public License along with - * this program; if not, write to the Free Software Foundation, Inc., 59 Temple - * Place - Suite 330, Boston, MA 02111-1307 USA. - * - * Author: Haitao Shan <haitao.shan@intel.com> - */ - -#ifndef __ASM_X86_HVM_VPMU_CORE_H_ -#define __ASM_X86_HVM_VPMU_CORE_H_ - -struct arch_msr_pair { - u64 counter; - u64 control; -}; - -#endif /* __ASM_X86_HVM_VPMU_CORE_H_ */ - diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h index 674cdad..50cdc4f 100644 --- a/xen/include/asm-x86/hvm/vpmu.h +++ b/xen/include/asm-x86/hvm/vpmu.h @@ -22,6 +22,8 @@ #ifndef __ASM_X86_HVM_VPMU_H_ #define __ASM_X86_HVM_VPMU_H_ +#include <public/xenpmu.h> + /* * Flag bits given as a string on the hypervisor boot parameter ''vpmu''. * See arch/x86/hvm/vpmu.c. @@ -29,12 +31,9 @@ #define VPMU_BOOT_ENABLED 0x1 /* vpmu generally enabled. */ #define VPMU_BOOT_BTS 0x2 /* Intel BTS feature wanted. 
*/ - -#define msraddr_to_bitpos(x) (((x)&0xffff) + ((x)>>31)*0x2000) #define vcpu_vpmu(vcpu) (&((vcpu)->arch.hvm_vcpu.vpmu)) #define vpmu_vcpu(vpmu) (container_of((vpmu), struct vcpu, \ arch.hvm_vcpu.vpmu)) -#define vpmu_domain(vpmu) (vpmu_vcpu(vpmu)->domain) #define MSR_TYPE_COUNTER 0 #define MSR_TYPE_CTRL 1 @@ -76,11 +75,6 @@ struct vpmu_struct { #define VPMU_FROZEN 0x10 /* Stop counters while VCPU is not running */ #define VPMU_PASSIVE_DOMAIN_ALLOCATED 0x20 -/* VPMU features */ -#define VPMU_CPU_HAS_DS 0x100 /* Has Debug Store */ -#define VPMU_CPU_HAS_BTS 0x200 /* Has Branch Trace Store */ - - #define vpmu_set(_vpmu, _x) ((_vpmu)->flags |= (_x)) #define vpmu_reset(_vpmu, _x) ((_vpmu)->flags &= ~(_x)) #define vpmu_is_set(_vpmu, _x) ((_vpmu)->flags & (_x)) diff --git a/xen/include/public/arch-x86/xenpmu-x86.h b/xen/include/public/arch-x86/xenpmu-x86.h new file mode 100644 index 0000000..04e02b3 --- /dev/null +++ b/xen/include/public/arch-x86/xenpmu-x86.h @@ -0,0 +1,62 @@ +#ifndef __XEN_PUBLIC_ARCH_X86_PMU_H__ +#define __XEN_PUBLIC_ARCH_X86_PMU_H__ + +/* x86-specific PMU definitions */ + + +/* AMD PMU registers and structures */ +#define XENPMU_AMD_MAX_COUNTERS 16 /* To accommodate more counters in */ + /* the future (e.g. NB counters) */ +struct amd_vpmu_context { + uint64_t counters[XENPMU_AMD_MAX_COUNTERS]; + uint64_t ctrls[XENPMU_AMD_MAX_COUNTERS]; + uint8_t msr_bitmap_set; /* Used by HVM only */ +}; + +/* Intel PMU registers and structures */ +#define XENPMU_CORE2_MAX_ARCH_PMCS 16 +#define XENPMU_CORE2_MAX_FIXED_PMCS 4 +struct core2_vpmu_context { + uint64_t global_ctrl; + uint64_t global_ovf_ctrl; + uint64_t global_status; + uint64_t global_ovf_status; + uint64_t fixed_ctrl; + uint64_t ds_area; + uint64_t pebs_enable; + uint64_t debugctl; + uint64_t fix_counters[XENPMU_CORE2_MAX_FIXED_PMCS]; + struct { + uint64_t counter; + uint64_t control; + } arch_msr_pair[XENPMU_CORE2_MAX_ARCH_PMCS]; +}; + +#define MAX(x, y) ((x) > (y) ? (x) : (y)) +#define XENPMU_MAX_CTXT_SZ MAX(sizeof(struct amd_vpmu_context),\ + sizeof(struct core2_vpmu_context)) +#define XENPMU_CTXT_PAD_SZ (((XENPMU_MAX_CTXT_SZ + 64) & ~63) + 128) +struct arch_xenpmu { + union { + struct cpu_user_regs regs; + uint8_t pad2[256]; + }; + union { + struct amd_vpmu_context amd; + struct core2_vpmu_context intel; + uint8_t pad1[XENPMU_CTXT_PAD_SZ]; + }; +}; +typedef struct arch_xenpmu arch_xenpmu_t; + +#endif /* __XEN_PUBLIC_ARCH_X86_PMU_H__ */ +/* + * Local variables: + * mode: C + * c-file-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ + diff --git a/xen/include/public/xenpmu.h b/xen/include/public/xenpmu.h new file mode 100644 index 0000000..fbacd7e --- /dev/null +++ b/xen/include/public/xenpmu.h @@ -0,0 +1,38 @@ +#ifndef __XEN_PUBLIC_XENPMU_H__ +#define __XEN_PUBLIC_XENPMU_H__ + +#include "xen.h" +#if defined(__i386__) || defined(__x86_64__) +#include "arch-x86/xenpmu-x86.h" +#elif defined (__arm__) || defined (__aarch64__) +#include "arch-arm.h" +#else +#error "Unsupported architecture" +#endif + +#define XENPMU_VER_MAJ 0 +#define XENPMU_VER_MIN 0 + + +/* Shared between hypervisor and PV domain */ +struct xenpmu_data { + uint32_t domain_id; + uint32_t vcpu_id; + uint32_t pcpu_id; + uint32_t pmu_flags; + + arch_xenpmu_t pmu; +}; +typedef struct xenpmu_data xenpmu_data_t; + +#endif /* __XEN_PUBLIC_XENPMU_H__ */ + +/* + * Local variables: + * mode: C + * c-file-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ -- 1.8.1.4
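To make the intended consumer of xenpmu_data_t a bit more concrete: a PV
guest that has mapped its per-VCPU shared page (set up by later patches in
this series and the matching Linux patches) would read the interrupted
context roughly as sketched below. get_xenpmu_data() and record_pv_sample()
are placeholders, and the include path is the guest-side one, not something
defined by this patch.

#include <stdint.h>
#include <xen/interface/xenpmu.h>   /* guest-side copy of the header above */

/* Placeholders: how the guest locates its mapped page is defined later. */
extern xenpmu_data_t *get_xenpmu_data(void);
extern void record_pv_sample(uint32_t dom, uint32_t vcpu, uint32_t pcpu,
                             uint64_t ip);

static void pv_pmu_sample(void)
{
    const xenpmu_data_t *pd = get_xenpmu_data();

    /* Which domain/VCPU was interrupted, and on which physical CPU
     * (the PCPU ID is the field newly added in v2 of the series),
     * plus the interrupted register state captured by the hypervisor. */
    record_pv_sample(pd->domain_id, pd->vcpu_id, pd->pcpu_id,
                     pd->pmu.regs.rip);
}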
Boris Ostrovsky
2013-Sep-20 09:42 UTC
[PATCH v2 07/13] x86/PMU: Make vpmu not HVM-specific
vpmu structure will be used for both HVM and PV guests. Move it from hvm_vcpu to arch_vcpu. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> --- xen/include/asm-x86/domain.h | 2 ++ xen/include/asm-x86/hvm/vcpu.h | 3 --- xen/include/asm-x86/hvm/vpmu.h | 4 ++-- 3 files changed, 4 insertions(+), 5 deletions(-) diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h index d79464d..4f2247e 100644 --- a/xen/include/asm-x86/domain.h +++ b/xen/include/asm-x86/domain.h @@ -397,6 +397,8 @@ struct arch_vcpu void (*ctxt_switch_from) (struct vcpu *); void (*ctxt_switch_to) (struct vcpu *); + struct vpmu_struct vpmu; + /* Virtual Machine Extensions */ union { struct pv_vcpu pv_vcpu; diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h index e8b8cd7..207f65d 100644 --- a/xen/include/asm-x86/hvm/vcpu.h +++ b/xen/include/asm-x86/hvm/vcpu.h @@ -139,9 +139,6 @@ struct hvm_vcpu { u32 msr_tsc_aux; u64 msr_tsc_adjust; - /* VPMU */ - struct vpmu_struct vpmu; - union { struct arch_vmx_struct vmx; struct arch_svm_struct svm; diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h index 50cdc4f..051f4f0 100644 --- a/xen/include/asm-x86/hvm/vpmu.h +++ b/xen/include/asm-x86/hvm/vpmu.h @@ -31,9 +31,9 @@ #define VPMU_BOOT_ENABLED 0x1 /* vpmu generally enabled. */ #define VPMU_BOOT_BTS 0x2 /* Intel BTS feature wanted. */ -#define vcpu_vpmu(vcpu) (&((vcpu)->arch.hvm_vcpu.vpmu)) +#define vcpu_vpmu(vcpu) (&((vcpu)->arch.vpmu)) #define vpmu_vcpu(vpmu) (container_of((vpmu), struct vcpu, \ - arch.hvm_vcpu.vpmu)) + arch.vpmu)) #define MSR_TYPE_COUNTER 0 #define MSR_TYPE_CTRL 1 -- 1.8.1.4
Boris Ostrovsky
2013-Sep-20 09:42 UTC
[PATCH v2 08/13] x86/PMU: Interface for setting PMU mode and flags
Add runtime interface for setting PMU mode and flags. Three main modes are provided: * PMU off * PMU on: Guests can access PMU MSRs and receive PMU interrupts. dom0 profiles itself and the hypervisor. * dom0-only PMU: dom0 collects samples for both itself and guests. For feature flagso only Intel''s BTS is currently supported. Mode and flags are set via new PMU hypercall. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> --- xen/arch/x86/hvm/svm/vpmu.c | 2 +- xen/arch/x86/hvm/vmx/vpmu_core2.c | 4 +- xen/arch/x86/hvm/vpmu.c | 79 ++++++++++++++++++++++++++++++++++---- xen/arch/x86/x86_64/compat/entry.S | 4 ++ xen/arch/x86/x86_64/entry.S | 4 ++ xen/include/asm-x86/hvm/vpmu.h | 9 +---- xen/include/public/xen.h | 1 + xen/include/public/xenpmu.h | 53 +++++++++++++++++++++++++ xen/include/xen/hypercall.h | 4 ++ 9 files changed, 142 insertions(+), 18 deletions(-) diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c index 25532d0..99d0137 100644 --- a/xen/arch/x86/hvm/svm/vpmu.c +++ b/xen/arch/x86/hvm/svm/vpmu.c @@ -461,7 +461,7 @@ int svm_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags) int ret = 0; /* vpmu enabled? */ - if ( !vpmu_flags ) + if ( vpmu_flags == XENPMU_MODE_OFF ) return 0; switch ( family ) diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c index 7d1da3f..7dae451 100644 --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c +++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c @@ -702,7 +702,7 @@ static int core2_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags) u64 msr_content; struct cpuinfo_x86 *c = ¤t_cpu_data; - if ( !(vpmu_flags & VPMU_BOOT_BTS) ) + if ( !(vpmu_flags & XENPMU_FLAGS_INTEL_BTS) ) goto func_out; /* Check the ''Debug Store'' feature in the CPUID.EAX[1]:EDX[21] */ if ( cpu_has(c, X86_FEATURE_DS) ) @@ -823,7 +823,7 @@ int vmx_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags) int ret = 0; vpmu->arch_vpmu_ops = &core2_no_vpmu_ops; - if ( !vpmu_flags ) + if ( vpmu_flags == XENPMU_MODE_OFF ) return 0; if ( family == 6 ) diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c index fa8cfd7..256eb13 100644 --- a/xen/arch/x86/hvm/vpmu.c +++ b/xen/arch/x86/hvm/vpmu.c @@ -21,6 +21,7 @@ #include <xen/config.h> #include <xen/sched.h> #include <xen/xenoprof.h> +#include <xen/guest_access.h> #include <asm/regs.h> #include <asm/types.h> #include <asm/msr.h> @@ -38,7 +39,7 @@ * "vpmu=off" : vpmu generally disabled * "vpmu=bts" : vpmu enabled and Intel BTS feature switched on. 
*/ -static unsigned int __read_mostly opt_vpmu_enabled; +uint32_t __read_mostly vpmu_mode = XENPMU_MODE_OFF; static void parse_vpmu_param(char *s); custom_param("vpmu", parse_vpmu_param); @@ -52,7 +53,7 @@ static void __init parse_vpmu_param(char *s) break; default: if ( !strcmp(s, "bts") ) - opt_vpmu_enabled |= VPMU_BOOT_BTS; + vpmu_mode |= XENPMU_FLAGS_INTEL_BTS; else if ( *s ) { printk("VPMU: unknown flag: %s - vpmu disabled!\n", s); @@ -60,7 +61,7 @@ static void __init parse_vpmu_param(char *s) } /* fall through */ case 1: - opt_vpmu_enabled |= VPMU_BOOT_ENABLED; + vpmu_mode |= XENPMU_MODE_ON; break; } } @@ -226,19 +227,19 @@ void vpmu_initialise(struct vcpu *v) switch ( vendor ) { case X86_VENDOR_AMD: - if ( svm_vpmu_initialise(v, opt_vpmu_enabled) != 0 ) - opt_vpmu_enabled = 0; + if ( svm_vpmu_initialise(v, vpmu_mode) != 0 ) + vpmu_mode = XENPMU_MODE_OFF; break; case X86_VENDOR_INTEL: - if ( vmx_vpmu_initialise(v, opt_vpmu_enabled) != 0 ) - opt_vpmu_enabled = 0; + if ( vmx_vpmu_initialise(v, vpmu_mode) != 0 ) + vpmu_mode = XENPMU_MODE_OFF; break; default: printk("VPMU: Initialization failed. " "Unknown CPU vendor %d\n", vendor); - opt_vpmu_enabled = 0; + vpmu_mode = XENPMU_MODE_OFF; break; } } @@ -260,3 +261,65 @@ void vpmu_dump(struct vcpu *v) vpmu->arch_vpmu_ops->arch_vpmu_dump(v); } +long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg) +{ + int ret = -EINVAL; + xenpmu_params_t pmu_params; + uint32_t mode, flags; + + switch ( op ) + { + case XENPMU_mode_set: + if ( !is_control_domain(current->domain) ) + return -EPERM; + + if ( copy_from_guest(&pmu_params, arg, 1) ) + return -EFAULT; + + mode = (uint32_t)pmu_params.val & XENPMU_MODE_MASK; + if ( (mode & ~(XENPMU_MODE_ON | XENPMU_MODE_PRIV)) || + ((mode & XENPMU_MODE_ON) && (mode & XENPMU_MODE_PRIV)) ) + return -EINVAL; + + vpmu_mode &= ~XENPMU_MODE_MASK; + vpmu_mode |= mode; + + ret = 0; + break; + + case XENPMU_mode_get: + pmu_params.val = vpmu_mode & XENPMU_MODE_MASK; + pmu_params.version.maj = XENPMU_VER_MAJ; + pmu_params.version.min = XENPMU_VER_MIN; + if ( copy_to_guest(arg, &pmu_params, 1) ) + return -EFAULT; + ret = 0; + break; + + case XENPMU_flags_set: + if ( !is_control_domain(current->domain) ) + return -EPERM; + + if ( copy_from_guest(&pmu_params, arg, 1) ) + return -EFAULT; + + flags = (uint64_t)pmu_params.val & XENPMU_FLAGS_MASK; + if ( flags & ~XENPMU_FLAGS_INTEL_BTS ) + return -EINVAL; + + vpmu_mode &= ~XENPMU_FLAGS_MASK; + vpmu_mode |= flags; + + ret = 0; + break; + + case XENPMU_flags_get: + pmu_params.val = vpmu_mode & XENPMU_FLAGS_MASK; + if ( copy_to_guest(arg, &pmu_params, 1) ) + return -EFAULT; + ret = 0; + break; + } + + return ret; +} diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S index c0afe2c..bc03ffe 100644 --- a/xen/arch/x86/x86_64/compat/entry.S +++ b/xen/arch/x86/x86_64/compat/entry.S @@ -413,6 +413,8 @@ ENTRY(compat_hypercall_table) .quad do_domctl .quad compat_kexec_op .quad do_tmem_op + .quad do_ni_hypercall /* reserved for XenClient */ + .quad do_xenpmu_op /* 40 */ .rept __HYPERVISOR_arch_0-((.-compat_hypercall_table)/8) .quad compat_ni_hypercall .endr @@ -461,6 +463,8 @@ ENTRY(compat_hypercall_args_table) .byte 1 /* do_domctl */ .byte 2 /* compat_kexec_op */ .byte 1 /* do_tmem_op */ + .byte 0 /* reserved for XenClient */ + .byte 2 /* do_xenpmu_op */ /* 40 */ .rept __HYPERVISOR_arch_0-(.-compat_hypercall_args_table) .byte 0 /* compat_ni_hypercall */ .endr diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S index 5beeccb..2944427 
100644 --- a/xen/arch/x86/x86_64/entry.S +++ b/xen/arch/x86/x86_64/entry.S @@ -762,6 +762,8 @@ ENTRY(hypercall_table) .quad do_domctl .quad do_kexec_op .quad do_tmem_op + .quad do_ni_hypercall /* reserved for XenClient */ + .quad do_xenpmu_op /* 40 */ .rept __HYPERVISOR_arch_0-((.-hypercall_table)/8) .quad do_ni_hypercall .endr @@ -810,6 +812,8 @@ ENTRY(hypercall_args_table) .byte 1 /* do_domctl */ .byte 2 /* do_kexec */ .byte 1 /* do_tmem_op */ + .byte 0 /* reserved for XenClient */ + .byte 2 /* do_xenpmu_op */ /* 40 */ .rept __HYPERVISOR_arch_0-(.-hypercall_args_table) .byte 0 /* do_ni_hypercall */ .endr diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h index 051f4f0..87cef41 100644 --- a/xen/include/asm-x86/hvm/vpmu.h +++ b/xen/include/asm-x86/hvm/vpmu.h @@ -24,13 +24,6 @@ #include <public/xenpmu.h> -/* - * Flag bits given as a string on the hypervisor boot parameter ''vpmu''. - * See arch/x86/hvm/vpmu.c. - */ -#define VPMU_BOOT_ENABLED 0x1 /* vpmu generally enabled. */ -#define VPMU_BOOT_BTS 0x2 /* Intel BTS feature wanted. */ - #define vcpu_vpmu(vcpu) (&((vcpu)->arch.vpmu)) #define vpmu_vcpu(vpmu) (container_of((vpmu), struct vcpu, \ arch.vpmu)) @@ -95,5 +88,7 @@ void vpmu_dump(struct vcpu *v); extern int acquire_pmu_ownership(int pmu_ownership); extern void release_pmu_ownership(int pmu_ownership); +extern uint32_t vpmu_mode; + #endif /* __ASM_X86_HVM_VPMU_H_*/ diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h index 3cab74f..7f56560 100644 --- a/xen/include/public/xen.h +++ b/xen/include/public/xen.h @@ -101,6 +101,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t); #define __HYPERVISOR_kexec_op 37 #define __HYPERVISOR_tmem_op 38 #define __HYPERVISOR_xc_reserved_op 39 /* reserved for XenClient */ +#define __HYPERVISOR_xenpmu_op 40 /* Architecture-specific hypercall definitions. */ #define __HYPERVISOR_arch_0 48 diff --git a/xen/include/public/xenpmu.h b/xen/include/public/xenpmu.h index fbacd7e..7f5c65c 100644 --- a/xen/include/public/xenpmu.h +++ b/xen/include/public/xenpmu.h @@ -13,6 +13,59 @@ #define XENPMU_VER_MAJ 0 #define XENPMU_VER_MIN 0 +/* + * ` enum neg_errnoval + * ` HYPERVISOR_xenpmu_op(enum xenpmu_op cmd, struct xenpmu_params *args); + * + * @cmd == XENPMU_* (PMU operation) + * @args == struct xenpmu_params + */ +/* ` enum xenpmu_op { */ +#define XENPMU_mode_get 0 /* Also used for getting PMU version */ +#define XENPMU_mode_set 1 +#define XENPMU_flags_get 2 +#define XENPMU_flags_set 3 +/* ` } */ + +/* Parameters structure for HYPERVISOR_xenpmu_op call */ +struct xenpmu_params { + /* IN/OUT parameters */ + union { + struct version { + uint8_t maj; + uint8_t min; + } version; + uint64_t pad; + }; + union { + uint64_t val; + void *valp; + }; + + /* IN parameters */ + uint64_t vcpu; +}; +typedef struct xenpmu_params xenpmu_params_t; + + +/* PMU modes: + * - XENPMU_MODE_OFF: No PMU virtualization + * - XENPMU_MODE_ON: Guests can profile themselves, dom0 profiles + * itself and Xen + * - XENPMU_MODE_PRIV: Only dom0 has access to VPMU and it profiles + * everyone: itself, the hypervisor and the guests. 
+ */ +#define XENPMU_MODE_MASK 0xff +#define XENPMU_MODE_OFF 0 +#define XENPMU_MODE_ON (1<<0) +#define XENPMU_MODE_PRIV (1<<1) + +/* + * PMU flags: + * - XENPMU_FLAGS_INTEL_BTS: Intel BTS support (ignored on AMD) + */ +#define XENPMU_FLAGS_MASK ((uint32_t)(~XENPMU_MODE_MASK)) +#define XENPMU_FLAGS_INTEL_BTS (1<<8) /* Shared between hypervisor and PV domain */ struct xenpmu_data { diff --git a/xen/include/xen/hypercall.h b/xen/include/xen/hypercall.h index a9e5229..ad3d3de 100644 --- a/xen/include/xen/hypercall.h +++ b/xen/include/xen/hypercall.h @@ -14,6 +14,7 @@ #include <public/event_channel.h> #include <public/tmem.h> #include <public/version.h> +#include <public/xenpmu.h> #include <asm/hypercall.h> #include <xsm/xsm.h> @@ -139,6 +140,9 @@ do_tmem_op( extern long do_xenoprof_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg); +extern long +do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg); + #ifdef CONFIG_COMPAT extern int -- 1.8.1.4
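To see the new interface from the caller's side, here is a minimal dom0 sketch that switches the PMU into dom0-only profiling and reads the mode back. Only the XENPMU_* constants and struct xenpmu_params come from the header added above; the HYPERVISOR_xenpmu_op() guest wrapper, the include path and the memset()/printk() environment are assumptions about the guest kernel, not part of this series.

    /* Hypothetical dom0 kernel snippet -- illustrative only.
     * HYPERVISOR_xenpmu_op() is an assumed guest-side hypercall wrapper;
     * the constants and struct xenpmu_params are from public/xenpmu.h above. */
    #include <xen/interface/xenpmu.h>   /* assumed guest copy of public/xenpmu.h */

    static int example_enable_dom0_profiling(void)
    {
        struct xenpmu_params params;
        int err;

        memset(&params, 0, sizeof(params));
        params.val = XENPMU_MODE_PRIV;  /* dom0 profiles itself, Xen and guests */
        err = HYPERVISOR_xenpmu_op(XENPMU_mode_set, &params);
        if ( err )
            return err;

        /* mode_get also reports the interface version */
        memset(&params, 0, sizeof(params));
        err = HYPERVISOR_xenpmu_op(XENPMU_mode_get, &params);
        if ( !err )
            printk("xenpmu v%u.%u, mode 0x%lx\n",
                   params.version.maj, params.version.min,
                   (unsigned long)params.val);
        return err;
    }

The hypercall itself rejects combinations such as XENPMU_MODE_ON | XENPMU_MODE_PRIV and is restricted to the control domain for the *_set operations, so a sketch like this only makes sense from dom0.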
Boris Ostrovsky
2013-Sep-20 09:42 UTC
[PATCH v2 09/13] x86/PMU: Initialize PMU for PV guests
Code for initializing/deinitializing PMU, including setting up interrupt handlers. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> --- xen/arch/x86/hvm/svm/vpmu.c | 34 ++++++------ xen/arch/x86/hvm/vmx/vpmu_core2.c | 51 ++++++++++++------ xen/arch/x86/hvm/vpmu.c | 111 +++++++++++++++++++++++++++++++++++++- xen/common/event_channel.c | 1 + xen/include/asm-x86/hvm/vpmu.h | 1 + xen/include/public/xen.h | 1 + xen/include/public/xenpmu.h | 2 + xen/include/xen/softirq.h | 1 + 8 files changed, 170 insertions(+), 32 deletions(-) diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c index 99d0137..527a1de 100644 --- a/xen/arch/x86/hvm/svm/vpmu.c +++ b/xen/arch/x86/hvm/svm/vpmu.c @@ -370,14 +370,19 @@ static int amd_vpmu_initialise(struct vcpu *v) } } - ctxt = xzalloc(struct amd_vpmu_context); - if ( !ctxt ) + if ( is_hvm_domain(v->domain) ) { - gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, " - " PMU feature is unavailable on domain %d vcpu %d.\n", - v->vcpu_id, v->domain->domain_id); - return -ENOMEM; + ctxt = xzalloc(struct amd_vpmu_context); + if ( !ctxt ) + { + gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, " + " PMU feature is unavailable on domain %d vcpu %d.\n", + v->vcpu_id, v->domain->domain_id); + return -ENOMEM; + } } + else + ctxt = &v->arch.vpmu.xenpmu_data->pmu.amd; vpmu->context = ctxt; vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED); @@ -391,18 +396,17 @@ static void amd_vpmu_destroy(struct vcpu *v) if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) return; - if ( is_hvm_domain(v->domain) && - ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set ) - amd_vpmu_unset_msr_bitmap(v); - - xfree(vpmu->context); - vpmu_reset(vpmu, VPMU_CONTEXT_ALLOCATED); - - if ( vpmu_is_set(vpmu, VPMU_RUNNING) ) + if ( is_hvm_domain(v->domain) ) { - vpmu_reset(vpmu, VPMU_RUNNING); + if ( ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set ) + amd_vpmu_unset_msr_bitmap(v); + + xfree(vpmu->context); release_pmu_ownship(PMU_OWNER_HVM); } + + vpmu->context = NULL; + vpmu_clear(vpmu); } /* VPMU part of the ''q'' keyhandler */ diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c index 7dae451..5726610 100644 --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c +++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c @@ -332,21 +332,29 @@ static int core2_vpmu_alloc_resource(struct vcpu *v) struct vpmu_struct *vpmu = vcpu_vpmu(v); struct core2_vpmu_context *core2_vpmu_cxt; - if ( !acquire_pmu_ownership(PMU_OWNER_HVM) ) - return 0; + if ( is_hvm_domain(v->domain) ) + { + if ( !acquire_pmu_ownership(PMU_OWNER_HVM) ) + goto out_err; - wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0); - if ( vmx_add_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL) ) - goto out_err; + wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0); + if ( vmx_add_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL) ) + goto out_err; - if ( vmx_add_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL) ) - goto out_err; - vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, - core2_calc_intial_glb_ctrl_msr()); + if ( vmx_add_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL) ) + goto out_err; + vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, + core2_calc_intial_glb_ctrl_msr()); - core2_vpmu_cxt = xzalloc_bytes(sizeof(struct core2_vpmu_context)); - if ( !core2_vpmu_cxt ) - goto out_err; + core2_vpmu_cxt = xzalloc_bytes(sizeof(struct core2_vpmu_context)); + if ( !core2_vpmu_cxt ) + goto out_err; + } + else + { + core2_vpmu_cxt = &v->arch.vpmu.xenpmu_data->pmu.intel; + vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED); + } vpmu->context = (void *)core2_vpmu_cxt; @@ -743,11 +751,18 @@ static int 
core2_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags) func_out: arch_pmc_cnt = core2_get_arch_pmc_count(); + if ( arch_pmc_cnt > XENPMU_CORE2_MAX_ARCH_PMCS ) + arch_pmc_cnt = XENPMU_CORE2_MAX_ARCH_PMCS; fixed_pmc_cnt = core2_get_fixed_pmc_count(); if ( fixed_pmc_cnt > XENPMU_CORE2_MAX_FIXED_PMCS ) fixed_pmc_cnt = XENPMU_CORE2_MAX_FIXED_PMCS; check_pmc_quirk(); + /* PV domains can allocate resources immediately */ + if ( !is_hvm_domain(v->domain) ) + if ( !core2_vpmu_alloc_resource(v) ) + return 1; + return 0; } @@ -758,11 +773,15 @@ static void core2_vpmu_destroy(struct vcpu *v) if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) return; - xfree(vpmu->context); - if ( cpu_has_vmx_msr_bitmap && is_hvm_domain(v->domain) ) - core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap); + if ( is_hvm_domain(v->domain) ) + { + xfree(vpmu->context); + if ( cpu_has_vmx_msr_bitmap ) + core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap); + } + release_pmu_ownship(PMU_OWNER_HVM); - vpmu_reset(vpmu, VPMU_CONTEXT_ALLOCATED); + vpmu_clear(vpmu); } struct arch_vpmu_ops core2_vpmu_ops = { diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c index 256eb13..69aaa7b 100644 --- a/xen/arch/x86/hvm/vpmu.c +++ b/xen/arch/x86/hvm/vpmu.c @@ -21,6 +21,9 @@ #include <xen/config.h> #include <xen/sched.h> #include <xen/xenoprof.h> +#include <xen/event.h> +#include <xen/softirq.h> +#include <xen/hypercall.h> #include <xen/guest_access.h> #include <asm/regs.h> #include <asm/types.h> @@ -32,6 +35,7 @@ #include <asm/hvm/svm/svm.h> #include <asm/hvm/svm/vmcb.h> #include <asm/apic.h> +#include <asm/nmi.h> #include <public/xenpmu.h> /* @@ -249,7 +253,13 @@ void vpmu_destroy(struct vcpu *v) struct vpmu_struct *vpmu = vcpu_vpmu(v); if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_destroy ) + { + /* Unload VPMU first. This will stop counters from running */ + on_selected_cpus(cpumask_of(vcpu_vpmu(v)->last_pcpu), + vpmu_save_force, (void *)v, 1); + vpmu->arch_vpmu_ops->arch_vpmu_destroy(v); + } } /* Dump some vpmu informations on console. Used in keyhandler dump_domains(). 
*/ @@ -261,6 +271,93 @@ void vpmu_dump(struct vcpu *v) vpmu->arch_vpmu_ops->arch_vpmu_dump(v); } +int pmu_nmi_interrupt(struct cpu_user_regs *regs, int cpu) +{ + return vpmu_do_interrupt(regs); +} + +/* Process the softirq set by PMU NMI handler */ +void pmu_virq(void) +{ + struct vcpu *v = current; + + if ( (vpmu_mode & XENPMU_MODE_PRIV) || + (v->domain->domain_id >= DOMID_FIRST_RESERVED) ) + { + if ( smp_processor_id() >= dom0->max_vcpus ) + { + printk(KERN_WARNING "PMU softirq on unexpected processor %d\n", + smp_processor_id()); + return; + } + v = dom0->vcpu[smp_processor_id()]; + } + + send_guest_vcpu_virq(v, VIRQ_XENPMU); +} + +static int pvpmu_init(struct domain *d, xenpmu_params_t *params) +{ + struct vcpu *v; + uint64_t mfn = params->val; + static int pvpmu_initted = 0; + + if ( params->vcpu < 0 || params->vcpu >= d->max_vcpus ) + return -EINVAL; + + if ( !pvpmu_initted ) + { + if (reserve_lapic_nmi() == 0) + set_nmi_callback(pmu_nmi_interrupt); + else + { + printk("Failed to reserve PMU NMI\n"); + return -EBUSY; + } + open_softirq(PMU_SOFTIRQ, pmu_virq); + pvpmu_initted = 1; + } + + if ( !mfn_valid(mfn) || + !get_page_and_type(mfn_to_page(mfn), d, PGT_writable_page) ) + return -EINVAL; + + v = d->vcpu[params->vcpu]; + v->arch.vpmu.xenpmu_data = map_domain_page_global(mfn); + memset(v->arch.vpmu.xenpmu_data, 0, PAGE_SIZE); + + vpmu_initialise(v); + + return 0; +} + +static void pvpmu_finish(struct domain *d, xenpmu_params_t *params) +{ + struct vcpu *v; + uint64_t mfn; + + if ( params->vcpu < 0 || params->vcpu >= d->max_vcpus ) + return; + + v = d->vcpu[params->vcpu]; + if (v != current) + vcpu_pause(v); + + if ( v->arch.vpmu.xenpmu_data ) + { + mfn = domain_page_map_to_mfn(v->arch.vpmu.xenpmu_data); + if ( mfn_valid(mfn) ) + { + unmap_domain_page_global(v->arch.vpmu.xenpmu_data); + put_page_and_type(mfn_to_page(mfn)); + } + } + vpmu_destroy(v); + + if (v != current) + vcpu_unpause(v); +} + long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg) { int ret = -EINVAL; @@ -319,7 +416,19 @@ long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg) return -EFAULT; ret = 0; break; - } + + case XENPMU_init: + if ( copy_from_guest(&pmu_params, arg, 1) ) + return -EFAULT; + ret = pvpmu_init(current->domain, &pmu_params); + break; + + case XENPMU_finish: + if ( copy_from_guest(&pmu_params, arg, 1) ) + return -EFAULT; + pvpmu_finish(current->domain, &pmu_params); + break; + } return ret; } diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c index 64c976b..9ee6e5a 100644 --- a/xen/common/event_channel.c +++ b/xen/common/event_channel.c @@ -107,6 +107,7 @@ static int virq_is_global(uint32_t virq) case VIRQ_TIMER: case VIRQ_DEBUG: case VIRQ_XENOPROF: + case VIRQ_XENPMU: rc = 0; break; case VIRQ_ARCH_0 ... VIRQ_ARCH_7: diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h index 87cef41..e046afd 100644 --- a/xen/include/asm-x86/hvm/vpmu.h +++ b/xen/include/asm-x86/hvm/vpmu.h @@ -58,6 +58,7 @@ struct vpmu_struct { u32 hw_lapic_lvtpc; void *context; struct arch_vpmu_ops *arch_vpmu_ops; + xenpmu_data_t *xenpmu_data; }; /* VPMU states */ diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h index 7f56560..91d3db2 100644 --- a/xen/include/public/xen.h +++ b/xen/include/public/xen.h @@ -161,6 +161,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t); #define VIRQ_MEM_EVENT 10 /* G. (DOM0) A memory event has occured */ #define VIRQ_XC_RESERVED 11 /* G. Reserved for XenClient */ #define VIRQ_ENOMEM 12 /* G. 
(DOM0) Low on heap memory */ +#define VIRQ_XENPMU 13 /* V. PMC interrupt */ /* Architecture-specific VIRQ definitions. */ #define VIRQ_ARCH_0 16 diff --git a/xen/include/public/xenpmu.h b/xen/include/public/xenpmu.h index 7f5c65c..ec49097 100644 --- a/xen/include/public/xenpmu.h +++ b/xen/include/public/xenpmu.h @@ -25,6 +25,8 @@ #define XENPMU_mode_set 1 #define XENPMU_flags_get 2 #define XENPMU_flags_set 3 +#define XENPMU_init 4 +#define XENPMU_finish 5 /* ` } */ /* Parameters structure for HYPERVISOR_xenpmu_op call */ diff --git a/xen/include/xen/softirq.h b/xen/include/xen/softirq.h index 0c0d481..5829fa4 100644 --- a/xen/include/xen/softirq.h +++ b/xen/include/xen/softirq.h @@ -8,6 +8,7 @@ enum { NEW_TLBFLUSH_CLOCK_PERIOD_SOFTIRQ, RCU_SOFTIRQ, TASKLET_SOFTIRQ, + PMU_SOFTIRQ, NR_COMMON_SOFTIRQS }; -- 1.8.1.4
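The registration step above (pvpmu_init()) expects the guest to pass the MFN of a per-vCPU page, which Xen then maps globally and treats as struct xenpmu_data. A hedged PV-guest sketch of that handshake follows; __get_free_page(), virt_to_mfn(), free_page() and the HYPERVISOR_xenpmu_op() wrapper are assumed Linux-style helpers, while XENPMU_init and the structures come from public/xenpmu.h in this series.

    /* Hypothetical PV-guest registration of the shared PMU page for one vCPU. */
    static struct xenpmu_data *example_pvpmu_register(unsigned int vcpu)
    {
        struct xenpmu_params params;
        struct xenpmu_data *xenpmu_data;

        xenpmu_data = (struct xenpmu_data *)
                          __get_free_page(GFP_KERNEL | __GFP_ZERO);
        if ( !xenpmu_data )
            return NULL;

        memset(&params, 0, sizeof(params));
        params.val = virt_to_mfn(xenpmu_data);  /* MFN consumed by pvpmu_init() */
        params.vcpu = vcpu;

        if ( HYPERVISOR_xenpmu_op(XENPMU_init, &params) )
        {
            free_page((unsigned long)xenpmu_data);
            return NULL;
        }

        /* From here on Xen writes samples into *xenpmu_data and signals
         * them with VIRQ_XENPMU (see the interrupt patch later in the series). */
        return xenpmu_data;
    }

The matching teardown would pass the same vcpu value to XENPMU_finish before the guest frees the page, mirroring pvpmu_finish() above.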
Boris Ostrovsky
2013-Sep-20 09:42 UTC
[PATCH v2 10/13] x86/PMU: Add support for PMU registers handling on PV guests
Intercept accesses to PMU MSRs and LVTPC APIC vector (only APIC_LVT_MASKED bit is processed) and process them in VPMU module. Dump VPMU state for all domains (HVM and PV) when requested. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> --- xen/arch/x86/domain.c | 3 +- xen/arch/x86/hvm/vmx/vpmu_core2.c | 90 ++++++++++++++++++++++++++++++--------- xen/arch/x86/hvm/vpmu.c | 16 +++++++ xen/arch/x86/traps.c | 39 ++++++++++++++++- xen/include/public/xenpmu.h | 1 + 5 files changed, 125 insertions(+), 24 deletions(-) diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index e119d7b..36f4192 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -1940,8 +1940,7 @@ void arch_dump_vcpu_info(struct vcpu *v) { paging_dump_vcpu_info(v); - if ( is_hvm_vcpu(v) ) - vpmu_dump(v); + vpmu_dump(v); } void domain_cpuid( diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c index 5726610..ebbb516 100644 --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c +++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c @@ -27,6 +27,7 @@ #include <asm/regs.h> #include <asm/types.h> #include <asm/apic.h> +#include <asm/traps.h> #include <asm/msr.h> #include <asm/msr-index.h> #include <asm/hvm/support.h> @@ -281,6 +282,9 @@ static inline void __core2_vpmu_save(struct vcpu *v) rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, core2_vpmu_cxt->fix_counters[i]); for ( i = 0; i < arch_pmc_cnt; i++ ) rdmsrl(MSR_IA32_PERFCTR0 + i, core2_vpmu_cxt->arch_msr_pair[i].counter); + + if ( !is_hvm_domain(v->domain) ) + rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, core2_vpmu_cxt->global_status); } static int core2_vpmu_save(struct vcpu *v) @@ -290,10 +294,14 @@ static int core2_vpmu_save(struct vcpu *v) if ( !vpmu_is_set_all(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED) ) return 0; + if ( !is_hvm_domain(v->domain) ) + wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0); + __core2_vpmu_save(v); /* Unset PMU MSR bitmap to trap lazy load. 
*/ - if ( !vpmu_is_set(vpmu, VPMU_RUNNING) && cpu_has_vmx_msr_bitmap ) + if ( !vpmu_is_set(vpmu, VPMU_RUNNING) && cpu_has_vmx_msr_bitmap + && is_hvm_domain(v->domain) ) core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap); return 1; @@ -315,6 +323,12 @@ static inline void __core2_vpmu_load(struct vcpu *v) for ( i = 0; i < arch_pmc_cnt; i++ ) wrmsrl(MSR_P6_EVNTSEL0+i, core2_vpmu_cxt->arch_msr_pair[i].control); + + if ( !is_hvm_domain(v->domain) ) + { + wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, core2_vpmu_cxt->global_ovf_ctrl); + wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl); + } } static void core2_vpmu_load(struct vcpu *v) @@ -421,7 +435,12 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) ) return 1; gdprintk(XENLOG_WARNING, "Debug Store is not supported on this cpu\n"); - hvm_inject_hw_exception(TRAP_gp_fault, 0); + + if ( is_hvm_domain(v->domain) ) + hvm_inject_hw_exception(TRAP_gp_fault, 0); + else + send_guest_trap(v->domain, v->vcpu_id, TRAP_gp_fault); + return 0; } } @@ -433,11 +452,15 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) { case MSR_CORE_PERF_GLOBAL_OVF_CTRL: core2_vpmu_cxt->global_ovf_status &= ~msr_content; + core2_vpmu_cxt->global_ovf_ctrl = msr_content; return 1; case MSR_CORE_PERF_GLOBAL_STATUS: gdprintk(XENLOG_INFO, "Can not write readonly MSR: " "MSR_PERF_GLOBAL_STATUS(0x38E)!\n"); - hvm_inject_hw_exception(TRAP_gp_fault, 0); + if ( is_hvm_domain(v->domain) ) + hvm_inject_hw_exception(TRAP_gp_fault, 0); + else + send_guest_trap(v->domain, v->vcpu_id, TRAP_gp_fault); return 1; case MSR_IA32_PEBS_ENABLE: if ( msr_content & 1 ) @@ -453,7 +476,10 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) gdprintk(XENLOG_WARNING, "Illegal address for IA32_DS_AREA: %#" PRIx64 "x\n", msr_content); - hvm_inject_hw_exception(TRAP_gp_fault, 0); + if ( is_hvm_domain(v->domain) ) + hvm_inject_hw_exception(TRAP_gp_fault, 0); + else + send_guest_trap(v->domain, v->vcpu_id, TRAP_gp_fault); return 1; } core2_vpmu_cxt->ds_area = msr_content; @@ -478,10 +504,14 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) non_global_ctrl >>= FIXED_CTR_CTRL_BITS; global_ctrl >>= 1; } + core2_vpmu_cxt->global_ctrl = msr_content; break; case MSR_CORE_PERF_FIXED_CTR_CTRL: non_global_ctrl = msr_content; - vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl); + if ( is_hvm_domain(v->domain) ) + vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl); + else + rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl); global_ctrl >>= 32; for ( i = 0; i < fixed_pmc_cnt; i++ ) { @@ -495,7 +525,10 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) tmp = msr - MSR_P6_EVNTSEL0; if ( tmp >= 0 && tmp < arch_pmc_cnt ) { - vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl); + if ( is_hvm_domain(v->domain) ) + vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl); + else + rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl); core2_vpmu_cxt->arch_msr_pair[tmp].control = msr_content; for ( i = 0; i < arch_pmc_cnt && !pmu_enable; i++ ) pmu_enable += (global_ctrl >> i) & @@ -509,17 +542,20 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) else vpmu_reset(vpmu, VPMU_RUNNING); - /* Setup LVTPC in local apic */ - if ( vpmu_is_set(vpmu, VPMU_RUNNING) && - is_vlapic_lvtpc_enabled(vcpu_vlapic(v)) ) - { - apic_write_around(APIC_LVTPC, PMU_APIC_VECTOR); - vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR; - } - else + if ( 
is_hvm_domain(v->domain) ) { - apic_write_around(APIC_LVTPC, PMU_APIC_VECTOR | APIC_LVT_MASKED); - vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | APIC_LVT_MASKED; + /* Setup LVTPC in local apic */ + if ( vpmu_is_set(vpmu, VPMU_RUNNING) && + is_vlapic_lvtpc_enabled(vcpu_vlapic(v)) ) + { + apic_write_around(APIC_LVTPC, PMU_APIC_VECTOR); + vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR; + } + else + { + apic_write_around(APIC_LVTPC, PMU_APIC_VECTOR | APIC_LVT_MASKED); + vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | APIC_LVT_MASKED; + } } if ( type != MSR_TYPE_GLOBAL ) @@ -547,13 +583,24 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) inject_gp = 1; break; } - if (inject_gp) - hvm_inject_hw_exception(TRAP_gp_fault, 0); + + if (inject_gp) + { + if ( is_hvm_domain(v->domain) ) + hvm_inject_hw_exception(TRAP_gp_fault, 0); + else + send_guest_trap(v->domain, v->vcpu_id, TRAP_gp_fault); + } else wrmsrl(msr, msr_content); } else - vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content); + { + if ( is_hvm_domain(v->domain) ) + vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content); + else + wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, msr_content); + } return 1; } @@ -577,7 +624,10 @@ static int core2_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content) *msr_content = core2_vpmu_cxt->global_ovf_status; break; case MSR_CORE_PERF_GLOBAL_CTRL: - vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content); + if ( is_hvm_domain(v->domain) ) + vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content); + else + rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, *msr_content); break; default: rdmsrl(msr, *msr_content); diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c index 69aaa7b..4638193 100644 --- a/xen/arch/x86/hvm/vpmu.c +++ b/xen/arch/x86/hvm/vpmu.c @@ -70,6 +70,14 @@ static void __init parse_vpmu_param(char *s) } } +static void vpmu_lvtpc_update(uint32_t val) +{ + struct vpmu_struct *vpmu = vcpu_vpmu(current); + + vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | (val & APIC_LVT_MASKED); + apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc); +} + int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) { struct vpmu_struct *vpmu = vcpu_vpmu(current); @@ -428,6 +436,14 @@ long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg) return -EFAULT; pvpmu_finish(current->domain, &pmu_params); break; + + case XENPMU_lvtpc_set: + if ( copy_from_guest(&pmu_params, arg, 1) ) + return -EFAULT; + + vpmu_lvtpc_update((uint32_t)pmu_params.val); + ret = 0; + break; } return ret; diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c index 57dbd0c..f378a24 100644 --- a/xen/arch/x86/traps.c +++ b/xen/arch/x86/traps.c @@ -71,6 +71,7 @@ #include <asm/apic.h> #include <asm/mc146818rtc.h> #include <asm/hpet.h> +#include <asm/hvm/vpmu.h> #include <public/arch-x86/cpuid.h> #include <xsm/xsm.h> @@ -871,7 +872,6 @@ static void pv_cpuid(struct cpu_user_regs *regs) break; case 0x00000005: /* MONITOR/MWAIT */ - case 0x0000000a: /* Architectural Performance Monitor Features */ case 0x0000000b: /* Extended Topology Enumeration */ case 0x8000000a: /* SVM revision and features */ case 0x8000001b: /* Instruction Based Sampling */ @@ -880,7 +880,9 @@ static void pv_cpuid(struct cpu_user_regs *regs) unsupported: a = b = c = d = 0; break; - + case 0x0000000a: /* Architectural Performance Monitor Features (Intel) */ + vpmu_do_cpuid(0xa, &a, &b, &c, &d); + break; default: (void)cpuid_hypervisor_leaves(regs->eax, 0, &a, &b, &c, &d); break; @@ -2486,6 +2488,17 @@ static int emulate_privileged_op(struct cpu_user_regs *regs) if ( 
wrmsr_safe(regs->ecx, msr_content) != 0 ) goto fail; break; + case MSR_P6_PERFCTR0...MSR_P6_PERFCTR1: + case MSR_P6_EVNTSEL0...MSR_P6_EVNTSEL1: + case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2: + case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL: + case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5: + if ( !vpmu_do_wrmsr(regs->ecx, msr_content) ) + { + if ( (vpmu_mode & XENPMU_MODE_PRIV) && (v->domain == dom0) ) + goto invalid; + } + break; default: if ( wrmsr_hypervisor_regs(regs->ecx, msr_content) == 1 ) break; @@ -2574,6 +2587,24 @@ static int emulate_privileged_op(struct cpu_user_regs *regs) regs->eax = (uint32_t)msr_content; regs->edx = (uint32_t)(msr_content >> 32); break; + case MSR_IA32_PERF_CAPABILITIES: + if ( rdmsr_safe(regs->ecx, msr_content) ) + goto fail; + /* Full-Width Writes not supported */ + regs->eax = (uint32_t)msr_content & ~(1 << 13); + regs->edx = (uint32_t)(msr_content >> 32); + break; + case MSR_P6_PERFCTR0...MSR_P6_PERFCTR1: + case MSR_P6_EVNTSEL0...MSR_P6_EVNTSEL1: + case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2: + case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL: + case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5: + if ( vpmu_do_rdmsr(regs->ecx, &msr_content) ) { + regs->eax = (uint32_t)msr_content; + regs->edx = (uint32_t)(msr_content >> 32); + break; + } + goto rdmsr_normal; default: if ( rdmsr_hypervisor_regs(regs->ecx, &val) ) { @@ -2606,6 +2637,10 @@ static int emulate_privileged_op(struct cpu_user_regs *regs) pv_cpuid(regs); break; + case 0x33: /* RDPMC */ + rdpmc(regs->ecx, regs->eax, regs->edx); + break; + default: goto fail; } diff --git a/xen/include/public/xenpmu.h b/xen/include/public/xenpmu.h index ec49097..0060670 100644 --- a/xen/include/public/xenpmu.h +++ b/xen/include/public/xenpmu.h @@ -27,6 +27,7 @@ #define XENPMU_flags_set 3 #define XENPMU_init 4 #define XENPMU_finish 5 +#define XENPMU_lvtpc_set 6 /* ` } */ /* Parameters structure for HYPERVISOR_xenpmu_op call */ -- 1.8.1.4
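With these intercepts in place a PV guest that has gone through XENPMU_init can program the counters with ordinary MSR accesses, which emulate_privileged_op() now routes through vpmu_do_wrmsr()/vpmu_do_rdmsr(). The snippet below is purely illustrative: it assumes Linux-style wrmsrl()/rdmsrl() wrappers, a version-2 Intel PMU, and a hypothetical example_workload() to measure; the event encoding (0x3c, unhalted core cycles) is the standard architectural one but is not taken from this series.

    /* Hypothetical PV-guest snippet: count unhalted core cycles on counter 0. */
    static u64 example_cycle_count(void)
    {
        u64 count;

        wrmsrl(MSR_P6_PERFCTR0, 0);
        /* event 0x3c, umask 0, USR | OS | EN */
        wrmsrl(MSR_P6_EVNTSEL0, (1u << 22) | (1u << 17) | (1u << 16) | 0x3c);
        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 1);   /* enable arch counter 0 (PMU v2+) */

        example_workload();                     /* hypothetical work being measured */

        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
        rdmsrl(MSR_P6_PERFCTR0, count);
        wrmsrl(MSR_P6_EVNTSEL0, 0);
        return count;
    }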
Boris Ostrovsky
2013-Sep-20 09:42 UTC
[PATCH v2 11/13] x86/PMU: Handle PMU interrupts for PV guests
Add support for handling PMU interrupts for PV guests, make these interrupts NMI instead of PMU_APIC_VECTOR vector. Depending on vpmu_mode forward the interrupts to appropriate guest (mode is VPMU_ON) or to dom0 (VPMU_DOM0). VPMU for the interrupted VCPU is unloaded until the guest issues XENPMU_flush hypercall. This allows the guest to access PMU MSR values that are stored in VPMU context which is shared between hypervisor and domain, thus avoiding traps to hypervisor. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> --- xen/arch/x86/apic.c | 13 --- xen/arch/x86/hvm/svm/vpmu.c | 8 +- xen/arch/x86/hvm/vmx/vpmu_core2.c | 8 +- xen/arch/x86/hvm/vpmu.c | 111 +++++++++++++++++++++++-- xen/include/asm-x86/hvm/vpmu.h | 1 + xen/include/asm-x86/irq.h | 1 - xen/include/asm-x86/mach-default/irq_vectors.h | 1 - xen/include/public/xenpmu.h | 1 + 8 files changed, 115 insertions(+), 29 deletions(-) diff --git a/xen/arch/x86/apic.c b/xen/arch/x86/apic.c index a52a0e8..9675e76 100644 --- a/xen/arch/x86/apic.c +++ b/xen/arch/x86/apic.c @@ -125,9 +125,6 @@ void __init apic_intr_init(void) /* IPI vectors for APIC spurious and error interrupts */ set_direct_apic_vector(SPURIOUS_APIC_VECTOR, spurious_interrupt); set_direct_apic_vector(ERROR_APIC_VECTOR, error_interrupt); - - /* Performance Counters Interrupt */ - set_direct_apic_vector(PMU_APIC_VECTOR, pmu_apic_interrupt); } /* Using APIC to generate smp_local_timer_interrupt? */ @@ -1368,16 +1365,6 @@ void error_interrupt(struct cpu_user_regs *regs) } /* - * This interrupt handles performance counters interrupt - */ - -void pmu_apic_interrupt(struct cpu_user_regs *regs) -{ - ack_APIC_irq(); - vpmu_do_interrupt(regs); -} - -/* * This initializes the IO-APIC and APIC hardware if this is * a UP kernel. */ diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c index 527a1de..3993a95 100644 --- a/xen/arch/x86/hvm/svm/vpmu.c +++ b/xen/arch/x86/hvm/svm/vpmu.c @@ -283,8 +283,8 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) if ( !acquire_pmu_ownership(PMU_OWNER_HVM) ) return 1; vpmu_set(vpmu, VPMU_RUNNING); - apic_write(APIC_LVTPC, PMU_APIC_VECTOR); - vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR; + apic_write(APIC_LVTPC, APIC_DM_NMI); + vpmu->hw_lapic_lvtpc = APIC_DM_NMI; if ( is_hvm_domain(v->domain) && !((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set ) @@ -295,8 +295,8 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) if ( (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) && (is_pmu_enabled(msr_content) == 0) && vpmu_is_set(vpmu, VPMU_RUNNING) ) { - apic_write(APIC_LVTPC, PMU_APIC_VECTOR | APIC_LVT_MASKED); - vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | APIC_LVT_MASKED; + apic_write(APIC_LVTPC, APIC_DM_NMI | APIC_LVT_MASKED); + vpmu->hw_lapic_lvtpc = APIC_DM_NMI | APIC_LVT_MASKED; vpmu_reset(vpmu, VPMU_RUNNING); if ( is_hvm_domain(v->domain) && ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set ) diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c index ebbb516..27f0807 100644 --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c +++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c @@ -548,13 +548,13 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) if ( vpmu_is_set(vpmu, VPMU_RUNNING) && is_vlapic_lvtpc_enabled(vcpu_vlapic(v)) ) { - apic_write_around(APIC_LVTPC, PMU_APIC_VECTOR); - vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR; + apic_write_around(APIC_LVTPC, APIC_DM_NMI); + vpmu->hw_lapic_lvtpc = APIC_DM_NMI; } else { - apic_write_around(APIC_LVTPC, PMU_APIC_VECTOR | 
APIC_LVT_MASKED); - vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | APIC_LVT_MASKED; + apic_write_around(APIC_LVTPC, APIC_DM_NMI | APIC_LVT_MASKED); + vpmu->hw_lapic_lvtpc = APIC_DM_NMI | APIC_LVT_MASKED; } } diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c index 4638193..1ea3a96 100644 --- a/xen/arch/x86/hvm/vpmu.c +++ b/xen/arch/x86/hvm/vpmu.c @@ -47,6 +47,7 @@ uint32_t __read_mostly vpmu_mode = XENPMU_MODE_OFF; static void parse_vpmu_param(char *s); custom_param("vpmu", parse_vpmu_param); +static void vpmu_save_force(void *arg); static DEFINE_PER_CPU(struct vcpu *, last_vcpu); static void __init parse_vpmu_param(char *s) @@ -74,7 +75,7 @@ static void vpmu_lvtpc_update(uint32_t val) { struct vpmu_struct *vpmu = vcpu_vpmu(current); - vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | (val & APIC_LVT_MASKED); + vpmu->hw_lapic_lvtpc = APIC_DM_NMI | (val & APIC_LVT_MASKED); apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc); } @@ -82,6 +83,9 @@ int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) { struct vpmu_struct *vpmu = vcpu_vpmu(current); + if ( (vpmu_mode & XENPMU_MODE_PRIV) && (current->domain != dom0) ) + return 0; + if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_wrmsr ) return vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content); return 0; @@ -91,6 +95,9 @@ int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content) { struct vpmu_struct *vpmu = vcpu_vpmu(current); + if ( (vpmu_mode & XENPMU_MODE_PRIV) && (current->domain != dom0) ) + return 0; + if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_rdmsr ) return vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content); return 0; @@ -99,17 +106,97 @@ int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content) int vpmu_do_interrupt(struct cpu_user_regs *regs) { struct vcpu *v = current; - struct vpmu_struct *vpmu = vcpu_vpmu(v); + struct vpmu_struct *vpmu; - if ( vpmu->arch_vpmu_ops ) + + /* dom0 will handle this interrupt */ + if ( (vpmu_mode & XENPMU_MODE_PRIV) || + (v->domain->domain_id >= DOMID_FIRST_RESERVED) ) + { + if ( smp_processor_id() >= dom0->max_vcpus ) + return 0; + v = dom0->vcpu[smp_processor_id()]; + } + + vpmu = vcpu_vpmu(v); + if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) + return 0; + + if ( !is_hvm_domain(v->domain) || (vpmu_mode & XENPMU_MODE_PRIV) ) + { + /* PV guest or dom0 is doing system profiling */ + void *p; + struct cpu_user_regs *gregs; + + p = &v->arch.vpmu.xenpmu_data->pmu.regs; + + /* PV guest will be reading PMU MSRs from xenpmu_data */ + vpmu_save_force(v); + + /* Store appropriate registers in xenpmu_data + * + * Note: ''!current->is_running'' is possible when ''set_current(next)'' + * for the (HVM) guest has been called but ''reset_stack_and_jump()'' + * has not (i.e. the guest is not actually running yet). + */ + if ( !is_hvm_domain(current->domain) || + ((vpmu_mode & XENPMU_MODE_PRIV) && !current->is_running) ) + { + /* + * 32-bit dom0 cannot process Xen''s addresses (which are 64 bit) + * and therefore we treat it the same way as a non-priviledged + * PV 32-bit domain. 
+ */ + if ( is_pv_32bit_domain(current->domain) ) + { + struct compat_cpu_user_regs cmp; + + gregs = guest_cpu_user_regs(); + XLAT_cpu_user_regs(&cmp, gregs); + memcpy(p, &cmp, sizeof(struct compat_cpu_user_regs)); + } + else if ( (current->domain != dom0) && !is_idle_vcpu(current) && + !(vpmu_mode & XENPMU_MODE_PRIV) ) + { + /* PV guest */ + gregs = guest_cpu_user_regs(); + memcpy(p, gregs, sizeof(struct cpu_user_regs)); + } + else + memcpy(p, regs, sizeof(struct cpu_user_regs)); + } + else + { + /* HVM guest */ + struct segment_register cs; + + gregs = guest_cpu_user_regs(); + hvm_get_segment_register(current, x86_seg_cs, &cs); + gregs->cs = cs.attr.fields.dpl; + + memcpy(p, gregs, sizeof(struct cpu_user_regs)); + } + + v->arch.vpmu.xenpmu_data->domain_id = current->domain->domain_id; + v->arch.vpmu.xenpmu_data->vcpu_id = current->vcpu_id; + v->arch.vpmu.xenpmu_data->pcpu_id = smp_processor_id(); + + raise_softirq(PMU_SOFTIRQ); + vpmu_set(vpmu, VPMU_WAIT_FOR_FLUSH); + + return 1; + } + else if ( vpmu->arch_vpmu_ops ) { - struct vlapic *vlapic = vcpu_vlapic(v); + /* HVM guest */ + struct vlapic *vlapic; u32 vlapic_lvtpc; unsigned char int_vec; if ( !vpmu->arch_vpmu_ops->do_interrupt(regs) ) return 0; + vlapic = vcpu_vlapic(v); if ( !is_vlapic_lvtpc_enabled(vlapic) ) return 1; @@ -169,7 +256,7 @@ void vpmu_save(struct vcpu *v) if ( vpmu->arch_vpmu_ops->arch_vpmu_save(v) ) vpmu_reset(vpmu, VPMU_CONTEXT_LOADED); - apic_write(APIC_LVTPC, PMU_APIC_VECTOR | APIC_LVT_MASKED); + apic_write(APIC_LVTPC, APIC_DM_NMI | APIC_LVT_MASKED); } void vpmu_load(struct vcpu *v) @@ -223,7 +310,13 @@ void vpmu_load(struct vcpu *v) vpmu->arch_vpmu_ops->arch_vpmu_load(v); } - vpmu_set(vpmu, VPMU_CONTEXT_LOADED); + /* + * PMU interrupt may happen while loading the context above. That + * may cause vpmu_save_force() in the handler so we we don''t + * want to mark the context as loaded. 
+ */ + if ( !vpmu_is_set(vpmu, VPMU_WAIT_FOR_FLUSH) ) + vpmu_set(vpmu, VPMU_CONTEXT_LOADED); } void vpmu_initialise(struct vcpu *v) @@ -444,6 +537,12 @@ long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg) vpmu_lvtpc_update((uint32_t)pmu_params.val); ret = 0; break; + + case XENPMU_flush: + vpmu_reset(vcpu_vpmu(current), VPMU_WAIT_FOR_FLUSH); + vpmu_load(current); + ret = 0; + break; } return ret; diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h index e046afd..348fc9a 100644 --- a/xen/include/asm-x86/hvm/vpmu.h +++ b/xen/include/asm-x86/hvm/vpmu.h @@ -68,6 +68,7 @@ struct vpmu_struct { #define VPMU_CONTEXT_SAVE 0x8 /* Force context save */ #define VPMU_FROZEN 0x10 /* Stop counters while VCPU is not running */ #define VPMU_PASSIVE_DOMAIN_ALLOCATED 0x20 +#define VPMU_WAIT_FOR_FLUSH 0x40 /* PV guest waits for XENPMU_flush */ #define vpmu_set(_vpmu, _x) ((_vpmu)->flags |= (_x)) #define vpmu_reset(_vpmu, _x) ((_vpmu)->flags &= ~(_x)) diff --git a/xen/include/asm-x86/irq.h b/xen/include/asm-x86/irq.h index 7f5da06..e582a72 100644 --- a/xen/include/asm-x86/irq.h +++ b/xen/include/asm-x86/irq.h @@ -88,7 +88,6 @@ void invalidate_interrupt(struct cpu_user_regs *regs); void call_function_interrupt(struct cpu_user_regs *regs); void apic_timer_interrupt(struct cpu_user_regs *regs); void error_interrupt(struct cpu_user_regs *regs); -void pmu_apic_interrupt(struct cpu_user_regs *regs); void spurious_interrupt(struct cpu_user_regs *regs); void irq_move_cleanup_interrupt(struct cpu_user_regs *regs); diff --git a/xen/include/asm-x86/mach-default/irq_vectors.h b/xen/include/asm-x86/mach-default/irq_vectors.h index 992e00c..46dcfaf 100644 --- a/xen/include/asm-x86/mach-default/irq_vectors.h +++ b/xen/include/asm-x86/mach-default/irq_vectors.h @@ -8,7 +8,6 @@ #define EVENT_CHECK_VECTOR 0xfc #define CALL_FUNCTION_VECTOR 0xfb #define LOCAL_TIMER_VECTOR 0xfa -#define PMU_APIC_VECTOR 0xf9 /* * High-priority dynamically-allocated vectors. For interrupts that * must be higher priority than any guest-bound interrupt. diff --git a/xen/include/public/xenpmu.h b/xen/include/public/xenpmu.h index 0060670..f05fdfa 100644 --- a/xen/include/public/xenpmu.h +++ b/xen/include/public/xenpmu.h @@ -28,6 +28,7 @@ #define XENPMU_init 4 #define XENPMU_finish 5 #define XENPMU_lvtpc_set 6 +#define XENPMU_flush 7 /* Write cached MSR values to HW */ /* ` } */ /* Parameters structure for HYPERVISOR_xenpmu_op call */ -- 1.8.1.4
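The guest side of this protocol is: receive VIRQ_XENPMU, read the sample that vpmu_do_interrupt() stored in the shared xenpmu_data page, then issue XENPMU_flush so the hypervisor reloads the context and re-arms the counters. A hedged sketch of such a handler is below; the Linux-style VIRQ binding, irqreturn_t conventions, HYPERVISOR_xenpmu_op() wrapper and example_record_sample() consumer are all assumptions, while XENPMU_flush and the xenpmu_data fields come from this series.

    /* Hypothetical guest handler, bound with something like
     * bind_virq_to_irqhandler(VIRQ_XENPMU, ...). */
    static irqreturn_t example_xenpmu_handler(int irq, void *dev_id)
    {
        struct xenpmu_data *pmu = dev_id;       /* page registered via XENPMU_init */
        struct xenpmu_params params;

        /* Registers of the interrupted context, as filled in by
         * vpmu_do_interrupt() above. */
        example_record_sample(pmu->domain_id, pmu->vcpu_id, pmu->pcpu_id,
                              &pmu->pmu.regs);

        /* While VPMU_WAIT_FOR_FLUSH is set the MSR values live in
         * pmu->pmu.{amd,intel} and can be read or updated here without
         * trapping into Xen. */

        memset(&params, 0, sizeof(params));
        HYPERVISOR_xenpmu_op(XENPMU_flush, &params);  /* reload context, re-arm */

        return IRQ_HANDLED;
    }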
Boris Ostrovsky
2013-Sep-20 09:42 UTC
[PATCH v2 12/13] x86/PMU: Save VPMU state for PV guests during context switch
Save VPMU state during context switch for both HVM and PV guests unless we are in VPMU_DOM0 vpmu mode (i.e. dom0 is doing all profiling). Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> --- xen/arch/x86/domain.c | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index 36f4192..b2d14cb 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -1416,17 +1416,15 @@ void context_switch(struct vcpu *prev, struct vcpu *next) } if (prev != next) - update_runstate_area(prev); - - if ( is_hvm_vcpu(prev) ) { - if (prev != next) + update_runstate_area(prev); + if ( !(vpmu_mode & XENPMU_MODE_PRIV) || prev->domain != dom0 ) vpmu_save(prev); - - if ( !list_empty(&prev->arch.hvm_vcpu.tm_list) ) - pt_save_timer(prev); } + if ( is_hvm_vcpu(prev) && !list_empty(&prev->arch.hvm_vcpu.tm_list) ) + pt_save_timer(prev); + local_irq_disable(); set_current(next); @@ -1463,7 +1461,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next) (next->domain->domain_id != 0)); } - if (is_hvm_vcpu(next) && (prev != next) ) + if ( prev != next && !(vpmu_mode & XENPMU_MODE_PRIV) ) /* Must be done with interrupts enabled */ vpmu_load(next); -- 1.8.1.4
Boris Ostrovsky
2013-Sep-20 09:42 UTC
[PATCH v2 13/13] x86/PMU: Move vpmu files up from hvm directory
Since VPMU is now used by both HVM and PV we should move it up from HVM subtree: xen/arch/x86/hvm/vpmu.c => xen/arch/x86/vpmu.c xen/arch/x86/hvm/vmx/vpmu_core2.c => xen/arch/x86/vpmu_intel.c xen/arch/x86/hvm/svm/vpmu.c => xen/arch/x86/vpmu_amd.c xen/include/asm-x86/hvm/vpmu.h => xen/include/asm-x86/vpmu.h No code changes (except for adjusting Makefiles and paths for #includes). Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> --- xen/arch/x86/Makefile | 1 + xen/arch/x86/hvm/Makefile | 1 - xen/arch/x86/hvm/svm/Makefile | 1 - xen/arch/x86/hvm/svm/vpmu.c | 489 ------------------ xen/arch/x86/hvm/vmx/Makefile | 1 - xen/arch/x86/hvm/vmx/vpmu_core2.c | 936 ---------------------------------- xen/arch/x86/hvm/vpmu.c | 549 -------------------- xen/arch/x86/oprofile/op_model_ppro.c | 2 +- xen/arch/x86/traps.c | 2 +- xen/arch/x86/vpmu.c | 549 ++++++++++++++++++++ xen/arch/x86/vpmu_amd.c | 489 ++++++++++++++++++ xen/arch/x86/vpmu_intel.c | 936 ++++++++++++++++++++++++++++++++++ xen/include/asm-x86/domain.h | 1 + xen/include/asm-x86/hvm/vmx/vmcs.h | 1 - xen/include/asm-x86/hvm/vpmu.h | 96 ---- xen/include/asm-x86/vpmu.h | 96 ++++ 16 files changed, 2074 insertions(+), 2076 deletions(-) delete mode 100644 xen/arch/x86/hvm/svm/vpmu.c delete mode 100644 xen/arch/x86/hvm/vmx/vpmu_core2.c delete mode 100644 xen/arch/x86/hvm/vpmu.c create mode 100644 xen/arch/x86/vpmu.c create mode 100644 xen/arch/x86/vpmu_amd.c create mode 100644 xen/arch/x86/vpmu_intel.c delete mode 100644 xen/include/asm-x86/hvm/vpmu.h create mode 100644 xen/include/asm-x86/vpmu.h diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile index d502bdf..34b9c57 100644 --- a/xen/arch/x86/Makefile +++ b/xen/arch/x86/Makefile @@ -58,6 +58,7 @@ obj-y += crash.o obj-y += tboot.o obj-y += hpet.o obj-y += xstate.o +obj-y += vpmu.o vpmu_intel.o vpmu_amd.o obj-$(crash_debug) += gdbstub.o diff --git a/xen/arch/x86/hvm/Makefile b/xen/arch/x86/hvm/Makefile index eea5555..742b83b 100644 --- a/xen/arch/x86/hvm/Makefile +++ b/xen/arch/x86/hvm/Makefile @@ -22,4 +22,3 @@ obj-y += vlapic.o obj-y += vmsi.o obj-y += vpic.o obj-y += vpt.o -obj-y += vpmu.o \ No newline at end of file diff --git a/xen/arch/x86/hvm/svm/Makefile b/xen/arch/x86/hvm/svm/Makefile index a10a55e..760d295 100644 --- a/xen/arch/x86/hvm/svm/Makefile +++ b/xen/arch/x86/hvm/svm/Makefile @@ -6,4 +6,3 @@ obj-y += nestedsvm.o obj-y += svm.o obj-y += svmdebug.o obj-y += vmcb.o -obj-y += vpmu.o diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c deleted file mode 100644 index 3993a95..0000000 --- a/xen/arch/x86/hvm/svm/vpmu.c +++ /dev/null @@ -1,489 +0,0 @@ -/* - * vpmu.c: PMU virtualization for HVM domain. - * - * Copyright (c) 2010, Advanced Micro Devices, Inc. - * Parts of this code are Copyright (c) 2007, Intel Corporation - * - * Author: Wei Wang <wei.wang2@amd.com> - * Tested by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> - * - * This program is free software; you can redistribute it and/or modify it - * under the terms and conditions of the GNU General Public License, - * version 2, as published by the Free Software Foundation. - * - * This program is distributed in the hope it will be useful, but WITHOUT - * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or - * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for - * more details. 
- * - * You should have received a copy of the GNU General Public License along with - * this program; if not, write to the Free Software Foundation, Inc., 59 Temple - * Place - Suite 330, Boston, MA 02111-1307 USA. - * - */ - -#include <xen/config.h> -#include <xen/xenoprof.h> -#include <xen/hvm/save.h> -#include <xen/sched.h> -#include <xen/irq.h> -#include <asm/apic.h> -#include <asm/hvm/vlapic.h> -#include <asm/hvm/vpmu.h> -#include <public/xenpmu.h> - -#define MSR_F10H_EVNTSEL_GO_SHIFT 40 -#define MSR_F10H_EVNTSEL_EN_SHIFT 22 -#define MSR_F10H_COUNTER_LENGTH 48 - -#define is_guest_mode(msr) ((msr) & (1ULL << MSR_F10H_EVNTSEL_GO_SHIFT)) -#define is_pmu_enabled(msr) ((msr) & (1ULL << MSR_F10H_EVNTSEL_EN_SHIFT)) -#define set_guest_mode(msr) (msr |= (1ULL << MSR_F10H_EVNTSEL_GO_SHIFT)) -#define is_overflowed(msr) (!((msr) & (1ULL << (MSR_F10H_COUNTER_LENGTH-1)))) - -static unsigned int __read_mostly num_counters; -static const u32 __read_mostly *counters; -static const u32 __read_mostly *ctrls; -static bool_t __read_mostly k7_counters_mirrored; - -#define F10H_NUM_COUNTERS 4 -#define F15H_NUM_COUNTERS 6 - -/* PMU Counter MSRs. */ -static const u32 AMD_F10H_COUNTERS[] = { - MSR_K7_PERFCTR0, - MSR_K7_PERFCTR1, - MSR_K7_PERFCTR2, - MSR_K7_PERFCTR3 -}; - -/* PMU Control MSRs. */ -static const u32 AMD_F10H_CTRLS[] = { - MSR_K7_EVNTSEL0, - MSR_K7_EVNTSEL1, - MSR_K7_EVNTSEL2, - MSR_K7_EVNTSEL3 -}; - -static const u32 AMD_F15H_COUNTERS[] = { - MSR_AMD_FAM15H_PERFCTR0, - MSR_AMD_FAM15H_PERFCTR1, - MSR_AMD_FAM15H_PERFCTR2, - MSR_AMD_FAM15H_PERFCTR3, - MSR_AMD_FAM15H_PERFCTR4, - MSR_AMD_FAM15H_PERFCTR5 -}; - -static const u32 AMD_F15H_CTRLS[] = { - MSR_AMD_FAM15H_EVNTSEL0, - MSR_AMD_FAM15H_EVNTSEL1, - MSR_AMD_FAM15H_EVNTSEL2, - MSR_AMD_FAM15H_EVNTSEL3, - MSR_AMD_FAM15H_EVNTSEL4, - MSR_AMD_FAM15H_EVNTSEL5 -}; - -static inline int get_pmu_reg_type(u32 addr) -{ - if ( (addr >= MSR_K7_EVNTSEL0) && (addr <= MSR_K7_EVNTSEL3) ) - return MSR_TYPE_CTRL; - - if ( (addr >= MSR_K7_PERFCTR0) && (addr <= MSR_K7_PERFCTR3) ) - return MSR_TYPE_COUNTER; - - if ( (addr >= MSR_AMD_FAM15H_EVNTSEL0) && - (addr <= MSR_AMD_FAM15H_PERFCTR5 ) ) - { - if (addr & 1) - return MSR_TYPE_COUNTER; - else - return MSR_TYPE_CTRL; - } - - /* unsupported registers */ - return -1; -} - -static inline u32 get_fam15h_addr(u32 addr) -{ - switch ( addr ) - { - case MSR_K7_PERFCTR0: - return MSR_AMD_FAM15H_PERFCTR0; - case MSR_K7_PERFCTR1: - return MSR_AMD_FAM15H_PERFCTR1; - case MSR_K7_PERFCTR2: - return MSR_AMD_FAM15H_PERFCTR2; - case MSR_K7_PERFCTR3: - return MSR_AMD_FAM15H_PERFCTR3; - case MSR_K7_EVNTSEL0: - return MSR_AMD_FAM15H_EVNTSEL0; - case MSR_K7_EVNTSEL1: - return MSR_AMD_FAM15H_EVNTSEL1; - case MSR_K7_EVNTSEL2: - return MSR_AMD_FAM15H_EVNTSEL2; - case MSR_K7_EVNTSEL3: - return MSR_AMD_FAM15H_EVNTSEL3; - default: - break; - } - - return addr; -} - -static void amd_vpmu_set_msr_bitmap(struct vcpu *v) -{ - unsigned int i; - struct vpmu_struct *vpmu = vcpu_vpmu(v); - struct amd_vpmu_context *ctxt = vpmu->context; - - for ( i = 0; i < num_counters; i++ ) - { - svm_intercept_msr(v, counters[i], MSR_INTERCEPT_NONE); - svm_intercept_msr(v, ctrls[i], MSR_INTERCEPT_WRITE); - } - - ctxt->msr_bitmap_set = 1; -} - -static void amd_vpmu_unset_msr_bitmap(struct vcpu *v) -{ - unsigned int i; - struct vpmu_struct *vpmu = vcpu_vpmu(v); - struct amd_vpmu_context *ctxt = vpmu->context; - - for ( i = 0; i < num_counters; i++ ) - { - svm_intercept_msr(v, counters[i], MSR_INTERCEPT_RW); - svm_intercept_msr(v, ctrls[i], MSR_INTERCEPT_RW); - } - - 
ctxt->msr_bitmap_set = 0; -} - -static int amd_vpmu_do_interrupt(struct cpu_user_regs *regs) -{ - return 1; -} - -static inline void context_load(struct vcpu *v) -{ - unsigned int i; - struct vpmu_struct *vpmu = vcpu_vpmu(v); - struct amd_vpmu_context *ctxt = vpmu->context; - - for ( i = 0; i < num_counters; i++ ) - { - wrmsrl(counters[i], ctxt->counters[i]); - wrmsrl(ctrls[i], ctxt->ctrls[i]); - } -} - -static void amd_vpmu_load(struct vcpu *v) -{ - struct vpmu_struct *vpmu = vcpu_vpmu(v); - struct amd_vpmu_context *ctxt = vpmu->context; - - vpmu_reset(vpmu, VPMU_FROZEN); - - if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) ) - { - unsigned int i; - - for ( i = 0; i < num_counters; i++ ) - wrmsrl(ctrls[i], ctxt->ctrls[i]); - - return; - } - - context_load(v); -} - -static inline void context_save(struct vcpu *v) -{ - unsigned int i; - struct vpmu_struct *vpmu = vcpu_vpmu(v); - struct amd_vpmu_context *ctxt = vpmu->context; - - /* No need to save controls -- they are saved in amd_vpmu_do_wrmsr */ - for ( i = 0; i < num_counters; i++ ) - rdmsrl(counters[i], ctxt->counters[i]); -} - -static int amd_vpmu_save(struct vcpu *v) -{ - struct vpmu_struct *vpmu = vcpu_vpmu(v); - struct amd_vpmu_context *ctx = vpmu->context; - unsigned int i; - - if ( !vpmu_is_set(vpmu, VPMU_FROZEN) ) - { - for ( i = 0; i < num_counters; i++ ) - wrmsrl(ctrls[i], 0); - - vpmu_set(vpmu, VPMU_FROZEN); - } - - if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_SAVE) ) - return 0; - - context_save(v); - - if ( is_hvm_domain(v->domain) && - !vpmu_is_set(vpmu, VPMU_RUNNING) && ctx->msr_bitmap_set ) - amd_vpmu_unset_msr_bitmap(v); - - return 1; -} - -static void context_update(unsigned int msr, u64 msr_content) -{ - unsigned int i; - struct vcpu *v = current; - struct vpmu_struct *vpmu = vcpu_vpmu(v); - struct amd_vpmu_context *ctxt = vpmu->context; - - if ( k7_counters_mirrored && - ((msr >= MSR_K7_EVNTSEL0) && (msr <= MSR_K7_PERFCTR3)) ) - { - msr = get_fam15h_addr(msr); - } - - for ( i = 0; i < num_counters; i++ ) - { - if ( msr == ctrls[i] ) - { - ctxt->ctrls[i] = msr_content; - return; - } - else if (msr == counters[i] ) - { - ctxt->counters[i] = msr_content; - return; - } - } -} - -static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) -{ - struct vcpu *v = current; - struct vpmu_struct *vpmu = vcpu_vpmu(v); - - /* For all counters, enable guest only mode for HVM guest */ - if ( is_hvm_domain(v->domain) && (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) && - !(is_guest_mode(msr_content)) ) - { - set_guest_mode(msr_content); - } - - /* check if the first counter is enabled */ - if ( (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) && - is_pmu_enabled(msr_content) && !vpmu_is_set(vpmu, VPMU_RUNNING) ) - { - if ( !acquire_pmu_ownership(PMU_OWNER_HVM) ) - return 1; - vpmu_set(vpmu, VPMU_RUNNING); - apic_write(APIC_LVTPC, APIC_DM_NMI); - vpmu->hw_lapic_lvtpc = APIC_DM_NMI; - - if ( is_hvm_domain(v->domain) && - !((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set ) - amd_vpmu_set_msr_bitmap(v); - } - - /* stop saving & restore if guest stops first counter */ - if ( (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) && - (is_pmu_enabled(msr_content) == 0) && vpmu_is_set(vpmu, VPMU_RUNNING) ) - { - apic_write(APIC_LVTPC, APIC_DM_NMI | APIC_LVT_MASKED); - vpmu->hw_lapic_lvtpc = APIC_DM_NMI | APIC_LVT_MASKED; - vpmu_reset(vpmu, VPMU_RUNNING); - if ( is_hvm_domain(v->domain) && - ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set ) - amd_vpmu_unset_msr_bitmap(v); - release_pmu_ownship(PMU_OWNER_HVM); - } - - if ( !vpmu_is_set(vpmu, 
VPMU_CONTEXT_LOADED) - || vpmu_is_set(vpmu, VPMU_FROZEN) ) - { - context_load(v); - vpmu_set(vpmu, VPMU_CONTEXT_LOADED); - vpmu_reset(vpmu, VPMU_FROZEN); - } - - /* Update vpmu context immediately */ - context_update(msr, msr_content); - - /* Write to hw counters */ - wrmsrl(msr, msr_content); - return 1; -} - -static int amd_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content) -{ - struct vcpu *v = current; - struct vpmu_struct *vpmu = vcpu_vpmu(v); - - if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) - || vpmu_is_set(vpmu, VPMU_FROZEN) ) - { - context_load(v); - vpmu_set(vpmu, VPMU_CONTEXT_LOADED); - vpmu_reset(vpmu, VPMU_FROZEN); - } - - rdmsrl(msr, *msr_content); - - return 1; -} - -static int amd_vpmu_initialise(struct vcpu *v) -{ - struct amd_vpmu_context *ctxt; - struct vpmu_struct *vpmu = vcpu_vpmu(v); - uint8_t family = current_cpu_data.x86; - - if ( vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) - return 0; - - if ( counters == NULL ) - { - switch ( family ) - { - case 0x15: - num_counters = F15H_NUM_COUNTERS; - counters = AMD_F15H_COUNTERS; - ctrls = AMD_F15H_CTRLS; - k7_counters_mirrored = 1; - break; - case 0x10: - case 0x12: - case 0x14: - case 0x16: - default: - num_counters = F10H_NUM_COUNTERS; - counters = AMD_F10H_COUNTERS; - ctrls = AMD_F10H_CTRLS; - k7_counters_mirrored = 0; - break; - } - } - - if ( is_hvm_domain(v->domain) ) - { - ctxt = xzalloc(struct amd_vpmu_context); - if ( !ctxt ) - { - gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, " - " PMU feature is unavailable on domain %d vcpu %d.\n", - v->vcpu_id, v->domain->domain_id); - return -ENOMEM; - } - } - else - ctxt = &v->arch.vpmu.xenpmu_data->pmu.amd; - - vpmu->context = ctxt; - vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED); - return 0; -} - -static void amd_vpmu_destroy(struct vcpu *v) -{ - struct vpmu_struct *vpmu = vcpu_vpmu(v); - - if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) - return; - - if ( is_hvm_domain(v->domain) ) - { - if ( ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set ) - amd_vpmu_unset_msr_bitmap(v); - - xfree(vpmu->context); - release_pmu_ownship(PMU_OWNER_HVM); - } - - vpmu->context = NULL; - vpmu_clear(vpmu); -} - -/* VPMU part of the ''q'' keyhandler */ -static void amd_vpmu_dump(struct vcpu *v) -{ - struct vpmu_struct *vpmu = vcpu_vpmu(v); - struct amd_vpmu_context *ctxt = vpmu->context; - unsigned int i; - - printk(" VPMU state: 0x%x ", vpmu->flags); - if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) - { - printk("\n"); - return; - } - - printk("("); - if ( vpmu_is_set(vpmu, VPMU_PASSIVE_DOMAIN_ALLOCATED) ) - printk("PASSIVE_DOMAIN_ALLOCATED, "); - if ( vpmu_is_set(vpmu, VPMU_FROZEN) ) - printk("FROZEN, "); - if ( vpmu_is_set(vpmu, VPMU_CONTEXT_SAVE) ) - printk("SAVE, "); - if ( vpmu_is_set(vpmu, VPMU_RUNNING) ) - printk("RUNNING, "); - if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) ) - printk("LOADED, "); - printk("ALLOCATED)\n"); - - for ( i = 0; i < num_counters; i++ ) - { - uint64_t ctrl, cntr; - - rdmsrl(ctrls[i], ctrl); - rdmsrl(counters[i], cntr); - printk(" 0x%08x: 0x%lx (0x%lx in HW) 0x%08x: 0x%lx (0x%lx in HW)\n", - ctrls[i], ctxt->ctrls[i], ctrl, - counters[i], ctxt->counters[i], cntr); - } -} - -struct arch_vpmu_ops amd_vpmu_ops = { - .do_wrmsr = amd_vpmu_do_wrmsr, - .do_rdmsr = amd_vpmu_do_rdmsr, - .do_interrupt = amd_vpmu_do_interrupt, - .arch_vpmu_destroy = amd_vpmu_destroy, - .arch_vpmu_save = amd_vpmu_save, - .arch_vpmu_load = amd_vpmu_load, - .arch_vpmu_dump = amd_vpmu_dump -}; - -int svm_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags) -{ - struct 
vpmu_struct *vpmu = vcpu_vpmu(v); - uint8_t family = current_cpu_data.x86; - int ret = 0; - - /* vpmu enabled? */ - if ( vpmu_flags == XENPMU_MODE_OFF ) - return 0; - - switch ( family ) - { - case 0x10: - case 0x12: - case 0x14: - case 0x15: - case 0x16: - ret = amd_vpmu_initialise(v); - if ( !ret ) - vpmu->arch_vpmu_ops = &amd_vpmu_ops; - return ret; - } - - printk("VPMU: Initialization failed. " - "AMD processor family %d has not " - "been supported\n", family); - return -EINVAL; -} - diff --git a/xen/arch/x86/hvm/vmx/Makefile b/xen/arch/x86/hvm/vmx/Makefile index 373b3d9..04a29ce 100644 --- a/xen/arch/x86/hvm/vmx/Makefile +++ b/xen/arch/x86/hvm/vmx/Makefile @@ -3,5 +3,4 @@ obj-y += intr.o obj-y += realmode.o obj-y += vmcs.o obj-y += vmx.o -obj-y += vpmu_core2.o obj-y += vvmx.o diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c deleted file mode 100644 index 27f0807..0000000 --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c +++ /dev/null @@ -1,936 +0,0 @@ -/* - * vpmu_core2.c: CORE 2 specific PMU virtualization for HVM domain. - * - * Copyright (c) 2007, Intel Corporation. - * - * This program is free software; you can redistribute it and/or modify it - * under the terms and conditions of the GNU General Public License, - * version 2, as published by the Free Software Foundation. - * - * This program is distributed in the hope it will be useful, but WITHOUT - * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or - * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for - * more details. - * - * You should have received a copy of the GNU General Public License along with - * this program; if not, write to the Free Software Foundation, Inc., 59 Temple - * Place - Suite 330, Boston, MA 02111-1307 USA. - * - * Author: Haitao Shan <haitao.shan@intel.com> - */ - -#include <xen/config.h> -#include <xen/sched.h> -#include <xen/xenoprof.h> -#include <xen/irq.h> -#include <asm/system.h> -#include <asm/regs.h> -#include <asm/types.h> -#include <asm/apic.h> -#include <asm/traps.h> -#include <asm/msr.h> -#include <asm/msr-index.h> -#include <asm/hvm/support.h> -#include <asm/hvm/vlapic.h> -#include <asm/hvm/vmx/vmx.h> -#include <asm/hvm/vmx/vmcs.h> -#include <public/sched.h> -#include <public/hvm/save.h> -#include <public/xenpmu.h> -#include <asm/hvm/vpmu.h> - -/* - * See Intel SDM Vol 2a Instruction Set Reference chapter 3 for CPUID - * instruction. 
- * cpuid 0xa - Architectural Performance Monitoring Leaf
- * Register eax
- */
-#define PMU_VERSION_SHIFT        0  /* Version ID */
-#define PMU_VERSION_BITS         8  /* 8 bits 0..7 */
-#define PMU_VERSION_MASK         (((1 << PMU_VERSION_BITS) - 1) << PMU_VERSION_SHIFT)
-
-#define PMU_GENERAL_NR_SHIFT     8  /* Number of general pmu registers */
-#define PMU_GENERAL_NR_BITS      8  /* 8 bits 8..15 */
-#define PMU_GENERAL_NR_MASK      (((1 << PMU_GENERAL_NR_BITS) - 1) << PMU_GENERAL_NR_SHIFT)
-
-#define PMU_GENERAL_WIDTH_SHIFT 16  /* Width of general pmu registers */
-#define PMU_GENERAL_WIDTH_BITS   8  /* 8 bits 16..23 */
-#define PMU_GENERAL_WIDTH_MASK   (((1 << PMU_GENERAL_WIDTH_BITS) - 1) << PMU_GENERAL_WIDTH_SHIFT)
-/* Register edx */
-#define PMU_FIXED_NR_SHIFT       0  /* Number of fixed pmu registers */
-#define PMU_FIXED_NR_BITS        5  /* 5 bits 0..4 */
-#define PMU_FIXED_NR_MASK        (((1 << PMU_FIXED_NR_BITS) -1) << PMU_FIXED_NR_SHIFT)
-
-#define PMU_FIXED_WIDTH_SHIFT    5  /* Width of fixed pmu registers */
-#define PMU_FIXED_WIDTH_BITS     8  /* 8 bits 5..12 */
-#define PMU_FIXED_WIDTH_MASK     (((1 << PMU_FIXED_WIDTH_BITS) -1) << PMU_FIXED_WIDTH_SHIFT)
-
-/* Intel-specific VPMU features */
-#define VPMU_CPU_HAS_DS          0x100 /* Has Debug Store */
-#define VPMU_CPU_HAS_BTS         0x200 /* Has Branch Trace Store */
-
-/*
- * MSR_CORE_PERF_FIXED_CTR_CTRL contains the configuration of all fixed
- * counters. 4 bits for every counter.
- */
-#define FIXED_CTR_CTRL_BITS      4
-#define FIXED_CTR_CTRL_MASK      ((1 << FIXED_CTR_CTRL_BITS) - 1)
-
-static int arch_pmc_cnt;  /* Number of general-purpose performance counters */
-static int fixed_pmc_cnt; /* Number of fixed performance counters */
-
-/*
- * QUIRK to workaround an issue on various family 6 cpus.
- * The issue leads to endless PMC interrupt loops on the processor.
- * If the interrupt handler is running and a pmc reaches the value 0, this
- * value remains forever and it triggers immediately a new interrupt after
- * finishing the handler.
- * A workaround is to read all flagged counters and if the value is 0 write
- * 1 (or another value != 0) into it.
- * There exist no errata and the real cause of this behaviour is unknown.
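[For reference: FIXED_CTR_CTRL_BITS/FIXED_CTR_CTRL_MASK above encode the fact that
MSR_CORE_PERF_FIXED_CTR_CTRL packs one 4-bit control field per fixed counter. A
minimal sketch, not part of the patch, of extracting the field for counter i; the
helper name is made up for illustration only:

    static inline unsigned int fixed_ctrl_field(uint64_t fixed_ctrl,
                                                unsigned int i)
    {
        /* Bits 0-1 of each field select the ring level(s) to count in,
         * bit 3 requests a PMI on counter overflow. */
        return (fixed_ctrl >> (i * FIXED_CTR_CTRL_BITS)) & FIXED_CTR_CTRL_MASK;
    }
]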
- */ -bool_t __read_mostly is_pmc_quirk; - -static void check_pmc_quirk(void) -{ - if ( current_cpu_data.x86 == 6 ) - is_pmc_quirk = 1; - else - is_pmc_quirk = 0; -} - -static void handle_pmc_quirk(u64 msr_content) -{ - int i; - u64 val; - - if ( !is_pmc_quirk ) - return; - - val = msr_content; - for ( i = 0; i < arch_pmc_cnt; i++ ) - { - if ( val & 0x1 ) - { - u64 cnt; - rdmsrl(MSR_P6_PERFCTR0 + i, cnt); - if ( cnt == 0 ) - wrmsrl(MSR_P6_PERFCTR0 + i, 1); - } - val >>= 1; - } - val = msr_content >> 32; - for ( i = 0; i < fixed_pmc_cnt; i++ ) - { - if ( val & 0x1 ) - { - u64 cnt; - rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, cnt); - if ( cnt == 0 ) - wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, 1); - } - val >>= 1; - } -} - -/* - * Read the number of general counters via CPUID.EAX[0xa].EAX[8..15] - */ -static int core2_get_arch_pmc_count(void) -{ - u32 eax, ebx, ecx, edx; - - cpuid(0xa, &eax, &ebx, &ecx, &edx); - return ( (eax & PMU_GENERAL_NR_MASK) >> PMU_GENERAL_NR_SHIFT ); -} - -/* - * Read the number of fixed counters via CPUID.EDX[0xa].EDX[0..4] - */ -static int core2_get_fixed_pmc_count(void) -{ - u32 eax, ebx, ecx, edx; - - cpuid(0xa, &eax, &ebx, &ecx, &edx); - return ( (eax & PMU_FIXED_NR_MASK) >> PMU_FIXED_NR_SHIFT ); -} - -static u64 core2_calc_intial_glb_ctrl_msr(void) -{ - int arch_pmc_bits = (1 << arch_pmc_cnt) - 1; - u64 fix_pmc_bits = (1 << fixed_pmc_cnt) - 1; - return ((fix_pmc_bits << 32) | arch_pmc_bits); -} - -/* edx bits 5-12: Bit width of fixed-function performance counters */ -static int core2_get_bitwidth_fix_count(void) -{ - u32 eax, ebx, ecx, edx; - - cpuid(0xa, &eax, &ebx, &ecx, &edx); - return ((edx & PMU_FIXED_WIDTH_MASK) >> PMU_FIXED_WIDTH_SHIFT); -} - -static int is_core2_vpmu_msr(u32 msr_index, int *type, int *index) -{ - int i; - - for ( i = 0; i < fixed_pmc_cnt; i++ ) - { - if ( msr_index == MSR_CORE_PERF_FIXED_CTR0 + i ) - { - *type = MSR_TYPE_COUNTER; - *index = i; - return 1; - } - } - - if ( (msr_index == MSR_CORE_PERF_FIXED_CTR_CTRL ) || - (msr_index == MSR_IA32_DS_AREA) || - (msr_index == MSR_IA32_PEBS_ENABLE) ) - { - *type = MSR_TYPE_CTRL; - return 1; - } - - if ( (msr_index == MSR_CORE_PERF_GLOBAL_CTRL) || - (msr_index == MSR_CORE_PERF_GLOBAL_STATUS) || - (msr_index == MSR_CORE_PERF_GLOBAL_OVF_CTRL) ) - { - *type = MSR_TYPE_GLOBAL; - return 1; - } - - if ( (msr_index >= MSR_IA32_PERFCTR0) && - (msr_index < (MSR_IA32_PERFCTR0 + arch_pmc_cnt)) ) - { - *type = MSR_TYPE_ARCH_COUNTER; - *index = msr_index - MSR_IA32_PERFCTR0; - return 1; - } - - if ( (msr_index >= MSR_P6_EVNTSEL0) && - (msr_index < (MSR_P6_EVNTSEL0 + arch_pmc_cnt)) ) - { - *type = MSR_TYPE_ARCH_CTRL; - *index = msr_index - MSR_P6_EVNTSEL0; - return 1; - } - - return 0; -} - -#define msraddr_to_bitpos(x) (((x)&0xffff) + ((x)>>31)*0x2000) -static void core2_vpmu_set_msr_bitmap(unsigned long *msr_bitmap) -{ - int i; - - /* Allow Read/Write PMU Counters MSR Directly. */ - for ( i = 0; i < fixed_pmc_cnt; i++ ) - { - clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i), msr_bitmap); - clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i), - msr_bitmap + 0x800/BYTES_PER_LONG); - } - for ( i = 0; i < arch_pmc_cnt; i++ ) - { - clear_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i), msr_bitmap); - clear_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i), - msr_bitmap + 0x800/BYTES_PER_LONG); - } - - /* Allow Read PMU Non-global Controls Directly. 
*/ - for ( i = 0; i < arch_pmc_cnt; i++ ) - clear_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL0+i), msr_bitmap); - - clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR_CTRL), msr_bitmap); - clear_bit(msraddr_to_bitpos(MSR_IA32_PEBS_ENABLE), msr_bitmap); - clear_bit(msraddr_to_bitpos(MSR_IA32_DS_AREA), msr_bitmap); -} - -static void core2_vpmu_unset_msr_bitmap(unsigned long *msr_bitmap) -{ - int i; - - for ( i = 0; i < fixed_pmc_cnt; i++ ) - { - set_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i), msr_bitmap); - set_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i), - msr_bitmap + 0x800/BYTES_PER_LONG); - } - for ( i = 0; i < arch_pmc_cnt; i++ ) - { - set_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i), msr_bitmap); - set_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i), - msr_bitmap + 0x800/BYTES_PER_LONG); - } - - for ( i = 0; i < arch_pmc_cnt; i++ ) - set_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL0+i), msr_bitmap); - - set_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR_CTRL), msr_bitmap); - set_bit(msraddr_to_bitpos(MSR_IA32_PEBS_ENABLE), msr_bitmap); - set_bit(msraddr_to_bitpos(MSR_IA32_DS_AREA), msr_bitmap); -} - -static inline void __core2_vpmu_save(struct vcpu *v) -{ - int i; - struct core2_vpmu_context *core2_vpmu_cxt = vcpu_vpmu(v)->context; - - for ( i = 0; i < fixed_pmc_cnt; i++ ) - rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, core2_vpmu_cxt->fix_counters[i]); - for ( i = 0; i < arch_pmc_cnt; i++ ) - rdmsrl(MSR_IA32_PERFCTR0 + i, core2_vpmu_cxt->arch_msr_pair[i].counter); - - if ( !is_hvm_domain(v->domain) ) - rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, core2_vpmu_cxt->global_status); -} - -static int core2_vpmu_save(struct vcpu *v) -{ - struct vpmu_struct *vpmu = vcpu_vpmu(v); - - if ( !vpmu_is_set_all(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED) ) - return 0; - - if ( !is_hvm_domain(v->domain) ) - wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0); - - __core2_vpmu_save(v); - - /* Unset PMU MSR bitmap to trap lazy load. 
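[As a worked example of the msraddr_to_bitpos() macro used by the two bitmap helpers
above, a small standalone program; illustration only, not part of the patch, with the
macro copied so it compiles on its own:

    #include <stdio.h>

    #define msraddr_to_bitpos(x) (((x) & 0xffff) + ((x) >> 31) * 0x2000)

    int main(void)
    {
        /* MSRs below 0x2000 map to their own address as bit index
         * (the read bitmap occupies the first 0x400 bytes of the page). */
        printf("%#x\n", msraddr_to_bitpos(0x0c1u));      /* IA32_PERFCTR0 -> 0xc1  */
        printf("%#x\n", msraddr_to_bitpos(0x309u));      /* FIXED_CTR0    -> 0x309 */
        /* 0xC000xxxx MSRs gain a 0x2000-bit (0x400-byte) offset, landing in the
         * high-MSR half of the read bitmap. */
        printf("%#x\n", msraddr_to_bitpos(0xc0000080u)); /* -> 0x2080 */
        /* Offsetting by 0x800/BYTES_PER_LONG longs, as the code above does,
         * moves the same bit into the write bitmap, which starts 0x800 bytes in. */
        return 0;
    }
]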
*/ - if ( !vpmu_is_set(vpmu, VPMU_RUNNING) && cpu_has_vmx_msr_bitmap - && is_hvm_domain(v->domain) ) - core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap); - - return 1; -} - -static inline void __core2_vpmu_load(struct vcpu *v) -{ - int i; - struct core2_vpmu_context *core2_vpmu_cxt = vcpu_vpmu(v)->context; - - for ( i = 0; i < fixed_pmc_cnt; i++ ) - wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, core2_vpmu_cxt->fix_counters[i]); - for ( i = 0; i < arch_pmc_cnt; i++ ) - wrmsrl(MSR_IA32_PERFCTR0 + i, core2_vpmu_cxt->arch_msr_pair[i].counter); - - wrmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, core2_vpmu_cxt->fixed_ctrl); - wrmsrl(MSR_IA32_DS_AREA, core2_vpmu_cxt->ds_area); - wrmsrl(MSR_IA32_PEBS_ENABLE, core2_vpmu_cxt->pebs_enable); - - for ( i = 0; i < arch_pmc_cnt; i++ ) - wrmsrl(MSR_P6_EVNTSEL0+i, core2_vpmu_cxt->arch_msr_pair[i].control); - - if ( !is_hvm_domain(v->domain) ) - { - wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, core2_vpmu_cxt->global_ovf_ctrl); - wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl); - } -} - -static void core2_vpmu_load(struct vcpu *v) -{ - struct vpmu_struct *vpmu = vcpu_vpmu(v); - - if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) ) - return; - - __core2_vpmu_load(v); -} - -static int core2_vpmu_alloc_resource(struct vcpu *v) -{ - struct vpmu_struct *vpmu = vcpu_vpmu(v); - struct core2_vpmu_context *core2_vpmu_cxt; - - if ( is_hvm_domain(v->domain) ) - { - if ( !acquire_pmu_ownership(PMU_OWNER_HVM) ) - goto out_err; - - wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0); - if ( vmx_add_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL) ) - goto out_err; - - if ( vmx_add_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL) ) - goto out_err; - vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, - core2_calc_intial_glb_ctrl_msr()); - - core2_vpmu_cxt = xzalloc_bytes(sizeof(struct core2_vpmu_context)); - if ( !core2_vpmu_cxt ) - goto out_err; - } - else - { - core2_vpmu_cxt = &v->arch.vpmu.xenpmu_data->pmu.intel; - vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED); - } - - vpmu->context = (void *)core2_vpmu_cxt; - - vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED); - - return 1; - -out_err: - vmx_rm_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL); - vmx_rm_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL); - release_pmu_ownship(PMU_OWNER_HVM); - - printk("Failed to allocate VPMU resources for domain %u vcpu %u\n", - v->vcpu_id, v->domain->domain_id); - - return 0; -} - -static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index) -{ - struct vpmu_struct *vpmu = vcpu_vpmu(current); - - if ( !is_core2_vpmu_msr(msr_index, type, index) ) - return 0; - - if ( unlikely(!vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED)) && - !core2_vpmu_alloc_resource(current) ) - return 0; - - /* Do the lazy load staff. 
*/ - if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) ) - { - __core2_vpmu_load(current); - vpmu_set(vpmu, VPMU_CONTEXT_LOADED); - if ( cpu_has_vmx_msr_bitmap && is_hvm_domain(current->domain) ) - core2_vpmu_set_msr_bitmap(current->arch.hvm_vmx.msr_bitmap); - } - return 1; -} - -static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) -{ - u64 global_ctrl, non_global_ctrl; - unsigned pmu_enable = 0; - int i, tmp; - int type = -1, index = -1; - struct vcpu *v = current; - struct vpmu_struct *vpmu = vcpu_vpmu(v); - struct core2_vpmu_context *core2_vpmu_cxt = NULL; - - if ( !core2_vpmu_msr_common_check(msr, &type, &index) ) - { - /* Special handling for BTS */ - if ( msr == MSR_IA32_DEBUGCTLMSR ) - { - uint64_t supported = IA32_DEBUGCTLMSR_TR | IA32_DEBUGCTLMSR_BTS | - IA32_DEBUGCTLMSR_BTINT; - - if ( cpu_has(¤t_cpu_data, X86_FEATURE_DSCPL) ) - supported |= IA32_DEBUGCTLMSR_BTS_OFF_OS | - IA32_DEBUGCTLMSR_BTS_OFF_USR; - if ( msr_content & supported ) - { - if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) ) - return 1; - gdprintk(XENLOG_WARNING, "Debug Store is not supported on this cpu\n"); - - if ( is_hvm_domain(v->domain) ) - hvm_inject_hw_exception(TRAP_gp_fault, 0); - else - send_guest_trap(v->domain, v->vcpu_id, TRAP_gp_fault); - - return 0; - } - } - return 0; - } - - core2_vpmu_cxt = vpmu->context; - switch ( msr ) - { - case MSR_CORE_PERF_GLOBAL_OVF_CTRL: - core2_vpmu_cxt->global_ovf_status &= ~msr_content; - core2_vpmu_cxt->global_ovf_ctrl = msr_content; - return 1; - case MSR_CORE_PERF_GLOBAL_STATUS: - gdprintk(XENLOG_INFO, "Can not write readonly MSR: " - "MSR_PERF_GLOBAL_STATUS(0x38E)!\n"); - if ( is_hvm_domain(v->domain) ) - hvm_inject_hw_exception(TRAP_gp_fault, 0); - else - send_guest_trap(v->domain, v->vcpu_id, TRAP_gp_fault); - return 1; - case MSR_IA32_PEBS_ENABLE: - if ( msr_content & 1 ) - gdprintk(XENLOG_WARNING, "Guest is trying to enable PEBS, " - "which is not supported.\n"); - core2_vpmu_cxt->pebs_enable = msr_content; - return 1; - case MSR_IA32_DS_AREA: - if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_DS) ) - { - if ( !is_canonical_address(msr_content) ) - { - gdprintk(XENLOG_WARNING, - "Illegal address for IA32_DS_AREA: %#" PRIx64 "x\n", - msr_content); - if ( is_hvm_domain(v->domain) ) - hvm_inject_hw_exception(TRAP_gp_fault, 0); - else - send_guest_trap(v->domain, v->vcpu_id, TRAP_gp_fault); - return 1; - } - core2_vpmu_cxt->ds_area = msr_content; - break; - } - gdprintk(XENLOG_WARNING, "Guest setting of DTS is ignored.\n"); - return 1; - case MSR_CORE_PERF_GLOBAL_CTRL: - global_ctrl = msr_content; - for ( i = 0; i < arch_pmc_cnt; i++ ) - { - rdmsrl(MSR_P6_EVNTSEL0+i, non_global_ctrl); - pmu_enable += global_ctrl & (non_global_ctrl >> 22) & 1; - global_ctrl >>= 1; - } - - rdmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, non_global_ctrl); - global_ctrl = msr_content >> 32; - for ( i = 0; i < fixed_pmc_cnt; i++ ) - { - pmu_enable += (global_ctrl & 1) & ((non_global_ctrl & 0x3)? 1: 0); - non_global_ctrl >>= FIXED_CTR_CTRL_BITS; - global_ctrl >>= 1; - } - core2_vpmu_cxt->global_ctrl = msr_content; - break; - case MSR_CORE_PERF_FIXED_CTR_CTRL: - non_global_ctrl = msr_content; - if ( is_hvm_domain(v->domain) ) - vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl); - else - rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl); - global_ctrl >>= 32; - for ( i = 0; i < fixed_pmc_cnt; i++ ) - { - pmu_enable += (global_ctrl & 1) & ((non_global_ctrl & 0x3)? 
1: 0); - non_global_ctrl >>= 4; - global_ctrl >>= 1; - } - core2_vpmu_cxt->fixed_ctrl = msr_content; - break; - default: - tmp = msr - MSR_P6_EVNTSEL0; - if ( tmp >= 0 && tmp < arch_pmc_cnt ) - { - if ( is_hvm_domain(v->domain) ) - vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl); - else - rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl); - core2_vpmu_cxt->arch_msr_pair[tmp].control = msr_content; - for ( i = 0; i < arch_pmc_cnt && !pmu_enable; i++ ) - pmu_enable += (global_ctrl >> i) & - (core2_vpmu_cxt->arch_msr_pair[i].control >> 22) & 1; - } - } - - pmu_enable += (core2_vpmu_cxt->ds_area != 0); - if ( pmu_enable ) - vpmu_set(vpmu, VPMU_RUNNING); - else - vpmu_reset(vpmu, VPMU_RUNNING); - - if ( is_hvm_domain(v->domain) ) - { - /* Setup LVTPC in local apic */ - if ( vpmu_is_set(vpmu, VPMU_RUNNING) && - is_vlapic_lvtpc_enabled(vcpu_vlapic(v)) ) - { - apic_write_around(APIC_LVTPC, APIC_DM_NMI); - vpmu->hw_lapic_lvtpc = APIC_DM_NMI; - } - else - { - apic_write_around(APIC_LVTPC, APIC_DM_NMI | APIC_LVT_MASKED); - vpmu->hw_lapic_lvtpc = APIC_DM_NMI | APIC_LVT_MASKED; - } - } - - if ( type != MSR_TYPE_GLOBAL ) - { - u64 mask; - int inject_gp = 0; - switch ( type ) - { - case MSR_TYPE_ARCH_CTRL: /* MSR_P6_EVNTSEL[0,...] */ - mask = ~((1ull << 32) - 1); - if (msr_content & mask) - inject_gp = 1; - break; - case MSR_TYPE_CTRL: /* IA32_FIXED_CTR_CTRL */ - if ( msr == MSR_IA32_DS_AREA ) - break; - /* 4 bits per counter, currently 3 fixed counters implemented. */ - mask = ~((1ull << (fixed_pmc_cnt * FIXED_CTR_CTRL_BITS)) - 1); - if (msr_content & mask) - inject_gp = 1; - break; - case MSR_TYPE_COUNTER: /* IA32_FIXED_CTR[0-2] */ - mask = ~((1ull << core2_get_bitwidth_fix_count()) - 1); - if (msr_content & mask) - inject_gp = 1; - break; - } - - if (inject_gp) - { - if ( is_hvm_domain(v->domain) ) - hvm_inject_hw_exception(TRAP_gp_fault, 0); - else - send_guest_trap(v->domain, v->vcpu_id, TRAP_gp_fault); - } - else - wrmsrl(msr, msr_content); - } - else - { - if ( is_hvm_domain(v->domain) ) - vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content); - else - wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, msr_content); - } - - return 1; -} - -static int core2_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content) -{ - int type = -1, index = -1; - struct vcpu *v = current; - struct vpmu_struct *vpmu = vcpu_vpmu(v); - struct core2_vpmu_context *core2_vpmu_cxt = NULL; - - if ( core2_vpmu_msr_common_check(msr, &type, &index) ) - { - core2_vpmu_cxt = vpmu->context; - switch ( msr ) - { - case MSR_CORE_PERF_GLOBAL_OVF_CTRL: - *msr_content = 0; - break; - case MSR_CORE_PERF_GLOBAL_STATUS: - *msr_content = core2_vpmu_cxt->global_ovf_status; - break; - case MSR_CORE_PERF_GLOBAL_CTRL: - if ( is_hvm_domain(v->domain) ) - vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content); - else - rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, *msr_content); - break; - default: - rdmsrl(msr, *msr_content); - } - } - else - { - /* Extension for BTS */ - if ( msr == MSR_IA32_MISC_ENABLE ) - { - if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) ) - *msr_content &= ~MSR_IA32_MISC_ENABLE_BTS_UNAVAIL; - } - else - return 0; - } - - return 1; -} - -static void core2_vpmu_do_cpuid(unsigned int input, - unsigned int *eax, unsigned int *ebx, - unsigned int *ecx, unsigned int *edx) -{ - if (input == 0x1) - { - struct vpmu_struct *vpmu = vcpu_vpmu(current); - - if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_DS) ) - { - /* Switch on the ''Debug Store'' feature in CPUID.EAX[1]:EDX[21] */ - *edx |= cpufeat_mask(X86_FEATURE_DS); - if ( cpu_has(¤t_cpu_data, 
X86_FEATURE_DTES64) ) - *ecx |= cpufeat_mask(X86_FEATURE_DTES64); - if ( cpu_has(¤t_cpu_data, X86_FEATURE_DSCPL) ) - *ecx |= cpufeat_mask(X86_FEATURE_DSCPL); - } - } - else if ( input == 0xa ) - { - /* Limit number of counters to max that we support */ - if ( ((*eax & PMU_GENERAL_NR_MASK) >> PMU_GENERAL_NR_SHIFT) > - XENPMU_CORE2_MAX_ARCH_PMCS ) - *eax = (*eax & ~PMU_GENERAL_NR_MASK) | - (XENPMU_CORE2_MAX_ARCH_PMCS << PMU_GENERAL_NR_SHIFT); - if ( ((*edx & PMU_FIXED_NR_MASK) >> PMU_FIXED_NR_SHIFT) > - XENPMU_CORE2_MAX_FIXED_PMCS ) - *eax = (*eax & ~PMU_FIXED_NR_MASK) | - (XENPMU_CORE2_MAX_FIXED_PMCS << PMU_FIXED_NR_SHIFT); - } -} - -/* Dump vpmu info on console, called in the context of keyhandler ''q''. */ -static void core2_vpmu_dump(struct vcpu *v) -{ - struct vpmu_struct *vpmu = vcpu_vpmu(v); - int i; - struct core2_vpmu_context *core2_vpmu_cxt = NULL; - u64 val; - - if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) - return; - - if ( !vpmu_is_set(vpmu, VPMU_RUNNING) ) - { - if ( vpmu_set(vpmu, VPMU_CONTEXT_LOADED) ) - printk(" vPMU loaded\n"); - else - printk(" vPMU allocated\n"); - return; - } - - printk(" vPMU running\n"); - core2_vpmu_cxt = vpmu->context; - - /* Print the contents of the counter and its configuration msr. */ - for ( i = 0; i < arch_pmc_cnt; i++ ) - printk(" general_%d: 0x%016lx ctrl: 0x%016lx\n", - i, core2_vpmu_cxt->arch_msr_pair[i].counter, - core2_vpmu_cxt->arch_msr_pair[i].control); - - /* - * The configuration of the fixed counter is 4 bits each in the - * MSR_CORE_PERF_FIXED_CTR_CTRL. - */ - val = core2_vpmu_cxt->fixed_ctrl; - for ( i = 0; i < fixed_pmc_cnt; i++ ) - { - printk(" fixed_%d: 0x%016lx ctrl: 0x%lx\n", - i, core2_vpmu_cxt->fix_counters[i], - val & FIXED_CTR_CTRL_MASK); - val >>= FIXED_CTR_CTRL_BITS; - } -} - -static int core2_vpmu_do_interrupt(struct cpu_user_regs *regs) -{ - struct vcpu *v = current; - u64 msr_content; - struct vpmu_struct *vpmu = vcpu_vpmu(v); - struct core2_vpmu_context *core2_vpmu_cxt = vpmu->context; - - rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content); - if ( msr_content ) - { - if ( is_pmc_quirk ) - handle_pmc_quirk(msr_content); - core2_vpmu_cxt->global_ovf_status |= msr_content; - msr_content = 0xC000000700000000 | ((1 << arch_pmc_cnt) - 1); - wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content); - } - else - { - /* No PMC overflow but perhaps a Trace Message interrupt. */ - msr_content = __vmread(GUEST_IA32_DEBUGCTL); - if ( !(msr_content & IA32_DEBUGCTLMSR_TR) ) - return 0; - } - - /* HW sets the MASK bit when performance counter interrupt occurs*/ - vpmu->hw_lapic_lvtpc = apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED; - apic_write_around(APIC_LVTPC, vpmu->hw_lapic_lvtpc); - - return 1; -} - -static int core2_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags) -{ - struct vpmu_struct *vpmu = vcpu_vpmu(v); - u64 msr_content; - struct cpuinfo_x86 *c = ¤t_cpu_data; - - if ( !(vpmu_flags & XENPMU_FLAGS_INTEL_BTS) ) - goto func_out; - /* Check the ''Debug Store'' feature in the CPUID.EAX[1]:EDX[21] */ - if ( cpu_has(c, X86_FEATURE_DS) ) - { - if ( !cpu_has(c, X86_FEATURE_DTES64) ) - { - printk(XENLOG_G_WARNING "CPU doesn''t support 64-bit DS Area" - " - Debug Store disabled for d%d:v%d\n", - v->domain->domain_id, v->vcpu_id); - goto func_out; - } - vpmu_set(vpmu, VPMU_CPU_HAS_DS); - rdmsrl(MSR_IA32_MISC_ENABLE, msr_content); - if ( msr_content & MSR_IA32_MISC_ENABLE_BTS_UNAVAIL ) - { - /* If BTS_UNAVAIL is set reset the DS feature. 
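[For readers decoding the literal that core2_vpmu_do_interrupt() above writes to
MSR_CORE_PERF_GLOBAL_OVF_CTRL, here is the same value built from its individual bits;
a standalone sketch, with the helper name and bit comments mine rather than the patch's:

    /* 0xC000000700000000 | ((1 << arch_pmc_cnt) - 1), spelled out. */
    static inline uint64_t ovf_clear_mask(unsigned int nr_gp_counters)
    {
        return (1ULL << 63) |                  /* clear CondChgd                 */
               (1ULL << 62) |                  /* clear DS buffer overflow       */
               (7ULL << 32) |                  /* fixed counters 0-2 overflow    */
               ((1ULL << nr_gp_counters) - 1); /* general counter overflow bits  */
    }
]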
*/ - vpmu_reset(vpmu, VPMU_CPU_HAS_DS); - printk(XENLOG_G_WARNING "CPU has set BTS_UNAVAIL" - " - Debug Store disabled for d%d:v%d\n", - v->domain->domain_id, v->vcpu_id); - } - else - { - vpmu_set(vpmu, VPMU_CPU_HAS_BTS); - if ( !cpu_has(c, X86_FEATURE_DSCPL) ) - printk(XENLOG_G_INFO - "vpmu: CPU doesn''t support CPL-Qualified BTS\n"); - printk("******************************************************\n"); - printk("** WARNING: Emulation of BTS Feature is switched on **\n"); - printk("** Using this processor feature in a virtualized **\n"); - printk("** environment is not 100%% safe. **\n"); - printk("** Setting the DS buffer address with wrong values **\n"); - printk("** may lead to hypervisor hangs or crashes. **\n"); - printk("** It is NOT recommended for production use! **\n"); - printk("******************************************************\n"); - } - } -func_out: - - arch_pmc_cnt = core2_get_arch_pmc_count(); - if ( arch_pmc_cnt > XENPMU_CORE2_MAX_ARCH_PMCS ) - arch_pmc_cnt = XENPMU_CORE2_MAX_ARCH_PMCS; - fixed_pmc_cnt = core2_get_fixed_pmc_count(); - if ( fixed_pmc_cnt > XENPMU_CORE2_MAX_FIXED_PMCS ) - fixed_pmc_cnt = XENPMU_CORE2_MAX_FIXED_PMCS; - check_pmc_quirk(); - - /* PV domains can allocate resources immediately */ - if ( !is_hvm_domain(v->domain) ) - if ( !core2_vpmu_alloc_resource(v) ) - return 1; - - return 0; -} - -static void core2_vpmu_destroy(struct vcpu *v) -{ - struct vpmu_struct *vpmu = vcpu_vpmu(v); - - if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) - return; - - if ( is_hvm_domain(v->domain) ) - { - xfree(vpmu->context); - if ( cpu_has_vmx_msr_bitmap ) - core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap); - } - - release_pmu_ownship(PMU_OWNER_HVM); - vpmu_clear(vpmu); -} - -struct arch_vpmu_ops core2_vpmu_ops = { - .do_wrmsr = core2_vpmu_do_wrmsr, - .do_rdmsr = core2_vpmu_do_rdmsr, - .do_interrupt = core2_vpmu_do_interrupt, - .do_cpuid = core2_vpmu_do_cpuid, - .arch_vpmu_destroy = core2_vpmu_destroy, - .arch_vpmu_save = core2_vpmu_save, - .arch_vpmu_load = core2_vpmu_load, - .arch_vpmu_dump = core2_vpmu_dump -}; - -static void core2_no_vpmu_do_cpuid(unsigned int input, - unsigned int *eax, unsigned int *ebx, - unsigned int *ecx, unsigned int *edx) -{ - /* - * As in this case the vpmu is not enabled reset some bits in the - * architectural performance monitoring related part. - */ - if ( input == 0xa ) - { - *eax &= ~PMU_VERSION_MASK; - *eax &= ~PMU_GENERAL_NR_MASK; - *eax &= ~PMU_GENERAL_WIDTH_MASK; - - *edx &= ~PMU_FIXED_NR_MASK; - *edx &= ~PMU_FIXED_WIDTH_MASK; - } -} - -/* - * If its a vpmu msr set it to 0. - */ -static int core2_no_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content) -{ - int type = -1, index = -1; - if ( !is_core2_vpmu_msr(msr, &type, &index) ) - return 0; - *msr_content = 0; - return 1; -} - -/* - * These functions are used in case vpmu is not enabled. 
- */ -struct arch_vpmu_ops core2_no_vpmu_ops = { - .do_rdmsr = core2_no_vpmu_do_rdmsr, - .do_cpuid = core2_no_vpmu_do_cpuid, -}; - -int vmx_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags) -{ - struct vpmu_struct *vpmu = vcpu_vpmu(v); - uint8_t family = current_cpu_data.x86; - uint8_t cpu_model = current_cpu_data.x86_model; - int ret = 0; - - vpmu->arch_vpmu_ops = &core2_no_vpmu_ops; - if ( vpmu_flags == XENPMU_MODE_OFF ) - return 0; - - if ( family == 6 ) - { - switch ( cpu_model ) - { - /* Core2: */ - case 0x0f: /* original 65 nm celeron/pentium/core2/xeon, "Merom"/"Conroe" */ - case 0x16: /* single-core 65 nm celeron/core2solo "Merom-L"/"Conroe-L" */ - case 0x17: /* 45 nm celeron/core2/xeon "Penryn"/"Wolfdale" */ - case 0x1d: /* six-core 45 nm xeon "Dunnington" */ - - case 0x2a: /* SandyBridge */ - case 0x2d: /* SandyBridge, "Romley-EP" */ - - /* Nehalem: */ - case 0x1a: /* 45 nm nehalem, "Bloomfield" */ - case 0x1e: /* 45 nm nehalem, "Lynnfield", "Clarksfield", "Jasper Forest" */ - case 0x2e: /* 45 nm nehalem-ex, "Beckton" */ - - /* Westmere: */ - case 0x25: /* 32 nm nehalem, "Clarkdale", "Arrandale" */ - case 0x2c: /* 32 nm nehalem, "Gulftown", "Westmere-EP" */ - case 0x27: /* 32 nm Westmere-EX */ - - case 0x3a: /* IvyBridge */ - case 0x3e: /* IvyBridge EP */ - case 0x3c: /* Haswell */ - ret = core2_vpmu_initialise(v, vpmu_flags); - if ( !ret ) - vpmu->arch_vpmu_ops = &core2_vpmu_ops; - return ret; - } - } - - printk("VPMU: Initialization failed. " - "Intel processor family %d model %d has not " - "been supported\n", family, cpu_model); - return -EINVAL; -} - diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c deleted file mode 100644 index 1ea3a96..0000000 --- a/xen/arch/x86/hvm/vpmu.c +++ /dev/null @@ -1,549 +0,0 @@ -/* - * vpmu.c: PMU virtualization for HVM domain. - * - * Copyright (c) 2007, Intel Corporation. - * - * This program is free software; you can redistribute it and/or modify it - * under the terms and conditions of the GNU General Public License, - * version 2, as published by the Free Software Foundation. - * - * This program is distributed in the hope it will be useful, but WITHOUT - * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or - * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for - * more details. - * - * You should have received a copy of the GNU General Public License along with - * this program; if not, write to the Free Software Foundation, Inc., 59 Temple - * Place - Suite 330, Boston, MA 02111-1307 USA. - * - * Author: Haitao Shan <haitao.shan@intel.com> - */ -#include <xen/config.h> -#include <xen/sched.h> -#include <xen/xenoprof.h> -#include <xen/event.h> -#include <xen/softirq.h> -#include <xen/hypercall.h> -#include <xen/guest_access.h> -#include <asm/regs.h> -#include <asm/types.h> -#include <asm/msr.h> -#include <asm/hvm/support.h> -#include <asm/hvm/vmx/vmx.h> -#include <asm/hvm/vmx/vmcs.h> -#include <asm/hvm/vpmu.h> -#include <asm/hvm/svm/svm.h> -#include <asm/hvm/svm/vmcb.h> -#include <asm/apic.h> -#include <asm/nmi.h> -#include <public/xenpmu.h> - -/* - * "vpmu" : vpmu generally enabled - * "vpmu=off" : vpmu generally disabled - * "vpmu=bts" : vpmu enabled and Intel BTS feature switched on. 
- */ -uint32_t __read_mostly vpmu_mode = XENPMU_MODE_OFF; -static void parse_vpmu_param(char *s); -custom_param("vpmu", parse_vpmu_param); - -static void vpmu_save_force(void *arg); -static DEFINE_PER_CPU(struct vcpu *, last_vcpu); - -static void __init parse_vpmu_param(char *s) -{ - switch ( parse_bool(s) ) - { - case 0: - break; - default: - if ( !strcmp(s, "bts") ) - vpmu_mode |= XENPMU_FLAGS_INTEL_BTS; - else if ( *s ) - { - printk("VPMU: unknown flag: %s - vpmu disabled!\n", s); - break; - } - /* fall through */ - case 1: - vpmu_mode |= XENPMU_MODE_ON; - break; - } -} - -static void vpmu_lvtpc_update(uint32_t val) -{ - struct vpmu_struct *vpmu = vcpu_vpmu(current); - - vpmu->hw_lapic_lvtpc = APIC_DM_NMI | (val & APIC_LVT_MASKED); - apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc); -} - -int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) -{ - struct vpmu_struct *vpmu = vcpu_vpmu(current); - - if ( (vpmu_mode & XENPMU_MODE_PRIV) && (current->domain != dom0) ) - return 0; - - if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_wrmsr ) - return vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content); - return 0; -} - -int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content) -{ - struct vpmu_struct *vpmu = vcpu_vpmu(current); - - if ( (vpmu_mode & XENPMU_MODE_PRIV) && (current->domain != dom0) ) - return 0; - - if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_rdmsr ) - return vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content); - return 0; -} - -int vpmu_do_interrupt(struct cpu_user_regs *regs) -{ - struct vcpu *v = current; - struct vpmu_struct *vpmu; - - - /* dom0 will handle this interrupt */ - if ( (vpmu_mode & XENPMU_MODE_PRIV) || - (v->domain->domain_id >= DOMID_FIRST_RESERVED) ) - { - if ( smp_processor_id() >= dom0->max_vcpus ) - return 0; - v = dom0->vcpu[smp_processor_id()]; - } - - vpmu = vcpu_vpmu(v); - if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) - return 0; - - if ( !is_hvm_domain(v->domain) || (vpmu_mode & XENPMU_MODE_PRIV) ) - { - /* PV guest or dom0 is doing system profiling */ - void *p; - struct cpu_user_regs *gregs; - - p = &v->arch.vpmu.xenpmu_data->pmu.regs; - - /* PV guest will be reading PMU MSRs from xenpmu_data */ - vpmu_save_force(v); - - /* Store appropriate registers in xenpmu_data - * - * Note: ''!current->is_running'' is possible when ''set_current(next)'' - * for the (HVM) guest has been called but ''reset_stack_and_jump()'' - * has not (i.e. the guest is not actually running yet). - */ - if ( !is_hvm_domain(current->domain) || - ((vpmu_mode & XENPMU_MODE_PRIV) && !current->is_running) ) - { - /* - * 32-bit dom0 cannot process Xen''s addresses (which are 64 bit) - * and therefore we treat it the same way as a non-priviledged - * PV 32-bit domain. 
- */ - if ( is_pv_32bit_domain(current->domain) ) - { - struct compat_cpu_user_regs cmp; - - gregs = guest_cpu_user_regs(); - XLAT_cpu_user_regs(&cmp, gregs); - memcpy(p, &cmp, sizeof(struct compat_cpu_user_regs)); - } - else if ( (current->domain != dom0) && !is_idle_vcpu(current) && - !(vpmu_mode & XENPMU_MODE_PRIV) ) - { - /* PV guest */ - gregs = guest_cpu_user_regs(); - memcpy(p, gregs, sizeof(struct cpu_user_regs)); - } - else - memcpy(p, regs, sizeof(struct cpu_user_regs)); - } - else - { - /* HVM guest */ - struct segment_register cs; - - gregs = guest_cpu_user_regs(); - hvm_get_segment_register(current, x86_seg_cs, &cs); - gregs->cs = cs.attr.fields.dpl; - - memcpy(p, gregs, sizeof(struct cpu_user_regs)); - } - - v->arch.vpmu.xenpmu_data->domain_id = current->domain->domain_id; - v->arch.vpmu.xenpmu_data->vcpu_id = current->vcpu_id; - v->arch.vpmu.xenpmu_data->pcpu_id = smp_processor_id(); - - raise_softirq(PMU_SOFTIRQ); - vpmu_set(vpmu, VPMU_WAIT_FOR_FLUSH); - - return 1; - } - else if ( vpmu->arch_vpmu_ops ) - { - /* HVM guest */ - struct vlapic *vlapic; - u32 vlapic_lvtpc; - unsigned char int_vec; - - if ( !vpmu->arch_vpmu_ops->do_interrupt(regs) ) - return 0; - - vlapic = vcpu_vlapic(v); - if ( !is_vlapic_lvtpc_enabled(vlapic) ) - return 1; - - vlapic_lvtpc = vlapic_get_reg(vlapic, APIC_LVTPC); - int_vec = vlapic_lvtpc & APIC_VECTOR_MASK; - - if ( GET_APIC_DELIVERY_MODE(vlapic_lvtpc) == APIC_MODE_FIXED ) - vlapic_set_irq(vcpu_vlapic(v), int_vec, 0); - else - v->nmi_pending = 1; - return 1; - } - - return 0; -} - -void vpmu_do_cpuid(unsigned int input, - unsigned int *eax, unsigned int *ebx, - unsigned int *ecx, unsigned int *edx) -{ - struct vpmu_struct *vpmu = vcpu_vpmu(current); - - if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_cpuid ) - vpmu->arch_vpmu_ops->do_cpuid(input, eax, ebx, ecx, edx); -} - -static void vpmu_save_force(void *arg) -{ - struct vcpu *v = (struct vcpu *)arg; - struct vpmu_struct *vpmu = vcpu_vpmu(v); - - if ( !vpmu_is_set_all(vpmu, VPMU_CONTEXT_ALLOCATED | VPMU_CONTEXT_LOADED) ) - return; - - vpmu_set(vpmu, VPMU_CONTEXT_SAVE); - - if ( vpmu->arch_vpmu_ops ) - (void)vpmu->arch_vpmu_ops->arch_vpmu_save(v); - - vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED); - - per_cpu(last_vcpu, smp_processor_id()) = NULL; -} - -void vpmu_save(struct vcpu *v) -{ - struct vpmu_struct *vpmu = vcpu_vpmu(v); - int pcpu = smp_processor_id(); - - if ( !vpmu_is_set_all(vpmu, VPMU_CONTEXT_ALLOCATED | VPMU_CONTEXT_LOADED) ) - return; - - vpmu->last_pcpu = pcpu; - per_cpu(last_vcpu, pcpu) = v; - - if ( vpmu->arch_vpmu_ops ) - if ( vpmu->arch_vpmu_ops->arch_vpmu_save(v) ) - vpmu_reset(vpmu, VPMU_CONTEXT_LOADED); - - apic_write(APIC_LVTPC, APIC_DM_NMI | APIC_LVT_MASKED); -} - -void vpmu_load(struct vcpu *v) -{ - struct vpmu_struct *vpmu = vcpu_vpmu(v); - int pcpu = smp_processor_id(); - struct vcpu *prev = NULL; - - if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) - return; - - /* First time this VCPU is running here */ - if ( vpmu->last_pcpu != pcpu ) - { - /* - * Get the context from last pcpu that we ran on. Note that if another - * VCPU is running there it must have saved this VPCU''s context before - * startig to run (see below). - * There should be no race since remote pcpu will disable interrupts - * before saving the context. 
- */ - if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) ) - on_selected_cpus(cpumask_of(vpmu->last_pcpu), - vpmu_save_force, (void *)v, 1); - } - - /* Prevent forced context save from remote CPU */ - local_irq_disable(); - - prev = per_cpu(last_vcpu, pcpu); - - if ( prev != v && prev ) - { - vpmu = vcpu_vpmu(prev); - - /* Someone ran here before us */ - vpmu_save_force(prev); - - vpmu = vcpu_vpmu(v); - } - - local_irq_enable(); - - /* Only when PMU is counting, we load PMU context immediately. */ - if ( !vpmu_is_set(vpmu, VPMU_RUNNING) ) - return; - - if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_load ) - { - apic_write_around(APIC_LVTPC, vpmu->hw_lapic_lvtpc); - vpmu->arch_vpmu_ops->arch_vpmu_load(v); - } - - /* - * PMU interrupt may happen while loading the context above. That - * may cause vpmu_save_force() in the handler so we we don''t - * want to mark the context as loaded. - */ - if ( !vpmu_is_set(vpmu, VPMU_WAIT_FOR_FLUSH) ) - vpmu_set(vpmu, VPMU_CONTEXT_LOADED); -} - -void vpmu_initialise(struct vcpu *v) -{ - struct vpmu_struct *vpmu = vcpu_vpmu(v); - uint8_t vendor = current_cpu_data.x86_vendor; - - if ( vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) - vpmu_destroy(v); - vpmu_clear(vpmu); - vpmu->context = NULL; - - switch ( vendor ) - { - case X86_VENDOR_AMD: - if ( svm_vpmu_initialise(v, vpmu_mode) != 0 ) - vpmu_mode = XENPMU_MODE_OFF; - break; - - case X86_VENDOR_INTEL: - if ( vmx_vpmu_initialise(v, vpmu_mode) != 0 ) - vpmu_mode = XENPMU_MODE_OFF; - break; - - default: - printk("VPMU: Initialization failed. " - "Unknown CPU vendor %d\n", vendor); - vpmu_mode = XENPMU_MODE_OFF; - break; - } -} - -void vpmu_destroy(struct vcpu *v) -{ - struct vpmu_struct *vpmu = vcpu_vpmu(v); - - if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_destroy ) - { - /* Unload VPMU first. This will stop counters from running */ - on_selected_cpus(cpumask_of(vcpu_vpmu(v)->last_pcpu), - vpmu_save_force, (void *)v, 1); - - vpmu->arch_vpmu_ops->arch_vpmu_destroy(v); - } -} - -/* Dump some vpmu informations on console. Used in keyhandler dump_domains(). 
*/ -void vpmu_dump(struct vcpu *v) -{ - struct vpmu_struct *vpmu = vcpu_vpmu(v); - - if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_dump ) - vpmu->arch_vpmu_ops->arch_vpmu_dump(v); -} - -int pmu_nmi_interrupt(struct cpu_user_regs *regs, int cpu) -{ - return vpmu_do_interrupt(regs); -} - -/* Process the softirq set by PMU NMI handler */ -void pmu_virq(void) -{ - struct vcpu *v = current; - - if ( (vpmu_mode & XENPMU_MODE_PRIV) || - (v->domain->domain_id >= DOMID_FIRST_RESERVED) ) - { - if ( smp_processor_id() >= dom0->max_vcpus ) - { - printk(KERN_WARNING "PMU softirq on unexpected processor %d\n", - smp_processor_id()); - return; - } - v = dom0->vcpu[smp_processor_id()]; - } - - send_guest_vcpu_virq(v, VIRQ_XENPMU); -} - -static int pvpmu_init(struct domain *d, xenpmu_params_t *params) -{ - struct vcpu *v; - uint64_t mfn = params->val; - static int pvpmu_initted = 0; - - if ( params->vcpu < 0 || params->vcpu >= d->max_vcpus ) - return -EINVAL; - - if ( !pvpmu_initted ) - { - if (reserve_lapic_nmi() == 0) - set_nmi_callback(pmu_nmi_interrupt); - else - { - printk("Failed to reserve PMU NMI\n"); - return -EBUSY; - } - open_softirq(PMU_SOFTIRQ, pmu_virq); - pvpmu_initted = 1; - } - - if ( !mfn_valid(mfn) || - !get_page_and_type(mfn_to_page(mfn), d, PGT_writable_page) ) - return -EINVAL; - - v = d->vcpu[params->vcpu]; - v->arch.vpmu.xenpmu_data = map_domain_page_global(mfn); - memset(v->arch.vpmu.xenpmu_data, 0, PAGE_SIZE); - - vpmu_initialise(v); - - return 0; -} - -static void pvpmu_finish(struct domain *d, xenpmu_params_t *params) -{ - struct vcpu *v; - uint64_t mfn; - - if ( params->vcpu < 0 || params->vcpu >= d->max_vcpus ) - return; - - v = d->vcpu[params->vcpu]; - if (v != current) - vcpu_pause(v); - - if ( v->arch.vpmu.xenpmu_data ) - { - mfn = domain_page_map_to_mfn(v->arch.vpmu.xenpmu_data); - if ( mfn_valid(mfn) ) - { - unmap_domain_page_global(v->arch.vpmu.xenpmu_data); - put_page_and_type(mfn_to_page(mfn)); - } - } - vpmu_destroy(v); - - if (v != current) - vcpu_unpause(v); -} - -long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg) -{ - int ret = -EINVAL; - xenpmu_params_t pmu_params; - uint32_t mode, flags; - - switch ( op ) - { - case XENPMU_mode_set: - if ( !is_control_domain(current->domain) ) - return -EPERM; - - if ( copy_from_guest(&pmu_params, arg, 1) ) - return -EFAULT; - - mode = (uint32_t)pmu_params.val & XENPMU_MODE_MASK; - if ( (mode & ~(XENPMU_MODE_ON | XENPMU_MODE_PRIV)) || - ((mode & XENPMU_MODE_ON) && (mode & XENPMU_MODE_PRIV)) ) - return -EINVAL; - - vpmu_mode &= ~XENPMU_MODE_MASK; - vpmu_mode |= mode; - - ret = 0; - break; - - case XENPMU_mode_get: - pmu_params.val = vpmu_mode & XENPMU_MODE_MASK; - pmu_params.version.maj = XENPMU_VER_MAJ; - pmu_params.version.min = XENPMU_VER_MIN; - if ( copy_to_guest(arg, &pmu_params, 1) ) - return -EFAULT; - ret = 0; - break; - - case XENPMU_flags_set: - if ( !is_control_domain(current->domain) ) - return -EPERM; - - if ( copy_from_guest(&pmu_params, arg, 1) ) - return -EFAULT; - - flags = (uint64_t)pmu_params.val & XENPMU_FLAGS_MASK; - if ( flags & ~XENPMU_FLAGS_INTEL_BTS ) - return -EINVAL; - - vpmu_mode &= ~XENPMU_FLAGS_MASK; - vpmu_mode |= flags; - - ret = 0; - break; - - case XENPMU_flags_get: - pmu_params.val = vpmu_mode & XENPMU_FLAGS_MASK; - if ( copy_to_guest(arg, &pmu_params, 1) ) - return -EFAULT; - ret = 0; - break; - - case XENPMU_init: - if ( copy_from_guest(&pmu_params, arg, 1) ) - return -EFAULT; - ret = pvpmu_init(current->domain, &pmu_params); - break; - - case XENPMU_finish: - if ( 
copy_from_guest(&pmu_params, arg, 1) ) - return -EFAULT; - pvpmu_finish(current->domain, &pmu_params); - break; - - case XENPMU_lvtpc_set: - if ( copy_from_guest(&pmu_params, arg, 1) ) - return -EFAULT; - - vpmu_lvtpc_update((uint32_t)pmu_params.val); - ret = 0; - break; - - case XENPMU_flush: - vpmu_reset(vcpu_vpmu(current), VPMU_WAIT_FOR_FLUSH); - vpmu_load(current); - ret = 0; - break; - } - - return ret; -} diff --git a/xen/arch/x86/oprofile/op_model_ppro.c b/xen/arch/x86/oprofile/op_model_ppro.c index 5aae2e7..bf5d9a5 100644 --- a/xen/arch/x86/oprofile/op_model_ppro.c +++ b/xen/arch/x86/oprofile/op_model_ppro.c @@ -19,7 +19,7 @@ #include <asm/processor.h> #include <asm/regs.h> #include <asm/current.h> -#include <asm/hvm/vpmu.h> +#include <asm/vpmu.h> #include "op_x86_model.h" #include "op_counter.h" diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c index f378a24..f969516 100644 --- a/xen/arch/x86/traps.c +++ b/xen/arch/x86/traps.c @@ -71,7 +71,7 @@ #include <asm/apic.h> #include <asm/mc146818rtc.h> #include <asm/hpet.h> -#include <asm/hvm/vpmu.h> +#include <asm/vpmu.h> #include <public/arch-x86/cpuid.h> #include <xsm/xsm.h> diff --git a/xen/arch/x86/vpmu.c b/xen/arch/x86/vpmu.c new file mode 100644 index 0000000..763fe5c --- /dev/null +++ b/xen/arch/x86/vpmu.c @@ -0,0 +1,549 @@ +/* + * vpmu.c: PMU virtualization for HVM domain. + * + * Copyright (c) 2007, Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple + * Place - Suite 330, Boston, MA 02111-1307 USA. + * + * Author: Haitao Shan <haitao.shan@intel.com> + */ +#include <xen/config.h> +#include <xen/sched.h> +#include <xen/xenoprof.h> +#include <xen/event.h> +#include <xen/softirq.h> +#include <xen/hypercall.h> +#include <xen/guest_access.h> +#include <asm/regs.h> +#include <asm/types.h> +#include <asm/msr.h> +#include <asm/hvm/support.h> +#include <asm/hvm/vmx/vmx.h> +#include <asm/hvm/vmx/vmcs.h> +#include <asm/vpmu.h> +#include <asm/hvm/svm/svm.h> +#include <asm/hvm/svm/vmcb.h> +#include <asm/apic.h> +#include <asm/nmi.h> +#include <public/xenpmu.h> + +/* + * "vpmu" : vpmu generally enabled + * "vpmu=off" : vpmu generally disabled + * "vpmu=bts" : vpmu enabled and Intel BTS feature switched on. 
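[To spell out how the option strings documented above map onto vpmu_mode, as derived
from parse_vpmu_param() a few lines below; the command lines are only examples:

    Xen command line        resulting vpmu_mode
    --------------------    ---------------------------------------
    (no "vpmu" option)      XENPMU_MODE_OFF
    vpmu                    XENPMU_MODE_ON
    vpmu=off                XENPMU_MODE_OFF
    vpmu=bts                XENPMU_MODE_ON | XENPMU_FLAGS_INTEL_BTS
    vpmu=<anything else>    XENPMU_MODE_OFF (unknown flag, warning printed)
]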
+ */ +uint32_t __read_mostly vpmu_mode = XENPMU_MODE_OFF; +static void parse_vpmu_param(char *s); +custom_param("vpmu", parse_vpmu_param); + +static void vpmu_save_force(void *arg); +static DEFINE_PER_CPU(struct vcpu *, last_vcpu); + +static void __init parse_vpmu_param(char *s) +{ + switch ( parse_bool(s) ) + { + case 0: + break; + default: + if ( !strcmp(s, "bts") ) + vpmu_mode |= XENPMU_FLAGS_INTEL_BTS; + else if ( *s ) + { + printk("VPMU: unknown flag: %s - vpmu disabled!\n", s); + break; + } + /* fall through */ + case 1: + vpmu_mode |= XENPMU_MODE_ON; + break; + } +} + +static void vpmu_lvtpc_update(uint32_t val) +{ + struct vpmu_struct *vpmu = vcpu_vpmu(current); + + vpmu->hw_lapic_lvtpc = APIC_DM_NMI | (val & APIC_LVT_MASKED); + apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc); +} + +int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) +{ + struct vpmu_struct *vpmu = vcpu_vpmu(current); + + if ( (vpmu_mode & XENPMU_MODE_PRIV) && (current->domain != dom0) ) + return 0; + + if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_wrmsr ) + return vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content); + return 0; +} + +int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content) +{ + struct vpmu_struct *vpmu = vcpu_vpmu(current); + + if ( (vpmu_mode & XENPMU_MODE_PRIV) && (current->domain != dom0) ) + return 0; + + if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_rdmsr ) + return vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content); + return 0; +} + +int vpmu_do_interrupt(struct cpu_user_regs *regs) +{ + struct vcpu *v = current; + struct vpmu_struct *vpmu; + + + /* dom0 will handle this interrupt */ + if ( (vpmu_mode & XENPMU_MODE_PRIV) || + (v->domain->domain_id >= DOMID_FIRST_RESERVED) ) + { + if ( smp_processor_id() >= dom0->max_vcpus ) + return 0; + v = dom0->vcpu[smp_processor_id()]; + } + + vpmu = vcpu_vpmu(v); + if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) + return 0; + + if ( !is_hvm_domain(v->domain) || (vpmu_mode & XENPMU_MODE_PRIV) ) + { + /* PV guest or dom0 is doing system profiling */ + void *p; + struct cpu_user_regs *gregs; + + p = &v->arch.vpmu.xenpmu_data->pmu.regs; + + /* PV guest will be reading PMU MSRs from xenpmu_data */ + vpmu_save_force(v); + + /* Store appropriate registers in xenpmu_data + * + * Note: ''!current->is_running'' is possible when ''set_current(next)'' + * for the (HVM) guest has been called but ''reset_stack_and_jump()'' + * has not (i.e. the guest is not actually running yet). + */ + if ( !is_hvm_domain(current->domain) || + ((vpmu_mode & XENPMU_MODE_PRIV) && !current->is_running) ) + { + /* + * 32-bit dom0 cannot process Xen''s addresses (which are 64 bit) + * and therefore we treat it the same way as a non-priviledged + * PV 32-bit domain. 
+ */ + if ( is_pv_32bit_domain(current->domain) ) + { + struct compat_cpu_user_regs cmp; + + gregs = guest_cpu_user_regs(); + XLAT_cpu_user_regs(&cmp, gregs); + memcpy(p, &cmp, sizeof(struct compat_cpu_user_regs)); + } + else if ( (current->domain != dom0) && !is_idle_vcpu(current) && + !(vpmu_mode & XENPMU_MODE_PRIV) ) + { + /* PV guest */ + gregs = guest_cpu_user_regs(); + memcpy(p, gregs, sizeof(struct cpu_user_regs)); + } + else + memcpy(p, regs, sizeof(struct cpu_user_regs)); + } + else + { + /* HVM guest */ + struct segment_register cs; + + gregs = guest_cpu_user_regs(); + hvm_get_segment_register(current, x86_seg_cs, &cs); + gregs->cs = cs.attr.fields.dpl; + + memcpy(p, gregs, sizeof(struct cpu_user_regs)); + } + + v->arch.vpmu.xenpmu_data->domain_id = current->domain->domain_id; + v->arch.vpmu.xenpmu_data->vcpu_id = current->vcpu_id; + v->arch.vpmu.xenpmu_data->pcpu_id = smp_processor_id(); + + raise_softirq(PMU_SOFTIRQ); + vpmu_set(vpmu, VPMU_WAIT_FOR_FLUSH); + + return 1; + } + else if ( vpmu->arch_vpmu_ops ) + { + /* HVM guest */ + struct vlapic *vlapic; + u32 vlapic_lvtpc; + unsigned char int_vec; + + if ( !vpmu->arch_vpmu_ops->do_interrupt(regs) ) + return 0; + + vlapic = vcpu_vlapic(v); + if ( !is_vlapic_lvtpc_enabled(vlapic) ) + return 1; + + vlapic_lvtpc = vlapic_get_reg(vlapic, APIC_LVTPC); + int_vec = vlapic_lvtpc & APIC_VECTOR_MASK; + + if ( GET_APIC_DELIVERY_MODE(vlapic_lvtpc) == APIC_MODE_FIXED ) + vlapic_set_irq(vcpu_vlapic(v), int_vec, 0); + else + v->nmi_pending = 1; + return 1; + } + + return 0; +} + +void vpmu_do_cpuid(unsigned int input, + unsigned int *eax, unsigned int *ebx, + unsigned int *ecx, unsigned int *edx) +{ + struct vpmu_struct *vpmu = vcpu_vpmu(current); + + if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_cpuid ) + vpmu->arch_vpmu_ops->do_cpuid(input, eax, ebx, ecx, edx); +} + +static void vpmu_save_force(void *arg) +{ + struct vcpu *v = (struct vcpu *)arg; + struct vpmu_struct *vpmu = vcpu_vpmu(v); + + if ( !vpmu_is_set_all(vpmu, VPMU_CONTEXT_ALLOCATED | VPMU_CONTEXT_LOADED) ) + return; + + vpmu_set(vpmu, VPMU_CONTEXT_SAVE); + + if ( vpmu->arch_vpmu_ops ) + (void)vpmu->arch_vpmu_ops->arch_vpmu_save(v); + + vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED); + + per_cpu(last_vcpu, smp_processor_id()) = NULL; +} + +void vpmu_save(struct vcpu *v) +{ + struct vpmu_struct *vpmu = vcpu_vpmu(v); + int pcpu = smp_processor_id(); + + if ( !vpmu_is_set_all(vpmu, VPMU_CONTEXT_ALLOCATED | VPMU_CONTEXT_LOADED) ) + return; + + vpmu->last_pcpu = pcpu; + per_cpu(last_vcpu, pcpu) = v; + + if ( vpmu->arch_vpmu_ops ) + if ( vpmu->arch_vpmu_ops->arch_vpmu_save(v) ) + vpmu_reset(vpmu, VPMU_CONTEXT_LOADED); + + apic_write(APIC_LVTPC, APIC_DM_NMI | APIC_LVT_MASKED); +} + +void vpmu_load(struct vcpu *v) +{ + struct vpmu_struct *vpmu = vcpu_vpmu(v); + int pcpu = smp_processor_id(); + struct vcpu *prev = NULL; + + if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) + return; + + /* First time this VCPU is running here */ + if ( vpmu->last_pcpu != pcpu ) + { + /* + * Get the context from last pcpu that we ran on. Note that if another + * VCPU is running there it must have saved this VPCU''s context before + * startig to run (see below). + * There should be no race since remote pcpu will disable interrupts + * before saving the context. 
+ */ + if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) ) + on_selected_cpus(cpumask_of(vpmu->last_pcpu), + vpmu_save_force, (void *)v, 1); + } + + /* Prevent forced context save from remote CPU */ + local_irq_disable(); + + prev = per_cpu(last_vcpu, pcpu); + + if ( prev != v && prev ) + { + vpmu = vcpu_vpmu(prev); + + /* Someone ran here before us */ + vpmu_save_force(prev); + + vpmu = vcpu_vpmu(v); + } + + local_irq_enable(); + + /* Only when PMU is counting, we load PMU context immediately. */ + if ( !vpmu_is_set(vpmu, VPMU_RUNNING) ) + return; + + if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_load ) + { + apic_write_around(APIC_LVTPC, vpmu->hw_lapic_lvtpc); + vpmu->arch_vpmu_ops->arch_vpmu_load(v); + } + + /* + * PMU interrupt may happen while loading the context above. That + * may cause vpmu_save_force() in the handler so we we don''t + * want to mark the context as loaded. + */ + if ( !vpmu_is_set(vpmu, VPMU_WAIT_FOR_FLUSH) ) + vpmu_set(vpmu, VPMU_CONTEXT_LOADED); +} + +void vpmu_initialise(struct vcpu *v) +{ + struct vpmu_struct *vpmu = vcpu_vpmu(v); + uint8_t vendor = current_cpu_data.x86_vendor; + + if ( vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) + vpmu_destroy(v); + vpmu_clear(vpmu); + vpmu->context = NULL; + + switch ( vendor ) + { + case X86_VENDOR_AMD: + if ( svm_vpmu_initialise(v, vpmu_mode) != 0 ) + vpmu_mode = XENPMU_MODE_OFF; + break; + + case X86_VENDOR_INTEL: + if ( vmx_vpmu_initialise(v, vpmu_mode) != 0 ) + vpmu_mode = XENPMU_MODE_OFF; + break; + + default: + printk("VPMU: Initialization failed. " + "Unknown CPU vendor %d\n", vendor); + vpmu_mode = XENPMU_MODE_OFF; + break; + } +} + +void vpmu_destroy(struct vcpu *v) +{ + struct vpmu_struct *vpmu = vcpu_vpmu(v); + + if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_destroy ) + { + /* Unload VPMU first. This will stop counters from running */ + on_selected_cpus(cpumask_of(vcpu_vpmu(v)->last_pcpu), + vpmu_save_force, (void *)v, 1); + + vpmu->arch_vpmu_ops->arch_vpmu_destroy(v); + } +} + +/* Dump some vpmu informations on console. Used in keyhandler dump_domains(). 
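[For orientation, the PV sample-delivery path implemented by the functions above and
below, end to end:

    PMU interrupt (NMI) on a physical CPU
      -> pmu_nmi_interrupt() -> vpmu_do_interrupt()
           * saves the VPMU context and copies the interrupted registers plus
             domain/vcpu/pcpu ids into the guest's xenpmu_data page
           * raises PMU_SOFTIRQ and sets VPMU_WAIT_FOR_FLUSH
      -> pmu_virq() (softirq handler) -> send_guest_vcpu_virq(v, VIRQ_XENPMU)
      -> the guest services VIRQ_XENPMU, reads xenpmu_data, and finally issues
         XENPMU_flush, which clears VPMU_WAIT_FOR_FLUSH and reloads the counters
         via vpmu_load().
]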
*/ +void vpmu_dump(struct vcpu *v) +{ + struct vpmu_struct *vpmu = vcpu_vpmu(v); + + if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->arch_vpmu_dump ) + vpmu->arch_vpmu_ops->arch_vpmu_dump(v); +} + +int pmu_nmi_interrupt(struct cpu_user_regs *regs, int cpu) +{ + return vpmu_do_interrupt(regs); +} + +/* Process the softirq set by PMU NMI handler */ +void pmu_virq(void) +{ + struct vcpu *v = current; + + if ( (vpmu_mode & XENPMU_MODE_PRIV) || + (v->domain->domain_id >= DOMID_FIRST_RESERVED) ) + { + if ( smp_processor_id() >= dom0->max_vcpus ) + { + printk(KERN_WARNING "PMU softirq on unexpected processor %d\n", + smp_processor_id()); + return; + } + v = dom0->vcpu[smp_processor_id()]; + } + + send_guest_vcpu_virq(v, VIRQ_XENPMU); +} + +static int pvpmu_init(struct domain *d, xenpmu_params_t *params) +{ + struct vcpu *v; + uint64_t mfn = params->val; + static int pvpmu_initted = 0; + + if ( params->vcpu < 0 || params->vcpu >= d->max_vcpus ) + return -EINVAL; + + if ( !pvpmu_initted ) + { + if (reserve_lapic_nmi() == 0) + set_nmi_callback(pmu_nmi_interrupt); + else + { + printk("Failed to reserve PMU NMI\n"); + return -EBUSY; + } + open_softirq(PMU_SOFTIRQ, pmu_virq); + pvpmu_initted = 1; + } + + if ( !mfn_valid(mfn) || + !get_page_and_type(mfn_to_page(mfn), d, PGT_writable_page) ) + return -EINVAL; + + v = d->vcpu[params->vcpu]; + v->arch.vpmu.xenpmu_data = map_domain_page_global(mfn); + memset(v->arch.vpmu.xenpmu_data, 0, PAGE_SIZE); + + vpmu_initialise(v); + + return 0; +} + +static void pvpmu_finish(struct domain *d, xenpmu_params_t *params) +{ + struct vcpu *v; + uint64_t mfn; + + if ( params->vcpu < 0 || params->vcpu >= d->max_vcpus ) + return; + + v = d->vcpu[params->vcpu]; + if (v != current) + vcpu_pause(v); + + if ( v->arch.vpmu.xenpmu_data ) + { + mfn = domain_page_map_to_mfn(v->arch.vpmu.xenpmu_data); + if ( mfn_valid(mfn) ) + { + unmap_domain_page_global(v->arch.vpmu.xenpmu_data); + put_page_and_type(mfn_to_page(mfn)); + } + } + vpmu_destroy(v); + + if (v != current) + vcpu_unpause(v); +} + +long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg) +{ + int ret = -EINVAL; + xenpmu_params_t pmu_params; + uint32_t mode, flags; + + switch ( op ) + { + case XENPMU_mode_set: + if ( !is_control_domain(current->domain) ) + return -EPERM; + + if ( copy_from_guest(&pmu_params, arg, 1) ) + return -EFAULT; + + mode = (uint32_t)pmu_params.val & XENPMU_MODE_MASK; + if ( (mode & ~(XENPMU_MODE_ON | XENPMU_MODE_PRIV)) || + ((mode & XENPMU_MODE_ON) && (mode & XENPMU_MODE_PRIV)) ) + return -EINVAL; + + vpmu_mode &= ~XENPMU_MODE_MASK; + vpmu_mode |= mode; + + ret = 0; + break; + + case XENPMU_mode_get: + pmu_params.val = vpmu_mode & XENPMU_MODE_MASK; + pmu_params.version.maj = XENPMU_VER_MAJ; + pmu_params.version.min = XENPMU_VER_MIN; + if ( copy_to_guest(arg, &pmu_params, 1) ) + return -EFAULT; + ret = 0; + break; + + case XENPMU_flags_set: + if ( !is_control_domain(current->domain) ) + return -EPERM; + + if ( copy_from_guest(&pmu_params, arg, 1) ) + return -EFAULT; + + flags = (uint64_t)pmu_params.val & XENPMU_FLAGS_MASK; + if ( flags & ~XENPMU_FLAGS_INTEL_BTS ) + return -EINVAL; + + vpmu_mode &= ~XENPMU_FLAGS_MASK; + vpmu_mode |= flags; + + ret = 0; + break; + + case XENPMU_flags_get: + pmu_params.val = vpmu_mode & XENPMU_FLAGS_MASK; + if ( copy_to_guest(arg, &pmu_params, 1) ) + return -EFAULT; + ret = 0; + break; + + case XENPMU_init: + if ( copy_from_guest(&pmu_params, arg, 1) ) + return -EFAULT; + ret = pvpmu_init(current->domain, &pmu_params); + break; + + case XENPMU_finish: + if ( 
copy_from_guest(&pmu_params, arg, 1) ) + return -EFAULT; + pvpmu_finish(current->domain, &pmu_params); + break; + + case XENPMU_lvtpc_set: + if ( copy_from_guest(&pmu_params, arg, 1) ) + return -EFAULT; + + vpmu_lvtpc_update((uint32_t)pmu_params.val); + ret = 0; + break; + + case XENPMU_flush: + vpmu_reset(vcpu_vpmu(current), VPMU_WAIT_FOR_FLUSH); + vpmu_load(current); + ret = 0; + break; + } + + return ret; +} diff --git a/xen/arch/x86/vpmu_amd.c b/xen/arch/x86/vpmu_amd.c new file mode 100644 index 0000000..dd0a262 --- /dev/null +++ b/xen/arch/x86/vpmu_amd.c @@ -0,0 +1,489 @@ +/* + * vpmu.c: PMU virtualization for HVM domain. + * + * Copyright (c) 2010, Advanced Micro Devices, Inc. + * Parts of this code are Copyright (c) 2007, Intel Corporation + * + * Author: Wei Wang <wei.wang2@amd.com> + * Tested by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple + * Place - Suite 330, Boston, MA 02111-1307 USA. + * + */ + +#include <xen/config.h> +#include <xen/xenoprof.h> +#include <xen/hvm/save.h> +#include <xen/sched.h> +#include <xen/irq.h> +#include <asm/apic.h> +#include <asm/hvm/vlapic.h> +#include <asm/vpmu.h> +#include <public/xenpmu.h> + +#define MSR_F10H_EVNTSEL_GO_SHIFT 40 +#define MSR_F10H_EVNTSEL_EN_SHIFT 22 +#define MSR_F10H_COUNTER_LENGTH 48 + +#define is_guest_mode(msr) ((msr) & (1ULL << MSR_F10H_EVNTSEL_GO_SHIFT)) +#define is_pmu_enabled(msr) ((msr) & (1ULL << MSR_F10H_EVNTSEL_EN_SHIFT)) +#define set_guest_mode(msr) (msr |= (1ULL << MSR_F10H_EVNTSEL_GO_SHIFT)) +#define is_overflowed(msr) (!((msr) & (1ULL << (MSR_F10H_COUNTER_LENGTH-1)))) + +static unsigned int __read_mostly num_counters; +static const u32 __read_mostly *counters; +static const u32 __read_mostly *ctrls; +static bool_t __read_mostly k7_counters_mirrored; + +#define F10H_NUM_COUNTERS 4 +#define F15H_NUM_COUNTERS 6 + +/* PMU Counter MSRs. */ +static const u32 AMD_F10H_COUNTERS[] = { + MSR_K7_PERFCTR0, + MSR_K7_PERFCTR1, + MSR_K7_PERFCTR2, + MSR_K7_PERFCTR3 +}; + +/* PMU Control MSRs. 
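[A small usage sketch for the EVNTSEL/counter helper macros defined above; illustration
only, the function is not part of the patch and exists just to show which bits the
macros touch:

    static void evtsel_macro_example(void)
    {
        uint64_t evtsel = 0;
        uint64_t counter = 1ULL << (MSR_F10H_COUNTER_LENGTH - 1); /* bit 47 set */

        set_guest_mode(evtsel);          /* bit 40: count only in guest mode,
                                            set for HVM guests' control MSRs  */
        evtsel |= 1ULL << MSR_F10H_EVNTSEL_EN_SHIFT;  /* bit 22: counter enable */

        ASSERT(is_guest_mode(evtsel));
        ASSERT(is_pmu_enabled(evtsel));
        ASSERT(!is_overflowed(counter)); /* bit 47 of the 48-bit counter still
                                            set, so the macro reports no wrap  */
    }
]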
*/ +static const u32 AMD_F10H_CTRLS[] = { + MSR_K7_EVNTSEL0, + MSR_K7_EVNTSEL1, + MSR_K7_EVNTSEL2, + MSR_K7_EVNTSEL3 +}; + +static const u32 AMD_F15H_COUNTERS[] = { + MSR_AMD_FAM15H_PERFCTR0, + MSR_AMD_FAM15H_PERFCTR1, + MSR_AMD_FAM15H_PERFCTR2, + MSR_AMD_FAM15H_PERFCTR3, + MSR_AMD_FAM15H_PERFCTR4, + MSR_AMD_FAM15H_PERFCTR5 +}; + +static const u32 AMD_F15H_CTRLS[] = { + MSR_AMD_FAM15H_EVNTSEL0, + MSR_AMD_FAM15H_EVNTSEL1, + MSR_AMD_FAM15H_EVNTSEL2, + MSR_AMD_FAM15H_EVNTSEL3, + MSR_AMD_FAM15H_EVNTSEL4, + MSR_AMD_FAM15H_EVNTSEL5 +}; + +static inline int get_pmu_reg_type(u32 addr) +{ + if ( (addr >= MSR_K7_EVNTSEL0) && (addr <= MSR_K7_EVNTSEL3) ) + return MSR_TYPE_CTRL; + + if ( (addr >= MSR_K7_PERFCTR0) && (addr <= MSR_K7_PERFCTR3) ) + return MSR_TYPE_COUNTER; + + if ( (addr >= MSR_AMD_FAM15H_EVNTSEL0) && + (addr <= MSR_AMD_FAM15H_PERFCTR5 ) ) + { + if (addr & 1) + return MSR_TYPE_COUNTER; + else + return MSR_TYPE_CTRL; + } + + /* unsupported registers */ + return -1; +} + +static inline u32 get_fam15h_addr(u32 addr) +{ + switch ( addr ) + { + case MSR_K7_PERFCTR0: + return MSR_AMD_FAM15H_PERFCTR0; + case MSR_K7_PERFCTR1: + return MSR_AMD_FAM15H_PERFCTR1; + case MSR_K7_PERFCTR2: + return MSR_AMD_FAM15H_PERFCTR2; + case MSR_K7_PERFCTR3: + return MSR_AMD_FAM15H_PERFCTR3; + case MSR_K7_EVNTSEL0: + return MSR_AMD_FAM15H_EVNTSEL0; + case MSR_K7_EVNTSEL1: + return MSR_AMD_FAM15H_EVNTSEL1; + case MSR_K7_EVNTSEL2: + return MSR_AMD_FAM15H_EVNTSEL2; + case MSR_K7_EVNTSEL3: + return MSR_AMD_FAM15H_EVNTSEL3; + default: + break; + } + + return addr; +} + +static void amd_vpmu_set_msr_bitmap(struct vcpu *v) +{ + unsigned int i; + struct vpmu_struct *vpmu = vcpu_vpmu(v); + struct amd_vpmu_context *ctxt = vpmu->context; + + for ( i = 0; i < num_counters; i++ ) + { + svm_intercept_msr(v, counters[i], MSR_INTERCEPT_NONE); + svm_intercept_msr(v, ctrls[i], MSR_INTERCEPT_WRITE); + } + + ctxt->msr_bitmap_set = 1; +} + +static void amd_vpmu_unset_msr_bitmap(struct vcpu *v) +{ + unsigned int i; + struct vpmu_struct *vpmu = vcpu_vpmu(v); + struct amd_vpmu_context *ctxt = vpmu->context; + + for ( i = 0; i < num_counters; i++ ) + { + svm_intercept_msr(v, counters[i], MSR_INTERCEPT_RW); + svm_intercept_msr(v, ctrls[i], MSR_INTERCEPT_RW); + } + + ctxt->msr_bitmap_set = 0; +} + +static int amd_vpmu_do_interrupt(struct cpu_user_regs *regs) +{ + return 1; +} + +static inline void context_load(struct vcpu *v) +{ + unsigned int i; + struct vpmu_struct *vpmu = vcpu_vpmu(v); + struct amd_vpmu_context *ctxt = vpmu->context; + + for ( i = 0; i < num_counters; i++ ) + { + wrmsrl(counters[i], ctxt->counters[i]); + wrmsrl(ctrls[i], ctxt->ctrls[i]); + } +} + +static void amd_vpmu_load(struct vcpu *v) +{ + struct vpmu_struct *vpmu = vcpu_vpmu(v); + struct amd_vpmu_context *ctxt = vpmu->context; + + vpmu_reset(vpmu, VPMU_FROZEN); + + if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) ) + { + unsigned int i; + + for ( i = 0; i < num_counters; i++ ) + wrmsrl(ctrls[i], ctxt->ctrls[i]); + + return; + } + + context_load(v); +} + +static inline void context_save(struct vcpu *v) +{ + unsigned int i; + struct vpmu_struct *vpmu = vcpu_vpmu(v); + struct amd_vpmu_context *ctxt = vpmu->context; + + /* No need to save controls -- they are saved in amd_vpmu_do_wrmsr */ + for ( i = 0; i < num_counters; i++ ) + rdmsrl(counters[i], ctxt->counters[i]); +} + +static int amd_vpmu_save(struct vcpu *v) +{ + struct vpmu_struct *vpmu = vcpu_vpmu(v); + struct amd_vpmu_context *ctx = vpmu->context; + unsigned int i; + + if ( !vpmu_is_set(vpmu, VPMU_FROZEN) ) + { + for 
( i = 0; i < num_counters; i++ ) + wrmsrl(ctrls[i], 0); + + vpmu_set(vpmu, VPMU_FROZEN); + } + + if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_SAVE) ) + return 0; + + context_save(v); + + if ( is_hvm_domain(v->domain) && + !vpmu_is_set(vpmu, VPMU_RUNNING) && ctx->msr_bitmap_set ) + amd_vpmu_unset_msr_bitmap(v); + + return 1; +} + +static void context_update(unsigned int msr, u64 msr_content) +{ + unsigned int i; + struct vcpu *v = current; + struct vpmu_struct *vpmu = vcpu_vpmu(v); + struct amd_vpmu_context *ctxt = vpmu->context; + + if ( k7_counters_mirrored && + ((msr >= MSR_K7_EVNTSEL0) && (msr <= MSR_K7_PERFCTR3)) ) + { + msr = get_fam15h_addr(msr); + } + + for ( i = 0; i < num_counters; i++ ) + { + if ( msr == ctrls[i] ) + { + ctxt->ctrls[i] = msr_content; + return; + } + else if (msr == counters[i] ) + { + ctxt->counters[i] = msr_content; + return; + } + } +} + +static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) +{ + struct vcpu *v = current; + struct vpmu_struct *vpmu = vcpu_vpmu(v); + + /* For all counters, enable guest only mode for HVM guest */ + if ( is_hvm_domain(v->domain) && (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) && + !(is_guest_mode(msr_content)) ) + { + set_guest_mode(msr_content); + } + + /* check if the first counter is enabled */ + if ( (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) && + is_pmu_enabled(msr_content) && !vpmu_is_set(vpmu, VPMU_RUNNING) ) + { + if ( !acquire_pmu_ownership(PMU_OWNER_HVM) ) + return 1; + vpmu_set(vpmu, VPMU_RUNNING); + apic_write(APIC_LVTPC, APIC_DM_NMI); + vpmu->hw_lapic_lvtpc = APIC_DM_NMI; + + if ( is_hvm_domain(v->domain) && + !((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set ) + amd_vpmu_set_msr_bitmap(v); + } + + /* stop saving & restore if guest stops first counter */ + if ( (get_pmu_reg_type(msr) == MSR_TYPE_CTRL) && + (is_pmu_enabled(msr_content) == 0) && vpmu_is_set(vpmu, VPMU_RUNNING) ) + { + apic_write(APIC_LVTPC, APIC_DM_NMI | APIC_LVT_MASKED); + vpmu->hw_lapic_lvtpc = APIC_DM_NMI | APIC_LVT_MASKED; + vpmu_reset(vpmu, VPMU_RUNNING); + if ( is_hvm_domain(v->domain) && + ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set ) + amd_vpmu_unset_msr_bitmap(v); + release_pmu_ownship(PMU_OWNER_HVM); + } + + if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) + || vpmu_is_set(vpmu, VPMU_FROZEN) ) + { + context_load(v); + vpmu_set(vpmu, VPMU_CONTEXT_LOADED); + vpmu_reset(vpmu, VPMU_FROZEN); + } + + /* Update vpmu context immediately */ + context_update(msr, msr_content); + + /* Write to hw counters */ + wrmsrl(msr, msr_content); + return 1; +} + +static int amd_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content) +{ + struct vcpu *v = current; + struct vpmu_struct *vpmu = vcpu_vpmu(v); + + if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) + || vpmu_is_set(vpmu, VPMU_FROZEN) ) + { + context_load(v); + vpmu_set(vpmu, VPMU_CONTEXT_LOADED); + vpmu_reset(vpmu, VPMU_FROZEN); + } + + rdmsrl(msr, *msr_content); + + return 1; +} + +static int amd_vpmu_initialise(struct vcpu *v) +{ + struct amd_vpmu_context *ctxt; + struct vpmu_struct *vpmu = vcpu_vpmu(v); + uint8_t family = current_cpu_data.x86; + + if ( vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) + return 0; + + if ( counters == NULL ) + { + switch ( family ) + { + case 0x15: + num_counters = F15H_NUM_COUNTERS; + counters = AMD_F15H_COUNTERS; + ctrls = AMD_F15H_CTRLS; + k7_counters_mirrored = 1; + break; + case 0x10: + case 0x12: + case 0x14: + case 0x16: + default: + num_counters = F10H_NUM_COUNTERS; + counters = AMD_F10H_COUNTERS; + ctrls = AMD_F10H_CTRLS; + k7_counters_mirrored = 0; 
+ break; + } + } + + if ( is_hvm_domain(v->domain) ) + { + ctxt = xzalloc(struct amd_vpmu_context); + if ( !ctxt ) + { + gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, " + " PMU feature is unavailable on domain %d vcpu %d.\n", + v->vcpu_id, v->domain->domain_id); + return -ENOMEM; + } + } + else + ctxt = &v->arch.vpmu.xenpmu_data->pmu.amd; + + vpmu->context = ctxt; + vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED); + return 0; +} + +static void amd_vpmu_destroy(struct vcpu *v) +{ + struct vpmu_struct *vpmu = vcpu_vpmu(v); + + if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) + return; + + if ( is_hvm_domain(v->domain) ) + { + if ( ((struct amd_vpmu_context *)vpmu->context)->msr_bitmap_set ) + amd_vpmu_unset_msr_bitmap(v); + + xfree(vpmu->context); + release_pmu_ownship(PMU_OWNER_HVM); + } + + vpmu->context = NULL; + vpmu_clear(vpmu); +} + +/* VPMU part of the ''q'' keyhandler */ +static void amd_vpmu_dump(struct vcpu *v) +{ + struct vpmu_struct *vpmu = vcpu_vpmu(v); + struct amd_vpmu_context *ctxt = vpmu->context; + unsigned int i; + + printk(" VPMU state: 0x%x ", vpmu->flags); + if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) + { + printk("\n"); + return; + } + + printk("("); + if ( vpmu_is_set(vpmu, VPMU_PASSIVE_DOMAIN_ALLOCATED) ) + printk("PASSIVE_DOMAIN_ALLOCATED, "); + if ( vpmu_is_set(vpmu, VPMU_FROZEN) ) + printk("FROZEN, "); + if ( vpmu_is_set(vpmu, VPMU_CONTEXT_SAVE) ) + printk("SAVE, "); + if ( vpmu_is_set(vpmu, VPMU_RUNNING) ) + printk("RUNNING, "); + if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) ) + printk("LOADED, "); + printk("ALLOCATED)\n"); + + for ( i = 0; i < num_counters; i++ ) + { + uint64_t ctrl, cntr; + + rdmsrl(ctrls[i], ctrl); + rdmsrl(counters[i], cntr); + printk(" 0x%08x: 0x%lx (0x%lx in HW) 0x%08x: 0x%lx (0x%lx in HW)\n", + ctrls[i], ctxt->ctrls[i], ctrl, + counters[i], ctxt->counters[i], cntr); + } +} + +struct arch_vpmu_ops amd_vpmu_ops = { + .do_wrmsr = amd_vpmu_do_wrmsr, + .do_rdmsr = amd_vpmu_do_rdmsr, + .do_interrupt = amd_vpmu_do_interrupt, + .arch_vpmu_destroy = amd_vpmu_destroy, + .arch_vpmu_save = amd_vpmu_save, + .arch_vpmu_load = amd_vpmu_load, + .arch_vpmu_dump = amd_vpmu_dump +}; + +int svm_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags) +{ + struct vpmu_struct *vpmu = vcpu_vpmu(v); + uint8_t family = current_cpu_data.x86; + int ret = 0; + + /* vpmu enabled? */ + if ( vpmu_flags == XENPMU_MODE_OFF ) + return 0; + + switch ( family ) + { + case 0x10: + case 0x12: + case 0x14: + case 0x15: + case 0x16: + ret = amd_vpmu_initialise(v); + if ( !ret ) + vpmu->arch_vpmu_ops = &amd_vpmu_ops; + return ret; + } + + printk("VPMU: Initialization failed. " + "AMD processor family %d has not " + "been supported\n", family); + return -EINVAL; +} + diff --git a/xen/arch/x86/vpmu_intel.c b/xen/arch/x86/vpmu_intel.c new file mode 100644 index 0000000..e38d231 --- /dev/null +++ b/xen/arch/x86/vpmu_intel.c @@ -0,0 +1,936 @@ +/* + * vpmu_core2.c: CORE 2 specific PMU virtualization for HVM domain. + * + * Copyright (c) 2007, Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. 
+ * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple + * Place - Suite 330, Boston, MA 02111-1307 USA. + * + * Author: Haitao Shan <haitao.shan@intel.com> + */ + +#include <xen/config.h> +#include <xen/sched.h> +#include <xen/xenoprof.h> +#include <xen/irq.h> +#include <asm/system.h> +#include <asm/regs.h> +#include <asm/types.h> +#include <asm/apic.h> +#include <asm/traps.h> +#include <asm/msr.h> +#include <asm/msr-index.h> +#include <asm/hvm/support.h> +#include <asm/hvm/vlapic.h> +#include <asm/hvm/vmx/vmx.h> +#include <asm/hvm/vmx/vmcs.h> +#include <public/sched.h> +#include <public/hvm/save.h> +#include <public/xenpmu.h> +#include <asm/vpmu.h> + +/* + * See Intel SDM Vol 2a Instruction Set Reference chapter 3 for CPUID + * instruction. + * cpuid 0xa - Architectural Performance Monitoring Leaf + * Register eax + */ +#define PMU_VERSION_SHIFT 0 /* Version ID */ +#define PMU_VERSION_BITS 8 /* 8 bits 0..7 */ +#define PMU_VERSION_MASK (((1 << PMU_VERSION_BITS) - 1) << PMU_VERSION_SHIFT) + +#define PMU_GENERAL_NR_SHIFT 8 /* Number of general pmu registers */ +#define PMU_GENERAL_NR_BITS 8 /* 8 bits 8..15 */ +#define PMU_GENERAL_NR_MASK (((1 << PMU_GENERAL_NR_BITS) - 1) << PMU_GENERAL_NR_SHIFT) + +#define PMU_GENERAL_WIDTH_SHIFT 16 /* Width of general pmu registers */ +#define PMU_GENERAL_WIDTH_BITS 8 /* 8 bits 16..23 */ +#define PMU_GENERAL_WIDTH_MASK (((1 << PMU_GENERAL_WIDTH_BITS) - 1) << PMU_GENERAL_WIDTH_SHIFT) +/* Register edx */ +#define PMU_FIXED_NR_SHIFT 0 /* Number of fixed pmu registers */ +#define PMU_FIXED_NR_BITS 5 /* 5 bits 0..4 */ +#define PMU_FIXED_NR_MASK (((1 << PMU_FIXED_NR_BITS) -1) << PMU_FIXED_NR_SHIFT) + +#define PMU_FIXED_WIDTH_SHIFT 5 /* Width of fixed pmu registers */ +#define PMU_FIXED_WIDTH_BITS 8 /* 8 bits 5..12 */ +#define PMU_FIXED_WIDTH_MASK (((1 << PMU_FIXED_WIDTH_BITS) -1) << PMU_FIXED_WIDTH_SHIFT) + +/* Intel-specific VPMU features */ +#define VPMU_CPU_HAS_DS 0x100 /* Has Debug Store */ +#define VPMU_CPU_HAS_BTS 0x200 /* Has Branch Trace Store */ + +/* + * MSR_CORE_PERF_FIXED_CTR_CTRL contains the configuration of all fixed + * counters. 4 bits for every counter. + */ +#define FIXED_CTR_CTRL_BITS 4 +#define FIXED_CTR_CTRL_MASK ((1 << FIXED_CTR_CTRL_BITS) - 1) + +static int arch_pmc_cnt; /* Number of general-purpose performance counters */ +static int fixed_pmc_cnt; /* Number of fixed performance counters */ + +/* + * QUIRK to workaround an issue on various family 6 cpus. + * The issue leads to endless PMC interrupt loops on the processor. + * If the interrupt handler is running and a pmc reaches the value 0, this + * value remains forever and it triggers immediately a new interrupt after + * finishing the handler. + * A workaround is to read all flagged counters and if the value is 0 write + * 1 (or another value != 0) into it. + * There exist no errata and the real cause of this behaviour is unknown. 
+ */ +bool_t __read_mostly is_pmc_quirk; + +static void check_pmc_quirk(void) +{ + if ( current_cpu_data.x86 == 6 ) + is_pmc_quirk = 1; + else + is_pmc_quirk = 0; +} + +static void handle_pmc_quirk(u64 msr_content) +{ + int i; + u64 val; + + if ( !is_pmc_quirk ) + return; + + val = msr_content; + for ( i = 0; i < arch_pmc_cnt; i++ ) + { + if ( val & 0x1 ) + { + u64 cnt; + rdmsrl(MSR_P6_PERFCTR0 + i, cnt); + if ( cnt == 0 ) + wrmsrl(MSR_P6_PERFCTR0 + i, 1); + } + val >>= 1; + } + val = msr_content >> 32; + for ( i = 0; i < fixed_pmc_cnt; i++ ) + { + if ( val & 0x1 ) + { + u64 cnt; + rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, cnt); + if ( cnt == 0 ) + wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, 1); + } + val >>= 1; + } +} + +/* + * Read the number of general counters via CPUID.EAX[0xa].EAX[8..15] + */ +static int core2_get_arch_pmc_count(void) +{ + u32 eax, ebx, ecx, edx; + + cpuid(0xa, &eax, &ebx, &ecx, &edx); + return ( (eax & PMU_GENERAL_NR_MASK) >> PMU_GENERAL_NR_SHIFT ); +} + +/* + * Read the number of fixed counters via CPUID.EDX[0xa].EDX[0..4] + */ +static int core2_get_fixed_pmc_count(void) +{ + u32 eax, ebx, ecx, edx; + + cpuid(0xa, &eax, &ebx, &ecx, &edx); + return ( (eax & PMU_FIXED_NR_MASK) >> PMU_FIXED_NR_SHIFT ); +} + +static u64 core2_calc_intial_glb_ctrl_msr(void) +{ + int arch_pmc_bits = (1 << arch_pmc_cnt) - 1; + u64 fix_pmc_bits = (1 << fixed_pmc_cnt) - 1; + return ((fix_pmc_bits << 32) | arch_pmc_bits); +} + +/* edx bits 5-12: Bit width of fixed-function performance counters */ +static int core2_get_bitwidth_fix_count(void) +{ + u32 eax, ebx, ecx, edx; + + cpuid(0xa, &eax, &ebx, &ecx, &edx); + return ((edx & PMU_FIXED_WIDTH_MASK) >> PMU_FIXED_WIDTH_SHIFT); +} + +static int is_core2_vpmu_msr(u32 msr_index, int *type, int *index) +{ + int i; + + for ( i = 0; i < fixed_pmc_cnt; i++ ) + { + if ( msr_index == MSR_CORE_PERF_FIXED_CTR0 + i ) + { + *type = MSR_TYPE_COUNTER; + *index = i; + return 1; + } + } + + if ( (msr_index == MSR_CORE_PERF_FIXED_CTR_CTRL ) || + (msr_index == MSR_IA32_DS_AREA) || + (msr_index == MSR_IA32_PEBS_ENABLE) ) + { + *type = MSR_TYPE_CTRL; + return 1; + } + + if ( (msr_index == MSR_CORE_PERF_GLOBAL_CTRL) || + (msr_index == MSR_CORE_PERF_GLOBAL_STATUS) || + (msr_index == MSR_CORE_PERF_GLOBAL_OVF_CTRL) ) + { + *type = MSR_TYPE_GLOBAL; + return 1; + } + + if ( (msr_index >= MSR_IA32_PERFCTR0) && + (msr_index < (MSR_IA32_PERFCTR0 + arch_pmc_cnt)) ) + { + *type = MSR_TYPE_ARCH_COUNTER; + *index = msr_index - MSR_IA32_PERFCTR0; + return 1; + } + + if ( (msr_index >= MSR_P6_EVNTSEL0) && + (msr_index < (MSR_P6_EVNTSEL0 + arch_pmc_cnt)) ) + { + *type = MSR_TYPE_ARCH_CTRL; + *index = msr_index - MSR_P6_EVNTSEL0; + return 1; + } + + return 0; +} + +#define msraddr_to_bitpos(x) (((x)&0xffff) + ((x)>>31)*0x2000) +static void core2_vpmu_set_msr_bitmap(unsigned long *msr_bitmap) +{ + int i; + + /* Allow Read/Write PMU Counters MSR Directly. */ + for ( i = 0; i < fixed_pmc_cnt; i++ ) + { + clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i), msr_bitmap); + clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i), + msr_bitmap + 0x800/BYTES_PER_LONG); + } + for ( i = 0; i < arch_pmc_cnt; i++ ) + { + clear_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i), msr_bitmap); + clear_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i), + msr_bitmap + 0x800/BYTES_PER_LONG); + } + + /* Allow Read PMU Non-global Controls Directly. 
*/ + for ( i = 0; i < arch_pmc_cnt; i++ ) + clear_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL0+i), msr_bitmap); + + clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR_CTRL), msr_bitmap); + clear_bit(msraddr_to_bitpos(MSR_IA32_PEBS_ENABLE), msr_bitmap); + clear_bit(msraddr_to_bitpos(MSR_IA32_DS_AREA), msr_bitmap); +} + +static void core2_vpmu_unset_msr_bitmap(unsigned long *msr_bitmap) +{ + int i; + + for ( i = 0; i < fixed_pmc_cnt; i++ ) + { + set_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i), msr_bitmap); + set_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i), + msr_bitmap + 0x800/BYTES_PER_LONG); + } + for ( i = 0; i < arch_pmc_cnt; i++ ) + { + set_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i), msr_bitmap); + set_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i), + msr_bitmap + 0x800/BYTES_PER_LONG); + } + + for ( i = 0; i < arch_pmc_cnt; i++ ) + set_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL0+i), msr_bitmap); + + set_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR_CTRL), msr_bitmap); + set_bit(msraddr_to_bitpos(MSR_IA32_PEBS_ENABLE), msr_bitmap); + set_bit(msraddr_to_bitpos(MSR_IA32_DS_AREA), msr_bitmap); +} + +static inline void __core2_vpmu_save(struct vcpu *v) +{ + int i; + struct core2_vpmu_context *core2_vpmu_cxt = vcpu_vpmu(v)->context; + + for ( i = 0; i < fixed_pmc_cnt; i++ ) + rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, core2_vpmu_cxt->fix_counters[i]); + for ( i = 0; i < arch_pmc_cnt; i++ ) + rdmsrl(MSR_IA32_PERFCTR0 + i, core2_vpmu_cxt->arch_msr_pair[i].counter); + + if ( !is_hvm_domain(v->domain) ) + rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, core2_vpmu_cxt->global_status); +} + +static int core2_vpmu_save(struct vcpu *v) +{ + struct vpmu_struct *vpmu = vcpu_vpmu(v); + + if ( !vpmu_is_set_all(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED) ) + return 0; + + if ( !is_hvm_domain(v->domain) ) + wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0); + + __core2_vpmu_save(v); + + /* Unset PMU MSR bitmap to trap lazy load. 
*/ + if ( !vpmu_is_set(vpmu, VPMU_RUNNING) && cpu_has_vmx_msr_bitmap + && is_hvm_domain(v->domain) ) + core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap); + + return 1; +} + +static inline void __core2_vpmu_load(struct vcpu *v) +{ + int i; + struct core2_vpmu_context *core2_vpmu_cxt = vcpu_vpmu(v)->context; + + for ( i = 0; i < fixed_pmc_cnt; i++ ) + wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, core2_vpmu_cxt->fix_counters[i]); + for ( i = 0; i < arch_pmc_cnt; i++ ) + wrmsrl(MSR_IA32_PERFCTR0 + i, core2_vpmu_cxt->arch_msr_pair[i].counter); + + wrmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, core2_vpmu_cxt->fixed_ctrl); + wrmsrl(MSR_IA32_DS_AREA, core2_vpmu_cxt->ds_area); + wrmsrl(MSR_IA32_PEBS_ENABLE, core2_vpmu_cxt->pebs_enable); + + for ( i = 0; i < arch_pmc_cnt; i++ ) + wrmsrl(MSR_P6_EVNTSEL0+i, core2_vpmu_cxt->arch_msr_pair[i].control); + + if ( !is_hvm_domain(v->domain) ) + { + wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, core2_vpmu_cxt->global_ovf_ctrl); + wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl); + } +} + +static void core2_vpmu_load(struct vcpu *v) +{ + struct vpmu_struct *vpmu = vcpu_vpmu(v); + + if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) ) + return; + + __core2_vpmu_load(v); +} + +static int core2_vpmu_alloc_resource(struct vcpu *v) +{ + struct vpmu_struct *vpmu = vcpu_vpmu(v); + struct core2_vpmu_context *core2_vpmu_cxt; + + if ( is_hvm_domain(v->domain) ) + { + if ( !acquire_pmu_ownership(PMU_OWNER_HVM) ) + goto out_err; + + wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0); + if ( vmx_add_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL) ) + goto out_err; + + if ( vmx_add_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL) ) + goto out_err; + vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, + core2_calc_intial_glb_ctrl_msr()); + + core2_vpmu_cxt = xzalloc_bytes(sizeof(struct core2_vpmu_context)); + if ( !core2_vpmu_cxt ) + goto out_err; + } + else + { + core2_vpmu_cxt = &v->arch.vpmu.xenpmu_data->pmu.intel; + vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED); + } + + vpmu->context = (void *)core2_vpmu_cxt; + + vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED); + + return 1; + +out_err: + vmx_rm_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL); + vmx_rm_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL); + release_pmu_ownship(PMU_OWNER_HVM); + + printk("Failed to allocate VPMU resources for domain %u vcpu %u\n", + v->vcpu_id, v->domain->domain_id); + + return 0; +} + +static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index) +{ + struct vpmu_struct *vpmu = vcpu_vpmu(current); + + if ( !is_core2_vpmu_msr(msr_index, type, index) ) + return 0; + + if ( unlikely(!vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED)) && + !core2_vpmu_alloc_resource(current) ) + return 0; + + /* Do the lazy load staff. 
*/ + if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) ) + { + __core2_vpmu_load(current); + vpmu_set(vpmu, VPMU_CONTEXT_LOADED); + if ( cpu_has_vmx_msr_bitmap && is_hvm_domain(current->domain) ) + core2_vpmu_set_msr_bitmap(current->arch.hvm_vmx.msr_bitmap); + } + return 1; +} + +static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) +{ + u64 global_ctrl, non_global_ctrl; + unsigned pmu_enable = 0; + int i, tmp; + int type = -1, index = -1; + struct vcpu *v = current; + struct vpmu_struct *vpmu = vcpu_vpmu(v); + struct core2_vpmu_context *core2_vpmu_cxt = NULL; + + if ( !core2_vpmu_msr_common_check(msr, &type, &index) ) + { + /* Special handling for BTS */ + if ( msr == MSR_IA32_DEBUGCTLMSR ) + { + uint64_t supported = IA32_DEBUGCTLMSR_TR | IA32_DEBUGCTLMSR_BTS | + IA32_DEBUGCTLMSR_BTINT; + + if ( cpu_has(¤t_cpu_data, X86_FEATURE_DSCPL) ) + supported |= IA32_DEBUGCTLMSR_BTS_OFF_OS | + IA32_DEBUGCTLMSR_BTS_OFF_USR; + if ( msr_content & supported ) + { + if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) ) + return 1; + gdprintk(XENLOG_WARNING, "Debug Store is not supported on this cpu\n"); + + if ( is_hvm_domain(v->domain) ) + hvm_inject_hw_exception(TRAP_gp_fault, 0); + else + send_guest_trap(v->domain, v->vcpu_id, TRAP_gp_fault); + + return 0; + } + } + return 0; + } + + core2_vpmu_cxt = vpmu->context; + switch ( msr ) + { + case MSR_CORE_PERF_GLOBAL_OVF_CTRL: + core2_vpmu_cxt->global_ovf_status &= ~msr_content; + core2_vpmu_cxt->global_ovf_ctrl = msr_content; + return 1; + case MSR_CORE_PERF_GLOBAL_STATUS: + gdprintk(XENLOG_INFO, "Can not write readonly MSR: " + "MSR_PERF_GLOBAL_STATUS(0x38E)!\n"); + if ( is_hvm_domain(v->domain) ) + hvm_inject_hw_exception(TRAP_gp_fault, 0); + else + send_guest_trap(v->domain, v->vcpu_id, TRAP_gp_fault); + return 1; + case MSR_IA32_PEBS_ENABLE: + if ( msr_content & 1 ) + gdprintk(XENLOG_WARNING, "Guest is trying to enable PEBS, " + "which is not supported.\n"); + core2_vpmu_cxt->pebs_enable = msr_content; + return 1; + case MSR_IA32_DS_AREA: + if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_DS) ) + { + if ( !is_canonical_address(msr_content) ) + { + gdprintk(XENLOG_WARNING, + "Illegal address for IA32_DS_AREA: %#" PRIx64 "x\n", + msr_content); + if ( is_hvm_domain(v->domain) ) + hvm_inject_hw_exception(TRAP_gp_fault, 0); + else + send_guest_trap(v->domain, v->vcpu_id, TRAP_gp_fault); + return 1; + } + core2_vpmu_cxt->ds_area = msr_content; + break; + } + gdprintk(XENLOG_WARNING, "Guest setting of DTS is ignored.\n"); + return 1; + case MSR_CORE_PERF_GLOBAL_CTRL: + global_ctrl = msr_content; + for ( i = 0; i < arch_pmc_cnt; i++ ) + { + rdmsrl(MSR_P6_EVNTSEL0+i, non_global_ctrl); + pmu_enable += global_ctrl & (non_global_ctrl >> 22) & 1; + global_ctrl >>= 1; + } + + rdmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, non_global_ctrl); + global_ctrl = msr_content >> 32; + for ( i = 0; i < fixed_pmc_cnt; i++ ) + { + pmu_enable += (global_ctrl & 1) & ((non_global_ctrl & 0x3)? 1: 0); + non_global_ctrl >>= FIXED_CTR_CTRL_BITS; + global_ctrl >>= 1; + } + core2_vpmu_cxt->global_ctrl = msr_content; + break; + case MSR_CORE_PERF_FIXED_CTR_CTRL: + non_global_ctrl = msr_content; + if ( is_hvm_domain(v->domain) ) + vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl); + else + rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl); + global_ctrl >>= 32; + for ( i = 0; i < fixed_pmc_cnt; i++ ) + { + pmu_enable += (global_ctrl & 1) & ((non_global_ctrl & 0x3)? 
1: 0); + non_global_ctrl >>= 4; + global_ctrl >>= 1; + } + core2_vpmu_cxt->fixed_ctrl = msr_content; + break; + default: + tmp = msr - MSR_P6_EVNTSEL0; + if ( tmp >= 0 && tmp < arch_pmc_cnt ) + { + if ( is_hvm_domain(v->domain) ) + vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl); + else + rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl); + core2_vpmu_cxt->arch_msr_pair[tmp].control = msr_content; + for ( i = 0; i < arch_pmc_cnt && !pmu_enable; i++ ) + pmu_enable += (global_ctrl >> i) & + (core2_vpmu_cxt->arch_msr_pair[i].control >> 22) & 1; + } + } + + pmu_enable += (core2_vpmu_cxt->ds_area != 0); + if ( pmu_enable ) + vpmu_set(vpmu, VPMU_RUNNING); + else + vpmu_reset(vpmu, VPMU_RUNNING); + + if ( is_hvm_domain(v->domain) ) + { + /* Setup LVTPC in local apic */ + if ( vpmu_is_set(vpmu, VPMU_RUNNING) && + is_vlapic_lvtpc_enabled(vcpu_vlapic(v)) ) + { + apic_write_around(APIC_LVTPC, APIC_DM_NMI); + vpmu->hw_lapic_lvtpc = APIC_DM_NMI; + } + else + { + apic_write_around(APIC_LVTPC, APIC_DM_NMI | APIC_LVT_MASKED); + vpmu->hw_lapic_lvtpc = APIC_DM_NMI | APIC_LVT_MASKED; + } + } + + if ( type != MSR_TYPE_GLOBAL ) + { + u64 mask; + int inject_gp = 0; + switch ( type ) + { + case MSR_TYPE_ARCH_CTRL: /* MSR_P6_EVNTSEL[0,...] */ + mask = ~((1ull << 32) - 1); + if (msr_content & mask) + inject_gp = 1; + break; + case MSR_TYPE_CTRL: /* IA32_FIXED_CTR_CTRL */ + if ( msr == MSR_IA32_DS_AREA ) + break; + /* 4 bits per counter, currently 3 fixed counters implemented. */ + mask = ~((1ull << (fixed_pmc_cnt * FIXED_CTR_CTRL_BITS)) - 1); + if (msr_content & mask) + inject_gp = 1; + break; + case MSR_TYPE_COUNTER: /* IA32_FIXED_CTR[0-2] */ + mask = ~((1ull << core2_get_bitwidth_fix_count()) - 1); + if (msr_content & mask) + inject_gp = 1; + break; + } + + if (inject_gp) + { + if ( is_hvm_domain(v->domain) ) + hvm_inject_hw_exception(TRAP_gp_fault, 0); + else + send_guest_trap(v->domain, v->vcpu_id, TRAP_gp_fault); + } + else + wrmsrl(msr, msr_content); + } + else + { + if ( is_hvm_domain(v->domain) ) + vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content); + else + wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, msr_content); + } + + return 1; +} + +static int core2_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content) +{ + int type = -1, index = -1; + struct vcpu *v = current; + struct vpmu_struct *vpmu = vcpu_vpmu(v); + struct core2_vpmu_context *core2_vpmu_cxt = NULL; + + if ( core2_vpmu_msr_common_check(msr, &type, &index) ) + { + core2_vpmu_cxt = vpmu->context; + switch ( msr ) + { + case MSR_CORE_PERF_GLOBAL_OVF_CTRL: + *msr_content = 0; + break; + case MSR_CORE_PERF_GLOBAL_STATUS: + *msr_content = core2_vpmu_cxt->global_ovf_status; + break; + case MSR_CORE_PERF_GLOBAL_CTRL: + if ( is_hvm_domain(v->domain) ) + vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content); + else + rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, *msr_content); + break; + default: + rdmsrl(msr, *msr_content); + } + } + else + { + /* Extension for BTS */ + if ( msr == MSR_IA32_MISC_ENABLE ) + { + if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) ) + *msr_content &= ~MSR_IA32_MISC_ENABLE_BTS_UNAVAIL; + } + else + return 0; + } + + return 1; +} + +static void core2_vpmu_do_cpuid(unsigned int input, + unsigned int *eax, unsigned int *ebx, + unsigned int *ecx, unsigned int *edx) +{ + if (input == 0x1) + { + struct vpmu_struct *vpmu = vcpu_vpmu(current); + + if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_DS) ) + { + /* Switch on the ''Debug Store'' feature in CPUID.EAX[1]:EDX[21] */ + *edx |= cpufeat_mask(X86_FEATURE_DS); + if ( cpu_has(¤t_cpu_data, 
X86_FEATURE_DTES64) ) + *ecx |= cpufeat_mask(X86_FEATURE_DTES64); + if ( cpu_has(¤t_cpu_data, X86_FEATURE_DSCPL) ) + *ecx |= cpufeat_mask(X86_FEATURE_DSCPL); + } + } + else if ( input == 0xa ) + { + /* Limit number of counters to max that we support */ + if ( ((*eax & PMU_GENERAL_NR_MASK) >> PMU_GENERAL_NR_SHIFT) > + XENPMU_CORE2_MAX_ARCH_PMCS ) + *eax = (*eax & ~PMU_GENERAL_NR_MASK) | + (XENPMU_CORE2_MAX_ARCH_PMCS << PMU_GENERAL_NR_SHIFT); + if ( ((*edx & PMU_FIXED_NR_MASK) >> PMU_FIXED_NR_SHIFT) > + XENPMU_CORE2_MAX_FIXED_PMCS ) + *eax = (*eax & ~PMU_FIXED_NR_MASK) | + (XENPMU_CORE2_MAX_FIXED_PMCS << PMU_FIXED_NR_SHIFT); + } +} + +/* Dump vpmu info on console, called in the context of keyhandler ''q''. */ +static void core2_vpmu_dump(struct vcpu *v) +{ + struct vpmu_struct *vpmu = vcpu_vpmu(v); + int i; + struct core2_vpmu_context *core2_vpmu_cxt = NULL; + u64 val; + + if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) + return; + + if ( !vpmu_is_set(vpmu, VPMU_RUNNING) ) + { + if ( vpmu_set(vpmu, VPMU_CONTEXT_LOADED) ) + printk(" vPMU loaded\n"); + else + printk(" vPMU allocated\n"); + return; + } + + printk(" vPMU running\n"); + core2_vpmu_cxt = vpmu->context; + + /* Print the contents of the counter and its configuration msr. */ + for ( i = 0; i < arch_pmc_cnt; i++ ) + printk(" general_%d: 0x%016lx ctrl: 0x%016lx\n", + i, core2_vpmu_cxt->arch_msr_pair[i].counter, + core2_vpmu_cxt->arch_msr_pair[i].control); + + /* + * The configuration of the fixed counter is 4 bits each in the + * MSR_CORE_PERF_FIXED_CTR_CTRL. + */ + val = core2_vpmu_cxt->fixed_ctrl; + for ( i = 0; i < fixed_pmc_cnt; i++ ) + { + printk(" fixed_%d: 0x%016lx ctrl: 0x%lx\n", + i, core2_vpmu_cxt->fix_counters[i], + val & FIXED_CTR_CTRL_MASK); + val >>= FIXED_CTR_CTRL_BITS; + } +} + +static int core2_vpmu_do_interrupt(struct cpu_user_regs *regs) +{ + struct vcpu *v = current; + u64 msr_content; + struct vpmu_struct *vpmu = vcpu_vpmu(v); + struct core2_vpmu_context *core2_vpmu_cxt = vpmu->context; + + rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content); + if ( msr_content ) + { + if ( is_pmc_quirk ) + handle_pmc_quirk(msr_content); + core2_vpmu_cxt->global_ovf_status |= msr_content; + msr_content = 0xC000000700000000 | ((1 << arch_pmc_cnt) - 1); + wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content); + } + else + { + /* No PMC overflow but perhaps a Trace Message interrupt. */ + msr_content = __vmread(GUEST_IA32_DEBUGCTL); + if ( !(msr_content & IA32_DEBUGCTLMSR_TR) ) + return 0; + } + + /* HW sets the MASK bit when performance counter interrupt occurs*/ + vpmu->hw_lapic_lvtpc = apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED; + apic_write_around(APIC_LVTPC, vpmu->hw_lapic_lvtpc); + + return 1; +} + +static int core2_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags) +{ + struct vpmu_struct *vpmu = vcpu_vpmu(v); + u64 msr_content; + struct cpuinfo_x86 *c = ¤t_cpu_data; + + if ( !(vpmu_flags & XENPMU_FLAGS_INTEL_BTS) ) + goto func_out; + /* Check the ''Debug Store'' feature in the CPUID.EAX[1]:EDX[21] */ + if ( cpu_has(c, X86_FEATURE_DS) ) + { + if ( !cpu_has(c, X86_FEATURE_DTES64) ) + { + printk(XENLOG_G_WARNING "CPU doesn''t support 64-bit DS Area" + " - Debug Store disabled for d%d:v%d\n", + v->domain->domain_id, v->vcpu_id); + goto func_out; + } + vpmu_set(vpmu, VPMU_CPU_HAS_DS); + rdmsrl(MSR_IA32_MISC_ENABLE, msr_content); + if ( msr_content & MSR_IA32_MISC_ENABLE_BTS_UNAVAIL ) + { + /* If BTS_UNAVAIL is set reset the DS feature. 
*/ + vpmu_reset(vpmu, VPMU_CPU_HAS_DS); + printk(XENLOG_G_WARNING "CPU has set BTS_UNAVAIL" + " - Debug Store disabled for d%d:v%d\n", + v->domain->domain_id, v->vcpu_id); + } + else + { + vpmu_set(vpmu, VPMU_CPU_HAS_BTS); + if ( !cpu_has(c, X86_FEATURE_DSCPL) ) + printk(XENLOG_G_INFO + "vpmu: CPU doesn''t support CPL-Qualified BTS\n"); + printk("******************************************************\n"); + printk("** WARNING: Emulation of BTS Feature is switched on **\n"); + printk("** Using this processor feature in a virtualized **\n"); + printk("** environment is not 100%% safe. **\n"); + printk("** Setting the DS buffer address with wrong values **\n"); + printk("** may lead to hypervisor hangs or crashes. **\n"); + printk("** It is NOT recommended for production use! **\n"); + printk("******************************************************\n"); + } + } +func_out: + + arch_pmc_cnt = core2_get_arch_pmc_count(); + if ( arch_pmc_cnt > XENPMU_CORE2_MAX_ARCH_PMCS ) + arch_pmc_cnt = XENPMU_CORE2_MAX_ARCH_PMCS; + fixed_pmc_cnt = core2_get_fixed_pmc_count(); + if ( fixed_pmc_cnt > XENPMU_CORE2_MAX_FIXED_PMCS ) + fixed_pmc_cnt = XENPMU_CORE2_MAX_FIXED_PMCS; + check_pmc_quirk(); + + /* PV domains can allocate resources immediately */ + if ( !is_hvm_domain(v->domain) ) + if ( !core2_vpmu_alloc_resource(v) ) + return 1; + + return 0; +} + +static void core2_vpmu_destroy(struct vcpu *v) +{ + struct vpmu_struct *vpmu = vcpu_vpmu(v); + + if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) + return; + + if ( is_hvm_domain(v->domain) ) + { + xfree(vpmu->context); + if ( cpu_has_vmx_msr_bitmap ) + core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap); + } + + release_pmu_ownship(PMU_OWNER_HVM); + vpmu_clear(vpmu); +} + +struct arch_vpmu_ops core2_vpmu_ops = { + .do_wrmsr = core2_vpmu_do_wrmsr, + .do_rdmsr = core2_vpmu_do_rdmsr, + .do_interrupt = core2_vpmu_do_interrupt, + .do_cpuid = core2_vpmu_do_cpuid, + .arch_vpmu_destroy = core2_vpmu_destroy, + .arch_vpmu_save = core2_vpmu_save, + .arch_vpmu_load = core2_vpmu_load, + .arch_vpmu_dump = core2_vpmu_dump +}; + +static void core2_no_vpmu_do_cpuid(unsigned int input, + unsigned int *eax, unsigned int *ebx, + unsigned int *ecx, unsigned int *edx) +{ + /* + * As in this case the vpmu is not enabled reset some bits in the + * architectural performance monitoring related part. + */ + if ( input == 0xa ) + { + *eax &= ~PMU_VERSION_MASK; + *eax &= ~PMU_GENERAL_NR_MASK; + *eax &= ~PMU_GENERAL_WIDTH_MASK; + + *edx &= ~PMU_FIXED_NR_MASK; + *edx &= ~PMU_FIXED_WIDTH_MASK; + } +} + +/* + * If its a vpmu msr set it to 0. + */ +static int core2_no_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content) +{ + int type = -1, index = -1; + if ( !is_core2_vpmu_msr(msr, &type, &index) ) + return 0; + *msr_content = 0; + return 1; +} + +/* + * These functions are used in case vpmu is not enabled. 
+ */ +struct arch_vpmu_ops core2_no_vpmu_ops = { + .do_rdmsr = core2_no_vpmu_do_rdmsr, + .do_cpuid = core2_no_vpmu_do_cpuid, +}; + +int vmx_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags) +{ + struct vpmu_struct *vpmu = vcpu_vpmu(v); + uint8_t family = current_cpu_data.x86; + uint8_t cpu_model = current_cpu_data.x86_model; + int ret = 0; + + vpmu->arch_vpmu_ops = &core2_no_vpmu_ops; + if ( vpmu_flags == XENPMU_MODE_OFF ) + return 0; + + if ( family == 6 ) + { + switch ( cpu_model ) + { + /* Core2: */ + case 0x0f: /* original 65 nm celeron/pentium/core2/xeon, "Merom"/"Conroe" */ + case 0x16: /* single-core 65 nm celeron/core2solo "Merom-L"/"Conroe-L" */ + case 0x17: /* 45 nm celeron/core2/xeon "Penryn"/"Wolfdale" */ + case 0x1d: /* six-core 45 nm xeon "Dunnington" */ + + case 0x2a: /* SandyBridge */ + case 0x2d: /* SandyBridge, "Romley-EP" */ + + /* Nehalem: */ + case 0x1a: /* 45 nm nehalem, "Bloomfield" */ + case 0x1e: /* 45 nm nehalem, "Lynnfield", "Clarksfield", "Jasper Forest" */ + case 0x2e: /* 45 nm nehalem-ex, "Beckton" */ + + /* Westmere: */ + case 0x25: /* 32 nm nehalem, "Clarkdale", "Arrandale" */ + case 0x2c: /* 32 nm nehalem, "Gulftown", "Westmere-EP" */ + case 0x27: /* 32 nm Westmere-EX */ + + case 0x3a: /* IvyBridge */ + case 0x3e: /* IvyBridge EP */ + case 0x3c: /* Haswell */ + ret = core2_vpmu_initialise(v, vpmu_flags); + if ( !ret ) + vpmu->arch_vpmu_ops = &core2_vpmu_ops; + return ret; + } + } + + printk("VPMU: Initialization failed. " + "Intel processor family %d model %d has not " + "been supported\n", family, cpu_model); + return -EINVAL; +} + diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h index 4f2247e..0b79d39 100644 --- a/xen/include/asm-x86/domain.h +++ b/xen/include/asm-x86/domain.h @@ -8,6 +8,7 @@ #include <asm/hvm/domain.h> #include <asm/e820.h> #include <asm/mce.h> +#include <asm/vpmu.h> #include <public/vcpu.h> #define has_32bit_shinfo(d) ((d)->arch.has_32bit_shinfo) diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h index 5971613..beb959f 100644 --- a/xen/include/asm-x86/hvm/vmx/vmcs.h +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h @@ -20,7 +20,6 @@ #define __ASM_X86_HVM_VMX_VMCS_H__ #include <asm/hvm/io.h> -#include <asm/hvm/vpmu.h> #include <irq_vectors.h> extern void vmcs_dump_vcpu(struct vcpu *v); diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h deleted file mode 100644 index 348fc9a..0000000 --- a/xen/include/asm-x86/hvm/vpmu.h +++ /dev/null @@ -1,96 +0,0 @@ -/* - * vpmu.h: PMU virtualization for HVM domain. - * - * Copyright (c) 2007, Intel Corporation. - * - * This program is free software; you can redistribute it and/or modify it - * under the terms and conditions of the GNU General Public License, - * version 2, as published by the Free Software Foundation. - * - * This program is distributed in the hope it will be useful, but WITHOUT - * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or - * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for - * more details. - * - * You should have received a copy of the GNU General Public License along with - * this program; if not, write to the Free Software Foundation, Inc., 59 Temple - * Place - Suite 330, Boston, MA 02111-1307 USA. 
- * - * Author: Haitao Shan <haitao.shan@intel.com> - */ - -#ifndef __ASM_X86_HVM_VPMU_H_ -#define __ASM_X86_HVM_VPMU_H_ - -#include <public/xenpmu.h> - -#define vcpu_vpmu(vcpu) (&((vcpu)->arch.vpmu)) -#define vpmu_vcpu(vpmu) (container_of((vpmu), struct vcpu, \ - arch.vpmu)) - -#define MSR_TYPE_COUNTER 0 -#define MSR_TYPE_CTRL 1 -#define MSR_TYPE_GLOBAL 2 -#define MSR_TYPE_ARCH_COUNTER 3 -#define MSR_TYPE_ARCH_CTRL 4 - - -/* Arch specific operations shared by all vpmus */ -struct arch_vpmu_ops { - int (*do_wrmsr)(unsigned int msr, uint64_t msr_content); - int (*do_rdmsr)(unsigned int msr, uint64_t *msr_content); - int (*do_interrupt)(struct cpu_user_regs *regs); - void (*do_cpuid)(unsigned int input, - unsigned int *eax, unsigned int *ebx, - unsigned int *ecx, unsigned int *edx); - void (*arch_vpmu_destroy)(struct vcpu *v); - int (*arch_vpmu_save)(struct vcpu *v); - void (*arch_vpmu_load)(struct vcpu *v); - void (*arch_vpmu_dump)(struct vcpu *v); -}; - -int vmx_vpmu_initialise(struct vcpu *, unsigned int flags); -int svm_vpmu_initialise(struct vcpu *, unsigned int flags); - -struct vpmu_struct { - u32 flags; - u32 last_pcpu; - u32 hw_lapic_lvtpc; - void *context; - struct arch_vpmu_ops *arch_vpmu_ops; - xenpmu_data_t *xenpmu_data; -}; - -/* VPMU states */ -#define VPMU_CONTEXT_ALLOCATED 0x1 -#define VPMU_CONTEXT_LOADED 0x2 -#define VPMU_RUNNING 0x4 -#define VPMU_CONTEXT_SAVE 0x8 /* Force context save */ -#define VPMU_FROZEN 0x10 /* Stop counters while VCPU is not running */ -#define VPMU_PASSIVE_DOMAIN_ALLOCATED 0x20 -#define VPMU_WAIT_FOR_FLUSH 0x40 /* PV guest waits for XENPMU_flush */ - -#define vpmu_set(_vpmu, _x) ((_vpmu)->flags |= (_x)) -#define vpmu_reset(_vpmu, _x) ((_vpmu)->flags &= ~(_x)) -#define vpmu_is_set(_vpmu, _x) ((_vpmu)->flags & (_x)) -#define vpmu_is_set_all(_vpmu, _x) (((_vpmu)->flags & (_x)) == (_x)) -#define vpmu_clear(_vpmu) ((_vpmu)->flags = 0) - -int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content); -int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content); -int vpmu_do_interrupt(struct cpu_user_regs *regs); -void vpmu_do_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx, - unsigned int *ecx, unsigned int *edx); -void vpmu_initialise(struct vcpu *v); -void vpmu_destroy(struct vcpu *v); -void vpmu_save(struct vcpu *v); -void vpmu_load(struct vcpu *v); -void vpmu_dump(struct vcpu *v); - -extern int acquire_pmu_ownership(int pmu_ownership); -extern void release_pmu_ownership(int pmu_ownership); - -extern uint32_t vpmu_mode; - -#endif /* __ASM_X86_HVM_VPMU_H_*/ - diff --git a/xen/include/asm-x86/vpmu.h b/xen/include/asm-x86/vpmu.h new file mode 100644 index 0000000..348fc9a --- /dev/null +++ b/xen/include/asm-x86/vpmu.h @@ -0,0 +1,96 @@ +/* + * vpmu.h: PMU virtualization for HVM domain. + * + * Copyright (c) 2007, Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple + * Place - Suite 330, Boston, MA 02111-1307 USA. 
+ * + * Author: Haitao Shan <haitao.shan@intel.com> + */ + +#ifndef __ASM_X86_HVM_VPMU_H_ +#define __ASM_X86_HVM_VPMU_H_ + +#include <public/xenpmu.h> + +#define vcpu_vpmu(vcpu) (&((vcpu)->arch.vpmu)) +#define vpmu_vcpu(vpmu) (container_of((vpmu), struct vcpu, \ + arch.vpmu)) + +#define MSR_TYPE_COUNTER 0 +#define MSR_TYPE_CTRL 1 +#define MSR_TYPE_GLOBAL 2 +#define MSR_TYPE_ARCH_COUNTER 3 +#define MSR_TYPE_ARCH_CTRL 4 + + +/* Arch specific operations shared by all vpmus */ +struct arch_vpmu_ops { + int (*do_wrmsr)(unsigned int msr, uint64_t msr_content); + int (*do_rdmsr)(unsigned int msr, uint64_t *msr_content); + int (*do_interrupt)(struct cpu_user_regs *regs); + void (*do_cpuid)(unsigned int input, + unsigned int *eax, unsigned int *ebx, + unsigned int *ecx, unsigned int *edx); + void (*arch_vpmu_destroy)(struct vcpu *v); + int (*arch_vpmu_save)(struct vcpu *v); + void (*arch_vpmu_load)(struct vcpu *v); + void (*arch_vpmu_dump)(struct vcpu *v); +}; + +int vmx_vpmu_initialise(struct vcpu *, unsigned int flags); +int svm_vpmu_initialise(struct vcpu *, unsigned int flags); + +struct vpmu_struct { + u32 flags; + u32 last_pcpu; + u32 hw_lapic_lvtpc; + void *context; + struct arch_vpmu_ops *arch_vpmu_ops; + xenpmu_data_t *xenpmu_data; +}; + +/* VPMU states */ +#define VPMU_CONTEXT_ALLOCATED 0x1 +#define VPMU_CONTEXT_LOADED 0x2 +#define VPMU_RUNNING 0x4 +#define VPMU_CONTEXT_SAVE 0x8 /* Force context save */ +#define VPMU_FROZEN 0x10 /* Stop counters while VCPU is not running */ +#define VPMU_PASSIVE_DOMAIN_ALLOCATED 0x20 +#define VPMU_WAIT_FOR_FLUSH 0x40 /* PV guest waits for XENPMU_flush */ + +#define vpmu_set(_vpmu, _x) ((_vpmu)->flags |= (_x)) +#define vpmu_reset(_vpmu, _x) ((_vpmu)->flags &= ~(_x)) +#define vpmu_is_set(_vpmu, _x) ((_vpmu)->flags & (_x)) +#define vpmu_is_set_all(_vpmu, _x) (((_vpmu)->flags & (_x)) == (_x)) +#define vpmu_clear(_vpmu) ((_vpmu)->flags = 0) + +int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content); +int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content); +int vpmu_do_interrupt(struct cpu_user_regs *regs); +void vpmu_do_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx, + unsigned int *ecx, unsigned int *edx); +void vpmu_initialise(struct vcpu *v); +void vpmu_destroy(struct vcpu *v); +void vpmu_save(struct vcpu *v); +void vpmu_load(struct vcpu *v); +void vpmu_dump(struct vcpu *v); + +extern int acquire_pmu_ownership(int pmu_ownership); +extern void release_pmu_ownership(int pmu_ownership); + +extern uint32_t vpmu_mode; + +#endif /* __ASM_X86_HVM_VPMU_H_*/ + -- 1.8.1.4
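For reference, the new hypercall added above (do_xenpmu_op) is meant to be driven from the guest side. The fragment below is only an illustrative sketch: the HYPERVISOR_xenpmu_op() wrapper and the include path are assumptions and not part of this series; the op codes, the xenpmu_params_t fields and the mode flags are the ones defined in the public xenpmu.h introduced by these patches.

    /* Illustrative sketch only: HYPERVISOR_xenpmu_op() and the header
     * location are assumed guest-side conventions; op codes and structure
     * fields follow the public xenpmu.h added by this series. */
    #include <xen/interface/xenpmu.h>              /* assumed guest copy of public/xenpmu.h */

    extern long HYPERVISOR_xenpmu_op(int op, void *arg);   /* assumed arch wrapper */

    static long xenpmu_enable_self_profiling(void)
    {
        xenpmu_params_t p = { 0 };
        long ret;

        /* Check that the hypervisor speaks the expected interface version. */
        ret = HYPERVISOR_xenpmu_op(XENPMU_mode_get, &p);
        if ( ret )
            return ret;
        if ( p.version.maj != XENPMU_VER_MAJ )
            return -1;                              /* unexpected interface version */

        /* XENPMU_MODE_ON lets each guest profile itself; XENPMU_MODE_PRIV
         * would instead route all samples to dom0. The two are mutually
         * exclusive, as do_xenpmu_op() checks on XENPMU_mode_set. */
        p.val = XENPMU_MODE_ON;
        return HYPERVISOR_xenpmu_op(XENPMU_mode_set, &p);
    }

A control-domain tool would follow the same pattern with XENPMU_MODE_PRIV to collect samples for all guests, and XENPMU_flags_set with XENPMU_FLAGS_INTEL_BTS to enable branch-trace-store emulation where supported.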
Dietmar Hahn
2013-Sep-23 11:42 UTC
Re: [PATCH v2 05/13] intel/VPMU: Clean up Intel VPMU code
Am Freitag 20 September 2013, 05:42:04 schrieb Boris Ostrovsky:> Remove struct pmumsr and core2_pmu_enable. Replace static MSR structures with > fields in core2_vpmu_context. > > Call core2_get_pmc_count() once, during initialization. > > Properly clean up when core2_vpmu_alloc_resource() fails and add routines > to remove MSRs from VMCS. > > Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> > --- > xen/arch/x86/hvm/vmx/vmcs.c | 59 +++++++ > xen/arch/x86/hvm/vmx/vpmu_core2.c | 289 ++++++++++++++----------------- > xen/include/asm-x86/hvm/vmx/vmcs.h | 2 + > xen/include/asm-x86/hvm/vmx/vpmu_core2.h | 19 -- > 4 files changed, 191 insertions(+), 178 deletions(-) > > diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c > index de9f592..756bc13 100644 > --- a/xen/arch/x86/hvm/vmx/vmcs.c > +++ b/xen/arch/x86/hvm/vmx/vmcs.c > @@ -1136,6 +1136,36 @@ int vmx_add_guest_msr(u32 msr) > return 0; > } > > +void vmx_rm_guest_msr(u32 msr) > +{ > + struct vcpu *curr = current; > + unsigned int i, idx, msr_count = curr->arch.hvm_vmx.msr_count; > + struct vmx_msr_entry *msr_area = curr->arch.hvm_vmx.msr_area; > + > + if ( msr_area == NULL ) > + return; > +unneeded spaces> + for ( idx = 0; idx < msr_count; idx++ ) > + if ( msr_area[idx].index == msr ) > + break; > + > + if ( idx == msr_count ) > + return; > +unneeded spaces> + for ( i = idx; i < msr_count - 1; i++ ) > + { > + msr_area[i].index = msr_area[i + 1].index; > + rdmsrl(msr_area[i].index, msr_area[i].data); > + } > + msr_area[msr_count - 1].index = 0; > + > + curr->arch.hvm_vmx.msr_count = --msr_count; > + __vmwrite(VM_EXIT_MSR_STORE_COUNT, msr_count); > + __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, msr_count); > + > + return; > +} > + > int vmx_add_host_load_msr(u32 msr) > { > struct vcpu *curr = current; > @@ -1166,6 +1196,35 @@ int vmx_add_host_load_msr(u32 msr) > return 0; > } > > +void vmx_rm_host_load_msr(u32 msr) > +{ > + struct vcpu *curr = current; > + unsigned int i, idx, msr_count = curr->arch.hvm_vmx.host_msr_count; > + struct vmx_msr_entry *msr_area = curr->arch.hvm_vmx.host_msr_area; > + > + if ( msr_area == NULL ) > + return; > + > + for ( idx = 0; idx < msr_count; idx++ ) > + if ( msr_area[idx].index == msr ) > + break; > + > + if ( idx == msr_count ) > + return; > +unneeded spaces> + for ( i = idx; i < msr_count - 1; i++ ) > + { > + msr_area[i].index = msr_area[i + 1].index; > + rdmsrl(msr_area[i].index, msr_area[i].data); > + } > + msr_area[msr_count - 1].index = 0;trailing spaces> + > + curr->arch.hvm_vmx.host_msr_count = --msr_count; > + __vmwrite(VM_EXIT_MSR_LOAD_COUNT, msr_count); > + > + return; > +} > + > void vmx_set_eoi_exit_bitmap(struct vcpu *v, u8 vector) > { > int index, offset, changed; > diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c > index 101888d..50f784f 100644 > --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c > +++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c > @@ -65,6 +65,26 @@ > #define PMU_FIXED_WIDTH_MASK (((1 << PMU_FIXED_WIDTH_BITS) -1) << PMU_FIXED_WIDTH_SHIFT) > > /* > + * MSR_CORE_PERF_FIXED_CTR_CTRL contains the configuration of all fixed > + * counters. 4 bits for every counter. 
> + */ > +#define FIXED_CTR_CTRL_BITS 4 > +#define FIXED_CTR_CTRL_MASK ((1 << FIXED_CTR_CTRL_BITS) - 1) > + > +#define VPMU_CORE2_MAX_FIXED_PMCS 4 > +struct core2_vpmu_context { > + u64 fixed_ctrl; > + u64 ds_area; > + u64 pebs_enable; > + u64 global_ovf_status; > + u64 fix_counters[VPMU_CORE2_MAX_FIXED_PMCS]; > + struct arch_msr_pair arch_msr_pair[1]; > +}; > + > +static int arch_pmc_cnt; /* Number of general-purpose performance counters */ > +static int fixed_pmc_cnt; /* Number of fixed performance counters */ > + > +/* > * QUIRK to workaround an issue on various family 6 cpus. > * The issue leads to endless PMC interrupt loops on the processor. > * If the interrupt handler is running and a pmc reaches the value 0, this > @@ -84,11 +104,8 @@ static void check_pmc_quirk(void) > is_pmc_quirk = 0; > } > > -static int core2_get_pmc_count(void); > static void handle_pmc_quirk(u64 msr_content) > { > - int num_gen_pmc = core2_get_pmc_count(); > - int num_fix_pmc = 3; > int i; > u64 val; > > @@ -96,7 +113,7 @@ static void handle_pmc_quirk(u64 msr_content) > return; > > val = msr_content; > - for ( i = 0; i < num_gen_pmc; i++ ) > + for ( i = 0; i < arch_pmc_cnt; i++ ) > { > if ( val & 0x1 ) > { > @@ -108,7 +125,7 @@ static void handle_pmc_quirk(u64 msr_content) > val >>= 1; > } > val = msr_content >> 32; > - for ( i = 0; i < num_fix_pmc; i++ ) > + for ( i = 0; i < fixed_pmc_cnt; i++ ) > { > if ( val & 0x1 ) > { > @@ -121,65 +138,32 @@ static void handle_pmc_quirk(u64 msr_content) > } > } > > -static const u32 core2_fix_counters_msr[] = { > - MSR_CORE_PERF_FIXED_CTR0, > - MSR_CORE_PERF_FIXED_CTR1, > - MSR_CORE_PERF_FIXED_CTR2 > -}; > - > /* > - * MSR_CORE_PERF_FIXED_CTR_CTRL contains the configuration of all fixed > - * counters. 4 bits for every counter. > + * Read the number of general counters via CPUID.EAX[0xa].EAX[8..15] > */ > -#define FIXED_CTR_CTRL_BITS 4 > -#define FIXED_CTR_CTRL_MASK ((1 << FIXED_CTR_CTRL_BITS) - 1) > - > -/* The index into the core2_ctrls_msr[] of this MSR used in core2_vpmu_dump() */ > -#define MSR_CORE_PERF_FIXED_CTR_CTRL_IDX 0 > - > -/* Core 2 Non-architectual Performance Control MSRs. 
*/ > -static const u32 core2_ctrls_msr[] = { > - MSR_CORE_PERF_FIXED_CTR_CTRL, > - MSR_IA32_PEBS_ENABLE, > - MSR_IA32_DS_AREA > -}; > - > -struct pmumsr { > - unsigned int num; > - const u32 *msr; > -}; > - > -static const struct pmumsr core2_fix_counters = { > - VPMU_CORE2_NUM_FIXED, > - core2_fix_counters_msr > -}; > +static int core2_get_arch_pmc_count(void) > +{ > + u32 eax, ebx, ecx, edx; > > -static const struct pmumsr core2_ctrls = { > - VPMU_CORE2_NUM_CTRLS, > - core2_ctrls_msr > -}; > -static int arch_pmc_cnt; > + cpuid(0xa, &eax, &ebx, &ecx, &edx); > + return ( (eax & PMU_GENERAL_NR_MASK) >> PMU_GENERAL_NR_SHIFT ); > +} > > /* > - * Read the number of general counters via CPUID.EAX[0xa].EAX[8..15] > + * Read the number of fixed counters via CPUID.EDX[0xa].EDX[0..4] > */ > -static int core2_get_pmc_count(void) > +static int core2_get_fixed_pmc_count(void) > { > u32 eax, ebx, ecx, edx; > > - if ( arch_pmc_cnt == 0 ) > - { > - cpuid(0xa, &eax, &ebx, &ecx, &edx); > - arch_pmc_cnt = (eax & PMU_GENERAL_NR_MASK) >> PMU_GENERAL_NR_SHIFT; > - } > - > - return arch_pmc_cnt; > + cpuid(0xa, &eax, &ebx, &ecx, &edx); > + return ( (eax & PMU_FIXED_NR_MASK) >> PMU_FIXED_NR_SHIFT ); > } > > static u64 core2_calc_intial_glb_ctrl_msr(void) > { > - int arch_pmc_bits = (1 << core2_get_pmc_count()) - 1; > - u64 fix_pmc_bits = (1 << 3) - 1; > + int arch_pmc_bits = (1 << arch_pmc_cnt) - 1; > + u64 fix_pmc_bits = (1 << fixed_pmc_cnt) - 1; > return ((fix_pmc_bits << 32) | arch_pmc_bits); > } > > @@ -196,9 +180,9 @@ static int is_core2_vpmu_msr(u32 msr_index, int *type, int *index) > { > int i; > > - for ( i = 0; i < core2_fix_counters.num; i++ ) > + for ( i = 0; i < fixed_pmc_cnt; i++ ) > { > - if ( core2_fix_counters.msr[i] == msr_index ) > + if ( msr_index == MSR_CORE_PERF_FIXED_CTR0 + i ) > { > *type = MSR_TYPE_COUNTER; > *index = i; > @@ -206,14 +190,12 @@ static int is_core2_vpmu_msr(u32 msr_index, int *type, int *index) > } > } > > - for ( i = 0; i < core2_ctrls.num; i++ ) > + if ( (msr_index == MSR_CORE_PERF_FIXED_CTR_CTRL ) || > + (msr_index == MSR_IA32_DS_AREA) || > + (msr_index == MSR_IA32_PEBS_ENABLE) ) > { > - if ( core2_ctrls.msr[i] == msr_index ) > - { > - *type = MSR_TYPE_CTRL; > - *index = i; > - return 1; > - } > + *type = MSR_TYPE_CTRL; > + return 1; > } > > if ( (msr_index == MSR_CORE_PERF_GLOBAL_CTRL) || > @@ -225,7 +207,7 @@ static int is_core2_vpmu_msr(u32 msr_index, int *type, int *index) > } > > if ( (msr_index >= MSR_IA32_PERFCTR0) && > - (msr_index < (MSR_IA32_PERFCTR0 + core2_get_pmc_count())) ) > + (msr_index < (MSR_IA32_PERFCTR0 + arch_pmc_cnt)) ) > { > *type = MSR_TYPE_ARCH_COUNTER; > *index = msr_index - MSR_IA32_PERFCTR0; > @@ -233,7 +215,7 @@ static int is_core2_vpmu_msr(u32 msr_index, int *type, int *index) > } > > if ( (msr_index >= MSR_P6_EVNTSEL0) && > - (msr_index < (MSR_P6_EVNTSEL0 + core2_get_pmc_count())) ) > + (msr_index < (MSR_P6_EVNTSEL0 + arch_pmc_cnt)) ) > { > *type = MSR_TYPE_ARCH_CTRL; > *index = msr_index - MSR_P6_EVNTSEL0; > @@ -248,13 +230,13 @@ static void core2_vpmu_set_msr_bitmap(unsigned long *msr_bitmap) > int i; > > /* Allow Read/Write PMU Counters MSR Directly. 
*/ > - for ( i = 0; i < core2_fix_counters.num; i++ ) > + for ( i = 0; i < fixed_pmc_cnt; i++ ) > { > - clear_bit(msraddr_to_bitpos(core2_fix_counters.msr[i]), msr_bitmap); > - clear_bit(msraddr_to_bitpos(core2_fix_counters.msr[i]), > + clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i), msr_bitmap); > + clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i), > msr_bitmap + 0x800/BYTES_PER_LONG); > } > - for ( i = 0; i < core2_get_pmc_count(); i++ ) > + for ( i = 0; i < arch_pmc_cnt; i++ ) > { > clear_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i), msr_bitmap); > clear_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i), > @@ -262,32 +244,37 @@ static void core2_vpmu_set_msr_bitmap(unsigned long *msr_bitmap) > } > > /* Allow Read PMU Non-global Controls Directly. */ > - for ( i = 0; i < core2_ctrls.num; i++ ) > - clear_bit(msraddr_to_bitpos(core2_ctrls.msr[i]), msr_bitmap); > - for ( i = 0; i < core2_get_pmc_count(); i++ ) > + for ( i = 0; i < arch_pmc_cnt; i++ ) > clear_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL0+i), msr_bitmap); > + > + clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR_CTRL), msr_bitmap); > + clear_bit(msraddr_to_bitpos(MSR_IA32_PEBS_ENABLE), msr_bitmap); > + clear_bit(msraddr_to_bitpos(MSR_IA32_DS_AREA), msr_bitmap); > } > > static void core2_vpmu_unset_msr_bitmap(unsigned long *msr_bitmap) > { > int i; > > - for ( i = 0; i < core2_fix_counters.num; i++ ) > + for ( i = 0; i < fixed_pmc_cnt; i++ ) > { > - set_bit(msraddr_to_bitpos(core2_fix_counters.msr[i]), msr_bitmap); > - set_bit(msraddr_to_bitpos(core2_fix_counters.msr[i]), > + set_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i), msr_bitmap); > + set_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i), > msr_bitmap + 0x800/BYTES_PER_LONG); > } > - for ( i = 0; i < core2_get_pmc_count(); i++ ) > + for ( i = 0; i < arch_pmc_cnt; i++ ) > { > set_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i), msr_bitmap); > set_bit(msraddr_to_bitpos(MSR_IA32_PERFCTR0+i), > msr_bitmap + 0x800/BYTES_PER_LONG); > } > - for ( i = 0; i < core2_ctrls.num; i++ ) > - set_bit(msraddr_to_bitpos(core2_ctrls.msr[i]), msr_bitmap); > - for ( i = 0; i < core2_get_pmc_count(); i++ ) > + > + for ( i = 0; i < arch_pmc_cnt; i++ ) > set_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL0+i), msr_bitmap); > + > + set_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR_CTRL), msr_bitmap); > + set_bit(msraddr_to_bitpos(MSR_IA32_PEBS_ENABLE), msr_bitmap); > + set_bit(msraddr_to_bitpos(MSR_IA32_DS_AREA), msr_bitmap); > } > > static inline void __core2_vpmu_save(struct vcpu *v) > @@ -295,10 +282,10 @@ static inline void __core2_vpmu_save(struct vcpu *v) > int i; > struct core2_vpmu_context *core2_vpmu_cxt = vcpu_vpmu(v)->context; > > - for ( i = 0; i < core2_fix_counters.num; i++ ) > - rdmsrl(core2_fix_counters.msr[i], core2_vpmu_cxt->fix_counters[i]); > - for ( i = 0; i < core2_get_pmc_count(); i++ ) > - rdmsrl(MSR_IA32_PERFCTR0+i, core2_vpmu_cxt->arch_msr_pair[i].counter); > + for ( i = 0; i < fixed_pmc_cnt; i++ ) > + rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, core2_vpmu_cxt->fix_counters[i]); > + for ( i = 0; i < arch_pmc_cnt; i++ ) > + rdmsrl(MSR_IA32_PERFCTR0 + i, core2_vpmu_cxt->arch_msr_pair[i].counter); > } > > static int core2_vpmu_save(struct vcpu *v) > @@ -322,14 +309,16 @@ static inline void __core2_vpmu_load(struct vcpu *v) > int i; > struct core2_vpmu_context *core2_vpmu_cxt = vcpu_vpmu(v)->context; > > - for ( i = 0; i < core2_fix_counters.num; i++ ) > - wrmsrl(core2_fix_counters.msr[i], core2_vpmu_cxt->fix_counters[i]); > - for ( i = 0; i < core2_get_pmc_count(); i++ ) > - 
wrmsrl(MSR_IA32_PERFCTR0+i, core2_vpmu_cxt->arch_msr_pair[i].counter); > + for ( i = 0; i < fixed_pmc_cnt; i++ ) > + wrmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, core2_vpmu_cxt->fix_counters[i]); > + for ( i = 0; i < arch_pmc_cnt; i++ ) > + wrmsrl(MSR_IA32_PERFCTR0 + i, core2_vpmu_cxt->arch_msr_pair[i].counter); > > - for ( i = 0; i < core2_ctrls.num; i++ ) > - wrmsrl(core2_ctrls.msr[i], core2_vpmu_cxt->ctrls[i]); > - for ( i = 0; i < core2_get_pmc_count(); i++ ) > + wrmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, core2_vpmu_cxt->fixed_ctrl); > + wrmsrl(MSR_IA32_DS_AREA, core2_vpmu_cxt->ds_area); > + wrmsrl(MSR_IA32_PEBS_ENABLE, core2_vpmu_cxt->pebs_enable); > + > + for ( i = 0; i < arch_pmc_cnt; i++ ) > wrmsrl(MSR_P6_EVNTSEL0+i, core2_vpmu_cxt->arch_msr_pair[i].control); > } > > @@ -347,56 +336,39 @@ static int core2_vpmu_alloc_resource(struct vcpu *v) > { > struct vpmu_struct *vpmu = vcpu_vpmu(v); > struct core2_vpmu_context *core2_vpmu_cxt; > - struct core2_pmu_enable *pmu_enable; > > if ( !acquire_pmu_ownership(PMU_OWNER_HVM) ) > return 0; > > wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0); > if ( vmx_add_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL) ) > - return 0; > + goto out_err; > > if ( vmx_add_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL) ) > - return 0; > + goto out_err; > vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, > core2_calc_intial_glb_ctrl_msr()); > > - pmu_enable = xzalloc_bytes(sizeof(struct core2_pmu_enable) + > - core2_get_pmc_count() - 1); > - if ( !pmu_enable ) > - goto out1; > - > core2_vpmu_cxt = xzalloc_bytes(sizeof(struct core2_vpmu_context) + > - (core2_get_pmc_count()-1)*sizeof(struct arch_msr_pair)); > + (arch_pmc_cnt-1)*sizeof(struct arch_msr_pair)); > if ( !core2_vpmu_cxt ) > - goto out2; > - core2_vpmu_cxt->pmu_enable = pmu_enable; > + goto out_err; > + > vpmu->context = (void *)core2_vpmu_cxt; > > + vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED); > + > return 1; > - out2: > - xfree(pmu_enable); > - out1: > - gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, PMU feature is " > - "unavailable on domain %d vcpu %d.\n", > - v->vcpu_id, v->domain->domain_id); > - return 0; > -} > > -static void core2_vpmu_save_msr_context(struct vcpu *v, int type, > - int index, u64 msr_data) > -{ > - struct core2_vpmu_context *core2_vpmu_cxt = vcpu_vpmu(v)->context; > +out_err: > + vmx_rm_host_load_msr(MSR_CORE_PERF_GLOBAL_CTRL); > + vmx_rm_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL); > + release_pmu_ownship(PMU_OWNER_HVM); > > - switch ( type ) > - { > - case MSR_TYPE_CTRL: > - core2_vpmu_cxt->ctrls[index] = msr_data; > - break; > - case MSR_TYPE_ARCH_CTRL: > - core2_vpmu_cxt->arch_msr_pair[index].control = msr_data; > - break; > - } > + printk("Failed to allocate VPMU resources for domain %u vcpu %u\n", > + v->vcpu_id, v->domain->domain_id); > + > + return 0; > } > > static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index) > @@ -407,10 +379,8 @@ static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index) > return 0; > > if ( unlikely(!vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED)) && > - (vpmu->context != NULL || > - !core2_vpmu_alloc_resource(current)) ) > + !core2_vpmu_alloc_resource(current) ) > return 0; > - vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED); > > /* Do the lazy load staff. 
*/ > if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) ) > @@ -426,7 +396,7 @@ static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index) > static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) > { > u64 global_ctrl, non_global_ctrl; > - char pmu_enable = 0; > + unsigned pmu_enable = 0; > int i, tmp; > int type = -1, index = -1; > struct vcpu *v = current; > @@ -471,6 +441,7 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) > if ( msr_content & 1 ) > gdprintk(XENLOG_WARNING, "Guest is trying to enable PEBS, " > "which is not supported.\n"); > + core2_vpmu_cxt->pebs_enable = msr_content; > return 1; > case MSR_IA32_DS_AREA: > if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_DS) ) > @@ -483,27 +454,25 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) > hvm_inject_hw_exception(TRAP_gp_fault, 0); > return 1; > } > - core2_vpmu_cxt->pmu_enable->ds_area_enable = msr_content ? 1 : 0; > + core2_vpmu_cxt->ds_area = msr_content; > break; > } > gdprintk(XENLOG_WARNING, "Guest setting of DTS is ignored.\n"); > return 1; > case MSR_CORE_PERF_GLOBAL_CTRL: > global_ctrl = msr_content; > - for ( i = 0; i < core2_get_pmc_count(); i++ ) > + for ( i = 0; i < arch_pmc_cnt; i++ ) > { > rdmsrl(MSR_P6_EVNTSEL0+i, non_global_ctrl); > - core2_vpmu_cxt->pmu_enable->arch_pmc_enable[i] > - global_ctrl & (non_global_ctrl >> 22) & 1; > + pmu_enable += global_ctrl & (non_global_ctrl >> 22) & 1; > global_ctrl >>= 1; > } > > rdmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, non_global_ctrl); > global_ctrl = msr_content >> 32; > - for ( i = 0; i < core2_fix_counters.num; i++ ) > + for ( i = 0; i < fixed_pmc_cnt; i++ ) > { > - core2_vpmu_cxt->pmu_enable->fixed_ctr_enable[i] > - (global_ctrl & 1) & ((non_global_ctrl & 0x3)? 1: 0); > + pmu_enable += (global_ctrl & 1) & ((non_global_ctrl & 0x3)? 1: 0); > non_global_ctrl >>= FIXED_CTR_CTRL_BITS; > global_ctrl >>= 1; > } > @@ -512,27 +481,27 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) > non_global_ctrl = msr_content; > vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl); > global_ctrl >>= 32; > - for ( i = 0; i < core2_fix_counters.num; i++ ) > + for ( i = 0; i < fixed_pmc_cnt; i++ ) > { > - core2_vpmu_cxt->pmu_enable->fixed_ctr_enable[i] > - (global_ctrl & 1) & ((non_global_ctrl & 0x3)? 1: 0); > + pmu_enable += (global_ctrl & 1) & ((non_global_ctrl & 0x3)? 
1: 0); > non_global_ctrl >>= 4; > global_ctrl >>= 1; > } > + core2_vpmu_cxt->fixed_ctrl = msr_content; > break; > default: > tmp = msr - MSR_P6_EVNTSEL0; > - vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl); > - if ( tmp >= 0 && tmp < core2_get_pmc_count() ) > - core2_vpmu_cxt->pmu_enable->arch_pmc_enable[tmp] > - (global_ctrl >> tmp) & (msr_content >> 22) & 1; > + if ( tmp >= 0 && tmp < arch_pmc_cnt ) > + { > + vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl); > + core2_vpmu_cxt->arch_msr_pair[tmp].control = msr_content; > + for ( i = 0; i < arch_pmc_cnt && !pmu_enable; i++ ) > + pmu_enable += (global_ctrl >> i) & > + (core2_vpmu_cxt->arch_msr_pair[i].control >> 22) & 1; > + } > } > > - for ( i = 0; i < core2_fix_counters.num; i++ ) > - pmu_enable |= core2_vpmu_cxt->pmu_enable->fixed_ctr_enable[i]; > - for ( i = 0; i < core2_get_pmc_count(); i++ ) > - pmu_enable |= core2_vpmu_cxt->pmu_enable->arch_pmc_enable[i]; > - pmu_enable |= core2_vpmu_cxt->pmu_enable->ds_area_enable; > + pmu_enable += (core2_vpmu_cxt->ds_area != 0); > if ( pmu_enable ) > vpmu_set(vpmu, VPMU_RUNNING); > else > @@ -551,7 +520,6 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) > vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | APIC_LVT_MASKED; > } > > - core2_vpmu_save_msr_context(v, type, index, msr_content); > if ( type != MSR_TYPE_GLOBAL ) > { > u64 mask; > @@ -567,7 +535,7 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) > if ( msr == MSR_IA32_DS_AREA ) > break; > /* 4 bits per counter, currently 3 fixed counters implemented. */ > - mask = ~((1ull << (VPMU_CORE2_NUM_FIXED * FIXED_CTR_CTRL_BITS)) - 1); > + mask = ~((1ull << (fixed_pmc_cnt * FIXED_CTR_CTRL_BITS)) - 1); > if (msr_content & mask) > inject_gp = 1; > break; > @@ -652,7 +620,7 @@ static void core2_vpmu_do_cpuid(unsigned int input, > static void core2_vpmu_dump(struct vcpu *v) > { > struct vpmu_struct *vpmu = vcpu_vpmu(v); > - int i, num; > + int i; > struct core2_vpmu_context *core2_vpmu_cxt = NULL; > u64 val; > > @@ -670,26 +638,24 @@ static void core2_vpmu_dump(struct vcpu *v) > > printk(" vPMU running\n"); > core2_vpmu_cxt = vpmu->context; > - num = core2_get_pmc_count(); > + > /* Print the contents of the counter and its configuration msr. */ > - for ( i = 0; i < num; i++ ) > + for ( i = 0; i < arch_pmc_cnt; i++ ) > { > struct arch_msr_pair* msr_pair = core2_vpmu_cxt->arch_msr_pair; > - if ( core2_vpmu_cxt->pmu_enable->arch_pmc_enable[i] ) > - printk(" general_%d: 0x%016lx ctrl: 0x%016lx\n", > - i, msr_pair[i].counter, msr_pair[i].control); > + printk(" general_%d: 0x%016lx ctrl: 0x%016lx\n", > + i, msr_pair[i].counter, msr_pair[i].control); > } > /* > * The configuration of the fixed counter is 4 bits each in the > * MSR_CORE_PERF_FIXED_CTR_CTRL. 
> */ > - val = core2_vpmu_cxt->ctrls[MSR_CORE_PERF_FIXED_CTR_CTRL_IDX]; > - for ( i = 0; i < core2_fix_counters.num; i++ ) > + val = core2_vpmu_cxt->fixed_ctrl; > + for ( i = 0; i < fixed_pmc_cnt; i++ ) > { > - if ( core2_vpmu_cxt->pmu_enable->fixed_ctr_enable[i] ) > - printk(" fixed_%d: 0x%016lx ctrl: 0x%lx\n", > - i, core2_vpmu_cxt->fix_counters[i], > - val & FIXED_CTR_CTRL_MASK); > + printk(" fixed_%d: 0x%016lx ctrl: 0x%lx\n", > + i, core2_vpmu_cxt->fix_counters[i], > + val & FIXED_CTR_CTRL_MASK); > val >>= FIXED_CTR_CTRL_BITS; > } > } > @@ -707,7 +673,7 @@ static int core2_vpmu_do_interrupt(struct cpu_user_regs *regs) > if ( is_pmc_quirk ) > handle_pmc_quirk(msr_content); > core2_vpmu_cxt->global_ovf_status |= msr_content; > - msr_content = 0xC000000700000000 | ((1 << core2_get_pmc_count()) - 1); > + msr_content = 0xC000000700000000 | ((1 << arch_pmc_cnt) - 1); > wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content); > } > else > @@ -770,18 +736,23 @@ static int core2_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags) > } > } > func_out: > + > + arch_pmc_cnt = core2_get_arch_pmc_count(); > + fixed_pmc_cnt = core2_get_fixed_pmc_count(); > + if ( fixed_pmc_cnt > VPMU_CORE2_MAX_FIXED_PMCS ) > + fixed_pmc_cnt = VPMU_CORE2_MAX_FIXED_PMCS;Maybe printing a message when limiting the number of counters? Dietmar.> check_pmc_quirk(); > + > return 0; > } > > static void core2_vpmu_destroy(struct vcpu *v) > { > struct vpmu_struct *vpmu = vcpu_vpmu(v); > - struct core2_vpmu_context *core2_vpmu_cxt = vpmu->context; > > if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) > return; > - xfree(core2_vpmu_cxt->pmu_enable); > + > xfree(vpmu->context); > if ( cpu_has_vmx_msr_bitmap && is_hvm_domain(v->domain) ) > core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap); > diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h > index f30e5ac..5971613 100644 > --- a/xen/include/asm-x86/hvm/vmx/vmcs.h > +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h > @@ -470,7 +470,9 @@ void vmx_enable_intercept_for_msr(struct vcpu *v, u32 msr, int type); > int vmx_read_guest_msr(u32 msr, u64 *val); > int vmx_write_guest_msr(u32 msr, u64 val); > int vmx_add_guest_msr(u32 msr); > +void vmx_rm_guest_msr(u32 msr); > int vmx_add_host_load_msr(u32 msr); > +void vmx_rm_host_load_msr(u32 msr); > void vmx_vmcs_switch(struct vmcs_struct *from, struct vmcs_struct *to); > void vmx_set_eoi_exit_bitmap(struct vcpu *v, u8 vector); > void vmx_clear_eoi_exit_bitmap(struct vcpu *v, u8 vector); > diff --git a/xen/include/asm-x86/hvm/vmx/vpmu_core2.h b/xen/include/asm-x86/hvm/vmx/vpmu_core2.h > index 60b05fd..410372d 100644 > --- a/xen/include/asm-x86/hvm/vmx/vpmu_core2.h > +++ b/xen/include/asm-x86/hvm/vmx/vpmu_core2.h > @@ -23,29 +23,10 @@ > #ifndef __ASM_X86_HVM_VPMU_CORE_H_ > #define __ASM_X86_HVM_VPMU_CORE_H_ > > -/* Currently only 3 fixed counters are supported. 
*/ > -#define VPMU_CORE2_NUM_FIXED 3 > -/* Currently only 3 Non-architectual Performance Control MSRs */ > -#define VPMU_CORE2_NUM_CTRLS 3 > - > struct arch_msr_pair { > u64 counter; > u64 control; > }; > > -struct core2_pmu_enable { > - char ds_area_enable; > - char fixed_ctr_enable[VPMU_CORE2_NUM_FIXED]; > - char arch_pmc_enable[1]; > -}; > - > -struct core2_vpmu_context { > - struct core2_pmu_enable *pmu_enable; > - u64 fix_counters[VPMU_CORE2_NUM_FIXED]; > - u64 ctrls[VPMU_CORE2_NUM_CTRLS]; > - u64 global_ovf_status; > - struct arch_msr_pair arch_msr_pair[1]; > -}; > - > #endif /* __ASM_X86_HVM_VPMU_CORE_H_ */ > >-- Company details: http://ts.fujitsu.com/imprint.html
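As a rough illustration of Dietmar's suggestion above (printing a message when the fixed-counter count gets limited), the clamp in core2_vpmu_initialise() could look something like the sketch below. This is not part of the posted patch, just one possible shape for it:

    fixed_pmc_cnt = core2_get_fixed_pmc_count();
    if ( fixed_pmc_cnt > VPMU_CORE2_MAX_FIXED_PMCS )
    {
        /* Let the admin know that not all fixed counters are being exposed */
        printk(XENLOG_WARNING "VPMU: limiting fixed counters from %d to %d\n",
               fixed_pmc_cnt, VPMU_CORE2_MAX_FIXED_PMCS);
        fixed_pmc_cnt = VPMU_CORE2_MAX_FIXED_PMCS;
    }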
Am Freitag 20 September 2013, 05:42:05 schrieb Boris Ostrovsky:> Add xenpmu.h header file, move various macros and structures that will be > shared between hypervisor and PV guests to it. > > Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> > --- > xen/arch/x86/hvm/svm/vpmu.c | 15 +++----- > xen/arch/x86/hvm/vmx/vpmu_core2.c | 43 ++++++++++++---------- > xen/arch/x86/hvm/vpmu.c | 1 + > xen/arch/x86/oprofile/op_model_ppro.c | 6 +++- > xen/include/asm-x86/hvm/vmx/vpmu_core2.h | 32 ----------------- > xen/include/asm-x86/hvm/vpmu.h | 10 ++---- > xen/include/public/arch-x86/xenpmu-x86.h | 62 ++++++++++++++++++++++++++++++++ > xen/include/public/xenpmu.h | 38 ++++++++++++++++++++ > 8 files changed, 136 insertions(+), 71 deletions(-) > delete mode 100644 xen/include/asm-x86/hvm/vmx/vpmu_core2.h > create mode 100644 xen/include/public/arch-x86/xenpmu-x86.h > create mode 100644 xen/include/public/xenpmu.h > > diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c > index a09930e..25532d0 100644 > --- a/xen/arch/x86/hvm/svm/vpmu.c > +++ b/xen/arch/x86/hvm/svm/vpmu.c > @@ -30,10 +30,7 @@ > #include <asm/apic.h> > #include <asm/hvm/vlapic.h> > #include <asm/hvm/vpmu.h> > - > -#define F10H_NUM_COUNTERS 4 > -#define F15H_NUM_COUNTERS 6 > -#define MAX_NUM_COUNTERS F15H_NUM_COUNTERS > +#include <public/xenpmu.h> > > #define MSR_F10H_EVNTSEL_GO_SHIFT 40 > #define MSR_F10H_EVNTSEL_EN_SHIFT 22 > @@ -49,6 +46,9 @@ static const u32 __read_mostly *counters; > static const u32 __read_mostly *ctrls; > static bool_t __read_mostly k7_counters_mirrored; > > +#define F10H_NUM_COUNTERS 4 > +#define F15H_NUM_COUNTERS 6 > + > /* PMU Counter MSRs. */ > static const u32 AMD_F10H_COUNTERS[] = { > MSR_K7_PERFCTR0, > @@ -83,13 +83,6 @@ static const u32 AMD_F15H_CTRLS[] = { > MSR_AMD_FAM15H_EVNTSEL5 > }; > > -/* storage for context switching */ > -struct amd_vpmu_context { > - u64 counters[MAX_NUM_COUNTERS]; > - u64 ctrls[MAX_NUM_COUNTERS]; > - bool_t msr_bitmap_set; > -}; > - > static inline int get_pmu_reg_type(u32 addr) > { > if ( (addr >= MSR_K7_EVNTSEL0) && (addr <= MSR_K7_EVNTSEL3) ) > diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c > index 50f784f..7d1da3f 100644 > --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c > +++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c > @@ -35,8 +35,8 @@ > #include <asm/hvm/vmx/vmcs.h> > #include <public/sched.h> > #include <public/hvm/save.h> > +#include <public/xenpmu.h> > #include <asm/hvm/vpmu.h> > -#include <asm/hvm/vmx/vpmu_core2.h> > > /* > * See Intel SDM Vol 2a Instruction Set Reference chapter 3 for CPUID > @@ -64,6 +64,10 @@ > #define PMU_FIXED_WIDTH_BITS 8 /* 8 bits 5..12 */ > #define PMU_FIXED_WIDTH_MASK (((1 << PMU_FIXED_WIDTH_BITS) -1) << PMU_FIXED_WIDTH_SHIFT) > > +/* Intel-specific VPMU features */ > +#define VPMU_CPU_HAS_DS 0x100 /* Has Debug Store */ > +#define VPMU_CPU_HAS_BTS 0x200 /* Has Branch Trace Store */ > + > /* > * MSR_CORE_PERF_FIXED_CTR_CTRL contains the configuration of all fixed > * counters. 4 bits for every counter. 
> @@ -71,16 +75,6 @@ > #define FIXED_CTR_CTRL_BITS 4 > #define FIXED_CTR_CTRL_MASK ((1 << FIXED_CTR_CTRL_BITS) - 1) > > -#define VPMU_CORE2_MAX_FIXED_PMCS 4 > -struct core2_vpmu_context { > - u64 fixed_ctrl; > - u64 ds_area; > - u64 pebs_enable; > - u64 global_ovf_status; > - u64 fix_counters[VPMU_CORE2_MAX_FIXED_PMCS]; > - struct arch_msr_pair arch_msr_pair[1]; > -}; > - > static int arch_pmc_cnt; /* Number of general-purpose performance counters */ > static int fixed_pmc_cnt; /* Number of fixed performance counters */ > > @@ -225,6 +219,7 @@ static int is_core2_vpmu_msr(u32 msr_index, int *type, int *index) > return 0; > } > > +#define msraddr_to_bitpos(x) (((x)&0xffff) + ((x)>>31)*0x2000) > static void core2_vpmu_set_msr_bitmap(unsigned long *msr_bitmap) > { > int i; > @@ -349,8 +344,7 @@ static int core2_vpmu_alloc_resource(struct vcpu *v) > vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, > core2_calc_intial_glb_ctrl_msr()); > > - core2_vpmu_cxt = xzalloc_bytes(sizeof(struct core2_vpmu_context) + > - (arch_pmc_cnt-1)*sizeof(struct arch_msr_pair)); > + core2_vpmu_cxt = xzalloc_bytes(sizeof(struct core2_vpmu_context)); > if ( !core2_vpmu_cxt ) > goto out_err; > > @@ -614,6 +608,18 @@ static void core2_vpmu_do_cpuid(unsigned int input, > *ecx |= cpufeat_mask(X86_FEATURE_DSCPL); > } > } > + else if ( input == 0xa ) > + { > + /* Limit number of counters to max that we support */ > + if ( ((*eax & PMU_GENERAL_NR_MASK) >> PMU_GENERAL_NR_SHIFT) > > + XENPMU_CORE2_MAX_ARCH_PMCS ) > + *eax = (*eax & ~PMU_GENERAL_NR_MASK) | > + (XENPMU_CORE2_MAX_ARCH_PMCS << PMU_GENERAL_NR_SHIFT); > + if ( ((*edx & PMU_FIXED_NR_MASK) >> PMU_FIXED_NR_SHIFT) > > + XENPMU_CORE2_MAX_FIXED_PMCS ) > + *eax = (*eax & ~PMU_FIXED_NR_MASK) | > + (XENPMU_CORE2_MAX_FIXED_PMCS << PMU_FIXED_NR_SHIFT); > + } > } > > /* Dump vpmu info on console, called in the context of keyhandler ''q''. */ > @@ -641,11 +647,10 @@ static void core2_vpmu_dump(struct vcpu *v) > > /* Print the contents of the counter and its configuration msr. */ > for ( i = 0; i < arch_pmc_cnt; i++ ) > - { > - struct arch_msr_pair* msr_pair = core2_vpmu_cxt->arch_msr_pair; > printk(" general_%d: 0x%016lx ctrl: 0x%016lx\n", > - i, msr_pair[i].counter, msr_pair[i].control); > - } > + i, core2_vpmu_cxt->arch_msr_pair[i].counter, > + core2_vpmu_cxt->arch_msr_pair[i].control); > + > /* > * The configuration of the fixed counter is 4 bits each in the > * MSR_CORE_PERF_FIXED_CTR_CTRL. 
> @@ -739,8 +744,8 @@ func_out: > > arch_pmc_cnt = core2_get_arch_pmc_count(); > fixed_pmc_cnt = core2_get_fixed_pmc_count(); > - if ( fixed_pmc_cnt > VPMU_CORE2_MAX_FIXED_PMCS ) > - fixed_pmc_cnt = VPMU_CORE2_MAX_FIXED_PMCS; > + if ( fixed_pmc_cnt > XENPMU_CORE2_MAX_FIXED_PMCS ) > + fixed_pmc_cnt = XENPMU_CORE2_MAX_FIXED_PMCS; > check_pmc_quirk(); > > return 0; > diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c > index d6a9ff6..fa8cfd7 100644 > --- a/xen/arch/x86/hvm/vpmu.c > +++ b/xen/arch/x86/hvm/vpmu.c > @@ -31,6 +31,7 @@ > #include <asm/hvm/svm/svm.h> > #include <asm/hvm/svm/vmcb.h> > #include <asm/apic.h> > +#include <public/xenpmu.h> > > /* > * "vpmu" : vpmu generally enabled > diff --git a/xen/arch/x86/oprofile/op_model_ppro.c b/xen/arch/x86/oprofile/op_model_ppro.c > index 3225937..5aae2e7 100644 > --- a/xen/arch/x86/oprofile/op_model_ppro.c > +++ b/xen/arch/x86/oprofile/op_model_ppro.c > @@ -20,11 +20,15 @@ > #include <asm/regs.h> > #include <asm/current.h> > #include <asm/hvm/vpmu.h> > -#include <asm/hvm/vmx/vpmu_core2.h> > > #include "op_x86_model.h" > #include "op_counter.h" > > +struct arch_msr_pair { > + u64 counter; > + u64 control; > +}; > + > /* > * Intel "Architectural Performance Monitoring" CPUID > * detection/enumeration details: > diff --git a/xen/include/asm-x86/hvm/vmx/vpmu_core2.h b/xen/include/asm-x86/hvm/vmx/vpmu_core2.h > deleted file mode 100644 > index 410372d..0000000 > --- a/xen/include/asm-x86/hvm/vmx/vpmu_core2.h > +++ /dev/null > @@ -1,32 +0,0 @@ > - > -/* > - * vpmu_core2.h: CORE 2 specific PMU virtualization for HVM domain. > - * > - * Copyright (c) 2007, Intel Corporation. > - * > - * This program is free software; you can redistribute it and/or modify it > - * under the terms and conditions of the GNU General Public License, > - * version 2, as published by the Free Software Foundation. > - * > - * This program is distributed in the hope it will be useful, but WITHOUT > - * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or > - * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for > - * more details. > - * > - * You should have received a copy of the GNU General Public License along with > - * this program; if not, write to the Free Software Foundation, Inc., 59 Temple > - * Place - Suite 330, Boston, MA 02111-1307 USA. > - * > - * Author: Haitao Shan <haitao.shan@intel.com> > - */ > - > -#ifndef __ASM_X86_HVM_VPMU_CORE_H_ > -#define __ASM_X86_HVM_VPMU_CORE_H_ > - > -struct arch_msr_pair { > - u64 counter; > - u64 control; > -}; > - > -#endif /* __ASM_X86_HVM_VPMU_CORE_H_ */ > - > diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h > index 674cdad..50cdc4f 100644 > --- a/xen/include/asm-x86/hvm/vpmu.h > +++ b/xen/include/asm-x86/hvm/vpmu.h > @@ -22,6 +22,8 @@ > #ifndef __ASM_X86_HVM_VPMU_H_ > #define __ASM_X86_HVM_VPMU_H_ > > +#include <public/xenpmu.h> > + > /* > * Flag bits given as a string on the hypervisor boot parameter ''vpmu''. > * See arch/x86/hvm/vpmu.c. > @@ -29,12 +31,9 @@ > #define VPMU_BOOT_ENABLED 0x1 /* vpmu generally enabled. */ > #define VPMU_BOOT_BTS 0x2 /* Intel BTS feature wanted. 
*/ > > - > -#define msraddr_to_bitpos(x) (((x)&0xffff) + ((x)>>31)*0x2000) > #define vcpu_vpmu(vcpu) (&((vcpu)->arch.hvm_vcpu.vpmu)) > #define vpmu_vcpu(vpmu) (container_of((vpmu), struct vcpu, \ > arch.hvm_vcpu.vpmu)) > -#define vpmu_domain(vpmu) (vpmu_vcpu(vpmu)->domain) > > #define MSR_TYPE_COUNTER 0 > #define MSR_TYPE_CTRL 1 > @@ -76,11 +75,6 @@ struct vpmu_struct { > #define VPMU_FROZEN 0x10 /* Stop counters while VCPU is not running */ > #define VPMU_PASSIVE_DOMAIN_ALLOCATED 0x20 > > -/* VPMU features */ > -#define VPMU_CPU_HAS_DS 0x100 /* Has Debug Store */ > -#define VPMU_CPU_HAS_BTS 0x200 /* Has Branch Trace Store */ > - > - > #define vpmu_set(_vpmu, _x) ((_vpmu)->flags |= (_x)) > #define vpmu_reset(_vpmu, _x) ((_vpmu)->flags &= ~(_x)) > #define vpmu_is_set(_vpmu, _x) ((_vpmu)->flags & (_x)) > diff --git a/xen/include/public/arch-x86/xenpmu-x86.h b/xen/include/public/arch-x86/xenpmu-x86.h > new file mode 100644 > index 0000000..04e02b3 > --- /dev/null > +++ b/xen/include/public/arch-x86/xenpmu-x86.h > @@ -0,0 +1,62 @@ > +#ifndef __XEN_PUBLIC_ARCH_X86_PMU_H__ > +#define __XEN_PUBLIC_ARCH_X86_PMU_H__ > + > +/* x86-specific PMU definitions */ > + > + > +/* AMD PMU registers and structures */ > +#define XENPMU_AMD_MAX_COUNTERS 16 /* To accommodate more counters in */ > + /* the future (e.g. NB counters) */ > +struct amd_vpmu_context { > + uint64_t counters[XENPMU_AMD_MAX_COUNTERS]; > + uint64_t ctrls[XENPMU_AMD_MAX_COUNTERS]; > + uint8_t msr_bitmap_set; /* Used by HVM only */ > +}; > + > +/* Intel PMU registers and structures */ > +#define XENPMU_CORE2_MAX_ARCH_PMCS 16 > +#define XENPMU_CORE2_MAX_FIXED_PMCS 4 > +struct core2_vpmu_context { > + uint64_t global_ctrl; > + uint64_t global_ovf_ctrl; > + uint64_t global_status; > + uint64_t global_ovf_status; > + uint64_t fixed_ctrl; > + uint64_t ds_area; > + uint64_t pebs_enable; > + uint64_t debugctl;What is debugctl for? I couldn''t find it in the other patches.> + uint64_t fix_counters[XENPMU_CORE2_MAX_FIXED_PMCS]; > + struct { > + uint64_t counter; > + uint64_t control; > + } arch_msr_pair[XENPMU_CORE2_MAX_ARCH_PMCS]; > +}; > + > +#define MAX(x, y) ((x) > (y) ? (x) : (y))Maybe using MAX() from xen/kernel.h ? 
Dietmar.> +#define XENPMU_MAX_CTXT_SZ MAX(sizeof(struct amd_vpmu_context),\ > + sizeof(struct core2_vpmu_context)) > +#define XENPMU_CTXT_PAD_SZ (((XENPMU_MAX_CTXT_SZ + 64) & ~63) + 128) > +struct arch_xenpmu { > + union { > + struct cpu_user_regs regs; > + uint8_t pad2[256]; > + }; > + union { > + struct amd_vpmu_context amd; > + struct core2_vpmu_context intel; > + uint8_t pad1[XENPMU_CTXT_PAD_SZ]; > + }; > +}; > +typedef struct arch_xenpmu arch_xenpmu_t; > + > +#endif /* __XEN_PUBLIC_ARCH_X86_PMU_H__ */ > +/* > + * Local variables: > + * mode: C > + * c-file-style: "BSD" > + * c-basic-offset: 4 > + * tab-width: 4 > + * indent-tabs-mode: nil > + * End: > + */ > + > diff --git a/xen/include/public/xenpmu.h b/xen/include/public/xenpmu.h > new file mode 100644 > index 0000000..fbacd7e > --- /dev/null > +++ b/xen/include/public/xenpmu.h > @@ -0,0 +1,38 @@ > +#ifndef __XEN_PUBLIC_XENPMU_H__ > +#define __XEN_PUBLIC_XENPMU_H__ > + > +#include "xen.h" > +#if defined(__i386__) || defined(__x86_64__) > +#include "arch-x86/xenpmu-x86.h" > +#elif defined (__arm__) || defined (__aarch64__) > +#include "arch-arm.h" > +#else > +#error "Unsupported architecture" > +#endif > + > +#define XENPMU_VER_MAJ 0 > +#define XENPMU_VER_MIN 0 > + > + > +/* Shared between hypervisor and PV domain */ > +struct xenpmu_data { > + uint32_t domain_id; > + uint32_t vcpu_id; > + uint32_t pcpu_id; > + uint32_t pmu_flags; > + > + arch_xenpmu_t pmu; > +}; > +typedef struct xenpmu_data xenpmu_data_t; > + > +#endif /* __XEN_PUBLIC_XENPMU_H__ */ > + > +/* > + * Local variables: > + * mode: C > + * c-file-style: "BSD" > + * c-basic-offset: 4 > + * tab-width: 4 > + * indent-tabs-mode: nil > + * End: > + */ >-- Company details: http://ts.fujitsu.com/imprint.html
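The interrupted register state and the PCPU ID sit in this shared structure presumably so that a PV guest's PMU interrupt handler can read them directly (the cover letter notes that the PCPU ID of the interrupted processor is now passed to the PV guest). A very rough guest-side sketch, with hypothetical names throughout since the Linux-side consumer has not been reposted yet:

    /* Assumed: a per-VCPU mapping of struct xenpmu_data set up via the
     * XENPMU_init hypercall added later in the series. */
    static struct xenpmu_data *xenpmu_shared;

    static void xen_pmu_irq_handler(void)
    {
        const struct xenpmu_data *pd = xenpmu_shared;
        uint32_t pcpu = pd->pcpu_id;            /* physical CPU that overflowed */
        const struct cpu_user_regs *regs = &pd->pmu.regs; /* interrupted state */

        /* hand regs/pcpu to the guest's own perf machinery ... */
    }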
>>> On 23.09.13 at 15:04, Dietmar Hahn <dietmar.hahn@ts.fujitsu.com> wrote: > On Friday 20 September 2013, 05:42:05, Boris Ostrovsky wrote: >> + uint64_t fix_counters[XENPMU_CORE2_MAX_FIXED_PMCS]; >> + struct { >> + uint64_t counter; >> + uint64_t control; >> + } arch_msr_pair[XENPMU_CORE2_MAX_ARCH_PMCS]; >> +}; >> + >> +#define MAX(x, y) ((x) > (y) ? (x) : (y)) > > Maybe using MAX() from xen/kernel.h ?

Certainly not - this is a public header; a definition of MAX() is as misplaced here.

Jan
On 09/23/2013 09:04 AM, Dietmar Hahn wrote:
> On Friday 20 September 2013, 05:42:05, Boris Ostrovsky wrote: >> + >> +/* Intel PMU registers and structures */ >> +#define XENPMU_CORE2_MAX_ARCH_PMCS 16 >> +#define XENPMU_CORE2_MAX_FIXED_PMCS 4 >> +struct core2_vpmu_context { >> + uint64_t global_ctrl; >> + uint64_t global_ovf_ctrl; >> + uint64_t global_status; >> + uint64_t global_ovf_status; >> + uint64_t fixed_ctrl; >> + uint64_t ds_area; >> + uint64_t pebs_enable; >> + uint64_t debugctl; > What is debugctl for? I couldn't find it in the other patches.

Right. I added it because I am pretty sure it will be needed to make BTS and PEBS work for PV. And because this structure is part of the interface I thought I should put it in now.

-boris
Dietmar Hahn
2013-Sep-23 13:50 UTC
Re: [PATCH v2 10/13] x86/PMU: Add support for PMU registers handling on PV guests
Am Freitag 20 September 2013, 05:42:09 schrieb Boris Ostrovsky:> Intercept accesses to PMU MSRs and LVTPC APIC vector (only > APIC_LVT_MASKED bit is processed) and process them in VPMU > module. > > Dump VPMU state for all domains (HVM and PV) when requested. > > Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> > --- > xen/arch/x86/domain.c | 3 +- > xen/arch/x86/hvm/vmx/vpmu_core2.c | 90 ++++++++++++++++++++++++++++++--------- > xen/arch/x86/hvm/vpmu.c | 16 +++++++ > xen/arch/x86/traps.c | 39 ++++++++++++++++- > xen/include/public/xenpmu.h | 1 + > 5 files changed, 125 insertions(+), 24 deletions(-) > > diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c > index e119d7b..36f4192 100644 > --- a/xen/arch/x86/domain.c > +++ b/xen/arch/x86/domain.c > @@ -1940,8 +1940,7 @@ void arch_dump_vcpu_info(struct vcpu *v) > { > paging_dump_vcpu_info(v); > > - if ( is_hvm_vcpu(v) ) > - vpmu_dump(v); > + vpmu_dump(v); > } > > void domain_cpuid( > diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c > index 5726610..ebbb516 100644 > --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c > +++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c > @@ -27,6 +27,7 @@ > #include <asm/regs.h> > #include <asm/types.h> > #include <asm/apic.h> > +#include <asm/traps.h> > #include <asm/msr.h> > #include <asm/msr-index.h> > #include <asm/hvm/support.h> > @@ -281,6 +282,9 @@ static inline void __core2_vpmu_save(struct vcpu *v) > rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, core2_vpmu_cxt->fix_counters[i]); > for ( i = 0; i < arch_pmc_cnt; i++ ) > rdmsrl(MSR_IA32_PERFCTR0 + i, core2_vpmu_cxt->arch_msr_pair[i].counter); > + > + if ( !is_hvm_domain(v->domain) ) > + rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, core2_vpmu_cxt->global_status); > } > > static int core2_vpmu_save(struct vcpu *v) > @@ -290,10 +294,14 @@ static int core2_vpmu_save(struct vcpu *v) > if ( !vpmu_is_set_all(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED) ) > return 0; > > + if ( !is_hvm_domain(v->domain) ) > + wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0); > + > __core2_vpmu_save(v); > > /* Unset PMU MSR bitmap to trap lazy load. 
*/ > - if ( !vpmu_is_set(vpmu, VPMU_RUNNING) && cpu_has_vmx_msr_bitmap ) > + if ( !vpmu_is_set(vpmu, VPMU_RUNNING) && cpu_has_vmx_msr_bitmap > + && is_hvm_domain(v->domain) ) > core2_vpmu_unset_msr_bitmap(v->arch.hvm_vmx.msr_bitmap); > > return 1; > @@ -315,6 +323,12 @@ static inline void __core2_vpmu_load(struct vcpu *v) > > for ( i = 0; i < arch_pmc_cnt; i++ ) > wrmsrl(MSR_P6_EVNTSEL0+i, core2_vpmu_cxt->arch_msr_pair[i].control); > + > + if ( !is_hvm_domain(v->domain) ) > + { > + wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, core2_vpmu_cxt->global_ovf_ctrl); > + wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl); > + } > } > > static void core2_vpmu_load(struct vcpu *v) > @@ -421,7 +435,12 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) > if ( vpmu_is_set(vpmu, VPMU_CPU_HAS_BTS) ) > return 1; > gdprintk(XENLOG_WARNING, "Debug Store is not supported on this cpu\n"); > - hvm_inject_hw_exception(TRAP_gp_fault, 0); > + > + if ( is_hvm_domain(v->domain) ) > + hvm_inject_hw_exception(TRAP_gp_fault, 0); > + else > + send_guest_trap(v->domain, v->vcpu_id, TRAP_gp_fault);Maybe use a macro or function for these 4 lines?> + > return 0; > } > } > @@ -433,11 +452,15 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) > { > case MSR_CORE_PERF_GLOBAL_OVF_CTRL: > core2_vpmu_cxt->global_ovf_status &= ~msr_content; > + core2_vpmu_cxt->global_ovf_ctrl = msr_content; > return 1; > case MSR_CORE_PERF_GLOBAL_STATUS: > gdprintk(XENLOG_INFO, "Can not write readonly MSR: " > "MSR_PERF_GLOBAL_STATUS(0x38E)!\n"); > - hvm_inject_hw_exception(TRAP_gp_fault, 0); > + if ( is_hvm_domain(v->domain) ) > + hvm_inject_hw_exception(TRAP_gp_fault, 0); > + else > + send_guest_trap(v->domain, v->vcpu_id, TRAP_gp_fault);Macro/function?> return 1; > case MSR_IA32_PEBS_ENABLE: > if ( msr_content & 1 ) > @@ -453,7 +476,10 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) > gdprintk(XENLOG_WARNING, > "Illegal address for IA32_DS_AREA: %#" PRIx64 "x\n", > msr_content); > - hvm_inject_hw_exception(TRAP_gp_fault, 0); > + if ( is_hvm_domain(v->domain) ) > + hvm_inject_hw_exception(TRAP_gp_fault, 0); > + else > + send_guest_trap(v->domain, v->vcpu_id, TRAP_gp_fault);Macro/function and trailing spaces?> return 1; > } > core2_vpmu_cxt->ds_area = msr_content; > @@ -478,10 +504,14 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) > non_global_ctrl >>= FIXED_CTR_CTRL_BITS; > global_ctrl >>= 1; > } > + core2_vpmu_cxt->global_ctrl = msr_content; > break; > case MSR_CORE_PERF_FIXED_CTR_CTRL: > non_global_ctrl = msr_content; > - vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl); > + if ( is_hvm_domain(v->domain) ) > + vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl); > + else > + rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl); > global_ctrl >>= 32; > for ( i = 0; i < fixed_pmc_cnt; i++ ) > { > @@ -495,7 +525,10 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) > tmp = msr - MSR_P6_EVNTSEL0; > if ( tmp >= 0 && tmp < arch_pmc_cnt ) > { > - vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl); > + if ( is_hvm_domain(v->domain) ) > + vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl); > + else > + rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl); > core2_vpmu_cxt->arch_msr_pair[tmp].control = msr_content; > for ( i = 0; i < arch_pmc_cnt && !pmu_enable; i++ ) > pmu_enable += (global_ctrl >> i) & > @@ -509,17 +542,20 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) > 
else > vpmu_reset(vpmu, VPMU_RUNNING); > > - /* Setup LVTPC in local apic */ > - if ( vpmu_is_set(vpmu, VPMU_RUNNING) && > - is_vlapic_lvtpc_enabled(vcpu_vlapic(v)) ) > - { > - apic_write_around(APIC_LVTPC, PMU_APIC_VECTOR); > - vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR; > - } > - else > + if ( is_hvm_domain(v->domain) ) > { > - apic_write_around(APIC_LVTPC, PMU_APIC_VECTOR | APIC_LVT_MASKED); > - vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | APIC_LVT_MASKED; > + /* Setup LVTPC in local apic */ > + if ( vpmu_is_set(vpmu, VPMU_RUNNING) && > + is_vlapic_lvtpc_enabled(vcpu_vlapic(v)) ) > + { > + apic_write_around(APIC_LVTPC, PMU_APIC_VECTOR); > + vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR; > + } > + else > + { > + apic_write_around(APIC_LVTPC, PMU_APIC_VECTOR | APIC_LVT_MASKED); > + vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | APIC_LVT_MASKED; > + } > } > > if ( type != MSR_TYPE_GLOBAL ) > @@ -547,13 +583,24 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) > inject_gp = 1; > break; > } > - if (inject_gp) > - hvm_inject_hw_exception(TRAP_gp_fault, 0); > + > + if (inject_gp) > + { > + if ( is_hvm_domain(v->domain) ) > + hvm_inject_hw_exception(TRAP_gp_fault, 0); > + else > + send_guest_trap(v->domain, v->vcpu_id, TRAP_gp_fault);Macro/function? Dietmar.> + } > else > wrmsrl(msr, msr_content); > } > else > - vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content); > + { > + if ( is_hvm_domain(v->domain) ) > + vmx_write_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content); > + else > + wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, msr_content); > + } > > return 1; > } > @@ -577,7 +624,10 @@ static int core2_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content) > *msr_content = core2_vpmu_cxt->global_ovf_status; > break; > case MSR_CORE_PERF_GLOBAL_CTRL: > - vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content); > + if ( is_hvm_domain(v->domain) ) > + vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, msr_content); > + else > + rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, *msr_content); > break; > default: > rdmsrl(msr, *msr_content); > diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c > index 69aaa7b..4638193 100644 > --- a/xen/arch/x86/hvm/vpmu.c > +++ b/xen/arch/x86/hvm/vpmu.c > @@ -70,6 +70,14 @@ static void __init parse_vpmu_param(char *s) > } > } > > +static void vpmu_lvtpc_update(uint32_t val) > +{ > + struct vpmu_struct *vpmu = vcpu_vpmu(current); > + > + vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | (val & APIC_LVT_MASKED); > + apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc); > +} > + > int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content) > { > struct vpmu_struct *vpmu = vcpu_vpmu(current); > @@ -428,6 +436,14 @@ long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg) > return -EFAULT; > pvpmu_finish(current->domain, &pmu_params); > break; > + > + case XENPMU_lvtpc_set: > + if ( copy_from_guest(&pmu_params, arg, 1) ) > + return -EFAULT; > + > + vpmu_lvtpc_update((uint32_t)pmu_params.val); > + ret = 0; > + break; > } > > return ret; > diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c > index 57dbd0c..f378a24 100644 > --- a/xen/arch/x86/traps.c > +++ b/xen/arch/x86/traps.c > @@ -71,6 +71,7 @@ > #include <asm/apic.h> > #include <asm/mc146818rtc.h> > #include <asm/hpet.h> > +#include <asm/hvm/vpmu.h> > #include <public/arch-x86/cpuid.h> > #include <xsm/xsm.h> > > @@ -871,7 +872,6 @@ static void pv_cpuid(struct cpu_user_regs *regs) > break; > > case 0x00000005: /* MONITOR/MWAIT */ > - case 0x0000000a: /* Architectural Performance Monitor Features */ > case 0x0000000b: /* Extended 
Topology Enumeration */ > case 0x8000000a: /* SVM revision and features */ > case 0x8000001b: /* Instruction Based Sampling */ > @@ -880,7 +880,9 @@ static void pv_cpuid(struct cpu_user_regs *regs) > unsupported: > a = b = c = d = 0; > break; > - > + case 0x0000000a: /* Architectural Performance Monitor Features (Intel) */ > + vpmu_do_cpuid(0xa, &a, &b, &c, &d); > + break; > default: > (void)cpuid_hypervisor_leaves(regs->eax, 0, &a, &b, &c, &d); > break; > @@ -2486,6 +2488,17 @@ static int emulate_privileged_op(struct cpu_user_regs *regs) > if ( wrmsr_safe(regs->ecx, msr_content) != 0 ) > goto fail; > break; > + case MSR_P6_PERFCTR0...MSR_P6_PERFCTR1: > + case MSR_P6_EVNTSEL0...MSR_P6_EVNTSEL1: > + case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2: > + case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL: > + case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5: > + if ( !vpmu_do_wrmsr(regs->ecx, msr_content) ) > + { > + if ( (vpmu_mode & XENPMU_MODE_PRIV) && (v->domain == dom0) ) > + goto invalid; > + } > + break; > default: > if ( wrmsr_hypervisor_regs(regs->ecx, msr_content) == 1 ) > break; > @@ -2574,6 +2587,24 @@ static int emulate_privileged_op(struct cpu_user_regs *regs) > regs->eax = (uint32_t)msr_content; > regs->edx = (uint32_t)(msr_content >> 32); > break; > + case MSR_IA32_PERF_CAPABILITIES: > + if ( rdmsr_safe(regs->ecx, msr_content) ) > + goto fail; > + /* Full-Width Writes not supported */ > + regs->eax = (uint32_t)msr_content & ~(1 << 13); > + regs->edx = (uint32_t)(msr_content >> 32); > + break; > + case MSR_P6_PERFCTR0...MSR_P6_PERFCTR1: > + case MSR_P6_EVNTSEL0...MSR_P6_EVNTSEL1: > + case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2: > + case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL: > + case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5: > + if ( vpmu_do_rdmsr(regs->ecx, &msr_content) ) { > + regs->eax = (uint32_t)msr_content; > + regs->edx = (uint32_t)(msr_content >> 32); > + break; > + } > + goto rdmsr_normal; > default: > if ( rdmsr_hypervisor_regs(regs->ecx, &val) ) > { > @@ -2606,6 +2637,10 @@ static int emulate_privileged_op(struct cpu_user_regs *regs) > pv_cpuid(regs); > break; > > + case 0x33: /* RDPMC */ > + rdpmc(regs->ecx, regs->eax, regs->edx); > + break; > + > default: > goto fail; > } > diff --git a/xen/include/public/xenpmu.h b/xen/include/public/xenpmu.h > index ec49097..0060670 100644 > --- a/xen/include/public/xenpmu.h > +++ b/xen/include/public/xenpmu.h > @@ -27,6 +27,7 @@ > #define XENPMU_flags_set 3 > #define XENPMU_init 4 > #define XENPMU_finish 5 > +#define XENPMU_lvtpc_set 6 > /* ` } */ > > /* Parameters structure for HYPERVISOR_xenpmu_op call */ >-- Company details: http://ts.fujitsu.com/imprint.html
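A sketch of the kind of helper Dietmar is asking for above; the name is made up here, but everything it calls already appears verbatim in the patch:

    /* Inject #GP into the current guest, HVM or PV */
    static void vpmu_inject_gp(struct vcpu *v)
    {
        if ( is_hvm_domain(v->domain) )
            hvm_inject_hw_exception(TRAP_gp_fault, 0);
        else
            send_guest_trap(v->domain, v->vcpu_id, TRAP_gp_fault);
    }

Each of the four open-coded sites in core2_vpmu_do_wrmsr() would then collapse to a single vpmu_inject_gp(v) call.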
On 09/23/2013 09:16 AM, Jan Beulich wrote:
>>>> On 23.09.13 at 15:04, Dietmar Hahn <dietmar.hahn@ts.fujitsu.com> wrote: >> On Friday 20 September 2013, 05:42:05, Boris Ostrovsky wrote: >>> + uint64_t fix_counters[XENPMU_CORE2_MAX_FIXED_PMCS]; >>> + struct { >>> + uint64_t counter; >>> + uint64_t control; >>> + } arch_msr_pair[XENPMU_CORE2_MAX_ARCH_PMCS]; >>> +}; >>> + >>> +#define MAX(x, y) ((x) > (y) ? (x) : (y)) >> Maybe using MAX() from xen/kernel.h ? > Certainly not - this is a public header; a definition of MAX() is as > misplaced here.

Since I use MAX only once and only here I will just drop it and change XENPMU_MAX_CTXT_SZ, which is

    #define XENPMU_MAX_CTXT_SZ    MAX(sizeof(struct amd_vpmu_context),\
                                      sizeof(struct core2_vpmu_context))

into

    #define XENPMU_MAX_CTXT_SZ    (sizeof(struct amd_vpmu_context) >\
                                   sizeof(struct core2_vpmu_context) ?\
                                   sizeof(struct amd_vpmu_context) :\
                                   sizeof(struct core2_vpmu_context))

-boris
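Either way the macro is spelled, the padding derived from it works out the same: XENPMU_CTXT_PAD_SZ is (((XENPMU_MAX_CTXT_SZ + 64) & ~63) + 128), i.e. the larger of the two context sizes rounded up to the next 64-byte boundary (a full extra 64 bytes if it is already aligned) plus 128 bytes of slack. For a hypothetical 352-byte context that is ((352 + 64) & ~63) + 128 = 384 + 128 = 512 bytes reserved in the arch_xenpmu union.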
Konrad Rzeszutek Wilk
2013-Sep-23 19:42 UTC
Re: [PATCH v2 01/13] Export hypervisor symbols
On Fri, Sep 20, 2013 at 05:42:00AM -0400, Boris Ostrovsky wrote: I think the title needs a prefix of say: "xen/platform_op:" or such.> Export Xen''s symbols in format similar to Linux'' /proc/kallsyms.Which is? I see in Linux: 000000000000bd58 D xen_cr3 ... 000000000000e978 d soft_lockup_hrtimer_cnt .. ffffffff8101cef0 T eager_fpu_init ffffffff8101d010 t ptrace_triggered I know that the first column is the EIP (thought some of them seem to be based on some offset). The last one is pretty obvious too. But the middle one: konrad@phenom:~$ cat /proc/kallsyms | awk ''{print $2}'' | sort | uniq b B d D r R t T V W Could you explain what those mean? And are they part of this? Or does Xen not carry those?> > Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> > --- > xen/arch/x86/platform_hypercall.c | 9 +++++ > xen/arch/x86/x86_64/platform_hypercall.c | 2 +- > xen/common/symbols-dummy.c | 1 + > xen/common/symbols.c | 58 ++++++++++++++++++++++++++++++-- > xen/include/public/platform.h | 22 ++++++++++++ > xen/include/xen/symbols.h | 4 +++ > xen/tools/symbols.c | 4 +++ > 7 files changed, 97 insertions(+), 3 deletions(-) > > diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c > index 7175a82..39376fe 100644 > --- a/xen/arch/x86/platform_hypercall.c > +++ b/xen/arch/x86/platform_hypercall.c > @@ -23,6 +23,7 @@ > #include <xen/cpu.h> > #include <xen/pmstat.h> > #include <xen/irq.h> > +#include <xen/symbols.h> > #include <asm/current.h> > #include <public/platform.h> > #include <acpi/cpufreq/processor_perf.h> > @@ -597,6 +598,14 @@ ret_t do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op) > } > break; > > + case XENPF_get_symbols: > + { > + ret = xensyms_read(&op->u.symdata); > + if ( ret >= 0 && __copy_field_to_guest(u_xenpf_op, op, u.symdata) ) > + ret = -EFAULT; > + } > + break; > + > default: > ret = -ENOSYS; > break; > diff --git a/xen/arch/x86/x86_64/platform_hypercall.c b/xen/arch/x86/x86_64/platform_hypercall.c > index aa2ad54..9ef705a 100644 > --- a/xen/arch/x86/x86_64/platform_hypercall.c > +++ b/xen/arch/x86/x86_64/platform_hypercall.c > @@ -35,7 +35,7 @@ CHECK_pf_pcpu_version; > #undef xen_pf_pcpu_version > > #define xenpf_enter_acpi_sleep compat_pf_enter_acpi_sleep > - > +#define xenpf_symdata compat_pf_symdata > #define COMPAT > #define _XEN_GUEST_HANDLE(t) XEN_GUEST_HANDLE(t) > #define _XEN_GUEST_HANDLE_PARAM(t) XEN_GUEST_HANDLE_PARAM(t) > diff --git a/xen/common/symbols-dummy.c b/xen/common/symbols-dummy.c > index 5090c3b..52a86c7 100644 > --- a/xen/common/symbols-dummy.c > +++ b/xen/common/symbols-dummy.c > @@ -12,6 +12,7 @@ const unsigned int symbols_offsets[1]; > const unsigned long symbols_addresses[1]; > #endif > const unsigned int symbols_num_syms; > +const unsigned long symbols_names_bytes; > const u8 symbols_names[1]; > > const u8 symbols_token_table[1]; > diff --git a/xen/common/symbols.c b/xen/common/symbols.c > index 83b2b58..e74a585 100644 > --- a/xen/common/symbols.c > +++ b/xen/common/symbols.c > @@ -17,6 +17,8 @@ > #include <xen/lib.h> > #include <xen/string.h> > #include <xen/spinlock.h> > +#include <public/platform.h> > +#include <xen/guest_access.h> > > #ifdef SYMBOLS_ORIGIN > extern const unsigned int symbols_offsets[1]; > @@ -26,6 +28,7 @@ extern const unsigned long symbols_addresses[]; > #define symbols_address(n) symbols_addresses[n] > #endif > extern const unsigned int symbols_num_syms; > +extern const unsigned long symbols_names_bytes; > extern const u8 symbols_names[]; > > extern const u8 
symbols_token_table[]; > @@ -35,7 +38,8 @@ extern const unsigned int symbols_markers[]; > > /* expand a compressed symbol data into the resulting uncompressed string, > given the offset to where the symbol is in the compressed stream */ > -static unsigned int symbols_expand_symbol(unsigned int off, char *result) > +static unsigned int symbols_expand_symbol(unsigned int off, char *result, > + int maxlen) > { > int len, skipped_first = 0; > const u8 *tptr, *data; > @@ -49,6 +53,9 @@ static unsigned int symbols_expand_symbol(unsigned int off, char *result) > * the compressed stream */ > off += len + 1; > > + if (maxlen < len) > + len = maxlen; > + > /* for every byte on the compressed symbol data, copy the table > entry for that byte */ > while(len) { > @@ -129,7 +136,7 @@ const char *symbols_lookup(unsigned long addr, > --low; > > /* Grab name */ > - symbols_expand_symbol(get_symbol_offset(low), namebuf); > + symbols_expand_symbol(get_symbol_offset(low), namebuf, sizeof(namebuf)); > > /* Search for next non-aliased symbol */ > for (i = low + 1; i < symbols_num_syms; i++) { > @@ -174,3 +181,50 @@ void __print_symbol(const char *fmt, unsigned long address) > > spin_unlock_irqrestore(&lock, flags); > } > + > +/* > + * Get symbol type information. This is encoded as a single char at the > + * beginning of the symbol name. > + */ > +static char symbols_get_symbol_type(unsigned int off) > +{ > + /* > + * Get just the first code, look it up in the token table, > + * and return the first char from this token. > + */ > + return symbols_token_table[symbols_token_index[symbols_names[off + 1]]]; > +} > + > +/* > + * Symbols are most likely accessed sequentially so we remember position from > + * previous read. This can help us avoid extra call to get_symbol_offset(). > + */ > +static uint64_t next_symbol, next_offset; > +static DEFINE_SPINLOCK(symbols_mutex); > + > +int xensyms_read(struct xenpf_symdata *symdata) > +{ > + if ( symdata->xen_symnum > symbols_num_syms ) > + return -EINVAL; > + else if ( symdata->xen_symnum == symbols_num_syms ) > + return 0; > + > + spin_lock(&symbols_mutex); > + > + if ( symdata->xen_symnum == 0 ) > + next_offset = next_symbol = 0; > + else if ( next_symbol != symdata->xen_symnum ) > + /* Non-sequential access */ > + next_offset = get_symbol_offset(symdata->xen_symnum); > + > + symdata->type = symbols_get_symbol_type(next_offset); > + next_offset = symbols_expand_symbol(next_offset, symdata->name, > + sizeof(symdata->name)); > + symdata->address = symbols_offsets[symdata->xen_symnum] + SYMBOLS_ORIGIN; > + > + next_symbol = symdata->xen_symnum + 1; > + > + spin_unlock(&symbols_mutex); > + > + return strlen(symdata->name); > +} > diff --git a/xen/include/public/platform.h b/xen/include/public/platform.h > index 4341f54..870e14b 100644 > --- a/xen/include/public/platform.h > +++ b/xen/include/public/platform.h > @@ -527,6 +527,27 @@ struct xenpf_core_parking { > typedef struct xenpf_core_parking xenpf_core_parking_t; > DEFINE_XEN_GUEST_HANDLE(xenpf_core_parking_t); > > +#define XENPF_get_symbols 61 > + > +struct xenpf_symdata { > + /* IN variables */ > + uint64_t xen_symnum; > + > + /* OUT variables */ > + uint64_t address; > + uint64_t type; > + /* > + * KSYM_NAME_LEN is 128 bytes. However, we cannot be larger than pad in > + * xen_platform_op below (which is 128 bytes as well). Since the largest > + * symbol is around 50 bytes it''s probably more trouble than it''s worth > + * to try to deal with symbols that are close to 128 bytes in length. 
> + */ > +#define XEN_SYMS_MAX_LEN (128 - 3 * 8) > + char name[XEN_SYMS_MAX_LEN]; > +}; > +typedef struct xenpf_symdata xenpf_symdata_t; > +DEFINE_XEN_GUEST_HANDLE(xenpf_symdata_t); > + > /* > * ` enum neg_errnoval > * ` HYPERVISOR_platform_op(const struct xen_platform_op*); > @@ -553,6 +574,7 @@ struct xen_platform_op { > struct xenpf_cpu_hotadd cpu_add; > struct xenpf_mem_hotadd mem_add; > struct xenpf_core_parking core_parking; > + struct xenpf_symdata symdata; > uint8_t pad[128]; > } u; > }; > diff --git a/xen/include/xen/symbols.h b/xen/include/xen/symbols.h > index 37cf6bf..c8df28f 100644 > --- a/xen/include/xen/symbols.h > +++ b/xen/include/xen/symbols.h > @@ -2,6 +2,8 @@ > #define _XEN_SYMBOLS_H > > #include <xen/types.h> > +#include <public/xen.h> > +#include <public/platform.h> > > #define KSYM_NAME_LEN 127 > > @@ -34,4 +36,6 @@ do { \ > __print_symbol(fmt, addr); \ > } while(0) > > +extern int xensyms_read(struct xenpf_symdata *symdata); > + > #endif /*_XEN_SYMBOLS_H*/ > diff --git a/xen/tools/symbols.c b/xen/tools/symbols.c > index f39c906..818204d 100644 > --- a/xen/tools/symbols.c > +++ b/xen/tools/symbols.c > @@ -272,6 +272,10 @@ static void write_src(void) > } > printf("\n"); > > + output_label("symbols_names_bytes"); > + printf("\t.long\t%d\n", off); > + printf("\n"); > + > output_label("symbols_markers"); > for (i = 0; i < ((table_cnt + 255) >> 8); i++) > printf("\t.long\t%d\n", markers[i]); > -- > 1.8.1.4 > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
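To make the intended usage concrete, here is a rough sketch of how a dom0 tool might walk the new interface to produce a kallsyms-style listing. xenpf_op() below is a stand-in for whatever libxc plumbing ends up issuing the platform op; it is not an existing call:

    /* needs <stdio.h>, <inttypes.h> and the public platform.h definitions */
    struct xen_platform_op op = {
        .cmd = XENPF_get_symbols,
        .interface_version = XENPF_INTERFACE_VERSION,
    };
    uint64_t i = 0;
    int len;

    do {
        op.u.symdata.xen_symnum = i++;
        len = xenpf_op(&op);          /* hypothetical wrapper */
        if ( len > 0 )
            printf("%016"PRIx64" %c %s\n", op.u.symdata.address,
                   (char)op.u.symdata.type, op.u.symdata.name);
    } while ( len > 0 );              /* 0 means past the last symbol */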
Konrad Rzeszutek Wilk
2013-Sep-23 19:46 UTC
Re: [PATCH v2 05/13] intel/VPMU: Clean up Intel VPMU code
On Fri, Sep 20, 2013 at 05:42:04AM -0400, Boris Ostrovsky wrote:> Remove struct pmumsr and core2_pmu_enable. Replace static MSR structures with > fields in core2_vpmu_context. > > Call core2_get_pmc_count() once, during initialization. > > Properly clean up when core2_vpmu_alloc_resource() fails and add routines > to remove MSRs from VMCS. > > Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> > --- > xen/arch/x86/hvm/vmx/vmcs.c | 59 +++++++ > xen/arch/x86/hvm/vmx/vpmu_core2.c | 289 ++++++++++++++----------------- > xen/include/asm-x86/hvm/vmx/vmcs.h | 2 + > xen/include/asm-x86/hvm/vmx/vpmu_core2.h | 19 -- > 4 files changed, 191 insertions(+), 178 deletions(-) > > diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c > index de9f592..756bc13 100644 > --- a/xen/arch/x86/hvm/vmx/vmcs.c > +++ b/xen/arch/x86/hvm/vmx/vmcs.c > @@ -1136,6 +1136,36 @@ int vmx_add_guest_msr(u32 msr) > return 0; > } > > +void vmx_rm_guest_msr(u32 msr) > +{ > + struct vcpu *curr = current; > + unsigned int i, idx, msr_count = curr->arch.hvm_vmx.msr_count; > + struct vmx_msr_entry *msr_area = curr->arch.hvm_vmx.msr_area; > + > + if ( msr_area == NULL ) > + return; > + > + for ( idx = 0; idx < msr_count; idx++ ) > + if ( msr_area[idx].index == msr ) > + break; > + > + if ( idx == msr_count ) > + return; > + > + for ( i = idx; i < msr_count - 1; i++ ) > + { > + msr_area[i].index = msr_area[i + 1].index; > + rdmsrl(msr_area[i].index, msr_area[i].data); > + } > + msr_area[msr_count - 1].index = 0; > + > + curr->arch.hvm_vmx.msr_count = --msr_count; > + __vmwrite(VM_EXIT_MSR_STORE_COUNT, msr_count); > + __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, msr_count); > + > + return; > +} > + > int vmx_add_host_load_msr(u32 msr) > { > struct vcpu *curr = current; > @@ -1166,6 +1196,35 @@ int vmx_add_host_load_msr(u32 msr) > return 0; > } > > +void vmx_rm_host_load_msr(u32 msr) > +{ > + struct vcpu *curr = current; > + unsigned int i, idx, msr_count = curr->arch.hvm_vmx.host_msr_count; > + struct vmx_msr_entry *msr_area = curr->arch.hvm_vmx.host_msr_area; > + > + if ( msr_area == NULL ) > + return; > + > + for ( idx = 0; idx < msr_count; idx++ ) > + if ( msr_area[idx].index == msr ) > + break; > + > + if ( idx == msr_count ) > + return; > + > + for ( i = idx; i < msr_count - 1; i++ ) > + { > + msr_area[i].index = msr_area[i + 1].index; > + rdmsrl(msr_area[i].index, msr_area[i].data); > + } > + msr_area[msr_count - 1].index = 0; > + > + curr->arch.hvm_vmx.host_msr_count = --msr_count; > + __vmwrite(VM_EXIT_MSR_LOAD_COUNT, msr_count); > + > + return; > +} > + > void vmx_set_eoi_exit_bitmap(struct vcpu *v, u8 vector) > { > int index, offset, changed; > diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c > index 101888d..50f784f 100644 > --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c > +++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c > @@ -65,6 +65,26 @@ > #define PMU_FIXED_WIDTH_MASK (((1 << PMU_FIXED_WIDTH_BITS) -1) << PMU_FIXED_WIDTH_SHIFT) > > /* > + * MSR_CORE_PERF_FIXED_CTR_CTRL contains the configuration of all fixed > + * counters. 4 bits for every counter. 
> + */ > +#define FIXED_CTR_CTRL_BITS 4 > +#define FIXED_CTR_CTRL_MASK ((1 << FIXED_CTR_CTRL_BITS) - 1) > + > +#define VPMU_CORE2_MAX_FIXED_PMCS 4 > +struct core2_vpmu_context { > + u64 fixed_ctrl; > + u64 ds_area; > + u64 pebs_enable; > + u64 global_ovf_status; > + u64 fix_counters[VPMU_CORE2_MAX_FIXED_PMCS]; > + struct arch_msr_pair arch_msr_pair[1]; > +}; > + > +static int arch_pmc_cnt; /* Number of general-purpose performance counters */ > +static int fixed_pmc_cnt; /* Number of fixed performance counters */ > + > +/* > * QUIRK to workaround an issue on various family 6 cpus. > * The issue leads to endless PMC interrupt loops on the processor. > * If the interrupt handler is running and a pmc reaches the value 0, this > @@ -84,11 +104,8 @@ static void check_pmc_quirk(void) > is_pmc_quirk = 0; > } > > -static int core2_get_pmc_count(void); > static void handle_pmc_quirk(u64 msr_content) > { > - int num_gen_pmc = core2_get_pmc_count(); > - int num_fix_pmc = 3; > int i; > u64 val; > > @@ -96,7 +113,7 @@ static void handle_pmc_quirk(u64 msr_content) > return; > > val = msr_content; > - for ( i = 0; i < num_gen_pmc; i++ ) > + for ( i = 0; i < arch_pmc_cnt; i++ ) > { > if ( val & 0x1 ) > { > @@ -108,7 +125,7 @@ static void handle_pmc_quirk(u64 msr_content) > val >>= 1; > } > val = msr_content >> 32; > - for ( i = 0; i < num_fix_pmc; i++ ) > + for ( i = 0; i < fixed_pmc_cnt; i++ ) > { > if ( val & 0x1 ) > { > @@ -121,65 +138,32 @@ static void handle_pmc_quirk(u64 msr_content) > } > } > > -static const u32 core2_fix_counters_msr[] = { > - MSR_CORE_PERF_FIXED_CTR0, > - MSR_CORE_PERF_FIXED_CTR1, > - MSR_CORE_PERF_FIXED_CTR2 > -}; > - > /* > - * MSR_CORE_PERF_FIXED_CTR_CTRL contains the configuration of all fixed > - * counters. 4 bits for every counter. > + * Read the number of general counters via CPUID.EAX[0xa].EAX[8..15] > */ > -#define FIXED_CTR_CTRL_BITS 4 > -#define FIXED_CTR_CTRL_MASK ((1 << FIXED_CTR_CTRL_BITS) - 1) > - > -/* The index into the core2_ctrls_msr[] of this MSR used in core2_vpmu_dump() */ > -#define MSR_CORE_PERF_FIXED_CTR_CTRL_IDX 0 > - > -/* Core 2 Non-architectual Performance Control MSRs. */ > -static const u32 core2_ctrls_msr[] = { > - MSR_CORE_PERF_FIXED_CTR_CTRL, > - MSR_IA32_PEBS_ENABLE, > - MSR_IA32_DS_AREA > -}; > - > -struct pmumsr { > - unsigned int num; > - const u32 *msr; > -}; > - > -static const struct pmumsr core2_fix_counters = { > - VPMU_CORE2_NUM_FIXED, > - core2_fix_counters_msr > -}; > +static int core2_get_arch_pmc_count(void) > +{ > + u32 eax, ebx, ecx, edx; > > -static const struct pmumsr core2_ctrls = { > - VPMU_CORE2_NUM_CTRLS, > - core2_ctrls_msr > -}; > -static int arch_pmc_cnt; > + cpuid(0xa, &eax, &ebx, &ecx, &edx);Could you use cpuid_eax ?> + return ( (eax & PMU_GENERAL_NR_MASK) >> PMU_GENERAL_NR_SHIFT ); > +} > > /* > - * Read the number of general counters via CPUID.EAX[0xa].EAX[8..15] > + * Read the number of fixed counters via CPUID.EDX[0xa].EDX[0..4] > */ > -static int core2_get_pmc_count(void) > +static int core2_get_fixed_pmc_count(void) > { > u32 eax, ebx, ecx, edx; > > - if ( arch_pmc_cnt == 0 ) > - { > - cpuid(0xa, &eax, &ebx, &ecx, &edx); > - arch_pmc_cnt = (eax & PMU_GENERAL_NR_MASK) >> PMU_GENERAL_NR_SHIFT; > - } > - > - return arch_pmc_cnt; > + cpuid(0xa, &eax, &ebx, &ecx, &edx);Ditto here.
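For reference, the variant Konrad is suggesting would look roughly like this, assuming the cpuid_eax()/cpuid_edx() helpers from asm/processor.h:

    /* Number of general-purpose counters: CPUID.0xA:EAX[15:8] */
    static int core2_get_arch_pmc_count(void)
    {
        return (cpuid_eax(0xa) & PMU_GENERAL_NR_MASK) >> PMU_GENERAL_NR_SHIFT;
    }

    /* Number of fixed counters: CPUID.0xA:EDX[4:0] */
    static int core2_get_fixed_pmc_count(void)
    {
        return (cpuid_edx(0xa) & PMU_FIXED_NR_MASK) >> PMU_FIXED_NR_SHIFT;
    }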
On 09/23/2013 03:42 PM, Konrad Rzeszutek Wilk wrote:> On Fri, Sep 20, 2013 at 05:42:00AM -0400, Boris Ostrovsky wrote: > > I think the title needs a prefix of say: "xen/platform_op:" or such. > >> Export Xen''s symbols in format similar to Linux'' /proc/kallsyms. > Which is? > > I see in Linux: > > 000000000000bd58 D xen_cr3 > ... > 000000000000e978 d soft_lockup_hrtimer_cnt > .. > ffffffff8101cef0 T eager_fpu_init > ffffffff8101d010 t ptrace_triggered > > I know that the first column is the EIP (thought some of them seem to be > based on some offset). The last one is pretty obvious too. > > But the middle one: > konrad@phenom:~$ cat /proc/kallsyms | awk ''{print $2}'' | sort | uniq > b > B > d > D > r > R > t > T > V > W > > Could you explain what those mean? And are they part of this? > Or does Xen not carry those?These are symbol types, described in ''man nm''. For Xen we should only see ''t'' and ''T'' which denote local or global text symbol. So the format would be <address> <type> <symbol name> In any case, this is not a comlpetely correct commit message, it really describes to v1 implementation. With v2 Xen will return these three values in a structure, which has nothing to do with *format* of how these symbols are presented. I''ll re-phrase it. (And there other remnants of v1 in this patch that I just noticed, such as changes to xen/tools/symbols.c which are no longer needed) -boris> >> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> >> --- >> xen/arch/x86/platform_hypercall.c | 9 +++++ >> xen/arch/x86/x86_64/platform_hypercall.c | 2 +- >> xen/common/symbols-dummy.c | 1 + >> xen/common/symbols.c | 58 ++++++++++++++++++++++++++++++-- >> xen/include/public/platform.h | 22 ++++++++++++ >> xen/include/xen/symbols.h | 4 +++ >> xen/tools/symbols.c | 4 +++ >> 7 files changed, 97 insertions(+), 3 deletions(-) >> >> diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c >> index 7175a82..39376fe 100644 >> --- a/xen/arch/x86/platform_hypercall.c >> +++ b/xen/arch/x86/platform_hypercall.c >> @@ -23,6 +23,7 @@ >> #include <xen/cpu.h> >> #include <xen/pmstat.h> >> #include <xen/irq.h> >> +#include <xen/symbols.h> >> #include <asm/current.h> >> #include <public/platform.h> >> #include <acpi/cpufreq/processor_perf.h> >> @@ -597,6 +598,14 @@ ret_t do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op) >> } >> break; >> >> + case XENPF_get_symbols: >> + { >> + ret = xensyms_read(&op->u.symdata); >> + if ( ret >= 0 && __copy_field_to_guest(u_xenpf_op, op, u.symdata) ) >> + ret = -EFAULT; >> + } >> + break; >> + >> default: >> ret = -ENOSYS; >> break; >> diff --git a/xen/arch/x86/x86_64/platform_hypercall.c b/xen/arch/x86/x86_64/platform_hypercall.c >> index aa2ad54..9ef705a 100644 >> --- a/xen/arch/x86/x86_64/platform_hypercall.c >> +++ b/xen/arch/x86/x86_64/platform_hypercall.c >> @@ -35,7 +35,7 @@ CHECK_pf_pcpu_version; >> #undef xen_pf_pcpu_version >> >> #define xenpf_enter_acpi_sleep compat_pf_enter_acpi_sleep >> - >> +#define xenpf_symdata compat_pf_symdata >> #define COMPAT >> #define _XEN_GUEST_HANDLE(t) XEN_GUEST_HANDLE(t) >> #define _XEN_GUEST_HANDLE_PARAM(t) XEN_GUEST_HANDLE_PARAM(t) >> diff --git a/xen/common/symbols-dummy.c b/xen/common/symbols-dummy.c >> index 5090c3b..52a86c7 100644 >> --- a/xen/common/symbols-dummy.c >> +++ b/xen/common/symbols-dummy.c >> @@ -12,6 +12,7 @@ const unsigned int symbols_offsets[1]; >> const unsigned long symbols_addresses[1]; >> #endif >> const unsigned int symbols_num_syms; >> +const unsigned long 
symbols_names_bytes; >> const u8 symbols_names[1]; >> >> const u8 symbols_token_table[1]; >> diff --git a/xen/common/symbols.c b/xen/common/symbols.c >> index 83b2b58..e74a585 100644 >> --- a/xen/common/symbols.c >> +++ b/xen/common/symbols.c >> @@ -17,6 +17,8 @@ >> #include <xen/lib.h> >> #include <xen/string.h> >> #include <xen/spinlock.h> >> +#include <public/platform.h> >> +#include <xen/guest_access.h> >> >> #ifdef SYMBOLS_ORIGIN >> extern const unsigned int symbols_offsets[1]; >> @@ -26,6 +28,7 @@ extern const unsigned long symbols_addresses[]; >> #define symbols_address(n) symbols_addresses[n] >> #endif >> extern const unsigned int symbols_num_syms; >> +extern const unsigned long symbols_names_bytes; >> extern const u8 symbols_names[]; >> >> extern const u8 symbols_token_table[]; >> @@ -35,7 +38,8 @@ extern const unsigned int symbols_markers[]; >> >> /* expand a compressed symbol data into the resulting uncompressed string, >> given the offset to where the symbol is in the compressed stream */ >> -static unsigned int symbols_expand_symbol(unsigned int off, char *result) >> +static unsigned int symbols_expand_symbol(unsigned int off, char *result, >> + int maxlen) >> { >> int len, skipped_first = 0; >> const u8 *tptr, *data; >> @@ -49,6 +53,9 @@ static unsigned int symbols_expand_symbol(unsigned int off, char *result) >> * the compressed stream */ >> off += len + 1; >> >> + if (maxlen < len) >> + len = maxlen; >> + >> /* for every byte on the compressed symbol data, copy the table >> entry for that byte */ >> while(len) { >> @@ -129,7 +136,7 @@ const char *symbols_lookup(unsigned long addr, >> --low; >> >> /* Grab name */ >> - symbols_expand_symbol(get_symbol_offset(low), namebuf); >> + symbols_expand_symbol(get_symbol_offset(low), namebuf, sizeof(namebuf)); >> >> /* Search for next non-aliased symbol */ >> for (i = low + 1; i < symbols_num_syms; i++) { >> @@ -174,3 +181,50 @@ void __print_symbol(const char *fmt, unsigned long address) >> >> spin_unlock_irqrestore(&lock, flags); >> } >> + >> +/* >> + * Get symbol type information. This is encoded as a single char at the >> + * beginning of the symbol name. >> + */ >> +static char symbols_get_symbol_type(unsigned int off) >> +{ >> + /* >> + * Get just the first code, look it up in the token table, >> + * and return the first char from this token. >> + */ >> + return symbols_token_table[symbols_token_index[symbols_names[off + 1]]]; >> +} >> + >> +/* >> + * Symbols are most likely accessed sequentially so we remember position from >> + * previous read. This can help us avoid extra call to get_symbol_offset(). 
>> + */ >> +static uint64_t next_symbol, next_offset; >> +static DEFINE_SPINLOCK(symbols_mutex); >> + >> +int xensyms_read(struct xenpf_symdata *symdata) >> +{ >> + if ( symdata->xen_symnum > symbols_num_syms ) >> + return -EINVAL; >> + else if ( symdata->xen_symnum == symbols_num_syms ) >> + return 0; >> + >> + spin_lock(&symbols_mutex); >> + >> + if ( symdata->xen_symnum == 0 ) >> + next_offset = next_symbol = 0; >> + else if ( next_symbol != symdata->xen_symnum ) >> + /* Non-sequential access */ >> + next_offset = get_symbol_offset(symdata->xen_symnum); >> + >> + symdata->type = symbols_get_symbol_type(next_offset); >> + next_offset = symbols_expand_symbol(next_offset, symdata->name, >> + sizeof(symdata->name)); >> + symdata->address = symbols_offsets[symdata->xen_symnum] + SYMBOLS_ORIGIN; >> + >> + next_symbol = symdata->xen_symnum + 1; >> + >> + spin_unlock(&symbols_mutex); >> + >> + return strlen(symdata->name); >> +} >> diff --git a/xen/include/public/platform.h b/xen/include/public/platform.h >> index 4341f54..870e14b 100644 >> --- a/xen/include/public/platform.h >> +++ b/xen/include/public/platform.h >> @@ -527,6 +527,27 @@ struct xenpf_core_parking { >> typedef struct xenpf_core_parking xenpf_core_parking_t; >> DEFINE_XEN_GUEST_HANDLE(xenpf_core_parking_t); >> >> +#define XENPF_get_symbols 61 >> + >> +struct xenpf_symdata { >> + /* IN variables */ >> + uint64_t xen_symnum; >> + >> + /* OUT variables */ >> + uint64_t address; >> + uint64_t type; >> + /* >> + * KSYM_NAME_LEN is 128 bytes. However, we cannot be larger than pad in >> + * xen_platform_op below (which is 128 bytes as well). Since the largest >> + * symbol is around 50 bytes it''s probably more trouble than it''s worth >> + * to try to deal with symbols that are close to 128 bytes in length. >> + */ >> +#define XEN_SYMS_MAX_LEN (128 - 3 * 8) >> + char name[XEN_SYMS_MAX_LEN]; >> +}; >> +typedef struct xenpf_symdata xenpf_symdata_t; >> +DEFINE_XEN_GUEST_HANDLE(xenpf_symdata_t); >> + >> /* >> * ` enum neg_errnoval >> * ` HYPERVISOR_platform_op(const struct xen_platform_op*); >> @@ -553,6 +574,7 @@ struct xen_platform_op { >> struct xenpf_cpu_hotadd cpu_add; >> struct xenpf_mem_hotadd mem_add; >> struct xenpf_core_parking core_parking; >> + struct xenpf_symdata symdata; >> uint8_t pad[128]; >> } u; >> }; >> diff --git a/xen/include/xen/symbols.h b/xen/include/xen/symbols.h >> index 37cf6bf..c8df28f 100644 >> --- a/xen/include/xen/symbols.h >> +++ b/xen/include/xen/symbols.h >> @@ -2,6 +2,8 @@ >> #define _XEN_SYMBOLS_H >> >> #include <xen/types.h> >> +#include <public/xen.h> >> +#include <public/platform.h> >> >> #define KSYM_NAME_LEN 127 >> >> @@ -34,4 +36,6 @@ do { \ >> __print_symbol(fmt, addr); \ >> } while(0) >> >> +extern int xensyms_read(struct xenpf_symdata *symdata); >> + >> #endif /*_XEN_SYMBOLS_H*/ >> diff --git a/xen/tools/symbols.c b/xen/tools/symbols.c >> index f39c906..818204d 100644 >> --- a/xen/tools/symbols.c >> +++ b/xen/tools/symbols.c >> @@ -272,6 +272,10 @@ static void write_src(void) >> } >> printf("\n"); >> >> + output_label("symbols_names_bytes"); >> + printf("\t.long\t%d\n", off); >> + printf("\n"); >> + >> output_label("symbols_markers"); >> for (i = 0; i < ((table_cnt + 255) >> 8); i++) >> printf("\t.long\t%d\n", markers[i]); >> -- >> 1.8.1.4 >> >> >> _______________________________________________ >> Xen-devel mailing list >> Xen-devel@lists.xen.org >> http://lists.xen.org/xen-devel
Konrad Rzeszutek Wilk
2013-Sep-24 17:40 UTC
Re: [PATCH v2 01/13] Export hypervisor symbols
On Mon, Sep 23, 2013 at 04:06:46PM -0400, Boris Ostrovsky wrote:> On 09/23/2013 03:42 PM, Konrad Rzeszutek Wilk wrote: > >On Fri, Sep 20, 2013 at 05:42:00AM -0400, Boris Ostrovsky wrote: > > > >I think the title needs a prefix of say: "xen/platform_op:" or such. > > > >>Export Xen''s symbols in format similar to Linux'' /proc/kallsyms. > >Which is? > > > >I see in Linux: > > > >000000000000bd58 D xen_cr3 > >... > >000000000000e978 d soft_lockup_hrtimer_cnt > >.. > >ffffffff8101cef0 T eager_fpu_init > >ffffffff8101d010 t ptrace_triggered > > > >I know that the first column is the EIP (thought some of them seem to be > >based on some offset). The last one is pretty obvious too. > > > >But the middle one: > >konrad@phenom:~$ cat /proc/kallsyms | awk ''{print $2}'' | sort | uniq > >b > >B > >d > >D > >r > >R > >t > >T > >V > >W > > > >Could you explain what those mean? And are they part of this? > >Or does Xen not carry those? > > These are symbol types, described in ''man nm''. For Xen we should<sigh> I was doing ''man kallsyms'' which of course would not give me that.> only see ''t'' and ''T'' which denote local or global text symbol. > > So the format would be > <address> <type> <symbol name> > > In any case, this is not a comlpetely correct commit message, it > really describes to v1 implementation. With v2 Xen will return these > three values in a structure, which has nothing to do with *format* > of how these symbols are presented. I''ll re-phrase it.Thank you. Sorry for the noise - it didn''t occur for me to check ''nm'' :-(> > (And there other remnants of v1 in this patch that I just noticed, > such as changes to xen/tools/symbols.c which are no longer needed)
>>> On 20.09.13 at 11:42, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: > + case XENPF_get_symbols: > + { > + ret = xensyms_read(&op->u.symdata); > + if ( ret >= 0 && __copy_field_to_guest(u_xenpf_op, op, u.symdata) ) > + ret = -EFAULT; > + } > + break;This yields a positive return value if a symbol was found, 0 if none was found, and negative on error. Can we avoid this non-standard first aspect?> --- a/xen/arch/x86/x86_64/platform_hypercall.c > +++ b/xen/arch/x86/x86_64/platform_hypercall.c > @@ -35,7 +35,7 @@ CHECK_pf_pcpu_version; > #undef xen_pf_pcpu_version > > #define xenpf_enter_acpi_sleep compat_pf_enter_acpi_sleep > -Please retain this blank line.> +#define xenpf_symdata compat_pf_symdataThis needs to be accompanied by an entry in xen/include/xlat.lst, and a respective CHECK_ invocation. I admit that the line immediately above is incomplete too, and hence served as a bad example (I''m preparing a fix as I write this).> +static uint64_t next_symbol, next_offset; > +static DEFINE_SPINLOCK(symbols_mutex); > + > +int xensyms_read(struct xenpf_symdata *symdata) > +{ > + if ( symdata->xen_symnum > symbols_num_syms ) > + return -EINVAL;This should be a more specific error code (-ERANGE perhaps).> + else if ( symdata->xen_symnum == symbols_num_syms )Pointless "else".> + return 0; > + > + spin_lock(&symbols_mutex); > + > + if ( symdata->xen_symnum == 0 ) > + next_offset = next_symbol = 0; > + else if ( next_symbol != symdata->xen_symnum ) > + /* Non-sequential access */ > + next_offset = get_symbol_offset(symdata->xen_symnum); > + > + symdata->type = symbols_get_symbol_type(next_offset); > + next_offset = symbols_expand_symbol(next_offset, symdata->name, > + sizeof(symdata->name)); > + symdata->address = symbols_offsets[symdata->xen_symnum] + SYMBOLS_ORIGIN; > + > + next_symbol = symdata->xen_symnum + 1; > + > + spin_unlock(&symbols_mutex); > + > + return strlen(symdata->name);Altogether the changes you do appear to allow the nul terminator to be written outside of the passed in array. Hence (a) you need to avoid corrupting memory and (b) whether using strlen() here is appropriate depends on how you deal with (a).> --- a/xen/include/public/platform.h > +++ b/xen/include/public/platform.h > @@ -527,6 +527,27 @@ struct xenpf_core_parking { > typedef struct xenpf_core_parking xenpf_core_parking_t; > DEFINE_XEN_GUEST_HANDLE(xenpf_core_parking_t); > > +#define XENPF_get_symbols 61Now that you retrieve them one at a time, perhaps better XENPF_get_symbol?> + > +struct xenpf_symdata { > + /* IN variables */ > + uint64_t xen_symnum; > + > + /* OUT variables */ > + uint64_t address; > + uint64_t type;"type" and "xen_symnum" could easily be less than 64 bits wide. Also, what''s the point of the "xen_" prefix in the latter?> + /* > + * KSYM_NAME_LEN is 128 bytes. However, we cannot be larger than pad in > + * xen_platform_op below (which is 128 bytes as well). Since the largest > + * symbol is around 50 bytes it''s probably more trouble than it''s worth > + * to try to deal with symbols that are close to 128 bytes in length. > + */ > +#define XEN_SYMS_MAX_LEN (128 - 3 * 8) > + char name[XEN_SYMS_MAX_LEN];No, this ought to be a handle pointing to a char array. Symbols alone might be only around 50 bytes, but the moment we get to the point of prefixing static symbols with their file names this will break. Jan
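For illustration, a minimal sketch of the layout this review is asking for: the symbol name returned through a guest handle rather than an embedded array, and narrower fields with explicit padding. Field names other than address and type are guesses, not the final interface.

    struct xenpf_symdata {
        /* IN variables */
        uint32_t namelen;               /* size of the buffer behind 'name' */
        uint32_t symnum;                /* symbol index (no "xen_" prefix)  */
        XEN_GUEST_HANDLE(char) name;    /* caller-supplied buffer, filled by Xen */

        /* OUT variables */
        uint64_t address;
        uint32_t type;
        uint32_t pad;                   /* keep the layout 32/64-bit invariant */
    };
    typedef struct xenpf_symdata xenpf_symdata_t;
    DEFINE_XEN_GUEST_HANDLE(xenpf_symdata_t);

With a handle, the name length is no longer capped by the 128-byte pad of xen_platform_op, which also copes with static symbols prefixed by their file names.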
Jan Beulich
2013-Sep-25 13:42 UTC
Re: [PATCH v2 02/13] Set VCPU's is_running flag closer to when the VCPU is dispatched
>>> On 20.09.13 at 11:42, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: > An interrupt handler happening during new VCPU scheduling may want to know > who was on the (physical) processor at the point of the interrupt. Just > looking at 'current' may not be accurate since there is a window of time when > 'current' points to new VCPU and its is_running flag is set but the VCPU has > not been dispatched yet. More importantly, on Intel processors, if the handler > wants to examine certain state of an HVM VCPU (such as segment registers) the > VMCS pointer is not set yet. > > This patch will move setting the is_running flag to a later point. As said on v1 already - I'm anything but convinced that this is not going to break something in subtle ways. Without you assuring us that you carefully inspected _all_ uses of this flag, I don't think this can go in. Jan
Jan Beulich
2013-Sep-25 13:55 UTC
Re: [PATCH v2 05/13] intel/VPMU: Clean up Intel VPMU code
>>> On 20.09.13 at 11:42, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: > +void vmx_rm_guest_msr(u32 msr) > +{ > + struct vcpu *curr = current; > + unsigned int i, idx, msr_count = curr->arch.hvm_vmx.msr_count; > + struct vmx_msr_entry *msr_area = curr->arch.hvm_vmx.msr_area; > + > + if ( msr_area == NULL ) > + return; > + > + for ( idx = 0; idx < msr_count; idx++ ) > + if ( msr_area[idx].index == msr ) > + break; > + > + if ( idx == msr_count ) > + return; > + > + for ( i = idx; i < msr_count - 1; i++ )"idx" not being used further down anymore, why do you need another loop variable here?> + { > + msr_area[i].index = msr_area[i + 1].index; > + rdmsrl(msr_area[i].index, msr_area[i].data);This is clearly a side effect of the function call no-one would expect. Why do you do this?> + } > + msr_area[msr_count - 1].index = 0; > + > + curr->arch.hvm_vmx.msr_count = --msr_count; > + __vmwrite(VM_EXIT_MSR_STORE_COUNT, msr_count); > + __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, msr_count); > + > + return;Pointless "return". All the same comments apply to vmx_rm_host_load_msr().> +static int arch_pmc_cnt; /* Number of general-purpose performance counters */ > +static int fixed_pmc_cnt; /* Number of fixed performance counters */unsigned? __read_mostly?> @@ -248,13 +230,13 @@ static void core2_vpmu_set_msr_bitmap(unsigned long *msr_bitmap) > int i; > > /* Allow Read/Write PMU Counters MSR Directly. */ > - for ( i = 0; i < core2_fix_counters.num; i++ ) > + for ( i = 0; i < fixed_pmc_cnt; i++ ) > { > - clear_bit(msraddr_to_bitpos(core2_fix_counters.msr[i]), msr_bitmap); > - clear_bit(msraddr_to_bitpos(core2_fix_counters.msr[i]), > + clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i), msr_bitmap); > + clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i),Dropping the static array will make the handling here quite a bit more complicated should there ever appear a second dis-contiguous MSR range.> @@ -262,32 +244,37 @@ static void core2_vpmu_set_msr_bitmap(unsigned long *msr_bitmap) > } > > /* Allow Read PMU Non-global Controls Directly. */ > - for ( i = 0; i < core2_ctrls.num; i++ ) > - clear_bit(msraddr_to_bitpos(core2_ctrls.msr[i]), msr_bitmap); > - for ( i = 0; i < core2_get_pmc_count(); i++ ) > + for ( i = 0; i < arch_pmc_cnt; i++ ) > clear_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL0+i), msr_bitmap); > + > + clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR_CTRL), msr_bitmap); > + clear_bit(msraddr_to_bitpos(MSR_IA32_PEBS_ENABLE), msr_bitmap); > + clear_bit(msraddr_to_bitpos(MSR_IA32_DS_AREA), msr_bitmap);As you can see, this is already the case here. The patch description doesn''t really say _why_ you do this. Jan
On 09/25/2013 09:15 AM, Jan Beulich wrote:>>>> On 20.09.13 at 11:42, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: >> + case XENPF_get_symbols: >> + { >> + ret = xensyms_read(&op->u.symdata); >> + if ( ret >= 0 && __copy_field_to_guest(u_xenpf_op, op, u.symdata) ) >> + ret = -EFAULT; >> + } >> + break; > This yields a positive return value if a symbol was found, 0 if none > was found, and negative on error. Can we avoid this non-standard > first aspect?We need to know on the caller side when EOF is reached. There is no good error code that I can see that would be appropriate here. ERANGE or ENFILE are the closest I can imagine but EOF is not really an error so I am not sure this would be the right thing. We could look at first byte of the string and see where it''s a 0 but that''s also somewhat non-standard. Encoding type or address as an invalid token also doesn''t look nice.> >> + return 0; >> + >> + spin_lock(&symbols_mutex); >> + >> + if ( symdata->xen_symnum == 0 ) >> + next_offset = next_symbol = 0; >> + else if ( next_symbol != symdata->xen_symnum ) >> + /* Non-sequential access */ >> + next_offset = get_symbol_offset(symdata->xen_symnum); >> + >> + symdata->type = symbols_get_symbol_type(next_offset); >> + next_offset = symbols_expand_symbol(next_offset, symdata->name, >> + sizeof(symdata->name)); >> + symdata->address = symbols_offsets[symdata->xen_symnum] + SYMBOLS_ORIGIN; >> + >> + next_symbol = symdata->xen_symnum + 1; >> + >> + spin_unlock(&symbols_mutex); >> + >> + return strlen(symdata->name); > Altogether the changes you do appear to allow the nul terminator > to be written outside of the passed in array. Hence (a) you need > to avoid corrupting memory and (b) whether using strlen() here is > appropriate depends on how you deal with (a).Yes, I should have passed ''sizeof(symdata->name)-1'' to symbols_expand_symbol() (I was thinking of strlen instead of sizeof)>> + >> +struct xenpf_symdata { >> + /* IN variables */ >> + uint64_t xen_symnum; >> + >> + /* OUT variables */ >> + uint64_t address; >> + uint64_t type; > "type" and "xen_symnum" could easily be less than 64 bits wide.I am trying to avoid 32-bit compatibility issues here. You will see sometimes unnecessary uint64_t in other structures well for the same reason. -boris
>>> On 20.09.13 at 11:42, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: > +struct amd_vpmu_context { > + uint64_t counters[XENPMU_AMD_MAX_COUNTERS]; > + uint64_t ctrls[XENPMU_AMD_MAX_COUNTERS]; > + uint8_t msr_bitmap_set; /* Used by HVM only */ > +};sizeof() this will not be the same for this in a 64-bit an a 32-bit guest. Are you intentionally creating a need for translation here?> + > +/* Intel PMU registers and structures */ > +#define XENPMU_CORE2_MAX_ARCH_PMCS 16 > +#define XENPMU_CORE2_MAX_FIXED_PMCS 4 > +struct core2_vpmu_context { > + uint64_t global_ctrl; > + uint64_t global_ovf_ctrl; > + uint64_t global_status; > + uint64_t global_ovf_status; > + uint64_t fixed_ctrl; > + uint64_t ds_area; > + uint64_t pebs_enable; > + uint64_t debugctl; > + uint64_t fix_counters[XENPMU_CORE2_MAX_FIXED_PMCS]; > + struct { > + uint64_t counter; > + uint64_t control; > + } arch_msr_pair[XENPMU_CORE2_MAX_ARCH_PMCS]; > +};I realize that using embedded arrays in both AMD and Intel structures makes things easier to implement, but it reduces forward compatibility. I''d therefore prefer those to be made handles.> +#define MAX(x, y) ((x) > (y) ? (x) : (y))I think we already agreed in the context of someone else''s reply that this has to go away.> +struct arch_xenpmu { > + union { > + struct cpu_user_regs regs;Oh, so you need to do translation for 32-bit guests anyway...> --- /dev/null > +++ b/xen/include/public/xenpmu.h > @@ -0,0 +1,38 @@ > +#ifndef __XEN_PUBLIC_XENPMU_H__ > +#define __XEN_PUBLIC_XENPMU_H__ > + > +#include "xen.h" > +#if defined(__i386__) || defined(__x86_64__) > +#include "arch-x86/xenpmu-x86.h" > +#elif defined (__arm__) || defined (__aarch64__) > +#include "arch-arm.h"??? Jan
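To illustrate the sizeof() point: the trailing uint8_t leaves the structure 4 bytes shorter for a 32-bit guest than for a 64-bit one. One way to make the layout invariant, sketched here only and not necessarily how the series will resolve it, is to pad explicitly:

    struct amd_vpmu_context {
        uint64_t counters[XENPMU_AMD_MAX_COUNTERS];
        uint64_t ctrls[XENPMU_AMD_MAX_COUNTERS];
        uint8_t  msr_bitmap_set;   /* Used by HVM only */
        uint8_t  pad[7];           /* explicit padding: same sizeof() for 32- and 64-bit guests */
    };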
Jan Beulich
2013-Sep-25 14:05 UTC
Re: [PATCH v2 07/13] x86/PMU: Make vpmu not HVM-specific
>>> On 20.09.13 at 11:42, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: > --- a/xen/include/asm-x86/hvm/vpmu.h > +++ b/xen/include/asm-x86/hvm/vpmu.h > @@ -31,9 +31,9 @@ > #define VPMU_BOOT_ENABLED 0x1 /* vpmu generally enabled. */ > #define VPMU_BOOT_BTS 0x2 /* Intel BTS feature wanted. */ > > -#define vcpu_vpmu(vcpu) (&((vcpu)->arch.hvm_vcpu.vpmu)) > +#define vcpu_vpmu(vcpu) (&((vcpu)->arch.vpmu)) > #define vpmu_vcpu(vpmu) (container_of((vpmu), struct vcpu, \ > - arch.hvm_vcpu.vpmu)) > + arch.vpmu)) This clearly needs to be moved out of an HVM-specific header too. Jan
Keir Fraser
2013-Sep-25 14:08 UTC
Re: [PATCH v2 02/13] Set VCPU''s is_running flag closer to when the VCPU is dispatched
On 25/09/2013 14:42, "Jan Beulich" <JBeulich@suse.com> wrote:>>>> On 20.09.13 at 11:42, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: >> An interrupt handler happening during new VCPU scheduling may want to know >> who was on the (physical) processor at the point of the interrupt. Just >> looking at ''current'' may not be accurate since there is a window of time when >> ''current'' points to new VCPU and its is_running flag is set but the VCPU has >> not been dispatched yet. More importantly, on Intel processors, if the >> handler >> wants to examine certain state of an HVM VCPU (such as segment registers) the >> VMCS pointer is not set yet. >> >> This patch will move setting the is_running flag to a later point. > > As said on v1 already - I''m all but convinced that this is not going to > break something in subtle ways. Without you assuring us that you > carefully inspected _all_ uses of this flag, I don''t think this can go in.It''s very definitely not safe. It would break vcpu_sleep_sync() for example -- which depends on ->is_running being set inside the scheduler lock. Otherwise, if vcpu_sleep_nosync() occurs immediately after dropping the lock in schedule() then vcpu_sleep_sync() can see !v->is_running even though v is in the process of being activated to run. It''s a subtle flag, messing with it is unlikely ever to be the right answer. -- Keir> Jan > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
Jan Beulich
2013-Sep-25 14:11 UTC
Re: [PATCH v2 08/13] x86/PMU: Interface for setting PMU mode and flags
>>> On 20.09.13 at 11:42, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: > --- a/xen/include/public/xen.h > +++ b/xen/include/public/xen.h > @@ -101,6 +101,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t); > #define __HYPERVISOR_kexec_op 37 > #define __HYPERVISOR_tmem_op 38 > #define __HYPERVISOR_xc_reserved_op 39 /* reserved for XenClient */ > +#define __HYPERVISOR_xenpmu_op 40 > > /* Architecture-specific hypercall definitions. */ > #define __HYPERVISOR_arch_0 48I wonder whether you wouldn''t better use an arch-specific hypercall number here - there''s really very little that''s generic in what I have seen so far.> +/* Parameters structure for HYPERVISOR_xenpmu_op call */ > +struct xenpmu_params { > + /* IN/OUT parameters */ > + union { > + struct version { > + uint8_t maj; > + uint8_t min; > + } version; > + uint64_t pad; > + }; > + union { > + uint64_t val; > + void *valp;Without there also being a handle here I can''t see how you could make use of the pointer.> + }; > + > + /* IN parameters */ > + uint64_t vcpu;Do you really want a 64-bit quantity here?> @@ -139,6 +140,9 @@ do_tmem_op( > extern long > do_xenoprof_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg); > > +extern long > +do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(void) arg);This seems wrong - the interface above makes it so that you would only ever pass this a struct xenpmu_params, so the handle should be of that kind from the beginning. Jan
Jan Beulich
2013-Sep-25 14:23 UTC
Re: [PATCH v2 10/13] x86/PMU: Add support for PMU registers handling on PV guests
>>> On 20.09.13 at 11:42, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: > Intercept accesses to PMU MSRs and LVTPC APIC vector (only > APIC_LVT_MASKED bit is processed) and process them in VPMU > module.Having scrolled through this more than once, I still can''t see where any APIC interception is happening here.> @@ -2486,6 +2488,17 @@ static int emulate_privileged_op(struct cpu_user_regs *regs) > if ( wrmsr_safe(regs->ecx, msr_content) != 0 ) > goto fail; > break; > + case MSR_P6_PERFCTR0...MSR_P6_PERFCTR1: > + case MSR_P6_EVNTSEL0...MSR_P6_EVNTSEL1: > + case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2: > + case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL: > + case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5: > + if ( !vpmu_do_wrmsr(regs->ecx, msr_content) ) > + { > + if ( (vpmu_mode & XENPMU_MODE_PRIV) && (v->domain == dom0) )This is identical to checking ->dom_id against zero, yet we started moving away from that model.> @@ -2574,6 +2587,24 @@ static int emulate_privileged_op(struct cpu_user_regs *regs) > regs->eax = (uint32_t)msr_content; > regs->edx = (uint32_t)(msr_content >> 32); > break; > + case MSR_IA32_PERF_CAPABILITIES: > + if ( rdmsr_safe(regs->ecx, msr_content) ) > + goto fail; > + /* Full-Width Writes not supported */ > + regs->eax = (uint32_t)msr_content & ~(1 << 13); > + regs->edx = (uint32_t)(msr_content >> 32);Rather than black listing, please white list know good features here.> + break; > + case MSR_P6_PERFCTR0...MSR_P6_PERFCTR1: > + case MSR_P6_EVNTSEL0...MSR_P6_EVNTSEL1: > + case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2: > + case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL: > + case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5: > + if ( vpmu_do_rdmsr(regs->ecx, &msr_content) ) {Coding style.> + case 0x33: /* RDPMC */ > + rdpmc(regs->ecx, regs->eax, regs->edx); > + break;This will #GP on invalid counter index, i.e. you''re creating a DoS here. Jan
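As a sketch of the kind of guard being asked for on the RDPMC path (vpmu_rdpmc_check() is a made-up helper here, not an existing function):

    case 0x33: /* RDPMC */
        /*
         * Validate the counter index against what the vPMU exposes before
         * touching hardware, so a guest-chosen index cannot make Xen take
         * a #GP on the rdpmc itself.
         */
        if ( !vpmu_rdpmc_check(regs->ecx) )
            goto fail;
        rdpmc(regs->ecx, regs->eax, regs->edx);
        break;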
Jan Beulich
2013-Sep-25 14:33 UTC
Re: [PATCH v2 11/13] x86/PMU: Handle PMU interrupts for PV guests
>>> On 20.09.13 at 11:42, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: > Add support for handling PMU interrupts for PV guests, make these interrupts > NMI instead of PMU_APIC_VECTOR vector. Depending on vpmu_mode forward the > interrupts to appropriate guest (mode is VPMU_ON) or to dom0 (VPMU_DOM0).Is using NMIs here a necessity? I guess not, in which case I''d really like this to be a (perhaps even non-default) option controllable via command line option.> - * This interrupt handles performance counters interrupt > - */ > - > -void pmu_apic_interrupt(struct cpu_user_regs *regs) > -{ > - ack_APIC_irq(); > - vpmu_do_interrupt(regs); > -}So this was the only caller of vpmu_do_interrupt(); no new one gets added in this patch afaics, and I don''t recall having seen addition of another caller in earlier patches. What''s the deal?> @@ -99,17 +106,97 @@ int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content) > int vpmu_do_interrupt(struct cpu_user_regs *regs) > { > struct vcpu *v = current; > - struct vpmu_struct *vpmu = vcpu_vpmu(v); > + struct vpmu_struct *vpmu; > > - if ( vpmu->arch_vpmu_ops ) > + > + /* dom0 will handle this interrupt */ > + if ( (vpmu_mode & XENPMU_MODE_PRIV) || > + (v->domain->domain_id >= DOMID_FIRST_RESERVED) ) > + { > + if ( smp_processor_id() >= dom0->max_vcpus ) > + return 0; > + v = dom0->vcpu[smp_processor_id()]; > + } > + > + vpmu = vcpu_vpmu(v); > + if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) > + return 0; > + > + if ( !is_hvm_domain(v->domain) || (vpmu_mode & XENPMU_MODE_PRIV) ) > + { > + /* PV guest or dom0 is doing system profiling */ > + void *p; > + struct cpu_user_regs *gregs; > + > + p = &v->arch.vpmu.xenpmu_data->pmu.regs; > + > + /* PV guest will be reading PMU MSRs from xenpmu_data */ > + vpmu_save_force(v); > + > + /* Store appropriate registers in xenpmu_data > + * > + * Note: ''!current->is_running'' is possible when ''set_current(next)'' > + * for the (HVM) guest has been called but ''reset_stack_and_jump()'' > + * has not (i.e. the guest is not actually running yet). > + */ > + if ( !is_hvm_domain(current->domain) || > + ((vpmu_mode & XENPMU_MODE_PRIV) && !current->is_running) ) > + { > + /* > + * 32-bit dom0 cannot process Xen''s addresses (which are 64 bit) > + * and therefore we treat it the same way as a non-priviledged > + * PV 32-bit domain. 
> + */ > + if ( is_pv_32bit_domain(current->domain) ) > + { > + struct compat_cpu_user_regs cmp; > + > + gregs = guest_cpu_user_regs(); > + XLAT_cpu_user_regs(&cmp, gregs); > + memcpy(p, &cmp, sizeof(struct compat_cpu_user_regs)); > + } > + else if ( (current->domain != dom0) && !is_idle_vcpu(current) && > + !(vpmu_mode & XENPMU_MODE_PRIV) ) > + { > + /* PV guest */ > + gregs = guest_cpu_user_regs(); > + memcpy(p, gregs, sizeof(struct cpu_user_regs)); > + } > + else > + memcpy(p, regs, sizeof(struct cpu_user_regs)); > + } > + else > + { > + /* HVM guest */ > + struct segment_register cs; > + > + gregs = guest_cpu_user_regs(); > + hvm_get_segment_register(current, x86_seg_cs, &cs); > + gregs->cs = cs.attr.fields.dpl; > + > + memcpy(p, gregs, sizeof(struct cpu_user_regs)); > + } > + > + v->arch.vpmu.xenpmu_data->domain_id = current->domain->domain_id; > + v->arch.vpmu.xenpmu_data->vcpu_id = current->vcpu_id; > + v->arch.vpmu.xenpmu_data->pcpu_id = smp_processor_id(); > + > + raise_softirq(PMU_SOFTIRQ); > + vpmu_set(vpmu, VPMU_WAIT_FOR_FLUSH); > + > + return 1; > + } > + else if ( vpmu->arch_vpmu_ops ) > { > - struct vlapic *vlapic = vcpu_vlapic(v); > + /* HVM guest */ > + struct vlapic *vlapic; > u32 vlapic_lvtpc; > unsigned char int_vec; > > if ( !vpmu->arch_vpmu_ops->do_interrupt(regs) ) > return 0; > > + vlapic = vcpu_vlapic(v); > if ( !is_vlapic_lvtpc_enabled(vlapic) ) > return 1; >Assuming the plan is to run this in NMI context - this is _a lot_ of stuff you want to do. Did you carefully audit all paths for being NMI-safe? Jan
Boris Ostrovsky
2013-Sep-25 14:39 UTC
Re: [PATCH v2 05/13] intel/VPMU: Clean up Intel VPMU code
On 09/25/2013 09:55 AM, Jan Beulich wrote:>>>> On 20.09.13 at 11:42, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: >> +void vmx_rm_guest_msr(u32 msr) >> +{ >> + struct vcpu *curr = current; >> + unsigned int i, idx, msr_count = curr->arch.hvm_vmx.msr_count; >> + struct vmx_msr_entry *msr_area = curr->arch.hvm_vmx.msr_area; >> + >> + if ( msr_area == NULL ) >> + return; >> + >> + for ( idx = 0; idx < msr_count; idx++ ) >> + if ( msr_area[idx].index == msr ) >> + break; >> + >> + if ( idx == msr_count ) >> + return; >> + >> + for ( i = idx; i < msr_count - 1; i++ ) > "idx" not being used further down anymore, why do you need > another loop variable here? > >> + { >> + msr_area[i].index = msr_area[i + 1].index; >> + rdmsrl(msr_area[i].index, msr_area[i].data); > This is clearly a side effect of the function call no-one would > expect. Why do you do this?I don''t understand what you are trying to say here. (And this is wrong, instead of rdmsr it should be msr_area[i].data = msr_area[i + 1].data; )> >> + } >> + msr_area[msr_count - 1].index = 0; >> + >> + curr->arch.hvm_vmx.msr_count = --msr_count; >> + __vmwrite(VM_EXIT_MSR_STORE_COUNT, msr_count); >> + __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, msr_count); >> + >> + return; > Pointless "return". > > All the same comments apply to vmx_rm_host_load_msr(). > >> +static int arch_pmc_cnt; /* Number of general-purpose performance counters */ >> +static int fixed_pmc_cnt; /* Number of fixed performance counters */ > unsigned? __read_mostly? > >> @@ -248,13 +230,13 @@ static void core2_vpmu_set_msr_bitmap(unsigned long *msr_bitmap) >> int i; >> >> /* Allow Read/Write PMU Counters MSR Directly. */ >> - for ( i = 0; i < core2_fix_counters.num; i++ ) >> + for ( i = 0; i < fixed_pmc_cnt; i++ ) >> { >> - clear_bit(msraddr_to_bitpos(core2_fix_counters.msr[i]), msr_bitmap); >> - clear_bit(msraddr_to_bitpos(core2_fix_counters.msr[i]), >> + clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i), msr_bitmap); >> + clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i), > Dropping the static array will make the handling here quite a bit more > complicated should there ever appear a second dis-contiguous MSR > range.Fixed counters range should always be contiguous per Intel SDM.>> @@ -262,32 +244,37 @@ static void core2_vpmu_set_msr_bitmap(unsigned long *msr_bitmap) >> } >> >> /* Allow Read PMU Non-global Controls Directly. */ >> - for ( i = 0; i < core2_ctrls.num; i++ ) >> - clear_bit(msraddr_to_bitpos(core2_ctrls.msr[i]), msr_bitmap); >> - for ( i = 0; i < core2_get_pmc_count(); i++ ) >> + for ( i = 0; i < arch_pmc_cnt; i++ ) >> clear_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL0+i), msr_bitmap); >> + >> + clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR_CTRL), msr_bitmap); >> + clear_bit(msraddr_to_bitpos(MSR_IA32_PEBS_ENABLE), msr_bitmap); >> + clear_bit(msraddr_to_bitpos(MSR_IA32_DS_AREA), msr_bitmap); > As you can see, this is already the case here.This is a different set of MSRs from from what you''ve commented on above. -boris> > The patch description doesn''t really say _why_ you do this. > Jan >
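For reference, the removal loop without the unintended MSR read, with the correction given above folded in (context and variable names as in the quoted vmx_rm_guest_msr(); per the other remark, 'idx' could equally be reused instead of introducing 'i'):

    for ( i = idx; i < msr_count - 1; i++ )
    {
        msr_area[i].index = msr_area[i + 1].index;
        msr_area[i].data  = msr_area[i + 1].data;   /* copy the saved value, do not re-read the MSR */
    }
    msr_area[msr_count - 1].index = 0;

    curr->arch.hvm_vmx.msr_count = --msr_count;
    __vmwrite(VM_EXIT_MSR_STORE_COUNT, msr_count);
    __vmwrite(VM_ENTRY_MSR_LOAD_COUNT, msr_count);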
Andrew Cooper
2013-Sep-25 14:40 UTC
Re: [PATCH v2 11/13] x86/PMU: Handle PMU interrupts for PV guests
On 25/09/13 15:33, Jan Beulich wrote:>>>> On 20.09.13 at 11:42, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: >> Add support for handling PMU interrupts for PV guests, make these interrupts >> NMI instead of PMU_APIC_VECTOR vector. Depending on vpmu_mode forward the >> interrupts to appropriate guest (mode is VPMU_ON) or to dom0 (VPMU_DOM0). > Is using NMIs here a necessity? I guess not, in which case I''d really > like this to be a (perhaps even non-default) option controllable via > command line option. > >> - * This interrupt handles performance counters interrupt >> - */ >> - >> -void pmu_apic_interrupt(struct cpu_user_regs *regs) >> -{ >> - ack_APIC_irq(); >> - vpmu_do_interrupt(regs); >> -} > So this was the only caller of vpmu_do_interrupt(); no new one gets > added in this patch afaics, and I don''t recall having seen addition of > another caller in earlier patches. What''s the deal? > >> @@ -99,17 +106,97 @@ int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content) >> int vpmu_do_interrupt(struct cpu_user_regs *regs) >> { >> struct vcpu *v = current; >> - struct vpmu_struct *vpmu = vcpu_vpmu(v); >> + struct vpmu_struct *vpmu; >> >> - if ( vpmu->arch_vpmu_ops ) >> + >> + /* dom0 will handle this interrupt */ >> + if ( (vpmu_mode & XENPMU_MODE_PRIV) || >> + (v->domain->domain_id >= DOMID_FIRST_RESERVED) ) >> + { >> + if ( smp_processor_id() >= dom0->max_vcpus ) >> + return 0; >> + v = dom0->vcpu[smp_processor_id()]; >> + } >> + >> + vpmu = vcpu_vpmu(v); >> + if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) >> + return 0; >> + >> + if ( !is_hvm_domain(v->domain) || (vpmu_mode & XENPMU_MODE_PRIV) ) >> + { >> + /* PV guest or dom0 is doing system profiling */ >> + void *p; >> + struct cpu_user_regs *gregs; >> + >> + p = &v->arch.vpmu.xenpmu_data->pmu.regs; >> + >> + /* PV guest will be reading PMU MSRs from xenpmu_data */ >> + vpmu_save_force(v); >> + >> + /* Store appropriate registers in xenpmu_data >> + * >> + * Note: ''!current->is_running'' is possible when ''set_current(next)'' >> + * for the (HVM) guest has been called but ''reset_stack_and_jump()'' >> + * has not (i.e. the guest is not actually running yet). >> + */ >> + if ( !is_hvm_domain(current->domain) || >> + ((vpmu_mode & XENPMU_MODE_PRIV) && !current->is_running) ) >> + { >> + /* >> + * 32-bit dom0 cannot process Xen''s addresses (which are 64 bit) >> + * and therefore we treat it the same way as a non-priviledged >> + * PV 32-bit domain. 
>> + */ >> + if ( is_pv_32bit_domain(current->domain) ) >> + { >> + struct compat_cpu_user_regs cmp; >> + >> + gregs = guest_cpu_user_regs(); >> + XLAT_cpu_user_regs(&cmp, gregs); >> + memcpy(p, &cmp, sizeof(struct compat_cpu_user_regs)); >> + } >> + else if ( (current->domain != dom0) && !is_idle_vcpu(current) && >> + !(vpmu_mode & XENPMU_MODE_PRIV) ) >> + { >> + /* PV guest */ >> + gregs = guest_cpu_user_regs(); >> + memcpy(p, gregs, sizeof(struct cpu_user_regs)); >> + } >> + else >> + memcpy(p, regs, sizeof(struct cpu_user_regs)); >> + } >> + else >> + { >> + /* HVM guest */ >> + struct segment_register cs; >> + >> + gregs = guest_cpu_user_regs(); >> + hvm_get_segment_register(current, x86_seg_cs, &cs); >> + gregs->cs = cs.attr.fields.dpl; >> + >> + memcpy(p, gregs, sizeof(struct cpu_user_regs)); >> + } >> + >> + v->arch.vpmu.xenpmu_data->domain_id = current->domain->domain_id; >> + v->arch.vpmu.xenpmu_data->vcpu_id = current->vcpu_id; >> + v->arch.vpmu.xenpmu_data->pcpu_id = smp_processor_id(); >> + >> + raise_softirq(PMU_SOFTIRQ); >> + vpmu_set(vpmu, VPMU_WAIT_FOR_FLUSH); >> + >> + return 1; >> + } >> + else if ( vpmu->arch_vpmu_ops ) >> { >> - struct vlapic *vlapic = vcpu_vlapic(v); >> + /* HVM guest */ >> + struct vlapic *vlapic; >> u32 vlapic_lvtpc; >> unsigned char int_vec; >> >> if ( !vpmu->arch_vpmu_ops->do_interrupt(regs) ) >> return 0; >> >> + vlapic = vcpu_vlapic(v); >> if ( !is_vlapic_lvtpc_enabled(vlapic) ) >> return 1; >> > Assuming the plan is to run this in NMI context - this is _a lot_ of > stuff you want to do. Did you carefully audit all paths for being > NMI-safe? > > Janvpmu_save() is not safe from an NMI context, as its non-NMI context uses local_irq_disable() to achieve consistency. ~Andrew
Boris Ostrovsky
2013-Sep-25 14:49 UTC
Re: [PATCH v2 07/13] x86/PMU: Make vpmu not HVM-specific
On 09/25/2013 10:05 AM, Jan Beulich wrote:>>>> On 20.09.13 at 11:42, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: >> --- a/xen/include/asm-x86/hvm/vpmu.h >> +++ b/xen/include/asm-x86/hvm/vpmu.h >> @@ -31,9 +31,9 @@ >> #define VPMU_BOOT_ENABLED 0x1 /* vpmu generally enabled. */ >> #define VPMU_BOOT_BTS 0x2 /* Intel BTS feature wanted. */ >> >> -#define vcpu_vpmu(vcpu) (&((vcpu)->arch.hvm_vcpu.vpmu)) >> +#define vcpu_vpmu(vcpu) (&((vcpu)->arch.vpmu)) >> #define vpmu_vcpu(vpmu) (container_of((vpmu), struct vcpu, \ >> - arch.hvm_vcpu.vpmu)) >> + arch.vpmu)) > This clearly needs to be moved out of a HVM specific header the too.The last patch of the series moves all vpmu* files up from HVM. -boris
>>> On 25.09.13 at 16:03, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: > On 09/25/2013 09:15 AM, Jan Beulich wrote: >>>>> On 20.09.13 at 11:42, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: >>> + case XENPF_get_symbols: >>> + { >>> + ret = xensyms_read(&op->u.symdata); >>> + if ( ret >= 0 && __copy_field_to_guest(u_xenpf_op, op, u.symdata) ) >>> + ret = -EFAULT; >>> + } >>> + break; >> This yields a positive return value if a symbol was found, 0 if none >> was found, and negative on error. Can we avoid this non-standard >> first aspect? > > We need to know on the caller side when EOF is reached. There is no good > error code that I can see that would be appropriate here. ERANGE or ENFILE > are the closest I can imagine but EOF is not really an error so I am not > sure > this would be the right thing. > > We could look at first byte of the string and see where it's a 0 but > that's also > somewhat non-standard. Encoding type or address as an invalid token also > doesn't look nice. Why don't you simply have the hypercall increment the passed-in symbol index? Then if it didn't get incremented, the caller will know there was no data returned.>>> + >>> +struct xenpf_symdata { >>> + /* IN variables */ >>> + uint64_t xen_symnum; >>> + >>> + /* OUT variables */ >>> + uint64_t address; >>> + uint64_t type; >> "type" and "xen_symnum" could easily be less than 64 bits wide. > > I am trying to avoid 32-bit compatibility issues here. You will see > sometimes > unnecessary uint64_t in other structures well for the same reason. I'd encourage you to avoid that, and use padding fields instead. Jan
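Under that convention, the dom0 consumer loop could look roughly like this (do_get_symbol() stands in for whatever wrapper ends up issuing XENPF_get_symbols; the field names follow the quoted v2 structure):

    struct xenpf_symdata sym = { .xen_symnum = 0 };

    for ( ; ; )
    {
        uint64_t prev = sym.xen_symnum;

        if ( do_get_symbol(&sym) < 0 )
            break;                      /* real error */
        if ( sym.xen_symnum == prev )
            break;                      /* index not advanced: end of symbol table */

        printf("%016"PRIx64" %c %s\n", sym.address, (char)sym.type, sym.name);
    }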
Boris Ostrovsky
2013-Sep-25 14:55 UTC
Re: [PATCH v2 08/13] x86/PMU: Interface for setting PMU mode and flags
On 09/25/2013 10:11 AM, Jan Beulich wrote:>>>> On 20.09.13 at 11:42, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: >> --- a/xen/include/public/xen.h >> +++ b/xen/include/public/xen.h >> @@ -101,6 +101,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t); >> #define __HYPERVISOR_kexec_op 37 >> #define __HYPERVISOR_tmem_op 38 >> #define __HYPERVISOR_xc_reserved_op 39 /* reserved for XenClient */ >> +#define __HYPERVISOR_xenpmu_op 40 >> >> /* Architecture-specific hypercall definitions. */ >> #define __HYPERVISOR_arch_0 48 > I wonder whether you wouldn't better use an arch-specific hypercall > number here - there's really very little that's generic in what I have > seen so far. Some of the xenpmu commands (querying PMU mode/flags, PMU initialization/de-initialization) seem generic to me. They may call arch-specific routines.>> +/* Parameters structure for HYPERVISOR_xenpmu_op call */ >> +struct xenpmu_params { >> + /* IN/OUT parameters */ >> + union { >> + struct version { >> + uint8_t maj; >> + uint8_t min; >> + } version; >> + uint64_t pad; >> + }; >> + union { >> + uint64_t val; >> + void *valp; > Without there also being a handle here I can't see how you could > make use of the pointer. That's really a placeholder, valp is not used anywhere. I can replace it with XEN_GUEST_HANDLE. -boris
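A sketch of that replacement (the specific handle type is illustrative; anything with a fixed width across 32- and 64-bit guests would do), which also narrows the vcpu field queried above:

    struct xenpmu_params {
        /* IN/OUT parameters */
        union {
            struct version {
                uint8_t maj;
                uint8_t min;
            } version;
            uint64_t pad;
        };
        union {
            uint64_t val;
            XEN_GUEST_HANDLE_64(uint8) valp;   /* fixed-width handle instead of a raw pointer */
        };

        /* IN parameters */
        uint32_t vcpu;
        uint32_t pad2;
    };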
Jan Beulich
2013-Sep-25 14:57 UTC
Re: [PATCH v2 05/13] intel/VPMU: Clean up Intel VPMU code
>>> On 25.09.13 at 16:39, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: > On 09/25/2013 09:55 AM, Jan Beulich wrote: >>>>> On 20.09.13 at 11:42, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: >>> + { >>> + msr_area[i].index = msr_area[i + 1].index; >>> + rdmsrl(msr_area[i].index, msr_area[i].data); >> This is clearly a side effect of the function call no-one would >> expect. Why do you do this? > > I don''t understand what you are trying to say here. > > (And this is wrong, instead of rdmsr it should be > msr_area[i].data = msr_area[i + 1].data; > )That was the very point - doing an MSR read here is clearly an unexpected side effect.>>> @@ -248,13 +230,13 @@ static void core2_vpmu_set_msr_bitmap(unsigned long *msr_bitmap) >>> int i; >>> >>> /* Allow Read/Write PMU Counters MSR Directly. */ >>> - for ( i = 0; i < core2_fix_counters.num; i++ ) >>> + for ( i = 0; i < fixed_pmc_cnt; i++ ) >>> { >>> - clear_bit(msraddr_to_bitpos(core2_fix_counters.msr[i]), msr_bitmap); >>> - clear_bit(msraddr_to_bitpos(core2_fix_counters.msr[i]), >>> + clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i), > msr_bitmap); >>> + clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i), >> Dropping the static array will make the handling here quite a bit more >> complicated should there ever appear a second dis-contiguous MSR >> range. > > Fixed counters range should always be contiguous per Intel SDM.Until the current range runs out...>>> @@ -262,32 +244,37 @@ static void core2_vpmu_set_msr_bitmap(unsigned long *msr_bitmap) >>> } >>> >>> /* Allow Read PMU Non-global Controls Directly. */ >>> - for ( i = 0; i < core2_ctrls.num; i++ ) >>> - clear_bit(msraddr_to_bitpos(core2_ctrls.msr[i]), msr_bitmap); >>> - for ( i = 0; i < core2_get_pmc_count(); i++ ) >>> + for ( i = 0; i < arch_pmc_cnt; i++ ) >>> clear_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL0+i), msr_bitmap); >>> + >>> + clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR_CTRL), msr_bitmap); >>> + clear_bit(msraddr_to_bitpos(MSR_IA32_PEBS_ENABLE), msr_bitmap); >>> + clear_bit(msraddr_to_bitpos(MSR_IA32_DS_AREA), msr_bitmap); >> As you can see, this is already the case here. > > This is a different set of MSRs from from what you''ve commented on above.Sure, but the effect of breaking up a loop into individual operations is seen quite nicely here. Jan
Jan Beulich
2013-Sep-25 14:57 UTC
Re: [PATCH v2 07/13] x86/PMU: Make vpmu not HVM-specific
>>> On 25.09.13 at 16:49, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: > On 09/25/2013 10:05 AM, Jan Beulich wrote: >>>>> On 20.09.13 at 11:42, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: >>> --- a/xen/include/asm-x86/hvm/vpmu.h >>> +++ b/xen/include/asm-x86/hvm/vpmu.h >>> @@ -31,9 +31,9 @@ >>> #define VPMU_BOOT_ENABLED 0x1 /* vpmu generally enabled. */ >>> #define VPMU_BOOT_BTS 0x2 /* Intel BTS feature wanted. */ >>> >>> -#define vcpu_vpmu(vcpu) (&((vcpu)->arch.hvm_vcpu.vpmu)) >>> +#define vcpu_vpmu(vcpu) (&((vcpu)->arch.vpmu)) >>> #define vpmu_vcpu(vpmu) (container_of((vpmu), struct vcpu, \ >>> - arch.hvm_vcpu.vpmu)) >>> + arch.vpmu)) >> This clearly needs to be moved out of a HVM specific header the too. > > The last patch of the series moves all vpmu* files up from HVM.Yes, I got to that by now. Kind of counter intuitive though. Jan
Boris Ostrovsky
2013-Sep-25 15:03 UTC
Re: [PATCH v2 10/13] x86/PMU: Add support for PMU registers handling on PV guests
On 09/25/2013 10:23 AM, Jan Beulich wrote:>>>> On 20.09.13 at 11:42, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: >> Intercept accesses to PMU MSRs and LVTPC APIC vector (only >> APIC_LVT_MASKED bit is processed) and process them in VPMU >> module. > Having scrolled through this more than once, I still can't see where > any APIC interception is happening here. It's not. This commit message is a leftover from the pre-v1 implementation. LVTPC is now updated via an explicit hypercall.>> @@ -2574,6 +2587,24 @@ static int emulate_privileged_op(struct cpu_user_regs *regs) >> regs->eax = (uint32_t)msr_content; >> regs->edx = (uint32_t)(msr_content >> 32); >> break; >> + case MSR_IA32_PERF_CAPABILITIES: >> + if ( rdmsr_safe(regs->ecx, msr_content) ) >> + goto fail; >> + /* Full-Width Writes not supported */ >> + regs->eax = (uint32_t)msr_content & ~(1 << 13); >> + regs->edx = (uint32_t)(msr_content >> 32); > Rather than black listing, please white list know good features > here. This 'case' is gone after I rebased to the latest sources (we implement full-width writes now). -boris
Boris Ostrovsky
2013-Sep-25 15:19 UTC
Re: [PATCH v2 11/13] x86/PMU: Handle PMU interrupts for PV guests
On 09/25/2013 10:33 AM, Jan Beulich wrote:>>>> On 20.09.13 at 11:42, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: >> Add support for handling PMU interrupts for PV guests, make these interrupts >> NMI instead of PMU_APIC_VECTOR vector. Depending on vpmu_mode forward the >> interrupts to appropriate guest (mode is VPMU_ON) or to dom0 (VPMU_DOM0). > Is using NMIs here a necessity? I guess not, in which case I''d really > like this to be a (perhaps even non-default) option controllable via > command line option.It is not a necessity but using NMIs will allow us to profile code that runs with interrupts disabled.>> - * This interrupt handles performance counters interrupt >> - */ >> - >> -void pmu_apic_interrupt(struct cpu_user_regs *regs) >> -{ >> - ack_APIC_irq(); >> - vpmu_do_interrupt(regs); >> -} > So this was the only caller of vpmu_do_interrupt(); no new one gets > added in this patch afaics, and I don''t recall having seen addition of > another caller in earlier patches. What''s the deal?It''s in 09/13: +int pmu_nmi_interrupt(struct cpu_user_regs *regs, int cpu) +{ + return vpmu_do_interrupt(regs); +} + -boris
Jan Beulich
2013-Sep-25 15:25 UTC
Re: [PATCH v2 11/13] x86/PMU: Handle PMU interrupts for PV guests
>>> On 25.09.13 at 17:19, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: > On 09/25/2013 10:33 AM, Jan Beulich wrote: >>>>> On 20.09.13 at 11:42, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: >>> Add support for handling PMU interrupts for PV guests, make these interrupts >>> NMI instead of PMU_APIC_VECTOR vector. Depending on vpmu_mode forward the >>> interrupts to appropriate guest (mode is VPMU_ON) or to dom0 (VPMU_DOM0). >> Is using NMIs here a necessity? I guess not, in which case I''d really >> like this to be a (perhaps even non-default) option controllable via >> command line option. > > It is not a necessity but using NMIs will allow us to profile code that runs > with interrupts disabled.So my request stands to make this optional, default off.>>> - * This interrupt handles performance counters interrupt >>> - */ >>> - >>> -void pmu_apic_interrupt(struct cpu_user_regs *regs) >>> -{ >>> - ack_APIC_irq(); >>> - vpmu_do_interrupt(regs); >>> -} >> So this was the only caller of vpmu_do_interrupt(); no new one gets >> added in this patch afaics, and I don''t recall having seen addition of >> another caller in earlier patches. What''s the deal? > > It''s in 09/13: > > > +int pmu_nmi_interrupt(struct cpu_user_regs *regs, int cpu) > +{ > + return vpmu_do_interrupt(regs); > +} > +Ah. Then please try to break up your changes in a way that logically connected things stay together. The change you quote above can''t make much sense earlier than in patch 11. Jan
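The requested opt-in could look roughly like this (the option and variable names are illustrative; boolean_param() and set_nmi_callback() are the existing Xen mechanisms for command-line options and NMI handlers):

    /* Off by default; "vpmu_nmi" on the Xen command line enables NMI delivery. */
    static bool_t __read_mostly opt_vpmu_nmi;
    boolean_param("vpmu_nmi", opt_vpmu_nmi);

    static void __init vpmu_setup_interrupt(void)
    {
        if ( opt_vpmu_nmi )
            set_nmi_callback(pmu_nmi_interrupt);   /* NMI path added in patch 09 */
        /* otherwise keep delivering through the PMU APIC vector as before */
    }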
Boris Ostrovsky
2013-Sep-25 15:37 UTC
Re: [PATCH v2 05/13] intel/VPMU: Clean up Intel VPMU code
On 09/25/2013 10:57 AM, Jan Beulich wrote:>>>> On 25.09.13 at 16:39, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: >>>> >>>> @@ -248,13 +230,13 @@ static void core2_vpmu_set_msr_bitmap(unsigned long *msr_bitmap) >>>> int i; >>>> >>>> /* Allow Read/Write PMU Counters MSR Directly. */ >>>> - for ( i = 0; i < core2_fix_counters.num; i++ ) >>>> + for ( i = 0; i < fixed_pmc_cnt; i++ ) >>>> { >>>> - clear_bit(msraddr_to_bitpos(core2_fix_counters.msr[i]), msr_bitmap); >>>> - clear_bit(msraddr_to_bitpos(core2_fix_counters.msr[i]), >>>> + clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i), >> msr_bitmap); >>>> + clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR0 + i), >>> Dropping the static array will make the handling here quite a bit more >>> complicated should there ever appear a second dis-contiguous MSR >>> range. >> Fixed counters range should always be contiguous per Intel SDM. > Until the current range runs out...Well, there are 58 free addresses currently available in this range...>>>> @@ -262,32 +244,37 @@ static void core2_vpmu_set_msr_bitmap(unsigned long *msr_bitmap) >>>> } >>>> >>>> /* Allow Read PMU Non-global Controls Directly. */ >>>> - for ( i = 0; i < core2_ctrls.num; i++ ) >>>> - clear_bit(msraddr_to_bitpos(core2_ctrls.msr[i]), msr_bitmap); >>>> - for ( i = 0; i < core2_get_pmc_count(); i++ ) >>>> + for ( i = 0; i < arch_pmc_cnt; i++ ) >>>> clear_bit(msraddr_to_bitpos(MSR_P6_EVNTSEL0+i), msr_bitmap); >>>> + >>>> + clear_bit(msraddr_to_bitpos(MSR_CORE_PERF_FIXED_CTR_CTRL), msr_bitmap); >>>> + clear_bit(msraddr_to_bitpos(MSR_IA32_PEBS_ENABLE), msr_bitmap); >>>> + clear_bit(msraddr_to_bitpos(MSR_IA32_DS_AREA), msr_bitmap); >>> As you can see, this is already the case here. >> This is a different set of MSRs from from what you''ve commented on above. > Sure, but the effect of breaking up a loop into individual operations > is seen quite nicely here.Yes, but unlike fixed counters above, the registers in what used to be in core2_ctrls.msr are responsible for different things. And in certain cases we want to access one register but not the other. An example is in current version of vpmu_dump(): val = core2_vpmu_cxt->ctrls[MSR_CORE_PERF_FIXED_CTR_CTRL_IDX]; We had to add another macro (MSR_CORE_PERF_FIXED_CTR_CTRL_IDX). So I think that separating these registers explicitly makes sense. -boris
Boris Ostrovsky
2013-Sep-25 15:52 UTC
Re: [PATCH v2 11/13] x86/PMU: Handle PMU interrupts for PV guests
On 09/25/2013 10:40 AM, Andrew Cooper wrote:> On 25/09/13 15:33, Jan Beulich wrote: >>>>> On 20.09.13 at 11:42, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: >>> Add support for handling PMU interrupts for PV guests, make these interrupts >>> NMI instead of PMU_APIC_VECTOR vector. Depending on vpmu_mode forward the >>> interrupts to appropriate guest (mode is VPMU_ON) or to dom0 (VPMU_DOM0). >> Is using NMIs here a necessity? I guess not, in which case I''d really >> like this to be a (perhaps even non-default) option controllable via >> command line option. >> >>> - * This interrupt handles performance counters interrupt >>> - */ >>> - >>> -void pmu_apic_interrupt(struct cpu_user_regs *regs) >>> -{ >>> - ack_APIC_irq(); >>> - vpmu_do_interrupt(regs); >>> -} >> So this was the only caller of vpmu_do_interrupt(); no new one gets >> added in this patch afaics, and I don''t recall having seen addition of >> another caller in earlier patches. What''s the deal? >> >>> @@ -99,17 +106,97 @@ int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content) >>> int vpmu_do_interrupt(struct cpu_user_regs *regs) >>> { >>> struct vcpu *v = current; >>> - struct vpmu_struct *vpmu = vcpu_vpmu(v); >>> + struct vpmu_struct *vpmu; >>> >>> - if ( vpmu->arch_vpmu_ops ) >>> + >>> + /* dom0 will handle this interrupt */ >>> + if ( (vpmu_mode & XENPMU_MODE_PRIV) || >>> + (v->domain->domain_id >= DOMID_FIRST_RESERVED) ) >>> + { >>> + if ( smp_processor_id() >= dom0->max_vcpus ) >>> + return 0; >>> + v = dom0->vcpu[smp_processor_id()]; >>> + } >>> + >>> + vpmu = vcpu_vpmu(v); >>> + if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) >>> + return 0; >>> + >>> + if ( !is_hvm_domain(v->domain) || (vpmu_mode & XENPMU_MODE_PRIV) ) >>> + { >>> + /* PV guest or dom0 is doing system profiling */ >>> + void *p; >>> + struct cpu_user_regs *gregs; >>> + >>> + p = &v->arch.vpmu.xenpmu_data->pmu.regs; >>> + >>> + /* PV guest will be reading PMU MSRs from xenpmu_data */ >>> + vpmu_save_force(v); >>> + >>> + /* Store appropriate registers in xenpmu_data >>> + * >>> + * Note: ''!current->is_running'' is possible when ''set_current(next)'' >>> + * for the (HVM) guest has been called but ''reset_stack_and_jump()'' >>> + * has not (i.e. the guest is not actually running yet). >>> + */ >>> + if ( !is_hvm_domain(current->domain) || >>> + ((vpmu_mode & XENPMU_MODE_PRIV) && !current->is_running) ) >>> + { >>> + /* >>> + * 32-bit dom0 cannot process Xen''s addresses (which are 64 bit) >>> + * and therefore we treat it the same way as a non-priviledged >>> + * PV 32-bit domain. 
>>> + */ >>> + if ( is_pv_32bit_domain(current->domain) ) >>> + { >>> + struct compat_cpu_user_regs cmp; >>> + >>> + gregs = guest_cpu_user_regs(); >>> + XLAT_cpu_user_regs(&cmp, gregs); >>> + memcpy(p, &cmp, sizeof(struct compat_cpu_user_regs)); >>> + } >>> + else if ( (current->domain != dom0) && !is_idle_vcpu(current) && >>> + !(vpmu_mode & XENPMU_MODE_PRIV) ) >>> + { >>> + /* PV guest */ >>> + gregs = guest_cpu_user_regs(); >>> + memcpy(p, gregs, sizeof(struct cpu_user_regs)); >>> + } >>> + else >>> + memcpy(p, regs, sizeof(struct cpu_user_regs)); >>> + } >>> + else >>> + { >>> + /* HVM guest */ >>> + struct segment_register cs; >>> + >>> + gregs = guest_cpu_user_regs(); >>> + hvm_get_segment_register(current, x86_seg_cs, &cs); >>> + gregs->cs = cs.attr.fields.dpl; >>> + >>> + memcpy(p, gregs, sizeof(struct cpu_user_regs)); >>> + } >>> + >>> + v->arch.vpmu.xenpmu_data->domain_id = current->domain->domain_id; >>> + v->arch.vpmu.xenpmu_data->vcpu_id = current->vcpu_id; >>> + v->arch.vpmu.xenpmu_data->pcpu_id = smp_processor_id(); >>> + >>> + raise_softirq(PMU_SOFTIRQ); >>> + vpmu_set(vpmu, VPMU_WAIT_FOR_FLUSH); >>> + >>> + return 1; >>> + } >>> + else if ( vpmu->arch_vpmu_ops ) >>> { >>> - struct vlapic *vlapic = vcpu_vlapic(v); >>> + /* HVM guest */ >>> + struct vlapic *vlapic; >>> u32 vlapic_lvtpc; >>> unsigned char int_vec; >>> >>> if ( !vpmu->arch_vpmu_ops->do_interrupt(regs) ) >>> return 0; >>> >>> + vlapic = vcpu_vlapic(v); >>> if ( !is_vlapic_lvtpc_enabled(vlapic) ) >>> return 1; >>> >> Assuming the plan is to run this in NMI context - this is _a lot_ of >> stuff you want to do. Did you carefully audit all paths for being >> NMI-safe? >> >> Jan > vpmu_save() is not safe from an NMI context, as its non-NMI context uses > local_irq_disable() to achieve consistency.Sigh... hvm_get_segment_register() also appears to be unsafe. I''ll will need to move it somewhere else. -boris
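One possible direction, sketched only (the actual split will depend on what else turns out not to be NMI-safe): keep the NMI handler down to capturing the interrupted register state and raising the softirq, and do the state saving and segment-register reads from the PMU softirq, which runs in ordinary IRQ context.

    static DEFINE_PER_CPU(struct cpu_user_regs, pmu_nmi_regs);   /* illustrative */

    int vpmu_do_interrupt(struct cpu_user_regs *regs)
    {
        struct vpmu_struct *vpmu = vcpu_vpmu(current);

        if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
            return 0;

        /* NMI context: save only what exists nowhere else (the interrupted
         * registers), flag the event, and defer everything non-NMI-safe. */
        this_cpu(pmu_nmi_regs) = *regs;
        vpmu_set(vpmu, VPMU_WAIT_FOR_FLUSH);
        raise_softirq(PMU_SOFTIRQ);
        return 1;
    }

    static void pmu_softirq(void)
    {
        /* Ordinary IRQ context: safe to use the local_irq_disable()-based
         * save path and hvm_get_segment_register() before notifying the
         * guest, using the registers captured in this_cpu(pmu_nmi_regs). */
        vpmu_save_force(current);
        /* ... fill xenpmu_data and raise the guest notification here ... */
    }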
On 09/25/2013 10:04 AM, Jan Beulich wrote>> --- /dev/null >> +++ b/xen/include/public/xenpmu.h >> @@ -0,0 +1,38 @@ >> +#ifndef __XEN_PUBLIC_XENPMU_H__ >> +#define __XEN_PUBLIC_XENPMU_H__ >> + >> +#include "xen.h" >> +#if defined(__i386__) || defined(__x86_64__) >> +#include "arch-x86/xenpmu-x86.h" >> +#elif defined (__arm__) || defined (__aarch64__) >> +#include "arch-arm.h" > ???I need to define arch_xenpmu_t in all architectures. This is what the shared structure looks like: /* Shared between hypervisor and PV domain */ struct xenpmu_data { uint32_t domain_id; uint32_t vcpu_id; uint32_t pcpu_id; uint32_t pmu_flags; arch_xenpmu_t pmu; }; -boris
>>> On 25.09.13 at 17:59, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: > On 09/25/2013 10:04 AM, Jan Beulich wrote >>> --- /dev/null >>> +++ b/xen/include/public/xenpmu.h >>> @@ -0,0 +1,38 @@ >>> +#ifndef __XEN_PUBLIC_XENPMU_H__ >>> +#define __XEN_PUBLIC_XENPMU_H__ >>> + >>> +#include "xen.h" >>> +#if defined(__i386__) || defined(__x86_64__) >>> +#include "arch-x86/xenpmu-x86.h" >>> +#elif defined (__arm__) || defined (__aarch64__) >>> +#include "arch-arm.h" >> ??? > > > I need to define arch_xenpmu_t in all architectures.Right, but why not arch-arm/xenpmu-arm.h? (And maybe you could stick a few more arms here and there - read: why the x86 duplication in the x86 name?) Jan
On 09/25/2013 10:04 AM, Jan Beulich wrote:>>>> On 20.09.13 at 11:42, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: >> +struct amd_vpmu_context { >> + uint64_t counters[XENPMU_AMD_MAX_COUNTERS]; >> + uint64_t ctrls[XENPMU_AMD_MAX_COUNTERS]; >> + uint8_t msr_bitmap_set; /* Used by HVM only */ >> +}; > sizeof() this will not be the same for this in a 64-bit an a 32-bit guest. > Are you intentionally creating a need for translation here? > >> + >> +/* Intel PMU registers and structures */ >> +#define XENPMU_CORE2_MAX_ARCH_PMCS 16 >> +#define XENPMU_CORE2_MAX_FIXED_PMCS 4 >> +struct core2_vpmu_context { >> + uint64_t global_ctrl; >> + uint64_t global_ovf_ctrl; >> + uint64_t global_status; >> + uint64_t global_ovf_status; >> + uint64_t fixed_ctrl; >> + uint64_t ds_area; >> + uint64_t pebs_enable; >> + uint64_t debugctl; >> + uint64_t fix_counters[XENPMU_CORE2_MAX_FIXED_PMCS]; >> + struct { >> + uint64_t counter; >> + uint64_t control; >> + } arch_msr_pair[XENPMU_CORE2_MAX_ARCH_PMCS]; >> +}; > I realize that using embedded arrays in both AMD and Intel > structures makes things easier to implement, but it reduces > forward compatibility. I''d therefore prefer those to be made > handles.(I missed this comment earlier). This is not done because it''s easier but because I need to keep the structure in a shared page. -boris
>>> On 30.09.13 at 15:25, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: > On 09/25/2013 10:04 AM, Jan Beulich wrote: >>>>> On 20.09.13 at 11:42, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: >>> +/* Intel PMU registers and structures */ >>> +#define XENPMU_CORE2_MAX_ARCH_PMCS 16 >>> +#define XENPMU_CORE2_MAX_FIXED_PMCS 4 >>> +struct core2_vpmu_context { >>> + uint64_t global_ctrl; >>> + uint64_t global_ovf_ctrl; >>> + uint64_t global_status; >>> + uint64_t global_ovf_status; >>> + uint64_t fixed_ctrl; >>> + uint64_t ds_area; >>> + uint64_t pebs_enable; >>> + uint64_t debugctl; >>> + uint64_t fix_counters[XENPMU_CORE2_MAX_FIXED_PMCS]; >>> + struct { >>> + uint64_t counter; >>> + uint64_t control; >>> + } arch_msr_pair[XENPMU_CORE2_MAX_ARCH_PMCS]; >>> +}; >> I realize that using embedded arrays in both AMD and Intel >> structures makes things easier to implement, but it reduces >> forward compatibility. I''d therefore prefer those to be made >> handles. > > (I missed this comment earlier). > > This is not done because it''s easier but because I need to keep the > structure in a shared page.Then a more dynamic layout (with just the array base offset in the page specified in the structure) would still be preferable as being more extensible. Jan
On 09/30/2013 09:30 AM, Jan Beulich wrote:>>>> On 30.09.13 at 15:25, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: >> On 09/25/2013 10:04 AM, Jan Beulich wrote: >>>>>> On 20.09.13 at 11:42, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: >>>> +/* Intel PMU registers and structures */ >>>> +#define XENPMU_CORE2_MAX_ARCH_PMCS 16 >>>> +#define XENPMU_CORE2_MAX_FIXED_PMCS 4 >>>> +struct core2_vpmu_context { >>>> + uint64_t global_ctrl; >>>> + uint64_t global_ovf_ctrl; >>>> + uint64_t global_status; >>>> + uint64_t global_ovf_status; >>>> + uint64_t fixed_ctrl; >>>> + uint64_t ds_area; >>>> + uint64_t pebs_enable; >>>> + uint64_t debugctl; >>>> + uint64_t fix_counters[XENPMU_CORE2_MAX_FIXED_PMCS]; >>>> + struct { >>>> + uint64_t counter; >>>> + uint64_t control; >>>> + } arch_msr_pair[XENPMU_CORE2_MAX_ARCH_PMCS]; >>>> +}; >>> I realize that using embedded arrays in both AMD and Intel >>> structures makes things easier to implement, but it reduces >>> forward compatibility. I'd therefore prefer those to be made >>> handles. >> (I missed this comment earlier). >> >> This is not done because it's easier but because I need to keep the >> structure in a shared page. > Then a more dynamic layout (with just the array base offset in the > page specified in the structure) would still be preferable as being > more extensible. > > I'll see how I can do this; I am afraid this may end up being a bit too convoluted (the arrays will still need to be part of some structure which describes a context). -boris
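A rough sketch of what such a layout could look like (field names invented for illustration): the fixed part of the context carries counts and offsets describing where in the shared page the counter arrays live, so their sizes can grow without changing the structure.

    struct core2_vpmu_context {
        uint64_t global_ctrl;
        uint64_t global_status;
        uint64_t fixed_ctrl;
        uint64_t ds_area;
        uint64_t pebs_enable;
        uint64_t debugctl;

        uint32_t num_fixed;      /* number of fixed counters                    */
        uint32_t num_arch;       /* number of general-purpose counters          */
        uint32_t fixed_offset;   /* page offset of uint64_t[num_fixed]          */
        uint32_t arch_offset;    /* page offset of {counter, control}[num_arch] */
    };

    /* Consumer side, given the mapped shared page: */
    static inline uint64_t *core2_fixed_counters(void *page,
                                                 const struct core2_vpmu_context *ctxt)
    {
        return (uint64_t *)((uint8_t *)page + ctxt->fixed_offset);
    }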