Raghavendra K T
2011-Nov-30 08:59 UTC
[PATCH RFC V3 0/4] kvm : Paravirt-spinlock support for KVM guests
The 4-patch series to follow this email extends KVM-hypervisor and Linux guest running on KVM-hypervisor to support pv-ticket spinlocks, based on Xen's implementation. One hypercall is introduced in KVM hypervisor,that allows a vcpu to kick another vcpu out of halt state. The blocking of vcpu is done using halt() in (lock_spinning) slowpath. The V2 change discussion was in: https://lkml.org/lkml/2011/10/23/207 Previous discussions : (posted by Srivatsa V). https://lkml.org/lkml/2010/7/26/24 https://lkml.org/lkml/2011/1/19/212 The BASE patch is tip 3.2-rc1 + Jeremy's following patches. xadd (https://lkml.org/lkml/2011/10/4/328) x86/ticketlocklock (https://lkml.org/lkml/2011/10/12/496). Changes in V3: - rebased to 3.2-rc1 - use halt() instead of wait for kick hypercall. - modify kick hyper call to do wakeup halted vcpu. - hook kvm_spinlock_init to smp_prepare_cpus call (moved the call out of head##.c). - fix the potential race when zero_stat is read. - export debugfs_create_32 and add documentation to API. - use static inline and enum instead of ADDSTAT macro. - add barrier() in after setting kick_vcpu. - empty static inline function for kvm_spinlock_init. - combine the patches one and two readuce overhead. - make KVM_DEBUGFS depends on DEBUGFS. - include debugfs header unconditionally. Changes in V2: - rebased patchesto -rc9 - synchronization related changes based on Jeremy's changes (Jeremy Fitzhardinge <jeremy.fitzhardinge at citrix.com>) pointed by Stephan Diestelhorst <stephan.diestelhorst at amd.com> - enabling 32 bit guests - splitted patches into two more chunks Srivatsa Vaddagiri, Suzuki Poulose, Raghavendra K T (4): Add debugfs support to print u32-arrays in debugfs Add a hypercall to KVM hypervisor to support pv-ticketlocks Added configuration support to enable debug information for KVM Guests pv-ticketlocks support for linux guests running on KVM hypervisor Results: From the results we can see that patched kernel performance is similar to BASE when there is no lock contention. But once we start seeing more contention, patched kernel outperforms BASE. set up : Kernel for host/guest : 3.2-rc1 + Jeremy's xadd, pv spinlock patches as BASE 3 guests with 8VCPU, 4GB RAM, 1 used for kernbench (kernbench -f -H -M -o 20) other for cpuhog (shell script while true with an instruction) scenario A: unpinned 1x: no hogs 2x: 8hogs in one guest 3x: 8hogs each in two guest Result for Non PLE machine : Machine : IBM xSeries with Intel(R) Xeon(R) x5570 2.93GHz CPU with 8 core , 64GB RAM BASE BASE+patch %improvement mean (sd) mean (sd) Scenario A: case 1x: 157.548 (10.624) 156.408 (11.1622) 0.723589 case 2x: 1110.18 (807.019) 310.96 (105.194) 71.9901 case 3x: 3110.36 (2408.03) 303.688 (110.474) 90.2362 Result for PLE machine: Machine : IBM xSeries with Intel(R) Xeon(R) X7560 2.27GHz CPU with 32/64 core, with 8 online cores and 4*64GB RAM BASE BASE+patch %improvement mean (sd) mean (sd) Scenario A: case 1x: 159.725 (47.4906) 159.07 (47.8133) 0.41008 case 2x: 190.957 (49.2976) 187.273 (50.5469) 1.92923 case 3x: 226.317 (88.6023) 223.698 (90.4362) 1.15723 --- 13 files changed, 454 insertions(+), 112 deletions(-) arch/x86/Kconfig | 9 ++ arch/x86/include/asm/kvm_para.h | 17 +++- arch/x86/kernel/kvm.c | 247 +++++++++++++++++++++++++++++++++++++++ arch/x86/kvm/x86.c | 28 +++++- arch/x86/xen/debugfs.c | 104 ---------------- arch/x86/xen/debugfs.h | 4 - arch/x86/xen/spinlock.c | 2 +- fs/debugfs/file.c | 128 ++++++++++++++++++++ include/linux/debugfs.h | 11 ++ include/linux/kvm.h | 1 + include/linux/kvm_host.h | 5 + include/linux/kvm_para.h | 1 + virt/kvm/kvm_main.c | 7 + 13 files changed, 452 insertions(+), 112 deletions(-)
Raghavendra K T
2011-Nov-30 08:59 UTC
[PATCH RFC V3 1/4] debugfs: Add support to print u32 array in debugfs
Add debugfs support to print u32-arrays in debugfs. Move the code from Xen to debugfs to make the code common for other users as well. Signed-off-by: Srivatsa Vaddagiri <vatsa at linux.vnet.ibm.com> Signed-off-by: Suzuki Poulose <suzuki at in.ibm.com> Signed-off-by: Raghavendra K T <raghavendra.kt at linux.vnet.ibm.com> --- diff --git a/arch/x86/xen/debugfs.c b/arch/x86/xen/debugfs.c index 7c0fedd..c8377fb 100644 --- a/arch/x86/xen/debugfs.c +++ b/arch/x86/xen/debugfs.c @@ -19,107 +19,3 @@ struct dentry * __init xen_init_debugfs(void) return d_xen_debug; } -struct array_data -{ - void *array; - unsigned elements; -}; - -static int u32_array_open(struct inode *inode, struct file *file) -{ - file->private_data = NULL; - return nonseekable_open(inode, file); -} - -static size_t format_array(char *buf, size_t bufsize, const char *fmt, - u32 *array, unsigned array_size) -{ - size_t ret = 0; - unsigned i; - - for(i = 0; i < array_size; i++) { - size_t len; - - len = snprintf(buf, bufsize, fmt, array[i]); - len++; /* ' ' or '\n' */ - ret += len; - - if (buf) { - buf += len; - bufsize -= len; - buf[-1] = (i == array_size-1) ? '\n' : ' '; - } - } - - ret++; /* \0 */ - if (buf) - *buf = '\0'; - - return ret; -} - -static char *format_array_alloc(const char *fmt, u32 *array, unsigned array_size) -{ - size_t len = format_array(NULL, 0, fmt, array, array_size); - char *ret; - - ret = kmalloc(len, GFP_KERNEL); - if (ret == NULL) - return NULL; - - format_array(ret, len, fmt, array, array_size); - return ret; -} - -static ssize_t u32_array_read(struct file *file, char __user *buf, size_t len, - loff_t *ppos) -{ - struct inode *inode = file->f_path.dentry->d_inode; - struct array_data *data = inode->i_private; - size_t size; - - if (*ppos == 0) { - if (file->private_data) { - kfree(file->private_data); - file->private_data = NULL; - } - - file->private_data = format_array_alloc("%u", data->array, data->elements); - } - - size = 0; - if (file->private_data) - size = strlen(file->private_data); - - return simple_read_from_buffer(buf, len, ppos, file->private_data, size); -} - -static int xen_array_release(struct inode *inode, struct file *file) -{ - kfree(file->private_data); - - return 0; -} - -static const struct file_operations u32_array_fops = { - .owner = THIS_MODULE, - .open = u32_array_open, - .release= xen_array_release, - .read = u32_array_read, - .llseek = no_llseek, -}; - -struct dentry *xen_debugfs_create_u32_array(const char *name, mode_t mode, - struct dentry *parent, - u32 *array, unsigned elements) -{ - struct array_data *data = kmalloc(sizeof(*data), GFP_KERNEL); - - if (data == NULL) - return NULL; - - data->array = array; - data->elements = elements; - - return debugfs_create_file(name, mode, parent, data, &u32_array_fops); -} diff --git a/arch/x86/xen/debugfs.h b/arch/x86/xen/debugfs.h index e281320..12ebf33 100644 --- a/arch/x86/xen/debugfs.h +++ b/arch/x86/xen/debugfs.h @@ -3,8 +3,4 @@ struct dentry * __init xen_init_debugfs(void); -struct dentry *xen_debugfs_create_u32_array(const char *name, mode_t mode, - struct dentry *parent, - u32 *array, unsigned elements); - #endif /* _XEN_DEBUGFS_H */ diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c index fc506e6..14a8961 100644 --- a/arch/x86/xen/spinlock.c +++ b/arch/x86/xen/spinlock.c @@ -286,7 +286,7 @@ static int __init xen_spinlock_debugfs(void) debugfs_create_u64("time_blocked", 0444, d_spin_debug, &spinlock_stats.time_blocked); - xen_debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug, + debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug, spinlock_stats.histo_spin_blocked, HISTO_BUCKETS + 1); return 0; diff --git a/fs/debugfs/file.c b/fs/debugfs/file.c index 90f7657..df44ccf 100644 --- a/fs/debugfs/file.c +++ b/fs/debugfs/file.c @@ -18,6 +18,7 @@ #include <linux/pagemap.h> #include <linux/namei.h> #include <linux/debugfs.h> +#include <linux/slab.h> static ssize_t default_read_file(struct file *file, char __user *buf, size_t count, loff_t *ppos) @@ -525,3 +526,130 @@ struct dentry *debugfs_create_blob(const char *name, mode_t mode, return debugfs_create_file(name, mode, parent, blob, &fops_blob); } EXPORT_SYMBOL_GPL(debugfs_create_blob); + +struct array_data { + void *array; + u32 elements; +}; + +static int u32_array_open(struct inode *inode, struct file *file) +{ + file->private_data = NULL; + return nonseekable_open(inode, file); +} + +static size_t format_array(char *buf, size_t bufsize, const char *fmt, + u32 *array, u32 array_size) +{ + size_t ret = 0; + u32 i; + + for (i = 0; i < array_size; i++) { + size_t len; + + len = snprintf(buf, bufsize, fmt, array[i]); + len++; /* ' ' or '\n' */ + ret += len; + + if (buf) { + buf += len; + bufsize -= len; + buf[-1] = (i == array_size-1) ? '\n' : ' '; + } + } + + ret++; /* \0 */ + if (buf) + *buf = '\0'; + + return ret; +} + +static char *format_array_alloc(const char *fmt, u32 *array, + u32 array_size) +{ + size_t len = format_array(NULL, 0, fmt, array, array_size); + char *ret; + + ret = kmalloc(len, GFP_KERNEL); + if (ret == NULL) + return NULL; + + format_array(ret, len, fmt, array, array_size); + return ret; +} + +static ssize_t u32_array_read(struct file *file, char __user *buf, size_t len, + loff_t *ppos) +{ + struct inode *inode = file->f_path.dentry->d_inode; + struct array_data *data = inode->i_private; + size_t size; + + if (*ppos == 0) { + if (file->private_data) { + kfree(file->private_data); + file->private_data = NULL; + } + + file->private_data = format_array_alloc("%u", data->array, + data->elements); + } + + size = 0; + if (file->private_data) + size = strlen(file->private_data); + + return simple_read_from_buffer(buf, len, ppos, + file->private_data, size); +} + +static int u32_array_release(struct inode *inode, struct file *file) +{ + kfree(file->private_data); + + return 0; +} + +static const struct file_operations u32_array_fops = { + .owner = THIS_MODULE, + .open = u32_array_open, + .release = u32_array_release, + .read = u32_array_read, + .llseek = no_llseek, +}; + +/** + * debugfs_create_u32_array - create a debugfs file that is used to read u32 + * array. + * @name: a pointer to a string containing the name of the file to create. + * @mode: the permission that the file should have. + * @parent: a pointer to the parent dentry for this file. This should be a + * directory dentry if set. If this parameter is %NULL, then the + * file will be created in the root of the debugfs filesystem. + * @array: u32 array that provides data. + * @elements: total number of elements in the array. + * + * This function creates a file in debugfs with the given name that exports + * @array as data. If the @mode variable is so set it can be read from. + * Writing is not supported. Seek within the file is also not supported. + * Once array is created its size can not be changed. + * + * The function returns a pointer to dentry on success. If debugfs is not + * enabled in the kernel, the value -%ENODEV will be returned. + */ +struct dentry *debugfs_create_u32_array(const char *name, mode_t mode, + struct dentry *parent, + u32 *array, u32 elements) +{ + struct array_data *data = kmalloc(sizeof(*data), GFP_KERNEL); + + if (data == NULL) + return NULL; + + data->array = array; + data->elements = elements; + + return debugfs_create_file(name, mode, parent, data, &u32_array_fops); +} +EXPORT_SYMBOL_GPL(debugfs_create_u32_array); diff --git a/include/linux/debugfs.h b/include/linux/debugfs.h index e7d9b20..253e2fb 100644 --- a/include/linux/debugfs.h +++ b/include/linux/debugfs.h @@ -74,6 +74,10 @@ struct dentry *debugfs_create_blob(const char *name, mode_t mode, struct dentry *parent, struct debugfs_blob_wrapper *blob); +struct dentry *debugfs_create_u32_array(const char *name, mode_t mode, + struct dentry *parent, + u32 *array, u32 elements); + bool debugfs_initialized(void); #else @@ -193,6 +197,13 @@ static inline bool debugfs_initialized(void) return false; } +struct dentry *debugfs_create_u32_array(const char *name, mode_t mode, + struct dentry *parent, + u32 *array, u32 elements) +{ + return ERR_PTR(-ENODEV); +} + #endif #endif
Raghavendra K T
2011-Nov-30 08:59 UTC
[PATCH RFC V3 2/4] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks
Add a hypercall to KVM hypervisor to support pv-ticketlocks KVM_HC_KICK_CPU allows the calling vcpu to kick another vcpu out of halt state. The presence of these hypercalls is indicated to guest via KVM_FEATURE_KICK_VCPU/KVM_CAP_KICK_VCPU. Qemu needs a corresponding patch to pass up the presence of this feature to guest via cpuid. Patch to qemu will be sent separately. There is no Xen/KVM hypercall interface to await kick from. Signed-off-by: Srivatsa Vaddagiri <vatsa at linux.vnet.ibm.com> Signed-off-by: Suzuki Poulose <suzuki at in.ibm.com> Signed-off-by: Raghavendra K T <raghavendra.kt at linux.vnet.ibm.com> --- diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h index 734c376..8b1d65d 100644 --- a/arch/x86/include/asm/kvm_para.h +++ b/arch/x86/include/asm/kvm_para.h @@ -16,12 +16,14 @@ #define KVM_FEATURE_CLOCKSOURCE 0 #define KVM_FEATURE_NOP_IO_DELAY 1 #define KVM_FEATURE_MMU_OP 2 + /* This indicates that the new set of kvmclock msrs * are available. The use of 0x11 and 0x12 is deprecated */ #define KVM_FEATURE_CLOCKSOURCE2 3 #define KVM_FEATURE_ASYNC_PF 4 #define KVM_FEATURE_STEAL_TIME 5 +#define KVM_FEATURE_KICK_VCPU 6 /* The last 8 bits are used to indicate how to interpret the flags field * in pvclock structure. If no bits are set, all flags are ignored. diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index c38efd7..6e1c8b4 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2103,6 +2103,7 @@ int kvm_dev_ioctl_check_extension(long ext) case KVM_CAP_XSAVE: case KVM_CAP_ASYNC_PF: case KVM_CAP_GET_TSC_KHZ: + case KVM_CAP_KICK_VCPU: r = 1; break; case KVM_CAP_COALESCED_MMIO: @@ -2577,7 +2578,8 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, (1 << KVM_FEATURE_NOP_IO_DELAY) | (1 << KVM_FEATURE_CLOCKSOURCE2) | (1 << KVM_FEATURE_ASYNC_PF) | - (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT); + (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) | + (1 << KVM_FEATURE_KICK_VCPU); if (sched_info_on()) entry->eax |= (1 << KVM_FEATURE_STEAL_TIME); @@ -5305,6 +5307,26 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu) return 1; } +/* + * kvm_pv_kick_cpu_op: Kick a vcpu. + * + * @cpu - vcpu to be kicked. + */ +static void kvm_pv_kick_cpu_op(struct kvm *kvm, int cpu) +{ + struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, cpu); + struct kvm_mp_state mp_state; + + mp_state.mp_state = KVM_MP_STATE_RUNNABLE; + if (vcpu) { + vcpu->kicked = 1; + /* Ensure kicked is always set before wakeup */ + barrier(); + } + kvm_arch_vcpu_ioctl_set_mpstate(vcpu, &mp_state); + kvm_vcpu_kick(vcpu); +} + int kvm_emulate_hypercall(struct kvm_vcpu *vcpu) { unsigned long nr, a0, a1, a2, a3, ret; @@ -5341,6 +5363,10 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu) case KVM_HC_MMU_OP: r = kvm_pv_mmu_op(vcpu, a0, hc_gpa(vcpu, a1, a2), &ret); break; + case KVM_HC_KICK_CPU: + kvm_pv_kick_cpu_op(vcpu->kvm, a0); + ret = 0; + break; default: ret = -KVM_ENOSYS; break; diff --git a/include/linux/kvm.h b/include/linux/kvm.h index f47fcd3..e760035 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -558,6 +558,7 @@ struct kvm_ppc_pvinfo { #define KVM_CAP_PPC_HIOR 67 #define KVM_CAP_PPC_PAPR 68 #define KVM_CAP_S390_GMAP 71 +#define KVM_CAP_KICK_VCPU 72 #ifdef KVM_CAP_IRQ_ROUTING diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index d526231..ff3b6ff 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -154,6 +154,11 @@ struct kvm_vcpu { #endif struct kvm_vcpu_arch arch; + + /* + * blocked vcpu wakes up by checking this flag set by unlocker. + */ + int kicked; }; static inline int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu) diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h index 47a070b..19f10bd 100644 --- a/include/linux/kvm_para.h +++ b/include/linux/kvm_para.h @@ -19,6 +19,7 @@ #define KVM_HC_MMU_OP 2 #define KVM_HC_FEATURES 3 #define KVM_HC_PPC_MAP_MAGIC_PAGE 4 +#define KVM_HC_KICK_CPU 5 /* * hypercalls use architecture specific diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index d9cfb78..8f4b6db 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -226,6 +226,7 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id) vcpu->kvm = kvm; vcpu->vcpu_id = id; vcpu->pid = NULL; + vcpu->kicked = 0; init_waitqueue_head(&vcpu->wq); kvm_async_pf_vcpu_init(vcpu);
Raghavendra K T
2011-Nov-30 09:00 UTC
[PATCH RFC V3 3/4] kvm guest : Added configuration support to enable debug information for KVM Guests
Added configuration support to enable debug information for KVM Guests in debugfs Signed-off-by: Srivatsa Vaddagiri <vatsa at linux.vnet.ibm.com> Signed-off-by: Suzuki Poulose <suzuki at in.ibm.com> Signed-off-by: Raghavendra K T <raghavendra.kt at linux.vnet.ibm.com> --- diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 5d8152d..526e3ae 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -561,6 +561,15 @@ config KVM_GUEST This option enables various optimizations for running under the KVM hypervisor. +config KVM_DEBUG_FS + bool "Enable debug information for KVM Guests in debugfs" + depends on KVM_GUEST && DEBUG_FS + default n + ---help--- + This option enables collection of various statistics for KVM guest. + Statistics are displayed in debugfs filesystem. Enabling this option + may incur significant overhead. + source "arch/x86/lguest/Kconfig" config PARAVIRT
Raghavendra K T
2011-Nov-30 09:00 UTC
[PATCH RFC V3 4/4] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor
This patch extends Linux guests running on KVM hypervisor to support pv-ticketlocks. During smp_boot_cpus paravirtualied KVM guest detects if the hypervisor has required feature (KVM_FEATURE_KICK_VCPU) to support pv-ticketlocks. If so, support for pv-ticketlocks is registered via pv_lock_ops. Signed-off-by: Srivatsa Vaddagiri <vatsa at linux.vnet.ibm.com> Signed-off-by: Suzuki Poulose <suzuki at in.ibm.com> Signed-off-by: Raghavendra K T <raghavendra.kt at linux.vnet.ibm.com> --- diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h index 8b1d65d..7e419ad 100644 --- a/arch/x86/include/asm/kvm_para.h +++ b/arch/x86/include/asm/kvm_para.h @@ -195,10 +195,21 @@ void kvm_async_pf_task_wait(u32 token); void kvm_async_pf_task_wake(u32 token); u32 kvm_read_and_reset_pf_reason(void); extern void kvm_disable_steal_time(void); -#else -#define kvm_guest_init() do { } while (0) + +#ifdef CONFIG_PARAVIRT_SPINLOCKS +void __init kvm_spinlock_init(void); +#else /* CONFIG_PARAVIRT_SPINLOCKS */ +static void kvm_spinlock_init(void) +{ +} +#endif /* CONFIG_PARAVIRT_SPINLOCKS */ + +#else /* CONFIG_KVM_GUEST */ +#define kvm_guest_init() do {} while (0) #define kvm_async_pf_task_wait(T) do {} while(0) #define kvm_async_pf_task_wake(T) do {} while(0) +#define kvm_spinlock_init() do {} while (0) + static inline u32 kvm_read_and_reset_pf_reason(void) { return 0; diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index a9c2116..dffeea3 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -33,6 +33,7 @@ #include <linux/sched.h> #include <linux/slab.h> #include <linux/kprobes.h> +#include <linux/debugfs.h> #include <asm/timer.h> #include <asm/cpu.h> #include <asm/traps.h> @@ -545,6 +546,7 @@ static void __init kvm_smp_prepare_boot_cpu(void) #endif kvm_guest_cpu_init(); native_smp_prepare_boot_cpu(); + kvm_spinlock_init(); } static void __cpuinit kvm_guest_cpu_online(void *dummy) @@ -627,3 +629,248 @@ static __init int activate_jump_labels(void) return 0; } arch_initcall(activate_jump_labels); + +#ifdef CONFIG_PARAVIRT_SPINLOCKS + +enum kvm_contention_stat { + TAKEN_SLOW, + TAKEN_SLOW_PICKUP, + RELEASED_SLOW, + RELEASED_SLOW_KICKED, + NR_CONTENTION_STATS +}; + +#ifdef CONFIG_KVM_DEBUG_FS + +static struct kvm_spinlock_stats +{ + u32 contention_stats[NR_CONTENTION_STATS]; + +#define HISTO_BUCKETS 30 + u32 histo_spin_blocked[HISTO_BUCKETS+1]; + + u64 time_blocked; +} spinlock_stats; + +static u8 zero_stats; + +static inline void check_zero(void) +{ + u8 ret; + u8 old = ACCESS_ONCE(zero_stats); + if (unlikely(old)) { + ret = cmpxchg(&zero_stats, old, 0); + /* This ensures only one fellow resets the stat */ + if (ret == old) + memset(&spinlock_stats, 0, sizeof(spinlock_stats)); + } +} + +static inline void add_stats(enum kvm_contention_stat var, int val) +{ + check_zero(); + spinlock_stats.contention_stats[var] += val; +} + + +static inline u64 spin_time_start(void) +{ + return sched_clock(); +} + +static void __spin_time_accum(u64 delta, u32 *array) +{ + unsigned index = ilog2(delta); + + check_zero(); + + if (index < HISTO_BUCKETS) + array[index]++; + else + array[HISTO_BUCKETS]++; +} + +static inline void spin_time_accum_blocked(u64 start) +{ + u32 delta = sched_clock() - start; + + __spin_time_accum(delta, spinlock_stats.histo_spin_blocked); + spinlock_stats.time_blocked += delta; +} + +static struct dentry *d_spin_debug; +static struct dentry *d_kvm_debug; + +struct dentry *kvm_init_debugfs(void) +{ + d_kvm_debug = debugfs_create_dir("kvm", NULL); + if (!d_kvm_debug) + printk(KERN_WARNING "Could not create 'kvm' debugfs directory\n"); + + return d_kvm_debug; +} + +static int __init kvm_spinlock_debugfs(void) +{ + struct dentry *d_kvm = kvm_init_debugfs(); + + if (d_kvm == NULL) + return -ENOMEM; + + d_spin_debug = debugfs_create_dir("spinlocks", d_kvm); + + debugfs_create_u8("zero_stats", 0644, d_spin_debug, &zero_stats); + + debugfs_create_u32("taken_slow", 0444, d_spin_debug, + &spinlock_stats.contention_stats[TAKEN_SLOW]); + debugfs_create_u32("taken_slow_pickup", 0444, d_spin_debug, + &spinlock_stats.contention_stats[TAKEN_SLOW_PICKUP]); + + debugfs_create_u32("released_slow", 0444, d_spin_debug, + &spinlock_stats.contention_stats[RELEASED_SLOW]); + debugfs_create_u32("released_slow_kicked", 0444, d_spin_debug, + &spinlock_stats.contention_stats[RELEASED_SLOW_KICKED]); + + debugfs_create_u64("time_blocked", 0444, d_spin_debug, + &spinlock_stats.time_blocked); + + debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug, + spinlock_stats.histo_spin_blocked, HISTO_BUCKETS + 1); + + return 0; +} +fs_initcall(kvm_spinlock_debugfs); +#else /* !CONFIG_KVM_DEBUG_FS */ +#define TIMEOUT (1 << 10) +static inline void add_stats(enum kvm_contention_stat var, u32 val) +{ +} + +static inline u64 spin_time_start(void) +{ + return 0; +} + +static inline void spin_time_accum_blocked(u64 start) +{ +} +#endif /* CONFIG_KVM_DEBUG_FS */ + +struct kvm_lock_waiting { + struct arch_spinlock *lock; + __ticket_t want; +}; + +/* cpus 'waiting' on a spinlock to become available */ +static cpumask_t waiting_cpus; + +/* Track spinlock on which a cpu is waiting */ +static DEFINE_PER_CPU(struct kvm_lock_waiting, lock_waiting); + +static void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want) +{ + struct kvm_lock_waiting *w = &__get_cpu_var(lock_waiting); + int cpu = smp_processor_id(); + u64 start; + unsigned long flags; + + start = spin_time_start(); + + /* + * Make sure an interrupt handler can't upset things in a + * partially setup state. + */ + local_irq_save(flags); + + /* + * The ordering protocol on this is that the "lock" pointer + * may only be set non-NULL if the "want" ticket is correct. + * If we're updating "want", we must first clear "lock". + */ + w->lock = NULL; + smp_wmb(); + w->want = want; + smp_wmb(); + w->lock = lock; + + add_stats(TAKEN_SLOW, 1); + + /* + * This uses set_bit, which is atomic but we should not rely on its + * reordering gurantees. So barrier is needed after this call. + */ + cpumask_set_cpu(cpu, &waiting_cpus); + + barrier(); + + /* + * Mark entry to slowpath before doing the pickup test to make + * sure we don't deadlock with an unlocker. + */ + __ticket_enter_slowpath(lock); + + /* + * check again make sure it didn't become free while + * we weren't looking. + */ + if (ACCESS_ONCE(lock->tickets.head) == want) { + add_stats(TAKEN_SLOW_PICKUP, 1); + goto out; + } + + /* Allow interrupts while blocked */ + local_irq_restore(flags); + + /* halt until it's our turn and kicked. */ + halt(); + + local_irq_save(flags); +out: + cpumask_clear_cpu(cpu, &waiting_cpus); + w->lock = NULL; + local_irq_restore(flags); + spin_time_accum_blocked(start); +} +PV_CALLEE_SAVE_REGS_THUNK(kvm_lock_spinning); + +/* Kick a cpu */ +static inline void kvm_kick_cpu(int cpu) +{ + kvm_hypercall1(KVM_HC_KICK_CPU, cpu); +} + +/* Kick vcpu waiting on @lock->head to reach value @ticket */ +static void kvm_unlock_kick(struct arch_spinlock *lock, __ticket_t ticket) +{ + int cpu; + + add_stats(RELEASED_SLOW, 1); + + for_each_cpu(cpu, &waiting_cpus) { + const struct kvm_lock_waiting *w = &per_cpu(lock_waiting, cpu); + if (ACCESS_ONCE(w->lock) == lock && + ACCESS_ONCE(w->want) == ticket) { + add_stats(RELEASED_SLOW_KICKED, 1); + kvm_kick_cpu(cpu); + break; + } + } +} + +/* + * Setup pv_lock_ops to exploit KVM_FEATURE_KICK_VCPU if present. + */ +void __init kvm_spinlock_init(void) +{ + if (!kvm_para_available()) + return; + /* Does host kernel support KVM_FEATURE_KICK_VCPU? */ + if (!kvm_para_has_feature(KVM_FEATURE_KICK_VCPU)) + return; + + jump_label_inc(¶virt_ticketlocks_enabled); + + pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(kvm_lock_spinning); + pv_lock_ops.unlock_kick = kvm_unlock_kick; +} +#endif /* CONFIG_PARAVIRT_SPINLOCKS */ diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 8f4b6db..a89546b 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1518,6 +1518,12 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu) kvm_make_request(KVM_REQ_UNHALT, vcpu); break; } + if (vcpu->kicked) { + vcpu->kicked = 0; + barrier(); + kvm_make_request(KVM_REQ_UNHALT, vcpu); + break; + } if (kvm_cpu_has_pending_timer(vcpu)) break; if (signal_pending(current))
Avi Kivity
2011-Dec-01 11:11 UTC
[PATCH RFC V3 2/4] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks
On 11/30/2011 10:59 AM, Raghavendra K T wrote:> Add a hypercall to KVM hypervisor to support pv-ticketlocks > > KVM_HC_KICK_CPU allows the calling vcpu to kick another vcpu out of halt state. > > The presence of these hypercalls is indicated to guest via > KVM_FEATURE_KICK_VCPU/KVM_CAP_KICK_VCPU. > > Qemu needs a corresponding patch to pass up the presence of this feature to > guest via cpuid. Patch to qemu will be sent separately. > > There is no Xen/KVM hypercall interface to await kick from.The hypercall needs to be documented in Documentation/virtual/kvm/hypercalls.txt. Have you tested it on AMD machines? There are some differences in the hypercall infrastructure there.> /* This indicates that the new set of kvmclock msrs > * are available. The use of 0x11 and 0x12 is deprecated > */ > #define KVM_FEATURE_CLOCKSOURCE2 3 > #define KVM_FEATURE_ASYNC_PF 4 > #define KVM_FEATURE_STEAL_TIME 5 > +#define KVM_FEATURE_KICK_VCPU 6Documentation/virtual/kvm/cpuid.txt.> > /* The last 8 bits are used to indicate how to interpret the flags field > * in pvclock structure. If no bits are set, all flags are ignored. > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index c38efd7..6e1c8b4 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -2103,6 +2103,7 @@ int kvm_dev_ioctl_check_extension(long ext) > case KVM_CAP_XSAVE: > case KVM_CAP_ASYNC_PF: > case KVM_CAP_GET_TSC_KHZ: > + case KVM_CAP_KICK_VCPU:This is redundant with the feature bit? In general, KVM_CAP is for the host API, while KVM_FEATURE is for the guest API.> > +/* > + * kvm_pv_kick_cpu_op: Kick a vcpu. > + * > + * @cpu - vcpu to be kicked. > + */ > +static void kvm_pv_kick_cpu_op(struct kvm *kvm, int cpu) > +{ > + struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, cpu);There is no guarantee that guest cpu numbers match host vcpu numbers. Use APIC IDs, and kvm_apic_match_dest().> + struct kvm_mp_state mp_state; > + > + mp_state.mp_state = KVM_MP_STATE_RUNNABLE; > + if (vcpu) { > + vcpu->kicked = 1; > + /* Ensure kicked is always set before wakeup */ > + barrier(); > + } > + kvm_arch_vcpu_ioctl_set_mpstate(vcpu, &mp_state);This must only be called from the vcpu thread.> + kvm_vcpu_kick(vcpu); > +} > + > > struct kvm_vcpu_arch arch; > + > + /* > + * blocked vcpu wakes up by checking this flag set by unlocker. > + */ > + int kicked; >Write only variable. An alternative approach is to use an MSR protocol like x2apic ICR. -- error compiling committee.c: too many arguments to function
Marcelo Tosatti
2011-Dec-07 10:48 UTC
[PATCH RFC V3 2/4] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks
On Wed, Nov 30, 2011 at 02:29:59PM +0530, Raghavendra K T wrote:> Add a hypercall to KVM hypervisor to support pv-ticketlocks > > KVM_HC_KICK_CPU allows the calling vcpu to kick another vcpu out of halt state. > > The presence of these hypercalls is indicated to guest via > KVM_FEATURE_KICK_VCPU/KVM_CAP_KICK_VCPU. > > Qemu needs a corresponding patch to pass up the presence of this feature to > guest via cpuid. Patch to qemu will be sent separately. > > There is no Xen/KVM hypercall interface to await kick from. > > Signed-off-by: Srivatsa Vaddagiri <vatsa at linux.vnet.ibm.com> > Signed-off-by: Suzuki Poulose <suzuki at in.ibm.com> > Signed-off-by: Raghavendra K T <raghavendra.kt at linux.vnet.ibm.com> > --- > diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h > index 734c376..8b1d65d 100644 > --- a/arch/x86/include/asm/kvm_para.h > +++ b/arch/x86/include/asm/kvm_para.h > @@ -16,12 +16,14 @@ > #define KVM_FEATURE_CLOCKSOURCE 0 > #define KVM_FEATURE_NOP_IO_DELAY 1 > #define KVM_FEATURE_MMU_OP 2 > + > /* This indicates that the new set of kvmclock msrs > * are available. The use of 0x11 and 0x12 is deprecated > */ > #define KVM_FEATURE_CLOCKSOURCE2 3 > #define KVM_FEATURE_ASYNC_PF 4 > #define KVM_FEATURE_STEAL_TIME 5 > +#define KVM_FEATURE_KICK_VCPU 6 > > /* The last 8 bits are used to indicate how to interpret the flags field > * in pvclock structure. If no bits are set, all flags are ignored. > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index c38efd7..6e1c8b4 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -2103,6 +2103,7 @@ int kvm_dev_ioctl_check_extension(long ext) > case KVM_CAP_XSAVE: > case KVM_CAP_ASYNC_PF: > case KVM_CAP_GET_TSC_KHZ: > + case KVM_CAP_KICK_VCPU: > r = 1; > break; > case KVM_CAP_COALESCED_MMIO: > @@ -2577,7 +2578,8 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function, > (1 << KVM_FEATURE_NOP_IO_DELAY) | > (1 << KVM_FEATURE_CLOCKSOURCE2) | > (1 << KVM_FEATURE_ASYNC_PF) | > - (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT); > + (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) | > + (1 << KVM_FEATURE_KICK_VCPU); > > if (sched_info_on()) > entry->eax |= (1 << KVM_FEATURE_STEAL_TIME); > @@ -5305,6 +5307,26 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu) > return 1; > } > > +/* > + * kvm_pv_kick_cpu_op: Kick a vcpu. > + * > + * @cpu - vcpu to be kicked. > + */ > +static void kvm_pv_kick_cpu_op(struct kvm *kvm, int cpu) > +{ > + struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, cpu); > + struct kvm_mp_state mp_state; > + > + mp_state.mp_state = KVM_MP_STATE_RUNNABLE;Since vcpu->mp_state is not protected by a lock, this is potentially racy. For example: CPU0 CPU1 kvm_pv_kick_cpu_op running vcpuN vcpuN->mp_state = KVM_MP_STATE_RUNNABLE; kvm_emulate_halt vcpuN->mp_state = KVM_MP_STATE_HALTED Is it harmless to lose a kick?
Reasonably Related Threads
- [PATCH RFC V3 0/4] kvm : Paravirt-spinlock support for KVM guests
- [PATCH RFC V5 0/6] kvm : Paravirt-spinlock support for KVM guests
- [PATCH RFC V5 0/6] kvm : Paravirt-spinlock support for KVM guests
- [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests
- [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests