Raghavendra K T
2012-Mar-23 08:05 UTC
[PATCH RFC V5 0/6] kvm : Paravirt-spinlock support for KVM guests
The 6-patch series to follow this email extends the KVM hypervisor and Linux guests running on KVM to support pv-ticket spinlocks, based on Xen's implementation.

One hypercall is introduced in the KVM hypervisor that allows a vcpu to kick another vcpu out of halt state. The blocking of a vcpu is done using halt() in the (lock_spinning) slowpath. One MSR is added to aid live migration.

Changes in V5:
- rebased to 3.3-rc6
- added PV_UNHALT_MSR that would help in live migration (Avi)
- removed PV_LOCK_KICK vcpu request and (re)added pv_unhalt flag.
- changed hypercall documentation (Alex).
- mode_t changed to umode_t in debugfs.
- MSR related documentation added.
- renamed PV_LOCK_KICK to PV_UNHALT.
- host and guest patches not mixed. (Marcelo, Alex)
- kvm_kick_cpu now takes cpu so it can be used by flush_tlb_ipi_other
  paravirtualization (Nikunj)
- coding style changes in variable declaration etc (Srikar)

Changes in V4:
- rebased to 3.2.0 pre.
- use APIC ID for kicking the vcpu and use kvm_apic_match_dest for matching (Avi)
- fold vcpu->kicked flag into vcpu->requests (KVM_REQ_PVLOCK_KICK) and related
  changes for UNHALT path to make pv ticket spinlock migration friendly (Avi, Marcelo)
- added documentation for CPUID, hypercall (KVM_HC_KICK_CPU)
  and capability (KVM_CAP_PVLOCK_KICK) (Avi)
- removed unneeded kvm_arch_vcpu_ioctl_set_mpstate call. (Marcelo)
- cumulative variable type changed (int ==> u32) in add_stat (Konrad)
- removed unneeded kvm_guest_init for !CONFIG_KVM_GUEST case

Changes in V3:
- rebased to 3.2-rc1
- use halt() instead of wait for kick hypercall.
- modify kick hypercall to wake up a halted vcpu.
- hook kvm_spinlock_init to smp_prepare_cpus call (moved the call out of head##.c).
- fix the potential race when zero_stat is read.
- export debugfs_create_32 and add documentation to API.
- use static inline and enum instead of ADDSTAT macro.
- add barrier() after setting kick_vcpu.
- empty static inline function for kvm_spinlock_init.
- combine patches one and two to reduce overhead.
- make KVM_DEBUGFS depend on DEBUGFS.
- include debugfs header unconditionally.

Changes in V2:
- rebased patches to -rc9
- synchronization related changes based on Jeremy's changes
  (Jeremy Fitzhardinge <jeremy.fitzhardinge at citrix.com>) pointed out by
  Stephan Diestelhorst <stephan.diestelhorst at amd.com>
- enabling 32 bit guests
- split patches into two more chunks

Test setup:
The BASE patch is 3.3.0-rc6 + jumplabel split patch (https://lkml.org/lkml/2012/2/21/167)
+ ticketlock cleanup patch (https://lkml.org/lkml/2012/3/21/161)

Results:
The performance gain is mainly because of reduced busy-wait time. From the results we can see that patched kernel performance is similar to BASE when there is no lock contention. But once we start seeing more contention, the patched kernel outperforms BASE.

3 guests with 8 VCPUs and 8GB RAM each; one is used for kernbench (kernbench -f -H -M -o 20), the others for cpuhog (shell script running while true with an instruction)

1x: no hogs
2x: 8 hogs in one guest
3x: 8 hogs each in two guests

1) kernbench
Machine : IBM xSeries with Intel(R) Xeon(R) x5570 2.93GHz CPU with 8 cores, 64GB RAM

          BASE                 BASE+patch           %improvement
          mean (sd)            mean (sd)
case 1x:  38.1033 (43.502)     38.09 (43.4269)      0.0349051
case 2x:  778.622 (1092.68)    129.342 (156.324)    83.3883
case 3x:  2399.11 (3548.32)    114.913 (139.5)      95.2102

2) pgbench:
pgbench version: http://www.postgresql.org/ftp/snapshot/dev/
tool used for benchmarking: git://git.postgresql.org/git/pgbench-tools.git
Analysis is done using ministat.
Test is done for 1x overcommit to check the overhead of pv spinlock.
There is a small performance penalty in the non-contention scenario (note BASE is Jeremy's ticketlock). But with an increase in the number of threads, improvement is seen.
guest: 64bit 8 vCPU and 8GB RAM
shared buffer size = 2GB

x base_kernel
+ patched_kernel
    N           Min           Max        Median           Avg        Stddev
+--------------------- NRCLIENT = 1 ----------------------------------------+
x  10     7468.0719     7774.0026     7529.9217     7594.9696      128.7725
+  10      7280.413     7650.6619     7425.7968     7434.9344     144.59127
Difference at 95.0% confidence
	-160.035 +/- 128.641
	-2.10712% +/- 1.69376%
+--------------------- NRCLIENT = 2 ----------------------------------------+
x  10     14604.344     14849.358     14725.845     14724.722     76.866294
+  10     14070.064     14246.013     14125.556     14138.169     60.556379
Difference at 95.0% confidence
	-586.553 +/- 65.014
	-3.98346% +/- 0.441529%
+--------------------- NRCLIENT = 4 ----------------------------------------+
x  10     27891.073     28305.466     28059.892     28060.231     115.65612
+  10     27237.685     27639.645      27297.79     27375.966     145.31006
Difference at 95.0% confidence
	-684.265 +/- 123.39
	-2.43856% +/- 0.439734%
+--------------------- NRCLIENT = 8 ----------------------------------------+
x  10     53063.509     53498.677      53343.24     53309.697     138.77983
+  10     51705.708     52208.274      52030.06     51987.067     156.65323
Difference at 95.0% confidence
	-1322.63 +/- 139.048
	-2.48103% +/- 0.26083%
+--------------------- NRCLIENT = 16 ---------------------------------------+
x  10     50043.347     52701.253     52235.978     51993.466     817.44911
+  10     51562.772     52272.412     51905.317     51946.557     228.54314
No difference proven at 95.0% confidence
+--------------------- NRCLIENT = 32 --------------------------------------+
x  10     49178.789     51284.599     50288.185     50275.212     616.80154
+  10     50722.097     52145.041     51551.112     51512.423     469.18898
Difference at 95.0% confidence
	1237.21 +/- 514.888
	2.46088% +/- 1.02414%
+--------------------------------------------------------------------------+

Let me know if you have any suggestions/comments...
---
V4 kernel changes:
 https://lkml.org/lkml/2012/1/14/66
Qemu changes for V4:
 http://www.mail-archive.com/kvm@vger.kernel.org/msg66450.html

V3 kernel changes:
 https://lkml.org/lkml/2011/11/30/62
V2 kernel changes:
 https://lkml.org/lkml/2011/10/23/207

Previous discussions (posted by Srivatsa V):
 https://lkml.org/lkml/2010/7/26/24
 https://lkml.org/lkml/2011/1/19/212

Qemu patch for V3:
 http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg00397.html

Srivatsa Vaddagiri, Suzuki Poulose, Raghavendra K T (6):
  Add debugfs support to print u32-arrays in debugfs
  Add a hypercall to KVM hypervisor to support pv-ticketlocks
  Add unhalt msr to aid migration
  Added configuration support to enable debug information for KVM Guests
  pv-ticketlock support for linux guests running on KVM hypervisor
  Add documentation on Hypercalls and features used for PV spinlock

 Documentation/virtual/kvm/api.txt        |    7 +
 Documentation/virtual/kvm/cpuid.txt      |    4 +
 Documentation/virtual/kvm/hypercalls.txt |   59 +++++++
 Documentation/virtual/kvm/msr.txt        |    9 +
 arch/x86/Kconfig                         |    9 +
 arch/x86/include/asm/kvm_para.h          |   18 ++-
 arch/x86/kernel/kvm.c                    |  254 ++++++++++++++++++++++++++++++
 arch/x86/kvm/cpuid.c                     |    3 +-
 arch/x86/kvm/x86.c                       |   40 +++++-
 arch/x86/xen/debugfs.c                   |  104 ------------
 arch/x86/xen/debugfs.h                   |    4 -
 arch/x86/xen/spinlock.c                  |    2 +-
 fs/debugfs/file.c                        |  128 +++++++++++++++
 include/linux/debugfs.h                  |   11 ++
 include/linux/kvm.h                      |    1 +
 include/linux/kvm_host.h                 |    1 +
 include/linux/kvm_para.h                 |    1 +
 virt/kvm/kvm_main.c                      |    4 +
 18 files changed, 545 insertions(+), 114 deletions(-)
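The %improvement column in the kernbench table of the cover letter appears to be the relative reduction of the mean, (BASE - patched) / BASE * 100. A minimal sketch in C that reproduces the reported figures under that assumption; the helper names are illustrative and not part of the series:

```c
#include <stdio.h>

/*
 * Relative improvement of the patched mean over the BASE mean, in percent.
 * Assumption: this is how the %improvement column was derived.
 */
static double improvement_pct(double base, double patched)
{
	return (base - patched) / base * 100.0;
}

/* Print the three kernbench cases using the means from the cover letter. */
static void print_kernbench_improvements(void)
{
	printf("1x: %.4f%%\n", improvement_pct(38.1033, 38.09));
	printf("2x: %.4f%%\n", improvement_pct(778.622, 129.342));
	printf("3x: %.4f%%\n", improvement_pct(2399.11, 114.913));
}
```

Since kernbench reports elapsed time, a lower mean is better, which is why the patched value is subtracted from BASE rather than the other way round.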
Raghavendra K T
2012-Mar-23 08:06 UTC
[PATCH RFC V5 1/6] debugfs: Add support to print u32 array in debugfs
From: Srivatsa Vaddagiri <vatsa at linux.vnet.ibm.com>

Move the code from Xen to debugfs to make the code common for other users as well.

Signed-off-by: Srivatsa Vaddagiri <vatsa at linux.vnet.ibm.com>
Signed-off-by: Suzuki Poulose <suzuki at in.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt at linux.vnet.ibm.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
---
diff --git a/arch/x86/xen/debugfs.c b/arch/x86/xen/debugfs.c
index ef1db19..c8377fb 100644
--- a/arch/x86/xen/debugfs.c
+++ b/arch/x86/xen/debugfs.c
@@ -19,107 +19,3 @@ struct dentry * __init xen_init_debugfs(void)
 	return d_xen_debug;
 }
 
-struct array_data
-{
-	void *array;
-	unsigned elements;
-};
-
-static int u32_array_open(struct inode *inode, struct file *file)
-{
-	file->private_data = NULL;
-	return nonseekable_open(inode, file);
-}
-
-static size_t format_array(char *buf, size_t bufsize, const char *fmt,
-			   u32 *array, unsigned array_size)
-{
-	size_t ret = 0;
-	unsigned i;
-
-	for(i = 0; i < array_size; i++) {
-		size_t len;
-
-		len = snprintf(buf, bufsize, fmt, array[i]);
-		len++;	/* ' ' or '\n' */
-		ret += len;
-
-		if (buf) {
-			buf += len;
-			bufsize -= len;
-			buf[-1] = (i == array_size-1) ? '\n' : ' ';
-		}
-	}
-
-	ret++;		/* \0 */
-	if (buf)
-		*buf = '\0';
-
-	return ret;
-}
-
-static char *format_array_alloc(const char *fmt, u32 *array, unsigned array_size)
-{
-	size_t len = format_array(NULL, 0, fmt, array, array_size);
-	char *ret;
-
-	ret = kmalloc(len, GFP_KERNEL);
-	if (ret == NULL)
-		return NULL;
-
-	format_array(ret, len, fmt, array, array_size);
-	return ret;
-}
-
-static ssize_t u32_array_read(struct file *file, char __user *buf, size_t len,
-			      loff_t *ppos)
-{
-	struct inode *inode = file->f_path.dentry->d_inode;
-	struct array_data *data = inode->i_private;
-	size_t size;
-
-	if (*ppos == 0) {
-		if (file->private_data) {
-			kfree(file->private_data);
-			file->private_data = NULL;
-		}
-
-		file->private_data = format_array_alloc("%u", data->array, data->elements);
-	}
-
-	size = 0;
-	if (file->private_data)
-		size = strlen(file->private_data);
-
-	return simple_read_from_buffer(buf, len, ppos, file->private_data, size);
-}
-
-static int xen_array_release(struct inode *inode, struct file *file)
-{
-	kfree(file->private_data);
-
-	return 0;
-}
-
-static const struct file_operations u32_array_fops = {
-	.owner	= THIS_MODULE,
-	.open	= u32_array_open,
-	.release= xen_array_release,
-	.read	= u32_array_read,
-	.llseek = no_llseek,
-};
-
-struct dentry *xen_debugfs_create_u32_array(const char *name, umode_t mode,
-					    struct dentry *parent,
-					    u32 *array, unsigned elements)
-{
-	struct array_data *data = kmalloc(sizeof(*data), GFP_KERNEL);
-
-	if (data == NULL)
-		return NULL;
-
-	data->array = array;
-	data->elements = elements;
-
-	return debugfs_create_file(name, mode, parent, data, &u32_array_fops);
-}
diff --git a/arch/x86/xen/debugfs.h b/arch/x86/xen/debugfs.h
index 78d2549..12ebf33 100644
--- a/arch/x86/xen/debugfs.h
+++ b/arch/x86/xen/debugfs.h
@@ -3,8 +3,4 @@
 
 struct dentry * __init xen_init_debugfs(void);
 
-struct dentry *xen_debugfs_create_u32_array(const char *name, umode_t mode,
-					    struct dentry *parent,
-					    u32 *array, unsigned elements);
-
 #endif /* _XEN_DEBUGFS_H */
diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 4926974..b74cebb 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -314,7 +314,7 @@ static int __init xen_spinlock_debugfs(void)
 	debugfs_create_u64("time_blocked", 0444, d_spin_debug,
 			   &spinlock_stats.time_blocked);
 
-	xen_debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug,
+	debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug,
 				     spinlock_stats.histo_spin_blocked, HISTO_BUCKETS + 1);
 
 	return 0;
diff --git a/fs/debugfs/file.c b/fs/debugfs/file.c
index ef023ee..cb6cff3 100644
--- a/fs/debugfs/file.c
+++ b/fs/debugfs/file.c
@@ -20,6 +20,7 @@
 #include <linux/namei.h>
 #include <linux/debugfs.h>
 #include <linux/io.h>
+#include <linux/slab.h>
 
 static ssize_t default_read_file(struct file *file, char __user *buf,
 				 size_t count, loff_t *ppos)
@@ -528,6 +529,133 @@ struct dentry *debugfs_create_blob(const char *name, umode_t mode,
 }
 EXPORT_SYMBOL_GPL(debugfs_create_blob);
 
+struct array_data {
+	void *array;
+	u32 elements;
+};
+
+static int u32_array_open(struct inode *inode, struct file *file)
+{
+	file->private_data = NULL;
+	return nonseekable_open(inode, file);
+}
+
+static size_t format_array(char *buf, size_t bufsize, const char *fmt,
+			   u32 *array, u32 array_size)
+{
+	size_t ret = 0;
+	u32 i;
+
+	for (i = 0; i < array_size; i++) {
+		size_t len;
+
+		len = snprintf(buf, bufsize, fmt, array[i]);
+		len++;	/* ' ' or '\n' */
+		ret += len;
+
+		if (buf) {
+			buf += len;
+			bufsize -= len;
+			buf[-1] = (i == array_size-1) ? '\n' : ' ';
+		}
+	}
+
+	ret++;		/* \0 */
+	if (buf)
+		*buf = '\0';
+
+	return ret;
+}
+
+static char *format_array_alloc(const char *fmt, u32 *array,
+				u32 array_size)
+{
+	size_t len = format_array(NULL, 0, fmt, array, array_size);
+	char *ret;
+
+	ret = kmalloc(len, GFP_KERNEL);
+	if (ret == NULL)
+		return NULL;
+
+	format_array(ret, len, fmt, array, array_size);
+	return ret;
+}
+
+static ssize_t u32_array_read(struct file *file, char __user *buf, size_t len,
+			      loff_t *ppos)
+{
+	struct inode *inode = file->f_path.dentry->d_inode;
+	struct array_data *data = inode->i_private;
+	size_t size;
+
+	if (*ppos == 0) {
+		if (file->private_data) {
+			kfree(file->private_data);
+			file->private_data = NULL;
+		}
+
+		file->private_data = format_array_alloc("%u", data->array,
+							data->elements);
+	}
+
+	size = 0;
+	if (file->private_data)
+		size = strlen(file->private_data);
+
+	return simple_read_from_buffer(buf, len, ppos,
+				       file->private_data, size);
+}
+
+static int u32_array_release(struct inode *inode, struct file *file)
+{
+	kfree(file->private_data);
+
+	return 0;
+}
+
+static const struct file_operations u32_array_fops = {
+	.owner	 = THIS_MODULE,
+	.open	 = u32_array_open,
+	.release = u32_array_release,
+	.read	 = u32_array_read,
+	.llseek  = no_llseek,
+};
+
+/**
+ * debugfs_create_u32_array - create a debugfs file that is used to read u32
+ * array.
+ * @name: a pointer to a string containing the name of the file to create.
+ * @mode: the permission that the file should have.
+ * @parent: a pointer to the parent dentry for this file.  This should be a
+ *          directory dentry if set.  If this parameter is %NULL, then the
+ *          file will be created in the root of the debugfs filesystem.
+ * @array: u32 array that provides data.
+ * @elements: total number of elements in the array.
+ *
+ * This function creates a file in debugfs with the given name that exports
+ * @array as data. If the @mode variable is so set it can be read from.
+ * Writing is not supported. Seek within the file is also not supported.
+ * Once array is created its size can not be changed.
+ *
+ * The function returns a pointer to dentry on success. If debugfs is not
+ * enabled in the kernel, the value -%ENODEV will be returned.
+ */
+struct dentry *debugfs_create_u32_array(const char *name, umode_t mode,
+					struct dentry *parent,
+					u32 *array, u32 elements)
+{
+	struct array_data *data = kmalloc(sizeof(*data), GFP_KERNEL);
+
+	if (data == NULL)
+		return NULL;
+
+	data->array = array;
+	data->elements = elements;
+
+	return debugfs_create_file(name, mode, parent, data, &u32_array_fops);
+}
+EXPORT_SYMBOL_GPL(debugfs_create_u32_array);
+
 #ifdef CONFIG_HAS_IOMEM
 
 /*
diff --git a/include/linux/debugfs.h b/include/linux/debugfs.h
index 6169c26..5cb4435 100644
--- a/include/linux/debugfs.h
+++ b/include/linux/debugfs.h
@@ -93,6 +93,10 @@ struct dentry *debugfs_create_regset32(const char *name, mode_t mode,
 int debugfs_print_regs32(struct seq_file *s, const struct debugfs_reg32 *regs,
 			 int nregs, void __iomem *base, char *prefix);
 
+struct dentry *debugfs_create_u32_array(const char *name, umode_t mode,
+					struct dentry *parent,
+					u32 *array, u32 elements);
+
 bool debugfs_initialized(void);
 
 #else
@@ -219,6 +223,13 @@ static inline bool debugfs_initialized(void)
 	return false;
 }
 
+static inline struct dentry *debugfs_create_u32_array(const char *name, umode_t mode,
+					struct dentry *parent,
+					u32 *array, u32 elements)
+{
+	return ERR_PTR(-ENODEV);
+}
+
 #endif
 
 #endif
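format_array() in the patch above is called twice: once with a NULL buffer to compute the required size, and once with the allocated buffer to write the text. A userspace sketch of that two-pass pattern, assuming C99 snprintf() semantics; malloc() stands in for kmalloc(), and the function names are illustrative, not the kernel's:

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * With buf == NULL only the required size is computed; with a real buffer
 * the values are written, separated by ' ' and terminated by '\n'.
 * Returns the number of bytes needed, including the trailing '\0'.
 */
static size_t format_u32_array(char *buf, size_t bufsize,
			       const uint32_t *array, size_t n)
{
	size_t ret = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		/* snprintf(NULL, 0, ...) returns the length it would write */
		size_t len = (size_t)snprintf(buf, bufsize, "%u",
					      (unsigned int)array[i]);

		len++;		/* room for ' ' or '\n' */
		ret += len;

		if (buf) {
			buf += len;
			bufsize -= len;
			buf[-1] = (i == n - 1) ? '\n' : ' ';
		}
	}

	ret++;			/* trailing '\0' */
	if (buf)
		*buf = '\0';

	return ret;
}

/* Size the buffer with a dry run, then format into it. Caller frees. */
static char *format_u32_array_alloc(const uint32_t *array, size_t n)
{
	size_t len = format_u32_array(NULL, 0, array, n);
	char *ret = malloc(len);

	if (ret == NULL)
		return NULL;

	format_u32_array(ret, len, array, n);
	return ret;
}
```

The dry run keeps the size computation and the formatting in one loop, so the two cannot drift apart when the format changes.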
Raghavendra K T
2012-Mar-23 08:07 UTC
[PATCH RFC V5 2/6] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks
From: Srivatsa Vaddagiri <vatsa at linux.vnet.ibm.com>

KVM_HC_KICK_CPU allows the calling vcpu to kick another vcpu out of halt state.

The presence of this hypercall is indicated to the guest via KVM_FEATURE_PV_UNHALT/KVM_CAP_PV_UNHALT.

Signed-off-by: Srivatsa Vaddagiri <vatsa at linux.vnet.ibm.com>
Signed-off-by: Suzuki Poulose <suzuki at in.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt at linux.vnet.ibm.com>
---
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 734c376..9234f13 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -16,12 +16,14 @@
 #define KVM_FEATURE_CLOCKSOURCE		0
 #define KVM_FEATURE_NOP_IO_DELAY	1
 #define KVM_FEATURE_MMU_OP		2
+
 /* This indicates that the new set of kvmclock msrs
  * are available. The use of 0x11 and 0x12 is deprecated
  */
 #define KVM_FEATURE_CLOCKSOURCE2	3
 #define KVM_FEATURE_ASYNC_PF		4
 #define KVM_FEATURE_STEAL_TIME		5
+#define KVM_FEATURE_PV_UNHALT		6
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
@@ -32,6 +34,7 @@
 #define MSR_KVM_SYSTEM_TIME  0x12
 
 #define KVM_MSR_ENABLED 1
+
 /* Custom MSRs falls in the range 0x4b564d00-0x4b564dff */
 #define MSR_KVM_WALL_CLOCK_NEW  0x4b564d00
 #define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 89b02bf..61388b9 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -408,7 +408,8 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 			     (1 << KVM_FEATURE_NOP_IO_DELAY) |
 			     (1 << KVM_FEATURE_CLOCKSOURCE2) |
 			     (1 << KVM_FEATURE_ASYNC_PF) |
-			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
+			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
+			     (1 << KVM_FEATURE_PV_UNHALT);
 
 		if (sched_info_on())
 			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9cbfc06..bd5ef91 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2079,6 +2079,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_XSAVE:
 	case KVM_CAP_ASYNC_PF:
 	case KVM_CAP_GET_TSC_KHZ:
+	case KVM_CAP_PV_UNHALT:
 		r = 1;
 		break;
 	case KVM_CAP_COALESCED_MMIO:
@@ -4913,6 +4914,30 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+/*
+ * kvm_pv_kick_cpu_op:  Kick a vcpu.
+ *
+ * @apicid - apicid of vcpu to be kicked.
+ */
+static void kvm_pv_kick_cpu_op(struct kvm *kvm, int apicid)
+{
+	struct kvm_vcpu *vcpu = NULL;
+	int i;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		if (!kvm_apic_present(vcpu))
+			continue;
+
+		if (kvm_apic_match_dest(vcpu, 0, 0, apicid, 0))
+			break;
+	}
+	if (vcpu) {
+		vcpu->pv_unhalted = 1;
+		smp_mb();
+		kvm_vcpu_kick(vcpu);
+	}
+}
+
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 {
 	unsigned long nr, a0, a1, a2, a3, ret;
@@ -4946,6 +4971,10 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 	case KVM_HC_VAPIC_POLL_IRQ:
 		ret = 0;
 		break;
+	case KVM_HC_KICK_CPU:
+		kvm_pv_kick_cpu_op(vcpu->kvm, a0);
+		ret = 0;
+		break;
 	default:
 		ret = -KVM_ENOSYS;
 		break;
@@ -6174,6 +6203,7 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
 		!vcpu->arch.apf.halted)
 		|| !list_empty_careful(&vcpu->async_pf.done)
 		|| vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED
+		|| vcpu->pv_unhalted
 		|| atomic_read(&vcpu->arch.nmi_queued) ||
 		(kvm_arch_interrupt_allowed(vcpu) &&
 		 kvm_cpu_has_interrupt(vcpu));
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 68e67e5..e822d96 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -558,6 +558,7 @@ struct kvm_ppc_pvinfo {
 #define KVM_CAP_PPC_PAPR 68
 #define KVM_CAP_S390_GMAP 71
 #define KVM_CAP_TSC_DEADLINE_TIMER 72
+#define KVM_CAP_PV_UNHALT 73
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 900c763..433ae97 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -158,6 +158,7 @@ struct kvm_vcpu {
 #endif
 
 	struct kvm_vcpu_arch arch;
+	int pv_unhalted;
 };
 
 static inline int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu)
diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h
index ff476dd..38226e1 100644
--- a/include/linux/kvm_para.h
+++ b/include/linux/kvm_para.h
@@ -19,6 +19,7 @@
 #define KVM_HC_MMU_OP			2
 #define KVM_HC_FEATURES			3
 #define KVM_HC_PPC_MAP_MAGIC_PAGE	4
+#define KVM_HC_KICK_CPU			5
 
 /*
  * hypercalls use architecture specific
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a91f980..d3b98b1 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -226,6 +226,7 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
 	vcpu->kvm = kvm;
 	vcpu->vcpu_id = id;
 	vcpu->pid = NULL;
+	vcpu->pv_unhalted = 0;
 	init_waitqueue_head(&vcpu->wq);
 	kvm_async_pf_vcpu_init(vcpu);
 
@@ -1567,6 +1568,9 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 		prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
 
 		if (kvm_arch_vcpu_runnable(vcpu)) {
+			vcpu->pv_unhalted = 0;
+			/* preventing reordering should be enough here */
+			barrier();
 			kvm_make_request(KVM_REQ_UNHALT, vcpu);
 			break;
 		}
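The pv_unhalted flag in the patch above acts as a latch: a kick that arrives before the target vcpu has blocked is remembered, so the following halt does not sleep. A simplified userspace model of that idea, using C11 atomics in place of the kernel's plain int plus smp_mb()/barrier(); the struct and function names are hypothetical and the model leaves out the wait queue and every other runnable condition:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-vcpu state added by the patch. */
struct vcpu_model {
	atomic_int pv_unhalted;
};

/*
 * Kick side: latch the wakeup so it is not lost if the target has not
 * halted yet. The patch additionally calls kvm_vcpu_kick() here.
 */
static void model_kick(struct vcpu_model *v)
{
	atomic_store(&v->pv_unhalted, 1);
}

/*
 * Halt side: returns true if the vcpu would go to sleep, false if a
 * pending kick makes it runnable. A pending kick is consumed.
 */
static bool model_halt_would_block(struct vcpu_model *v)
{
	return atomic_exchange(&v->pv_unhalted, 0) == 0;
}
```

In this model a kick followed by a halt returns immediately, and a second halt without a new kick blocks, which is the behaviour the flag is meant to provide between kvm_pv_kick_cpu_op() and kvm_vcpu_block().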
Raghavendra K T
2012-Mar-23 08:07 UTC
[PATCH RFC V5 3/6] kvm : Add unhalt msr to aid (live) migration
From: Raghavendra K T <raghavendra.kt at linux.vnet.ibm.com>

Currently the guest does not need to know the pv_unhalt state; the MSR is intended to be used via the GET/SET_MSR ioctls during migration.

Signed-off-by: Raghavendra K T <raghavendra.kt at linux.vnet.ibm.com>
---
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 9234f13..46f9751 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -40,6 +40,7 @@
 #define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01
 #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
 #define MSR_KVM_STEAL_TIME  0x4b564d03
+#define MSR_KVM_PV_UNHALT   0x4b564d04
 
 struct kvm_steal_time {
 	__u64 steal;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bd5ef91..38e6c47 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -784,12 +784,13 @@ EXPORT_SYMBOL_GPL(kvm_rdpmc);
  * kvm-specific. Those are put in the beginning of the list.
  */
 
-#define KVM_SAVE_MSRS_BEGIN	9
+#define KVM_SAVE_MSRS_BEGIN	10
 static u32 msrs_to_save[] = {
 	MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
 	MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
 	HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
 	HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
+	MSR_KVM_PV_UNHALT,
 	MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
 	MSR_STAR,
 #ifdef CONFIG_X86_64
@@ -1606,7 +1607,9 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 		kvm_make_request(KVM_REQ_STEAL_UPDATE, vcpu);
 
 		break;
-
+	case MSR_KVM_PV_UNHALT:
+		vcpu->pv_unhalted = (u32) data;
+		break;
 	case MSR_IA32_MCG_CTL:
 	case MSR_IA32_MCG_STATUS:
 	case MSR_IA32_MC0_CTL ... MSR_IA32_MC0_CTL + 4 * KVM_MAX_MCE_BANKS - 1:
@@ -1917,6 +1920,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
 	case MSR_KVM_STEAL_TIME:
 		data = vcpu->arch.st.msr_val;
 		break;
+	case MSR_KVM_PV_UNHALT:
+		data = (u64)vcpu->pv_unhalted;
+		break;
 	case MSR_IA32_P5_MC_ADDR:
 	case MSR_IA32_P5_MC_TYPE:
 	case MSR_IA32_MCG_CAP:
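The intent of the new MSR is that userspace reads the pending-unhalt state on the source host and writes it back on the destination, so a kick delivered just before migration is not lost. A small model of that round trip, mirroring the two switch cases added above; the struct and the model_* helpers are hypothetical and do not use the real KVM_GET_MSRS/KVM_SET_MSRS ioctls:

```c
#include <stdint.h>

#define MSR_KVM_PV_UNHALT 0x4b564d04u	/* value from the patch above */

/* Hypothetical stand-in for the vcpu state that has to be migrated. */
struct vcpu_state {
	int pv_unhalted;
};

/* Mirrors the kvm_get_msr_common() case: 0 on success, 1 if unhandled. */
static int model_get_msr(const struct vcpu_state *v, uint32_t msr,
			 uint64_t *data)
{
	switch (msr) {
	case MSR_KVM_PV_UNHALT:
		*data = (uint64_t)v->pv_unhalted;
		return 0;
	default:
		return 1;
	}
}

/* Mirrors the kvm_set_msr_common() case, including the 32-bit truncation. */
static int model_set_msr(struct vcpu_state *v, uint32_t msr, uint64_t data)
{
	switch (msr) {
	case MSR_KVM_PV_UNHALT:
		v->pv_unhalted = (int)(uint32_t)data;
		return 0;
	default:
		return 1;
	}
}

/* Save on the source, restore on the destination. */
static int model_migrate_unhalt(const struct vcpu_state *src,
				struct vcpu_state *dst)
{
	uint64_t data;

	if (model_get_msr(src, MSR_KVM_PV_UNHALT, &data))
		return 1;
	return model_set_msr(dst, MSR_KVM_PV_UNHALT, data);
}
```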
Raghavendra K T
2012-Mar-23 08:08 UTC
[PATCH RFC V5 4/6] kvm guest : Added configuration support to enable debug information for KVM Guests
From: Srivatsa Vaddagiri <vatsa at linux.vnet.ibm.com>

Signed-off-by: Srivatsa Vaddagiri <vatsa at linux.vnet.ibm.com>
Signed-off-by: Suzuki Poulose <suzuki at in.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt at linux.vnet.ibm.com>
---
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 10c28ec..a4530bd 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -600,6 +600,15 @@ config KVM_GUEST
 	  This option enables various optimizations for running under the KVM
 	  hypervisor.
 
+config KVM_DEBUG_FS
+	bool "Enable debug information for KVM Guests in debugfs"
+	depends on KVM_GUEST && DEBUG_FS
+	default n
+	---help---
+	  This option enables collection of various statistics for KVM guest.
+	  Statistics are displayed in debugfs filesystem. Enabling this option
+	  may incur significant overhead.
+
 source "arch/x86/lguest/Kconfig"
 
 config PARAVIRT
Raghavendra K T
2012-Mar-23 08:08 UTC
[PATCH RFC V5 5/6] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor
From: Srivatsa Vaddagiri <vatsa at linux.vnet.ibm.com>

During smp_boot_cpus, a paravirtualized KVM guest detects whether the hypervisor has the required feature (KVM_FEATURE_PV_UNHALT) to support pv-ticketlocks. If so, support for pv-ticketlocks is registered via pv_lock_ops.

Use the KVM_HC_KICK_CPU hypercall to wake up a waiting/halted vcpu.

Signed-off-by: Srivatsa Vaddagiri <vatsa at linux.vnet.ibm.com>
Signed-off-by: Suzuki Poulose <suzuki at in.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt at linux.vnet.ibm.com>
---
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 46f9751..2888c45 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -197,10 +197,20 @@ void kvm_async_pf_task_wait(u32 token);
 void kvm_async_pf_task_wake(u32 token);
 u32 kvm_read_and_reset_pf_reason(void);
 extern void kvm_disable_steal_time(void);
-#else
-#define kvm_guest_init() do { } while (0)
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+void __init kvm_spinlock_init(void);
+#else /* !CONFIG_PARAVIRT_SPINLOCKS */
+static inline void kvm_spinlock_init(void)
+{
+}
+#endif /* CONFIG_PARAVIRT_SPINLOCKS */
+
+#else /* CONFIG_KVM_GUEST */
+#define kvm_guest_init() do {} while (0)
 #define kvm_async_pf_task_wait(T) do {} while(0)
 #define kvm_async_pf_task_wake(T) do {} while(0)
+
 static inline u32 kvm_read_and_reset_pf_reason(void)
 {
 	return 0;
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index f0c6fd6..c535422 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -33,6 +33,7 @@
 #include <linux/sched.h>
 #include <linux/slab.h>
 #include <linux/kprobes.h>
+#include <linux/debugfs.h>
 #include <asm/timer.h>
 #include <asm/cpu.h>
 #include <asm/traps.h>
@@ -364,6 +365,7 @@ static void __init kvm_smp_prepare_boot_cpu(void)
 #endif
 	kvm_guest_cpu_init();
 	native_smp_prepare_boot_cpu();
+	kvm_spinlock_init();
 }
 
 static void __cpuinit kvm_guest_cpu_online(void *dummy)
@@ -446,3 +448,255 @@ static __init int activate_jump_labels(void)
 	return 0;
 }
 arch_initcall(activate_jump_labels);
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+
+enum kvm_contention_stat {
+	TAKEN_SLOW,
+	TAKEN_SLOW_PICKUP,
+	RELEASED_SLOW,
+	RELEASED_SLOW_KICKED,
+	NR_CONTENTION_STATS
+};
+
+#ifdef CONFIG_KVM_DEBUG_FS
+#define HISTO_BUCKETS	30
+
+static struct kvm_spinlock_stats
+{
+	u32 contention_stats[NR_CONTENTION_STATS];
+	u32 histo_spin_blocked[HISTO_BUCKETS+1];
+	u64 time_blocked;
+} spinlock_stats;
+
+static u8 zero_stats;
+
+static inline void check_zero(void)
+{
+	u8 ret;
+	u8 old;
+
+	old = ACCESS_ONCE(zero_stats);
+	if (unlikely(old)) {
+		ret = cmpxchg(&zero_stats, old, 0);
+		/* This ensures only one fellow resets the stat */
+		if (ret == old)
+			memset(&spinlock_stats, 0, sizeof(spinlock_stats));
+	}
+}
+
+static inline void add_stats(enum kvm_contention_stat var, u32 val)
+{
+	check_zero();
+	spinlock_stats.contention_stats[var] += val;
+}
+
+
+static inline u64 spin_time_start(void)
+{
+	return sched_clock();
+}
+
+static void __spin_time_accum(u64 delta, u32 *array)
+{
+	unsigned index;
+
+	index = ilog2(delta);
+	check_zero();
+
+	if (index < HISTO_BUCKETS)
+		array[index]++;
+	else
+		array[HISTO_BUCKETS]++;
+}
+
+static inline void spin_time_accum_blocked(u64 start)
+{
+	u32 delta;
+
+	delta = sched_clock() - start;
+	__spin_time_accum(delta, spinlock_stats.histo_spin_blocked);
+	spinlock_stats.time_blocked += delta;
+}
+
+static struct dentry *d_spin_debug;
+static struct dentry *d_kvm_debug;
+
+struct dentry *kvm_init_debugfs(void)
+{
+	d_kvm_debug = debugfs_create_dir("kvm", NULL);
+	if (!d_kvm_debug)
+		printk(KERN_WARNING "Could not create 'kvm' debugfs directory\n");
+
+	return d_kvm_debug;
+}
+
+static int __init kvm_spinlock_debugfs(void)
+{
+	struct dentry *d_kvm;
+
+	d_kvm = kvm_init_debugfs();
+	if (d_kvm == NULL)
+		return -ENOMEM;
+
+	d_spin_debug = debugfs_create_dir("spinlocks", d_kvm);
+
+	debugfs_create_u8("zero_stats", 0644, d_spin_debug, &zero_stats);
+
+	debugfs_create_u32("taken_slow", 0444, d_spin_debug,
+		   &spinlock_stats.contention_stats[TAKEN_SLOW]);
+	debugfs_create_u32("taken_slow_pickup", 0444, d_spin_debug,
+		   &spinlock_stats.contention_stats[TAKEN_SLOW_PICKUP]);
+
+	debugfs_create_u32("released_slow", 0444, d_spin_debug,
+		   &spinlock_stats.contention_stats[RELEASED_SLOW]);
+	debugfs_create_u32("released_slow_kicked", 0444, d_spin_debug,
+		   &spinlock_stats.contention_stats[RELEASED_SLOW_KICKED]);
+
+	debugfs_create_u64("time_blocked", 0444, d_spin_debug,
+			   &spinlock_stats.time_blocked);
+
+	debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug,
+		     spinlock_stats.histo_spin_blocked, HISTO_BUCKETS + 1);
+
+	return 0;
+}
+fs_initcall(kvm_spinlock_debugfs);
+#else  /* !CONFIG_KVM_DEBUG_FS */
+#define TIMEOUT			(1 << 10)
+static inline void add_stats(enum kvm_contention_stat var, u32 val)
+{
+}
+
+static inline u64 spin_time_start(void)
+{
+	return 0;
+}
+
+static inline void spin_time_accum_blocked(u64 start)
+{
+}
+#endif  /* CONFIG_KVM_DEBUG_FS */
+
+struct kvm_lock_waiting {
+	struct arch_spinlock *lock;
+	__ticket_t want;
+};
+
+/* cpus 'waiting' on a spinlock to become available */
+static cpumask_t waiting_cpus;
+
+/* Track spinlock on which a cpu is waiting */
+static DEFINE_PER_CPU(struct kvm_lock_waiting, lock_waiting);
+
+static void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
+{
+	struct kvm_lock_waiting *w;
+	int cpu;
+	u64 start;
+	unsigned long flags;
+
+	w = &__get_cpu_var(lock_waiting);
+	cpu = smp_processor_id();
+	start = spin_time_start();
+
+	/*
+	 * Make sure an interrupt handler can't upset things in a
+	 * partially setup state.
+	 */
+	local_irq_save(flags);
+
+	/*
+	 * The ordering protocol on this is that the "lock" pointer
+	 * may only be set non-NULL if the "want" ticket is correct.
+	 * If we're updating "want", we must first clear "lock".
+	 */
+	w->lock = NULL;
+	smp_wmb();
+	w->want = want;
+	smp_wmb();
+	w->lock = lock;
+
+	add_stats(TAKEN_SLOW, 1);
+
+	/*
+	 * This uses set_bit, which is atomic but we should not rely on its
+	 * reordering guarantees. So barrier is needed after this call.
+	 */
+	cpumask_set_cpu(cpu, &waiting_cpus);
+
+	barrier();
+
+	/*
+	 * Mark entry to slowpath before doing the pickup test to make
+	 * sure we don't deadlock with an unlocker.
+	 */
+	__ticket_enter_slowpath(lock);
+
+	/*
+	 * check again make sure it didn't become free while
+	 * we weren't looking.
+	 */
+	if (ACCESS_ONCE(lock->tickets.head) == want) {
+		add_stats(TAKEN_SLOW_PICKUP, 1);
+		goto out;
+	}
+
+	/* Allow interrupts while blocked */
+	local_irq_restore(flags);
+
+	/* halt until it's our turn and kicked. */
+	halt();
+
+	local_irq_save(flags);
+out:
+	cpumask_clear_cpu(cpu, &waiting_cpus);
+	w->lock = NULL;
+	local_irq_restore(flags);
+	spin_time_accum_blocked(start);
+}
+PV_CALLEE_SAVE_REGS_THUNK(kvm_lock_spinning);
+
+/* Kick a cpu by its apicid*/
+static inline void kvm_kick_cpu(int cpu)
+{
+	int apicid;
+
+	apicid = per_cpu(x86_cpu_to_apicid, cpu);
+	kvm_hypercall1(KVM_HC_KICK_CPU, apicid);
+}
+
+/* Kick vcpu waiting on @lock->head to reach value @ticket */
+static void kvm_unlock_kick(struct arch_spinlock *lock, __ticket_t ticket)
+{
+	int cpu;
+
+	add_stats(RELEASED_SLOW, 1);
+	for_each_cpu(cpu, &waiting_cpus) {
+		const struct kvm_lock_waiting *w = &per_cpu(lock_waiting, cpu);
+		if (ACCESS_ONCE(w->lock) == lock &&
+		    ACCESS_ONCE(w->want) == ticket) {
+			add_stats(RELEASED_SLOW_KICKED, 1);
+			kvm_kick_cpu(cpu);
+			break;
+		}
+	}
+}
+
+/*
+ * Setup pv_lock_ops to exploit KVM_FEATURE_PV_UNHALT if present.
+ */
+void __init kvm_spinlock_init(void)
+{
+	if (!kvm_para_available())
+		return;
+	/* Does host kernel support KVM_FEATURE_PV_UNHALT? */
+	if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
+		return;
+
+	jump_label_inc(&paravirt_ticketlocks_enabled);
+
+	pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(kvm_lock_spinning);
+	pv_lock_ops.unlock_kick = kvm_unlock_kick;
+}
+#endif /* CONFIG_PARAVIRT_SPINLOCKS */
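__spin_time_accum() in the patch above sorts each blocked interval into a power-of-two histogram bucket, with one extra bucket for anything at or beyond 2^HISTO_BUCKETS. A userspace sketch of that bucketing; ilog2_u64() is a simple stand-in for the kernel's ilog2(), and returning 0 for an input of 0 is an assumption of this sketch:

```c
#include <stdint.h>

#define HISTO_BUCKETS 30

/*
 * floor(log2(v)) for v > 0. Returns 0 for v == 0, which is a choice made
 * for this sketch only.
 */
static unsigned int ilog2_u64(uint64_t v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/* Count one interval of length delta; bucket i covers [2^i, 2^(i+1)). */
static void spin_time_accum(uint64_t delta, uint32_t histo[HISTO_BUCKETS + 1])
{
	unsigned int index = ilog2_u64(delta);

	if (index < HISTO_BUCKETS)
		histo[index]++;
	else
		histo[HISTO_BUCKETS]++;	/* overflow bucket */
}
```

With sched_clock() deltas in nanoseconds, the 30 regular buckets cover intervals up to roughly one second, and longer waits land in the last bucket.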
Raghavendra K T
2012-Mar-23 08:08 UTC
[PATCH RFC V5 6/6] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock
From: Raghavendra K T <raghavendra.kt at linux.vnet.ibm.com>

KVM_HC_KICK_CPU hypercall added to wakeup halted vcpu in paravirtual spinlock
enabled guest.

KVM_FEATURE_PV_UNHALT enables guest to check whether pv spinlock can be enabled
in guest. support in host is queried via
ioctl(KVM_CHECK_EXTENSION, KVM_CAP_PV_UNHALT)

Thanks Alex for KVM_HC_FEATURES inputs and Vatsa for rewriting KVM_HC_KICK_CPU

Signed-off-by: Raghavendra K T <raghavendra.kt at linux.vnet.ibm.com>
Signed-off-by: Srivatsa Vaddagiri <vatsa at linux.vnet.ibm.com>
---
diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index e1d94bf..cf8bf3b 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1109,6 +1109,13 @@ support. Instead it is reported via
 if that returns true and you use KVM_CREATE_IRQCHIP, or if you emulate the
 feature in userspace, then you can enable the feature for KVM_SET_CPUID2.
 
+Paravirtualized ticket spinlocks can be enabled in guest by checking whether
+support exists in host via,
+
+  ioctl(KVM_CHECK_EXTENSION, KVM_CAP_PV_UNHALT)
+
+if this call return true, guest can use the feature.
+
 4.47 KVM_PPC_GET_PVINFO
 
 Capability: KVM_CAP_PPC_GET_PVINFO
diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
index 8820685..062dff9 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -39,6 +39,10 @@ KVM_FEATURE_CLOCKSOURCE2 || 3 || kvmclock available at msrs
 KVM_FEATURE_ASYNC_PF               ||     4 || async pf can be enabled by
                                    ||       || writing to msr 0x4b564d02
 ------------------------------------------------------------------------------
+KVM_FEATURE_PV_UNHALT              ||     6 || guest checks this feature bit
+                                   ||       || before enabling paravirtualized
+                                   ||       || spinlock support.
+------------------------------------------------------------------------------
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
                                    ||       || per-cpu warps are expected in
                                    ||       || kvmclock.
diff --git a/Documentation/virtual/kvm/hypercalls.txt b/Documentation/virtual/kvm/hypercalls.txt
new file mode 100644
index 0000000..c9e8b9c
--- /dev/null
+++ b/Documentation/virtual/kvm/hypercalls.txt
@@ -0,0 +1,59 @@
+KVM Hypercalls Documentation
+==========================
+Template for documentation is
+The documentation for hypercalls should include
+1. Hypercall name, value.
+2. Architecture(s)
+3. status (deprecated, obsolete, active)
+4. Purpose
+
+
+1. KVM_HC_VAPIC_POLL_IRQ
+------------------------
+value: 1
+Architecture: x86
+Purpose:
+
+2. KVM_HC_MMU_OP
+------------------------
+value: 2
+Architecture: x86
+status: deprecated.
+Purpose: Support MMU operations such as writing to PTE,
+flushing TLB, release PT.
+
+3. KVM_HC_FEATURES
+------------------------
+value: 3
+Architecture: PPC
+Purpose: Expose hypercall availability to the guest. On x86 platforms, cpuid
+used to enumerate which hypercalls are available. On PPC, either device tree
+based lookup ( which is also what EPAPR dictates) OR KVM specific enumeration
+mechanism (which is this hypercall) can be used.
+
+4. KVM_HC_PPC_MAP_MAGIC_PAGE
+------------------------
+value: 4
+Architecture: PPC
+Purpose: To enable communication between the hypervisor and guest there is a
+shared page that contains parts of supervisor visible register state.
+The guest can map this shared page to access its supervisor register through
+memory using this hypercall.
+
+5. KVM_HC_KICK_CPU
+------------------------
+value: 5
+Architecture: x86
+Purpose: Hypercall used to wakeup a vcpu from HLT state
+
+Usage example : A vcpu of a paravirtualized guest that is busywaiting in guest
+kernel mode for an event to occur (ex: a spinlock to become available) can
+execute HLT instruction once it has busy-waited for more than a threshold
+time-interval. Execution of HLT instruction would cause the hypervisor to put
+the vcpu to sleep until occurrence of an appropriate event. Another vcpu of the
+same guest can wakeup the sleeping vcpu by issuing KVM_HC_KICK_CPU hypercall,
+specifying APIC ID of the vcpu to be woken up.
+
+TODO:
+1. more information on input and output needed?
+2. Add more detail to purpose of hypercalls.
diff --git a/Documentation/virtual/kvm/msr.txt b/Documentation/virtual/kvm/msr.txt
index 5031780..a7662d3 100644
--- a/Documentation/virtual/kvm/msr.txt
+++ b/Documentation/virtual/kvm/msr.txt
@@ -219,3 +219,12 @@ MSR_KVM_STEAL_TIME: 0x4b564d03
 		steal: the amount of time in which this vCPU did not run, in
 		nanoseconds. Time during which the vcpu is idle, will not be
 		reported as steal time.
+
+MSR_KVM_PV_UNHALT: 0x4b564d04
+	data: 32 bit flag indicating the paravirtual unhalt state of the VCPU.
+	This data is not expected to reside in guest memory. The unhalt state
+	indicates that corresponding VCPU (halted for some reason) is ready for
+	unhalt operation.
+
+	This data is expected to be filled only via ioctl. This is needed for
+	live migration of virtual machine.
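For readers following the cpuid.txt hunk above, the guest-side check amounts to testing bit 6 of the KVM feature word. The following is only a minimal standalone sketch: it assumes the feature word has already been read from the KVM features CPUID leaf, and kvm_features_has_pv_unhalt() is a hypothetical helper name that does not appear anywhere in this series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit number taken from the cpuid.txt hunk in this patch. */
#define KVM_FEATURE_PV_UNHALT 6

/*
 * Hypothetical helper (not part of the series): given the feature word
 * the guest read from the KVM features CPUID leaf, report whether the
 * host advertises PV unhalt support.
 */
static bool kvm_features_has_pv_unhalt(uint32_t features_eax)
{
	return (features_eax & (UINT32_C(1) << KVM_FEATURE_PV_UNHALT)) != 0;
}
```

A guest would call this once during spinlock setup and fall back to ordinary ticket locks when it returns false.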
Raghavendra K T
2012-Mar-28 18:32 UTC
[PATCH RFC V5 0/6] kvm : Paravirt-spinlock support for KVM guests
On 03/23/2012 01:35 PM, Raghavendra K T wrote:
> The 6-patch series to follow this email extends KVM-hypervisor and Linux guest
> running on KVM-hypervisor to support pv-ticket spinlocks, based on Xen's
> implementation.
>
> One hypercall is introduced in KVM hypervisor,that allows a vcpu to kick
> another vcpu out of halt state.
> The blocking of vcpu is done using halt() in (lock_spinning) slowpath.
> one MSR is added to aid live migration.
>
> Changes in V5:
> - rebased to 3.3-rc6
> - added PV_UNHALT_MSR that would help in live migration (Avi)
> - removed PV_LOCK_KICK vcpu request and pv_unhalt flag (re)added.

Sorry for pinging; I know it is a busy time. I hope to get a response on these patches when you have time, so that I can target the next merge window for this. Knowing whether the series has reached a good state or is heading in the wrong direction would really boost my morale. I am especially interested in feedback on the MSR and on dropping the vcpu request bit for PV unhalt.

- Raghu
Marcelo Tosatti
2012-Apr-12 00:06 UTC
[PATCH RFC V5 2/6] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks
On Fri, Mar 23, 2012 at 01:37:04PM +0530, Raghavendra K T wrote:
> From: Srivatsa Vaddagiri <vatsa at linux.vnet.ibm.com>
>
> KVM_HC_KICK_CPU allows the calling vcpu to kick another vcpu out of halt state.
>
> The presence of these hypercalls is indicated to guest via
> KVM_FEATURE_PV_UNHALT/KVM_CAP_PV_UNHALT.
>
> Signed-off-by: Srivatsa Vaddagiri <vatsa at linux.vnet.ibm.com>
> Signed-off-by: Suzuki Poulose <suzuki at in.ibm.com>
> Signed-off-by: Raghavendra K T <raghavendra.kt at linux.vnet.ibm.com>
> ---
> diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
> index 734c376..9234f13 100644
> --- a/arch/x86/include/asm/kvm_para.h
> +++ b/arch/x86/include/asm/kvm_para.h
> @@ -16,12 +16,14 @@
>  #define KVM_FEATURE_CLOCKSOURCE	0
>  #define KVM_FEATURE_NOP_IO_DELAY	1
>  #define KVM_FEATURE_MMU_OP	2
> +
>  /* This indicates that the new set of kvmclock msrs
>   * are available. The use of 0x11 and 0x12 is deprecated
>   */
>  #define KVM_FEATURE_CLOCKSOURCE2	3
>  #define KVM_FEATURE_ASYNC_PF	4
>  #define KVM_FEATURE_STEAL_TIME	5
> +#define KVM_FEATURE_PV_UNHALT	6
>
>  /* The last 8 bits are used to indicate how to interpret the flags field
>   * in pvclock structure. If no bits are set, all flags are ignored.
> @@ -32,6 +34,7 @@
>  #define MSR_KVM_SYSTEM_TIME 0x12
>
>  #define KVM_MSR_ENABLED 1
> +
>  /* Custom MSRs falls in the range 0x4b564d00-0x4b564dff */
>  #define MSR_KVM_WALL_CLOCK_NEW 0x4b564d00
>  #define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 89b02bf..61388b9 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -408,7 +408,8 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>  			     (1 << KVM_FEATURE_NOP_IO_DELAY) |
>  			     (1 << KVM_FEATURE_CLOCKSOURCE2) |
>  			     (1 << KVM_FEATURE_ASYNC_PF) |
> -			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
> +			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
> +			     (1 << KVM_FEATURE_PV_UNHALT);
>
>  		if (sched_info_on())
>  			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 9cbfc06..bd5ef91 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2079,6 +2079,7 @@ int kvm_dev_ioctl_check_extension(long ext)
>  	case KVM_CAP_XSAVE:
>  	case KVM_CAP_ASYNC_PF:
>  	case KVM_CAP_GET_TSC_KHZ:
> +	case KVM_CAP_PV_UNHALT:
>  		r = 1;
>  		break;
>  	case KVM_CAP_COALESCED_MMIO:
> @@ -4913,6 +4914,30 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
>  	return 1;
>  }
>
> +/*
> + * kvm_pv_kick_cpu_op: Kick a vcpu.
> + *
> + * @apicid - apicid of vcpu to be kicked.
> + */
> +static void kvm_pv_kick_cpu_op(struct kvm *kvm, int apicid)
> +{
> +	struct kvm_vcpu *vcpu = NULL;
> +	int i;
> +
> +	kvm_for_each_vcpu(i, vcpu, kvm) {
> +		if (!kvm_apic_present(vcpu))
> +			continue;
> +
> +		if (kvm_apic_match_dest(vcpu, 0, 0, apicid, 0))
> +			break;
> +	}
> +	if (vcpu) {
> +		vcpu->pv_unhalted = 1;
> +		smp_mb();
> +		kvm_vcpu_kick(vcpu);
> +	}
> +}
> +
>  int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
>  {
>  	unsigned long nr, a0, a1, a2, a3, ret;
> @@ -4946,6 +4971,10 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
>  	case KVM_HC_VAPIC_POLL_IRQ:
>  		ret = 0;
>  		break;
> +	case KVM_HC_KICK_CPU:
> +		kvm_pv_kick_cpu_op(vcpu->kvm, a0);
> +		ret = 0;
> +		break;
>  	default:
>  		ret = -KVM_ENOSYS;
>  		break;
> @@ -6174,6 +6203,7 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
>  		!vcpu->arch.apf.halted)
>  		|| !list_empty_careful(&vcpu->async_pf.done)
>  		|| vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED
> +		|| vcpu->pv_unhalted
>  		|| atomic_read(&vcpu->arch.nmi_queued) ||
>  		(kvm_arch_interrupt_allowed(vcpu) &&
>  		 kvm_cpu_has_interrupt(vcpu));
> diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> index 68e67e5..e822d96 100644
> --- a/include/linux/kvm.h
> +++ b/include/linux/kvm.h
> @@ -558,6 +558,7 @@ struct kvm_ppc_pvinfo {
>  #define KVM_CAP_PPC_PAPR 68
>  #define KVM_CAP_S390_GMAP 71
>  #define KVM_CAP_TSC_DEADLINE_TIMER 72
> +#define KVM_CAP_PV_UNHALT 73
>
>  #ifdef KVM_CAP_IRQ_ROUTING
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 900c763..433ae97 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -158,6 +158,7 @@ struct kvm_vcpu {
>  #endif
>
>  	struct kvm_vcpu_arch arch;
> +	int pv_unhalted;
>  };
>
>  static inline int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu)
> diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h
> index ff476dd..38226e1 100644
> --- a/include/linux/kvm_para.h
> +++ b/include/linux/kvm_para.h
> @@ -19,6 +19,7 @@
>  #define KVM_HC_MMU_OP	2
>  #define KVM_HC_FEATURES	3
>  #define KVM_HC_PPC_MAP_MAGIC_PAGE	4
> +#define KVM_HC_KICK_CPU	5
>
>  /*
>   * hypercalls use architecture specific
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index a91f980..d3b98b1 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -226,6 +226,7 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
>  	vcpu->kvm = kvm;
>  	vcpu->vcpu_id = id;
>  	vcpu->pid = NULL;
> +	vcpu->pv_unhalted = 0;
>  	init_waitqueue_head(&vcpu->wq);
>  	kvm_async_pf_vcpu_init(vcpu);
>
> @@ -1567,6 +1568,9 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
>  		prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
>
>  		if (kvm_arch_vcpu_runnable(vcpu)) {
> +			vcpu->pv_unhalted = 0;
> +			/* preventing reordering should be enough here */
> +			barrier();

Is it always OK to erase the notification, even in case an unrelated
event such as interrupt was the source of wakeup?

It would be easier to verify that notifications are not lost with atomic
test_and_clear(pv_unhalted).

Also x86 specific code should remain in arch/x86/kvm/
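To make the test_and_clear suggestion above concrete, here is a rough userspace sketch using C11 atomics. It is an illustration only and not the code from this series: the kernel would use its own atomic or bitop primitives, and struct vcpu_sketch, sketch_kick() and sketch_test_and_clear_unhalt() are hypothetical names introduced here.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for the per-vcpu flag; the patch itself uses a plain int. */
struct vcpu_sketch {
	atomic_int pv_unhalted;
};

/* Kicker side: publish the notification. */
static void sketch_kick(struct vcpu_sketch *v)
{
	atomic_store(&v->pv_unhalted, 1);
}

/*
 * Waiter side: read and clear the flag in one atomic step, so a kick
 * that arrives concurrently is either returned by this call or left
 * set for the next one. Returns true if a kick was pending.
 */
static bool sketch_test_and_clear_unhalt(struct vcpu_sketch *v)
{
	return atomic_exchange(&v->pv_unhalted, 0) != 0;
}
```

With a separate read followed by a plain store of 0, a kick landing between the two could be overwritten; the single exchange is what makes it easier to argue that no notification is lost.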
Marcelo Tosatti
2012-Apr-12 00:15 UTC
[PATCH RFC V5 3/6] kvm : Add unhalt msr to aid (live) migration
On Fri, Mar 23, 2012 at 01:37:26PM +0530, Raghavendra K T wrote:
> From: Raghavendra K T <raghavendra.kt at linux.vnet.ibm.com>
>
> Currently guest does not need to know pv_unhalt state and intended to be
> used via GET/SET_MSR ioctls during migration.
>
> Signed-off-by: Raghavendra K T <raghavendra.kt at linux.vnet.ibm.com>
> ---
> diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
> index 9234f13..46f9751 100644
> --- a/arch/x86/include/asm/kvm_para.h
> +++ b/arch/x86/include/asm/kvm_para.h
> @@ -40,6 +40,7 @@
>  #define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01
>  #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
>  #define MSR_KVM_STEAL_TIME 0x4b564d03
> +#define MSR_KVM_PV_UNHALT 0x4b564d04
>
>  struct kvm_steal_time {
>  	__u64 steal;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index bd5ef91..38e6c47 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -784,12 +784,13 @@ EXPORT_SYMBOL_GPL(kvm_rdpmc);
>   * kvm-specific. Those are put in the beginning of the list.
>   */
>
> -#define KVM_SAVE_MSRS_BEGIN 9
> +#define KVM_SAVE_MSRS_BEGIN 10
>  static u32 msrs_to_save[] = {
>  	MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
>  	MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
>  	HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
>  	HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
> +	MSR_KVM_PV_UNHALT,
>  	MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
>  	MSR_STAR,
>  #ifdef CONFIG_X86_64
> @@ -1606,7 +1607,9 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
>  		kvm_make_request(KVM_REQ_STEAL_UPDATE, vcpu);
>
>  		break;
> -
> +	case MSR_KVM_PV_UNHALT:
> +		vcpu->pv_unhalted = (u32) data;
> +		break;
>  	case MSR_IA32_MCG_CTL:
>  	case MSR_IA32_MCG_STATUS:
>  	case MSR_IA32_MC0_CTL ... MSR_IA32_MC0_CTL + 4 * KVM_MAX_MCE_BANKS - 1:
> @@ -1917,6 +1920,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
>  	case MSR_KVM_STEAL_TIME:
>  		data = vcpu->arch.st.msr_val;
>  		break;
> +	case MSR_KVM_PV_UNHALT:
> +		data = (u64)vcpu->pv_unhalted;
> +		break;
>  	case MSR_IA32_P5_MC_ADDR:
>  	case MSR_IA32_P5_MC_TYPE:
>  	case MSR_IA32_MCG_CAP:

Unless there is a reason to use an MSR, should use a normal ioctl
such as KVM_{GET,SET}_MP_STATE.
Marcelo Tosatti
2012-Apr-12 00:29 UTC
[PATCH RFC V5 2/6] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks
On Wed, Apr 11, 2012 at 09:06:29PM -0300, Marcelo Tosatti wrote:
> On Fri, Mar 23, 2012 at 01:37:04PM +0530, Raghavendra K T wrote:
> > From: Srivatsa Vaddagiri <vatsa at linux.vnet.ibm.com>
> >
> > KVM_HC_KICK_CPU allows the calling vcpu to kick another vcpu out of halt state.
> >
> > The presence of these hypercalls is indicated to guest via
> > KVM_FEATURE_PV_UNHALT/KVM_CAP_PV_UNHALT.
> >
> > Signed-off-by: Srivatsa Vaddagiri <vatsa at linux.vnet.ibm.com>
> > Signed-off-by: Suzuki Poulose <suzuki at in.ibm.com>
> > Signed-off-by: Raghavendra K T <raghavendra.kt at linux.vnet.ibm.com>
> > ---
> > diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
> > index 734c376..9234f13 100644
> > --- a/arch/x86/include/asm/kvm_para.h
> > +++ b/arch/x86/include/asm/kvm_para.h
> > @@ -16,12 +16,14 @@
> >  #define KVM_FEATURE_CLOCKSOURCE	0
> >  #define KVM_FEATURE_NOP_IO_DELAY	1
> >  #define KVM_FEATURE_MMU_OP	2
> > +
> >  /* This indicates that the new set of kvmclock msrs
> >   * are available. The use of 0x11 and 0x12 is deprecated
> >   */
> >  #define KVM_FEATURE_CLOCKSOURCE2	3
> >  #define KVM_FEATURE_ASYNC_PF	4
> >  #define KVM_FEATURE_STEAL_TIME	5
> > +#define KVM_FEATURE_PV_UNHALT	6
> >
> >  /* The last 8 bits are used to indicate how to interpret the flags field
> >   * in pvclock structure. If no bits are set, all flags are ignored.
> > @@ -32,6 +34,7 @@
> >  #define MSR_KVM_SYSTEM_TIME 0x12
> >
> >  #define KVM_MSR_ENABLED 1
> > +
> >  /* Custom MSRs falls in the range 0x4b564d00-0x4b564dff */
> >  #define MSR_KVM_WALL_CLOCK_NEW 0x4b564d00
> >  #define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index 89b02bf..61388b9 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -408,7 +408,8 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> >  			     (1 << KVM_FEATURE_NOP_IO_DELAY) |
> >  			     (1 << KVM_FEATURE_CLOCKSOURCE2) |
> >  			     (1 << KVM_FEATURE_ASYNC_PF) |
> > -			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
> > +			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
> > +			     (1 << KVM_FEATURE_PV_UNHALT);
> >
> >  		if (sched_info_on())
> >  			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 9cbfc06..bd5ef91 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -2079,6 +2079,7 @@ int kvm_dev_ioctl_check_extension(long ext)
> >  	case KVM_CAP_XSAVE:
> >  	case KVM_CAP_ASYNC_PF:
> >  	case KVM_CAP_GET_TSC_KHZ:
> > +	case KVM_CAP_PV_UNHALT:
> >  		r = 1;
> >  		break;
> >  	case KVM_CAP_COALESCED_MMIO:
> > @@ -4913,6 +4914,30 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
> >  	return 1;
> >  }
> >
> > +/*
> > + * kvm_pv_kick_cpu_op: Kick a vcpu.
> > + *
> > + * @apicid - apicid of vcpu to be kicked.
> > + */
> > +static void kvm_pv_kick_cpu_op(struct kvm *kvm, int apicid)
> > +{
> > +	struct kvm_vcpu *vcpu = NULL;
> > +	int i;
> > +
> > +	kvm_for_each_vcpu(i, vcpu, kvm) {
> > +		if (!kvm_apic_present(vcpu))
> > +			continue;
> > +
> > +		if (kvm_apic_match_dest(vcpu, 0, 0, apicid, 0))
> > +			break;
> > +	}
> > +	if (vcpu) {
> > +		vcpu->pv_unhalted = 1;
> > +		smp_mb();
> > +		kvm_vcpu_kick(vcpu);
> > +	}
> > +}
> > +
> >  int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
> >  {
> >  	unsigned long nr, a0, a1, a2, a3, ret;
> > @@ -4946,6 +4971,10 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
> >  	case KVM_HC_VAPIC_POLL_IRQ:
> >  		ret = 0;
> >  		break;
> > +	case KVM_HC_KICK_CPU:
> > +		kvm_pv_kick_cpu_op(vcpu->kvm, a0);
> > +		ret = 0;
> > +		break;
> >  	default:
> >  		ret = -KVM_ENOSYS;
> >  		break;
> > @@ -6174,6 +6203,7 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
> >  		!vcpu->arch.apf.halted)
> >  		|| !list_empty_careful(&vcpu->async_pf.done)
> >  		|| vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED
> > +		|| vcpu->pv_unhalted
> >  		|| atomic_read(&vcpu->arch.nmi_queued) ||
> >  		(kvm_arch_interrupt_allowed(vcpu) &&
> >  		 kvm_cpu_has_interrupt(vcpu));
> > diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> > index 68e67e5..e822d96 100644
> > --- a/include/linux/kvm.h
> > +++ b/include/linux/kvm.h
> > @@ -558,6 +558,7 @@ struct kvm_ppc_pvinfo {
> >  #define KVM_CAP_PPC_PAPR 68
> >  #define KVM_CAP_S390_GMAP 71
> >  #define KVM_CAP_TSC_DEADLINE_TIMER 72
> > +#define KVM_CAP_PV_UNHALT 73
> >
> >  #ifdef KVM_CAP_IRQ_ROUTING
> >
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 900c763..433ae97 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -158,6 +158,7 @@ struct kvm_vcpu {
> >  #endif
> >
> >  	struct kvm_vcpu_arch arch;
> > +	int pv_unhalted;
> >  };
> >
> >  static inline int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu)
> > diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h
> > index ff476dd..38226e1 100644
> > --- a/include/linux/kvm_para.h
> > +++ b/include/linux/kvm_para.h
> > @@ -19,6 +19,7 @@
> >  #define KVM_HC_MMU_OP	2
> >  #define KVM_HC_FEATURES	3
> >  #define KVM_HC_PPC_MAP_MAGIC_PAGE	4
> > +#define KVM_HC_KICK_CPU	5
> >
> >  /*
> >   * hypercalls use architecture specific
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index a91f980..d3b98b1 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -226,6 +226,7 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
> >  	vcpu->kvm = kvm;
> >  	vcpu->vcpu_id = id;
> >  	vcpu->pid = NULL;
> > +	vcpu->pv_unhalted = 0;
> >  	init_waitqueue_head(&vcpu->wq);
> >  	kvm_async_pf_vcpu_init(vcpu);
> >
> > @@ -1567,6 +1568,9 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
> >  		prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
> >
> >  		if (kvm_arch_vcpu_runnable(vcpu)) {
> > +			vcpu->pv_unhalted = 0;
> > +			/* preventing reordering should be enough here */
> > +			barrier();
>
> Is it always OK to erase the notification, even in case an unrelated
> event such as interrupt was the source of wakeup?

Note I am only asking whether it is OK to lose a notification, not
requesting a change to atomic test-and-clear.

It would be nice to have a comment explaining it.

> It would be easier to verify that notifications are not lost with atomic
> test_and_clear(pv_unhalted).
>
> Also x86 specific code should remain in arch/x86/kvm/
Raghavendra K T
2012-Apr-17 07:06 UTC
[PATCH RFC V5 2/6] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks
On 04/12/2012 05:59 AM, Marcelo Tosatti wrote:
> On Wed, Apr 11, 2012 at 09:06:29PM -0300, Marcelo Tosatti wrote:
>> On Fri, Mar 23, 2012 at 01:37:04PM +0530, Raghavendra K T wrote:
>>> From: Srivatsa Vaddagiri<vatsa at linux.vnet.ibm.com>
>>>
>>> [...]
>>> barrier();
>>
>> Is it always OK to erase the notification, even in case an unrelated
>> event such as interrupt was the source of wakeup?
>
> Note i am only asking whether it is OK to lose a notification, not
> requesting a change to atomic test-and-clear.

Yes, got your point. IMO, this is the only (safe) place where it can clear the kicked (pv_unhalted) flag, since the vcpu is going to be runnable. You are also right to be concerned about an unwanted clear of the flag, since that would eventually result in vcpu/VM hangs.

Hope I did not miss anything.

> It would be nice to have a comment explaining it.

Sure, will do that.
Raghavendra K T
2012-Apr-17 07:17 UTC
[PATCH RFC V5 3/6] kvm : Add unhalt msr to aid (live) migration
On 04/12/2012 05:45 AM, Marcelo Tosatti wrote:
> On Fri, Mar 23, 2012 at 01:37:26PM +0530, Raghavendra K T wrote:
>> From: Raghavendra K T<raghavendra.kt at linux.vnet.ibm.com>
>>
>> [...]
>
> Unless there is a reason to use an MSR, should use a normal ioctl
> such as KVM_{GET,SET}_MP_STATE.

I agree with you. In the current implementation, since we are not doing any communication between host and guest on this flag, I too felt an MSR is overkill for this. IMO, a patch like the one below should do the job; I am planning to include it in the next version of the series. Let me know if you foresee any side effects.

---
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index aa44292..5c81a66 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5691,7 +5691,9 @@ int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
 int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
 				    struct kvm_mp_state *mp_state)
 {
-	mp_state->mp_state = vcpu->arch.mp_state;
+	mp_state->mp_state = (vcpu->arch.mp_state == KVM_MP_STATE_HALTED &&
+					vcpu->pv_unhalted)?
+			KVM_MP_STATE_RUNNABLE : vcpu->arch.mp_state;
 	return 0;
 }
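The mapping in the proposed get_mpstate change can be read as a small pure function. The sketch below is a standalone restatement for illustration, not kernel code: the SKETCH_ constants mirror the KVM_MP_STATE_RUNNABLE and KVM_MP_STATE_HALTED values from include/linux/kvm.h so that it builds on its own, and sketch_reported_mp_state() is a hypothetical name.

```c
#include <stdbool.h>

/* Values mirrored from include/linux/kvm.h for a standalone build. */
enum {
	SKETCH_MP_STATE_RUNNABLE = 0,	/* KVM_MP_STATE_RUNNABLE */
	SKETCH_MP_STATE_HALTED   = 3,	/* KVM_MP_STATE_HALTED */
};

/*
 * A halted vcpu with a pending kick is reported as runnable, so the
 * destination host does not put it back to sleep after migration.
 * Every other state is passed through unchanged.
 */
static int sketch_reported_mp_state(int mp_state, bool pv_unhalted)
{
	if (mp_state == SKETCH_MP_STATE_HALTED && pv_unhalted)
		return SKETCH_MP_STATE_RUNNABLE;
	return mp_state;
}
```

Only the halted-and-kicked combination is rewritten, which is why no extra state (MSR or otherwise) has to travel with the vcpu in this approach.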