Raghavendra K T
2012-Apr-23 09:59 UTC
[PATCH RFC V6 0/5] kvm : Paravirt-spinlock support for KVM guests
The 5-patch series to follow this email extends the KVM hypervisor and Linux
guests running on KVM to support pv-ticket spinlocks, based on Xen's
implementation. One hypercall is introduced in the KVM hypervisor that allows
a vcpu to kick another vcpu out of halt state. Blocking of the vcpu is done
using halt() in the (lock_spinning) slowpath.

Note:
1) The patches are based on 3.4-rc3 + the ticketlock patches in
   https://lkml.org/lkml/2012/4/19/335
   [ The patches are targeted for the 3.5 window ]

Changes in V6:
- Rebased to 3.4-rc3
- Removed the debugfs-changes patch, which should now be in Xen/linux-next
  (https://lkml.org/lkml/2012/3/30/687)
- Removed PV_UNHALT_MSR since we currently don't need guest communication,
  and folded pv_unhalt into GET_MP_STATE (Marcelo, Avi [long back])
- Took the jumplabel changes from Ingo/Jason into use (static_key_slow_inc
  usage)
- Added inline to spinlock_init in the non-PARAVIRT case
- Moved arch-specific code to arch/x86 and added stubs to other archs (Marcelo)
- Added more comments on pv_unhalt usage etc. (Marcelo)

Changes in V5:
- Rebased to 3.3-rc6
- Added PV_UNHALT_MSR, which would help in live migration (Avi)
- Removed the PV_LOCK_KICK vcpu request and (re)added the pv_unhalt flag
- Changed the hypercall documentation (Alex)
- Changed mode_t to umode_t in debugfs
- Added MSR-related documentation
- Renamed PV_LOCK_KICK to PV_UNHALT
- Host and guest patches no longer mixed (Marcelo, Alex)
- kvm_kick_cpu now takes a cpu so it can be used by flush_tlb_ipi_other
  paravirtualization (Nikunj)
- Coding-style changes in variable declarations etc. (Srikar)

Changes in V4:
- Rebased to 3.2.0-pre
- Use the APIC ID for kicking the vcpu and kvm_apic_match_dest for
  matching (Avi)
- Folded the vcpu->kicked flag into vcpu->requests (KVM_REQ_PVLOCK_KICK)
  and made related changes to the UNHALT path to make pv ticket spinlocks
  migration friendly (Avi, Marcelo)
- Added documentation for the CPUID bit, the hypercall (KVM_HC_KICK_CPU)
  and the capability (KVM_CAP_PVLOCK_KICK) (Avi)
- Removed an unneeded kvm_arch_vcpu_ioctl_set_mpstate call (Marcelo)
- Changed the cumulative variable type (int ==> u32) in add_stat (Konrad)
- Removed unneeded kvm_guest_init for the !CONFIG_KVM_GUEST case

Changes in V3:
- Rebased to 3.2-rc1
- Use halt() instead of the wait-for-kick hypercall
- Modified the kick hypercall to wake up a halted vcpu
- Hooked kvm_spinlock_init into the smp_prepare_cpus call (moved the call
  out of head##.c)
- Fixed the potential race when zero_stat is read
- Exported debugfs_create_32 and added documentation to the API
- Use static inline and enum instead of the ADDSTAT macro
- Added barrier() after setting kick_vcpu
- Empty static inline function for kvm_spinlock_init
- Combined patches one and two to reduce overhead
- Made KVM_DEBUGFS depend on DEBUGFS
- Include the debugfs header unconditionally

Changes in V2:
- Rebased patches to -rc9
- Synchronization-related changes based on Jeremy's changes
  (Jeremy Fitzhardinge <jeremy.fitzhardinge at citrix.com>) pointed out by
  Stephan Diestelhorst <stephan.diestelhorst at amd.com>
- Enabled 32-bit guests
- Split patches into two more chunks

Results:
Results for PLE / non-PLE machines were posted for the V5 patches in
https://lkml.org/lkml/2012/3/23/50
https://lkml.org/lkml/2012/4/5/73

The current series is giving similar results, but more formal results will
be posted in the next couple of days. Interestingly, with the current
patchset I do not see the CPU stalls observed with vanilla.

TODO:
1) Remove CONFIG_PARAVIRT_SPINLOCK?
2) Experiments on further optimization possibilities
   (discussed in V6 of ticketlock)
3) Possible debugfs cleanups (combining common code of Xen/KVM)

Let me know if you have any suggestions/comments...
---
V5 kernel changes:
https://lkml.org/lkml/2012/3/23/50
Qemu changes for V5:
http://lists.gnu.org/archive/html/qemu-devel/2012-03/msg04455.html

V4 kernel changes:
https://lkml.org/lkml/2012/1/14/66
Qemu changes for V4:
http://www.mail-archive.com/kvm at vger.kernel.org/msg66450.html

V3 kernel changes:
https://lkml.org/lkml/2011/11/30/62
V2 kernel changes:
https://lkml.org/lkml/2011/10/23/207

Previous discussions (posted by Srivatsa V.):
https://lkml.org/lkml/2010/7/26/24
https://lkml.org/lkml/2011/1/19/212

Qemu patch for V3:
http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg00397.html

Srivatsa Vaddagiri (3):
  Add a hypercall to KVM hypervisor to support pv-ticketlocks
  Added configuration support to enable debug information for KVM Guests
  pv-ticketlock support for linux guests running on KVM hypervisor

Raghavendra K T (2):
  Fold pv_unhalt flag into GET_MP_STATE ioctl to aid migration
  Add documentation on Hypercalls and features used for PV spinlock

 Documentation/virtual/kvm/api.txt        |    7 +
 Documentation/virtual/kvm/cpuid.txt      |    4 +
 Documentation/virtual/kvm/hypercalls.txt |   59 +++++++
 arch/ia64/include/asm/kvm_host.h         |    3 +
 arch/powerpc/include/asm/kvm_host.h      |    4 +
 arch/s390/include/asm/kvm_host.h         |    4 +
 arch/x86/Kconfig                         |    9 +
 arch/x86/include/asm/kvm_host.h          |    6 +
 arch/x86/include/asm/kvm_para.h          |   16 ++-
 arch/x86/kernel/kvm.c                    |  254 ++++++++++++++++++++++++++++++
 arch/x86/kvm/cpuid.c                     |    3 +-
 arch/x86/kvm/x86.c                       |   46 ++++++-
 include/linux/kvm.h                      |    1 +
 include/linux/kvm_para.h                 |    1 +
 virt/kvm/kvm_main.c                      |    8 +
 15 files changed, 421 insertions(+), 4 deletions(-)
Raghavendra K T
2012-Apr-23 09:59 UTC
[PATCH RFC V6 1/5] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks
From: Srivatsa Vaddagiri <vatsa at linux.vnet.ibm.com>

KVM_HC_KICK_CPU allows the calling vcpu to kick another vcpu out of halt
state.

The presence of these hypercalls is indicated to the guest via
KVM_FEATURE_PV_UNHALT/KVM_CAP_PV_UNHALT.

Signed-off-by: Srivatsa Vaddagiri <vatsa at linux.vnet.ibm.com>
Signed-off-by: Suzuki Poulose <suzuki at in.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt at linux.vnet.ibm.com>
---
diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h
index e35b3a8..3252339 100644
--- a/arch/ia64/include/asm/kvm_host.h
+++ b/arch/ia64/include/asm/kvm_host.h
@@ -597,6 +597,9 @@ void kvm_sal_emul(struct kvm_vcpu *vcpu);
 struct kvm *kvm_arch_alloc_vm(void);
 void kvm_arch_free_vm(struct kvm *kvm);
 
+static inline void kvm_arch_vcpu_reset_pv_unhalted(struct kvm_vcpu *vcpu)
+{
+}
 #endif /* __ASSEMBLY__*/
 
 #endif
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 52eb9c1..28446de 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -498,4 +498,8 @@ struct kvm_vcpu_arch {
 #define KVM_MMIO_REG_QPR	0x0040
 #define KVM_MMIO_REG_FQPR	0x0060
 
+static inline void kvm_arch_vcpu_reset_pv_unhalted(struct kvm_vcpu *vcpu)
+{
+}
+
 #endif /* __POWERPC_KVM_HOST_H__ */
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 7343872..c47f874 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -256,4 +256,8 @@ struct kvm_arch{
 };
 
 extern int sie64a(struct kvm_s390_sie_block *, u64 *);
+static inline void kvm_arch_vcpu_reset_pv_unhalted(struct kvm_vcpu *vcpu)
+{
+}
+
 #endif
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e216ba0..dad475b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -481,6 +481,10 @@ struct kvm_vcpu_arch {
 		u64 length;
 		u64 status;
 	} osvw;
+	/* pv related host specific info */
+	struct {
+		int pv_unhalted;
+	} pv;
 };
 
 struct kvm_lpage_info {
@@ -967,4 +971,6 @@ int kvm_pmu_read_pmc(struct kvm_vcpu *vcpu, unsigned pmc, u64 *data);
 void kvm_handle_pmu_event(struct kvm_vcpu *vcpu);
 void kvm_deliver_pmi(struct kvm_vcpu *vcpu);
 
+void kvm_arch_vcpu_reset_pv_unhalted(struct kvm_vcpu *vcpu);
+
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 734c376..5b647ea 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -16,12 +16,14 @@
 #define KVM_FEATURE_CLOCKSOURCE		0
 #define KVM_FEATURE_NOP_IO_DELAY	1
 #define KVM_FEATURE_MMU_OP		2
+
 /* This indicates that the new set of kvmclock msrs
  * are available. The use of 0x11 and 0x12 is deprecated
  */
 #define KVM_FEATURE_CLOCKSOURCE2	3
 #define KVM_FEATURE_ASYNC_PF		4
 #define KVM_FEATURE_STEAL_TIME		5
+#define KVM_FEATURE_PV_UNHALT		6
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 9fed5be..7c93806 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -408,7 +408,8 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 			     (1 << KVM_FEATURE_NOP_IO_DELAY) |
 			     (1 << KVM_FEATURE_CLOCKSOURCE2) |
 			     (1 << KVM_FEATURE_ASYNC_PF) |
-			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
+			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
+			     (1 << KVM_FEATURE_PV_UNHALT);
 
 		if (sched_info_on())
 			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4044ce0..7fc9be6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2147,6 +2147,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_ASYNC_PF:
 	case KVM_CAP_GET_TSC_KHZ:
 	case KVM_CAP_PCI_2_3:
+	case KVM_CAP_PV_UNHALT:
 		r = 1;
 		break;
 	case KVM_CAP_COALESCED_MMIO:
@@ -4993,6 +4994,36 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+/*
+ * kvm_pv_kick_cpu_op: Kick a vcpu.
+ *
+ * @apicid - apicid of vcpu to be kicked.
+ */
+static void kvm_pv_kick_cpu_op(struct kvm *kvm, int apicid)
+{
+	struct kvm_vcpu *vcpu = NULL;
+	int i;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		if (!kvm_apic_present(vcpu))
+			continue;
+
+		if (kvm_apic_match_dest(vcpu, 0, 0, apicid, 0))
+			break;
+	}
+	if (vcpu) {
+		/*
+		 * Setting the unhalt flag here can result in spurious runnable
+		 * state when the unhalt reset does not happen in vcpu_block.
+		 * But that is harmless since that should soon result in halt.
+		 */
+		vcpu->arch.pv.pv_unhalted = 1;
+		/* We need everybody to see unhalt before the vcpu unblocks */
+		smp_mb();
+		kvm_vcpu_kick(vcpu);
+	}
+}
+
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 {
 	unsigned long nr, a0, a1, a2, a3, ret;
@@ -5026,6 +5057,10 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 	case KVM_HC_VAPIC_POLL_IRQ:
 		ret = 0;
 		break;
+	case KVM_HC_KICK_CPU:
+		kvm_pv_kick_cpu_op(vcpu->kvm, a0);
+		ret = 0;
+		break;
 	default:
 		ret = -KVM_ENOSYS;
 		break;
@@ -6128,6 +6163,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 	BUG_ON(vcpu->kvm == NULL);
 	kvm = vcpu->kvm;
 
+	kvm_arch_vcpu_reset_pv_unhalted(vcpu);
 	vcpu->arch.emulate_ctxt.ops = &emulate_ops;
 	if (!irqchip_in_kernel(kvm) || kvm_vcpu_is_bsp(vcpu))
 		vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
@@ -6398,11 +6434,17 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
 		!vcpu->arch.apf.halted)
 		|| !list_empty_careful(&vcpu->async_pf.done)
 		|| vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED
+		|| vcpu->arch.pv.pv_unhalted
 		|| atomic_read(&vcpu->arch.nmi_queued) ||
 		(kvm_arch_interrupt_allowed(vcpu) &&
 		 kvm_cpu_has_interrupt(vcpu));
 }
 
+void kvm_arch_vcpu_reset_pv_unhalted(struct kvm_vcpu *vcpu)
+{
+	vcpu->arch.pv.pv_unhalted = 0;
+}
+
 void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
 {
 	int me;
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 6c322a9..a189f02 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -589,6 +589,7 @@ struct kvm_ppc_pvinfo {
 #define KVM_CAP_S390_UCONTROL 73
 #define KVM_CAP_SYNC_REGS 74
 #define KVM_CAP_PCI_2_3 75
+#define KVM_CAP_PV_UNHALT 76
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h
index ff476dd..38226e1 100644
--- a/include/linux/kvm_para.h
+++ b/include/linux/kvm_para.h
@@ -19,6 +19,7 @@
 #define KVM_HC_MMU_OP			2
 #define KVM_HC_FEATURES			3
 #define KVM_HC_PPC_MAP_MAGIC_PAGE	4
+#define KVM_HC_KICK_CPU			5
 
 /*
  * hypercalls use architecture specific
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 42b7393..edf56d4 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1500,6 +1500,14 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 		prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
 
 		if (kvm_arch_vcpu_runnable(vcpu)) {
+			/*
+			 * This is the only safe place to reset the unhalt flag;
+			 * otherwise we can lose the notification, which
+			 * eventually can result in vcpu hangs.
+			 */
+			kvm_arch_vcpu_reset_pv_unhalted(vcpu);
+			/* preventing reordering should be enough here */
+			barrier();
 			kvm_make_request(KVM_REQ_UNHALT, vcpu);
 			break;
 		}
Raghavendra K T
2012-Apr-23 10:00 UTC
[PATCH RFC V6 2/5] kvm : Fold pv_unhalt flag into GET_MP_STATE ioctl to aid migration
From: Raghavendra K T <raghavendra.kt at linux.vnet.ibm.com>

Signed-off-by: Raghavendra K T <raghavendra.kt at linux.vnet.ibm.com>
---
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6faa550..7354c1b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5691,7 +5691,9 @@ int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
 int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
 				    struct kvm_mp_state *mp_state)
 {
-	mp_state->mp_state = vcpu->arch.mp_state;
+	mp_state->mp_state = (vcpu->arch.mp_state == KVM_MP_STATE_HALTED &&
+			      vcpu->arch.pv.pv_unhalted) ?
+			      KVM_MP_STATE_RUNNABLE : vcpu->arch.mp_state;
 	return 0;
 }
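For context, a sketch of how userspace (e.g. qemu) might consume this during
migration; KVM_GET_MP_STATE is the real ioctl, while vcpu_fd and
save_mp_state() here are illustrative placeholders:

	struct kvm_mp_state mp_state;

	if (ioctl(vcpu_fd, KVM_GET_MP_STATE, &mp_state) == 0) {
		/*
		 * With this patch, a HALTED vcpu with a pending pv unhalt
		 * is reported as RUNNABLE, so the pending kick survives
		 * save/restore.
		 */
		save_mp_state(mp_state.mp_state);
	}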
Raghavendra K T
2012-Apr-23 10:00 UTC
[PATCH RFC V6 3/5] kvm guest : Add configuration support to enable debug information for KVM Guests
From: Srivatsa Vaddagiri <vatsa at linux.vnet.ibm.com>

Signed-off-by: Srivatsa Vaddagiri <vatsa at linux.vnet.ibm.com>
Signed-off-by: Suzuki Poulose <suzuki at in.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt at linux.vnet.ibm.com>
---
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 35eb2e4..a9ec0da 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -584,6 +584,15 @@ config KVM_GUEST
 	  This option enables various optimizations for running under the KVM
 	  hypervisor.
 
+config KVM_DEBUG_FS
+	bool "Enable debug information for KVM Guests in debugfs"
+	depends on KVM_GUEST && DEBUG_FS
+	default n
+	---help---
+	  This option enables collection of various statistics for KVM guest.
+	  Statistics are displayed in debugfs filesystem. Enabling this option
+	  may incur significant overhead.
+
 source "arch/x86/lguest/Kconfig"
 
 config PARAVIRT
Raghavendra K T
2012-Apr-23 10:00 UTC
[PATCH RFC V6 4/5] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor
From: Srivatsa Vaddagiri <vatsa at linux.vnet.ibm.com>

During smp_boot_cpus, a paravirtualized KVM guest detects whether the
hypervisor has the feature required to support pv-ticketlocks
(KVM_FEATURE_PV_UNHALT). If so, support for pv-ticketlocks is registered
via pv_lock_ops.

The KVM_HC_KICK_CPU hypercall is used to wake up a waiting/halted vcpu.

Signed-off-by: Srivatsa Vaddagiri <vatsa at linux.vnet.ibm.com>
Signed-off-by: Suzuki Poulose <suzuki at in.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt at linux.vnet.ibm.com>
---
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 5b647ea..77266d3 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -195,10 +195,20 @@ void kvm_async_pf_task_wait(u32 token);
 void kvm_async_pf_task_wake(u32 token);
 u32 kvm_read_and_reset_pf_reason(void);
 extern void kvm_disable_steal_time(void);
-#else
-#define kvm_guest_init() do { } while (0)
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+void __init kvm_spinlock_init(void);
+#else /* !CONFIG_PARAVIRT_SPINLOCKS */
+static inline void kvm_spinlock_init(void)
+{
+}
+#endif /* CONFIG_PARAVIRT_SPINLOCKS */
+
+#else /* CONFIG_KVM_GUEST */
+#define kvm_guest_init() do {} while (0)
 #define kvm_async_pf_task_wait(T) do {} while(0)
 #define kvm_async_pf_task_wake(T) do {} while(0)
+
 static inline u32 kvm_read_and_reset_pf_reason(void)
 {
 	return 0;
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index b8ba6e4..98f0378 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -33,6 +33,7 @@
 #include <linux/sched.h>
 #include <linux/slab.h>
 #include <linux/kprobes.h>
+#include <linux/debugfs.h>
 #include <asm/timer.h>
 #include <asm/cpu.h>
 #include <asm/traps.h>
@@ -368,6 +369,7 @@ static void __init kvm_smp_prepare_boot_cpu(void)
 #endif
 	kvm_guest_cpu_init();
 	native_smp_prepare_boot_cpu();
+	kvm_spinlock_init();
 }
 
 static void __cpuinit kvm_guest_cpu_online(void *dummy)
@@ -450,3 +452,255 @@ static __init int activate_jump_labels(void)
 	return 0;
 }
 arch_initcall(activate_jump_labels);
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+
+enum kvm_contention_stat {
+	TAKEN_SLOW,
+	TAKEN_SLOW_PICKUP,
+	RELEASED_SLOW,
+	RELEASED_SLOW_KICKED,
+	NR_CONTENTION_STATS
+};
+
+#ifdef CONFIG_KVM_DEBUG_FS
+#define HISTO_BUCKETS	30
+
+static struct kvm_spinlock_stats
+{
+	u32 contention_stats[NR_CONTENTION_STATS];
+	u32 histo_spin_blocked[HISTO_BUCKETS+1];
+	u64 time_blocked;
+} spinlock_stats;
+
+static u8 zero_stats;
+
+static inline void check_zero(void)
+{
+	u8 ret;
+	u8 old;
+
+	old = ACCESS_ONCE(zero_stats);
+	if (unlikely(old)) {
+		ret = cmpxchg(&zero_stats, old, 0);
+		/* This ensures only one fellow resets the stat */
+		if (ret == old)
+			memset(&spinlock_stats, 0, sizeof(spinlock_stats));
+	}
+}
+
+static inline void add_stats(enum kvm_contention_stat var, u32 val)
+{
+	check_zero();
+	spinlock_stats.contention_stats[var] += val;
+}
+
+
+static inline u64 spin_time_start(void)
+{
+	return sched_clock();
+}
+
+static void __spin_time_accum(u64 delta, u32 *array)
+{
+	unsigned index;
+
+	index = ilog2(delta);
+	check_zero();
+
+	if (index < HISTO_BUCKETS)
+		array[index]++;
+	else
+		array[HISTO_BUCKETS]++;
+}
+
+static inline void spin_time_accum_blocked(u64 start)
+{
+	u32 delta;
+
+	delta = sched_clock() - start;
+	__spin_time_accum(delta, spinlock_stats.histo_spin_blocked);
+	spinlock_stats.time_blocked += delta;
+}
+
+static struct dentry *d_spin_debug;
+static struct dentry *d_kvm_debug;
+
+struct dentry *kvm_init_debugfs(void)
+{
+	d_kvm_debug = debugfs_create_dir("kvm", NULL);
+	if (!d_kvm_debug)
+		printk(KERN_WARNING "Could not create 'kvm' debugfs directory\n");
+
+	return d_kvm_debug;
+}
+
+static int __init kvm_spinlock_debugfs(void)
+{
+	struct dentry *d_kvm;
+
+	d_kvm = kvm_init_debugfs();
+	if (d_kvm == NULL)
+		return -ENOMEM;
+
+	d_spin_debug = debugfs_create_dir("spinlocks", d_kvm);
+
+	debugfs_create_u8("zero_stats", 0644, d_spin_debug, &zero_stats);
+
+	debugfs_create_u32("taken_slow", 0444, d_spin_debug,
+			   &spinlock_stats.contention_stats[TAKEN_SLOW]);
+	debugfs_create_u32("taken_slow_pickup", 0444, d_spin_debug,
+			   &spinlock_stats.contention_stats[TAKEN_SLOW_PICKUP]);
+
+	debugfs_create_u32("released_slow", 0444, d_spin_debug,
+			   &spinlock_stats.contention_stats[RELEASED_SLOW]);
+	debugfs_create_u32("released_slow_kicked", 0444, d_spin_debug,
+			   &spinlock_stats.contention_stats[RELEASED_SLOW_KICKED]);
+
+	debugfs_create_u64("time_blocked", 0444, d_spin_debug,
+			   &spinlock_stats.time_blocked);
+
+	debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug,
+				 spinlock_stats.histo_spin_blocked,
+				 HISTO_BUCKETS + 1);
+
+	return 0;
+}
+fs_initcall(kvm_spinlock_debugfs);
+#else /* !CONFIG_KVM_DEBUG_FS */
+#define TIMEOUT			(1 << 10)
+static inline void add_stats(enum kvm_contention_stat var, u32 val)
+{
+}
+
+static inline u64 spin_time_start(void)
+{
+	return 0;
+}
+
+static inline void spin_time_accum_blocked(u64 start)
+{
+}
+#endif /* CONFIG_KVM_DEBUG_FS */
+
+struct kvm_lock_waiting {
+	struct arch_spinlock *lock;
+	__ticket_t want;
+};
+
+/* cpus 'waiting' on a spinlock to become available */
+static cpumask_t waiting_cpus;
+
+/* Track spinlock on which a cpu is waiting */
+static DEFINE_PER_CPU(struct kvm_lock_waiting, lock_waiting);
+
+static void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
+{
+	struct kvm_lock_waiting *w;
+	int cpu;
+	u64 start;
+	unsigned long flags;
+
+	w = &__get_cpu_var(lock_waiting);
+	cpu = smp_processor_id();
+	start = spin_time_start();
+
+	/*
+	 * Make sure an interrupt handler can't upset things in a
+	 * partially setup state.
+	 */
+	local_irq_save(flags);
+
+	/*
+	 * The ordering protocol on this is that the "lock" pointer
+	 * may only be set non-NULL if the "want" ticket is correct.
+	 * If we're updating "want", we must first clear "lock".
+	 */
+	w->lock = NULL;
+	smp_wmb();
+	w->want = want;
+	smp_wmb();
+	w->lock = lock;
+
+	add_stats(TAKEN_SLOW, 1);
+
+	/*
+	 * This uses set_bit, which is atomic but we should not rely on its
+	 * reordering guarantees. So a barrier is needed after this call.
+	 */
+	cpumask_set_cpu(cpu, &waiting_cpus);
+
+	barrier();
+
+	/*
+	 * Mark entry to slowpath before doing the pickup test to make
+	 * sure we don't deadlock with an unlocker.
+	 */
+	__ticket_enter_slowpath(lock);
+
+	/*
+	 * Check again to make sure it didn't become free while
+	 * we weren't looking.
+	 */
+	if (ACCESS_ONCE(lock->tickets.head) == want) {
+		add_stats(TAKEN_SLOW_PICKUP, 1);
+		goto out;
+	}
+
+	/* Allow interrupts while blocked */
+	local_irq_restore(flags);
+
+	/* halt until it's our turn and we are kicked. */
+	halt();
+
+	local_irq_save(flags);
+out:
+	cpumask_clear_cpu(cpu, &waiting_cpus);
+	w->lock = NULL;
+	local_irq_restore(flags);
+	spin_time_accum_blocked(start);
+}
+PV_CALLEE_SAVE_REGS_THUNK(kvm_lock_spinning);
+
+/* Kick a cpu by its apicid */
+static inline void kvm_kick_cpu(int cpu)
+{
+	int apicid;
+
+	apicid = per_cpu(x86_cpu_to_apicid, cpu);
+	kvm_hypercall1(KVM_HC_KICK_CPU, apicid);
+}
+
+/* Kick vcpu waiting on @lock->head to reach value @ticket */
+static void kvm_unlock_kick(struct arch_spinlock *lock, __ticket_t ticket)
+{
+	int cpu;
+
+	add_stats(RELEASED_SLOW, 1);
+	for_each_cpu(cpu, &waiting_cpus) {
+		const struct kvm_lock_waiting *w = &per_cpu(lock_waiting, cpu);
+		if (ACCESS_ONCE(w->lock) == lock &&
+		    ACCESS_ONCE(w->want) == ticket) {
+			add_stats(RELEASED_SLOW_KICKED, 1);
+			kvm_kick_cpu(cpu);
+			break;
+		}
+	}
+}
+
+/*
+ * Setup pv_lock_ops to exploit KVM_FEATURE_PV_UNHALT if present.
+ */
+void __init kvm_spinlock_init(void)
+{
+	if (!kvm_para_available())
+		return;
+	/* Does host kernel support KVM_FEATURE_PV_UNHALT? */
+	if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
+		return;
+
+	static_key_slow_inc(&paravirt_ticketlocks_enabled);
+
+	pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(kvm_lock_spinning);
+	pv_lock_ops.unlock_kick = kvm_unlock_kick;
+}
+#endif /* CONFIG_PARAVIRT_SPINLOCKS */
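To place these hooks in context: below is a simplified sketch of the
ticket-lock slow path from the prerequisite ticketlock series
(https://lkml.org/lkml/2012/4/19/335) that ends up calling them.
SPIN_THRESHOLD, inc and __ticket_lock_spinning belong to that series, and
the details here are approximate:

	/* inside arch_spin_lock(), after grabbing a ticket (inc.tail) */
	for (;;) {
		unsigned count = SPIN_THRESHOLD;

		do {
			if (ACCESS_ONCE(lock->tickets.head) == inc.tail)
				goto out;	/* lock is ours */
			cpu_relax();
		} while (--count);

		/*
		 * Spun too long: enter the pv slowpath. With this patch
		 * applied, this lands in kvm_lock_spinning() above, which
		 * records (lock, want) and halt()s until kicked.
		 */
		__ticket_lock_spinning(lock, inc.tail);
	}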
Raghavendra K T
2012-Apr-23 10:00 UTC
[PATCH RFC V6 5/5] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock
From: Raghavendra K T <raghavendra.kt at linux.vnet.ibm.com>

The KVM_HC_KICK_CPU hypercall is added to wake up a halted vcpu in a
paravirtual spinlock enabled guest. KVM_FEATURE_PV_UNHALT enables the guest
to check whether pv spinlocks can be enabled in the guest. Support in the
host is queried via ioctl(KVM_CHECK_EXTENSION, KVM_CAP_PV_UNHALT).

Thanks to Alex for the KVM_HC_FEATURES inputs and to Vatsa for rewriting
KVM_HC_KICK_CPU.

Signed-off-by: Srivatsa Vaddagiri <vatsa at linux.vnet.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt at linux.vnet.ibm.com>
---
diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 6386f8c..7e2eba0 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1119,6 +1119,13 @@ support. Instead it is reported via
 if that returns true and you use KVM_CREATE_IRQCHIP, or if you emulate the
 feature in userspace, then you can enable the feature for KVM_SET_CPUID2.
 
+Paravirtualized ticket spinlocks can be enabled in the guest by checking
+whether support exists in the host via
+
+	ioctl(KVM_CHECK_EXTENSION, KVM_CAP_PV_UNHALT)
+
+If this call returns true, the guest can use the feature.
+
 4.47 KVM_PPC_GET_PVINFO
 
 Capability: KVM_CAP_PPC_GET_PVINFO
diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
index 8820685..062dff9 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -39,6 +39,10 @@ KVM_FEATURE_CLOCKSOURCE2           ||     3 || kvmclock available at msrs
 KVM_FEATURE_ASYNC_PF               ||     4 || async pf can be enabled by
                                    ||       || writing to msr 0x4b564d02
 ------------------------------------------------------------------------------
+KVM_FEATURE_PV_UNHALT              ||     6 || guest checks this feature bit
+                                   ||       || before enabling paravirtualized
+                                   ||       || spinlock support.
+------------------------------------------------------------------------------
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
                                    ||       || per-cpu warps are expected in
                                    ||       || kvmclock.
diff --git a/Documentation/virtual/kvm/hypercalls.txt b/Documentation/virtual/kvm/hypercalls.txt
new file mode 100644
index 0000000..c9e8b9c
--- /dev/null
+++ b/Documentation/virtual/kvm/hypercalls.txt
@@ -0,0 +1,59 @@
+KVM Hypercalls Documentation
+============================
+The documentation for each hypercall should include:
+1. Hypercall name, value.
+2. Architecture(s)
+3. Status (deprecated, obsolete, active)
+4. Purpose
+
+
+1. KVM_HC_VAPIC_POLL_IRQ
+------------------------
+value: 1
+Architecture: x86
+Purpose:
+
+2. KVM_HC_MMU_OP
+------------------------
+value: 2
+Architecture: x86
+status: deprecated.
+Purpose: Support MMU operations such as writing to a PTE,
+flushing the TLB and releasing PTs.
+
+3. KVM_HC_FEATURES
+------------------------
+value: 3
+Architecture: PPC
+Purpose: Expose hypercall availability to the guest. On x86 platforms, cpuid
+is used to enumerate which hypercalls are available. On PPC, either a device
+tree based lookup (which is also what EPAPR dictates) or a KVM-specific
+enumeration mechanism (which is this hypercall) can be used.
+
+4. KVM_HC_PPC_MAP_MAGIC_PAGE
+------------------------
+value: 4
+Architecture: PPC
+Purpose: To enable communication between the hypervisor and guest there is a
+shared page that contains parts of supervisor-visible register state.
+The guest can map this shared page to access its supervisor register state
+through memory using this hypercall.
+
+5. KVM_HC_KICK_CPU
+------------------------
+value: 5
+Architecture: x86
+Purpose: Hypercall used to wake up a vcpu from HLT state.
+
+Usage example: A vcpu of a paravirtualized guest that is busy-waiting in
+guest kernel mode for an event to occur (e.g. a spinlock to become
+available) can execute a HLT instruction once it has busy-waited for more
+than a threshold time interval. Execution of the HLT instruction causes the
+hypervisor to put the vcpu to sleep until the occurrence of an appropriate
+event. Another vcpu of the same guest can wake up the sleeping vcpu by
+issuing the KVM_HC_KICK_CPU hypercall, specifying the APIC ID of the vcpu
+to be woken up.
+
+TODO:
+1. Is more information on inputs and outputs needed?
+2. Add more detail to the purpose of the hypercalls.
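For completeness, the guest side of this hypercall reduces to a single call;
this mirrors kvm_kick_cpu() in patch 4/5 of this series:

	/* wake the halted vcpu running on 'cpu', identified by APIC ID */
	int apicid = per_cpu(x86_cpu_to_apicid, cpu);

	kvm_hypercall1(KVM_HC_KICK_CPU, apicid);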
Gleb Natapov
2012-Apr-24 09:59 UTC
[PATCH RFC V6 1/5] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks
On Mon, Apr 23, 2012 at 03:29:47PM +0530, Raghavendra K T wrote:
> From: Srivatsa Vaddagiri <vatsa at linux.vnet.ibm.com>
>
> KVM_HC_KICK_CPU allows the calling vcpu to kick another vcpu out of halt
> state.
>
> The presence of these hypercalls is indicated to the guest via
> KVM_FEATURE_PV_UNHALT/KVM_CAP_PV_UNHALT.
[...]
> +static void kvm_pv_kick_cpu_op(struct kvm *kvm, int apicid)
> +{
> +	struct kvm_vcpu *vcpu = NULL;
> +	int i;
> +
> +	kvm_for_each_vcpu(i, vcpu, kvm) {
> +		if (!kvm_apic_present(vcpu))
> +			continue;
> +
> +		if (kvm_apic_match_dest(vcpu, 0, 0, apicid, 0))
> +			break;
> +	}
> +	if (vcpu) {
> +		/*
> +		 * Setting the unhalt flag here can result in spurious runnable
> +		 * state when the unhalt reset does not happen in vcpu_block.
> +		 * But that is harmless since that should soon result in halt.
> +		 */
> +		vcpu->arch.pv.pv_unhalted = 1;
> +		/* We need everybody to see unhalt before the vcpu unblocks */
> +		smp_mb();
> +		kvm_vcpu_kick(vcpu);
> +	}
> +}

This is too similar to kvm_irq_delivery_to_apic(). Why not reuse it? We can
use one of the reserved delivery modes as a PV delivery mode.
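For illustration, a rough sketch of the reuse Gleb is suggesting (this is
not part of the posted patch; APIC_DM_PV_UNHALT is a hypothetical name for
a reserved delivery mode, and the exact kvm_lapic_irq setup would need
checking against the existing delivery code):

	static void kvm_pv_kick_cpu_op(struct kvm *kvm, int apicid)
	{
		struct kvm_lapic_irq lapic_irq = {
			/* hypothetical reserved delivery mode */
			.delivery_mode	= APIC_DM_PV_UNHALT,
			.dest_mode	= 0,		/* physical */
			.dest_id	= apicid,
		};

		/* let the existing APIC code do matching and wakeup */
		kvm_irq_delivery_to_apic(kvm, NULL, &lapic_irq);
	}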
Raghavendra K T
2012-Apr-26 08:11 UTC
[PATCH RFC V6 1/5] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks
On 04/24/2012 03:29 PM, Gleb Natapov wrote:
> On Mon, Apr 23, 2012 at 03:29:47PM +0530, Raghavendra K T wrote:
>> From: Srivatsa Vaddagiri <vatsa at linux.vnet.ibm.com>
[...]
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index 42b7393..edf56d4 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -1500,6 +1500,14 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
>> 	prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
>>
>> 	if (kvm_arch_vcpu_runnable(vcpu)) {
>> +		/*
>> +		 * This is the only safe place to reset the unhalt flag;
>> +		 * otherwise we can lose the notification, which
>> +		 * eventually can result in vcpu hangs.
>> +		 */
> Why is this the only safe place? Why is clearing it in kvm/x86.c, just after
> the call to kvm_vcpu_block() when KVM_REQ_UNHALT is set, not safe enough?
>

Yes, you are right. The acceptable window to reset the flag can be extended
until there, and the good thing about that is it removes the need for the
stubs for the other archs and simplifies everything a lot. Thanks for that.

[ When I was experimenting with the request bit, clearing the request bit in
that same place was causing a vm hang after some 16 iterations of the stress
test, and I had carried the same impression over. Now I have done stress
testing to ensure that the change works. Basically, as you know, my fear was
that losing kick(s) eventually leads to a vm hang. ]
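Concretely, the relocation being agreed on here would look roughly like this
in the halt handling of arch/x86/kvm/x86.c (a sketch, not a posted patch):

	kvm_vcpu_block(vcpu);
	if (kvm_check_request(KVM_REQ_UNHALT, vcpu)) {
		/*
		 * The wakeup has been consumed by this point, so resetting
		 * the flag here cannot lose a kick, and the per-arch reset
		 * stubs become unnecessary.
		 */
		vcpu->arch.pv.pv_unhalted = 0;
		vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
	}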
Avi Kivity
2012-Apr-29 13:27 UTC
[PATCH RFC V6 2/5] kvm : Fold pv_unhalt flag into GET_MP_STATE ioctl to aid migration
On 04/23/2012 01:00 PM, Raghavendra K T wrote:
> From: Raghavendra K T <raghavendra.kt at linux.vnet.ibm.com>
>
> Signed-off-by: Raghavendra K T <raghavendra.kt at linux.vnet.ibm.com>
> ---
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 6faa550..7354c1b 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5691,7 +5691,9 @@ int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
>  int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
>  				    struct kvm_mp_state *mp_state)
>  {
> -	mp_state->mp_state = vcpu->arch.mp_state;
> +	mp_state->mp_state = (vcpu->arch.mp_state == KVM_MP_STATE_HALTED &&
> +			      vcpu->arch.pv.pv_unhalted) ?
> +			      KVM_MP_STATE_RUNNABLE : vcpu->arch.mp_state;
>  	return 0;
>  }

Not pretty. Suggest rewriting using an if.

-- 
error compiling committee.c: too many arguments to function
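The if-based form Avi is asking for would be something like:

	int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
					    struct kvm_mp_state *mp_state)
	{
		if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED &&
		    vcpu->arch.pv.pv_unhalted)
			mp_state->mp_state = KVM_MP_STATE_RUNNABLE;
		else
			mp_state->mp_state = vcpu->arch.mp_state;
		return 0;
	}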