thr3ads.net - Linux Virtualization - [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests [Jan 2012]

If this information is useful, please help other people find it:
Share via:

Raghavendra K T

2012-Jan-14 18:25 UTC

[PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests

The 5-patch series to follow this email extends KVM-hypervisor and Linux guest 
running on KVM-hypervisor to support pv-ticket spinlocks, based on Xen's
implementation.

One hypercall is introduced in KVM hypervisor,that allows a vcpu to kick
another vcpu out of halt state.
The blocking of vcpu is done using halt() in (lock_spinning) slowpath.

Changes in V4:
- reabsed to 3.2.0 pre.
- use APIC ID for kicking the vcpu and use kvm_apic_match_dest for matching.
(Avi)
- fold vcpu->kicked flag into vcpu->requests (KVM_REQ_PVLOCK_KICK) and
related
  changes for UNHALT path to make pv ticket spinlock migration friendly. (Avi,
Marcello)
- Added Documentation for CPUID, Hypercall (KVM_HC_KICK_CPU)
  and capabilty (KVM_CAP_PVLOCK_KICK) (Avi)
- Remove unneeded kvm_arch_vcpu_ioctl_set_mpstate call. (Marcello)
- cumulative variable type changed (int ==> u32) in add_stat (Konrad)
- remove unneeded kvm_guest_init for !CONFIG_KVM_GUEST case

Changes in V3:
- rebased to 3.2-rc1
- use halt() instead of wait for kick hypercall.
- modify kick hyper call to do wakeup halted vcpu.
- hook kvm_spinlock_init to smp_prepare_cpus call (moved the call out of
head##.c).
- fix the potential race when zero_stat is read.
- export debugfs_create_32 and add documentation to API.
- use static inline and enum instead of ADDSTAT macro. 
- add  barrier() in after setting kick_vcpu.
- empty static inline function for kvm_spinlock_init.
- combine the patches one and two readuce overhead.
- make KVM_DEBUGFS depends on DEBUGFS.
- include debugfs header unconditionally.

Changes in V2:
- rebased patchesto -rc9
- synchronization related changes based on Jeremy's changes 
 (Jeremy Fitzhardinge <jeremy.fitzhardinge at citrix.com>) pointed by 
 Stephan Diestelhorst <stephan.diestelhorst at amd.com>
- enabling 32 bit guests
- splitted patches into two more chunks

 Srivatsa Vaddagiri, Suzuki Poulose, Raghavendra K T (5): 
  Add debugfs support to print u32-arrays in debugfs
  Add a hypercall to KVM hypervisor to support pv-ticketlocks
  Added configuration support to enable debug information for KVM Guests
  pv-ticketlocks support for linux guests running on KVM hypervisor
  Add documentation on Hypercalls and features used for PV spinlock
 
Test Set up :
The BASE patch is pre 3.2.0 + Jeremy's following patches.
xadd (https://lkml.org/lkml/2011/10/4/328)
x86/ticketlocklock  (https://lkml.org/lkml/2011/10/12/496).
Kernel for host/guest : 3.2.0 + Jeremy's xadd, pv spinlock patches as BASE
(Note:locked add change is not taken yet)

Results:
 The performance gain is mainly because of reduced busy-wait time.
 From the results we can see that patched kernel performance is similar to
 BASE when there is no lock contention. But once we start seeing more
 contention, patched kernel outperforms BASE (non PLE).
 On PLE machine we do not see greater performance improvement because of PLE
 complimenting halt()

3 guests with 8VCPU, 4GB RAM, 1 used for kernbench
(kernbench -f -H -M -o 20) other for cpuhog (shell script while
true with an instruction)

scenario A: unpinned

1x: no hogs
2x: 8hogs in one guest
3x: 8hogs each in two guest

scenario B: unpinned, run kernbench on all the guests no hogs.

Dbench on PLE machine:
dbench run on all the guest simultaneously with
dbench --warmup=30 -t 120 with NRCLIENTS=(8/16/32).

Result for Non PLE machine :
===========================Machine : IBM xSeries with Intel(R) Xeon(R) x5570
2.93GHz CPU with 8 core , 64GB RAM
		 BASE                    BASE+patch            %improvement
		 mean (sd)               mean (sd)
Scenario A:
case 1x:	 164.233 (16.5506) 	 163.584 (15.4598 	0.39517
case 2x:	 897.654 (543.993) 	 328.63 (103.771) 	63.3901
case 3x:	 2855.73 (2201.41) 	 315.029 (111.854) 	88.9685

Dbench:
Throughput is in MB/sec
NRCLIENTS	 BASE                    BASE+patch            %improvement
               	 mean (sd)               mean (sd)
8       	1.774307  (0.061361) 	1.725667  (0.034644) 	-2.74135
16      	1.445967  (0.044805) 	1.463173  (0.094399) 	1.18993
32        	2.136667  (0.105717) 	2.193792  (0.129357) 	2.67356

Result for PLE machine:
=====================Machine : IBM xSeries with Intel(R) Xeon(R)  X7560 2.27GHz
CPU with 32/64 core, with 8
         online cores and 4*64GB RAM

Kernbench:
		 BASE                    BASE+patch            %improvement
		 mean (sd)               mean (sd)
Scenario A:	 			
case 1x:	 161.263 (56.518) 	 159.635 (40.5621) 	1.00953
case 2x:	 190.748 (61.2745) 	 190.606 (54.4766) 	0.0744438
case 3x:	 227.378 (100.215) 	 225.442 (92.0809) 	0.851446

Scenario B:
		 446.104 (58.54 )	 433.12733 (54.476)	2.91

Dbench:
Throughput is in MB/sec
NRCLIENTS	 BASE                    BASE+patch            %improvement
               	 mean (sd)               mean (sd)
8       	1.101190  (0.875082) 	1.700395  (0.846809) 	54.4143
16      	1.524312  (0.120354) 	1.477553  (0.058166) 	-3.06755
32        	2.143028  (0.157103) 	2.090307  (0.136778) 	-2.46012

---
 V3 kernel Changes:
 https://lkml.org/lkml/2011/11/30/62
 V2 kernel changes : 
 https://lkml.org/lkml/2011/10/23/207

 Previous discussions : (posted by Srivatsa V).
 https://lkml.org/lkml/2010/7/26/24
 https://lkml.org/lkml/2011/1/19/212
 
 Qemu patch for V3:
 http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg00397.html

 Documentation/virtual/kvm/api.txt        |    7 +
 Documentation/virtual/kvm/cpuid.txt      |    4 +
 Documentation/virtual/kvm/hypercalls.txt |   54 +++++++
 arch/x86/Kconfig                         |    9 +
 arch/x86/include/asm/kvm_para.h          |   16 ++-
 arch/x86/kernel/kvm.c                    |  249 ++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c                       |   37 ++++-
 arch/x86/xen/debugfs.c                   |  104 -------------
 arch/x86/xen/debugfs.h                   |    4 -
 arch/x86/xen/spinlock.c                  |    2 +-
 fs/debugfs/file.c                        |  128 +++++++++++++++
 include/linux/debugfs.h                  |   11 ++
 include/linux/kvm.h                      |    1 +
 include/linux/kvm_host.h                 |    1 +
 include/linux/kvm_para.h                 |    1 +
 15 files changed, 514 insertions(+), 114 deletions(-)

Raghavendra K T

2012-Jan-14 18:25 UTC

head link

[PATCH RFC V4 1/5] debugfs: Add support to print u32 array in debugfs

Add debugfs support to print u32-arrays in debugfs. Move the code from Xen to
debugfs
to make the code common for other users as well.

Signed-off-by: Srivatsa Vaddagiri <vatsa at linux.vnet.ibm.com>
Signed-off-by: Suzuki Poulose <suzuki at in.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt at linux.vnet.ibm.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
---
diff --git a/arch/x86/xen/debugfs.c b/arch/x86/xen/debugfs.c
index 7c0fedd..c8377fb 100644
--- a/arch/x86/xen/debugfs.c
+++ b/arch/x86/xen/debugfs.c
@@ -19,107 +19,3 @@ struct dentry * __init xen_init_debugfs(void)
 	return d_xen_debug;
 }
 
-struct array_data
-{
-	void *array;
-	unsigned elements;
-};
-
-static int u32_array_open(struct inode *inode, struct file *file)
-{
-	file->private_data = NULL;
-	return nonseekable_open(inode, file);
-}
-
-static size_t format_array(char *buf, size_t bufsize, const char *fmt,
-			   u32 *array, unsigned array_size)
-{
-	size_t ret = 0;
-	unsigned i;
-
-	for(i = 0; i < array_size; i++) {
-		size_t len;
-
-		len = snprintf(buf, bufsize, fmt, array[i]);
-		len++;	/* ' ' or '\n' */
-		ret += len;
-
-		if (buf) {
-			buf += len;
-			bufsize -= len;
-			buf[-1] = (i == array_size-1) ? '\n' : ' ';
-		}
-	}
-
-	ret++;		/* \0 */
-	if (buf)
-		*buf = '\0';
-
-	return ret;
-}
-
-static char *format_array_alloc(const char *fmt, u32 *array, unsigned
array_size)
-{
-	size_t len = format_array(NULL, 0, fmt, array, array_size);
-	char *ret;
-
-	ret = kmalloc(len, GFP_KERNEL);
-	if (ret == NULL)
-		return NULL;
-
-	format_array(ret, len, fmt, array, array_size);
-	return ret;
-}
-
-static ssize_t u32_array_read(struct file *file, char __user *buf, size_t len,
-			      loff_t *ppos)
-{
-	struct inode *inode = file->f_path.dentry->d_inode;
-	struct array_data *data = inode->i_private;
-	size_t size;
-
-	if (*ppos == 0) {
-		if (file->private_data) {
-			kfree(file->private_data);
-			file->private_data = NULL;
-		}
-
-		file->private_data = format_array_alloc("%u", data->array,
data->elements);
-	}
-
-	size = 0;
-	if (file->private_data)
-		size = strlen(file->private_data);
-
-	return simple_read_from_buffer(buf, len, ppos, file->private_data, size);
-}
-
-static int xen_array_release(struct inode *inode, struct file *file)
-{
-	kfree(file->private_data);
-
-	return 0;
-}
-
-static const struct file_operations u32_array_fops = {
-	.owner	= THIS_MODULE,
-	.open	= u32_array_open,
-	.release= xen_array_release,
-	.read	= u32_array_read,
-	.llseek = no_llseek,
-};
-
-struct dentry *xen_debugfs_create_u32_array(const char *name, mode_t mode,
-					    struct dentry *parent,
-					    u32 *array, unsigned elements)
-{
-	struct array_data *data = kmalloc(sizeof(*data), GFP_KERNEL);
-
-	if (data == NULL)
-		return NULL;
-
-	data->array = array;
-	data->elements = elements;
-
-	return debugfs_create_file(name, mode, parent, data, &u32_array_fops);
-}
diff --git a/arch/x86/xen/debugfs.h b/arch/x86/xen/debugfs.h
index e281320..12ebf33 100644
--- a/arch/x86/xen/debugfs.h
+++ b/arch/x86/xen/debugfs.h
@@ -3,8 +3,4 @@
 
 struct dentry * __init xen_init_debugfs(void);
 
-struct dentry *xen_debugfs_create_u32_array(const char *name, mode_t mode,
-					    struct dentry *parent,
-					    u32 *array, unsigned elements);
-
 #endif /* _XEN_DEBUGFS_H */
diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index fc506e6..14a8961 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -286,7 +286,7 @@ static int __init xen_spinlock_debugfs(void)
 	debugfs_create_u64("time_blocked", 0444, d_spin_debug,
 			   &spinlock_stats.time_blocked);
 
-	xen_debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug,
+	debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug,
 				     spinlock_stats.histo_spin_blocked, HISTO_BUCKETS + 1);
 
 	return 0;
diff --git a/fs/debugfs/file.c b/fs/debugfs/file.c
index 90f7657..df44ccf 100644
--- a/fs/debugfs/file.c
+++ b/fs/debugfs/file.c
@@ -18,6 +18,7 @@
 #include <linux/pagemap.h>
 #include <linux/namei.h>
 #include <linux/debugfs.h>
+#include <linux/slab.h>
 
 static ssize_t default_read_file(struct file *file, char __user *buf,
 				 size_t count, loff_t *ppos)
@@ -525,3 +526,130 @@ struct dentry *debugfs_create_blob(const char *name,
mode_t mode,
 	return debugfs_create_file(name, mode, parent, blob, &fops_blob);
 }
 EXPORT_SYMBOL_GPL(debugfs_create_blob);
+
+struct array_data {
+	void *array;
+	u32 elements;
+};
+
+static int u32_array_open(struct inode *inode, struct file *file)
+{
+	file->private_data = NULL;
+	return nonseekable_open(inode, file);
+}
+
+static size_t format_array(char *buf, size_t bufsize, const char *fmt,
+			   u32 *array, u32 array_size)
+{
+	size_t ret = 0;
+	u32 i;
+
+	for (i = 0; i < array_size; i++) {
+		size_t len;
+
+		len = snprintf(buf, bufsize, fmt, array[i]);
+		len++;	/* ' ' or '\n' */
+		ret += len;
+
+		if (buf) {
+			buf += len;
+			bufsize -= len;
+			buf[-1] = (i == array_size-1) ? '\n' : ' ';
+		}
+	}
+
+	ret++;		/* \0 */
+	if (buf)
+		*buf = '\0';
+
+	return ret;
+}
+
+static char *format_array_alloc(const char *fmt, u32 *array,
+						u32 array_size)
+{
+	size_t len = format_array(NULL, 0, fmt, array, array_size);
+	char *ret;
+
+	ret = kmalloc(len, GFP_KERNEL);
+	if (ret == NULL)
+		return NULL;
+
+	format_array(ret, len, fmt, array, array_size);
+	return ret;
+}
+
+static ssize_t u32_array_read(struct file *file, char __user *buf, size_t len,
+			      loff_t *ppos)
+{
+	struct inode *inode = file->f_path.dentry->d_inode;
+	struct array_data *data = inode->i_private;
+	size_t size;
+
+	if (*ppos == 0) {
+		if (file->private_data) {
+			kfree(file->private_data);
+			file->private_data = NULL;
+		}
+
+		file->private_data = format_array_alloc("%u", data->array,
+							      data->elements);
+	}
+
+	size = 0;
+	if (file->private_data)
+		size = strlen(file->private_data);
+
+	return simple_read_from_buffer(buf, len, ppos,
+					file->private_data, size);
+}
+
+static int u32_array_release(struct inode *inode, struct file *file)
+{
+	kfree(file->private_data);
+
+	return 0;
+}
+
+static const struct file_operations u32_array_fops = {
+	.owner	 = THIS_MODULE,
+	.open	 = u32_array_open,
+	.release = u32_array_release,
+	.read	 = u32_array_read,
+	.llseek  = no_llseek,
+};
+
+/**
+ * debugfs_create_u32_array - create a debugfs file that is used to read u32
+ * array.
+ * @name: a pointer to a string containing the name of the file to create.
+ * @mode: the permission that the file should have.
+ * @parent: a pointer to the parent dentry for this file.  This should be a
+ *          directory dentry if set.  If this parameter is %NULL, then the
+ *          file will be created in the root of the debugfs filesystem.
+ * @array: u32 array that provides data.
+ * @elements: total number of elements in the array.
+ *
+ * This function creates a file in debugfs with the given name that exports
+ * @array as data. If the @mode variable is so set it can be read from.
+ * Writing is not supported. Seek within the file is also not supported.
+ * Once array is created its size can not be changed.
+ *
+ * The function returns a pointer to dentry on success. If debugfs is not
+ * enabled in the kernel, the value -%ENODEV will be returned.
+ */
+struct dentry *debugfs_create_u32_array(const char *name, mode_t mode,
+					    struct dentry *parent,
+					    u32 *array, u32 elements)
+{
+	struct array_data *data = kmalloc(sizeof(*data), GFP_KERNEL);
+
+	if (data == NULL)
+		return NULL;
+
+	data->array = array;
+	data->elements = elements;
+
+	return debugfs_create_file(name, mode, parent, data, &u32_array_fops);
+}
+EXPORT_SYMBOL_GPL(debugfs_create_u32_array);
diff --git a/include/linux/debugfs.h b/include/linux/debugfs.h
index e7d9b20..253e2fb 100644
--- a/include/linux/debugfs.h
+++ b/include/linux/debugfs.h
@@ -74,6 +74,10 @@ struct dentry *debugfs_create_blob(const char *name, mode_t
mode,
 				  struct dentry *parent,
 				  struct debugfs_blob_wrapper *blob);
 
+struct dentry *debugfs_create_u32_array(const char *name, mode_t mode,
+					struct dentry *parent,
+					u32 *array, u32 elements);
+
 bool debugfs_initialized(void);
 
 #else
@@ -193,6 +197,13 @@ static inline bool debugfs_initialized(void)
 	return false;
 }
 
+struct dentry *debugfs_create_u32_array(const char *name, mode_t mode,
+					struct dentry *parent,
+					u32 *array, u32 elements)
+{
+	return ERR_PTR(-ENODEV);
+}
+
 #endif
 
 #endif

Raghavendra K T

2012-Jan-14 18:25 UTC

head link

[PATCH RFC V4 2/5] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks

Add a hypercall to KVM hypervisor to support pv-ticketlocks 

KVM_HC_KICK_CPU allows the calling vcpu to kick another vcpu out of halt state.
    
The presence of these hypercalls is indicated to guest via
KVM_FEATURE_PVLOCK_KICK/KVM_CAP_PVLOCK_KICK.

Qemu needs a corresponding patch to pass up the presence of this feature to 
guest via cpuid. Patch to qemu will be sent separately.

Signed-off-by: Srivatsa Vaddagiri <vatsa at linux.vnet.ibm.com>
Signed-off-by: Suzuki Poulose <suzuki at in.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt at linux.vnet.ibm.com>
---
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 734c376..7a94987 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -16,12 +16,14 @@
 #define KVM_FEATURE_CLOCKSOURCE		0
 #define KVM_FEATURE_NOP_IO_DELAY	1
 #define KVM_FEATURE_MMU_OP		2
+
 /* This indicates that the new set of kvmclock msrs
  * are available. The use of 0x11 and 0x12 is deprecated
  */
 #define KVM_FEATURE_CLOCKSOURCE2        3
 #define KVM_FEATURE_ASYNC_PF		4
 #define KVM_FEATURE_STEAL_TIME		5
+#define KVM_FEATURE_PVLOCK_KICK		6
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4c938da..c7b05fc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2099,6 +2099,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_XSAVE:
 	case KVM_CAP_ASYNC_PF:
 	case KVM_CAP_GET_TSC_KHZ:
+	case KVM_CAP_PVLOCK_KICK:
 		r = 1;
 		break;
 	case KVM_CAP_COALESCED_MMIO:
@@ -2576,7 +2577,8 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry,
u32 function,
 			     (1 << KVM_FEATURE_NOP_IO_DELAY) |
 			     (1 << KVM_FEATURE_CLOCKSOURCE2) |
 			     (1 << KVM_FEATURE_ASYNC_PF) |
-			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
+			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
+			     (1 << KVM_FEATURE_PVLOCK_KICK);
 
 		if (sched_info_on())
 			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
@@ -5304,6 +5306,29 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+/*
+ * kvm_pv_kick_cpu_op:  Kick a vcpu.
+ *
+ * @apicid - apicid of vcpu to be kicked.
+ */
+static void kvm_pv_kick_cpu_op(struct kvm *kvm, int apicid)
+{
+	struct kvm_vcpu *vcpu = NULL;
+	int i;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		if (!kvm_apic_present(vcpu))
+			continue;
+
+		if (kvm_apic_match_dest(vcpu, 0, 0, apicid, 0))
+			break;
+	}
+	if (vcpu) {
+		kvm_make_request(KVM_REQ_PVLOCK_KICK, vcpu);
+		kvm_vcpu_kick(vcpu);
+	}
+}
+
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 {
 	unsigned long nr, a0, a1, a2, a3, ret;
@@ -5340,6 +5365,10 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 	case KVM_HC_MMU_OP:
 		r = kvm_pv_mmu_op(vcpu, a0, hc_gpa(vcpu, a1, a2), &ret);
 		break;
+	case KVM_HC_KICK_CPU:
+		kvm_pv_kick_cpu_op(vcpu->kvm, a0);
+		ret = 0;
+		break;
 	default:
 		ret = -KVM_ENOSYS;
 		break;
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 68e67e5..63fb6b0 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -558,6 +558,7 @@ struct kvm_ppc_pvinfo {
 #define KVM_CAP_PPC_PAPR 68
 #define KVM_CAP_S390_GMAP 71
 #define KVM_CAP_TSC_DEADLINE_TIMER 72
+#define KVM_CAP_PVLOCK_KICK 73
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d526231..3b1ae7b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -50,6 +50,7 @@
 #define KVM_REQ_APF_HALT          12
 #define KVM_REQ_STEAL_UPDATE      13
 #define KVM_REQ_NMI               14
+#define KVM_REQ_PVLOCK_KICK       15
 
 #define KVM_USERSPACE_IRQ_SOURCE_ID	0
 
diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h
index 47a070b..19f10bd 100644
--- a/include/linux/kvm_para.h
+++ b/include/linux/kvm_para.h
@@ -19,6 +19,7 @@
 #define KVM_HC_MMU_OP			2
 #define KVM_HC_FEATURES			3
 #define KVM_HC_PPC_MAP_MAGIC_PAGE	4
+#define KVM_HC_KICK_CPU			5
 
 /*
  * hypercalls use architecture specific

Raghavendra K T

2012-Jan-14 18:26 UTC

head link

[PATCH RFC V4 3/5] kvm guest : Added configuration support to enable debug information for KVM Guests

Added configuration support to enable debug information
for KVM Guests in debugfs
    
Signed-off-by: Srivatsa Vaddagiri <vatsa at linux.vnet.ibm.com>
Signed-off-by: Suzuki Poulose <suzuki at in.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt at linux.vnet.ibm.com>
---
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 72e8b64..344a7db 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -565,6 +565,15 @@ config KVM_GUEST
 	  This option enables various optimizations for running under the KVM
 	  hypervisor.
 
+config KVM_DEBUG_FS
+	bool "Enable debug information for KVM Guests in debugfs"
+	depends on KVM_GUEST && DEBUG_FS
+	default n
+	---help---
+	  This option enables collection of various statistics for KVM guest.
+   	  Statistics are displayed in debugfs filesystem. Enabling this option
+	  may incur significant overhead.
+
 source "arch/x86/lguest/Kconfig"
 
 config PARAVIRT

Raghavendra K T

2012-Jan-14 18:26 UTC

head link

[PATCH RFC V4 4/5] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor

Extends Linux guest running on KVM hypervisor to support pv-ticketlocks. 

During smp_boot_cpus  paravirtualied KVM guest detects if the hypervisor has
required feature (KVM_FEATURE_PVLOCK_KICK) to support pv-ticketlocks. If so,
 support for pv-ticketlocks is registered via pv_lock_ops.

Use KVM_HC_KICK_CPU hypercall to wakeup waiting/halted vcpu.

Signed-off-by: Srivatsa Vaddagiri <vatsa at linux.vnet.ibm.com>
Signed-off-by: Suzuki Poulose <suzuki at in.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt at linux.vnet.ibm.com>
---
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 7a94987..cf5327c 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -195,10 +195,20 @@ void kvm_async_pf_task_wait(u32 token);
 void kvm_async_pf_task_wake(u32 token);
 u32 kvm_read_and_reset_pf_reason(void);
 extern void kvm_disable_steal_time(void);
-#else
-#define kvm_guest_init() do { } while (0)
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+void __init kvm_spinlock_init(void);
+#else /* CONFIG_PARAVIRT_SPINLOCKS */
+static void kvm_spinlock_init(void)
+{
+}
+#endif /* CONFIG_PARAVIRT_SPINLOCKS */
+
+#else /* CONFIG_KVM_GUEST */
+#define kvm_guest_init() do {} while (0)
 #define kvm_async_pf_task_wait(T) do {} while(0)
 #define kvm_async_pf_task_wake(T) do {} while(0)
+
 static inline u32 kvm_read_and_reset_pf_reason(void)
 {
 	return 0;
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index a9c2116..ec55a0b 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -33,6 +33,7 @@
 #include <linux/sched.h>
 #include <linux/slab.h>
 #include <linux/kprobes.h>
+#include <linux/debugfs.h>
 #include <asm/timer.h>
 #include <asm/cpu.h>
 #include <asm/traps.h>
@@ -545,6 +546,7 @@ static void __init kvm_smp_prepare_boot_cpu(void)
 #endif
 	kvm_guest_cpu_init();
 	native_smp_prepare_boot_cpu();
+	kvm_spinlock_init();
 }
 
 static void __cpuinit kvm_guest_cpu_online(void *dummy)
@@ -627,3 +629,250 @@ static __init int activate_jump_labels(void)
 	return 0;
 }
 arch_initcall(activate_jump_labels);
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+
+enum kvm_contention_stat {
+	TAKEN_SLOW,
+	TAKEN_SLOW_PICKUP,
+	RELEASED_SLOW,
+	RELEASED_SLOW_KICKED,
+	NR_CONTENTION_STATS
+};
+
+#ifdef CONFIG_KVM_DEBUG_FS
+
+static struct kvm_spinlock_stats
+{
+	u32 contention_stats[NR_CONTENTION_STATS];
+
+#define HISTO_BUCKETS	30
+	u32 histo_spin_blocked[HISTO_BUCKETS+1];
+
+	u64 time_blocked;
+} spinlock_stats;
+
+static u8 zero_stats;
+
+static inline void check_zero(void)
+{
+	u8 ret;
+	u8 old = ACCESS_ONCE(zero_stats);
+	if (unlikely(old)) {
+		ret = cmpxchg(&zero_stats, old, 0);
+		/* This ensures only one fellow resets the stat */
+		if (ret == old)
+			memset(&spinlock_stats, 0, sizeof(spinlock_stats));
+	}
+}
+
+static inline void add_stats(enum kvm_contention_stat var, u32 val)
+{
+	check_zero();
+	spinlock_stats.contention_stats[var] += val;
+}
+
+
+static inline u64 spin_time_start(void)
+{
+	return sched_clock();
+}
+
+static void __spin_time_accum(u64 delta, u32 *array)
+{
+	unsigned index = ilog2(delta);
+
+	check_zero();
+
+	if (index < HISTO_BUCKETS)
+		array[index]++;
+	else
+		array[HISTO_BUCKETS]++;
+}
+
+static inline void spin_time_accum_blocked(u64 start)
+{
+	u32 delta = sched_clock() - start;
+
+	__spin_time_accum(delta, spinlock_stats.histo_spin_blocked);
+	spinlock_stats.time_blocked += delta;
+}
+
+static struct dentry *d_spin_debug;
+static struct dentry *d_kvm_debug;
+
+struct dentry *kvm_init_debugfs(void)
+{
+	d_kvm_debug = debugfs_create_dir("kvm", NULL);
+	if (!d_kvm_debug)
+		printk(KERN_WARNING "Could not create 'kvm' debugfs
directory\n");
+
+	return d_kvm_debug;
+}
+
+static int __init kvm_spinlock_debugfs(void)
+{
+	struct dentry *d_kvm = kvm_init_debugfs();
+
+	if (d_kvm == NULL)
+		return -ENOMEM;
+
+	d_spin_debug = debugfs_create_dir("spinlocks", d_kvm);
+
+	debugfs_create_u8("zero_stats", 0644, d_spin_debug,
&zero_stats);
+
+	debugfs_create_u32("taken_slow", 0444, d_spin_debug,
+		   &spinlock_stats.contention_stats[TAKEN_SLOW]);
+	debugfs_create_u32("taken_slow_pickup", 0444, d_spin_debug,
+		   &spinlock_stats.contention_stats[TAKEN_SLOW_PICKUP]);
+
+	debugfs_create_u32("released_slow", 0444, d_spin_debug,
+		   &spinlock_stats.contention_stats[RELEASED_SLOW]);
+	debugfs_create_u32("released_slow_kicked", 0444, d_spin_debug,
+		   &spinlock_stats.contention_stats[RELEASED_SLOW_KICKED]);
+
+	debugfs_create_u64("time_blocked", 0444, d_spin_debug,
+			   &spinlock_stats.time_blocked);
+
+	debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug,
+		     spinlock_stats.histo_spin_blocked, HISTO_BUCKETS + 1);
+
+	return 0;
+}
+fs_initcall(kvm_spinlock_debugfs);
+#else  /* !CONFIG_KVM_DEBUG_FS */
+#define TIMEOUT			(1 << 10)
+static inline void add_stats(enum kvm_contention_stat var, u32 val)
+{
+}
+
+static inline u64 spin_time_start(void)
+{
+	return 0;
+}
+
+static inline void spin_time_accum_blocked(u64 start)
+{
+}
+#endif  /* CONFIG_KVM_DEBUG_FS */
+
+struct kvm_lock_waiting {
+	struct arch_spinlock *lock;
+	__ticket_t want;
+};
+
+/* cpus 'waiting' on a spinlock to become available */
+static cpumask_t waiting_cpus;
+
+/* Track spinlock on which a cpu is waiting */
+static DEFINE_PER_CPU(struct kvm_lock_waiting, lock_waiting);
+
+static void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
+{
+	struct kvm_lock_waiting *w = &__get_cpu_var(lock_waiting);
+	int cpu = smp_processor_id();
+	u64 start;
+	unsigned long flags;
+
+	start = spin_time_start();
+
+	/*
+	 * Make sure an interrupt handler can't upset things in a
+	 * partially setup state.
+	 */
+	local_irq_save(flags);
+
+	/*
+	 * The ordering protocol on this is that the "lock" pointer
+	 * may only be set non-NULL if the "want" ticket is correct.
+	 * If we're updating "want", we must first clear
"lock".
+	 */
+	w->lock = NULL;
+	smp_wmb();
+	w->want = want;
+	smp_wmb();
+	w->lock = lock;
+
+	add_stats(TAKEN_SLOW, 1);
+
+	/*
+	 * This uses set_bit, which is atomic but we should not rely on its
+	 * reordering gurantees. So barrier is needed after this call.
+	 */
+	cpumask_set_cpu(cpu, &waiting_cpus);
+
+	barrier();
+
+	/*
+	 * Mark entry to slowpath before doing the pickup test to make
+	 * sure we don't deadlock with an unlocker.
+	 */
+	__ticket_enter_slowpath(lock);
+
+	/*
+	 * check again make sure it didn't become free while
+	 * we weren't looking.
+	 */
+	if (ACCESS_ONCE(lock->tickets.head) == want) {
+		add_stats(TAKEN_SLOW_PICKUP, 1);
+		goto out;
+	}
+
+	/* Allow interrupts while blocked */
+	local_irq_restore(flags);
+
+	/* halt until it's our turn and kicked. */
+	halt();
+
+	local_irq_save(flags);
+out:
+	cpumask_clear_cpu(cpu, &waiting_cpus);
+	w->lock = NULL;
+	local_irq_restore(flags);
+	spin_time_accum_blocked(start);
+}
+PV_CALLEE_SAVE_REGS_THUNK(kvm_lock_spinning);
+
+/* Kick a cpu by its apicid*/
+static inline void kvm_kick_cpu(int apicid)
+{
+	kvm_hypercall1(KVM_HC_KICK_CPU, apicid);
+}
+
+/* Kick vcpu waiting on @lock->head to reach value @ticket */
+static void kvm_unlock_kick(struct arch_spinlock *lock, __ticket_t ticket)
+{
+	int cpu;
+	int apicid;
+
+	add_stats(RELEASED_SLOW, 1);
+
+	for_each_cpu(cpu, &waiting_cpus) {
+		const struct kvm_lock_waiting *w = &per_cpu(lock_waiting, cpu);
+		if (ACCESS_ONCE(w->lock) == lock &&
+		    ACCESS_ONCE(w->want) == ticket) {
+			add_stats(RELEASED_SLOW_KICKED, 1);
+			apicid = per_cpu(x86_cpu_to_apicid, cpu);
+			kvm_kick_cpu(apicid);
+			break;
+		}
+	}
+}
+
+/*
+ * Setup pv_lock_ops to exploit KVM_FEATURE_PVLOCK_KICK if present.
+ */
+void __init kvm_spinlock_init(void)
+{
+	if (!kvm_para_available())
+		return;
+	/* Does host kernel support KVM_FEATURE_PVLOCK_KICK? */
+	if (!kvm_para_has_feature(KVM_FEATURE_PVLOCK_KICK))
+		return;
+
+	jump_label_inc(&paravirt_ticketlocks_enabled);
+
+	pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(kvm_lock_spinning);
+	pv_lock_ops.unlock_kick = kvm_unlock_kick;
+}
+#endif	/* CONFIG_PARAVIRT_SPINLOCKS */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c7b05fc..4d7a950 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5754,8 +5754,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 
 	local_irq_disable();
 
-	if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
-	    || need_resched() || signal_pending(current)) {
+	if (vcpu->mode == EXITING_GUEST_MODE
+		 || (vcpu->requests & ~(1UL<<KVM_REQ_PVLOCK_KICK))
+		 || need_resched() || signal_pending(current)) {
 		vcpu->mode = OUTSIDE_GUEST_MODE;
 		smp_wmb();
 		local_irq_enable();
@@ -6711,6 +6712,7 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
 		!vcpu->arch.apf.halted)
 		|| !list_empty_careful(&vcpu->async_pf.done)
 		|| vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED
+		|| kvm_check_request(KVM_REQ_PVLOCK_KICK, vcpu)
 		|| atomic_read(&vcpu->arch.nmi_queued) ||
 		(kvm_arch_interrupt_allowed(vcpu) &&
 		 kvm_cpu_has_interrupt(vcpu));

Raghavendra K T

2012-Jan-14 18:27 UTC

head link

[PATCH RFC V4 5/5] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock

Add Documentation on CPUID, KVM_CAP_PVLOCK_KICK, and Hypercalls supported.

KVM_HC_KICK_CPU  hypercall added to wakeup halted vcpu in
paravirtual spinlock enabled guest.

KVM_FEATURE_PVLOCK_KICK enables guest to check whether pv spinlock can
be enabled in guest. support in host is queried via
ioctl(KVM_CHECK_EXTENSION, KVM_CAP_PVLOCK_KICK)

A minimal Documentation and template is added for hypercalls.

Signed-off-by: Raghavendra K T <raghavendra.kt at linux.vnet.ibm.com>
Signed-off-by: Srivatsa Vaddagiri <vatsa at linux.vnet.ibm.com>
---
diff --git a/Documentation/virtual/kvm/api.txt
b/Documentation/virtual/kvm/api.txt
index e2a4b52..1583bc7 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1109,6 +1109,13 @@ support.  Instead it is reported via
 if that returns true and you use KVM_CREATE_IRQCHIP, or if you emulate the
 feature in userspace, then you can enable the feature for KVM_SET_CPUID2.
 
+Paravirtualized ticket spinlocks can be enabled in guest by checking whether
+support exists in host via,
+
+  ioctl(KVM_CHECK_EXTENSION, KVM_CAP_PVLOCK_KICK)
+
+if this call return true, guest can use the feature.
+
 4.47 KVM_PPC_GET_PVINFO
 
 Capability: KVM_CAP_PPC_GET_PVINFO
diff --git a/Documentation/virtual/kvm/cpuid.txt
b/Documentation/virtual/kvm/cpuid.txt
index 8820685..c7fc0da 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -39,6 +39,10 @@ KVM_FEATURE_CLOCKSOURCE2           ||     3 || kvmclock
available at msrs
 KVM_FEATURE_ASYNC_PF               ||     4 || async pf can be enabled by
                                    ||       || writing to msr 0x4b564d02
 ------------------------------------------------------------------------------
+KVM_FEATURE_PVLOCK_KICK            ||     6 || guest checks this feature bit
+                                   ||       || before enabling paravirtualized
+                                   ||       || spinlock support.
+------------------------------------------------------------------------------
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
                                    ||       || per-cpu warps are expected in
                                    ||       || kvmclock.
diff --git a/Documentation/virtual/kvm/hypercalls.txt
b/Documentation/virtual/kvm/hypercalls.txt
new file mode 100644
index 0000000..7872da5
--- /dev/null
+++ b/Documentation/virtual/kvm/hypercalls.txt
@@ -0,0 +1,54 @@
+KVM Hypercalls Documentation
+==========================+Template for documentation is
+The documenenation for hypercalls should inlcude
+1. Hypercall name, value.
+2. Architecture(s)
+3. Purpose
+
+
+1. KVM_HC_VAPIC_POLL_IRQ
+------------------------
+value: 1
+Architecture: x86
+Purpose:
+
+2. KVM_HC_MMU_OP
+------------------------
+value: 2
+Architecture: x86
+Purpose: Support MMU operations such as writing to PTE,
+flushing TLB, release PT.
+
+3. KVM_HC_FEATURES
+------------------------
+value: 3
+Architecture: PPC
+Purpose:
+
+4. KVM_HC_PPC_MAP_MAGIC_PAGE
+------------------------
+value: 4
+Architecture: PPC
+Purpose: To enable communication between the hypervisor and guest there is a
+new shared page that contains parts of supervisor visible register state.
+The guest can map this shared page using this hypercall.
+
+5. KVM_HC_KICK_CPU
+------------------------
+value: 5
+Architecture: x86
+Purpose: Hypercall used to wakeup a vcpu from HLT state
+
+Usage example : A vcpu of a paravirtualized guest that is busywaiting in guest
+kernel mode for an event to occur (ex: a spinlock to become available)
+can execute HLT instruction once it has busy-waited for more than a
+threshold time-interval. Execution of HLT instruction would cause
+the hypervisor to put the vcpu to sleep (unless yield_on_hlt=0) until occurence
+of an appropriate event. Another vcpu of the same guest can wakeup the sleeping
+vcpu by issuing KVM_HC_KICK_CPU hypercall, specifying APIC ID of the vcpu to be
+wokenup.
+
+TODO:
+1. more information on input and output needed?
+2. Add more detail to purpose of hypercalls.

Alexander Graf

2012-Jan-16 03:57 UTC

head link

[PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests

On 14.01.2012, at 19:25, Raghavendra K T wrote:
> The 5-patch series to follow this email extends KVM-hypervisor and Linux
guest
> running on KVM-hypervisor to support pv-ticket spinlocks, based on
Xen's implementation.
> 
> One hypercall is introduced in KVM hypervisor,that allows a vcpu to kick
> another vcpu out of halt state.
> The blocking of vcpu is done using halt() in (lock_spinning) slowpath.
Is the code for this even upstream? Prerequisite series seem to have been posted
by Jeremy, but they didn't appear to have made it in yet.

Either way, thinking about this I stumbled over the following passage of his
patch:
> +               unsigned count = SPIN_THRESHOLD;
> +
> +               do {
> +                       if (inc.head == inc.tail)
> +                               goto out;
> +                       cpu_relax();
> +                       inc.head = ACCESS_ONCE(lock->tickets.head);
> +               } while (--count);
> +               __ticket_lock_spinning(lock, inc.tail);

That means we're spinning for n cycles, then notify the spinlock holder that
we'd like to get kicked and go sleeping. While I'm pretty sure that it
improves the situation, it doesn't solve all of the issues we have.

Imagine we have an idle host. All vcpus can freely run and everyone can fetch
the lock as fast as on real machines. We don't need to / want to go to sleep
here. Locks that take too long are bugs that need to be solved on real hw just
as well, so all we do is possibly incur overhead.

Imagine we have a contended host. Every vcpu gets at most 10% of a real
CPU's runtime. So chances are 1:10 that you're currently running while
you need to be. In such a setup, it's probably a good idea to be very
pessimistic. Try to fetch the lock for 100 cycles and then immediately make room
for all the other VMs that have real work going on!

So what I'm trying to get to is that if we had a hypervisor settable spin
threshold, we could adjust it according to the host's load, getting VMs to
behave differently on different (guest invisible) circumstances.

Speaking of which - don't we have spin lock counters in the CPUs now? I
thought we could set intercepts that notify us when the guest issues too many
repz nops or whatever the typical spinlock identifier was. Can't we reuse
that and just interrupt the guest if we see this with a special KVM interrupt
that kicks off the internal spin lock waiting code? That way we don't slow
down all those bare metal boxes.

Speaking of which - have you benchmarked performance degradation of pv ticket
locks on bare metal? Last time I checked, enabling all the PV ops did incur
significant slowdown which is why I went though the work to split the individual
pv ops features up to only enable a few for KVM guests.
> 
> Changes in V4:
> - reabsed to 3.2.0 pre.
> - use APIC ID for kicking the vcpu and use kvm_apic_match_dest for
matching. (Avi)
> - fold vcpu->kicked flag into vcpu->requests (KVM_REQ_PVLOCK_KICK)
and related
>  changes for UNHALT path to make pv ticket spinlock migration friendly.
(Avi, Marcello)
> - Added Documentation for CPUID, Hypercall (KVM_HC_KICK_CPU)
>  and capabilty (KVM_CAP_PVLOCK_KICK) (Avi)
> - Remove unneeded kvm_arch_vcpu_ioctl_set_mpstate call. (Marcello)
> - cumulative variable type changed (int ==> u32) in add_stat (Konrad)
> - remove unneeded kvm_guest_init for !CONFIG_KVM_GUEST case
> 
> Changes in V3:
> - rebased to 3.2-rc1
> - use halt() instead of wait for kick hypercall.
> - modify kick hyper call to do wakeup halted vcpu.
> - hook kvm_spinlock_init to smp_prepare_cpus call (moved the call out of
head##.c).
> - fix the potential race when zero_stat is read.
> - export debugfs_create_32 and add documentation to API.
> - use static inline and enum instead of ADDSTAT macro. 
> - add  barrier() in after setting kick_vcpu.
> - empty static inline function for kvm_spinlock_init.
> - combine the patches one and two readuce overhead.
> - make KVM_DEBUGFS depends on DEBUGFS.
> - include debugfs header unconditionally.
> 
> Changes in V2:
> - rebased patchesto -rc9
> - synchronization related changes based on Jeremy's changes 
> (Jeremy Fitzhardinge <jeremy.fitzhardinge at citrix.com>) pointed by 
> Stephan Diestelhorst <stephan.diestelhorst at amd.com>
> - enabling 32 bit guests
> - splitted patches into two more chunks
> 
> Srivatsa Vaddagiri, Suzuki Poulose, Raghavendra K T (5): 
>  Add debugfs support to print u32-arrays in debugfs
>  Add a hypercall to KVM hypervisor to support pv-ticketlocks
>  Added configuration support to enable debug information for KVM Guests
>  pv-ticketlocks support for linux guests running on KVM hypervisor
>  Add documentation on Hypercalls and features used for PV spinlock
> 
> Test Set up :
> The BASE patch is pre 3.2.0 + Jeremy's following patches.
> xadd (https://lkml.org/lkml/2011/10/4/328)
> x86/ticketlocklock  (https://lkml.org/lkml/2011/10/12/496).
> Kernel for host/guest : 3.2.0 + Jeremy's xadd, pv spinlock patches as
BASE
> (Note:locked add change is not taken yet)
> 
> Results:
> The performance gain is mainly because of reduced busy-wait time.
> From the results we can see that patched kernel performance is similar to
> BASE when there is no lock contention. But once we start seeing more
> contention, patched kernel outperforms BASE (non PLE).
> On PLE machine we do not see greater performance improvement because of PLE
> complimenting halt()
> 
> 3 guests with 8VCPU, 4GB RAM, 1 used for kernbench
> (kernbench -f -H -M -o 20) other for cpuhog (shell script while
> true with an instruction)
> 
> scenario A: unpinned
> 
> 1x: no hogs
> 2x: 8hogs in one guest
> 3x: 8hogs each in two guest
> 
> scenario B: unpinned, run kernbench on all the guests no hogs.
> 
> Dbench on PLE machine:
> dbench run on all the guest simultaneously with
> dbench --warmup=30 -t 120 with NRCLIENTS=(8/16/32).
> 
> Result for Non PLE machine :
> ===========================> Machine : IBM xSeries with Intel(R) Xeon(R)
x5570 2.93GHz CPU with 8 core , 64GB RAM
> 		 BASE                    BASE+patch            %improvement
> 		 mean (sd)               mean (sd)
> Scenario A:
> case 1x:	 164.233 (16.5506) 	 163.584 (15.4598 	0.39517
> case 2x:	 897.654 (543.993) 	 328.63 (103.771) 	63.3901
> case 3x:	 2855.73 (2201.41) 	 315.029 (111.854) 	88.9685
> 
> Dbench:
> Throughput is in MB/sec
> NRCLIENTS	 BASE                    BASE+patch            %improvement
>               	 mean (sd)               mean (sd)
> 8       	1.774307  (0.061361) 	1.725667  (0.034644) 	-2.74135
> 16      	1.445967  (0.044805) 	1.463173  (0.094399) 	1.18993
> 32        	2.136667  (0.105717) 	2.193792  (0.129357) 	2.67356
> 
> Result for PLE machine:
> =====================> Machine : IBM xSeries with Intel(R) Xeon(R) 
X7560 2.27GHz CPU with 32/64 core, with 8
>         online cores and 4*64GB RAM
> 
> Kernbench:
> 		 BASE                    BASE+patch            %improvement
> 		 mean (sd)               mean (sd)
> Scenario A:	 			
> case 1x:	 161.263 (56.518) 	 159.635 (40.5621) 	1.00953
> case 2x:	 190.748 (61.2745) 	 190.606 (54.4766) 	0.0744438
> case 3x:	 227.378 (100.215) 	 225.442 (92.0809) 	0.851446
> 
> Scenario B:
> 		 446.104 (58.54 )	 433.12733 (54.476)	2.91
> 
> Dbench:
> Throughput is in MB/sec
> NRCLIENTS	 BASE                    BASE+patch            %improvement
>               	 mean (sd)               mean (sd)
> 8       	1.101190  (0.875082) 	1.700395  (0.846809) 	54.4143
> 16      	1.524312  (0.120354) 	1.477553  (0.058166) 	-3.06755
> 32        	2.143028  (0.157103) 	2.090307  (0.136778) 	-2.46012
So on a very contended system we're actually slower? Is this expected?


Alex

Avi Kivity

2012-Jan-16 09:00 UTC

head link

[PATCH RFC V4 5/5] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock

On 01/14/2012 08:27 PM, Raghavendra K T wrote:> +
> +5. KVM_HC_KICK_CPU
> +------------------------
> +value: 5
> +Architecture: x86
> +Purpose: Hypercall used to wakeup a vcpu from HLT state
> +
> +Usage example : A vcpu of a paravirtualized guest that is busywaiting in
guest
> +kernel mode for an event to occur (ex: a spinlock to become available)
> +can execute HLT instruction once it has busy-waited for more than a
> +threshold time-interval. Execution of HLT instruction would cause
> +the hypervisor to put the vcpu to sleep (unless yield_on_hlt=0) until
occurence
> +of an appropriate event. Another vcpu of the same guest can wakeup the
sleeping
> +vcpu by issuing KVM_HC_KICK_CPU hypercall, specifying APIC ID of the vcpu
to be
> +wokenup.
Wait, what happens with yield_on_hlt=0?  Will the hypercall work as
advertised?
> +
> +TODO:
> +1. more information on input and output needed?
> +2. Add more detail to purpose of hypercalls.
>

-- 
error compiling committee.c: too many arguments to function

Avi Kivity

2012-Jan-16 09:03 UTC

head link

[PATCH RFC V4 2/5] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks

On 01/14/2012 08:25 PM, Raghavendra K T wrote:> Add a hypercall to KVM hypervisor to support pv-ticketlocks 
>
> KVM_HC_KICK_CPU allows the calling vcpu to kick another vcpu out of halt
state.
>     
> The presence of these hypercalls is indicated to guest via
> KVM_FEATURE_PVLOCK_KICK/KVM_CAP_PVLOCK_KICK.
>
> Qemu needs a corresponding patch to pass up the presence of this feature to
> guest via cpuid. Patch to qemu will be sent separately.
No need to discuss qemu in a kernel patch.
>  
> +/*
> + * kvm_pv_kick_cpu_op:  Kick a vcpu.
> + *
> + * @apicid - apicid of vcpu to be kicked.
> + */
> +static void kvm_pv_kick_cpu_op(struct kvm *kvm, int apicid)
> +{
> +	struct kvm_vcpu *vcpu = NULL;
> +	int i;
> +
> +	kvm_for_each_vcpu(i, vcpu, kvm) {
> +		if (!kvm_apic_present(vcpu))
> +			continue;
> +
> +		if (kvm_apic_match_dest(vcpu, 0, 0, apicid, 0))
> +			break;
> +	}
> +	if (vcpu) {
> +		kvm_make_request(KVM_REQ_PVLOCK_KICK, vcpu);
> +		kvm_vcpu_kick(vcpu);
> +	}
> +}
> +
The code that handles KVM_REQ_PVLOCK_KICK needs to be in this patch.


-- 
error compiling committee.c: too many arguments to function

Avi Kivity

2012-Jan-16 09:05 UTC

head link

[PATCH RFC V4 4/5] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor

On 01/14/2012 08:26 PM, Raghavendra K T wrote:> Extends Linux guest running on KVM hypervisor to support pv-ticketlocks. 
>
> During smp_boot_cpus  paravirtualied KVM guest detects if the hypervisor
has
> required feature (KVM_FEATURE_PVLOCK_KICK) to support pv-ticketlocks. If
so,
>  support for pv-ticketlocks is registered via pv_lock_ops.
>
> Use KVM_HC_KICK_CPU hypercall to wakeup waiting/halted vcpu.
> +
> +	debugfs_create_u8("zero_stats", 0644, d_spin_debug,
&zero_stats);
> +
> +	debugfs_create_u32("taken_slow", 0444, d_spin_debug,
> +		   &spinlock_stats.contention_stats[TAKEN_SLOW]);
> +	debugfs_create_u32("taken_slow_pickup", 0444, d_spin_debug,
> +		   &spinlock_stats.contention_stats[TAKEN_SLOW_PICKUP]);
> +
> +	debugfs_create_u32("released_slow", 0444, d_spin_debug,
> +		   &spinlock_stats.contention_stats[RELEASED_SLOW]);
> +	debugfs_create_u32("released_slow_kicked", 0444, d_spin_debug,
> +		   &spinlock_stats.contention_stats[RELEASED_SLOW_KICKED]);
> +
> +	debugfs_create_u64("time_blocked", 0444, d_spin_debug,
> +			   &spinlock_stats.time_blocked);
> +
> +	debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug,
> +		     spinlock_stats.histo_spin_blocked, HISTO_BUCKETS + 1);
> +
>
Please drop all of these and replace with tracepoints in the appropriate
spots.  Everything else (including the historgram) can be reconstructed
the tracepoints in userspace.

-- 
error compiling committee.c: too many arguments to function

Raghavendra K T

2012-Jan-16 09:55 UTC

head link

[PATCH RFC V4 2/5] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks

On 01/16/2012 02:33 PM, Avi Kivity wrote:>> +/*
>> + * kvm_pv_kick_cpu_op:  Kick a vcpu.
>> + *
>> + * @apicid - apicid of vcpu to be kicked.
>> + */
>> +static void kvm_pv_kick_cpu_op(struct kvm *kvm, int apicid)
>> +{
>> +	struct kvm_vcpu *vcpu = NULL;
>> +	int i;
>> +
>> +	kvm_for_each_vcpu(i, vcpu, kvm) {
>> +		if (!kvm_apic_present(vcpu))
>> +			continue;
>> +
>> +		if (kvm_apic_match_dest(vcpu, 0, 0, apicid, 0))
>> +			break;
>> +	}
>> +	if (vcpu) {
>> +		kvm_make_request(KVM_REQ_PVLOCK_KICK, vcpu);
>> +		kvm_vcpu_kick(vcpu);
>> +	}
>> +}
>> +
>
> The code that handles KVM_REQ_PVLOCK_KICK needs to be in this patch.
>
>
Yes, Agree. as Alex also pointed, the related hunk from patch 4 should 
be added here.

Raghavendra K T

2012-Jan-16 14:13 UTC

head link

[PATCH RFC V4 4/5] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor

On 01/16/2012 02:35 PM, Avi Kivity wrote:> On 01/14/2012 08:26 PM, Raghavendra K T wrote:
>> Extends Linux guest running on KVM hypervisor to support
pv-ticketlocks.
>>
>> During smp_boot_cpus  paravirtualied KVM guest detects if the
hypervisor has
>> required feature (KVM_FEATURE_PVLOCK_KICK) to support pv-ticketlocks.
If so,
>>   support for pv-ticketlocks is registered via pv_lock_ops.
>>
>> Use KVM_HC_KICK_CPU hypercall to wakeup waiting/halted vcpu.
>> +
>> +	debugfs_create_u8("zero_stats", 0644,
d_spin_debug,&zero_stats);
>> +
>> +	debugfs_create_u32("taken_slow", 0444, d_spin_debug,
>> +		&spinlock_stats.contention_stats[TAKEN_SLOW]);
>> +	debugfs_create_u32("taken_slow_pickup", 0444, d_spin_debug,
>> +		&spinlock_stats.contention_stats[TAKEN_SLOW_PICKUP]);
>> +
>> +	debugfs_create_u32("released_slow", 0444, d_spin_debug,
>> +		&spinlock_stats.contention_stats[RELEASED_SLOW]);
>> +	debugfs_create_u32("released_slow_kicked", 0444,
d_spin_debug,
>> +		&spinlock_stats.contention_stats[RELEASED_SLOW_KICKED]);
>> +
>> +	debugfs_create_u64("time_blocked", 0444, d_spin_debug,
>> +			&spinlock_stats.time_blocked);
>> +
>> +	debugfs_create_u32_array("histo_blocked", 0444,
d_spin_debug,
>> +		     spinlock_stats.histo_spin_blocked, HISTO_BUCKETS + 1);
>> +
>>
>
> Please drop all of these and replace with tracepoints in the appropriate
> spots.  Everything else (including the historgram) can be reconstructed
> the tracepoints in userspace.
>
I think Jeremy pointed that tracepoints use spinlocks and hence debugfs
is the option.. no ?

Raghavendra K T

2012-Jan-18 02:27 UTC

head link

[PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests

On 01/18/2012 03:27 AM, Dave Hansen wrote:> On 01/17/2012 10:36 AM, Raghavendra K T wrote:
>> It was a quick test.  two iteration of kernbench (=6runs) and had
>> ensured cache is cleared.
>>
>> echo "1">  /proc/sys/vm/drop_caches
>> ccache -C. Yes may be I can run test as you mentioned..
>
> echo 3>  /proc/sys/vm/drop_caches
>
> is better.  1 will only do page cache, but 3 also does dentries fwiw.
>
yes. that needs to be used.

Raghavendra K T

2012-Jan-24 19:01 UTC

head link

[PATCH RFC V4 4/5] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor

On 01/18/2012 12:27 AM, Raghavendra K T wrote:> On 01/17/2012 04:32 PM, Marcelo Tosatti wrote:
>> On Sat, Jan 14, 2012 at 11:56:46PM +0530, Raghavendra K T wrote:
[...]>>> + || (vcpu->requests& ~(1UL<<KVM_REQ_PVLOCK_KICK))
>>> + || need_resched() || signal_pending(current)) {
>>> vcpu->mode = OUTSIDE_GUEST_MODE;
>>> smp_wmb();
>>> local_irq_enable();
>>> @@ -6711,6 +6712,7 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu
*vcpu)
>>> !vcpu->arch.apf.halted)
>>> || !list_empty_careful(&vcpu->async_pf.done)
>>> || vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED
>>> + || kvm_check_request(KVM_REQ_PVLOCK_KICK, vcpu)
>>
>> The bit should only be read here (kvm_arch_vcpu_runnable), but cleared
>> on vcpu entry (along with the other kvm_check_request processing).
>>
[...]>
> I had tried alternative approach earlier, I think that is closer
> to your expectation.
>
> where
> - flag is read in kvm_arch_vcpu_runnable
> - flag cleared in vcpu entry along with others.
>
> But it needs per vcpu flag to remember that pv_unhalted while clearing
> the flag in vcpu enter [ patch below ]. Could not find third alternative
> though.
[...]> do you think having pv_unhalt flag in below patch cause problem to
> live migration still? (vcpu->request bit is retained as is) OR do we
> have to have KVM_GET_MP_STATE changes also with below patch you
> mentioned earlier.
>
Avi, Marcello, Please let me know, any comments you have on how should
it look like in next version?
Should I get rid of KVM_REQ_PVLOCK_KICK bit in vcpu->request and have
only pv_unahlt flag as below and also add MSR as suggested?
> ---8<---
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c38efd7..1bf8fa8 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5684,6 +5717,11 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> r = 1;
> goto out;
> }
> + if (kvm_check_request(KVM_REQ_PVKICK, vcpu)) {
> + vcpu->pv_unhalted = 1;
> + r = 1;
> + goto out;
> + }
> if (kvm_check_request(KVM_REQ_STEAL_UPDATE, vcpu))
> record_steal_time(vcpu);
> if (kvm_check_request(KVM_REQ_NMI, vcpu))
> @@ -6683,6 +6720,7 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
> !vcpu->arch.apf.halted)
> || !list_empty_careful(&vcpu->async_pf.done)
> || vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED
> + || (kvm_test_request(KVM_REQ_PVKICK, vcpu) || vcpu->pv_unhalted)
> || atomic_read(&vcpu->arch.nmi_queued) ||
> (kvm_arch_interrupt_allowed(vcpu) &&
> kvm_cpu_has_interrupt(vcpu));
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index d526231..a48e0f2 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -154,6 +155,8 @@ struct kvm_vcpu {
> #endif
>
> struct kvm_vcpu_arch arch;
> +
> + int pv_unhalted;
> };
>
> static inline int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu)
> @@ -770,5 +773,12 @@ static inline bool kvm_check_request(int req,
> struct kvm_vcpu *vcpu)
> }
> }
>
> +static inline bool kvm_test_request(int req, struct kvm_vcpu *vcpu)
> +{
> + if (test_bit(req, &vcpu->requests))
> + return true;
> + else
> + return false;
> +}
> #endif
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index d9cfb78..55c44a2 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -226,6 +226,7 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm
> *kvm, unsigned id)
> vcpu->kvm = kvm;
> vcpu->vcpu_id = id;
> vcpu->pid = NULL;
> + vcpu->pv_unhalted = 0;
> init_waitqueue_head(&vcpu->wq);
> kvm_async_pf_vcpu_init(vcpu);
>
> @@ -1509,11 +1510,12 @@ void mark_page_dirty(struct kvm *kvm, gfn_t gfn)
> void kvm_vcpu_block(struct kvm_vcpu *vcpu)
> {
> DEFINE_WAIT(wait);
> for (;;) {
> prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
>
> if (kvm_arch_vcpu_runnable(vcpu)) {
> + vcpu->pv_unhalted = 0;
> kvm_make_request(KVM_REQ_UNHALT, vcpu);
> break;
> }
>

Konrad Rzeszutek Wilk

2012-Jan-25 19:05 UTC

head link

[PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests

On Wed, Jan 25, 2012 at 11:15:36PM +0530, Raghavendra K T
wrote:> On 01/25/2012 10:05 PM, Konrad Rzeszutek Wilk wrote:
> >>So it seems we have worst case overhead of around 8%. But we see
> >>improvement of at-least 15% once when little more time is spent in
> >>critical section.
> >
> >Is this with collecting the histogram information about spinlocks? We
found
> >that if you enable that for production runs it makes them quite slower.
> >
> 
> Ok. Are you referring to CONFIG_KVM_DEBUG_FS/CONFIG_XEN_DEBUG_FS?. Not
Those were the ones.
> it was not enabled. But then may be I was not precise. This test was
> only on native. So it should have not affected IMO.
Yup.> 
> It is nice to know that histogram affects, since I was always under the
> impression that it does not affect much on guest too. My experiments
> had almost always enabled that.
> 
> Let me know if you referred to something else..

Possibly Parallel Threads

Search for more seemingly similar threads

Linux Virtualization - Jan 2012 - [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests

[PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests

[PATCH RFC V4 1/5] debugfs: Add support to print u32 array in debugfs

[PATCH RFC V4 2/5] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks

[PATCH RFC V4 3/5] kvm guest : Added configuration support to enable debug information for KVM Guests

[PATCH RFC V4 4/5] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor

[PATCH RFC V4 5/5] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock

[PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests

[PATCH RFC V4 5/5] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock

[PATCH RFC V4 2/5] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks

[PATCH RFC V4 4/5] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor

[PATCH RFC V4 2/5] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks

[PATCH RFC V4 4/5] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor

[PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests

[PATCH RFC V4 4/5] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor

[PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests

Possibly Parallel Threads