thr3ads.net - Virtualization - [PATCH RFC V5 0/6] kvm : Paravirt-spinlock support for KVM guests [Mar 2012]

If this information is useful, please help other people find it:
Share via:

Raghavendra K T

2012-Mar-23 08:05 UTC

[PATCH RFC V5 0/6] kvm : Paravirt-spinlock support for KVM guests

The 6-patch series to follow this email extends KVM-hypervisor and Linux guest
running on KVM-hypervisor to support pv-ticket spinlocks, based on Xen's 
implementation.

One hypercall is introduced in KVM hypervisor,that allows a vcpu to kick
another vcpu out of halt state.
The blocking of vcpu is done using halt() in (lock_spinning) slowpath.
one MSR is added to aid live migration.

Changes in V5:
- rebased to 3.3-rc6
- added PV_UNHALT_MSR that would help in live migration (Avi)
- removed PV_LOCK_KICK vcpu request and pv_unhalt flag (re)added.
- Changed hypercall documentaion (Alex).
- mode_t changed to umode_t in debugfs.
- MSR related documentation added.
- rename PV_LOCK_KICK to PV_UNHALT. 
- host and guest patches not mixed. (Marcelo, Alex)
- kvm_kick_cpu now takes cpu so it can be used by flush_tlb_ipi_other 
   paravirtualization (Nikunj)
- coding style changes in variable declarion etc (Srikar)

Changes in V4:
- reabsed to 3.2.0 pre.
- use APIC ID for kicking the vcpu and use kvm_apic_match_dest for matching
(Avi)
- fold vcpu->kicked flag into vcpu->requests (KVM_REQ_PVLOCK_KICK) and
related
  changes for UNHALT path to make pv ticket spinlock migration friendly(Avi,
Marcello)
- Added Documentation for CPUID, Hypercall (KVM_HC_KICK_CPU)
  and capabilty (KVM_CAP_PVLOCK_KICK) (Avi)
- Remove unneeded kvm_arch_vcpu_ioctl_set_mpstate call. (Marcello)
- cumulative variable type changed (int ==> u32) in add_stat (Konrad)
- remove unneeded kvm_guest_init for !CONFIG_KVM_GUEST case

Changes in V3:
- rebased to 3.2-rc1
- use halt() instead of wait for kick hypercall.
- modify kick hyper call to do wakeup halted vcpu.
- hook kvm_spinlock_init to smp_prepare_cpus call (moved the call out of
head##.c).
- fix the potential race when zero_stat is read.
- export debugfs_create_32 and add documentation to API.
- use static inline and enum instead of ADDSTAT macro. 
- add  barrier() in after setting kick_vcpu.
- empty static inline function for kvm_spinlock_init.
- combine the patches one and two readuce overhead.
- make KVM_DEBUGFS depends on DEBUGFS.
- include debugfs header unconditionally.

Changes in V2:
- rebased patchesto -rc9
- synchronization related changes based on Jeremy's changes 
 (Jeremy Fitzhardinge <jeremy.fitzhardinge at citrix.com>) pointed by 
 Stephan Diestelhorst <stephan.diestelhorst at amd.com>
- enabling 32 bit guests
- splitted patches into two more chunks

Test Set up :
The BASE patch is  3.3.0-rc6 + jumplabel split patch
(https://lkml.org/lkml/2012/2/21/167)
+  ticketlock cleanup patch (https://lkml.org/lkml/2012/3/21/161)

Results:
 The performance gain is mainly because of reduced busy-wait time.
 From the results we can see that patched kernel performance is similar to
 BASE when there is no lock contention. But once we start seeing more
 contention, patched kernel outperforms BASE.

3 guests with 8VCPU, 8GB RAM, 1 used for kernbench
(kernbench -f -H -M -o 20) other for cpuhog (shell script while
true with an instruction)

1x: no hogs
2x: 8hogs in one guest
3x: 8hogs each in two guest

1) kernbench
Machine : IBM xSeries with Intel(R) Xeon(R) x5570 2.93GHz CPU with 8 core , 64GB
RAM
		 BASE                    BASE+patch            %improvement
		 mean (sd)               mean (sd)
case 1x:	 38.1033 (43.502) 	 38.09 (43.4269) 	0.0349051
case 2x:	 778.622 (1092.68) 	 129.342 (156.324) 	83.3883
case 3x:	 2399.11 (3548.32) 	 114.913 (139.5) 	95.2102

2) pgbench:
pgbench version: http://www.postgresql.org/ftp/snapshot/dev/
tool used for benchmarking: git://git.postgresql.org/git/pgbench-tools.git
Ananlysis is done using ministat.
Test is done for 1x overcommit to check overhead of pv spinlock.
There is small performance penalty in non contention scenario (note BASE
is jeremy's ticketlock). But with increase in number of threads, improvement
is
seen. 

guest: 64bit 8 vCPU and 8GB RAM
shared buffer size = 2GB
x base_kernel
+ patched_kernel
    N           Min           Max        Median           Avg        Stddev
+--------------------- NRCLIENT = 1 ----------------------------------------+
x  10     7468.0719     7774.0026     7529.9217     7594.9696      128.7725
+  10      7280.413     7650.6619     7425.7968     7434.9344     144.59127
Difference at 95.0% confidence
	-160.035 +/- 128.641
	-2.10712% +/- 1.69376%
+--------------------- NRCLIENT = 2 ----------------------------------------+
x  10     14604.344     14849.358     14725.845     14724.722     76.866294
+  10     14070.064     14246.013     14125.556     14138.169     60.556379
Difference at 95.0% confidence
	-586.553 +/- 65.014
	-3.98346% +/- 0.441529%
+--------------------- NRCLIENT = 4 ----------------------------------------+
x  10     27891.073     28305.466     28059.892     28060.231     115.65612
+  10     27237.685     27639.645      27297.79     27375.966     145.31006
Difference at 95.0% confidence
	-684.265 +/- 123.39
	-2.43856% +/- 0.439734%
+--------------------- NRCLIENT = 8 ----------------------------------------+
x  10     53063.509     53498.677      53343.24     53309.697     138.77983
+  10     51705.708     52208.274      52030.06     51987.067     156.65323
Difference at 95.0% confidence
	-1322.63 +/- 139.048
	-2.48103% +/- 0.26083%
+--------------------- NRCLIENT = 16 ---------------------------------------+
x  10     50043.347     52701.253     52235.978     51993.466     817.44911
+  10     51562.772     52272.412     51905.317     51946.557     228.54314
No difference proven at 95.0% confidence
+--------------------- NRCLIENT = 32 --------------------------------------+
x  10     49178.789     51284.599     50288.185     50275.212     616.80154
+  10     50722.097     52145.041     51551.112     51512.423     469.18898
Difference at 95.0% confidence
	1237.21 +/- 514.888
	2.46088% +/- 1.02414%
+--------------------------------------------------------------------------+

Let me know if you have any sugestion/comments...

---
 V4 kernel changes:
 https://lkml.org/lkml/2012/1/14/66
 Qemu changes for V4:
 http://www.mail-archive.com/kvm at vger.kernel.org/msg66450.html

 V3 kernel Changes:
 https://lkml.org/lkml/2011/11/30/62
 V2 kernel changes : 
 https://lkml.org/lkml/2011/10/23/207

 Previous discussions : (posted by Srivatsa V).
 https://lkml.org/lkml/2010/7/26/24
 https://lkml.org/lkml/2011/1/19/212
 
 Qemu patch for V3:
 http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg00397.html
 
Srivatsa Vaddagiri, Suzuki Poulose, Raghavendra K T (6): 
  Add debugfs support to print u32-arrays in debugfs
  Add a hypercall to KVM hypervisor to support pv-ticketlocks
  Add  unhalt msr to aid migration
  Added configuration support to enable debug information for KVM Guests
  pv-ticketlock support for linux guests running on KVM hypervisor
  Add documentation on Hypercalls and features used for PV spinlock

 Documentation/virtual/kvm/api.txt        |    7 +
 Documentation/virtual/kvm/cpuid.txt      |    4 +
 Documentation/virtual/kvm/hypercalls.txt |   59 +++++++
 Documentation/virtual/kvm/msr.txt        |    9 +
 arch/x86/Kconfig                         |    9 +
 arch/x86/include/asm/kvm_para.h          |   18 ++-
 arch/x86/kernel/kvm.c                    |  254 ++++++++++++++++++++++++++++++
 arch/x86/kvm/cpuid.c                     |    3 +-
 arch/x86/kvm/x86.c                       |   40 +++++-
 arch/x86/xen/debugfs.c                   |  104 ------------
 arch/x86/xen/debugfs.h                   |    4 -
 arch/x86/xen/spinlock.c                  |    2 +-
 fs/debugfs/file.c                        |  128 +++++++++++++++
 include/linux/debugfs.h                  |   11 ++
 include/linux/kvm.h                      |    1 +
 include/linux/kvm_host.h                 |    1 +
 include/linux/kvm_para.h                 |    1 +
 virt/kvm/kvm_main.c                      |    4 +
 18 files changed, 545 insertions(+), 114 deletions(-)

Raghavendra K T

2012-Mar-23 08:06 UTC

head link

[PATCH RFC V5 1/6] debugfs: Add support to print u32 array in debugfs

From: Srivatsa Vaddagiri <vatsa at linux.vnet.ibm.com>

Move the code from Xen to debugfs to make the code common
for other users as well.

Signed-off-by: Srivatsa Vaddagiri <vatsa at linux.vnet.ibm.com>
Signed-off-by: Suzuki Poulose <suzuki at in.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt at linux.vnet.ibm.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
---
diff --git a/arch/x86/xen/debugfs.c b/arch/x86/xen/debugfs.c
index ef1db19..c8377fb 100644
--- a/arch/x86/xen/debugfs.c
+++ b/arch/x86/xen/debugfs.c
@@ -19,107 +19,3 @@ struct dentry * __init xen_init_debugfs(void)
 	return d_xen_debug;
 }
 
-struct array_data
-{
-	void *array;
-	unsigned elements;
-};
-
-static int u32_array_open(struct inode *inode, struct file *file)
-{
-	file->private_data = NULL;
-	return nonseekable_open(inode, file);
-}
-
-static size_t format_array(char *buf, size_t bufsize, const char *fmt,
-			   u32 *array, unsigned array_size)
-{
-	size_t ret = 0;
-	unsigned i;
-
-	for(i = 0; i < array_size; i++) {
-		size_t len;
-
-		len = snprintf(buf, bufsize, fmt, array[i]);
-		len++;	/* ' ' or '\n' */
-		ret += len;
-
-		if (buf) {
-			buf += len;
-			bufsize -= len;
-			buf[-1] = (i == array_size-1) ? '\n' : ' ';
-		}
-	}
-
-	ret++;		/* \0 */
-	if (buf)
-		*buf = '\0';
-
-	return ret;
-}
-
-static char *format_array_alloc(const char *fmt, u32 *array, unsigned
array_size)
-{
-	size_t len = format_array(NULL, 0, fmt, array, array_size);
-	char *ret;
-
-	ret = kmalloc(len, GFP_KERNEL);
-	if (ret == NULL)
-		return NULL;
-
-	format_array(ret, len, fmt, array, array_size);
-	return ret;
-}
-
-static ssize_t u32_array_read(struct file *file, char __user *buf, size_t len,
-			      loff_t *ppos)
-{
-	struct inode *inode = file->f_path.dentry->d_inode;
-	struct array_data *data = inode->i_private;
-	size_t size;
-
-	if (*ppos == 0) {
-		if (file->private_data) {
-			kfree(file->private_data);
-			file->private_data = NULL;
-		}
-
-		file->private_data = format_array_alloc("%u", data->array,
data->elements);
-	}
-
-	size = 0;
-	if (file->private_data)
-		size = strlen(file->private_data);
-
-	return simple_read_from_buffer(buf, len, ppos, file->private_data, size);
-}
-
-static int xen_array_release(struct inode *inode, struct file *file)
-{
-	kfree(file->private_data);
-
-	return 0;
-}
-
-static const struct file_operations u32_array_fops = {
-	.owner	= THIS_MODULE,
-	.open	= u32_array_open,
-	.release= xen_array_release,
-	.read	= u32_array_read,
-	.llseek = no_llseek,
-};
-
-struct dentry *xen_debugfs_create_u32_array(const char *name, umode_t mode,
-					    struct dentry *parent,
-					    u32 *array, unsigned elements)
-{
-	struct array_data *data = kmalloc(sizeof(*data), GFP_KERNEL);
-
-	if (data == NULL)
-		return NULL;
-
-	data->array = array;
-	data->elements = elements;
-
-	return debugfs_create_file(name, mode, parent, data, &u32_array_fops);
-}
diff --git a/arch/x86/xen/debugfs.h b/arch/x86/xen/debugfs.h
index 78d2549..12ebf33 100644
--- a/arch/x86/xen/debugfs.h
+++ b/arch/x86/xen/debugfs.h
@@ -3,8 +3,4 @@
 
 struct dentry * __init xen_init_debugfs(void);
 
-struct dentry *xen_debugfs_create_u32_array(const char *name, umode_t mode,
-					    struct dentry *parent,
-					    u32 *array, unsigned elements);
-
 #endif /* _XEN_DEBUGFS_H */
diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 4926974..b74cebb 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -314,7 +314,7 @@ static int __init xen_spinlock_debugfs(void)
 	debugfs_create_u64("time_blocked", 0444, d_spin_debug,
 			   &spinlock_stats.time_blocked);
 
-	xen_debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug,
+	debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug,
 				     spinlock_stats.histo_spin_blocked, HISTO_BUCKETS + 1);
 
 	return 0;
diff --git a/fs/debugfs/file.c b/fs/debugfs/file.c
index ef023ee..cb6cff3 100644
--- a/fs/debugfs/file.c
+++ b/fs/debugfs/file.c
@@ -20,6 +20,7 @@
 #include <linux/namei.h>
 #include <linux/debugfs.h>
 #include <linux/io.h>
+#include <linux/slab.h>
 
 static ssize_t default_read_file(struct file *file, char __user *buf,
 				 size_t count, loff_t *ppos)
@@ -528,6 +529,133 @@ struct dentry *debugfs_create_blob(const char *name,
umode_t mode,
 }
 EXPORT_SYMBOL_GPL(debugfs_create_blob);
 
+struct array_data {
+	void *array;
+	u32 elements;
+};
+
+static int u32_array_open(struct inode *inode, struct file *file)
+{
+	file->private_data = NULL;
+	return nonseekable_open(inode, file);
+}
+
+static size_t format_array(char *buf, size_t bufsize, const char *fmt,
+			   u32 *array, u32 array_size)
+{
+	size_t ret = 0;
+	u32 i;
+
+	for (i = 0; i < array_size; i++) {
+		size_t len;
+
+		len = snprintf(buf, bufsize, fmt, array[i]);
+		len++;	/* ' ' or '\n' */
+		ret += len;
+
+		if (buf) {
+			buf += len;
+			bufsize -= len;
+			buf[-1] = (i == array_size-1) ? '\n' : ' ';
+		}
+	}
+
+	ret++;		/* \0 */
+	if (buf)
+		*buf = '\0';
+
+	return ret;
+}
+
+static char *format_array_alloc(const char *fmt, u32 *array,
+						u32 array_size)
+{
+	size_t len = format_array(NULL, 0, fmt, array, array_size);
+	char *ret;
+
+	ret = kmalloc(len, GFP_KERNEL);
+	if (ret == NULL)
+		return NULL;
+
+	format_array(ret, len, fmt, array, array_size);
+	return ret;
+}
+
+static ssize_t u32_array_read(struct file *file, char __user *buf, size_t len,
+			      loff_t *ppos)
+{
+	struct inode *inode = file->f_path.dentry->d_inode;
+	struct array_data *data = inode->i_private;
+	size_t size;
+
+	if (*ppos == 0) {
+		if (file->private_data) {
+			kfree(file->private_data);
+			file->private_data = NULL;
+		}
+
+		file->private_data = format_array_alloc("%u", data->array,
+							      data->elements);
+	}
+
+	size = 0;
+	if (file->private_data)
+		size = strlen(file->private_data);
+
+	return simple_read_from_buffer(buf, len, ppos,
+					file->private_data, size);
+}
+
+static int u32_array_release(struct inode *inode, struct file *file)
+{
+	kfree(file->private_data);
+
+	return 0;
+}
+
+static const struct file_operations u32_array_fops = {
+	.owner	 = THIS_MODULE,
+	.open	 = u32_array_open,
+	.release = u32_array_release,
+	.read	 = u32_array_read,
+	.llseek  = no_llseek,
+};
+
+/**
+ * debugfs_create_u32_array - create a debugfs file that is used to read u32
+ * array.
+ * @name: a pointer to a string containing the name of the file to create.
+ * @mode: the permission that the file should have.
+ * @parent: a pointer to the parent dentry for this file.  This should be a
+ *          directory dentry if set.  If this parameter is %NULL, then the
+ *          file will be created in the root of the debugfs filesystem.
+ * @array: u32 array that provides data.
+ * @elements: total number of elements in the array.
+ *
+ * This function creates a file in debugfs with the given name that exports
+ * @array as data. If the @mode variable is so set it can be read from.
+ * Writing is not supported. Seek within the file is also not supported.
+ * Once array is created its size can not be changed.
+ *
+ * The function returns a pointer to dentry on success. If debugfs is not
+ * enabled in the kernel, the value -%ENODEV will be returned.
+ */
+struct dentry *debugfs_create_u32_array(const char *name, umode_t mode,
+					    struct dentry *parent,
+					    u32 *array, u32 elements)
+{
+	struct array_data *data = kmalloc(sizeof(*data), GFP_KERNEL);
+
+	if (data == NULL)
+		return NULL;
+
+	data->array = array;
+	data->elements = elements;
+
+	return debugfs_create_file(name, mode, parent, data, &u32_array_fops);
+}
+EXPORT_SYMBOL_GPL(debugfs_create_u32_array);
+
 #ifdef CONFIG_HAS_IOMEM
 
 /*
diff --git a/include/linux/debugfs.h b/include/linux/debugfs.h
index 6169c26..5cb4435 100644
--- a/include/linux/debugfs.h
+++ b/include/linux/debugfs.h
@@ -93,6 +93,10 @@ struct dentry *debugfs_create_regset32(const char *name,
mode_t mode,
 int debugfs_print_regs32(struct seq_file *s, const struct debugfs_reg32 *regs,
 			 int nregs, void __iomem *base, char *prefix);
 
+struct dentry *debugfs_create_u32_array(const char *name, umode_t mode,
+					struct dentry *parent,
+					u32 *array, u32 elements);
+
 bool debugfs_initialized(void);
 
 #else
@@ -219,6 +223,13 @@ static inline bool debugfs_initialized(void)
 	return false;
 }
 
+struct dentry *debugfs_create_u32_array(const char *name, umode_t mode,
+					struct dentry *parent,
+					u32 *array, u32 elements)
+{
+	return ERR_PTR(-ENODEV);
+}
+
 #endif
 
 #endif

Raghavendra K T

2012-Mar-23 08:07 UTC

head link

[PATCH RFC V5 2/6] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks

From: Srivatsa Vaddagiri <vatsa at linux.vnet.ibm.com>

KVM_HC_KICK_CPU allows the calling vcpu to kick another vcpu out of halt state.
    
The presence of these hypercalls is indicated to guest via
KVM_FEATURE_PV_UNHALT/KVM_CAP_PV_UNHALT.

Signed-off-by: Srivatsa Vaddagiri <vatsa at linux.vnet.ibm.com>
Signed-off-by: Suzuki Poulose <suzuki at in.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt at linux.vnet.ibm.com>
---
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 734c376..9234f13 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -16,12 +16,14 @@
 #define KVM_FEATURE_CLOCKSOURCE		0
 #define KVM_FEATURE_NOP_IO_DELAY	1
 #define KVM_FEATURE_MMU_OP		2
+
 /* This indicates that the new set of kvmclock msrs
  * are available. The use of 0x11 and 0x12 is deprecated
  */
 #define KVM_FEATURE_CLOCKSOURCE2        3
 #define KVM_FEATURE_ASYNC_PF		4
 #define KVM_FEATURE_STEAL_TIME		5
+#define KVM_FEATURE_PV_UNHALT		6
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
@@ -32,6 +34,7 @@
 #define MSR_KVM_SYSTEM_TIME 0x12
 
 #define KVM_MSR_ENABLED 1
+
 /* Custom MSRs falls in the range 0x4b564d00-0x4b564dff */
 #define MSR_KVM_WALL_CLOCK_NEW  0x4b564d00
 #define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 89b02bf..61388b9 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -408,7 +408,8 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32
function,
 			     (1 << KVM_FEATURE_NOP_IO_DELAY) |
 			     (1 << KVM_FEATURE_CLOCKSOURCE2) |
 			     (1 << KVM_FEATURE_ASYNC_PF) |
-			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
+			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
+			     (1 << KVM_FEATURE_PV_UNHALT);
 
 		if (sched_info_on())
 			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9cbfc06..bd5ef91 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2079,6 +2079,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_XSAVE:
 	case KVM_CAP_ASYNC_PF:
 	case KVM_CAP_GET_TSC_KHZ:
+	case KVM_CAP_PV_UNHALT:
 		r = 1;
 		break;
 	case KVM_CAP_COALESCED_MMIO:
@@ -4913,6 +4914,30 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+/*
+ * kvm_pv_kick_cpu_op:  Kick a vcpu.
+ *
+ * @apicid - apicid of vcpu to be kicked.
+ */
+static void kvm_pv_kick_cpu_op(struct kvm *kvm, int apicid)
+{
+	struct kvm_vcpu *vcpu = NULL;
+	int i;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		if (!kvm_apic_present(vcpu))
+			continue;
+
+		if (kvm_apic_match_dest(vcpu, 0, 0, apicid, 0))
+			break;
+	}
+	if (vcpu) {
+		vcpu->pv_unhalted = 1;
+		smp_mb();
+		kvm_vcpu_kick(vcpu);
+	}
+}
+
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 {
 	unsigned long nr, a0, a1, a2, a3, ret;
@@ -4946,6 +4971,10 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 	case KVM_HC_VAPIC_POLL_IRQ:
 		ret = 0;
 		break;
+	case KVM_HC_KICK_CPU:
+		kvm_pv_kick_cpu_op(vcpu->kvm, a0);
+		ret = 0;
+		break;
 	default:
 		ret = -KVM_ENOSYS;
 		break;
@@ -6174,6 +6203,7 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
 		!vcpu->arch.apf.halted)
 		|| !list_empty_careful(&vcpu->async_pf.done)
 		|| vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED
+		|| vcpu->pv_unhalted
 		|| atomic_read(&vcpu->arch.nmi_queued) ||
 		(kvm_arch_interrupt_allowed(vcpu) &&
 		 kvm_cpu_has_interrupt(vcpu));
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 68e67e5..e822d96 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -558,6 +558,7 @@ struct kvm_ppc_pvinfo {
 #define KVM_CAP_PPC_PAPR 68
 #define KVM_CAP_S390_GMAP 71
 #define KVM_CAP_TSC_DEADLINE_TIMER 72
+#define KVM_CAP_PV_UNHALT 73
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 900c763..433ae97 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -158,6 +158,7 @@ struct kvm_vcpu {
 #endif
 
 	struct kvm_vcpu_arch arch;
+	int pv_unhalted;
 };
 
 static inline int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu)
diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h
index ff476dd..38226e1 100644
--- a/include/linux/kvm_para.h
+++ b/include/linux/kvm_para.h
@@ -19,6 +19,7 @@
 #define KVM_HC_MMU_OP			2
 #define KVM_HC_FEATURES			3
 #define KVM_HC_PPC_MAP_MAGIC_PAGE	4
+#define KVM_HC_KICK_CPU			5
 
 /*
  * hypercalls use architecture specific
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a91f980..d3b98b1 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -226,6 +226,7 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm,
unsigned id)
 	vcpu->kvm = kvm;
 	vcpu->vcpu_id = id;
 	vcpu->pid = NULL;
+	vcpu->pv_unhalted = 0;
 	init_waitqueue_head(&vcpu->wq);
 	kvm_async_pf_vcpu_init(vcpu);
 
@@ -1567,6 +1568,9 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 		prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
 
 		if (kvm_arch_vcpu_runnable(vcpu)) {
+			vcpu->pv_unhalted = 0;
+			/* preventing reordering should be enough here */
+			barrier();
 			kvm_make_request(KVM_REQ_UNHALT, vcpu);
 			break;
 		}

Raghavendra K T

2012-Mar-23 08:07 UTC

head link

[PATCH RFC V5 3/6] kvm : Add unhalt msr to aid (live) migration

From: Raghavendra K T <raghavendra.kt at linux.vnet.ibm.com>

Currently guest does not need to know pv_unhalt state and intended to be
used via GET/SET_MSR ioctls  during migration.

Signed-off-by: Raghavendra K T <raghavendra.kt at linux.vnet.ibm.com>
---
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 9234f13..46f9751 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -40,6 +40,7 @@
 #define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01
 #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
 #define MSR_KVM_STEAL_TIME  0x4b564d03
+#define MSR_KVM_PV_UNHALT   0x4b564d04
 
 struct kvm_steal_time {
 	__u64 steal;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bd5ef91..38e6c47 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -784,12 +784,13 @@ EXPORT_SYMBOL_GPL(kvm_rdpmc);
  * kvm-specific. Those are put in the beginning of the list.
  */
 
-#define KVM_SAVE_MSRS_BEGIN	9
+#define KVM_SAVE_MSRS_BEGIN	10
 static u32 msrs_to_save[] = {
 	MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
 	MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
 	HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
 	HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
+	MSR_KVM_PV_UNHALT,
 	MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
 	MSR_STAR,
 #ifdef CONFIG_X86_64
@@ -1606,7 +1607,9 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64
data)
 		kvm_make_request(KVM_REQ_STEAL_UPDATE, vcpu);
 
 		break;
-
+	case MSR_KVM_PV_UNHALT:
+		vcpu->pv_unhalted = (u32) data;
+		break;
 	case MSR_IA32_MCG_CTL:
 	case MSR_IA32_MCG_STATUS:
 	case MSR_IA32_MC0_CTL ... MSR_IA32_MC0_CTL + 4 * KVM_MAX_MCE_BANKS - 1:
@@ -1917,6 +1920,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64
*pdata)
 	case MSR_KVM_STEAL_TIME:
 		data = vcpu->arch.st.msr_val;
 		break;
+	case MSR_KVM_PV_UNHALT:
+		data = (u64)vcpu->pv_unhalted;
+		break;
 	case MSR_IA32_P5_MC_ADDR:
 	case MSR_IA32_P5_MC_TYPE:
 	case MSR_IA32_MCG_CAP:

Raghavendra K T

2012-Mar-23 08:08 UTC

head link

[PATCH RFC V5 4/6] kvm guest : Added configuration support to enable debug information for KVM Guests

From: Srivatsa Vaddagiri <vatsa at linux.vnet.ibm.com>

Signed-off-by: Srivatsa Vaddagiri <vatsa at linux.vnet.ibm.com>
Signed-off-by: Suzuki Poulose <suzuki at in.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt at linux.vnet.ibm.com>
---
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 10c28ec..a4530bd 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -600,6 +600,15 @@ config KVM_GUEST
 	  This option enables various optimizations for running under the KVM
 	  hypervisor.
 
+config KVM_DEBUG_FS
+	bool "Enable debug information for KVM Guests in debugfs"
+	depends on KVM_GUEST && DEBUG_FS
+	default n
+	---help---
+	  This option enables collection of various statistics for KVM guest.
+   	  Statistics are displayed in debugfs filesystem. Enabling this option
+	  may incur significant overhead.
+
 source "arch/x86/lguest/Kconfig"
 
 config PARAVIRT

Raghavendra K T

2012-Mar-23 08:08 UTC

head link

[PATCH RFC V5 5/6] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor

From: Srivatsa Vaddagiri <vatsa at linux.vnet.ibm.com>

During smp_boot_cpus  paravirtualied KVM guest detects if the hypervisor has
required feature (KVM_FEATURE_PV_UNHALT) to support pv-ticketlocks. If so,
 support for pv-ticketlocks is registered via pv_lock_ops.

Use KVM_HC_KICK_CPU hypercall to wakeup waiting/halted vcpu.

Signed-off-by: Srivatsa Vaddagiri <vatsa at linux.vnet.ibm.com>
Signed-off-by: Suzuki Poulose <suzuki at in.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt at linux.vnet.ibm.com>
---
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 46f9751..2888c45 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -197,10 +197,20 @@ void kvm_async_pf_task_wait(u32 token);
 void kvm_async_pf_task_wake(u32 token);
 u32 kvm_read_and_reset_pf_reason(void);
 extern void kvm_disable_steal_time(void);
-#else
-#define kvm_guest_init() do { } while (0)
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+void __init kvm_spinlock_init(void);
+#else /* !CONFIG_PARAVIRT_SPINLOCKS */
+static void kvm_spinlock_init(void)
+{
+}
+#endif /* CONFIG_PARAVIRT_SPINLOCKS */
+
+#else /* CONFIG_KVM_GUEST */
+#define kvm_guest_init() do {} while (0)
 #define kvm_async_pf_task_wait(T) do {} while(0)
 #define kvm_async_pf_task_wake(T) do {} while(0)
+
 static inline u32 kvm_read_and_reset_pf_reason(void)
 {
 	return 0;
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index f0c6fd6..c535422 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -33,6 +33,7 @@
 #include <linux/sched.h>
 #include <linux/slab.h>
 #include <linux/kprobes.h>
+#include <linux/debugfs.h>
 #include <asm/timer.h>
 #include <asm/cpu.h>
 #include <asm/traps.h>
@@ -364,6 +365,7 @@ static void __init kvm_smp_prepare_boot_cpu(void)
 #endif
 	kvm_guest_cpu_init();
 	native_smp_prepare_boot_cpu();
+	kvm_spinlock_init();
 }
 
 static void __cpuinit kvm_guest_cpu_online(void *dummy)
@@ -446,3 +448,255 @@ static __init int activate_jump_labels(void)
 	return 0;
 }
 arch_initcall(activate_jump_labels);
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+
+enum kvm_contention_stat {
+	TAKEN_SLOW,
+	TAKEN_SLOW_PICKUP,
+	RELEASED_SLOW,
+	RELEASED_SLOW_KICKED,
+	NR_CONTENTION_STATS
+};
+
+#ifdef CONFIG_KVM_DEBUG_FS
+#define HISTO_BUCKETS	30
+
+static struct kvm_spinlock_stats
+{
+	u32 contention_stats[NR_CONTENTION_STATS];
+	u32 histo_spin_blocked[HISTO_BUCKETS+1];
+	u64 time_blocked;
+} spinlock_stats;
+
+static u8 zero_stats;
+
+static inline void check_zero(void)
+{
+	u8 ret;
+	u8 old;
+
+	old = ACCESS_ONCE(zero_stats);
+	if (unlikely(old)) {
+		ret = cmpxchg(&zero_stats, old, 0);
+		/* This ensures only one fellow resets the stat */
+		if (ret == old)
+			memset(&spinlock_stats, 0, sizeof(spinlock_stats));
+	}
+}
+
+static inline void add_stats(enum kvm_contention_stat var, u32 val)
+{
+	check_zero();
+	spinlock_stats.contention_stats[var] += val;
+}
+
+
+static inline u64 spin_time_start(void)
+{
+	return sched_clock();
+}
+
+static void __spin_time_accum(u64 delta, u32 *array)
+{
+	unsigned index;
+
+	index = ilog2(delta);
+	check_zero();
+
+	if (index < HISTO_BUCKETS)
+		array[index]++;
+	else
+		array[HISTO_BUCKETS]++;
+}
+
+static inline void spin_time_accum_blocked(u64 start)
+{
+	u32 delta;
+
+	delta = sched_clock() - start;
+	__spin_time_accum(delta, spinlock_stats.histo_spin_blocked);
+	spinlock_stats.time_blocked += delta;
+}
+
+static struct dentry *d_spin_debug;
+static struct dentry *d_kvm_debug;
+
+struct dentry *kvm_init_debugfs(void)
+{
+	d_kvm_debug = debugfs_create_dir("kvm", NULL);
+	if (!d_kvm_debug)
+		printk(KERN_WARNING "Could not create 'kvm' debugfs
directory\n");
+
+	return d_kvm_debug;
+}
+
+static int __init kvm_spinlock_debugfs(void)
+{
+	struct dentry *d_kvm;
+
+	d_kvm = kvm_init_debugfs();
+	if (d_kvm == NULL)
+		return -ENOMEM;
+
+	d_spin_debug = debugfs_create_dir("spinlocks", d_kvm);
+
+	debugfs_create_u8("zero_stats", 0644, d_spin_debug,
&zero_stats);
+
+	debugfs_create_u32("taken_slow", 0444, d_spin_debug,
+		   &spinlock_stats.contention_stats[TAKEN_SLOW]);
+	debugfs_create_u32("taken_slow_pickup", 0444, d_spin_debug,
+		   &spinlock_stats.contention_stats[TAKEN_SLOW_PICKUP]);
+
+	debugfs_create_u32("released_slow", 0444, d_spin_debug,
+		   &spinlock_stats.contention_stats[RELEASED_SLOW]);
+	debugfs_create_u32("released_slow_kicked", 0444, d_spin_debug,
+		   &spinlock_stats.contention_stats[RELEASED_SLOW_KICKED]);
+
+	debugfs_create_u64("time_blocked", 0444, d_spin_debug,
+			   &spinlock_stats.time_blocked);
+
+	debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug,
+		     spinlock_stats.histo_spin_blocked, HISTO_BUCKETS + 1);
+
+	return 0;
+}
+fs_initcall(kvm_spinlock_debugfs);
+#else  /* !CONFIG_KVM_DEBUG_FS */
+#define TIMEOUT			(1 << 10)
+static inline void add_stats(enum kvm_contention_stat var, u32 val)
+{
+}
+
+static inline u64 spin_time_start(void)
+{
+	return 0;
+}
+
+static inline void spin_time_accum_blocked(u64 start)
+{
+}
+#endif  /* CONFIG_KVM_DEBUG_FS */
+
+struct kvm_lock_waiting {
+	struct arch_spinlock *lock;
+	__ticket_t want;
+};
+
+/* cpus 'waiting' on a spinlock to become available */
+static cpumask_t waiting_cpus;
+
+/* Track spinlock on which a cpu is waiting */
+static DEFINE_PER_CPU(struct kvm_lock_waiting, lock_waiting);
+
+static void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
+{
+	struct kvm_lock_waiting *w;
+	int cpu;
+	u64 start;
+	unsigned long flags;
+
+	w = &__get_cpu_var(lock_waiting);
+	cpu = smp_processor_id();
+	start = spin_time_start();
+
+	/*
+	 * Make sure an interrupt handler can't upset things in a
+	 * partially setup state.
+	 */
+	local_irq_save(flags);
+
+	/*
+	 * The ordering protocol on this is that the "lock" pointer
+	 * may only be set non-NULL if the "want" ticket is correct.
+	 * If we're updating "want", we must first clear
"lock".
+	 */
+	w->lock = NULL;
+	smp_wmb();
+	w->want = want;
+	smp_wmb();
+	w->lock = lock;
+
+	add_stats(TAKEN_SLOW, 1);
+
+	/*
+	 * This uses set_bit, which is atomic but we should not rely on its
+	 * reordering gurantees. So barrier is needed after this call.
+	 */
+	cpumask_set_cpu(cpu, &waiting_cpus);
+
+	barrier();
+
+	/*
+	 * Mark entry to slowpath before doing the pickup test to make
+	 * sure we don't deadlock with an unlocker.
+	 */
+	__ticket_enter_slowpath(lock);
+
+	/*
+	 * check again make sure it didn't become free while
+	 * we weren't looking.
+	 */
+	if (ACCESS_ONCE(lock->tickets.head) == want) {
+		add_stats(TAKEN_SLOW_PICKUP, 1);
+		goto out;
+	}
+
+	/* Allow interrupts while blocked */
+	local_irq_restore(flags);
+
+	/* halt until it's our turn and kicked. */
+	halt();
+
+	local_irq_save(flags);
+out:
+	cpumask_clear_cpu(cpu, &waiting_cpus);
+	w->lock = NULL;
+	local_irq_restore(flags);
+	spin_time_accum_blocked(start);
+}
+PV_CALLEE_SAVE_REGS_THUNK(kvm_lock_spinning);
+
+/* Kick a cpu by its apicid*/
+static inline void kvm_kick_cpu(int cpu)
+{
+	int apicid;
+
+	apicid = per_cpu(x86_cpu_to_apicid, cpu);
+	kvm_hypercall1(KVM_HC_KICK_CPU, apicid);
+}
+
+/* Kick vcpu waiting on @lock->head to reach value @ticket */
+static void kvm_unlock_kick(struct arch_spinlock *lock, __ticket_t ticket)
+{
+	int cpu;
+
+	add_stats(RELEASED_SLOW, 1);
+	for_each_cpu(cpu, &waiting_cpus) {
+		const struct kvm_lock_waiting *w = &per_cpu(lock_waiting, cpu);
+		if (ACCESS_ONCE(w->lock) == lock &&
+		    ACCESS_ONCE(w->want) == ticket) {
+			add_stats(RELEASED_SLOW_KICKED, 1);
+			kvm_kick_cpu(cpu);
+			break;
+		}
+	}
+}
+
+/*
+ * Setup pv_lock_ops to exploit KVM_FEATURE_PV_UNHALT if present.
+ */
+void __init kvm_spinlock_init(void)
+{
+	if (!kvm_para_available())
+		return;
+	/* Does host kernel support KVM_FEATURE_PV_UNHALT? */
+	if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
+		return;
+
+	jump_label_inc(&paravirt_ticketlocks_enabled);
+
+	pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(kvm_lock_spinning);
+	pv_lock_ops.unlock_kick = kvm_unlock_kick;
+}
+#endif	/* CONFIG_PARAVIRT_SPINLOCKS */

Raghavendra K T

2012-Mar-23 08:08 UTC

head link

[PATCH RFC V5 6/6] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock

From: Raghavendra K T <raghavendra.kt at linux.vnet.ibm.com> 

KVM_HC_KICK_CPU  hypercall added to wakeup halted vcpu in paravirtual spinlock
enabled guest.

KVM_FEATURE_PV_UNHALT enables guest to check whether pv spinlock can be enabled
in guest. support in host is queried via
ioctl(KVM_CHECK_EXTENSION, KVM_CAP_PV_UNHALT)

Thanks Alex for KVM_HC_FEATURES inputs and Vatsa for rewriting KVM_HC_KICK_CPU

Signed-off-by: Raghavendra K T <raghavendra.kt at linux.vnet.ibm.com>
Signed-off-by: Srivatsa Vaddagiri <vatsa at linux.vnet.ibm.com>
---
diff --git a/Documentation/virtual/kvm/api.txt
b/Documentation/virtual/kvm/api.txt
index e1d94bf..cf8bf3b 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1109,6 +1109,13 @@ support.  Instead it is reported via
 if that returns true and you use KVM_CREATE_IRQCHIP, or if you emulate the
 feature in userspace, then you can enable the feature for KVM_SET_CPUID2.
 
+Paravirtualized ticket spinlocks can be enabled in guest by checking whether
+support exists in host via,
+
+  ioctl(KVM_CHECK_EXTENSION, KVM_CAP_PV_UNHALT)
+
+if this call return true, guest can use the feature.
+
 4.47 KVM_PPC_GET_PVINFO
 
 Capability: KVM_CAP_PPC_GET_PVINFO
diff --git a/Documentation/virtual/kvm/cpuid.txt
b/Documentation/virtual/kvm/cpuid.txt
index 8820685..062dff9 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -39,6 +39,10 @@ KVM_FEATURE_CLOCKSOURCE2           ||     3 || kvmclock
available at msrs
 KVM_FEATURE_ASYNC_PF               ||     4 || async pf can be enabled by
                                    ||       || writing to msr 0x4b564d02
 ------------------------------------------------------------------------------
+KVM_FEATURE_PV_UNHALT              ||     6 || guest checks this feature bit
+                                   ||       || before enabling paravirtualized
+                                   ||       || spinlock support.
+------------------------------------------------------------------------------
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
                                    ||       || per-cpu warps are expected in
                                    ||       || kvmclock.
diff --git a/Documentation/virtual/kvm/hypercalls.txt
b/Documentation/virtual/kvm/hypercalls.txt
new file mode 100644
index 0000000..c9e8b9c
--- /dev/null
+++ b/Documentation/virtual/kvm/hypercalls.txt
@@ -0,0 +1,59 @@
+KVM Hypercalls Documentation
+==========================+Template for documentation is
+The documenenation for hypercalls should inlcude
+1. Hypercall name, value.
+2. Architecture(s)
+3. status (deprecated, obsolete, active)
+4. Purpose
+
+
+1. KVM_HC_VAPIC_POLL_IRQ
+------------------------
+value: 1
+Architecture: x86
+Purpose:
+
+2. KVM_HC_MMU_OP
+------------------------
+value: 2
+Architecture: x86
+status: deprecated.
+Purpose: Support MMU operations such as writing to PTE,
+flushing TLB, release PT.
+
+3. KVM_HC_FEATURES
+------------------------
+value: 3
+Architecture: PPC
+Purpose: Expose hypercall availability to the guest. On x86 platforms, cpuid
+used to enumerate which hypercalls are available. On PPC, either device tree
+based lookup ( which is also what EPAPR dictates) OR KVM specific enumeration
+mechanism (which is this hypercall) can be used.
+
+4. KVM_HC_PPC_MAP_MAGIC_PAGE
+------------------------
+value: 4
+Architecture: PPC
+Purpose: To enable communication between the hypervisor and guest there is a
+shared page that contains parts of supervisor visible register state.
+The guest can map this shared page to access its supervisor register through
+memory using this hypercall.
+
+5. KVM_HC_KICK_CPU
+------------------------
+value: 5
+Architecture: x86
+Purpose: Hypercall used to wakeup a vcpu from HLT state
+
+Usage example : A vcpu of a paravirtualized guest that is busywaiting in guest
+kernel mode for an event to occur (ex: a spinlock to become available) can
+execute HLT instruction once it has busy-waited for more than a threshold
+time-interval. Execution of HLT instruction would cause the hypervisor to put
+the vcpu to sleep until occurence of an appropriate event. Another vcpu of the
+same guest can wakeup the sleeping vcpu by issuing KVM_HC_KICK_CPU hypercall,
+specifying APIC ID of the vcpu to be wokenup.
+
+TODO:
+1. more information on input and output needed?
+2. Add more detail to purpose of hypercalls.
diff --git a/Documentation/virtual/kvm/msr.txt
b/Documentation/virtual/kvm/msr.txt
index 5031780..a7662d3 100644
--- a/Documentation/virtual/kvm/msr.txt
+++ b/Documentation/virtual/kvm/msr.txt
@@ -219,3 +219,12 @@ MSR_KVM_STEAL_TIME: 0x4b564d03
 		steal: the amount of time in which this vCPU did not run, in
 		nanoseconds. Time during which the vcpu is idle, will not be
 		reported as steal time.
+
+MSR_KVM_PV_UNHALT: 0x4b564d04
+	data: 32 bit flag indicating the paravirtual unhalt state of the VCPU.
+	This data is not expected to reside in guest memory. The unhalt state
+	indicates that corresponding VCPU (halted for some reason) is ready for
+	unhalt operation.
+
+	This data is expected to be filled only via ioctl. This is needed for
+	live migration of virtual machine.

Raghavendra K T

2012-Mar-28 18:32 UTC

head link

[PATCH RFC V5 0/6] kvm : Paravirt-spinlock support for KVM guests

On 03/23/2012 01:35 PM, Raghavendra K T wrote:> The 6-patch series to follow this email extends KVM-hypervisor and Linux
guest
> running on KVM-hypervisor to support pv-ticket spinlocks, based on
Xen's
> implementation.
>
> One hypercall is introduced in KVM hypervisor,that allows a vcpu to kick
> another vcpu out of halt state.
> The blocking of vcpu is done using halt() in (lock_spinning) slowpath.
> one MSR is added to aid live migration.
>
> Changes in V5:
> - rebased to 3.3-rc6
> - added PV_UNHALT_MSR that would help in live migration (Avi)
> - removed PV_LOCK_KICK vcpu request and pv_unhalt flag (re)added.
Sorry for pinging
I know it is busy time. But I hope to get response on these patches in 
your free time, so that I can target next merge window for this. 
(whether it has reached some good state or it is heading in reverse 
direction!). it would really boost my morale.
especially MSR stuff and dropping vcpu request bit for PV unhalt.

- Raghu

Marcelo Tosatti

2012-Apr-12 00:06 UTC

head link

[PATCH RFC V5 2/6] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks

On Fri, Mar 23, 2012 at 01:37:04PM +0530, Raghavendra K T
wrote:> From: Srivatsa Vaddagiri <vatsa at linux.vnet.ibm.com>
> 
> KVM_HC_KICK_CPU allows the calling vcpu to kick another vcpu out of halt
state.
>     
> The presence of these hypercalls is indicated to guest via
> KVM_FEATURE_PV_UNHALT/KVM_CAP_PV_UNHALT.
> 
> Signed-off-by: Srivatsa Vaddagiri <vatsa at linux.vnet.ibm.com>
> Signed-off-by: Suzuki Poulose <suzuki at in.ibm.com>
> Signed-off-by: Raghavendra K T <raghavendra.kt at linux.vnet.ibm.com>
> ---
> diff --git a/arch/x86/include/asm/kvm_para.h
b/arch/x86/include/asm/kvm_para.h
> index 734c376..9234f13 100644
> --- a/arch/x86/include/asm/kvm_para.h
> +++ b/arch/x86/include/asm/kvm_para.h
> @@ -16,12 +16,14 @@
>  #define KVM_FEATURE_CLOCKSOURCE		0
>  #define KVM_FEATURE_NOP_IO_DELAY	1
>  #define KVM_FEATURE_MMU_OP		2
> +
>  /* This indicates that the new set of kvmclock msrs
>   * are available. The use of 0x11 and 0x12 is deprecated
>   */
>  #define KVM_FEATURE_CLOCKSOURCE2        3
>  #define KVM_FEATURE_ASYNC_PF		4
>  #define KVM_FEATURE_STEAL_TIME		5
> +#define KVM_FEATURE_PV_UNHALT		6
>  
>  /* The last 8 bits are used to indicate how to interpret the flags field
>   * in pvclock structure. If no bits are set, all flags are ignored.
> @@ -32,6 +34,7 @@
>  #define MSR_KVM_SYSTEM_TIME 0x12
>  
>  #define KVM_MSR_ENABLED 1
> +
>  /* Custom MSRs falls in the range 0x4b564d00-0x4b564dff */
>  #define MSR_KVM_WALL_CLOCK_NEW  0x4b564d00
>  #define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 89b02bf..61388b9 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -408,7 +408,8 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry,
u32 function,
>  			     (1 << KVM_FEATURE_NOP_IO_DELAY) |
>  			     (1 << KVM_FEATURE_CLOCKSOURCE2) |
>  			     (1 << KVM_FEATURE_ASYNC_PF) |
> -			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
> +			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
> +			     (1 << KVM_FEATURE_PV_UNHALT);
>  
>  		if (sched_info_on())
>  			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 9cbfc06..bd5ef91 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2079,6 +2079,7 @@ int kvm_dev_ioctl_check_extension(long ext)
>  	case KVM_CAP_XSAVE:
>  	case KVM_CAP_ASYNC_PF:
>  	case KVM_CAP_GET_TSC_KHZ:
> +	case KVM_CAP_PV_UNHALT:
>  		r = 1;
>  		break;
>  	case KVM_CAP_COALESCED_MMIO:
> @@ -4913,6 +4914,30 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
>  	return 1;
>  }
>  
> +/*
> + * kvm_pv_kick_cpu_op:  Kick a vcpu.
> + *
> + * @apicid - apicid of vcpu to be kicked.
> + */
> +static void kvm_pv_kick_cpu_op(struct kvm *kvm, int apicid)
> +{
> +	struct kvm_vcpu *vcpu = NULL;
> +	int i;
> +
> +	kvm_for_each_vcpu(i, vcpu, kvm) {
> +		if (!kvm_apic_present(vcpu))
> +			continue;
> +
> +		if (kvm_apic_match_dest(vcpu, 0, 0, apicid, 0))
> +			break;
> +	}
> +	if (vcpu) {
> +		vcpu->pv_unhalted = 1;
> +		smp_mb();
> +		kvm_vcpu_kick(vcpu);
> +	}
> +}
> +
>  int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
>  {
>  	unsigned long nr, a0, a1, a2, a3, ret;
> @@ -4946,6 +4971,10 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
>  	case KVM_HC_VAPIC_POLL_IRQ:
>  		ret = 0;
>  		break;
> +	case KVM_HC_KICK_CPU:
> +		kvm_pv_kick_cpu_op(vcpu->kvm, a0);
> +		ret = 0;
> +		break;
>  	default:
>  		ret = -KVM_ENOSYS;
>  		break;
> @@ -6174,6 +6203,7 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
>  		!vcpu->arch.apf.halted)
>  		|| !list_empty_careful(&vcpu->async_pf.done)
>  		|| vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED
> +		|| vcpu->pv_unhalted
>  		|| atomic_read(&vcpu->arch.nmi_queued) ||
>  		(kvm_arch_interrupt_allowed(vcpu) &&
>  		 kvm_cpu_has_interrupt(vcpu));
> diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> index 68e67e5..e822d96 100644
> --- a/include/linux/kvm.h
> +++ b/include/linux/kvm.h
> @@ -558,6 +558,7 @@ struct kvm_ppc_pvinfo {
>  #define KVM_CAP_PPC_PAPR 68
>  #define KVM_CAP_S390_GMAP 71
>  #define KVM_CAP_TSC_DEADLINE_TIMER 72
> +#define KVM_CAP_PV_UNHALT 73
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 900c763..433ae97 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -158,6 +158,7 @@ struct kvm_vcpu {
>  #endif
>  
>  	struct kvm_vcpu_arch arch;
> +	int pv_unhalted;
>  };
>  
>  static inline int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu)
> diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h
> index ff476dd..38226e1 100644
> --- a/include/linux/kvm_para.h
> +++ b/include/linux/kvm_para.h
> @@ -19,6 +19,7 @@
>  #define KVM_HC_MMU_OP			2
>  #define KVM_HC_FEATURES			3
>  #define KVM_HC_PPC_MAP_MAGIC_PAGE	4
> +#define KVM_HC_KICK_CPU			5
>  
>  /*
>   * hypercalls use architecture specific
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index a91f980..d3b98b1 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -226,6 +226,7 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm
*kvm, unsigned id)
>  	vcpu->kvm = kvm;
>  	vcpu->vcpu_id = id;
>  	vcpu->pid = NULL;
> +	vcpu->pv_unhalted = 0;
>  	init_waitqueue_head(&vcpu->wq);
>  	kvm_async_pf_vcpu_init(vcpu);
>  
> @@ -1567,6 +1568,9 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
>  		prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
>  
>  		if (kvm_arch_vcpu_runnable(vcpu)) {
> +			vcpu->pv_unhalted = 0;
> +			/* preventing reordering should be enough here */
> +			barrier();
Is it always OK to erase the notification, even in case an unrelated
event such as interrupt was the source of wakeup?

It would be easier to verify that notifications are not lost with atomic

test_and_clear(pv_unhalted).

Also x86 specific code should remain in arch/x86/kvm/

Marcelo Tosatti

2012-Apr-12 00:15 UTC

head link

[PATCH RFC V5 3/6] kvm : Add unhalt msr to aid (live) migration

On Fri, Mar 23, 2012 at 01:37:26PM +0530, Raghavendra K T
wrote:> From: Raghavendra K T <raghavendra.kt at linux.vnet.ibm.com>
> 
> Currently guest does not need to know pv_unhalt state and intended to be
> used via GET/SET_MSR ioctls  during migration.
> 
> Signed-off-by: Raghavendra K T <raghavendra.kt at linux.vnet.ibm.com>
> ---
> diff --git a/arch/x86/include/asm/kvm_para.h
b/arch/x86/include/asm/kvm_para.h
> index 9234f13..46f9751 100644
> --- a/arch/x86/include/asm/kvm_para.h
> +++ b/arch/x86/include/asm/kvm_para.h
> @@ -40,6 +40,7 @@
>  #define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01
>  #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
>  #define MSR_KVM_STEAL_TIME  0x4b564d03
> +#define MSR_KVM_PV_UNHALT   0x4b564d04
>  
>  struct kvm_steal_time {
>  	__u64 steal;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index bd5ef91..38e6c47 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -784,12 +784,13 @@ EXPORT_SYMBOL_GPL(kvm_rdpmc);
>   * kvm-specific. Those are put in the beginning of the list.
>   */
>  
> -#define KVM_SAVE_MSRS_BEGIN	9
> +#define KVM_SAVE_MSRS_BEGIN	10
>  static u32 msrs_to_save[] = {
>  	MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
>  	MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
>  	HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
>  	HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
> +	MSR_KVM_PV_UNHALT,
>  	MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
>  	MSR_STAR,
>  #ifdef CONFIG_X86_64
> @@ -1606,7 +1607,9 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32
msr, u64 data)
>  		kvm_make_request(KVM_REQ_STEAL_UPDATE, vcpu);
>  
>  		break;
> -
> +	case MSR_KVM_PV_UNHALT:
> +		vcpu->pv_unhalted = (u32) data;
> +		break;
>  	case MSR_IA32_MCG_CTL:
>  	case MSR_IA32_MCG_STATUS:
>  	case MSR_IA32_MC0_CTL ... MSR_IA32_MC0_CTL + 4 * KVM_MAX_MCE_BANKS - 1:
> @@ -1917,6 +1920,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32
msr, u64 *pdata)
>  	case MSR_KVM_STEAL_TIME:
>  		data = vcpu->arch.st.msr_val;
>  		break;
> +	case MSR_KVM_PV_UNHALT:
> +		data = (u64)vcpu->pv_unhalted;
> +		break;
>  	case MSR_IA32_P5_MC_ADDR:
>  	case MSR_IA32_P5_MC_TYPE:
>  	case MSR_IA32_MCG_CAP:
Unless there is a reason to use an MSR, should use a normal ioctl
such as KVM_{GET,SET}_MP_STATE.

Marcelo Tosatti

2012-Apr-12 00:29 UTC

head link

[PATCH RFC V5 2/6] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks

On Wed, Apr 11, 2012 at 09:06:29PM -0300, Marcelo Tosatti
wrote:> On Fri, Mar 23, 2012 at 01:37:04PM +0530, Raghavendra K T wrote:
> > From: Srivatsa Vaddagiri <vatsa at linux.vnet.ibm.com>
> > 
> > KVM_HC_KICK_CPU allows the calling vcpu to kick another vcpu out of
halt state.
> >     
> > The presence of these hypercalls is indicated to guest via
> > KVM_FEATURE_PV_UNHALT/KVM_CAP_PV_UNHALT.
> > 
> > Signed-off-by: Srivatsa Vaddagiri <vatsa at linux.vnet.ibm.com>
> > Signed-off-by: Suzuki Poulose <suzuki at in.ibm.com>
> > Signed-off-by: Raghavendra K T <raghavendra.kt at
linux.vnet.ibm.com>
> > ---
> > diff --git a/arch/x86/include/asm/kvm_para.h
b/arch/x86/include/asm/kvm_para.h
> > index 734c376..9234f13 100644
> > --- a/arch/x86/include/asm/kvm_para.h
> > +++ b/arch/x86/include/asm/kvm_para.h
> > @@ -16,12 +16,14 @@
> >  #define KVM_FEATURE_CLOCKSOURCE		0
> >  #define KVM_FEATURE_NOP_IO_DELAY	1
> >  #define KVM_FEATURE_MMU_OP		2
> > +
> >  /* This indicates that the new set of kvmclock msrs
> >   * are available. The use of 0x11 and 0x12 is deprecated
> >   */
> >  #define KVM_FEATURE_CLOCKSOURCE2        3
> >  #define KVM_FEATURE_ASYNC_PF		4
> >  #define KVM_FEATURE_STEAL_TIME		5
> > +#define KVM_FEATURE_PV_UNHALT		6
> >  
> >  /* The last 8 bits are used to indicate how to interpret the flags
field
> >   * in pvclock structure. If no bits are set, all flags are ignored.
> > @@ -32,6 +34,7 @@
> >  #define MSR_KVM_SYSTEM_TIME 0x12
> >  
> >  #define KVM_MSR_ENABLED 1
> > +
> >  /* Custom MSRs falls in the range 0x4b564d00-0x4b564dff */
> >  #define MSR_KVM_WALL_CLOCK_NEW  0x4b564d00
> >  #define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index 89b02bf..61388b9 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -408,7 +408,8 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2
*entry, u32 function,
> >  			     (1 << KVM_FEATURE_NOP_IO_DELAY) |
> >  			     (1 << KVM_FEATURE_CLOCKSOURCE2) |
> >  			     (1 << KVM_FEATURE_ASYNC_PF) |
> > -			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
> > +			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
> > +			     (1 << KVM_FEATURE_PV_UNHALT);
> >  
> >  		if (sched_info_on())
> >  			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 9cbfc06..bd5ef91 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -2079,6 +2079,7 @@ int kvm_dev_ioctl_check_extension(long ext)
> >  	case KVM_CAP_XSAVE:
> >  	case KVM_CAP_ASYNC_PF:
> >  	case KVM_CAP_GET_TSC_KHZ:
> > +	case KVM_CAP_PV_UNHALT:
> >  		r = 1;
> >  		break;
> >  	case KVM_CAP_COALESCED_MMIO:
> > @@ -4913,6 +4914,30 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
> >  	return 1;
> >  }
> >  
> > +/*
> > + * kvm_pv_kick_cpu_op:  Kick a vcpu.
> > + *
> > + * @apicid - apicid of vcpu to be kicked.
> > + */
> > +static void kvm_pv_kick_cpu_op(struct kvm *kvm, int apicid)
> > +{
> > +	struct kvm_vcpu *vcpu = NULL;
> > +	int i;
> > +
> > +	kvm_for_each_vcpu(i, vcpu, kvm) {
> > +		if (!kvm_apic_present(vcpu))
> > +			continue;
> > +
> > +		if (kvm_apic_match_dest(vcpu, 0, 0, apicid, 0))
> > +			break;
> > +	}
> > +	if (vcpu) {
> > +		vcpu->pv_unhalted = 1;
> > +		smp_mb();
> > +		kvm_vcpu_kick(vcpu);
> > +	}
> > +}
> > +
> >  int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
> >  {
> >  	unsigned long nr, a0, a1, a2, a3, ret;
> > @@ -4946,6 +4971,10 @@ int kvm_emulate_hypercall(struct kvm_vcpu
*vcpu)
> >  	case KVM_HC_VAPIC_POLL_IRQ:
> >  		ret = 0;
> >  		break;
> > +	case KVM_HC_KICK_CPU:
> > +		kvm_pv_kick_cpu_op(vcpu->kvm, a0);
> > +		ret = 0;
> > +		break;
> >  	default:
> >  		ret = -KVM_ENOSYS;
> >  		break;
> > @@ -6174,6 +6203,7 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu
*vcpu)
> >  		!vcpu->arch.apf.halted)
> >  		|| !list_empty_careful(&vcpu->async_pf.done)
> >  		|| vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED
> > +		|| vcpu->pv_unhalted
> >  		|| atomic_read(&vcpu->arch.nmi_queued) ||
> >  		(kvm_arch_interrupt_allowed(vcpu) &&
> >  		 kvm_cpu_has_interrupt(vcpu));
> > diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> > index 68e67e5..e822d96 100644
> > --- a/include/linux/kvm.h
> > +++ b/include/linux/kvm.h
> > @@ -558,6 +558,7 @@ struct kvm_ppc_pvinfo {
> >  #define KVM_CAP_PPC_PAPR 68
> >  #define KVM_CAP_S390_GMAP 71
> >  #define KVM_CAP_TSC_DEADLINE_TIMER 72
> > +#define KVM_CAP_PV_UNHALT 73
> >  
> >  #ifdef KVM_CAP_IRQ_ROUTING
> >  
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 900c763..433ae97 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -158,6 +158,7 @@ struct kvm_vcpu {
> >  #endif
> >  
> >  	struct kvm_vcpu_arch arch;
> > +	int pv_unhalted;
> >  };
> >  
> >  static inline int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu)
> > diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h
> > index ff476dd..38226e1 100644
> > --- a/include/linux/kvm_para.h
> > +++ b/include/linux/kvm_para.h
> > @@ -19,6 +19,7 @@
> >  #define KVM_HC_MMU_OP			2
> >  #define KVM_HC_FEATURES			3
> >  #define KVM_HC_PPC_MAP_MAGIC_PAGE	4
> > +#define KVM_HC_KICK_CPU			5
> >  
> >  /*
> >   * hypercalls use architecture specific
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index a91f980..d3b98b1 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -226,6 +226,7 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct
kvm *kvm, unsigned id)
> >  	vcpu->kvm = kvm;
> >  	vcpu->vcpu_id = id;
> >  	vcpu->pid = NULL;
> > +	vcpu->pv_unhalted = 0;
> >  	init_waitqueue_head(&vcpu->wq);
> >  	kvm_async_pf_vcpu_init(vcpu);
> >  
> > @@ -1567,6 +1568,9 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
> >  		prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
> >  
> >  		if (kvm_arch_vcpu_runnable(vcpu)) {
> > +			vcpu->pv_unhalted = 0;
> > +			/* preventing reordering should be enough here */
> > +			barrier();
> 
> Is it always OK to erase the notification, even in case an unrelated
> event such as interrupt was the source of wakeup?
Note i am only asking whether it is OK to lose a notification, not 
requesting a change to atomic test-and-clear.

It would be nice to have a comment explaining it.
> 
> It would be easier to verify that notifications are not lost with atomic
> test_and_clear(pv_unhalted).
> 
> Also x86 specific code should remain in arch/x86/kvm/
>

Raghavendra K T

2012-Apr-17 07:06 UTC

head link

[PATCH RFC V5 2/6] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks

On 04/12/2012 05:59 AM, Marcelo Tosatti wrote:> On Wed, Apr 11, 2012 at 09:06:29PM -0300, Marcelo Tosatti wrote:
>> On Fri, Mar 23, 2012 at 01:37:04PM +0530, Raghavendra K T wrote:
>>> From: Srivatsa Vaddagiri<vatsa at linux.vnet.ibm.com>
>>>[...]

		barrier();>>
>> Is it always OK to erase the notification, even in case an unrelated
>> event such as interrupt was the source of wakeup?
>
> Note i am only asking whether it is OK to lose a notification, not
> requesting a change to atomic test-and-clear.
Yes.. got your point.
IMO, this is the only (safe) place where it can clear
kicked(pv_unhalted) flag. Since it is going to be runnable.

and you are also right in having concern on unwanted clear of flag
since that would result in vcpu /vm hangs eventually.

Hope I did not miss anything.
>
> It would be nice to have a comment explaining it.
>
Sure will do that
>>

Raghavendra K T

2012-Apr-17 07:17 UTC

head link

[PATCH RFC V5 3/6] kvm : Add unhalt msr to aid (live) migration

On 04/12/2012 05:45 AM, Marcelo Tosatti wrote:> On Fri, Mar 23, 2012 at 01:37:26PM +0530, Raghavendra K T wrote:
>> From: Raghavendra K T<raghavendra.kt at linux.vnet.ibm.com>
>>
[...]>
> Unless there is a reason to use an MSR, should use a normal ioctl
> such as KVM_{GET,SET}_MP_STATE.
>
>
I agree with you. In the current implementation, since we are not doing
any communication between host/guest (on this flag), I too felt MSR is
an overkill for this.

IMO, patch like below should do the job, which I am planning to include
in next version of patch.
Let me know if you foresee any side-effects.

---
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index aa44292..5c81a66 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5691,7 +5691,9 @@ int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu 
*vcpu,
  int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
  				    struct kvm_mp_state *mp_state)
  {
-	mp_state->mp_state = vcpu->arch.mp_state;
+	mp_state->mp_state = (vcpu->arch.mp_state == KVM_MP_STATE_HALTED
&&
+				vcpu->pv_unhalted)?
+				KVM_MP_STATE_RUNNABLE : vcpu->arch.mp_state;
  	return 0;
  }

Seemingly Similar Threads

Search for more possibly parallel threads

Virtualization - Mar 2012 - [PATCH RFC V5 0/6] kvm : Paravirt-spinlock support for KVM guests

[PATCH RFC V5 0/6] kvm : Paravirt-spinlock support for KVM guests

[PATCH RFC V5 1/6] debugfs: Add support to print u32 array in debugfs

[PATCH RFC V5 2/6] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks

[PATCH RFC V5 3/6] kvm : Add unhalt msr to aid (live) migration

[PATCH RFC V5 4/6] kvm guest : Added configuration support to enable debug information for KVM Guests

[PATCH RFC V5 5/6] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor

[PATCH RFC V5 6/6] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock

[PATCH RFC V5 0/6] kvm : Paravirt-spinlock support for KVM guests

[PATCH RFC V5 2/6] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks

[PATCH RFC V5 3/6] kvm : Add unhalt msr to aid (live) migration

[PATCH RFC V5 2/6] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks

[PATCH RFC V5 2/6] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks

[PATCH RFC V5 3/6] kvm : Add unhalt msr to aid (live) migration

Seemingly Similar Threads