thr3ads.net - Linux Virtualization - [PATCH v4 0/5] implement vcpu preempted check [Oct 2016]

If this information is useful, please help other people find it:
Share via:

Christian Borntraeger

2016-Oct-19 06:47 UTC

[PATCH v4 0/5] implement vcpu preempted check

On 10/19/2016 12:20 PM, Pan Xinhui wrote:> change from v3:
> 	add x86 vcpu preempted check patch
If you want you could add the s390 patch that I provided for your last version.
I also gave my Acked-by for all previous patches.


> change from v2:
> 	no code change, fix typos, update some comments
> change from v1:
> 	a simplier definition of default vcpu_is_preempted
> 	skip mahcine type check on ppc, and add config. remove dedicated macro.
> 	add one patch to drop overload of rwsem_spin_on_owner and
mutex_spin_on_owner.
> 	add more comments
> 	thanks boqun and Peter's suggestion.
> 
> This patch set aims to fix lock holder preemption issues.
> 
> test-case:
> perf record -a perf bench sched messaging -g 400 -p && perf report
> 
> 18.09%  sched-messaging  [kernel.vmlinux]  [k] osq_lock
> 12.28%  sched-messaging  [kernel.vmlinux]  [k] rwsem_spin_on_owner
>  5.27%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
>  3.89%  sched-messaging  [kernel.vmlinux]  [k] wait_consider_task
>  3.64%  sched-messaging  [kernel.vmlinux]  [k] _raw_write_lock_irq
>  3.41%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner.is
>  2.49%  sched-messaging  [kernel.vmlinux]  [k] system_call
> 
> We introduce interface bool vcpu_is_preempted(int cpu) and use it in some
spin
> loops of osq_lock, rwsem_spin_on_owner and mutex_spin_on_owner.
> These spin_on_onwer variant also cause rcu stall before we apply this patch
set
> 
> We also have observed some performace improvements.
> 
> PPC test result:
> 
> 1 copy - 0.94%
> 2 copy - 7.17%
> 4 copy - 11.9%
> 8 copy -  3.04%
> 16 copy - 15.11%
> 
> details below:
> Without patch:
> 
> 1 copy - File Write 4096 bufsize 8000 maxblocks      2188223.0 KBps  (30.0
s, 1 samples)
> 2 copy - File Write 4096 bufsize 8000 maxblocks      1804433.0 KBps  (30.0
s, 1 samples)
> 4 copy - File Write 4096 bufsize 8000 maxblocks      1237257.0 KBps  (30.0
s, 1 samples)
> 8 copy - File Write 4096 bufsize 8000 maxblocks      1032658.0 KBps  (30.0
s, 1 samples)
> 16 copy - File Write 4096 bufsize 8000 maxblocks       768000.0 KBps  (30.1
s, 1 samples)
> 
> With patch: 
> 
> 1 copy - File Write 4096 bufsize 8000 maxblocks      2209189.0 KBps  (30.0
s, 1 samples)
> 2 copy - File Write 4096 bufsize 8000 maxblocks      1943816.0 KBps  (30.0
s, 1 samples)
> 4 copy - File Write 4096 bufsize 8000 maxblocks      1405591.0 KBps  (30.0
s, 1 samples)
> 8 copy - File Write 4096 bufsize 8000 maxblocks      1065080.0 KBps  (30.0
s, 1 samples)
> 16 copy - File Write 4096 bufsize 8000 maxblocks       904762.0 KBps  (30.0
s, 1 samples)
> 
> X86 test result:
> 	test-case			after-patch	  before-patch
> Execl Throughput                       |    18307.9 lps  |    11701.6 lps 
> File Copy 1024 bufsize 2000 maxblocks  |  1352407.3 KBps |   790418.9 KBps
> File Copy 256 bufsize 500 maxblocks    |   367555.6 KBps |   222867.7 KBps
> File Copy 4096 bufsize 8000 maxblocks  |  3675649.7 KBps |  1780614.4 KBps
> Pipe Throughput                        | 11872208.7 lps  | 11855628.9 lps 
> Pipe-based Context Switching           |  1495126.5 lps  |  1490533.9 lps 
> Process Creation                       |    29881.2 lps  |    28572.8 lps 
> Shell Scripts (1 concurrent)           |    23224.3 lpm  |    22607.4 lpm 
> Shell Scripts (8 concurrent)           |     3531.4 lpm  |     3211.9 lpm 
> System Call Overhead                   | 10385653.0 lps  | 10419979.0 lps 
> 
> Pan Xinhui (5):
>   kernel/sched: introduce vcpu preempted check interface
>   locking/osq: Drop the overload of osq_lock()
>   kernel/locking: Drop the overload of {mutex,rwsem}_spin_on_owner
>   powerpc/spinlock: support vcpu preempted check
>   x86, kvm: support vcpu preempted check
> 
>  arch/powerpc/include/asm/spinlock.h   |  8 ++++++++
>  arch/x86/include/asm/paravirt_types.h |  6 ++++++
>  arch/x86/include/asm/spinlock.h       |  8 ++++++++
>  arch/x86/include/uapi/asm/kvm_para.h  |  3 ++-
>  arch/x86/kernel/kvm.c                 | 11 +++++++++++
>  arch/x86/kernel/paravirt.c            | 11 +++++++++++
>  arch/x86/kvm/x86.c                    | 12 ++++++++++++
>  include/linux/sched.h                 | 12 ++++++++++++
>  kernel/locking/mutex.c                | 15 +++++++++++++--
>  kernel/locking/osq_lock.c             | 10 +++++++++-
>  kernel/locking/rwsem-xadd.c           | 16 +++++++++++++---
>  11 files changed, 105 insertions(+), 7 deletions(-)
>

Pan Xinhui

2016-Oct-19 10:20 UTC

head link

[PATCH v4 0/5] implement vcpu preempted check

change from v3:
	add x86 vcpu preempted check patch
change from v2:
	no code change, fix typos, update some comments
change from v1:
	a simplier definition of default vcpu_is_preempted
	skip mahcine type check on ppc, and add config. remove dedicated macro.
	add one patch to drop overload of rwsem_spin_on_owner and mutex_spin_on_owner. 
	add more comments
	thanks boqun and Peter's suggestion.

This patch set aims to fix lock holder preemption issues.

test-case:
perf record -a perf bench sched messaging -g 400 -p && perf report

18.09%  sched-messaging  [kernel.vmlinux]  [k] osq_lock
12.28%  sched-messaging  [kernel.vmlinux]  [k] rwsem_spin_on_owner
 5.27%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
 3.89%  sched-messaging  [kernel.vmlinux]  [k] wait_consider_task
 3.64%  sched-messaging  [kernel.vmlinux]  [k] _raw_write_lock_irq
 3.41%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner.is
 2.49%  sched-messaging  [kernel.vmlinux]  [k] system_call

We introduce interface bool vcpu_is_preempted(int cpu) and use it in some spin
loops of osq_lock, rwsem_spin_on_owner and mutex_spin_on_owner.
These spin_on_onwer variant also cause rcu stall before we apply this patch set

We also have observed some performace improvements.

PPC test result:

1 copy - 0.94%
2 copy - 7.17%
4 copy - 11.9%
8 copy -  3.04%
16 copy - 15.11%

details below:
Without patch:

1 copy - File Write 4096 bufsize 8000 maxblocks      2188223.0 KBps  (30.0 s, 1
samples)
2 copy - File Write 4096 bufsize 8000 maxblocks      1804433.0 KBps  (30.0 s, 1
samples)
4 copy - File Write 4096 bufsize 8000 maxblocks      1237257.0 KBps  (30.0 s, 1
samples)
8 copy - File Write 4096 bufsize 8000 maxblocks      1032658.0 KBps  (30.0 s, 1
samples)
16 copy - File Write 4096 bufsize 8000 maxblocks       768000.0 KBps  (30.1 s, 1
samples)

With patch: 

1 copy - File Write 4096 bufsize 8000 maxblocks      2209189.0 KBps  (30.0 s, 1
samples)
2 copy - File Write 4096 bufsize 8000 maxblocks      1943816.0 KBps  (30.0 s, 1
samples)
4 copy - File Write 4096 bufsize 8000 maxblocks      1405591.0 KBps  (30.0 s, 1
samples)
8 copy - File Write 4096 bufsize 8000 maxblocks      1065080.0 KBps  (30.0 s, 1
samples)
16 copy - File Write 4096 bufsize 8000 maxblocks       904762.0 KBps  (30.0 s, 1
samples)

X86 test result:
	test-case			after-patch	  before-patch
Execl Throughput                       |    18307.9 lps  |    11701.6 lps 
File Copy 1024 bufsize 2000 maxblocks  |  1352407.3 KBps |   790418.9 KBps
File Copy 256 bufsize 500 maxblocks    |   367555.6 KBps |   222867.7 KBps
File Copy 4096 bufsize 8000 maxblocks  |  3675649.7 KBps |  1780614.4 KBps
Pipe Throughput                        | 11872208.7 lps  | 11855628.9 lps 
Pipe-based Context Switching           |  1495126.5 lps  |  1490533.9 lps 
Process Creation                       |    29881.2 lps  |    28572.8 lps 
Shell Scripts (1 concurrent)           |    23224.3 lpm  |    22607.4 lpm 
Shell Scripts (8 concurrent)           |     3531.4 lpm  |     3211.9 lpm 
System Call Overhead                   | 10385653.0 lps  | 10419979.0 lps 

Pan Xinhui (5):
  kernel/sched: introduce vcpu preempted check interface
  locking/osq: Drop the overload of osq_lock()
  kernel/locking: Drop the overload of {mutex,rwsem}_spin_on_owner
  powerpc/spinlock: support vcpu preempted check
  x86, kvm: support vcpu preempted check

 arch/powerpc/include/asm/spinlock.h   |  8 ++++++++
 arch/x86/include/asm/paravirt_types.h |  6 ++++++
 arch/x86/include/asm/spinlock.h       |  8 ++++++++
 arch/x86/include/uapi/asm/kvm_para.h  |  3 ++-
 arch/x86/kernel/kvm.c                 | 11 +++++++++++
 arch/x86/kernel/paravirt.c            | 11 +++++++++++
 arch/x86/kvm/x86.c                    | 12 ++++++++++++
 include/linux/sched.h                 | 12 ++++++++++++
 kernel/locking/mutex.c                | 15 +++++++++++++--
 kernel/locking/osq_lock.c             | 10 +++++++++-
 kernel/locking/rwsem-xadd.c           | 16 +++++++++++++---
 11 files changed, 105 insertions(+), 7 deletions(-)

-- 
2.4.11

Pan Xinhui

2016-Oct-19 10:20 UTC

head link

[PATCH v4 1/5] kernel/sched: introduce vcpu preempted check interface

This patch support to fix lock holder preemption issue.

For kernel users, we could use bool vcpu_is_preempted(int cpu) to detech if
one vcpu is preempted or not.

The default implementation is a macro defined by false. So compiler can
wrap it out if arch dose not support such vcpu pteempted check.

Suggested-by: Peter Zijlstra (Intel) <peterz at infradead.org>
Signed-off-by: Pan Xinhui <xinhui.pan at linux.vnet.ibm.com>
---
 include/linux/sched.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 348f51b..44c1ce7 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -3506,6 +3506,18 @@ static inline void set_task_cpu(struct task_struct *p,
unsigned int cpu)
 
 #endif /* CONFIG_SMP */
 
+/*
+ * In order to deal with a various lock holder preemption issues provide an
+ * interface to see if a vCPU is currently running or not.
+ *
+ * This allows us to terminate optimistic spin loops and block, analogous to
+ * the native optimistic spin heuristic of testing if the lock owner task is
+ * running or not.
+ */
+#ifndef vcpu_is_preempted
+#define vcpu_is_preempted(cpu)	false
+#endif
+
 extern long sched_setaffinity(pid_t pid, const struct cpumask *new_mask);
 extern long sched_getaffinity(pid_t pid, struct cpumask *mask);
 
-- 
2.4.11

Pan Xinhui

2016-Oct-19 10:20 UTC

head link

[PATCH v4 2/5] locking/osq: Drop the overload of osq_lock()

An over-committed guest with more vCPUs than pCPUs has a heavy overload in
osq_lock().

This is because vCPU A hold the osq lock and yield out, vCPU B wait per_cpu
node->locked to be set. IOW, vCPU B wait vCPU A to run and unlock the osq
lock.

Kernel has an interface bool vcpu_is_preempted(int cpu) to see if a vCPU is
currently running or not. So break the spin loops on true condition.

test case:
perf record -a perf bench sched messaging -g 400 -p && perf report

before patch:
18.09%  sched-messaging  [kernel.vmlinux]  [k] osq_lock
12.28%  sched-messaging  [kernel.vmlinux]  [k] rwsem_spin_on_owner
 5.27%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
 3.89%  sched-messaging  [kernel.vmlinux]  [k] wait_consider_task
 3.64%  sched-messaging  [kernel.vmlinux]  [k] _raw_write_lock_irq
 3.41%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner.is
 2.49%  sched-messaging  [kernel.vmlinux]  [k] system_call

after patch:
20.68%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner
 8.45%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
 4.12%  sched-messaging  [kernel.vmlinux]  [k] system_call
 3.01%  sched-messaging  [kernel.vmlinux]  [k] system_call_common
 2.83%  sched-messaging  [kernel.vmlinux]  [k] copypage_power7
 2.64%  sched-messaging  [kernel.vmlinux]  [k] rwsem_spin_on_owner
 2.00%  sched-messaging  [kernel.vmlinux]  [k] osq_lock

Suggested-by: Boqun Feng <boqun.feng at gmail.com>
Signed-off-by: Pan Xinhui <xinhui.pan at linux.vnet.ibm.com>
---
 kernel/locking/osq_lock.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index 05a3785..39d1385 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -21,6 +21,11 @@ static inline int encode_cpu(int cpu_nr)
 	return cpu_nr + 1;
 }
 
+static inline int node_cpu(struct optimistic_spin_node *node)
+{
+	return node->cpu - 1;
+}
+
 static inline struct optimistic_spin_node *decode_cpu(int encoded_cpu_val)
 {
 	int cpu_nr = encoded_cpu_val - 1;
@@ -118,8 +123,11 @@ bool osq_lock(struct optimistic_spin_queue *lock)
 	while (!READ_ONCE(node->locked)) {
 		/*
 		 * If we need to reschedule bail... so we can block.
+		 * Use vcpu_is_preempted to detech lock holder preemption issue
+		 * and break. vcpu_is_preempted is a macro defined by false if
+		 * arch does not support vcpu preempted check,
 		 */
-		if (need_resched())
+		if (need_resched() || vcpu_is_preempted(node_cpu(node->prev)))
 			goto unqueue;
 
 		cpu_relax_lowlatency();
-- 
2.4.11

Pan Xinhui

2016-Oct-19 10:20 UTC

head link

[PATCH v4 3/5] kernel/locking: Drop the overload of {mutex, rwsem}_spin_on_owner

An over-committed guest with more vCPUs than pCPUs has a heavy overload in
the two spin_on_owner. This blames on the lock holder preemption issue.

Kernel has an interface bool vcpu_is_preempted(int cpu) to see if a vCPU is
currently running or not. So break the spin loops on true condition.

test-case:
perf record -a perf bench sched messaging -g 400 -p && perf report

before patch:
20.68%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner
 8.45%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
 4.12%  sched-messaging  [kernel.vmlinux]  [k] system_call
 3.01%  sched-messaging  [kernel.vmlinux]  [k] system_call_common
 2.83%  sched-messaging  [kernel.vmlinux]  [k] copypage_power7
 2.64%  sched-messaging  [kernel.vmlinux]  [k] rwsem_spin_on_owner
 2.00%  sched-messaging  [kernel.vmlinux]  [k] osq_lock

after patch:
 9.99%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
 5.28%  sched-messaging  [unknown]         [H] 0xc0000000000768e0
 4.27%  sched-messaging  [kernel.vmlinux]  [k] __copy_tofrom_user_power7
 3.77%  sched-messaging  [kernel.vmlinux]  [k] copypage_power7
 3.24%  sched-messaging  [kernel.vmlinux]  [k] _raw_write_lock_irq
 3.02%  sched-messaging  [kernel.vmlinux]  [k] system_call
 2.69%  sched-messaging  [kernel.vmlinux]  [k] wait_consider_task

Signed-off-by: Pan Xinhui <xinhui.pan at linux.vnet.ibm.com>
---
 kernel/locking/mutex.c      | 15 +++++++++++++--
 kernel/locking/rwsem-xadd.c | 16 +++++++++++++---
 2 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index a70b90d..8927e96 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -236,7 +236,13 @@ bool mutex_spin_on_owner(struct mutex *lock, struct
task_struct *owner)
 		 */
 		barrier();
 
-		if (!owner->on_cpu || need_resched()) {
+		/*
+		 * Use vcpu_is_preempted to detech lock holder preemption issue
+		 * and break. vcpu_is_preempted is a macro defined by false if
+		 * arch does not support vcpu preempted check,
+		 */
+		if (!owner->on_cpu || need_resched() ||
+				vcpu_is_preempted(task_cpu(owner))) {
 			ret = false;
 			break;
 		}
@@ -261,8 +267,13 @@ static inline int mutex_can_spin_on_owner(struct mutex
*lock)
 
 	rcu_read_lock();
 	owner = READ_ONCE(lock->owner);
+
+	/*
+	 * As lock holder preemption issue, we both skip spinning if task is not
+	 * on cpu or its cpu is preempted
+	 */
 	if (owner)
-		retval = owner->on_cpu;
+		retval = owner->on_cpu && !vcpu_is_preempted(task_cpu(owner));
 	rcu_read_unlock();
 	/*
 	 * if lock->owner is not set, the mutex owner may have just acquired
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 2337b4b..ad0b5bb 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -336,7 +336,11 @@ static inline bool rwsem_can_spin_on_owner(struct
rw_semaphore *sem)
 		goto done;
 	}
 
-	ret = owner->on_cpu;
+	/*
+	 * As lock holder preemption issue, we both skip spinning if task is not
+	 * on cpu or its cpu is preempted
+	 */
+	ret = owner->on_cpu && !vcpu_is_preempted(task_cpu(owner));
 done:
 	rcu_read_unlock();
 	return ret;
@@ -362,8 +366,14 @@ static noinline bool rwsem_spin_on_owner(struct
rw_semaphore *sem)
 		 */
 		barrier();
 
-		/* abort spinning when need_resched or owner is not running */
-		if (!owner->on_cpu || need_resched()) {
+		/*
+		 * abort spinning when need_resched or owner is not running or
+		 * owner's cpu is preempted. vcpu_is_preempted is a macro
+		 * defined by false if arch does not support vcpu preempted
+		 * check
+		 */
+		if (!owner->on_cpu || need_resched() ||
+				vcpu_is_preempted(task_cpu(owner))) {
 			rcu_read_unlock();
 			return false;
 		}
-- 
2.4.11

Pan Xinhui

2016-Oct-19 10:20 UTC

head link

[PATCH v4 4/5] powerpc/spinlock: support vcpu preempted check

This is to fix some lock holder preemption issues. Some other locks
implementation do a spin loop before acquiring the lock itself.
Currently kernel has an interface of bool vcpu_is_preempted(int cpu). It
takes the cpu as parameter and return true if the cpu is preempted. Then
kernel can break the spin loops upon on the retval of vcpu_is_preempted.

As kernel has used this interface, So lets support it.

Only pSeries need support it. And the fact is powerNV are built into
same kernel image with pSeries. So we need return false if we are runnig
as powerNV. The another fact is that lppaca->yiled_count keeps zero on
powerNV. So we can just skip the machine type check.

Suggested-by: Boqun Feng <boqun.feng at gmail.com>
Suggested-by: Peter Zijlstra (Intel) <peterz at infradead.org>
Signed-off-by: Pan Xinhui <xinhui.pan at linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/spinlock.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/include/asm/spinlock.h
b/arch/powerpc/include/asm/spinlock.h
index abb6b0f..af4285b 100644
--- a/arch/powerpc/include/asm/spinlock.h
+++ b/arch/powerpc/include/asm/spinlock.h
@@ -52,6 +52,14 @@
 #define SYNC_IO
 #endif
 
+#ifdef CONFIG_PPC_PSERIES
+#define vcpu_is_preempted vcpu_is_preempted
+static inline bool vcpu_is_preempted(int cpu)
+{
+	return !!(be32_to_cpu(lppaca_of(cpu).yield_count) & 1);
+}
+#endif
+
 #if defined(CONFIG_PPC_SPLPAR)
 /* We only yield to the hypervisor if we are in shared processor mode */
 #define SHARED_PROCESSOR (lppaca_shared_proc(local_paca->lppaca_ptr))
-- 
2.4.11

Pan Xinhui

2016-Oct-19 10:20 UTC

head link

[PATCH v4 5/5] x86, kvm: support vcpu preempted check

This is to fix some lock holder preemption issues. Some other locks
implementation do a spin loop before acquiring the lock itself.
Currently kernel has an interface of bool vcpu_is_preempted(int cpu). It
takes the cpu as parameter and return true if the cpu is preempted.  Then
kernel can break the spin loops upon on the retval of vcpu_is_preempted.

As kernel has used this interface, So lets support it.

We use one field of struct kvm_steal_time to indicate that if one vcpu
is running or not.

unix benchmark result:
host:  kernel 4.8.1, i5-4570, 4 cpus
guest: kernel 4.8.1, 8 vcpus

	test-case			after-patch	  before-patch
Execl Throughput                       |    18307.9 lps  |    11701.6 lps 
File Copy 1024 bufsize 2000 maxblocks  |  1352407.3 KBps |   790418.9 KBps
File Copy 256 bufsize 500 maxblocks    |   367555.6 KBps |   222867.7 KBps
File Copy 4096 bufsize 8000 maxblocks  |  3675649.7 KBps |  1780614.4 KBps
Pipe Throughput                        | 11872208.7 lps  | 11855628.9 lps 
Pipe-based Context Switching           |  1495126.5 lps  |  1490533.9 lps 
Process Creation                       |    29881.2 lps  |    28572.8 lps 
Shell Scripts (1 concurrent)           |    23224.3 lpm  |    22607.4 lpm 
Shell Scripts (8 concurrent)           |     3531.4 lpm  |     3211.9 lpm 
System Call Overhead                   | 10385653.0 lps  | 10419979.0 lps 

Signed-off-by: Pan Xinhui <xinhui.pan at linux.vnet.ibm.com>
---
 arch/x86/include/asm/paravirt_types.h |  6 ++++++
 arch/x86/include/asm/spinlock.h       |  8 ++++++++
 arch/x86/include/uapi/asm/kvm_para.h  |  3 ++-
 arch/x86/kernel/kvm.c                 | 11 +++++++++++
 arch/x86/kernel/paravirt.c            | 11 +++++++++++
 arch/x86/kvm/x86.c                    | 12 ++++++++++++
 6 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/paravirt_types.h
b/arch/x86/include/asm/paravirt_types.h
index 0f400c0..b1c7937 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -98,6 +98,10 @@ struct pv_time_ops {
 	unsigned long long (*steal_clock)(int cpu);
 };
 
+struct pv_vcpu_ops {
+	bool (*vcpu_is_preempted)(int cpu);
+};
+
 struct pv_cpu_ops {
 	/* hooks for various privileged instructions */
 	unsigned long (*get_debugreg)(int regno);
@@ -318,6 +322,7 @@ struct pv_lock_ops {
 struct paravirt_patch_template {
 	struct pv_init_ops pv_init_ops;
 	struct pv_time_ops pv_time_ops;
+	struct pv_vcpu_ops pv_vcpu_ops;
 	struct pv_cpu_ops pv_cpu_ops;
 	struct pv_irq_ops pv_irq_ops;
 	struct pv_mmu_ops pv_mmu_ops;
@@ -327,6 +332,7 @@ struct paravirt_patch_template {
 extern struct pv_info pv_info;
 extern struct pv_init_ops pv_init_ops;
 extern struct pv_time_ops pv_time_ops;
+extern struct pv_vcpu_ops pv_vcpu_ops;
 extern struct pv_cpu_ops pv_cpu_ops;
 extern struct pv_irq_ops pv_irq_ops;
 extern struct pv_mmu_ops pv_mmu_ops;
diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 921bea7..52fd942 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -26,6 +26,14 @@
 extern struct static_key paravirt_ticketlocks_enabled;
 static __always_inline bool static_key_false(struct static_key *key);
 
+#ifdef CONFIG_PARAVIRT
+#define vcpu_is_preempted vcpu_is_preempted
+static inline bool vcpu_is_preempted(int cpu)
+{
+	return pv_vcpu_ops.vcpu_is_preempted(cpu);
+}
+#endif
+
 #include <asm/qspinlock.h>
 
 /*
diff --git a/arch/x86/include/uapi/asm/kvm_para.h
b/arch/x86/include/uapi/asm/kvm_para.h
index 94dc8ca..e9c12a1 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -45,7 +45,8 @@ struct kvm_steal_time {
 	__u64 steal;
 	__u32 version;
 	__u32 flags;
-	__u32 pad[12];
+	__u32 preempted;
+	__u32 pad[11];
 };
 
 #define KVM_STEAL_ALIGNMENT_BITS 5
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index edbbfc8..0011bef 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -415,6 +415,15 @@ void kvm_disable_steal_time(void)
 	wrmsr(MSR_KVM_STEAL_TIME, 0, 0);
 }
 
+static bool kvm_vcpu_is_preempted(int cpu)
+{
+	struct kvm_steal_time *src;
+
+	src = &per_cpu(steal_time, cpu);
+
+	return !!src->preempted;
+}
+
 #ifdef CONFIG_SMP
 static void __init kvm_smp_prepare_boot_cpu(void)
 {
@@ -488,6 +497,8 @@ void __init kvm_guest_init(void)
 	kvm_guest_cpu_init();
 #endif
 
+	pv_vcpu_ops.vcpu_is_preempted = kvm_vcpu_is_preempted;
+
 	/*
 	 * Hard lockup detection is enabled by default. Disable it, as guests
 	 * can get false positives too easily, for example if the host is
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index bbf3d59..7adb7e9 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -122,6 +122,7 @@ static void *get_call_destination(u8 type)
 	struct paravirt_patch_template tmpl = {
 		.pv_init_ops = pv_init_ops,
 		.pv_time_ops = pv_time_ops,
+		.pv_vcpu_ops = pv_vcpu_ops,
 		.pv_cpu_ops = pv_cpu_ops,
 		.pv_irq_ops = pv_irq_ops,
 		.pv_mmu_ops = pv_mmu_ops,
@@ -203,6 +204,11 @@ static u64 native_steal_clock(int cpu)
 	return 0;
 }
 
+static bool native_vcpu_is_preempted(int cpu)
+{
+	return 0;
+}
+
 /* These are in entry.S */
 extern void native_iret(void);
 extern void native_usergs_sysret64(void);
@@ -312,6 +318,10 @@ struct pv_time_ops pv_time_ops = {
 	.steal_clock = native_steal_clock,
 };
 
+struct pv_vcpu_ops pv_vcpu_ops = {
+	.vcpu_is_preempted = native_vcpu_is_preempted,
+};
+
 __visible struct pv_irq_ops pv_irq_ops = {
 	.save_fl = __PV_IS_CALLEE_SAVE(native_save_fl),
 	.restore_fl = __PV_IS_CALLEE_SAVE(native_restore_fl),
@@ -458,6 +468,7 @@ struct pv_mmu_ops pv_mmu_ops __ro_after_init = {
 };
 
 EXPORT_SYMBOL_GPL(pv_time_ops);
+EXPORT_SYMBOL    (pv_vcpu_ops);
 EXPORT_SYMBOL    (pv_cpu_ops);
 EXPORT_SYMBOL    (pv_mmu_ops);
 EXPORT_SYMBOL_GPL(pv_info);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6c633de..0ffc5aa 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2057,6 +2057,8 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
 		&vcpu->arch.st.steal, sizeof(struct kvm_steal_time))))
 		return;
 
+	vcpu->arch.st.steal.preempted = 0;
+
 	if (vcpu->arch.st.steal.version & 1)
 		vcpu->arch.st.steal.version += 1;  /* first time write, random junk */
 
@@ -2812,6 +2814,16 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
+	if (vcpu->arch.st.msr_val & KVM_MSR_ENABLED)
+		if (kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
+					&vcpu->arch.st.steal,
+					sizeof(struct kvm_steal_time)) == 0) {
+			vcpu->arch.st.steal.preempted = 1;
+			kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
+					&vcpu->arch.st.steal,
+					sizeof(struct kvm_steal_time));
+		}
+
 	kvm_x86_ops->vcpu_put(vcpu);
 	kvm_put_guest_fpu(vcpu);
 	vcpu->arch.last_host_tsc = rdtsc();
-- 
2.4.11

Juergen Gross

2016-Oct-19 15:58 UTC

head link

[PATCH v4 0/5] implement vcpu preempted check

On 19/10/16 12:20, Pan Xinhui wrote:> change from v3:
> 	add x86 vcpu preempted check patch
> change from v2:
> 	no code change, fix typos, update some comments
> change from v1:
> 	a simplier definition of default vcpu_is_preempted
> 	skip mahcine type check on ppc, and add config. remove dedicated macro.
> 	add one patch to drop overload of rwsem_spin_on_owner and
mutex_spin_on_owner.
> 	add more comments
> 	thanks boqun and Peter's suggestion.
> 
> This patch set aims to fix lock holder preemption issues.
> 
> test-case:
> perf record -a perf bench sched messaging -g 400 -p && perf report
> 
> 18.09%  sched-messaging  [kernel.vmlinux]  [k] osq_lock
> 12.28%  sched-messaging  [kernel.vmlinux]  [k] rwsem_spin_on_owner
>  5.27%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
>  3.89%  sched-messaging  [kernel.vmlinux]  [k] wait_consider_task
>  3.64%  sched-messaging  [kernel.vmlinux]  [k] _raw_write_lock_irq
>  3.41%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner.is
>  2.49%  sched-messaging  [kernel.vmlinux]  [k] system_call
> 
> We introduce interface bool vcpu_is_preempted(int cpu) and use it in some
spin
> loops of osq_lock, rwsem_spin_on_owner and mutex_spin_on_owner.
> These spin_on_onwer variant also cause rcu stall before we apply this patch
set
> 
> We also have observed some performace improvements.
> 
> PPC test result:
> 
> 1 copy - 0.94%
> 2 copy - 7.17%
> 4 copy - 11.9%
> 8 copy -  3.04%
> 16 copy - 15.11%
> 
> details below:
> Without patch:
> 
> 1 copy - File Write 4096 bufsize 8000 maxblocks      2188223.0 KBps  (30.0
s, 1 samples)
> 2 copy - File Write 4096 bufsize 8000 maxblocks      1804433.0 KBps  (30.0
s, 1 samples)
> 4 copy - File Write 4096 bufsize 8000 maxblocks      1237257.0 KBps  (30.0
s, 1 samples)
> 8 copy - File Write 4096 bufsize 8000 maxblocks      1032658.0 KBps  (30.0
s, 1 samples)
> 16 copy - File Write 4096 bufsize 8000 maxblocks       768000.0 KBps  (30.1
s, 1 samples)
> 
> With patch: 
> 
> 1 copy - File Write 4096 bufsize 8000 maxblocks      2209189.0 KBps  (30.0
s, 1 samples)
> 2 copy - File Write 4096 bufsize 8000 maxblocks      1943816.0 KBps  (30.0
s, 1 samples)
> 4 copy - File Write 4096 bufsize 8000 maxblocks      1405591.0 KBps  (30.0
s, 1 samples)
> 8 copy - File Write 4096 bufsize 8000 maxblocks      1065080.0 KBps  (30.0
s, 1 samples)
> 16 copy - File Write 4096 bufsize 8000 maxblocks       904762.0 KBps  (30.0
s, 1 samples)
> 
> X86 test result:
> 	test-case			after-patch	  before-patch
> Execl Throughput                       |    18307.9 lps  |    11701.6 lps 
> File Copy 1024 bufsize 2000 maxblocks  |  1352407.3 KBps |   790418.9 KBps
> File Copy 256 bufsize 500 maxblocks    |   367555.6 KBps |   222867.7 KBps
> File Copy 4096 bufsize 8000 maxblocks  |  3675649.7 KBps |  1780614.4 KBps
> Pipe Throughput                        | 11872208.7 lps  | 11855628.9 lps 
> Pipe-based Context Switching           |  1495126.5 lps  |  1490533.9 lps 
> Process Creation                       |    29881.2 lps  |    28572.8 lps 
> Shell Scripts (1 concurrent)           |    23224.3 lpm  |    22607.4 lpm 
> Shell Scripts (8 concurrent)           |     3531.4 lpm  |     3211.9 lpm 
> System Call Overhead                   | 10385653.0 lps  | 10419979.0 lps 
> 
> Pan Xinhui (5):
>   kernel/sched: introduce vcpu preempted check interface
>   locking/osq: Drop the overload of osq_lock()
>   kernel/locking: Drop the overload of {mutex,rwsem}_spin_on_owner
>   powerpc/spinlock: support vcpu preempted check
>   x86, kvm: support vcpu preempted check
The attached patch adds Xen support for x86. Please tell me whether you
want to add this patch to your series or if I should post it when your
series has been accepted.

You can add my

Tested-by: Juergen Gross <jgross at suse.com>

for patches 1-3 and 5 (paravirt parts only).


Juergen
> 
>  arch/powerpc/include/asm/spinlock.h   |  8 ++++++++
>  arch/x86/include/asm/paravirt_types.h |  6 ++++++
>  arch/x86/include/asm/spinlock.h       |  8 ++++++++
>  arch/x86/include/uapi/asm/kvm_para.h  |  3 ++-
>  arch/x86/kernel/kvm.c                 | 11 +++++++++++
>  arch/x86/kernel/paravirt.c            | 11 +++++++++++
>  arch/x86/kvm/x86.c                    | 12 ++++++++++++
>  include/linux/sched.h                 | 12 ++++++++++++
>  kernel/locking/mutex.c                | 15 +++++++++++++--
>  kernel/locking/osq_lock.c             | 10 +++++++++-
>  kernel/locking/rwsem-xadd.c           | 16 +++++++++++++---
>  11 files changed, 105 insertions(+), 7 deletions(-)
> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-x86-xen-support-vcpu-preempted-check.patch
Type: text/x-patch
Size: 1415 bytes
Desc: not available
URL:
<http://lists.linuxfoundation.org/pipermail/virtualization/attachments/20161019/bcf327a1/attachment.bin>

Pan Xinhui

2016-Oct-19 16:57 UTC

head link

[PATCH v4 0/5] implement vcpu preempted check

? 2016/10/19 14:47, Christian Borntraeger ??:> On 10/19/2016 12:20 PM, Pan Xinhui wrote:
>> change from v3:
>> 	add x86 vcpu preempted check patch
>
> If you want you could add the s390 patch that I provided for your last
version.
> I also gave my Acked-by for all previous patches.
>hi, Christian
	Thanks a lot?
I can include your new s390 patch into my next patchset(if v5 is needed).

xinhui
>
>
>> change from v2:
>> 	no code change, fix typos, update some comments
>> change from v1:
>> 	a simplier definition of default vcpu_is_preempted
>> 	skip mahcine type check on ppc, and add config. remove dedicated
macro.
>> 	add one patch to drop overload of rwsem_spin_on_owner and
mutex_spin_on_owner.
>> 	add more comments
>> 	thanks boqun and Peter's suggestion.
>>
>> This patch set aims to fix lock holder preemption issues.
>>
>> test-case:
>> perf record -a perf bench sched messaging -g 400 -p && perf
report
>>
>> 18.09%  sched-messaging  [kernel.vmlinux]  [k] osq_lock
>> 12.28%  sched-messaging  [kernel.vmlinux]  [k] rwsem_spin_on_owner
>>  5.27%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
>>  3.89%  sched-messaging  [kernel.vmlinux]  [k] wait_consider_task
>>  3.64%  sched-messaging  [kernel.vmlinux]  [k] _raw_write_lock_irq
>>  3.41%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner.is
>>  2.49%  sched-messaging  [kernel.vmlinux]  [k] system_call
>>
>> We introduce interface bool vcpu_is_preempted(int cpu) and use it in
some spin
>> loops of osq_lock, rwsem_spin_on_owner and mutex_spin_on_owner.
>> These spin_on_onwer variant also cause rcu stall before we apply this
patch set
>>
>> We also have observed some performace improvements.
>>
>> PPC test result:
>>
>> 1 copy - 0.94%
>> 2 copy - 7.17%
>> 4 copy - 11.9%
>> 8 copy -  3.04%
>> 16 copy - 15.11%
>>
>> details below:
>> Without patch:
>>
>> 1 copy - File Write 4096 bufsize 8000 maxblocks      2188223.0 KBps 
(30.0 s, 1 samples)
>> 2 copy - File Write 4096 bufsize 8000 maxblocks      1804433.0 KBps 
(30.0 s, 1 samples)
>> 4 copy - File Write 4096 bufsize 8000 maxblocks      1237257.0 KBps 
(30.0 s, 1 samples)
>> 8 copy - File Write 4096 bufsize 8000 maxblocks      1032658.0 KBps 
(30.0 s, 1 samples)
>> 16 copy - File Write 4096 bufsize 8000 maxblocks       768000.0 KBps 
(30.1 s, 1 samples)
>>
>> With patch:
>>
>> 1 copy - File Write 4096 bufsize 8000 maxblocks      2209189.0 KBps 
(30.0 s, 1 samples)
>> 2 copy - File Write 4096 bufsize 8000 maxblocks      1943816.0 KBps 
(30.0 s, 1 samples)
>> 4 copy - File Write 4096 bufsize 8000 maxblocks      1405591.0 KBps 
(30.0 s, 1 samples)
>> 8 copy - File Write 4096 bufsize 8000 maxblocks      1065080.0 KBps 
(30.0 s, 1 samples)
>> 16 copy - File Write 4096 bufsize 8000 maxblocks       904762.0 KBps 
(30.0 s, 1 samples)
>>
>> X86 test result:
>> 	test-case			after-patch	  before-patch
>> Execl Throughput                       |    18307.9 lps  |    11701.6
lps
>> File Copy 1024 bufsize 2000 maxblocks  |  1352407.3 KBps |   790418.9
KBps
>> File Copy 256 bufsize 500 maxblocks    |   367555.6 KBps |   222867.7
KBps
>> File Copy 4096 bufsize 8000 maxblocks  |  3675649.7 KBps |  1780614.4
KBps
>> Pipe Throughput                        | 11872208.7 lps  | 11855628.9
lps
>> Pipe-based Context Switching           |  1495126.5 lps  |  1490533.9
lps
>> Process Creation                       |    29881.2 lps  |    28572.8
lps
>> Shell Scripts (1 concurrent)           |    23224.3 lpm  |    22607.4
lpm
>> Shell Scripts (8 concurrent)           |     3531.4 lpm  |     3211.9
lpm
>> System Call Overhead                   | 10385653.0 lps  | 10419979.0
lps
>>
>> Pan Xinhui (5):
>>   kernel/sched: introduce vcpu preempted check interface
>>   locking/osq: Drop the overload of osq_lock()
>>   kernel/locking: Drop the overload of {mutex,rwsem}_spin_on_owner
>>   powerpc/spinlock: support vcpu preempted check
>>   x86, kvm: support vcpu preempted check
>>
>>  arch/powerpc/include/asm/spinlock.h   |  8 ++++++++
>>  arch/x86/include/asm/paravirt_types.h |  6 ++++++
>>  arch/x86/include/asm/spinlock.h       |  8 ++++++++
>>  arch/x86/include/uapi/asm/kvm_para.h  |  3 ++-
>>  arch/x86/kernel/kvm.c                 | 11 +++++++++++
>>  arch/x86/kernel/paravirt.c            | 11 +++++++++++
>>  arch/x86/kvm/x86.c                    | 12 ++++++++++++
>>  include/linux/sched.h                 | 12 ++++++++++++
>>  kernel/locking/mutex.c                | 15 +++++++++++++--
>>  kernel/locking/osq_lock.c             | 10 +++++++++-
>>  kernel/locking/rwsem-xadd.c           | 16 +++++++++++++---
>>  11 files changed, 105 insertions(+), 7 deletions(-)
>>
>

Pan Xinhui

2016-Oct-19 17:08 UTC

head link

[PATCH v4 0/5] implement vcpu preempted check

? 2016/10/19 23:58, Juergen Gross ??:> On 19/10/16 12:20, Pan Xinhui wrote:
>> change from v3:
>> 	add x86 vcpu preempted check patch
>> change from v2:
>> 	no code change, fix typos, update some comments
>> change from v1:
>> 	a simplier definition of default vcpu_is_preempted
>> 	skip mahcine type check on ppc, and add config. remove dedicated
macro.
>> 	add one patch to drop overload of rwsem_spin_on_owner and
mutex_spin_on_owner.
>> 	add more comments
>> 	thanks boqun and Peter's suggestion.
>>
>> This patch set aims to fix lock holder preemption issues.
>>
>> test-case:
>> perf record -a perf bench sched messaging -g 400 -p && perf
report
>>
>> 18.09%  sched-messaging  [kernel.vmlinux]  [k] osq_lock
>> 12.28%  sched-messaging  [kernel.vmlinux]  [k] rwsem_spin_on_owner
>>  5.27%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
>>  3.89%  sched-messaging  [kernel.vmlinux]  [k] wait_consider_task
>>  3.64%  sched-messaging  [kernel.vmlinux]  [k] _raw_write_lock_irq
>>  3.41%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner.is
>>  2.49%  sched-messaging  [kernel.vmlinux]  [k] system_call
>>
>> We introduce interface bool vcpu_is_preempted(int cpu) and use it in
some spin
>> loops of osq_lock, rwsem_spin_on_owner and mutex_spin_on_owner.
>> These spin_on_onwer variant also cause rcu stall before we apply this
patch set
>>
>> We also have observed some performace improvements.
>>
>> PPC test result:
>>
>> 1 copy - 0.94%
>> 2 copy - 7.17%
>> 4 copy - 11.9%
>> 8 copy -  3.04%
>> 16 copy - 15.11%
>>
>> details below:
>> Without patch:
>>
>> 1 copy - File Write 4096 bufsize 8000 maxblocks      2188223.0 KBps 
(30.0 s, 1 samples)
>> 2 copy - File Write 4096 bufsize 8000 maxblocks      1804433.0 KBps 
(30.0 s, 1 samples)
>> 4 copy - File Write 4096 bufsize 8000 maxblocks      1237257.0 KBps 
(30.0 s, 1 samples)
>> 8 copy - File Write 4096 bufsize 8000 maxblocks      1032658.0 KBps 
(30.0 s, 1 samples)
>> 16 copy - File Write 4096 bufsize 8000 maxblocks       768000.0 KBps 
(30.1 s, 1 samples)
>>
>> With patch:
>>
>> 1 copy - File Write 4096 bufsize 8000 maxblocks      2209189.0 KBps 
(30.0 s, 1 samples)
>> 2 copy - File Write 4096 bufsize 8000 maxblocks      1943816.0 KBps 
(30.0 s, 1 samples)
>> 4 copy - File Write 4096 bufsize 8000 maxblocks      1405591.0 KBps 
(30.0 s, 1 samples)
>> 8 copy - File Write 4096 bufsize 8000 maxblocks      1065080.0 KBps 
(30.0 s, 1 samples)
>> 16 copy - File Write 4096 bufsize 8000 maxblocks       904762.0 KBps 
(30.0 s, 1 samples)
>>
>> X86 test result:
>> 	test-case			after-patch	  before-patch
>> Execl Throughput                       |    18307.9 lps  |    11701.6
lps
>> File Copy 1024 bufsize 2000 maxblocks  |  1352407.3 KBps |   790418.9
KBps
>> File Copy 256 bufsize 500 maxblocks    |   367555.6 KBps |   222867.7
KBps
>> File Copy 4096 bufsize 8000 maxblocks  |  3675649.7 KBps |  1780614.4
KBps
>> Pipe Throughput                        | 11872208.7 lps  | 11855628.9
lps
>> Pipe-based Context Switching           |  1495126.5 lps  |  1490533.9
lps
>> Process Creation                       |    29881.2 lps  |    28572.8
lps
>> Shell Scripts (1 concurrent)           |    23224.3 lpm  |    22607.4
lpm
>> Shell Scripts (8 concurrent)           |     3531.4 lpm  |     3211.9
lpm
>> System Call Overhead                   | 10385653.0 lps  | 10419979.0
lps
>>
>> Pan Xinhui (5):
>>   kernel/sched: introduce vcpu preempted check interface
>>   locking/osq: Drop the overload of osq_lock()
>>   kernel/locking: Drop the overload of {mutex,rwsem}_spin_on_owner
>>   powerpc/spinlock: support vcpu preempted check
>>   x86, kvm: support vcpu preempted check
>
> The attached patch adds Xen support for x86. Please tell me whether you
> want to add this patch to your series or if I should post it when your
> series has been accepted.
>hi, Juergen
	Your patch is pretty small and nice :) thanks!
I can include your patch into my next patchset after this patchset reviewed. :)

> You can add my
>
> Tested-by: Juergen Gross <jgross at suse.com>
>
> for patches 1-3 and 5 (paravirt parts only).
>
Thanks a lot!

xinhui
>
> Juergen
>
>>
>>  arch/powerpc/include/asm/spinlock.h   |  8 ++++++++
>>  arch/x86/include/asm/paravirt_types.h |  6 ++++++
>>  arch/x86/include/asm/spinlock.h       |  8 ++++++++
>>  arch/x86/include/uapi/asm/kvm_para.h  |  3 ++-
>>  arch/x86/kernel/kvm.c                 | 11 +++++++++++
>>  arch/x86/kernel/paravirt.c            | 11 +++++++++++
>>  arch/x86/kvm/x86.c                    | 12 ++++++++++++
>>  include/linux/sched.h                 | 12 ++++++++++++
>>  kernel/locking/mutex.c                | 15 +++++++++++++--
>>  kernel/locking/osq_lock.c             | 10 +++++++++-
>>  kernel/locking/rwsem-xadd.c           | 16 +++++++++++++---
>>  11 files changed, 105 insertions(+), 7 deletions(-)
>>
>

Radim Krčmář

2016-Oct-19 17:24 UTC

head link

[PATCH v4 5/5] x86, kvm: support vcpu preempted check

2016-10-19 06:20-0400, Pan Xinhui:> This is to fix some lock holder preemption issues. Some other locks
> implementation do a spin loop before acquiring the lock itself.
> Currently kernel has an interface of bool vcpu_is_preempted(int cpu). It
> takes the cpu as parameter and return true if the cpu is preempted.  Then
> kernel can break the spin loops upon on the retval of vcpu_is_preempted.
> 
> As kernel has used this interface, So lets support it.
> 
> We use one field of struct kvm_steal_time to indicate that if one vcpu
> is running or not.
> 
> unix benchmark result:
> host:  kernel 4.8.1, i5-4570, 4 cpus
> guest: kernel 4.8.1, 8 vcpus
> 
> 	test-case			after-patch	  before-patch
> Execl Throughput                       |    18307.9 lps  |    11701.6 lps 
> File Copy 1024 bufsize 2000 maxblocks  |  1352407.3 KBps |   790418.9 KBps
> File Copy 256 bufsize 500 maxblocks    |   367555.6 KBps |   222867.7 KBps
> File Copy 4096 bufsize 8000 maxblocks  |  3675649.7 KBps |  1780614.4 KBps
> Pipe Throughput                        | 11872208.7 lps  | 11855628.9 lps 
> Pipe-based Context Switching           |  1495126.5 lps  |  1490533.9 lps 
> Process Creation                       |    29881.2 lps  |    28572.8 lps 
> Shell Scripts (1 concurrent)           |    23224.3 lpm  |    22607.4 lpm 
> Shell Scripts (8 concurrent)           |     3531.4 lpm  |     3211.9 lpm 
> System Call Overhead                   | 10385653.0 lps  | 10419979.0 lps 
> 
> Signed-off-by: Pan Xinhui <xinhui.pan at linux.vnet.ibm.com>
> ---
> diff --git a/arch/x86/include/asm/paravirt_types.h
b/arch/x86/include/asm/paravirt_types.h
> @@ -98,6 +98,10 @@ struct pv_time_ops {
>  	unsigned long long (*steal_clock)(int cpu);
>  };
>  
> +struct pv_vcpu_ops {
> +	bool (*vcpu_is_preempted)(int cpu);
> +};
> +
(I would put it into pv_lock_ops to save the plumbing.)
> diff --git a/arch/x86/include/uapi/asm/kvm_para.h
b/arch/x86/include/uapi/asm/kvm_para.h
> @@ -45,7 +45,8 @@ struct kvm_steal_time {
>  	__u64 steal;
>  	__u32 version;
>  	__u32 flags;
> -	__u32 pad[12];
> +	__u32 preempted;
Why __u32 instead of __u8?
> +	__u32 pad[11];
>  };
Please document the change in Documentation/virtual/kvm/msr.txt, section
MSR_KVM_STEAL_TIME.
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> @@ -415,6 +415,15 @@ void kvm_disable_steal_time(void)
> +static bool kvm_vcpu_is_preempted(int cpu)
> +{
> +	struct kvm_steal_time *src;
> +
> +	src = &per_cpu(steal_time, cpu);
> +
> +	return !!src->preempted;
> +}
> +
>  #ifdef CONFIG_SMP
>  static void __init kvm_smp_prepare_boot_cpu(void)
>  {
> @@ -488,6 +497,8 @@ void __init kvm_guest_init(void)
>  	kvm_guest_cpu_init();
>  #endif
>  
> +	pv_vcpu_ops.vcpu_is_preempted = kvm_vcpu_is_preempted;
Would be nicer to assign conditionally in the KVM_FEATURE_STEAL_TIME
block.  The steal_time structure has to be zeroed, so this code would
work, but the native function (return false) is better if we know that
the kvm_vcpu_is_preempted() would always return false anway.

Old KVMs won't have the feature, so we could also assign only when KVM
reports it, but that requires extra definitions and the performance gain
is fairly small, so I'm ok with this.
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> @@ -2057,6 +2057,8 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
>  		&vcpu->arch.st.steal, sizeof(struct kvm_steal_time))))
>  		return;
>  
> +	vcpu->arch.st.steal.preempted = 0;
> +
>  	if (vcpu->arch.st.steal.version & 1)
>  		vcpu->arch.st.steal.version += 1;  /* first time write, random junk
*/
>  
> @@ -2812,6 +2814,16 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int
cpu)
>  
>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>  {
> +	if (vcpu->arch.st.msr_val & KVM_MSR_ENABLED)
> +		if (kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
> +					&vcpu->arch.st.steal,
> +					sizeof(struct kvm_steal_time)) == 0) {
> +			vcpu->arch.st.steal.preempted = 1;
> +			kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
> +					&vcpu->arch.st.steal,
> +					sizeof(struct kvm_steal_time));
> +		}
Please name this block of code.  Something like
  kvm_steal_time_set_preempted(vcpu);

Thanks.

Maybe Matching Threads

Search for more seemingly similar threads

Linux Virtualization - Oct 2016 - [PATCH v4 0/5] implement vcpu preempted check

[PATCH v4 0/5] implement vcpu preempted check

[PATCH v4 0/5] implement vcpu preempted check

[PATCH v4 1/5] kernel/sched: introduce vcpu preempted check interface

[PATCH v4 2/5] locking/osq: Drop the overload of osq_lock()

[PATCH v4 3/5] kernel/locking: Drop the overload of {mutex, rwsem}_spin_on_owner

[PATCH v4 4/5] powerpc/spinlock: support vcpu preempted check

[PATCH v4 5/5] x86, kvm: support vcpu preempted check

[PATCH v4 0/5] implement vcpu preempted check

[PATCH v4 0/5] implement vcpu preempted check

[PATCH v4 0/5] implement vcpu preempted check

[PATCH v4 5/5] x86, kvm: support vcpu preempted check

Maybe Matching Threads