Christian Borntraeger
2008-May-08 13:20 UTC
[PATCH/RFC] stop_machine: make stop_machine_run more virtualization friendly
On kvm I have seen some rare hangs in stop_machine when I used more guest
cpus than host cpus. E.g. 32 guest cpus on 1 host cpu triggered the hang
quite often. I could also reproduce the problem on a 4 way z/VM host with a
64 way guest.

It turned out that the guest was consuming all available cpus mostly for
spinning on scheduler locks like rq->lock. This is expected as the threads
are calling yield all the time.

The problem is now that the host scheduling decisions, together with the
guest scheduling decisions and spinlocks not being fair, managed to create
an interesting scenario similar to a live lock. (Sometimes the hang
resolved itself after some minutes.)

Changing stop_machine to yield the cpu to the hypervisor when yielding
inside the guest fixed the problem for me. While I am not completely happy
with this patch, I think it causes no harm and it really improves the
situation for me.

I used cpu_relax for yielding to the hypervisor; does that work on all
architectures?

p.s.: If you want to reproduce the problem, cpu hotplug and kprobes use
stop_machine_run and both triggered the problem after some retries.

Signed-off-by: Christian Borntraeger <borntraeger at de.ibm.com>
CC: Ingo Molnar <mingo at elte.hu>
CC: Rusty Russell <rusty at rustcorp.com.au>

---
 kernel/stop_machine.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Index: kvm/kernel/stop_machine.c
===================================================================
--- kvm.orig/kernel/stop_machine.c
+++ kvm/kernel/stop_machine.c
@@ -62,8 +62,7 @@ static int stopmachine(void *cpu)
 	 * help our sisters onto their CPUs. */
 	if (!prepared && !irqs_disabled)
 		yield();
-	else
-		cpu_relax();
+	cpu_relax();
 }
 
 /* Ack: we are exiting. */
@@ -106,8 +105,10 @@ static int stop_machine(void)
 	}
 
 	/* Wait for them all to come to life. */
-	while (atomic_read(&stopmachine_thread_ack) != stopmachine_num_threads)
+	while (atomic_read(&stopmachine_thread_ack) != stopmachine_num_threads) {
 		yield();
+		cpu_relax();
+	}
 
 	/* If some failed, kill them all. */
 	if (ret < 0) {
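For illustration, here is a minimal user-space analogue of the wait loop
the second hunk produces: yield to the scheduler, then issue a relax hint.
This is a sketch, not kernel code; it assumes x86 for the inline asm, and
sched_yield()/the PAUSE instruction only roughly correspond to the kernel's
yield() and cpu_relax().

#include <sched.h>
#include <stdatomic.h>

static inline void relax_hint(void)
{
	/* x86 PAUSE ("rep; nop"): hint that we are in a spin-wait loop. */
	__asm__ __volatile__("rep; nop" ::: "memory");
}

void wait_for_acks(atomic_int *ack, int expected)
{
	/* Mirrors the patched loop: yield first, then relax every iteration. */
	while (atomic_load(ack) != expected) {
		sched_yield();	/* let the (guest) scheduler run someone else */
		relax_hint();	/* hint to the CPU, and on some archs the hypervisor */
	}
}

Whether the relax hint actually reaches the hypervisor depends on the
architecture, which is exactly the open question in the mail above.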
Jeremy Fitzhardinge
2008-May-08 13:33 UTC
[PATCH/RFC] stop_machine: make stop_machine_run more virtualization friendly
Christian Borntraeger wrote:
> On kvm I have seen some rare hangs in stop_machine when I used more guest
> cpus than host cpus. E.g. 32 guest cpus on 1 host cpu triggered the hang
> quite often. I could also reproduce the problem on a 4 way z/VM host with
> a 64 way guest.

I think that's one of those "don't do that then" cases ;)

> It turned out that the guest was consuming all available cpus mostly for
> spinning on scheduler locks like rq->lock. This is expected as the threads
> are calling yield all the time.
> The problem is now that the host scheduling decisions, together with the
> guest scheduling decisions and spinlocks not being fair, managed to create
> an interesting scenario similar to a live lock. (Sometimes the hang
> resolved itself after some minutes.)

I think x86 (at least) is now using ticket locks, which are fair. Which
kernel are you seeing this problem on?

> Changing stop_machine to yield the cpu to the hypervisor when yielding
> inside the guest fixed the problem for me. While I am not completely happy
> with this patch, I think it causes no harm and it really improves the
> situation for me.
>
> I used cpu_relax for yielding to the hypervisor; does that work on all
> architectures?

On x86, cpu_relax is just a "pause" instruction ("rep;nop"). We don't hook
it in paravirt_ops, and while VT/SVM can be used to fault into the
hypervisor on this instruction, I don't know if kvm actually does so.
Either way, it wouldn't work for VMI, Xen or lguest.

    J
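For context on the "ticket locks, which are fair" remark: a ticket spinlock
hands the lock to waiters in arrival order, which rules out one CPU starving
the others indefinitely. A minimal sketch using C11 atomics follows; it is
illustrative only, not the kernel's implementation, and all names are made
up.

#include <stdatomic.h>

struct ticket_lock {
	atomic_uint next;	/* next ticket to hand out */
	atomic_uint serving;	/* ticket currently allowed to hold the lock */
};

static void ticket_lock(struct ticket_lock *l)
{
	/* Take a ticket, then spin until it is our turn (FIFO order). */
	unsigned int me = atomic_fetch_add(&l->next, 1);

	while (atomic_load(&l->serving) != me)
		;	/* spin; a real implementation would cpu_relax() here */
}

static void ticket_unlock(struct ticket_lock *l)
{
	/* Pass the lock to the next ticket holder. */
	atomic_fetch_add(&l->serving, 1);
}

Fairness of the lock itself does not help here, though, if the host never
schedules the vcpu whose turn it is - which is the live-lock-like scenario
described above.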
Rusty Russell
2008-May-09 01:10 UTC
[PATCH/RFC] stop_machine: make stop_machine_run more virtualization friendly
On Thursday 08 May 2008 23:20:38 Christian Borntraeger wrote:
> Changing stop_machine to yield the cpu to the hypervisor when yielding
> inside the guest fixed the problem for me. While I am not completely happy
> with this patch, I think it causes no harm and it really improves the
> situation for me.

Yes, this change is harmless. I'm reworking (i.e. rewriting) stop_machine at
the moment to simplify it, and as a side effect it won't be yielding. (The
yield is almost useless, since there's nothing at the same priority as this
thread anyway.)

I've included this patch for my next push to Linus.

Thanks,
Rusty.