David Vrabel
2014-Mar-13 11:21 UTC
[PATCH RFC v6 09/11] pvqspinlock, x86: Add qspinlock para-virtualization support
On 12/03/14 18:54, Waiman Long wrote:
> This patch adds para-virtualization support to the queue spinlock in
> the same way as was done in the PV ticket lock code. In essence, the
> lock waiters will spin for a specified number of times (QSPIN_THRESHOLD
> = 2^14) and then halt themselves. The queue head waiter will spin
> 2*QSPIN_THRESHOLD times before halting itself. When it has spun
> QSPIN_THRESHOLD times, the queue head will assume that the lock
> holder may be scheduled out and attempt to kick the lock holder CPU
> if it has the CPU number on hand.

I don't really understand the reasoning for kicking the lock holder. It
will either be: running, runnable, or halted because it's in a slow-path
wait for another lock. In any of these states I do not see how a kick
is useful.

> Enabling the PV code does have a performance impact on spinlock
> acquisitions and releases. The following table shows the execution
> time (in ms) of a spinlock micro-benchmark that does lock/unlock
> operations 5M times for each task versus the number of contending
> tasks on a Westmere-EX system.
>
>   # of      Ticket lock            Queue lock
>   tasks  PV off/PV on/%Change   PV off/PV on/%Change
>   ------ --------------------   ---------------------
>     1      135/  179/+33%         137/  169/+23%
>     2     1045/ 1103/ +6%        1120/ 1536/+37%
>     3     1827/ 2683/+47%        2313/ 2425/ +5%
>     4     2689/ 4191/+56%        2914/ 3128/ +7%
>     5     3736/ 5830/+56%        3715/ 3762/ +1%
>     6     4942/ 7609/+54%        4504/ 4558/ +2%
>     7     6304/ 9570/+52%        5292/ 5351/ +1%
>     8     7736/11323/+46%        6037/ 6097/ +1%

Do you have measurements from tests when VCPUs are overcommitted?

> +#ifdef CONFIG_PARAVIRT_SPINLOCKS
> +/**
> + * queue_spin_unlock_slowpath - kick up the CPU of the queue head
> + * @lock : Pointer to queue spinlock structure
> + *
> + * The lock is released after finding the queue head to avoid racing
> + * condition between the queue head and the lock holder.
> + */
> +void queue_spin_unlock_slowpath(struct qspinlock *lock)
> +{
> +	struct qnode *node, *prev;
> +	u32 qcode = (u32)queue_get_qcode(lock);
> +
> +	/*
> +	 * Get the queue tail node
> +	 */
> +	node = xlate_qcode(qcode);
> +
> +	/*
> +	 * Locate the queue head node by following the prev pointer from
> +	 * tail to head.
> +	 * It is assumed that the PV guests won't have that many CPUs so
> +	 * that it won't take a long time to follow the pointers.

This isn't a valid assumption, but this isn't that different from the
search done in the ticket slow unlock path, so I guess it's OK.

David
Paolo Bonzini
2014-Mar-13 13:57 UTC
[PATCH RFC v6 09/11] pvqspinlock, x86: Add qspinlock para-virtualization support
On 13/03/2014 12:21, David Vrabel wrote:
> On 12/03/14 18:54, Waiman Long wrote:
>> This patch adds para-virtualization support to the queue spinlock in
>> the same way as was done in the PV ticket lock code. In essence, the
>> lock waiters will spin for a specified number of times (QSPIN_THRESHOLD
>> = 2^14) and then halt themselves. The queue head waiter will spin
>> 2*QSPIN_THRESHOLD times before halting itself. When it has spun
>> QSPIN_THRESHOLD times, the queue head will assume that the lock
>> holder may be scheduled out and attempt to kick the lock holder CPU
>> if it has the CPU number on hand.
>
> I don't really understand the reasoning for kicking the lock holder.

I agree. If the lock holder isn't running, there's probably a good
reason for that, and going to sleep will not necessarily convince the
scheduler to give more CPU time to the lock holder. I think there are
two choices:

1) Use yield_to to donate part of the waiter's quantum to the lock
   holder. For this we probably need a new, separate hypercall
   interface. For KVM it would be the same as hlt in the guest but
   with an additional yield_to in the host.

2) Do nothing; just go to sleep.

Could you get (or do you have) numbers for (2)?
More important, I think a barrier is missing:

	Lock holder
	---------------------------------------

	// queue_spin_unlock
	barrier();
	ACCESS_ONCE(qlock->lock) = 0;
	barrier();

	// pv_kick_node:
	if (pv->cpustate != PV_CPU_HALTED)
		return;
	ACCESS_ONCE(pv->cpustate) = PV_CPU_KICKED;
	__queue_kick_cpu(pv->mycpu, PV_KICK_QUEUE_HEAD);

	Waiter
	-------------------------------------------

	// pv_head_spin_check
	ACCESS_ONCE(pv->cpustate) = PV_CPU_HALTED;
	lockval = cmpxchg(&qlock->lock,
			  _QSPINLOCK_LOCKED,
			  _QSPINLOCK_LOCKED_SLOWPATH);
	if (lockval == 0) {
		/*
		 * Can exit now as the lock is free
		 */
		ACCESS_ONCE(pv->cpustate) = PV_CPU_ACTIVE;
		*count = 0;
		return;
	}
	__queue_hibernate();

Nothing protects against writing qlock->lock before pv->cpustate is
read, leading to this:

	Lock holder			Waiter
	---------------------------------------------------------------
	read pv->cpustate
	(it is PV_CPU_ACTIVE)
					pv->cpustate = PV_CPU_HALTED
					lockval = cmpxchg(...)
					hibernate()
	qlock->lock = 0
	if (pv->cpustate != PV_CPU_HALTED)
		return;

I think you need this:

-	if (pv->cpustate != PV_CPU_HALTED)
-		return;
-	ACCESS_ONCE(pv->cpustate) = PV_CPU_KICKED;
+	if (cmpxchg(&pv->cpustate, PV_CPU_HALTED, PV_CPU_KICKED)
+	    != PV_CPU_HALTED)
+		return;

Paolo
Waiman Long
2014-Mar-13 19:05 UTC
[PATCH RFC v6 09/11] pvqspinlock, x86: Add qspinlock para-virtualization support
On 03/13/2014 07:21 AM, David Vrabel wrote:
> On 12/03/14 18:54, Waiman Long wrote:
>> This patch adds para-virtualization support to the queue spinlock in
>> the same way as was done in the PV ticket lock code. In essence, the
>> lock waiters will spin for a specified number of times (QSPIN_THRESHOLD
>> = 2^14) and then halt themselves. The queue head waiter will spin
>> 2*QSPIN_THRESHOLD times before halting itself. When it has spun
>> QSPIN_THRESHOLD times, the queue head will assume that the lock
>> holder may be scheduled out and attempt to kick the lock holder CPU
>> if it has the CPU number on hand.
>
> I don't really understand the reasoning for kicking the lock holder. It
> will either be: running, runnable, or halted because it's in a slow-path
> wait for another lock. In any of these states I do not see how a kick
> is useful.

You may be right. I can certainly take this part out of the patch if
people don't think it is useful.

>> Enabling the PV code does have a performance impact on spinlock
>> acquisitions and releases. The following table shows the execution
>> time (in ms) of a spinlock micro-benchmark that does lock/unlock
>> operations 5M times for each task versus the number of contending
>> tasks on a Westmere-EX system.
>>
>>   # of      Ticket lock            Queue lock
>>   tasks  PV off/PV on/%Change   PV off/PV on/%Change
>>   ------ --------------------   ---------------------
>>     1      135/  179/+33%         137/  169/+23%
>>     2     1045/ 1103/ +6%        1120/ 1536/+37%
>>     3     1827/ 2683/+47%        2313/ 2425/ +5%
>>     4     2689/ 4191/+56%        2914/ 3128/ +7%
>>     5     3736/ 5830/+56%        3715/ 3762/ +1%
>>     6     4942/ 7609/+54%        4504/ 4558/ +2%
>>     7     6304/ 9570/+52%        5292/ 5351/ +1%
>>     8     7736/11323/+46%        6037/ 6097/ +1%
>
> Do you have measurements from tests when VCPUs are overcommitted?

I don't have a measurement with overcommitted guests yet. I will set up
such an environment and do some tests on it.

>> +#ifdef CONFIG_PARAVIRT_SPINLOCKS
>> +/**
>> + * queue_spin_unlock_slowpath - kick up the CPU of the queue head
>> + * @lock : Pointer to queue spinlock structure
>> + *
>> + * The lock is released after finding the queue head to avoid racing
>> + * condition between the queue head and the lock holder.
>> + */
>> +void queue_spin_unlock_slowpath(struct qspinlock *lock)
>> +{
>> +	struct qnode *node, *prev;
>> +	u32 qcode = (u32)queue_get_qcode(lock);
>> +
>> +	/*
>> +	 * Get the queue tail node
>> +	 */
>> +	node = xlate_qcode(qcode);
>> +
>> +	/*
>> +	 * Locate the queue head node by following the prev pointer from
>> +	 * tail to head.
>> +	 * It is assumed that the PV guests won't have that many CPUs so
>> +	 * that it won't take a long time to follow the pointers.
>
> This isn't a valid assumption, but this isn't that different from the
> search done in the ticket slow unlock path, so I guess it's OK.
>
> David

I will change that to say that in most cases the queue length will be
short.

-Longman
Waiman Long
2014-Mar-13 19:49 UTC
[PATCH RFC v6 09/11] pvqspinlock, x86: Add qspinlock para-virtualization support
On 03/13/2014 09:57 AM, Paolo Bonzini wrote:
> On 13/03/2014 12:21, David Vrabel wrote:
>> On 12/03/14 18:54, Waiman Long wrote:
>>> This patch adds para-virtualization support to the queue spinlock in
>>> the same way as was done in the PV ticket lock code. In essence, the
>>> lock waiters will spin for a specified number of times (QSPIN_THRESHOLD
>>> = 2^14) and then halt themselves. The queue head waiter will spin
>>> 2*QSPIN_THRESHOLD times before halting itself. When it has spun
>>> QSPIN_THRESHOLD times, the queue head will assume that the lock
>>> holder may be scheduled out and attempt to kick the lock holder CPU
>>> if it has the CPU number on hand.
>>
>> I don't really understand the reasoning for kicking the lock holder.
>
> I agree. If the lock holder isn't running, there's probably a good
> reason for that, and going to sleep will not necessarily convince the
> scheduler to give more CPU time to the lock holder. I think there are
> two choices:
>
> 1) Use yield_to to donate part of the waiter's quantum to the lock
>    holder. For this we probably need a new, separate hypercall
>    interface. For KVM it would be the same as hlt in the guest but
>    with an additional yield_to in the host.
>
> 2) Do nothing; just go to sleep.
>
> Could you get (or do you have) numbers for (2)?

I will take out the lock holder kick portion from the patch. I will
also try to collect more test data.

> More important, I think a barrier is missing:
>
> 	Lock holder
> 	---------------------------------------
>
> 	// queue_spin_unlock
> 	barrier();
> 	ACCESS_ONCE(qlock->lock) = 0;
> 	barrier();

This is not the unlock code that is used when PV spinlock is enabled.
The actual unlock code is:

	if (static_key_false(&paravirt_spinlocks_enabled)) {
		/*
		 * Need to atomically clear the lock byte to avoid racing
		 * with the queue head waiter trying to set
		 * _QSPINLOCK_LOCKED_SLOWPATH.
		 */
		if (likely(cmpxchg(&qlock->lock, _QSPINLOCK_LOCKED, 0)
				== _QSPINLOCK_LOCKED))
			return;
		else
			queue_spin_unlock_slowpath(lock);
	} else {
		__queue_spin_unlock(lock);
	}

> 	// pv_kick_node:
> 	if (pv->cpustate != PV_CPU_HALTED)
> 		return;
> 	ACCESS_ONCE(pv->cpustate) = PV_CPU_KICKED;
> 	__queue_kick_cpu(pv->mycpu, PV_KICK_QUEUE_HEAD);
>
> 	Waiter
> 	-------------------------------------------
>
> 	// pv_head_spin_check
> 	ACCESS_ONCE(pv->cpustate) = PV_CPU_HALTED;
> 	lockval = cmpxchg(&qlock->lock,
> 			  _QSPINLOCK_LOCKED,
> 			  _QSPINLOCK_LOCKED_SLOWPATH);
> 	if (lockval == 0) {
> 		/*
> 		 * Can exit now as the lock is free
> 		 */
> 		ACCESS_ONCE(pv->cpustate) = PV_CPU_ACTIVE;
> 		*count = 0;
> 		return;
> 	}
> 	__queue_hibernate();
>
> Nothing protects against writing qlock->lock before pv->cpustate is
> read, leading to this:
>
> 	Lock holder			Waiter
> 	---------------------------------------------------------------
> 	read pv->cpustate
> 	(it is PV_CPU_ACTIVE)
> 					pv->cpustate = PV_CPU_HALTED
> 					lockval = cmpxchg(...)
> 					hibernate()
> 	qlock->lock = 0
> 	if (pv->cpustate != PV_CPU_HALTED)
> 		return;

The lock holder will read cpustate only if the lock byte has been
changed to _QSPINLOCK_LOCKED_SLOWPATH, so the setting of the lock byte
synchronizes the two threads. The only thing I am not certain about is
the case where the waiter is trying to go to sleep while, at the same
time, the lock holder is trying to kick it. Will there be a missed
wakeup because of this timing issue?

-Longman