David Vrabel
2014-Mar-13 10:54 UTC
[PATCH v6 05/11] pvqspinlock, x86: Allow unfair spinlock in a PV guest
On 12/03/14 18:54, Waiman Long wrote:
> Locking is always an issue in a virtualized environment as the virtual
> CPU that is waiting on a lock may get scheduled out and hence block
> any progress in lock acquisition even when the lock has been freed.
>
> One solution to this problem is to allow unfair lock in a
> para-virtualized environment. In this case, a new lock acquirer can
> come and steal the lock if the next-in-line CPU to get the lock is
> scheduled out. Unfair lock in a native environment is generally not a
> good idea as there is a possibility of lock starvation for a heavily
> contended lock.

I do not think this is a good idea -- the problems with unfair locks are
worse in a virtualized guest. If a waiting VCPU deschedules and has to
be kicked to grab a lock then it is very likely to lose a race with
another running VCPU trying to take a lock (since it takes time for the
VCPU to be rescheduled).

> With the unfair locking activated on bare metal 4-socket Westmere-EX
> box, the execution times (in ms) of a spinlock micro-benchmark were
> as follows:
>
>   # of    Ticket      Fair       Unfair
>   tasks    lock    queue lock  queue lock
>   ------  -------  ----------  ----------
>     1       135       135         137
>     2      1045      1120         747
>     3      1827      2345        1084
>     4      2689      2934        1438
>     5      3736      3658        1722
>     6      4942      4434        2092
>     7      6304      5176        2245
>     8      7736      5955        2388

Are these figures with or without the later PV support patches?

David
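The two positions above can be illustrated with a toy model. The following Python sketch is not from the patch or from any kernel code; the function name `grant_order`, the `wake_after` parameter and the tick-based timing are all assumptions made purely for illustration. It only shows the hand-off order: a fair lock idles until the descheduled head of the queue runs again, while an unfair lock lets running waiters steal it, which pushes the descheduled vCPU to the back.

```python
from collections import deque

def grant_order(waiters, descheduled, fair, wake_after=3):
    """Return the sequence of lock owners, one entry per tick.

    waiters:     names in arrival (FIFO) order.
    descheduled: names whose vCPU is scheduled out at tick 0.
    wake_after:  tick at which descheduled vCPUs run again (assumed value).
    None in the result marks a tick where the free lock sat idle.
    """
    queue = deque(waiters)
    order = []
    tick = 0
    while queue:
        awake = tick >= wake_after
        runnable = [w for w in queue if awake or w not in descheduled]
        if fair:
            # Fair lock: only the next-in-line waiter may take the lock.
            winner = queue[0] if queue[0] in runnable else None
        else:
            # Unfair lock: the first runnable waiter steals the lock.
            winner = runnable[0] if runnable else None
        if winner is not None:
            queue.remove(winner)
        order.append(winner)
        tick += 1
    return order

vcpus = ["v0", "v1", "v2", "v3"]
print(grant_order(vcpus, {"v0"}, fair=True))
print(grant_order(vcpus, {"v0"}, fair=False))
```

In this model the fair lock wastes three ticks waiting for v0, while the unfair lock finishes all four acquisitions in four ticks but serves v0 last. With a steady stream of new acquirers, the same stealing is what could starve the descheduled vCPU.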
Paolo Bonzini
2014-Mar-13 13:16 UTC
[PATCH v6 05/11] pvqspinlock, x86: Allow unfair spinlock in a PV guest
On 13/03/2014 11:54, David Vrabel wrote:
> On 12/03/14 18:54, Waiman Long wrote:
>> Locking is always an issue in a virtualized environment as the virtual
>> CPU that is waiting on a lock may get scheduled out and hence block
>> any progress in lock acquisition even when the lock has been freed.
>>
>> One solution to this problem is to allow unfair lock in a
>> para-virtualized environment. In this case, a new lock acquirer can
>> come and steal the lock if the next-in-line CPU to get the lock is
>> scheduled out. Unfair lock in a native environment is generally not a
>> good idea as there is a possibility of lock starvation for a heavily
>> contended lock.
>
> I do not think this is a good idea -- the problems with unfair locks are
> worse in a virtualized guest. If a waiting VCPU deschedules and has to
> be kicked to grab a lock then it is very likely to lose a race with
> another running VCPU trying to take a lock (since it takes time for the
> VCPU to be rescheduled).

Actually, I think the unfair version should be automatically
selected if running on a hypervisor. Per-hypervisor pvops can
choose to enable the fair one.

Lock unfairness may be particularly evident on a virtualized guest
when the host is overcommitted, but problems with fair locks are
even worse.

In fact, RHEL/CentOS 6 already uses unfair locks if
X86_FEATURE_HYPERVISOR is set. The patch was rejected upstream in
favor of pv ticketlocks, but pv ticketlocks do not cover all
hypervisors so perhaps we could revisit that choice.

Measurements were done by Gleb for two guests running 2.6.32 with 16
vcpus each, on a 16-core system. One guest ran with unfair locks,
one guest ran with fair locks. Two kernel compilations ("time make
-j 16 all") were started at the same time on both guests, and times
were as follows:

        unfair:                 fair:
        real 13m34.674s         real 19m35.827s
        user 96m2.638s          user 102m38.665s
        sys  56m14.991s         sys  158m22.470s

        real 13m3.768s          real 19m4.375s
        user 95m34.509s         user 111m9.903s
        sys  53m40.550s         sys  141m59.370s

Actually, interpreting the numbers shows an even worse slowdown.

Compilation took ~6.5 minutes in a guest when the host was not
overcommitted, and with unfair locks everything scaled just fine.

Ticketlocks fell completely apart; during the first 13 minutes they
were allotted 16*6.5=104 minutes of CPU time, and they spent almost
all of it spinning in the kernel (102 minutes in the first run).
They did perhaps 30 seconds worth of work because, as soon as the
unfair-lock guest finished and the host was no longer overcommitted,
compilation finished in 6 minutes.

So that's approximately 12x slowdown from using non-pv fair locks
(vs. unfair locks) on a 200%-overcommitted host.

Paolo

>> With the unfair locking activated on bare metal 4-socket Westmere-EX
>> box, the execution times (in ms) of a spinlock micro-benchmark were
>> as follows:
>>
>>   # of    Ticket      Fair       Unfair
>>   tasks    lock    queue lock  queue lock
>>   ------  -------  ----------  ----------
>>     1       135       135         137
>>     2      1045      1120         747
>>     3      1827      2345        1084
>>     4      2689      2934        1438
>>     5      3736      3658        1722
>>     6      4942      4434        2092
>>     7      6304      5176        2245
>>     8      7736      5955        2388
>
> Are these figures with or without the later PV support patches?
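For readers who want to check the figures, the first-run timings quoted in Paolo's message can be converted to seconds and compared directly. The Python sketch below is an illustration added for that purpose, not something posted in the thread; `to_seconds`, `first_run` and `ratios` are hypothetical names, and only the timing strings and the 16*6.5 expression come from the message itself.

```python
import re

def to_seconds(stamp):
    """Convert a `time`-style string such as '13m34.674s' to seconds."""
    minutes, seconds = re.fullmatch(r"(\d+)m([\d.]+)s", stamp).groups()
    return int(minutes) * 60 + float(seconds)

# First run from the message: (unfair, fair) for each row.
first_run = {
    "real": ("13m34.674s", "19m35.827s"),
    "user": ("96m2.638s", "102m38.665s"),
    "sys": ("56m14.991s", "158m22.470s"),
}

# How many times longer the fair-lock guest took in each category.
ratios = {
    row: to_seconds(fair) / to_seconds(unfair)
    for row, (unfair, fair) in first_run.items()
}

# The CPU-time budget quoted in the message: 16 * 6.5 minutes.
cpu_minutes_allotted = 16 * 6.5

for row, ratio in ratios.items():
    print(f"{row}: fair/unfair = {ratio:.2f}")
print(f"allotted CPU minutes: {cpu_minutes_allotted}")
```

The raw wall-clock ratio understates the problem, which is the point of the message: the fair-lock guest only made real progress once the other guest had finished, so the larger slowdown estimate comes from the reasoning in the text rather than from these ratios.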
Waiman Long
2014-Mar-13 19:03 UTC
[PATCH v6 05/11] pvqspinlock, x86: Allow unfair spinlock in a PV guest
On 03/13/2014 06:54 AM, David Vrabel wrote:
> On 12/03/14 18:54, Waiman Long wrote:
>> Locking is always an issue in a virtualized environment as the virtual
>> CPU that is waiting on a lock may get scheduled out and hence block
>> any progress in lock acquisition even when the lock has been freed.
>>
>> One solution to this problem is to allow unfair lock in a
>> para-virtualized environment. In this case, a new lock acquirer can
>> come and steal the lock if the next-in-line CPU to get the lock is
>> scheduled out. Unfair lock in a native environment is generally not a
>> good idea as there is a possibility of lock starvation for a heavily
>> contended lock.
> I do not think this is a good idea -- the problems with unfair locks are
> worse in a virtualized guest. If a waiting VCPU deschedules and has to
> be kicked to grab a lock then it is very likely to lose a race with
> another running VCPU trying to take a lock (since it takes time for the
> VCPU to be rescheduled).

I have seen figures indicating that it takes about 1000 cycles to kick
in a CPU. As long as the critical section isn't that long, there is
enough time for a lock stealer to come in, grab the lock, do whatever
it needs to do and leave without introducing too much latency to the
kicked-in CPU.

Anyway, there are people who ask for unfair locks. In fact, RHEL6 ships
virtual guests with unfair locks. So I provide an option for those
people who want unfair locks to enable them in their virtual guest.
Those who don't want them can always turn them off when building the
kernel.

>> With the unfair locking activated on bare metal 4-socket Westmere-EX
>> box, the execution times (in ms) of a spinlock micro-benchmark were
>> as follows:
>>
>>   # of    Ticket      Fair       Unfair
>>   tasks    lock    queue lock  queue lock
>>   ------  -------  ----------  ----------
>>     1       135       135         137
>>     2      1045      1120         747
>>     3      1827      2345        1084
>>     4      2689      2934        1438
>>     5      3736      3658        1722
>>     6      4942      4434        2092
>>     7      6304      5176        2245
>>     8      7736      5955        2388
> Are these figures with or without the later PV support patches?

This is without the PV patch.

Regards,
Longman
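The timing argument in this reply can be put into numbers with a very rough model. The Python sketch below is only an illustration: `wakeup_window` is a hypothetical helper, the 1000-cycle kick cost is the figure mentioned in the message, and the critical-section lengths are made-up values. It assumes stealers run back-to-back from the moment the lock is freed and that the kicked vCPU gets the lock as soon as the in-flight stealer leaves, which is exactly the assumption David questions.

```python
def wakeup_window(kick_cycles, critical_section_cycles):
    """Model lock stealing while a kicked vCPU is being rescheduled.

    Returns (completed, extra_wait):
      completed  - whole critical sections finished before the vCPU is back
      extra_wait - cycles the woken vCPU still waits for a stealer whose
                   critical section straddles the wake-up point
    """
    completed = kick_cycles // critical_section_cycles
    # Cycles from the wake-up to the end of the section in flight
    # (0 when a section ends exactly at the wake-up).
    extra_wait = (-kick_cycles) % critical_section_cycles
    return completed, extra_wait

KICK_CYCLES = 1000  # figure quoted in the message

for cs in (100, 300, 2500):  # assumed critical-section lengths
    done, wait = wakeup_window(KICK_CYCLES, cs)
    print(f"cs={cs:5d} cycles: {done} steals completed, woken vCPU waits {wait} more")
```

Short critical sections let several stealers finish inside the window at little cost to the woken vCPU. Once a critical section is longer than the kick latency, a single steal already delays it by more than the wake-up itself.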
Konrad Rzeszutek Wilk
2014-Mar-17 19:05 UTC
[PATCH v6 05/11] pvqspinlock, x86: Allow unfair spinlock in a PV guest
On Thu, Mar 13, 2014 at 02:16:06PM +0100, Paolo Bonzini wrote:
> On 13/03/2014 11:54, David Vrabel wrote:
> >On 12/03/14 18:54, Waiman Long wrote:
> >>Locking is always an issue in a virtualized environment as the virtual
> >>CPU that is waiting on a lock may get scheduled out and hence block
> >>any progress in lock acquisition even when the lock has been freed.
> >>
> >>One solution to this problem is to allow unfair lock in a
> >>para-virtualized environment. In this case, a new lock acquirer can
> >>come and steal the lock if the next-in-line CPU to get the lock is
> >>scheduled out. Unfair lock in a native environment is generally not a
> >>good idea as there is a possibility of lock starvation for a heavily
> >>contended lock.
> >
> >I do not think this is a good idea -- the problems with unfair locks are
> >worse in a virtualized guest. If a waiting VCPU deschedules and has to
> >be kicked to grab a lock then it is very likely to lose a race with
> >another running VCPU trying to take a lock (since it takes time for the
> >VCPU to be rescheduled).
>
> Actually, I think the unfair version should be automatically
> selected if running on a hypervisor. Per-hypervisor pvops can
> choose to enable the fair one.
>
> Lock unfairness may be particularly evident on a virtualized guest
> when the host is overcommitted, but problems with fair locks are
> even worse.
>
> In fact, RHEL/CentOS 6 already uses unfair locks if
> X86_FEATURE_HYPERVISOR is set. The patch was rejected upstream in
> favor of pv ticketlocks, but pv ticketlocks do not cover all
> hypervisors so perhaps we could revisit that choice.
>
> Measurements were done by Gleb for two guests running 2.6.32 with 16
> vcpus each, on a 16-core system. One guest ran with unfair locks,
> one guest ran with fair locks. Two kernel compilations ("time make

And when you say fair locks, do you mean PV ticketlocks or generic
ticketlocks?

> -j 16 all") were started at the same time on both guests, and times
> were as follows:
>
>         unfair:                 fair:
>         real 13m34.674s         real 19m35.827s
>         user 96m2.638s          user 102m38.665s
>         sys  56m14.991s         sys  158m22.470s
>
>         real 13m3.768s          real 19m4.375s
>         user 95m34.509s         user 111m9.903s
>         sys  53m40.550s         sys  141m59.370s
>
> Actually, interpreting the numbers shows an even worse slowdown.
>
> Compilation took ~6.5 minutes in a guest when the host was not
> overcommitted, and with unfair locks everything scaled just fine.

You should see the same values with the PV ticketlock. It is not clear
to me whether this testing included that variant of locks.

>
> Ticketlocks fell completely apart; during the first 13 minutes they
> were allotted 16*6.5=104 minutes of CPU time, and they spent almost
> all of it spinning in the kernel (102 minutes in the first run).

Right, the non-PV variants do fall apart. That is why PV ticketlocks
are so nice.

> They did perhaps 30 seconds worth of work because, as soon as the
> unfair-lock guest finished and the host was no longer overcommitted,
> compilation finished in 6 minutes.
>
> So that's approximately 12x slowdown from using non-pv fair locks
> (vs. unfair locks) on a 200%-overcommitted host.

Ah, so it was non-PV.

I am curious whether the results would be any different if you tested
PV ticketlocks vs. the Red Hat variant of unfair locks.

>
> Paolo
>
> >>With the unfair locking activated on bare metal 4-socket Westmere-EX
> >>box, the execution times (in ms) of a spinlock micro-benchmark were
> >>as follows:
> >>
> >>  # of    Ticket      Fair       Unfair
> >>  tasks    lock    queue lock  queue lock
> >>  ------  -------  ----------  ----------
> >>    1       135       135         137
> >>    2      1045      1120         747
> >>    3      1827      2345        1084
> >>    4      2689      2934        1438
> >>    5      3736      3658        1722
> >>    6      4942      4434        2092
> >>    7      6304      5176        2245
> >>    8      7736      5955        2388
> >
> >Are these figures with or without the later PV support patches?
> >