I noticed that Linux has the "Ticket spinlock" patch now merged for 2.6.25 (see http://lwn.net/Articles/267968/). Basically, the new spinlock ensures that lock contenders will acquire the lock in FIFO order. This is good news if you're concerned with latency guarantees or fairness for larger-scale SMP systems.

However, spinlocks can have some rather unpleasant effects on multi-processor guests. You don't want to preempt a vCPU holding a spinlock, because other vCPUs trying to acquire the lock will then spend their processing time busy-waiting for the usually shortly held lock. Gang-scheduling gets around the problem by ensuring that lock contenders are not busy-waiting while the lock-holder is preempted, but it often leads to wasted processing resources.

The bad news is that the lock-holder preemption problem is aggravated when one introduces ticket spinlocks. Not only should one try to prevent preempting lock-holders; since the lock contenders are granted the lock in FIFO order, one had better also make sure that the waiting vCPUs are scheduled in the appropriate order. With regular spinlocks any one of the waiting vCPUs can acquire the lock and be done with it quickly. This is not an option with ticket spinlocks.

I have seen various references to "bad preemption" avoidance in Xen, but not any concrete description of what this actually means. Has some lock-holder preemption mechanism been implemented, or was it deemed unnecessary?

A few years ago we did some measurements of the effects of lock-holder preemption in SMP guests [1]. However, the experiments were performed using a 2.4.20 kernel. The picture is probably a bit different today.

Do people feel that the new ticket spinlocks should raise any concerns, or do typical Xen SMP guest workloads remain largely unaffected by the new locking scheme?

	eSk

[1] http://l4ka.org/publications/paper.php?docid=1086
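For reference, here is a minimal sketch of a ticket spinlock (illustrative only, not the actual 2.6.25 x86 implementation, which packs both counters into the spinlock word and uses xadd); it makes it easy to see why the lock can only ever be handed to the waiter holding the next ticket:

    /* Minimal ticket spinlock sketch, illustrative only. */
    typedef struct {
        volatile unsigned short owner;  /* ticket currently allowed to hold the lock */
        volatile unsigned short next;   /* next ticket to hand out */
    } ticket_lock_t;

    static inline void ticket_lock(ticket_lock_t *lock)
    {
        /* atomically take a ticket (fetch-and-increment) */
        unsigned short me = __sync_fetch_and_add(&lock->next, 1);

        while (lock->owner != me)
            ;   /* busy-wait: if the holder, or any earlier waiter, is
                 * preempted, every vCPU behind it spins uselessly,
                 * because the lock cannot be handed past the next ticket */
    }

    static inline void ticket_unlock(ticket_lock_t *lock)
    {
        lock->owner++;   /* pass the lock to the waiter holding the next ticket */
    }

With the old lock, unlock simply cleared a flag and whichever spinning vCPU happened to be running could grab it; here the hand-off is strictly ordered, which is exactly why scheduling the waiters out of order hurts.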
On 14/2/08 13:43, "Espen Skoglund" <espen.skoglund@netronome.com> wrote:

> Do people feel that the new ticket spinlocks should raise any
> concerns, or do typical Xen SMP guest workloads remain largely
> unaffected by the new locking scheme?

That's a good question which I do not think can be answered without taking some measurements. If it's an issue we might consider pv_ops'ifying spinlocks to turn them into sleeping locks (this would be easy these days, since spin_lock() and spin_unlock() are not inlined). I have some neat ideas about how that might work, requiring no extra space for spinlock_t and no modifications at the hypervisor interface.

I suppose it depends whether there are any hot spinlocks in the kernel these days. I know things have improved a lot in recent years with things like RCU. Most spinlock regions are now pretty short.

 -- Keir
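As a rough illustration of the pv_ops'ified shape (purely a sketch; struct pv_lock_ops and the native_* helpers are hypothetical names that merely mirror the general pv_ops pattern, not code from the tree):

    /* Hypothetical sketch of "pv_ops'ified" spinlocks.  The point is only
     * that spin_lock()/spin_unlock(), being out of line, can be routed
     * through a per-platform function table with no change to spinlock_t
     * itself and no change to the hypervisor interface. */
    typedef struct {
        volatile unsigned short owner;   /* same layout as the native ticket lock */
        volatile unsigned short next;
    } spinlock_t;                        /* stand-in for the kernel type */

    struct pv_lock_ops {
        void (*spin_lock)(spinlock_t *lock);
        void (*spin_unlock)(spinlock_t *lock);
    };

    /* Native case: the plain ticket lock from the earlier sketch. */
    static void native_spin_lock(spinlock_t *lock)
    {
        unsigned short me = __sync_fetch_and_add(&lock->next, 1);
        while (lock->owner != me)
            ;                            /* ordinary busy-wait */
    }

    static void native_spin_unlock(spinlock_t *lock)
    {
        lock->owner++;
    }

    /* A Xen guest would overwrite these pointers at boot with variants
     * that spin briefly and then sleep instead of busy-waiting. */
    struct pv_lock_ops pv_lock_ops = {
        .spin_lock   = native_spin_lock,
        .spin_unlock = native_spin_unlock,
    };

    void spin_lock(spinlock_t *lock)   { pv_lock_ops.spin_lock(lock); }
    void spin_unlock(spinlock_t *lock) { pv_lock_ops.spin_unlock(lock); }

The actual representation and patching mechanism would no doubt differ; the sketch only shows that the switch can be confined to the two out-of-line entry points.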
> From: Keir Fraser
> Sent: 14 February 2008 23:24
>
> On 14/2/08 13:43, "Espen Skoglund" <espen.skoglund@netronome.com> wrote:
>
>> Do people feel that the new ticket spinlocks should raise any
>> concerns, or do typical Xen SMP guest workloads remain largely
>> unaffected by the new locking scheme?
>
> That's a good question which I do not think can be answered without
> taking some measurements. If it's an issue we might consider
> pv_ops'ifying spinlocks to turn them into sleeping locks (this would
> be easy these days, since spin_lock() and spin_unlock() are not
> inlined). I have some neat ideas about how that might work, requiring
> no extra space for spinlock_t and no modifications at the hypervisor
> interface.
>
> I suppose it depends whether there are any hot spinlocks in the
> kernel these days. I know things have improved a lot in recent years
> with things like RCU. Most spinlock regions are now pretty short.

The side effect is to enlarge the average critical section length, if most spinlock regions are pretty short.

Thanks,
Kevin
On 15/2/08 01:03, "Tian, Kevin" <kevin.tian@intel.com> wrote:

>> I suppose it depends whether there are any hot spinlocks in the
>> kernel these days. I know things have improved a lot in recent
>> years with things like RCU. Most spinlock regions are now pretty
>> short.
>
> The side effect is to enlarge the average critical section length,
> if most spinlock regions are pretty short.

Side effect of ticket-based locks, or pv'ed sleeping locks?

 -- Keir
> From: Keir Fraser
> Sent: 15 February 2008 16:23
>
> On 15/2/08 01:03, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>
>>> I suppose it depends whether there are any hot spinlocks in the
>>> kernel these days. I know things have improved a lot in recent
>>> years with things like RCU. Most spinlock regions are now pretty
>>> short.
>>
>> The side effect is to enlarge the average critical section length,
>> if most spinlock regions are pretty short.
>
> Side effect of ticket-based locks, or pv'ed sleeping locks?
>
>  -- Keir

The side effect is about pv'ed sleeping locks: originally maybe only dozens of cycles are spent spinning on the lock, while pv'ed locks force a hypercall into Xen and may even require the owner to issue another unlock hypercall to notify the waiter back.

Thanks,
Kevin
On 15/2/08 08:27, "Tian, Kevin" <kevin.tian@intel.com> wrote:

>> Side effect of ticket-based locks, or pv'ed sleeping locks?
>>
>>  -- Keir
>
> The side effect is about pv'ed sleeping locks: originally maybe only
> dozens of cycles are spent spinning on the lock, while pv'ed locks
> force a hypercall into Xen and may even require the owner to issue
> another unlock hypercall to notify the waiter back.

You would of course spin for a while and only then sleep. That's a standard mutex implementation trick.

 -- Keir
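A minimal sketch of the spin-then-block idea (hv_block_on() and hv_wake() below are hypothetical placeholders for whatever guest/hypervisor notification primitive would actually be used, not existing Xen hypercalls; the sketch also assumes a wakeup posted while a vCPU is about to block is not lost):

    #define SPIN_THRESHOLD 1024          /* tuning knob: how long to spin before sleeping */

    extern void hv_block_on(void *wait_addr);   /* hypothetical: sleep until woken on wait_addr */
    extern void hv_wake(void *wait_addr);       /* hypothetical: wake vCPUs blocked on wait_addr */

    typedef struct {
        volatile int locked;
        volatile int sleepers;
    } pv_spinlock_t;

    static void pv_spin_lock(pv_spinlock_t *lock)
    {
        for (;;) {
            /* Fast path: a brief spin covers the common, very short hold times. */
            for (int i = 0; i < SPIN_THRESHOLD; i++)
                if (__sync_lock_test_and_set(&lock->locked, 1) == 0)
                    return;

            /* Slow path: the holder has probably been preempted, so block
             * in the hypervisor rather than burning the rest of our slice. */
            __sync_fetch_and_add(&lock->sleepers, 1);
            if (__sync_lock_test_and_set(&lock->locked, 1) == 0) {
                __sync_fetch_and_sub(&lock->sleepers, 1);
                return;
            }
            hv_block_on(lock);
            __sync_fetch_and_sub(&lock->sleepers, 1);
        }
    }

    static void pv_spin_unlock(pv_spinlock_t *lock)
    {
        __sync_lock_release(&lock->locked);
        if (lock->sleepers)              /* only pay for a notification if someone blocked */
            hv_wake(lock);
    }

Only the contended slow path pays for hypercalls; the uncontended and briefly contended cases stay on the original fast path.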
> From: Keir Fraser [mailto:Keir.Fraser@cl.cam.ac.uk]
> Sent: 15 February 2008 16:37
>
> On 15/2/08 08:27, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>
>>> Side effect of ticket-based locks, or pv'ed sleeping locks?
>>>
>>>  -- Keir
>>
>> The side effect is about pv'ed sleeping locks: originally maybe only
>> dozens of cycles are spent spinning on the lock, while pv'ed locks
>> force a hypercall into Xen and may even require the owner to issue
>> another unlock hypercall to notify the waiter back.
>
> You would of course spin for a while and only then sleep. That's a
> standard mutex implementation trick.

I'm not sure how to define 'a while', since even for the same critical section the number of spin cycles varies from one point to another. You always risk adding more overhead than a normal spin loop. But then, it depends on how frequently the aforementioned case occurs, and the gain from pv'ed spinlocks may be larger than the overhead they cause.

Thanks,
Kevin
On 15/2/08 08:42, "Tian, Kevin" <kevin.tian@intel.com> wrote:

>> You would of course spin for a while and only then sleep. That's a
>> standard mutex implementation trick.
>
> I'm not sure how to define 'a while', since even for the same
> critical section the number of spin cycles varies from one point to
> another. You always risk adding more overhead than a normal spin
> loop. But then, it depends on how frequently the aforementioned case
> occurs, and the gain from pv'ed spinlocks may be larger than the
> overhead they cause.

You could certainly end up in the situation that the lock becomes available just after you decide to sleep, no matter what spin threshold you choose. It's a balance of probabilities: e.g., if you spin for 1us, what is the probability distribution of remaining wait time? If the lock-holder is preempted then you are likely to spin for ages. That, coupled with most spinlock regions in the kernel being very fast, means that we wouldn't need to be very smart to filter out the former cases without hurting performance in the latter. The distribution of waits will be very obviously bimodal.

 -- Keir
> From: Keir Fraser [mailto:Keir.Fraser@cl.cam.ac.uk]
> Sent: 15 February 2008 16:53
>
> On 15/2/08 08:42, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>
>>> You would of course spin for a while and only then sleep. That's a
>>> standard mutex implementation trick.
>>
>> I'm not sure how to define 'a while', since even for the same
>> critical section the number of spin cycles varies from one point to
>> another. You always risk adding more overhead than a normal spin
>> loop. But then, it depends on how frequently the aforementioned case
>> occurs, and the gain from pv'ed spinlocks may be larger than the
>> overhead they cause.
>
> You could certainly end up in the situation that the lock becomes
> available just after you decide to sleep, no matter what spin
> threshold you choose. It's a balance of probabilities: e.g., if you
> spin for 1us, what is the probability distribution of remaining wait
> time? If the lock-holder is preempted then you are likely to spin
> for ages. That, coupled with most spinlock regions in the kernel
> being very fast, means that we wouldn't need to be very smart to
> filter out the former cases without hurting performance in the
> latter. The distribution of waits will be very obviously bimodal.
>
>  -- Keir

Yes, that makes sense. It can be extended to cover ticket spinlock usage, e.g. forcing a sleep if an earlier ticket is already waiting, and only waking the vCPU with the 'next' ticket at unlock, etc. :-)

Thanks,
Kevin
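A sketch of how such a ticket-aware hand-off might look (again purely illustrative; hv_block_on(), hv_wake_vcpu() and this_vcpu() are hypothetical primitives, waiting_for[] is assumed to be initialised to -1, and a wakeup posted while a vCPU is about to block is assumed not to be lost):

    /* Ticket-aware spin-then-block sketch.  On the slow path each vCPU
     * records the ticket it is waiting for, and unlock wakes only the
     * vCPU whose ticket comes next, preserving the FIFO hand-off. */
    #define MAX_VCPUS       32
    #define SPIN_THRESHOLD  1024

    extern void hv_block_on(void *wait_addr);        /* hypothetical: sleep until woken */
    extern void hv_wake_vcpu(unsigned int vcpu);     /* hypothetical: wake one specific vCPU */
    extern unsigned int this_vcpu(void);             /* hypothetical: id of the running vCPU */

    typedef struct {
        volatile unsigned short owner, next;
        volatile int waiting_for[MAX_VCPUS];         /* ticket each blocked vCPU waits on, or -1 */
    } pv_ticket_lock_t;

    static void pv_ticket_lock(pv_ticket_lock_t *lock)
    {
        unsigned short me = __sync_fetch_and_add(&lock->next, 1);

        /* Brief spin first, as in the plain spin-then-block case. */
        for (int i = 0; i < SPIN_THRESHOLD; i++)
            if (lock->owner == me)
                return;

        /* Probably queued behind a preempted holder or waiter: block. */
        lock->waiting_for[this_vcpu()] = me;
        while (lock->owner != me)
            hv_block_on(lock);
        lock->waiting_for[this_vcpu()] = -1;
    }

    static void pv_ticket_unlock(pv_ticket_lock_t *lock)
    {
        unsigned short next_ticket = ++lock->owner;

        /* Wake only the vCPU holding the next ticket, if it went to sleep;
         * everyone else stays blocked instead of spinning out of turn. */
        for (unsigned int v = 0; v < MAX_VCPUS; v++) {
            if (lock->waiting_for[v] == (int)next_ticket) {
                hv_wake_vcpu(v);
                break;
            }
        }
    }

The point of the per-ticket bookkeeping is that unlock never wakes a vCPU that could not make progress anyway.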
[Keir Fraser]
> On 15/2/08 08:42, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>>> You would of course spin for a while and only then sleep. That's a
>>> standard mutex implementation trick.
>>
>> I'm not sure how to define 'a while', since even for the same
>> critical section the number of spin cycles varies from one point to
>> another. You always risk adding more overhead than a normal spin
>> loop. But then, it depends on how frequently the aforementioned case
>> occurs, and the gain from pv'ed spinlocks may be larger than the
>> overhead they cause.

> You could certainly end up in the situation that the lock becomes
> available just after you decide to sleep, no matter what spin
> threshold you choose. It's a balance of probabilities: e.g., if you
> spin for 1us, what is the probability distribution of remaining wait
> time? If the lock-holder is preempted then you are likely to spin
> for ages. That, coupled with most spinlock regions in the kernel
> being very fast, means that we wouldn't need to be very smart to
> filter out the former cases without hurting performance in the
> latter. The distribution of waits will be very obviously bimodal.

Just as a sidenote: When we measured the 2.4 kernel years ago we found that more than 90% of the spinlocks were held for less than 20us (on very kernel-intensive workloads). That number is likely to be much smaller today.

	eSk