Linus Torvalds
2015-Feb-10  00:53 UTC
[PATCH] x86 spinlock: Fix memory corruption on completing completions
On Mon, Feb 9, 2015 at 4:02 AM, Peter Zijlstra <peterz at infradead.org> wrote:> On Mon, Feb 09, 2015 at 03:04:22PM +0530, Raghavendra K T wrote: >> So we have 3 choices, >> 1. xadd >> 2. continue with current approach. >> 3. a read before unlock and also after that. > > For the truly paranoid we have probe_kernel_address(), suppose the lock > was in module space and the module just got unloaded under us.That's much too expensive. The xadd shouldn't be noticeably more expensive than the current "add_smp()". Yes, "lock xadd" used to be several cycles slower than just "lock add" on some early cores, but I think these days it's down to a single-cycle difference, which is not really different from doing a separate load after the add. The real problem with xadd used to be that we always had to do magic special-casing for i386, but that's one of the reasons we dropped support for original 80386. So I think Raghavendra's last version (which hopefully fixes the lockup problem that Sasha reported) together with changing that add_smp(&lock->tickets.head, TICKET_LOCK_INC); if (READ_ONCE(lock->tickets.tail) & TICKET_SLOWPATH_FLAG) .. into something like val = xadd((&lock->ticket.head_tail, TICKET_LOCK_INC << TICKET_SHIFT); if (unlikely(val & TICKET_SLOWPATH_FLAG)) ... would be the right thing to do. Somebody should just check that I got that shift right, and that the tail is in the high bytes (head really needs to be high to work, if it's in the low byte(s) the xadd would overflow from head into tail which would be wrong). Linus
Raghavendra K T
2015-Feb-10  09:30 UTC
[PATCH] x86 spinlock: Fix memory corruption on completing completions
On 02/10/2015 06:23 AM, Linus Torvalds wrote:> On Mon, Feb 9, 2015 at 4:02 AM, Peter Zijlstra <peterz at infradead.org> wrote: >> On Mon, Feb 09, 2015 at 03:04:22PM +0530, Raghavendra K T wrote: >>> So we have 3 choices, >>> 1. xadd >>> 2. continue with current approach. >>> 3. a read before unlock and also after that. >> >> For the truly paranoid we have probe_kernel_address(), suppose the lock >> was in module space and the module just got unloaded under us. > > That's much too expensive. > > The xadd shouldn't be noticeably more expensive than the current > "add_smp()". Yes, "lock xadd" used to be several cycles slower than > just "lock add" on some early cores, but I think these days it's down > to a single-cycle difference, which is not really different from doing > a separate load after the add. > > The real problem with xadd used to be that we always had to do magic > special-casing for i386, but that's one of the reasons we dropped > support for original 80386. > > So I think Raghavendra's last version (which hopefully fixes the > lockup problem that Sasha reported) together with changing thatV2 did pass the stress, but getting confirmation Sasha would help.> add_smp(&lock->tickets.head, TICKET_LOCK_INC); > if (READ_ONCE(lock->tickets.tail) & TICKET_SLOWPATH_FLAG) .. > > into something like > > val = xadd((&lock->ticket.head_tail, TICKET_LOCK_INC << TICKET_SHIFT); > if (unlikely(val & TICKET_SLOWPATH_FLAG)) ... > > would be the right thing to do. Somebody should just check that I got > that shift right, and that the tail is in the high bytes (head really > needs to be high to work, if it's in the low byte(s) the xadd would > overflow from head into tail which would be wrong).Unfortunately xadd could result in head overflow as tail is high. The other option was repeated cmpxchg which is bad I believe. Any suggestions? ( I have the V3 with Oleg's suggestion and performance numbers but without this getting resolved, It will be one unnecessary iteration). How about getting rid off SLOW_PATH_FLAG in spinlock (i.e. use it only as hint for paravirt), but do unlock_kick whenever we see that (tail-head) > TICKET_LOCK_INC?. (but this also may need cmpxchg in loop in unlock but we will be able to get rid of clear slowpath logic) Only problem is we may do unnecessary kicks even in 1x load.
Denys Vlasenko
2015-Feb-10  13:18 UTC
[PATCH] x86 spinlock: Fix memory corruption on completing completions
On Tue, Feb 10, 2015 at 10:30 AM, Raghavendra K T <raghavendra.kt at linux.vnet.ibm.com> wrote:> On 02/10/2015 06:23 AM, Linus Torvalds wrote: >> add_smp(&lock->tickets.head, TICKET_LOCK_INC); >> if (READ_ONCE(lock->tickets.tail) & TICKET_SLOWPATH_FLAG) .. >> >> into something like >> >> val = xadd((&lock->ticket.head_tail, TICKET_LOCK_INC << >> TICKET_SHIFT); >> if (unlikely(val & TICKET_SLOWPATH_FLAG)) ... >> >> would be the right thing to do. Somebody should just check that I got >> that shift right, and that the tail is in the high bytes (head really >> needs to be high to work, if it's in the low byte(s) the xadd would >> overflow from head into tail which would be wrong). > > Unfortunately xadd could result in head overflow as tail is high.xadd can overflow, but is this really a problem? # define HEAD_MASK (TICKET_SLOWPATH_FLAG-1) ... unlock_again: val = xadd((&lock->ticket.head_tail, TICKET_LOCK_INC); if (unlikely(!(val & HEAD_MASK))) { /* overflow. we inadvertently incremented the tail word. * tail's lsb is TICKET_SLOWPATH_FLAG. * Increment inverted this bit, fix it up. * (inc _may_ have messed up tail counter too, * will deal with it after kick.) */ val ^= TICKET_SLOWPATH_FLAG; } if (unlikely(val & TICKET_SLOWPATH_FLAG)) { ...kick the waiting task... val -= TICKET_SLOWPATH_FLAG; if (unlikely(!(val & HEAD_MASK))) { /* overflow. we inadvertently incremented tail word, *and* * TICKET_SLOWPATH_FLAG was set, increment overflowed * that bit too and incremented tail counter. * This means we (inadvertently) taking the lock again! * Oh well. Take it, and unlock it again... */ while (1) { if (READ_ONCE(lock->tickets.head) != TICKET_TAIL(val)) cpu_relax(); } goto unlock_again; } Granted, this looks ugly.
Sasha Levin
2015-Feb-10  13:23 UTC
[PATCH] x86 spinlock: Fix memory corruption on completing completions
On 02/10/2015 04:30 AM, Raghavendra K T wrote:>> >> So I think Raghavendra's last version (which hopefully fixes the >> lockup problem that Sasha reported) together with changing that > > V2 did pass the stress, but getting confirmation Sasha would help.I've been running it for the last two days, and didn't see any lockups or other strange behaviour aside from some invalid reads which we expected. Thanks, Sasha
Oleg Nesterov
2015-Feb-10  13:26 UTC
[PATCH] x86 spinlock: Fix memory corruption on completing completions
On 02/10, Raghavendra K T wrote:> > On 02/10/2015 06:23 AM, Linus Torvalds wrote: > >> add_smp(&lock->tickets.head, TICKET_LOCK_INC); >> if (READ_ONCE(lock->tickets.tail) & TICKET_SLOWPATH_FLAG) .. >> >> into something like >> >> val = xadd((&lock->ticket.head_tail, TICKET_LOCK_INC << TICKET_SHIFT); >> if (unlikely(val & TICKET_SLOWPATH_FLAG)) ... >> >> would be the right thing to do. Somebody should just check that I got >> that shift right, and that the tail is in the high bytes (head really >> needs to be high to work, if it's in the low byte(s) the xadd would >> overflow from head into tail which would be wrong). > > Unfortunately xadd could result in head overflow as tail is high. > > The other option was repeated cmpxchg which is bad I believe. > Any suggestions?Stupid question... what if we simply move SLOWPATH from .tail to .head? In this case arch_spin_unlock() could do xadd(tickets.head) and check the result In this case __ticket_check_and_clear_slowpath() really needs to cmpxchg the whole .head_tail. Plus obviously more boring changes. This needs a separate patch even _if_ this can work. BTW. If we move "clear slowpath" into "lock" path, then probably trylock should be changed too? Something like below, we just need to clear SLOWPATH before cmpxchg. Oleg. --- x/arch/x86/include/asm/spinlock.h +++ x/arch/x86/include/asm/spinlock.h @@ -109,7 +109,8 @@ static __always_inline int arch_spin_try if (old.tickets.head != (old.tickets.tail & ~TICKET_SLOWPATH_FLAG)) return 0; - new.head_tail = old.head_tail + (TICKET_LOCK_INC << TICKET_SHIFT); + new.tickets.head = old.tickets.head; + new.tickets.tail = (old.tickets.tail & ~TICKET_SLOWPATH_FLAG) + TICKET_LOCK_INC; /* cmpxchg is a full barrier, so nothing can move before it */ return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == old.head_tail;
Reasonably Related Threads
- [PATCH] x86 spinlock: Fix memory corruption on completing completions
- [PATCH] x86 spinlock: Fix memory corruption on completing completions
- [PATCH] x86 spinlock: Fix memory corruption on completing completions
- [PATCH] x86 spinlock: Fix memory corruption on completing completions
- [PATCH] x86 spinlock: Fix memory corruption on completing completions