On Fri, Jan 15, 2016 at 10:13:48AM +0100, Peter Zijlstra wrote:
> On Fri, Jan 15, 2016 at 09:55:54AM +0100, Peter Zijlstra wrote:
> > On Thu, Jan 14, 2016 at 01:29:13PM -0800, Paul E. McKenney wrote:
> > > So smp_mb() provides transitivity, as do pairs of smp_store_release()
> > > and smp_load_acquire(),
> >
> > But they provide different grades of transitivity, which is where all
> > the confusion lies.
> >
> > smp_mb() is strongly/globally transitive: all CPUs will agree on the
> > order.
> >
> > Whereas the RCpc release+acquire is weakly so: only the two CPUs
> > involved in the handover will agree on the order.
>
> And the stuff we're confused about is how best to express the difference
> and guarantees of these two forms of transitivity and how exactly they
> interact.

Hoping my memory-barriers.txt patch helps here...

> And smp_load_acquire()/smp_store_release() are RCpc because of TSO archs
> and PPC, the atomic*_{acquire,release}() are RCpc because of PPC, and
> LOCK,UNLOCK are similarly RCpc because of PPC.
>
> Now we'd like PPC to stick a SYNC in either LOCK or UNLOCK so that at
> least the locks are RCsc again, but they resist for performance reasons,
> yet waver because they don't want to be the ones finding all the nasty
> bugs just because they're the only one.

I believe that the relevant proverb said something about starving to
death between two bales of hay...  ;-)

> Now the thing I worry about, and still have not had an answer to, is
> whether weakly ordered MIPS will end up being RCsc or RCpc for their
> locks if they get implemented with SYNC_ACQUIRE and SYNC_RELEASE
> instead of the current SYNC.

It would be good to have better clarity on this, no two ways about it.

							Thanx, Paul
On Fri, Jan 15, 2016 at 09:46:12AM -0800, Paul E. McKenney wrote:
> On Fri, Jan 15, 2016 at 10:13:48AM +0100, Peter Zijlstra wrote:
> > And the stuff we're confused about is how best to express the difference
> > and guarantees of these two forms of transitivity and how exactly they
> > interact.
>
> Hoping my memory-barriers.txt patch helps here...

Yes, that seems a good start. But yesterday you raised the 'fun' point
of two globally ordered sequences connected by a single local link.

And I think I'm still confused about LWSYNC (in the smp_wmb case) when
one of the stores loses a conflict, and whether that scenario matters.
If it does, we should inspect the same case for other barriers.
On Fri, Jan 15, 2016 at 10:27:14PM +0100, Peter Zijlstra wrote:
> On Fri, Jan 15, 2016 at 09:46:12AM -0800, Paul E. McKenney wrote:
> > On Fri, Jan 15, 2016 at 10:13:48AM +0100, Peter Zijlstra wrote:
> > > > And the stuff we're confused about is how best to express the
> > > > difference and guarantees of these two forms of transitivity and
> > > > how exactly they interact.
> > >
> > > Hoping my memory-barriers.txt patch helps here...
>
> Yes, that seems a good start. But yesterday you raised the 'fun' point
> of two globally ordered sequences connected by a single local link.

The conclusion that I am slowly coming to is that litmus tests should
not be thought of as linear chains, but rather as cycles.  If you think
of a litmus test as a cycle, then it doesn't matter where the local link
is, just how many of them there are and how they are connected.

But I will admit that there are some rather strange litmus tests that
challenge this cycle-centric view, for example, the one shown below.
It turns out that herd and ppcmem disagree on the outcome.  (The Power
architects side with ppcmem.)

> And I think I'm still confused about LWSYNC (in the smp_wmb case) when
> one of the stores loses a conflict, and whether that scenario matters.
> If it does, we should inspect the same case for other barriers.

Indeed.  I am still working on how these should be described.  My
current thought is to be quite conservative about what ordering is
actually respected; however, the current task is formalizing how RCU
plays with the rest of the memory model.

							Thanx, Paul

------------------------------------------------------------------------

PPC Overlapping Group-B sets version 4
""
(* When the Group-B sets from two different barriers involve instructions
   in the same thread, within that thread one set must contain the other.

	P0	P1	P2
	Rx=1	Wy=1	Wz=2
	dep.	lwsync	lwsync
	Ry=0	Wz=1	Wx=1
			Rz=1

	assert(!(z=2))

   Forbidden by ppcmem, allowed by herd.
*)
{
0:r1=x; 0:r2=y; 0:r3=z;
1:r1=x; 1:r2=y; 1:r3=z; 1:r4=1;
2:r1=x; 2:r2=y; 2:r3=z; 2:r4=1; 2:r5=2;
}
 P0            | P1           | P2           ;
 lwz r6,0(r1)  | stw r4,0(r2) | stw r5,0(r3) ;
 xor r7,r6,r6  | lwsync       | lwsync       ;
 lwzx r7,r7,r2 | stw r4,0(r3) | stw r4,0(r1) ;
 lwz r8,0(r3)  |              |              ;
exists
(z=2 /\ 0:r6=1 /\ 0:r7=0 /\ 0:r8=1)