thr3ads.net - Linux Virtualization - [v3,11/41] mips: reuse asm-generic/barrier.h [Jan 2016]

If this information is useful, please help other people find it:
Share via:

Paul E. McKenney

2016-Jan-26 06:03 UTC

[v3,11/41] mips: reuse asm-generic/barrier.h

On Mon, Jan 25, 2016 at 04:42:43PM +0000, Will Deacon
wrote:> On Fri, Jan 15, 2016 at 01:58:53PM -0800, Paul E. McKenney wrote:
> > On Fri, Jan 15, 2016 at 10:27:14PM +0100, Peter Zijlstra wrote:
> > > On Fri, Jan 15, 2016 at 09:46:12AM -0800, Paul E. McKenney wrote:
> > > > On Fri, Jan 15, 2016 at 10:13:48AM +0100, Peter Zijlstra
wrote:
> > > 
> > > > > And the stuff we're confused about is how best to
express the difference
> > > > > and guarantees of these two forms of transitivity and
how exactly they
> > > > > interact.
> > > > 
> > > > Hoping my memory-barrier.txt patch helps here...
> > > 
> > > Yes, that seems a good start. But yesterday you raised the
'fun' point
> > > of two globally ordered sequences connected by a single local
link.
> > 
> > The conclusion that I am slowly coming to is that litmus tests should
> > not be thought of as linear chains, but rather as cycles.  If you
think
> > of it as a cycle, then it doesn't matter where the local link is,
just
> > how many of them and how they are connected.
> 
> Do you have some examples of this? I'm struggling to make it work in my
> mind, or are you talking specifically in the context of the kernel
> memory model?
Now that you mention it, maybe it would be best to keep the transitive
and non-transitive separate for the time being anyway.  Just because it
might be possible to deal with does not necessarily mean that we should
be encouraging it.  ;-)
> > But I will admit that there are some rather strange litmus tests that
> > challenge this cycle-centric view, for example, the one shown below.
> > It turns out that herd and ppcmem disagree on the outcome.  (The Power
> > architects side with ppcmem.)
> > 
> > > And I think I'm still confused on LWSYNC (in the smp_wmb
case) when one
> > > of the stores looses a conflict, and if that scenario matters. If
it
> > > does, we should inspect the same case for other barriers.
> > 
> > Indeed.  I am still working on how these should be described.  My
> > current thought is to be quite conservative on what ordering is
> > actually respected, however, the current task is formalizing how
> > RCU plays with the rest of the memory model.
> > 
> > 							Thanx, Paul
> > 
> >
------------------------------------------------------------------------
> > 
> > PPC Overlapping Group-B sets version 4
> > ""
> > (* When the Group-B sets from two different barriers involve
instructions in
> >    the same thread, within that thread one set must contain the other.
> > 
> > 	P0	P1	P2
> > 	Rx=1	Wy=1	Wz=2
> > 	dep.	lwsync	lwsync
> > 	Ry=0	Wz=1	Wx=1
> > 	Rz=1
> > 
> > 	assert(!(z=2))
> > 
> >    Forbidden by ppcmem, allowed by herd.
> > *)
> > {
> > 0:r1=x; 0:r2=y; 0:r3=z;
> > 1:r1=x; 1:r2=y; 1:r3=z; 1:r4=1;
> > 2:r1=x; 2:r2=y; 2:r3=z; 2:r4=1; 2:r5=2;
> > }
> >  P0		| P1		| P2		;
> >  lwz r6,0(r1)	| stw r4,0(r2)	| stw r5,0(r3)	;
> >  xor r7,r6,r6	| lwsync	| lwsync	;
> >  lwzx r7,r7,r2	| stw r4,0(r3)	| stw r4,0(r1)	;
> >  lwz r8,0(r3)	|		|		;
> > 
> > exists
> > (z=2 /\ 0:r6=1 /\ 0:r7=0 /\ 0:r8=1)
> 
> That really hurts. Assuming that the "assert(!(z=2))" is actually
there
> to constrain the coherence order of z to be {0->1->2}, then I think
that
> this test is forbidden on arm using dmb instead of lwsync. That said, I
> also don't think the Rz=1 in P0 changes anything.
What about the smp_wmb() variant of dmb that orders only stores?
> The double negatives don't help here! (it is forbidden to guarantee
that
> z is not always 2).
Yes, this is a weird one, and I don't know of any use of it.

							Thanx, Paul

Peter Zijlstra

2016-Jan-26 10:19 UTC

head link

[v3,11/41] mips: reuse asm-generic/barrier.h

On Mon, Jan 25, 2016 at 10:03:22PM -0800, Paul E. McKenney
wrote:> On Mon, Jan 25, 2016 at 04:42:43PM +0000, Will Deacon wrote:
> > On Fri, Jan 15, 2016 at 01:58:53PM -0800, Paul E. McKenney wrote:
> > > On Fri, Jan 15, 2016 at 10:27:14PM +0100, Peter Zijlstra wrote:
> > > > Yes, that seems a good start. But yesterday you raised the
'fun' point
> > > > of two globally ordered sequences connected by a single
local link.
> > > 
> > > The conclusion that I am slowly coming to is that litmus tests
should
> > > not be thought of as linear chains, but rather as cycles.  If you
think
> > > of it as a cycle, then it doesn't matter where the local link
is, just
> > > how many of them and how they are connected.
> > 
> > Do you have some examples of this? I'm struggling to make it work
in my
> > mind, or are you talking specifically in the context of the kernel
> > memory model?
> 
> Now that you mention it, maybe it would be best to keep the transitive
> and non-transitive separate for the time being anyway.  Just because it
> might be possible to deal with does not necessarily mean that we should
> be encouraging it.  ;-)
So isn't smp_mb__after_unlock_lock() exactly such a scenario? And would
not someone trying to implement RCsc locks using locally transitive
RELEASE/ACQUIRE operations need exactly this stuff?

That is, I am afraid we need to cover the mix of local and global
transitive operations at least in overview.

Will Deacon

2016-Jan-26 12:16 UTC

head link

[v3,11/41] mips: reuse asm-generic/barrier.h

On Mon, Jan 25, 2016 at 10:03:22PM -0800, Paul E. McKenney
wrote:> On Mon, Jan 25, 2016 at 04:42:43PM +0000, Will Deacon wrote:
> > On Fri, Jan 15, 2016 at 01:58:53PM -0800, Paul E. McKenney wrote:
> > > PPC Overlapping Group-B sets version 4
> > > ""
> > > (* When the Group-B sets from two different barriers involve
instructions in
> > >    the same thread, within that thread one set must contain the
other.
> > > 
> > > 	P0	P1	P2
> > > 	Rx=1	Wy=1	Wz=2
> > > 	dep.	lwsync	lwsync
> > > 	Ry=0	Wz=1	Wx=1
> > > 	Rz=1
> > > 
> > > 	assert(!(z=2))
> > > 
> > >    Forbidden by ppcmem, allowed by herd.
> > > *)
> > > {
> > > 0:r1=x; 0:r2=y; 0:r3=z;
> > > 1:r1=x; 1:r2=y; 1:r3=z; 1:r4=1;
> > > 2:r1=x; 2:r2=y; 2:r3=z; 2:r4=1; 2:r5=2;
> > > }
> > >  P0		| P1		| P2		;
> > >  lwz r6,0(r1)	| stw r4,0(r2)	| stw r5,0(r3)	;
> > >  xor r7,r6,r6	| lwsync	| lwsync	;
> > >  lwzx r7,r7,r2	| stw r4,0(r3)	| stw r4,0(r1)	;
> > >  lwz r8,0(r3)	|		|		;
> > > 
> > > exists
> > > (z=2 /\ 0:r6=1 /\ 0:r7=0 /\ 0:r8=1)
> > 
> > That really hurts. Assuming that the "assert(!(z=2))" is
actually there
> > to constrain the coherence order of z to be {0->1->2}, then I
think that
> > this test is forbidden on arm using dmb instead of lwsync. That said,
I
> > also don't think the Rz=1 in P0 changes anything.
> 
> What about the smp_wmb() variant of dmb that orders only stores?
Tricky, but I think it still works out if the coherence order of z is as
I described above. The line of reasoning is weird though -- I ended up
considering the two cases where P0 reads z before and after it reads x
and what that means for the read of y.

Will

Boqun Feng

2016-Jan-26 14:35 UTC

head link

[v3,11/41] mips: reuse asm-generic/barrier.h

Hi Will,

On Tue, Jan 26, 2016 at 12:16:09PM +0000, Will Deacon
wrote:> On Mon, Jan 25, 2016 at 10:03:22PM -0800, Paul E. McKenney wrote:
> > On Mon, Jan 25, 2016 at 04:42:43PM +0000, Will Deacon wrote:
> > > On Fri, Jan 15, 2016 at 01:58:53PM -0800, Paul E. McKenney wrote:
> > > > PPC Overlapping Group-B sets version 4
> > > > ""
> > > > (* When the Group-B sets from two different barriers involve
instructions in
> > > >    the same thread, within that thread one set must contain
the other.
> > > > 
> > > > 	P0	P1	P2
> > > > 	Rx=1	Wy=1	Wz=2
> > > > 	dep.	lwsync	lwsync
> > > > 	Ry=0	Wz=1	Wx=1
> > > > 	Rz=1
> > > > 
> > > > 	assert(!(z=2))
> > > > 
> > > >    Forbidden by ppcmem, allowed by herd.
> > > > *)
> > > > {
> > > > 0:r1=x; 0:r2=y; 0:r3=z;
> > > > 1:r1=x; 1:r2=y; 1:r3=z; 1:r4=1;
> > > > 2:r1=x; 2:r2=y; 2:r3=z; 2:r4=1; 2:r5=2;
> > > > }
> > > >  P0		| P1		| P2		;
> > > >  lwz r6,0(r1)	| stw r4,0(r2)	| stw r5,0(r3)	;
> > > >  xor r7,r6,r6	| lwsync	| lwsync	;
> > > >  lwzx r7,r7,r2	| stw r4,0(r3)	| stw r4,0(r1)	;
> > > >  lwz r8,0(r3)	|		|		;
> > > > 
> > > > exists
> > > > (z=2 /\ 0:r6=1 /\ 0:r7=0 /\ 0:r8=1)
> > > 
> > > That really hurts. Assuming that the "assert(!(z=2))"
is actually there
> > > to constrain the coherence order of z to be {0->1->2}, then
I think that
> > > this test is forbidden on arm using dmb instead of lwsync. That
said, I
> > > also don't think the Rz=1 in P0 changes anything.
> > 
> > What about the smp_wmb() variant of dmb that orders only stores?
> 
> Tricky, but I think it still works out if the coherence order of z is as
> I described above. The line of reasoning is weird though -- I ended up
> considering the two cases where P0 reads z before and after it reads x                                             ^^^^^^^^^^^^^^^
Because of the fact that two reads on the same processors can't be
executed simultaneously? I feel like this is exactly something herd
missed.
> and what that means for the read of y.
> 
And the reasoning on PPC is similar, so looks like the read of z on P0
is a necessary condition for the exists clause to be forbidden.

Regards,
Boqun
> Will-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 473 bytes
Desc: not available
URL:
<http://lists.linuxfoundation.org/pipermail/virtualization/attachments/20160126/233e96e6/attachment.sig>

Paul E. McKenney

2016-Jan-26 19:58 UTC

head link

[v3,11/41] mips: reuse asm-generic/barrier.h

On Tue, Jan 26, 2016 at 12:16:09PM +0000, Will Deacon
wrote:> On Mon, Jan 25, 2016 at 10:03:22PM -0800, Paul E. McKenney wrote:
> > On Mon, Jan 25, 2016 at 04:42:43PM +0000, Will Deacon wrote:
> > > On Fri, Jan 15, 2016 at 01:58:53PM -0800, Paul E. McKenney wrote:
> > > > PPC Overlapping Group-B sets version 4
> > > > ""
> > > > (* When the Group-B sets from two different barriers involve
instructions in
> > > >    the same thread, within that thread one set must contain
the other.
> > > > 
> > > > 	P0	P1	P2
> > > > 	Rx=1	Wy=1	Wz=2
> > > > 	dep.	lwsync	lwsync
> > > > 	Ry=0	Wz=1	Wx=1
> > > > 	Rz=1
> > > > 
> > > > 	assert(!(z=2))
> > > > 
> > > >    Forbidden by ppcmem, allowed by herd.
> > > > *)
> > > > {
> > > > 0:r1=x; 0:r2=y; 0:r3=z;
> > > > 1:r1=x; 1:r2=y; 1:r3=z; 1:r4=1;
> > > > 2:r1=x; 2:r2=y; 2:r3=z; 2:r4=1; 2:r5=2;
> > > > }
> > > >  P0		| P1		| P2		;
> > > >  lwz r6,0(r1)	| stw r4,0(r2)	| stw r5,0(r3)	;
> > > >  xor r7,r6,r6	| lwsync	| lwsync	;
> > > >  lwzx r7,r7,r2	| stw r4,0(r3)	| stw r4,0(r1)	;
> > > >  lwz r8,0(r3)	|		|		;
> > > > 
> > > > exists
> > > > (z=2 /\ 0:r6=1 /\ 0:r7=0 /\ 0:r8=1)
> > > 
> > > That really hurts. Assuming that the "assert(!(z=2))"
is actually there
> > > to constrain the coherence order of z to be {0->1->2}, then
I think that
> > > this test is forbidden on arm using dmb instead of lwsync. That
said, I
> > > also don't think the Rz=1 in P0 changes anything.
> > 
> > What about the smp_wmb() variant of dmb that orders only stores?
> 
> Tricky, but I think it still works out if the coherence order of z is as
> I described above. The line of reasoning is weird though -- I ended up
> considering the two cases where P0 reads z before and after it reads x
> and what that means for the read of y.
By "works out" you mean that ARM prohibits the outcome?

BTW, I never have seen a real-world use for this case.  At the moment
it is mostly a cautionary tale about memory-model corner cases and
tools.

							Thanx, Paul

Paul E. McKenney

2016-Jan-26 20:13 UTC

head link

[v3,11/41] mips: reuse asm-generic/barrier.h

On Tue, Jan 26, 2016 at 11:19:27AM +0100, Peter Zijlstra
wrote:> On Mon, Jan 25, 2016 at 10:03:22PM -0800, Paul E. McKenney wrote:
> > On Mon, Jan 25, 2016 at 04:42:43PM +0000, Will Deacon wrote:
> > > On Fri, Jan 15, 2016 at 01:58:53PM -0800, Paul E. McKenney wrote:
> > > > On Fri, Jan 15, 2016 at 10:27:14PM +0100, Peter Zijlstra
wrote:
> 
> > > > > Yes, that seems a good start. But yesterday you raised
the 'fun' point
> > > > > of two globally ordered sequences connected by a single
local link.
> > > > 
> > > > The conclusion that I am slowly coming to is that litmus
tests should
> > > > not be thought of as linear chains, but rather as cycles. 
If you think
> > > > of it as a cycle, then it doesn't matter where the local
link is, just
> > > > how many of them and how they are connected.
> > > 
> > > Do you have some examples of this? I'm struggling to make it
work in my
> > > mind, or are you talking specifically in the context of the
kernel
> > > memory model?
> > 
> > Now that you mention it, maybe it would be best to keep the transitive
> > and non-transitive separate for the time being anyway.  Just because
it
> > might be possible to deal with does not necessarily mean that we
should
> > be encouraging it.  ;-)
> 
> So isn't smp_mb__after_unlock_lock() exactly such a scenario? And would
> not someone trying to implement RCsc locks using locally transitive
> RELEASE/ACQUIRE operations need exactly this stuff?
> 
> That is, I am afraid we need to cover the mix of local and global
> transitive operations at least in overview.
True, but we haven't gotten to locking yet.  That said, I would argue
that smp_mb__after_unlock_lock() upgrades locks to transitive, and
thus would not be an exception to the "no combining transitive and
non-transitive steps in cycles" rule.

							Thanx, Paul

Possibly Parallel Threads

Search for more reasonably related threads

Linux Virtualization - Jan 2016 - [v3,11/41] mips: reuse asm-generic/barrier.h

[v3,11/41] mips: reuse asm-generic/barrier.h

[v3,11/41] mips: reuse asm-generic/barrier.h

[v3,11/41] mips: reuse asm-generic/barrier.h

[v3,11/41] mips: reuse asm-generic/barrier.h

[v3,11/41] mips: reuse asm-generic/barrier.h

[v3,11/41] mips: reuse asm-generic/barrier.h

Possibly Parallel Threads