Renato Golin
2010-Apr-26 20:53 UTC
[LLVMdev] Proposal for a new LLVM concurrency memory model
On 26 April 2010 21:09, David Greene <dag at cray.com> wrote:
> Vector atomics are extremely useful on architectures that support them.
>
> I'm not sure we need atomicity across vector elements, so decomposing
> shouldn't be a problem, but I will have to think about it a bit.

What are the semantics of vectorization across atomic vector operations?

Suppose I atomically write in thread 1 and read in thread 2, to a
vector with 64 elements. If I do automatic vectorization, it'd naively
be converted into N operations of 64/N-wide atomic writes and reads,
but reading block k on thread 2 would not necessarily happen before
writing it on thread 1, supposing reads are much faster than writes.

I suppose one would have to take great care when doing such
transformations, to keep the same semantics. For instance, splitting
into two loops and putting a barrier between them, thus going back to
the original design.

cheers,
--renato

http://systemcall.org/

Reclaim your digital rights, eliminate DRM, learn more at
http://www.defectivebydesign.org/what_is_drm
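A minimal sketch of the tearing hazard described above, assuming C++11
<atomic>; the 4-wide chunking and all names are illustrative, not from
the thread:

    // One logical 64-element atomic store, naively decomposed into
    // 4-wide chunks. Nothing orders the chunks as a single unit, so a
    // concurrent reader can observe chunk 0 new and chunk 1 old.
    #include <atomic>
    #include <array>

    constexpr int kElems = 64;
    constexpr int kWidth = 4; // hypothetical target vector width

    std::array<std::atomic<int>, kElems> vec; // shared vector

    void writer(const int (&src)[kElems]) {
      for (int base = 0; base < kElems; base += kWidth)   // 16 chunked stores
        for (int i = 0; i < kWidth; ++i)
          vec[base + i].store(src[base + i], std::memory_order_release);
    }

    void reader(int (&dst)[kElems]) {
      for (int base = 0; base < kElems; base += kWidth)   // may interleave
        for (int i = 0; i < kWidth; ++i)                  // with writer()
          dst[base + i] = vec[base + i].load(std::memory_order_acquire);
      // dst can now mix old and new chunks: whole-vector atomicity is lost,
      // which is why the transformation must preserve the original semantics.
    }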
David Greene
2010-Apr-27 15:46 UTC
[LLVMdev] Proposal for a new LLVM concurrency memory model
On Monday 26 April 2010 15:53:31 Renato Golin wrote:
> On 26 April 2010 21:09, David Greene <dag at cray.com> wrote:
> > Vector atomics are extremely useful on architectures that support them.
> >
> > I'm not sure we need atomicity across vector elements, so decomposing
> > shouldn't be a problem, but I will have to think about it a bit.
>
> What are the semantics of vectorization across atomic vector operations?
>
> Suppose I atomically write in thread 1 and read in thread 2, to a
> vector with 64 elements. If I do automatic vectorization, it'd naively
> be converted into N operations of 64/N-wide atomic writes and reads,
> but reading block k on thread 2 would not necessarily happen before
> writing it on thread 1, supposing reads are much faster than writes.
>
> I suppose one would have to take great care when doing such
> transformations, to keep the same semantics. For instance, splitting
> into two loops and putting a barrier between them, thus going back to
> the original design.

So I think there are at least two cases here.

The first case is the one you outline: a producer-consumer relationship.
In that case we would have to respect atomicity across vector elements,
so that a read in thread 2 would not get some elements with the updated
value of the write in thread 1 and some elements with the old value.

The second case is a pure atomic update: a bunch of threads collaborate
to produce a set of values. A partial reduction, for example. A bunch of
threads in a loop atomically operate on a vector, for example computing
a vector sum into it via an atomic add. After this operation the code
does a barrier sync and continues with the next phase. In this case
there is no producer-consumer relationship within the loop (everyone's
producing/updating), so we don't need to worry about respecting
atomicity across elements.

My intuition is that the second case is more important (in the sense of
computation time) than the first, but I will have to talk to some people
here more familiar with the common codes than I am. The first case might
be used for boundary updates and that kind of thing, while the second
case is used for the meat of the computation.

It shouldn't be very hard for the compiler to detect the second case.
It's a pretty straightforward pattern. For everything else it would have
to assume case #1.

So perhaps we want two kinds of vector atomic: one that respects
atomicity across elements and one that doesn't.

Of course this only matters when looking at decomposing vector atomics
into scalars. I think it is probably a better strategy just to not
generate the vector atomics in the first place if the target doesn't
support them. Then we only need one kind: the one that respects
atomicity across elements.

-Dave
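A minimal sketch of the second case, again assuming C++11 atomics; the
barrier and all names are illustrative, not from the thread. Because no
thread reads the vector during the update phase, per-element atomicity
is all that is required:

    // Partial reduction: each thread folds its local contribution into a
    // shared vector via per-element atomic adds. Only element-wise
    // atomicity matters; whole-vector atomicity buys nothing here.
    #include <atomic>
    #include <array>

    constexpr int kElems = 64;
    std::array<std::atomic<long>, kElems> sum; // shared reduction target

    void accumulate(const long (&local)[kElems]) {
      for (int i = 0; i < kElems; ++i)
        sum[i].fetch_add(local[i], std::memory_order_relaxed);
    }

    // After all threads have called accumulate(), a barrier sync
    // (e.g. pthread_barrier_wait) separates this update phase from the
    // next phase, which is the first point where `sum` is read.

This is also the pattern a compiler could plausibly detect, since the
update phase contains only commutative read-modify-write operations and
ends at a barrier.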
Renato Golin
2010-Apr-27 16:01 UTC
[LLVMdev] Proposal for a new LLVM concurrency memory model
On 27 April 2010 16:46, David Greene <dag at cray.com> wrote:
> It shouldn't be very hard for the compiler to detect the second case.
> It's a pretty straightforward pattern. For everything else it would
> have to assume case #1.

The only problem is that threads are rarely within language specs, so
the compiler would have to choose one particular flavour of threads or
"find out" which flavour is being used. That would require the compiler
to know and implement all of them, which is not only time-consuming and
error-prone, but also not the compiler's job in the first place.

cheers,
--renato

http://systemcall.org/

Reclaim your digital rights, eliminate DRM, learn more at
http://www.defectivebydesign.org/what_is_drm