David Greene
2007-Jul-12 15:06 UTC
[LLVMdev] Atomic Operation and Synchronization Proposal v2
On Thursday 12 July 2007 07:23, Torvald Riegel wrote:
> > The single instruction constraints can, at their most flexible, constrain
> > any set of possible pairings of loads from memory and stores to memory
>
> I'm not sure about this, but can we get issues due to "special" kinds of
> data transfers (such as vector stuff, DMA, ...?). Memcpy implementations
> could be a one thing to look at.
> This kind of breaks down to how universal you want the memory model to be.

Right.  For example, the Cray X1 has a much richer set of memory ordering
instructions than anything on the commodity micros:

http://tinyurl.com/3agjjn

The memory ordering intrinsics in the current llvm proposal can't take
advantage of them because they are too coarse-grained.

Now, I don't expect we'll see an llvm-based X1 code generator, but looking
at what the HPC vendors are doing in this area will go a long way toward
informing the kind of operations we may want to include in llvm.  The trend
is for vendors to include ever more finely targeted semantics to allow
scaling to machines with millions of cores.

If we can incrementally refine the size of the memory ordering hammers, I'm
ok with that.  If it's simply a matter of adding finer-grained intrinsics
later, that's cool.  But I don't want to get us into a situation where llvm
requires stricter memory ordering than is strictly necessary and we can't
get out from under the stone.

                            -Dave
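For concreteness, here is a minimal sketch of the kind of barrier the
proposal exposes, written in 2007-era LLVM IR.  The intrinsic name and
signature below are assumptions modeled on the proposal's description of a
pairing-flag barrier (four i1 flags selecting which load/store pairings to
order), not text quoted from it.  The point of the sketch is that the flags
say *which* classes of accesses to order, but there is no way to narrow
those classes or scope the ordering, which is where the X1's richer
instruction set comes in.

  ; Sketch only: assumed name and signature, not the proposal verbatim.
  ; Flag order assumed to be: load-load, load-store, store-load, store-store.
  declare void @llvm.memory.barrier(i1, i1, i1, i1)

  define void @publish(i32* %data, i32* %flag) {
  entry:
    store i32 42, i32* %data
    ; Order prior stores before the later store (store-store only).
    ; There is no way to say "scalar stores only" or "only this team of
    ; streams must observe the ordering" -- the X1 features David refers to.
    call void @llvm.memory.barrier(i1 false, i1 false, i1 false, i1 true)
    store i32 1, i32* %flag
    ret void
  }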
Dan Gohman
2007-Jul-12 15:56 UTC
[LLVMdev] Atomic Operation and Synchronization Proposal v2
On Thu, Jul 12, 2007 at 10:06:04AM -0500, David Greene wrote:
> On Thursday 12 July 2007 07:23, Torvald Riegel wrote:
> > > The single instruction constraints can, at their most flexible, constrain
> > > any set of possible pairings of loads from memory and stores to memory
> >
> > I'm not sure about this, but can we get issues due to "special" kinds of
> > data transfers (such as vector stuff, DMA, ...?). Memcpy implementations
> > could be a one thing to look at.
> > This kind of breaks down to how universal you want the memory model to be.
>
> Right.  For example, the Cray X1 has a much richer set of memory ordering
> instructions than anything on the commodity micros:
>
> http://tinyurl.com/3agjjn
>
> The memory ordering intrinsics in the current llvm proposal can't take
> advantage of them because they are too coarse-grained.

I guess the descriptions on that page are, heh, a little terse ;-).

The Cray X1 has a dimension of synchronization that isn't covered in this
proposal, and that's the set of observers that need to observe the ordering.
For example, you can synchronize a team of streams in a multi-streaming
processor without requiring that the ordering of memory operations be
observed by the entire system.  That's what motivates most of the variety
in that list.

There's one other specific aspect I'd like to point out here.  There's an
"acquire" which orders prior *scalar* loads with *all* subsequent memory
accesses, and a "release" which orders *all* prior accesses with subsequent
*scalar* stores.  The Cray X1's interest in distinguishing scalar accesses
from vector accesses is specific to its architecture, but in general it is
another case that motivates having more granularity than just "all loads"
and "all stores".

Overall though, I'm quite happy to see that the newest revision of the
proposal has switched from LLVM instructions to LLVM intrinsics.  That will
make it easier to experiment with extensions in the future.  And having the
string "atomic" right there in the names of each operation is very much
appreciated :-).

Dan

--
Dan Gohman, Cray Inc.
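To connect this to the proposal's vocabulary: with a pairing-flag barrier of
the shape sketched after David's message (again an assumed signature, not
the proposal verbatim), the closest expressible analogue of the X1-style
"acquire" Dan describes orders *all* prior loads against all later accesses;
the scalar/vector distinction has nowhere to go.

  ; Sketch only: assumed barrier signature (ll, ls, sl, ss flags).
  declare void @llvm.memory.barrier(i1, i1, i1, i1)

  define i32 @acquire_like(i32* %flag, i32* %data) {
  entry:
    %f = load i32* %flag
    ; "Acquire"-ish: order prior loads before all subsequent loads and
    ; stores (load-load + load-store).  Unlike the X1 instruction, every
    ; prior load is ordered; "prior scalar loads only" is not expressible.
    call void @llvm.memory.barrier(i1 true, i1 true, i1 false, i1 false)
    %d = load i32* %data
    %sum = add i32 %f, %d
    ret i32 %sum
  }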
Chandler Carruth
2007-Jul-12 17:59 UTC
[LLVMdev] Atomic Operation and Synchronization Proposal v2
On 7/12/07, David Greene <dag at cray.com> wrote:
> On Thursday 12 July 2007 07:23, Torvald Riegel wrote:
> > > The single instruction constraints can, at their most flexible, constrain
> > > any set of possible pairings of loads from memory and stores to memory
> >
> > I'm not sure about this, but can we get issues due to "special" kinds of
> > data transfers (such as vector stuff, DMA, ...?). Memcpy implementations
> > could be a one thing to look at.
> > This kind of breaks down to how universal you want the memory model to be.
>
> Right.  For example, the Cray X1 has a much richer set of memory ordering
> instructions than anything on the commodity micros:
>
> http://tinyurl.com/3agjjn

Thanks for this link! Very interesting to see an architecture which pays
much more attention to its memory ordering.

> The memory ordering intrinsics in the current llvm proposal can't take
> advantage of them because they are too coarse-grained.

From what I can glean, this coarseness comes in two flavors -- global
vs. local memory access, and type-based granularities. Is this a correct
interpretation? (I'm clearly not going to be an expert on the X1. ;])

> Now, I don't expect we'll see an llvm-based X1 code generator, but looking
> at what the HPC vendors are doing in this area will go a long way toward
> informing the kind of operations we may want to include in llvm.  The trend
> is for vendors to include ever more finely targeted semantics to allow
> scaling to machines with millions of cores.

Absolutely! Like I said, it's great to see this kind of information. A few
points about the current proposal:

1) It currently only deals with integers in order to keep it simple to
implement and representable across all architectures. While this is
limiting, I think it remains a good starting point, and it shouldn't cause
any problems for later expansion to more type-aware interpretations.

2) The largest assumption made is that all memory is just "memory". Beyond
that, the most fine-grained interpretation of barriers available was chosen
(note that only SPARC can express all the various combinations... most
architectures only have one big fence). The only major thing I can see that
would increase this granularity is to treat different types differently, or
to treat them as going into different parts of "memory". I'm really not sure
here, but it definitely is something to look into. However, I think this may
require a much later proposal, once hardware is actively being used at this
level and we can try to find a more fine-grained way of targeting all the
available architectures. For the time being, the current proposal seems to
hit all the architectures very neatly.

> If we can incrementally refine the size of the memory ordering hammers, I'm
> ok with that.  If it's simply a matter of adding finer-grained intrinsics
> later, that's cool.  But I don't want to get us into a situation where llvm
> requires stricter memory ordering than is strictly necessary and we can't
> get out from under the stone.

With the current version you can specify exactly what ordering you desire.
The only thing ignored is the type of the various loads and stores. I think
adding that level of granularity to the existing, highly granular pairing
selection would be a smooth incremental update. Is there another update you
see needed that would be less smooth?

Again, thanks for the information on the X1's memory architecture, very
interesting... I'm going to try to get into it a bit more in a response to
Dan Gohman's email below... =]

-Chandler
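As an illustration of the "only SPARC can express all the combinations"
remark, here is how a single-pairing barrier might lower on different
targets.  This is a sketch under the same assumed intrinsic shape as above;
the target instructions named in the comments are real (SPARC v9 membar
masks, x86 mfence, PowerPC sync), but the mapping is informal and not the
proposal's codegen specification.

  ; Sketch only: assumed barrier signature (ll, ls, sl, ss flags).
  declare void @llvm.memory.barrier(i1, i1, i1, i1)

  define void @store_load_only() {
  entry:
    ; Request store-load ordering only.
    ;   SPARC v9 can honor exactly this:  membar #StoreLoad
    ;   x86 has no narrower choice here:  mfence (a full fence)
    ;   PowerPC likewise falls back to:   sync
    ; The IR carries the fine granularity; targets that cannot use it
    ; simply emit their one big hammer.
    call void @llvm.memory.barrier(i1 false, i1 false, i1 true, i1 false)
    ret void
  }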
Chandler Carruth
2007-Jul-12 18:08 UTC
[LLVMdev] Atomic Operation and Synchronization Proposal v2
On 7/12/07, Dan Gohman <djg at cray.com> wrote:
> On Thu, Jul 12, 2007 at 10:06:04AM -0500, David Greene wrote:
> > On Thursday 12 July 2007 07:23, Torvald Riegel wrote:
> > > > The single instruction constraints can, at their most flexible, constrain
> > > > any set of possible pairings of loads from memory and stores to memory
> > >
> > > I'm not sure about this, but can we get issues due to "special" kinds of
> > > data transfers (such as vector stuff, DMA, ...?). Memcpy implementations
> > > could be a one thing to look at.
> > > This kind of breaks down to how universal you want the memory model to be.
> >
> > Right.  For example, the Cray X1 has a much richer set of memory ordering
> > instructions than anything on the commodity micros:
> >
> > http://tinyurl.com/3agjjn
> >
> > The memory ordering intrinsics in the current llvm proposal can't take
> > advantage of them because they are too coarse-grained.
>
> I guess the descriptions on that page are, heh, a little terse ;-).

A bit. ;] I was glad to see your clarification.

> The Cray X1 has a dimension of synchronization that isn't covered in this
> proposal, and that's the set of observers that need to observe the ordering.
> For example, you can synchronize a team of streams in a multi-streaming
> processor without requiring that the ordering of memory operations be
> observed by the entire system.  That's what motivates most of the variety
> in that list.

This is fascinating to me, personally. I don't know how reasonable it is to
implement directly in LLVM, but could a code generator for the X1, in
theory, establish whether the "shared memory" was part of a stream in a
multi-streaming processor, and use those local synchronization routines?
I'm not sure how reasonable this is.

Alternatively, to target an architecture this specific, perhaps the LLVM
code could be annotated to show where it is operating on streams versus
across processors, and allow that to guide the codegen decision as to which
type of synchronization to utilize. As LLVM doesn't really understand the
parallel implementation the code is running on, it seems like it might be
impossible to build this into LLVM without it being X1-type-system
specific... but perhaps you have better ideas how to do such things from
working on it for some time?

> There's one other specific aspect I'd like to point out here.  There's an
> "acquire" which orders prior *scalar* loads with *all* subsequent memory
> accesses, and a "release" which orders *all* prior accesses with subsequent
> *scalar* stores.  The Cray X1's interest in distinguishing scalar accesses
> from vector accesses is specific to its architecture, but in general it is
> another case that motivates having more granularity than just "all loads"
> and "all stores".

This clarifies some of those instructions. Here is my thought on how to fit
this behavior in with the current proposal: you're still ordering load-store
pairings; there is just the added dimensionality of types. This seems like
an easy extension to the existing proposal -- combine the load and store
pairings with a type dimension to achieve finer-grained control. Does this
make sense as an incremental step from your end, with much more experience
comparing your hardware to LLVM's IR?

> Overall though, I'm quite happy to see that the newest revision of the
> proposal has switched from LLVM instructions to LLVM intrinsics.  That will
> make it easier to experiment with extensions in the future.  And having the
> string "atomic" right there in the names of each operation is very much
> appreciated :-).

The atomic in the name is nice. It does make the syntax a bit less elegant,
but it'll get the ball rolling faster, and that's far more important!

Thanks for the input, and I really love the X1 example for a radically
different memory model from the architectures LLVM is currently targeting.

-Chandler Carruth
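One way to picture the "type dimension" Chandler suggests, purely as a
hypothetical sketch: the intrinsic below does not exist in the proposal or
in LLVM, and its name, extra operand, and encoding are invented here for
illustration only.

  ; Hypothetical only: an extra access-class operand restricting which
  ; accesses the pairing flags apply to, so an X1-style "order prior
  ; *scalar* loads with all subsequent accesses" becomes expressible.
  ;   class 0 = all accesses, 1 = scalar only, 2 = vector only  (invented)
  declare void @llvm.memory.barrier.class(i1, i1, i1, i1, i8)

  define i32 @scalar_acquire_sketch(i32* %p) {
  entry:
    %v = load i32* %p
    ; Order prior *scalar* loads before all subsequent accesses.
    call void @llvm.memory.barrier.class(i1 true, i1 true, i1 false, i1 false, i8 1)
    ret i32 %v
  }

Whether such a restriction would belong on the barrier itself or as an
attribute on the individual loads and stores is exactly the kind of design
choice the thread leaves open.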