Renato Golin
2010-Apr-26  20:53 UTC
[LLVMdev] Proposal for a new LLVM concurrency memory model
On 26 April 2010 21:09, David Greene <dag at cray.com> wrote:
> Vector atomics are extremely useful on architectures that support them.
>
> I'm not sure we need atomicity across vector elements, so decomposing
> shouldn't be a problem, but I will have to think about it a bit.

What are the semantics for vectorization across atomic vector operations?

Suppose I atomically write in thread 1 and read in thread 2, to a vector with 64 elements. If I do automatic vectorization, it would naively be converted into N operations of 64/N-wide atomic writes and reads, but reading block k on thread 2 would not necessarily happen before writing it on thread 1, supposing reads are much faster than writes.

I suppose one would have to take great care when doing such transformations, to keep the same semantics. For instance, splitting into two loops and putting a barrier between them, which brings us back to the original design.

cheers,
--renato

http://systemcall.org/
Reclaim your digital rights, eliminate DRM, learn more at http://www.defectivebydesign.org/what_is_drm
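The hazard described in this message can be made concrete with a small timing model. What follows is a minimal Python sketch, not LLVM code: the names `observe` and `is_torn`, the cost parameters, and the assumption that every chunk access completes at a fixed point in time are all illustrative. It splits one vector store into `n_chunks` chunk stores and reports, per chunk, whether a faster reader sees the old or the new value.

```python
def observe(n_chunks, write_cost, read_cost, reader_start):
    """Model one vector store and one vector load, each split into n_chunks.

    Assumptions (hypothetical, for illustration only):
      - the writer starts at time 0 and finishes chunk k at (k + 1) * write_cost
      - the reader starts at reader_start and loads chunk k at
        reader_start + k * read_cost
      - a chunk load sees the new value only if the chunk store has finished
    """
    seen = []
    for k in range(n_chunks):
        write_done = (k + 1) * write_cost
        read_at = reader_start + k * read_cost
        seen.append("new" if write_done <= read_at else "old")
    return seen


def is_torn(seen):
    """True when the reader observed a mix of old and new chunks."""
    return len(set(seen)) > 1


if __name__ == "__main__":
    # Reads are three times faster than writes in this toy setup.
    for start in (0, 3, 12):
        result = observe(n_chunks=4, write_cost=3, read_cost=1, reader_start=start)
        print(start, result, "torn" if is_torn(result) else "consistent")
```

In this model an undivided atomic access corresponds to `n_chunks=1`, which can never be torn; the mixed result only appears once the access is split.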
David Greene
2010-Apr-27  15:46 UTC
[LLVMdev] Proposal for a new LLVM concurrency memory model
On Monday 26 April 2010 15:53:31 Renato Golin wrote:
> On 26 April 2010 21:09, David Greene <dag at cray.com> wrote:
> > Vector atomics are extremely useful on architectures that support them.
> >
> > I'm not sure we need atomicity across vector elements, so decomposing
> > shouldn't be a problem, but I will have to think about it a bit.
>
> What are the semantics for vectorization across atomic vector operations?
>
> Suppose I atomically write in thread 1 and read in thread 2, to a
> vector with 64 elements. If I do automatic vectorization, it would
> naively be converted into N operations of 64/N-wide atomic writes and
> reads, but reading block k on thread 2 would not necessarily happen
> before writing it on thread 1, supposing reads are much faster than
> writes.
>
> I suppose one would have to take great care when doing such
> transformations, to keep the same semantics. For instance, splitting
> into two loops and putting a barrier between them, which brings us
> back to the original design.

So I think there are at least two cases here.

The first case is the one you outline: a producer-consumer relationship. In that case we would have to respect atomicity across vector elements, so that a read in thread 2 would not get some elements with the updated value of the write in thread 1 and some elements with the old value.

The second case is a pure atomic update: a bunch of threads collaborate to produce a set of values. A partial reduction, for example. A bunch of threads in a loop atomically operate on a vector, for example computing a vector sum into it via an atomic add. After this operation the code does a barrier sync and continues with the next phase. In this case there is no producer-consumer relationship within the loop (everyone's producing/updating), so we don't need to worry about respecting atomicity across elements.
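The second case can be sketched as follows. This is a Python model under stated assumptions, not a description of any LLVM feature: the function name `partial_reduction` is hypothetical, one lock per element stands in for a scalar atomic add, and joining the threads stands in for the barrier sync. Nothing is atomic across the whole vector, yet the result after the barrier does not depend on how the threads interleave, because nobody reads the accumulator until every update has landed.

```python
import threading


def partial_reduction(contributions, width):
    """Sum several vectors into one shared accumulator, element by element."""
    acc = [0] * width
    # One lock per element models a per-element (scalar) atomic add.
    locks = [threading.Lock() for _ in range(width)]

    def worker(vec):
        for i, x in enumerate(vec):
            with locks[i]:      # atomic for element i only
                acc[i] += x

    threads = [threading.Thread(target=worker, args=(v,)) for v in contributions]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                # stands in for the barrier after the loop
    return acc


if __name__ == "__main__":
    vectors = [[t] * 4 for t in range(1, 9)]   # eight threads, four elements
    print(partial_reduction(vectors, width=4))
```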
My intuition is that the second case is more important (in the sense of computation time) than the first, but I will have to talk to some people here more familiar with the common codes than I am. The first case might be used for boundary updates and that kind of thing while the second case is used for the meat of the computation.

It shouldn't be very hard for the compiler to detect the second case. It's a pretty straightforward pattern. For everything else it would have to assume case #1.

So perhaps we want two kinds of vector atomic: one that respects atomicity across elements and one that doesn't.

Of course this only matters when looking at decomposing vector atomics into scalars. I think it is probably a better strategy just to not generate the vector atomics in the first place if the target doesn't support them. Then we only need one kind: the one that respects atomicity across elements.

-Dave
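The two options weighed in this message can be written down as a small decision table. The sketch below is only one reading of the proposal: `VectorAtomicKind`, `legalize`, `should_form_vector_atomic`, and the returned strings are hypothetical names invented for illustration, and none of them correspond to an existing LLVM API.

```python
from enum import Enum


class VectorAtomicKind(Enum):
    WHOLE_VECTOR = "atomic across all elements"   # case 1, producer-consumer
    PER_ELEMENT = "atomic per element only"       # case 2, pure atomic update


def legalize(kind, target_has_vector_atomics):
    """Option A: two kinds of vector atomic, lowered differently."""
    if target_has_vector_atomics:
        return "emit vector atomic"
    if kind is VectorAtomicKind.PER_ELEMENT:
        # Safe to split: no reader depends on seeing all elements together.
        return "decompose into scalar atomics"
    # Splitting would let a reader observe a mix of old and new elements.
    return "cannot decompose; keep the scalar form"


def should_form_vector_atomic(target_has_vector_atomics):
    """Option B: only one kind exists, and it is never formed without support."""
    return target_has_vector_atomics
```

Under option B the decomposition question never comes up, which is why a single kind that respects atomicity across elements is enough.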
Renato Golin
2010-Apr-27  16:01 UTC
[LLVMdev] Proposal for a new LLVM concurrency memory model
On 27 April 2010 16:46, David Greene <dag at cray.com> wrote:
> It shouldn't be very hard for the compiler to detect the second case.
> It's a pretty straightforward pattern. For everything else it would
> have to assume case #1.

The only problem is that threads are rarely part of language specs, so the compiler would have to choose one particular flavour of threads or "find out" which flavour is being used. That would require the compiler to know or implement every flavour, which is not only time-consuming and error-prone, but also not the compiler's job in the first place.

cheers,
--renato