dag at cray.com
2012-Oct-23 16:25 UTC
[LLVMdev] Predication on SIMD architectures and LLVM
David Chisnall <David.Chisnall at cl.cam.ac.uk> writes:

> Perhaps I am missing something, but isn't a predicated instruction
> effectively a single-instruction version of an arithmetic operation
> followed by a select?

No, it is not. Among other things, predication is used to avoid traps.
A vector select is an entirely different operation.

> As we can already represent this in the IR, and already match other
> predicated instructions (e.g. on ARM) to this pattern, what is gained
> by adding predication directly to the IR?

Predicated loads, stores, divides, sqrts, etc. are essential for
correctly vectorizing loops with conditionals due to safety concerns.
If the loop body has no dangerous operations, then yes, a vector select
can be used without problems, but it is often slower than predication.
Usually the hardware can optimize instructions with certain values of
predicates.

-David
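[Editorial illustration, not part of the original message.] The trap argument above can be sketched with a small lane-by-lane model. This is only an approximation under stated assumptions: Python lists stand in for vector registers, a `ZeroDivisionError` stands in for a hardware trap, and the function names `divide_then_select` and `predicated_divide` are hypothetical, not LLVM or hardware APIs.

```python
def divide_then_select(num, den, mask, passthru):
    """Model of an unpredicated vector divide followed by a select."""
    # Every lane is computed first, so a zero divisor in ANY lane "traps",
    # even if the mask would later discard that lane's result.
    quotients = [n // d for n, d in zip(num, den)]
    return [q if m else p for q, m, p in zip(quotients, mask, passthru)]


def predicated_divide(num, den, mask, passthru):
    """Model of a predicated vector divide."""
    # Inactive lanes are never evaluated, so a zero divisor there is harmless.
    return [n // d if m else p
            for n, d, m, p in zip(num, den, mask, passthru)]


# Vectorized form of: if (den[i] != 0) out[i] = num[i] / den[i];
num = [8, 9, 10, 12]
den = [2, 0, 5, 0]
mask = [d != 0 for d in den]
passthru = [0, 0, 0, 0]

print(predicated_divide(num, den, mask, passthru))  # [4, 0, 2, 0]

try:
    divide_then_select(num, den, mask, passthru)
except ZeroDivisionError:
    print("divide-then-select trapped on a masked-off lane")
```

In this model the two forms only agree when no lane can fault, which is the "no dangerous operations" case described in the message.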
I am talking about the LLVM select instruction, not a vector select:

http://llvm.org/docs/LangRef.html#i_select

In any non-trapping case, an arithmetic operation (or sequence of
operations) followed by a select is semantically equivalent to the
predicated version. This is exactly how predicated instructions on ARM
are handled. For example, the following IR:

  %cmp = icmp sgt i32 %c, %b
  %add = add nsw i32 %b, 1
  %add1 = add nsw i32 %c, 2
  %retval.0 = select i1 %cmp, i32 %add, i32 %add1

Becomes this ARM assembly:

  add r2, r1, #2
  cmp r1, r0
  addgt r2, r0, #1
  mov r0, r2

An equally valid form would be:

  cmp r1, r0
  addle r2, r1, #2
  addgt r2, r0, #1
  mov r0, r2

Separating the select, which embodies the predication, from the
operations allows more choice in terms of the final representation.
Unless the load or store is volatile, the compiler is free to elide it
if its result is not used, and is most definitely free to fold it into a
predicated load. The same is obviously true of any side-effect-free
operations, such as divides and square roots: folding them into
predicated instructions is no less valid than conditionally executing
them in branches or removing them entirely via dead code elimination.

Just because the generated machine code must contain predicated
instructions most definitely does not mean that the LLVM IR must contain
them, or even that we would gain anything in terms of expressive power
by permitting it.

David
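[Editorial illustration, not part of the original message.] The claimed equivalence for the non-trapping case can be checked with a rough scalar model of the example above. The sketch assumes `r0` holds `%b` and `r1` holds `%c`, uses unbounded Python integers (so the `nsw` overflow condition is ignored), and the names `select_form` and `predicated_form` are hypothetical.

```python
def select_form(b, c):
    """Mirrors the IR: compute both adds, then select on the compare."""
    cmp = c > b        # %cmp = icmp sgt i32 %c, %b
    add = b + 1        # %add = add nsw i32 %b, 1
    add1 = c + 2       # %add1 = add nsw i32 %c, 2
    return add if cmp else add1   # %retval.0 = select ...


def predicated_form(b, c):
    """Mirrors the second ARM sequence: each add runs only if its
    condition holds."""
    if c <= b:         # addle r2, r1, #2
        r2 = c + 2
    if c > b:          # addgt r2, r0, #1
        r2 = b + 1
    return r2          # mov r0, r2


# The adds cannot fault, so both forms agree on every input tried here.
agree = all(select_form(b, c) == predicated_form(b, c)
            for b in range(-4, 5) for c in range(-4, 5))
print(agree)  # True
```

Because neither add can fault, executing both unconditionally and selecting afterwards is indistinguishable from executing only the chosen one. That property is what the select-based representation relies on.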
dag at cray.com
2012-Oct-24 17:24 UTC
[LLVMdev] Predication on SIMD architectures and LLVM
David Chisnall <David.Chisnall at cl.cam.ac.uk> writes:

> I am talking about the LLVM select instruction, not a vector select:
>
> http://llvm.org/docs/LangRef.html#i_select

That is what I mean by a vector select.

> In any non-trapping case, an arithmetic operation (or sequence of
> operations) followed by a select is semantically equivalent to the
> predicated version.

Yes.

> Separating the select, which embodies the predication, from the
> operations allows more choice in terms of the final representation.

Sure.

> Just because the generated machine code must contain predicated
> instructions most definitely does not mean that the LLVM IR must
> contain them, or even that we would gain anything in terms of
> expressive power by permitting it.

Certainly such transformations *can* be done, but is it the most
efficient/best way to do things? I wonder how many different passes of
"select to predication" we will end up having, one per target.

-David