thr3ads.net - llvm dev - [LLVMdev] Predication on SIMD architectures and LLVM [Oct 2012]

dag at cray.com

2012-Oct-23 16:25 UTC

[LLVMdev] Predication on SIMD architectures and LLVM

David Chisnall <David.Chisnall at cl.cam.ac.uk> writes:
> Perhaps I am missing something, but isn't a predicated instruction
> effectively a single-instruction version of an arithmetic operation
> followed by a select?

No, it is not.  Among other things, predication is used to avoid traps.
A vector select is an entirely different operation.

> As we can already represent this in the IR, and already match other
> predicated instructions (e.g. on ARM) to this pattern, what is gained
> by adding predication directly to the IR?

Predicated loads, stores, divides, sqrts, etc. are essential for
correctly vectorizing loops with conditionals, because the unpredicated
forms can trap on lanes the original scalar loop never executed.
If the loop body has no dangerous operations, then yes, a vector select
can be used without problems, but it is often slower than predication.
Usually the hardware can optimize instructions with certain values of
predicates.

                              -David
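
The safety concern above can be sketched in C, emulating vector lanes with a plain loop. This is only an illustration under stated assumptions: the function names are hypothetical, unsigned division is used so that division by zero is the only hazard, and substituting a harmless divisor on inactive lanes is one common workaround when no predicated divide is available, not something the message prescribes.

```c
#include <stddef.h>
#include <stdint.h>

/* Original scalar loop: the branch guarantees the divide never sees 0. */
void guarded_div_scalar(uint32_t *out, const uint32_t *num,
                        const uint32_t *den, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (den[i] != 0)
            out[i] = num[i] / den[i];
    }
}

/*
 * If-converted form.  A naive "divide every lane, then select" would
 * evaluate num[i] / den[i] even where den[i] == 0, which is undefined
 * behaviour in C and a trap on many targets.  Here the predicate is
 * applied to the divisor *before* the divide, so inactive lanes divide
 * by 1 and their quotient is discarded by the final select.
 */
void guarded_div_masked(uint32_t *out, const uint32_t *num,
                        const uint32_t *den, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        int active = (den[i] != 0);               /* per-lane predicate */
        uint32_t safe_den = active ? den[i] : 1u; /* harmless divisor   */
        uint32_t q = num[i] / safe_den;           /* runs on every lane */
        out[i] = active ? q : out[i];             /* select keeps old value */
    }
}
```

Both functions leave `out[i]` untouched wherever `den[i]` is zero. A hardware predicated divide would make the extra select on the divisor unnecessary, which is the cost being discussed.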

David Chisnall

2012-Oct-23 16:43 UTC

I am talking about the LLVM select instruction, not a vector select:

http://llvm.org/docs/LangRef.html#i_select

In any non-trapping case, an arithmetic operation (or sequence of operations)
followed by a select is semantically equivalent to the predicated version.  This
is exactly how predicated instructions on ARM are handled.  For example, the
following IR:

  %cmp = icmp sgt i32 %c, %b
  %add = add nsw i32 %b, 1
  %add1 = add nsw i32 %c, 2
  %retval.0 = select i1 %cmp, i32 %add, i32 %add1

Becomes this ARM assembly:

	add	r2, r1, #2
	cmp	r1, r0
	addgt	r2, r0, #1
	mov	r0, r2

An equally valid form would be:

	cmp	r1, r0
	addle	r2, r1, #2
	addgt	r2, r0, #1
	mov	r0, r2
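
For orientation, C source of roughly the following shape would be expected to produce IR like the fragment above. The original message does not show its source, so the function names and the parameter order (`b` in r0, `c` in r1, inferred from the assembly) are assumptions.

```c
/* Hypothetical source for the IR above: the conditional expression
 * maps naturally onto icmp + two adds + select. */
int select_example(int b, int c)
{
    return (c > b) ? b + 1 : c + 2;
}

/* The same computation written with a branch.  Because neither add can
 * trap, a compiler may if-convert this into the select form. */
int branch_example(int b, int c)
{
    if (c > b)
        return b + 1;
    return c + 2;
}
```

The two functions return the same value for every input, which is the non-trapping equivalence the message relies on.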

Separating the select, which embodies the predication, from the operations
allows more choice in terms of the final representation.  Unless the load or
store is volatile, the compiler is free to elide it if its result is not used,
and is most definitely free to fold it into a predicated load.  The same is
obviously true of any side-effect-free operations, such as divides and square
roots: folding them into predicated instructions is no less valid than
conditionally executing them in branches or removing them entirely via dead code
elimination.

Just because the generated machine code must contain predicated instructions
most definitely does not mean that the LLVM IR must contain them, or even that
we would gain anything in terms of expressive power by permitting it.

David

dag at cray.com

2012-Oct-24 17:24 UTC

David Chisnall <David.Chisnall at cl.cam.ac.uk> writes:
> I am talking about the LLVM select instruction, not a vector select:
>
> http://llvm.org/docs/LangRef.html#i_select

That is what I mean by a vector select.

> In any non-trapping case, an arithmetic operation (or sequence of
> operations) followed by a select is semantically equivalent to the
> predicated version.

Yes.

> Separating the select, which embodies the predication, from the
> operations allows more choice in terms of the final representation.

Sure.

> Just because the generated machine code must contain predicated
> instructions most definitely does not mean that the LLVM IR must contain
> them, or even that we would gain anything in terms of expressive power
> by permitting it.

Certainly such transformations *can* be done, but is it the most
efficient/best way to do things?  I wonder how many different passes of
"select to predication" we will end up having, one per target.

                           -David
