llvm dev - [LLVMdev] NEON vector instructions and the fast math IR flags [Jun 2013]

Renato Golin

2013-Jun-07 06:58 UTC

[LLVMdev] NEON vector instructions and the fast math IR flags

On 7 June 2013 07:05, Owen Anderson <resistor at mac.com> wrote:
> Darwin uses NEON for floating point, but does *not* (and should not)
> globally enable fast math flags.  Use of NEON for FP needs to remain
> achievable without globally setting the fast math flags.  Fast math may
> reasonably imply NEON, but the opposite direction is not accurate.
>
> That said, I don't think anyone would object to making VFP codegen
> available under non-Darwin triples.  It's just a matter of making it
> happen.
>
Hi Owen,

ARMSubtarget::resetSubtargetFeatures(StringRef CPU, StringRef FS) has a
check that sets UseNEONForSinglePrecisionFP when the target is Darwin or
UnsafeMath is enabled, but only for the A5 and A8, where this was a
problem. Maybe I was too conservative in my fix.
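To make the condition concrete, here is a minimal C++ sketch of the check as described above. The `Cpu` enum and `useNeonForSinglePrecisionFP` are hypothetical names chosen for illustration; the real logic lives in ARMSubtarget and works on subtarget feature bits, not on an enum like this.

```cpp
// Hypothetical stand-in for the CPU the subtarget detected; not LLVM's API.
enum class Cpu { CortexA5, CortexA8, CortexA9, CortexA15 };

// Sketch of the check as described: NEON is used for single-precision FP
// only on the A5/A8, and only when the target is Darwin or unsafe FP math
// is enabled. Every other CPU keeps single-precision FP on VFP.
constexpr bool useNeonForSinglePrecisionFP(Cpu cpu, bool isDarwin,
                                           bool unsafeFPMath) {
  return (cpu == Cpu::CortexA5 || cpu == Cpu::CortexA8) &&
         (isDarwin || unsafeFPMath);
}
```

Under this model, a Linux target on a Cortex-A8 with default math gets VFP, Darwin on the same CPU gets NEON, and a Cortex-A9 gets VFP whatever the other inputs are.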

Tobi,

The march=arm option would default to ARMv4, while mattr=+neon would force
NEON, but I'm not sure the CPU would then default to the A8. Otherwise you
would get the weird combination of ARM7TDMI+NEON.

There are two things to know at this point:

1. Which CPU has been detected for your arguments by the time execution
reaches resetSubtargetFeatures. You may also have to look at ARM.td to see
whether the detected CPU has the feature "FeatureNEONForFP" in its
description.

2. If the CPU is correct (Cortex-A*) and it's neither A5 nor A8, do we
still want to generate single-precision float on NEON on non-Darwin targets
with safe math? I don't think so. Possibly, that condition should be
extended to ignore the CPU you're using and *only* emit NEON SP-FP when
either Darwin or UnsafeMath is on.
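The proposal in point 2 can be compared with the existing check in a short sketch. The names `Cpu`, `currentPolicy` and `proposedPolicy` are hypothetical and only model the conditions described here; the `hasNEON` parameter is an added assumption, since NEON SP-FP needs a NEON unit in the first place.

```cpp
// Hypothetical model of the two policies discussed; not LLVM code.
enum class Cpu { CortexA5, CortexA8, CortexA9, CortexA15 };

// Existing behaviour as described: only the A5/A8 are affected.
constexpr bool currentPolicy(Cpu cpu, bool hasNEON, bool isDarwin,
                             bool unsafeFPMath) {
  return hasNEON && (cpu == Cpu::CortexA5 || cpu == Cpu::CortexA8) &&
         (isDarwin || unsafeFPMath);
}

// Proposed behaviour: the CPU is ignored, and NEON is used for
// single-precision FP only when the target is Darwin or unsafe math is on.
constexpr bool proposedPolicy(Cpu /*cpu*/, bool hasNEON, bool isDarwin,
                              bool unsafeFPMath) {
  return hasNEON && (isDarwin || unsafeFPMath);
}
```

The two policies only disagree on cores other than the A5/A8 (for example a Cortex-A9 with unsafe math on a non-Darwin target), and neither enables NEON SP-FP on a non-Darwin target with safe math.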

cheers,
--renato

Tobias Grosser

2013-Jun-07 07:48 UTC

[LLVMdev] NEON vector instructions and the fast math IR flags

On 06/06/2013 11:58 PM, Renato Golin wrote:
> On 7 June 2013 07:05, Owen Anderson <resistor at mac.com> wrote:
Hi Owen, hi Renato,

thanks for your replies.
>> Darwin uses NEON for floating point, but does *not* (and should not)
>> globally enable fast math flags.  Use of NEON for FP needs to remain
>> achievable without globally setting the fast math flags.  Fast math may
>> reasonably imply NEON, but the opposite direction is not accurate.
Good point. Fast math is probably too strong a requirement. I need to
look into the ways in which NEON does not comply with IEEE 754. For now
the only difference I see is that it may flush denormals to zero.
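The denormal difference can be illustrated with a small software model of flush-to-zero (FTZ) multiplication. This is a sketch only: the function names are hypothetical, flushing both subnormal inputs and subnormal results is assumed to match ARMv7 NEON behaviour, and nothing here configures a hardware FP mode.

```cpp
#include <limits>

// Software model of flush-to-zero arithmetic; it does not touch any
// hardware control register.

// True for nonzero values smaller in magnitude than the smallest normal float.
constexpr bool isSubnormal(float x) {
  const float mag = x < 0.0f ? -x : x;
  return mag != 0.0f && mag < std::numeric_limits<float>::min();
}

// Replace a subnormal with a zero of the same sign; leave other values alone.
constexpr float flushToZero(float x) {
  return isSubnormal(x) ? (x < 0.0f ? -0.0f : 0.0f) : x;
}

// IEEE 754 multiply: subnormal results are kept (gradual underflow).
constexpr float ieeeMul(float a, float b) { return a * b; }

// FTZ multiply: subnormal inputs are treated as zero and a subnormal
// result is flushed to zero.
constexpr float ftzMul(float a, float b) {
  return flushToZero(flushToZero(a) * flushToZero(b));
}
```

For example, `1e-20f * 1e-20f` underflows to a subnormal of roughly 1e-40 under IEEE 754, which `ieeeMul` keeps and `ftzMul` turns into 0; results in the normal range are identical in both models.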
>> That said, I don't think anyone would object to making VFP codegen
>> available under non-Darwin triples.  It's just a matter of making it
>> happen.
I see.
> Tobi,
>
> The march=arm option would default to ARMv4, while mattr=+neon would force
> NEON, but I'm not sure the CPU would then default to the A8. Otherwise you
> would get the weird combination of ARM7TDMI+NEON.
>
> There are two things to know at this point:
>
> 1. Which CPU has been detected for your arguments by the time execution
> reaches resetSubtargetFeatures. You may also have to look at ARM.td to see
> whether the detected CPU has the feature "FeatureNEONForFP" in its
> description.
>
> 2. If the CPU is correct (Cortex-A*) and it's neither A5 nor A8, do we
> still want to generate single-precision float on NEON on non-Darwin targets
> with safe math? I don't think so. Possibly, that condition should be
> extended to ignore the CPU you're using and *only* emit NEON SP-FP when
> either Darwin or UnsafeMath is on.
Renato:

When to set which subtarget feature is a policy decision on which I
honestly don't have an opinion for clang. The best approach is probably
to mirror the gcc behavior on Linux targets. My current goal is to
understand the implications of certain features and to make sure a tool
using the LLVM back-ends can actually implement any policy it likes.

I just looked again at the +neonfp flag. Compiling with and without the
+neonfp flag seems to affect only scalar types in the attached test
case. If, e.g., the LLVM vectorizer introduces vector instructions at the
LLVM-IR level, floating point vectors still yield NEON assembly even when
compiled with "-mattr=+neon,-neonfp". Is this expected?

Cheers,
Tobias

-------------- next part --------------
; RUN: llc -march=arm -mattr=+vfp3,+neon < %s | FileCheck %s

; fooP() performs a vector floating point multiplication with full precision
; requirements. Even if we allow NEON with -mattr=+neon, NEON should not be used
; to implement this function, as it does not comply with the full precision
; requirements (NEON flushes e.g. denormals to zero, which reduces precision)
define <4 x float> @fooP(<4 x float> %A, <4 x float> %B)
{
	%C = fmul <4 x float> %A, %B
; CHECK: fooP
; CHECK: vmul.f32	s
; CHECK: vmul.f32	s
; CHECK: vmul.f32	s
; CHECK: vmul.f32	s
	ret <4 x float> %C
}

; fooR() performs a vector floating point multiplication with relaxed precision
; requirements. In this case the precision loss introduced by neon is acceptable
; and we should generate NEON instructions
define <4 x float> @fooR(<4 x float> %A, <4 x float> %B)
{
	%C = fmul fast <4 x float> %A, %B
; CHECK: fooR
; CHECK: vmul.f32	q
	ret <4 x float> %C
}

; bar() performs a vector integer multiplication. On an ARM NEON device, this
; code should always be executed as vector code.
define <4 x i32> @bar(<4 x i32> %A, <4 x i32> %B)
{
	%C = mul <4 x i32> %A, %B
; CHECK: bar
; CHECK: vmul.i32	q
	ret <4 x i32> %C
}

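; fooS() performs a scalar floating point multiplication with relaxed precision
; requirements, so NEON may be used here as well. A scalar float is placed in
; a D register, not a Q register.
define float @fooS(float %A, float %B)
{
	%C = fmul fast float %A, %B
; CHECK: fooS
; CHECK: vmul.f32	d
	ret float %C
}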

David Tweed

2013-Jun-07 08:01 UTC

[LLVMdev] NEON vector instructions and the fast math IR flags

>> Darwin uses NEON for floating point, but does *not* (and should not)
>> globally enable fast math flags.  Use of NEON for FP needs to remain
>> achievable without globally setting the fast math flags.  Fast math may
>> reasonably imply NEON, but the opposite direction is not accurate.
| Good point. Fast math is probably too strong a requirement. I need to
| look into the ways in which NEON does not comply with IEEE 754. For now
| the only difference I see is that it may flush denormals to zero.

Yes, I've gone on record before as saying that fast-math enables far too
many different things for it to be "the canonical switch" for just about
any transformation. Rather, it should be what I think it is in gcc:
effectively a shortcut for invoking several individual math-option flags.
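The "shortcut" view can be sketched as a bundle of individual relaxations, where "fast" merely sets all of them and each transformation tests only the property it relies on. The struct and function names are hypothetical; the flag names mirror the LLVM IR fast-math flags (nnan, ninf, nsz, arcp), but this is not LLVM's own implementation.

```cpp
// Hypothetical bundle of individual FP relaxations; not LLVM's FastMathFlags.
struct FPRelaxations {
  bool noNaNs = false;           // nnan: assume no NaN operands or results
  bool noInfs = false;           // ninf: assume no infinities
  bool noSignedZeros = false;    // nsz: the sign of zero is insignificant
  bool allowReciprocal = false;  // arcp: x / y may become x * (1 / y)
};

// "fast" as a shortcut: it just turns on every individual relaxation.
constexpr FPRelaxations fastMath() {
  return FPRelaxations{true, true, true, true};
}

// Each transformation asks for the property it needs, not for "fast".
constexpr bool mayUseReciprocal(const FPRelaxations &f) {
  return f.allowReciprocal;
}

// x - x == +0.0 for every finite x, but NaN - NaN and inf - inf are NaN,
// so this fold needs nnan and ninf (and neither arcp nor nsz).
constexpr bool mayFoldXMinusXToZero(const FPRelaxations &f) {
  return f.noNaNs && f.noInfs;
}
```

With this structure a backend can ask for exactly the relaxation a given lowering needs instead of treating "fast" as the single switch.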

[snip]

|I just looked again at the +neonfp flag. Compiling with and without the
|+neonfp flag seems to affect only scalar types in the attached test
|case. If, e.g., the LLVM vectorizer introduces vector instructions at the
|LLVM-IR level, floating point vectors still yield NEON assembly even when
|compiled with "-mattr=+neon,-neonfp". Is this expected?

I'm virtually certain that's a problem, since there are codebases out
there which use that to effectively specify "integer NEON but VFP for
floats". If the vectorizer is producing NEON floating point from scalar
code in the presence of that flag, then it's a (minor) issue waiting to
happen.
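The expectation behind "-mattr=+neon,-neonfp" can be written down as a small selection rule. This is a hypothetical model of the intended policy, not LLVM's instruction selector: `Unit`, `Features` and `selectUnit` are illustrative names, and a VFP unit is assumed to be present.

```cpp
// Hypothetical model of the policy "integer NEON but VFP for floats".
enum class Unit { NEON, VFP, Core };

struct Features {
  bool neon;    // +neon
  bool neonFP;  // +neonfp: NEON may also be used for floating point
};

// Integer vectors use NEON whenever it exists. Floating-point work, scalar
// or vector, uses NEON only if neonfp is also enabled; otherwise it is
// scalarised onto VFP (assumed present). Other integer work stays on the core.
constexpr Unit selectUnit(bool isFloat, bool isVector, Features f) {
  if (isFloat) return (f.neon && f.neonFP) ? Unit::NEON : Unit::VFP;
  if (isVector && f.neon) return Unit::NEON;
  return Unit::Core;
}
```

Under this rule, the `fmul <4 x float>` in the attached test would become four VFP `vmul.f32` operations on S registers, while the `mul <4 x i32>` would still be a single NEON `vmul.i32` on Q registers.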

Cheers,
Dave

Renato Golin

2013-Jun-07 08:14 UTC

[LLVMdev] NEON vector instructions and the fast math IR flags

On 7 June 2013 08:48, Tobias Grosser <tobias at grosser.es> wrote:
> When to set which subtarget feature is a policy decision on which I
> honestly don't have an opinion for clang. The best approach is probably
> to mirror the gcc behavior on Linux targets.
>
Not really, since GCC has no special behaviour for Darwin, AFAIK.

My change will only generate SP-FP on NEON for the A5 and A8, and only if
the target is Darwin or UnsafeMath is on, which doesn't seem to be the case
for you, so I don't think the problem is in that area. It's possible that
some passes are not consulting that flag when generating NEON SP-FP. If
that's true, this is definitely a bug.

When I changed that, VMUL.f32 worked (i.e. it generated a VFP
instruction), but your code might not be taking the same path.

> I just looked again at the +neonfp flag. Compiling with and without the
> +neonfp flag seems to affect only scalar types in the attached test case.
> If, e.g., the LLVM vectorizer introduces vector instructions at the
> LLVM-IR level, floating point vectors still yield NEON assembly even when
> compiled with "-mattr=+neon,-neonfp". Is this expected?
>
No, vectorizers should honour FP contracts. This is probably a bug, too.

Please file both bugs on Bugzilla, attaching the relevant IR and a way to
reproduce to each one, and I'll have a look at them.

cheers,
--renato