llvm dev - [LLVMdev] NEON vector instructions and the fast math IR flags [Jun 2013]

Renato Golin

2013-Jun-07 08:14 UTC

[LLVMdev] NEON vector instructions and the fast math IR flags

On 7 June 2013 08:48, Tobias Grosser <tobias at grosser.es> wrote:
> When to set which subtarget feature is a policy decision, on which I honestly
> don't have any opinion for clang. The best is probably to mirror the gcc
> behavior on linux targets.
>
>
Not really, since GCC has no special behaviour for Darwin, AFAIK.

My change will only generate SP-FP on NEON for A5 and A8, and only if it's
Darwin or UnsafeMath is on, which seems not to be the case for you, so I
don't think the problem is in that area. It's possible that some passes are
not consulting that flag when generating NEON SP-FP. If that's true, this
is definitely a bug.
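As a hypothetical sketch, the policy described above (scalar single-precision FP on NEON only for Cortex-A5/A8, and only on Darwin or with unsafe FP math) can be read as a simple predicate. The struct, function names and CPU strings below are assumptions for illustration; this is not LLVM's actual ARM subtarget code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical target description; not LLVM's real ARMSubtarget fields. */
struct target_info {
    const char *cpu;     /* e.g. "cortex-a8" (assumed naming) */
    bool is_darwin;      /* target OS is Darwin */
    bool unsafe_fp_math; /* unsafe FP math enabled */
};

/* CPUs on which the described change prefers NEON for scalar SP-FP. */
static bool cpu_prefers_neon_sp_fp(const char *cpu) {
    assert(cpu != NULL);
    return strcmp(cpu, "cortex-a5") == 0 || strcmp(cpu, "cortex-a8") == 0;
}

/* True if scalar single-precision FP may be lowered to NEON instead of
   VFP: the CPU must be A5/A8 AND (Darwin OR unsafe FP math). */
static bool use_neon_for_scalar_sp_fp(const struct target_info *t) {
    return cpu_prefers_neon_sp_fp(t->cpu) &&
           (t->is_darwin || t->unsafe_fp_math);
}
```

Under this reading, a Cortex-A8 Linux target without unsafe math keeps scalar FP on VFP, which is why the problem reported in the thread is unlikely to come from this predicate.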

When I changed that, for VMUL.f32, it worked (i.e. it generated a VFP
instruction), but it might not be taking the same path your code is.

> I just looked again at the +neonfp flag. Compiling with and without the
> +neonfp flag seems to only affect scalar types in the attached test case. If
> e.g. the LLVM vectorizer introduces vector instructions on LLVM-IR level,
> floating point vectors still yield NEON assembly even if compiled with
> "-mattr=+neon,-neonfp". Is this expected?
>
No, vectorizers should honour FP contracts. This is probably a bug, too.

Please file both bugs on bugzilla, attaching the relevant IR to each one
and a way to reproduce, and I'll have a look at them.

cheers,
--renato

Arnold Schwaighofer

2013-Jun-07 13:49 UTC

On Jun 7, 2013, at 3:14 AM, Renato Golin <renato.golin at linaro.org> wrote:
> On 7 June 2013 08:48, Tobias Grosser <tobias at grosser.es> wrote:
> > When to set which subtarget feature is a policy decision, on which I
> > honestly don't have any opinion for clang. The best is probably to mirror
> > the gcc behavior on linux targets.
>
> Not really, since GCC has no special behaviour for Darwin, AFAIK.
>
> My change will only generate SP-FP on NEON for A5 and A8, and only if it's
> Darwin or UnsafeMath is on, which seems not to be the case for you, so I
> don't think the problem is in that area. It's possible that some passes are
> not consulting that flag when generating NEON SP-FP. If that's true, this
> is definitely a bug.
>
> When I changed that, for VMUL.f32, it worked (i.e. it generated a VFP
> instruction), but it might not be taking the same path your code is.
>
> > I just looked again at the +neonfp flag. Compiling with and without the
> > +neonfp flag seems to only affect scalar types in the attached test case.
> > If e.g. the LLVM vectorizer introduces vector instructions on LLVM-IR
> > level, floating point vectors still yield NEON assembly even if compiled
> > with "-mattr=+neon,-neonfp". Is this expected?
>
> No, vectorizers should honour FP contracts. This is probably a bug, too.
>
> Please file both bugs on bugzilla, attaching the relevant IR to each one
> and a way to reproduce, and I'll have a look at them.
>

It is not the vectorizer that is the issue; it is the ARM backend that currently
translates vectorized floating point IR to NEON instructions (it should
scalarize it if desired to do so, i.e. if people care about denormals). To fix
this issue one would have to fix the backend, i.e. not declare v4f32 et al. as
legal (under a flag). As to making this predicated on fast math flags on
operations (something like no-denormals; I don’t think we have that in the IR
yet, we only have no NaNs, no infinities, no signed zeros, etc.), I believe this
would be a lot harder because I suspect you would have to custom lower all the
operations.
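Background for the denormal point: on ARMv7, NEON single-precision arithmetic always runs in flush-to-zero mode, while VFP can follow IEEE 754 and produce denormals. The C sketch below emulates that difference in software for a four-lane multiply: one version keeps IEEE semantics lane by lane (what scalarizing a v4f32 multiply into VFP instructions would preserve), the other flushes subnormal inputs and results to zero, roughly like NEON. The function names are hypothetical, the emulation flushes the already-rounded product (hardware may differ in edge cases near the subnormal boundary), and the host is assumed not to be running in FTZ mode itself (e.g. not built with -ffast-math).

```c
#include <assert.h>
#include <math.h>

/* Software emulation of flush-to-zero: a subnormal value becomes a zero
   of the same sign; every other value is returned unchanged. */
static float flush_to_zero(float x) {
    if (fpclassify(x) == FP_SUBNORMAL)
        return x < 0.0f ? -0.0f : 0.0f;
    return x;
}

/* Lane-wise multiply with IEEE semantics, as four scalar VFP multiplies
   would compute it. */
static void mul_v4f32_ieee(const float a[4], const float b[4], float out[4]) {
    assert(a && b && out);
    for (int i = 0; i < 4; ++i)
        out[i] = a[i] * b[i];
}

/* Lane-wise multiply that approximates NEON's flush-to-zero behaviour:
   subnormal inputs and subnormal results are replaced by zero. */
static void mul_v4f32_ftz(const float a[4], const float b[4], float out[4]) {
    assert(a && b && out);
    for (int i = 0; i < 4; ++i)
        out[i] = flush_to_zero(flush_to_zero(a[i]) * flush_to_zero(b[i]));
}
```

For example, multiplying the smallest normal float (0x1p-126f) by 0.5f yields the subnormal 0x1p-127f with IEEE semantics but 0 in the flush-to-zero version; this is the kind of divergence that makes v4f32 on NEON unsafe when denormals matter.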

> cheers,
> --renato

Renato Golin

2013-Jun-07 14:22 UTC

On 7 June 2013 14:49, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
> It is not the vectorizer that is the issue; it is the ARM backend that
> currently translates vectorized floating point IR to NEON instructions (it
> should scalarize it if desired to do so, i.e. if people care about
> denormals).
>
Hi Arnold,

Couldn't the vectorizer simply avoid generating the v4f32 vectors in the
first place when that flag is disabled?

> To fix this issue one would have to fix the backend, i.e. not declare
> v4f32 et al. as legal (under a flag). As to making this predicated on fast
> math flags on operations (something like no-denormals; I don’t think we
> have that in the IR yet, we only have no NaNs, no infinities, no signed
> zeros, etc.), I believe this would be a lot harder because I suspect you
> would have to custom lower all the operations.
>
This is one way of solving it, and maybe we will have to implement it
anyway (for hand-coded IR or external front-ends).

However, that still doesn't solve the original issue. When the vectorizer
analyses the cost of the new loop, it takes into account that one v4f32
instruction now performs four operations instead of one, which is clearly
profitable. But if we know that the back-end will serialize it, then it's no
longer profitable, and it can quite possibly hurt performance.
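A toy cost model makes the profitability argument concrete. This is a hypothetical sketch: the cost constants are made-up units, not LLVM's TargetTransformInfo numbers, and the assumption that a scalarized lane costs one extract plus one insert is only illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Made-up cost units for illustration only. */
#define SCALAR_FMUL_COST 1u       /* one VFP VMUL.F32 */
#define VECTOR_FMUL_COST_LEGAL 1u /* one NEON VMUL.F32 on a q register */
#define LANE_MOVE_COST 1u         /* assumed cost of one lane extract or insert */

/* Cost of one vector fmul of `vf` lanes. If the vector type is not legal,
   the backend scalarizes: vf scalar multiplies plus moving each lane out
   of and back into a vector register. */
static unsigned vector_fmul_cost(unsigned vf, bool vector_type_legal) {
    assert(vf > 0);
    if (vector_type_legal)
        return VECTOR_FMUL_COST_LEGAL;
    return vf * SCALAR_FMUL_COST + 2u * vf * LANE_MOVE_COST;
}

/* Vectorizing pays off only if one vector operation is cheaper than the
   vf scalar operations it replaces. */
static bool vectorizing_is_profitable(unsigned vf, bool vector_type_legal) {
    unsigned scalar_cost = vf * SCALAR_FMUL_COST;
    return vector_fmul_cost(vf, vector_type_legal) < scalar_cost;
}
```

With vf = 4, a legal v4f32 costs 1 unit against 4 scalar units, while a scalarized one costs 4 + 8 = 12 units. This is why the cost model needs to know whether the backend will keep v4f32 legal, in addition to the backend fix.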

I think we need both solutions.

cheers,
--renato

Tobias Grosser

2013-Jun-07 20:35 UTC

On 06/07/2013 06:49 AM, Arnold Schwaighofer wrote:
> On Jun 7, 2013, at 3:14 AM, Renato Golin <renato.golin at linaro.org> wrote:
>
>> On 7 June 2013 08:48, Tobias Grosser <tobias at grosser.es> wrote:
>>> When to set which subtarget feature is a policy decision, on which I
>>> honestly don't have any opinion for clang. The best is probably to mirror
>>> the gcc behavior on linux targets.
>>
>> Not really, since GCC has no special behaviour for Darwin, AFAIK.
>>
>> My change will only generate SP-FP on NEON for A5 and A8, and only if it's
>> Darwin or UnsafeMath is on, which seems not to be the case for you, so I
>> don't think the problem is in that area. It's possible that some passes
>> are not consulting that flag when generating NEON SP-FP. If that's true,
>> this is definitely a bug.
>>
>> When I changed that, for VMUL.f32, it worked (i.e. it generated a VFP
>> instruction), but it might not be taking the same path your code is.
>>
>>> I just looked again at the +neonfp flag. Compiling with and without the
>>> +neonfp flag seems to only affect scalar types in the attached test case.
>>> If e.g. the LLVM vectorizer introduces vector instructions on LLVM-IR
>>> level, floating point vectors still yield NEON assembly even if compiled
>>> with "-mattr=+neon,-neonfp". Is this expected?
>>
>> No, vectorizers should honour FP contracts. This is probably a bug, too.
>>
>> Please file both bugs on bugzilla, attaching the relevant IR to each one
>> and a way to reproduce, and I'll have a look at them.
>>
>
> It is not the vectorizer that is the issue; it is the ARM backend that
> currently translates vectorized floating point IR to NEON instructions (it
> should scalarize it if desired to do so, i.e. if people care about
> denormals). To fix this issue one would have to fix the backend, i.e. not
> declare v4f32 et al. as legal (under a flag). As to making this predicated
> on fast math flags on operations (something like no-denormals; I don’t think
> we have that in the IR yet, we only have no NaNs, no infinities, no signed
> zeros, etc.), I believe this would be a lot harder because I suspect you
> would have to custom lower all the operations.

Thanks for that explanation. I think it illustrates the situation well.

For programs that have mixed precision requirements for floating point
operations, we probably need to do this according to the fast math flags.
Until we get there, a good first step would probably be to provide a
global option similar to -enable-no-infs-fp-math that specifies whether
denormals should be allowed or not. This would allow the user to specify
the precision requirements without the need to alter the feature
flags of a specific piece of hardware.
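A minimal sketch of the two-step plan above, assuming a hypothetical global switch (in the spirit of -enable-no-infs-fp-math) plus a hypothetical per-operation "no denormals" flag. `nnan`, `ninf` and `nsz` mirror fast-math flags that exist in LLVM IR; `no_denormals`, `allow_flush_denormals` and the function name are invented for illustration and are not existing LLVM options.

```c
#include <assert.h>
#include <stdbool.h>

/* Per-operation fast-math flags. nnan/ninf/nsz mirror LLVM IR flags;
   no_denormals is hypothetical and does not exist in the IR. */
struct fp_flags {
    bool nnan;
    bool ninf;
    bool nsz;
    bool no_denormals; /* hypothetical per-operation flag */
};

/* Hypothetical global option, analogous to -enable-no-infs-fp-math. */
struct fp_options {
    bool allow_flush_denormals;
};

/* A floating point vector operation may be lowered to (flush-to-zero)
   NEON if the user said, globally or on this operation, that denormals
   do not matter; otherwise the backend would have to scalarize to VFP. */
static bool may_lower_to_neon(const struct fp_options *opts,
                              const struct fp_flags *op) {
    assert(opts && op);
    return opts->allow_flush_denormals || op->no_denormals;
}
```

The global option alone corresponds to the "good first step"; checking `op->no_denormals` corresponds to the later per-operation refinement for programs with mixed precision requirements.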

Tobi
