thr3ads.net - llvm dev - [LLVMdev] NEON vector instructions and the fast math IR flags [Jun 2013]

Renato Golin

2013-Jun-07 14:22 UTC

[LLVMdev] NEON vector instructions and the fast math IR flags

On 7 June 2013 14:49, Arnold Schwaighofer <aschwaighofer at apple.com>
wrote:
> It is not the vectorizer that is the issue, it is the ARM backend that
> currently translates vectorized floating point IR to NEON instructions (it
> should scalarize it if desired to do so - i.e. if people care about
> denormals).
>
Hi Arnold,

Couldn't the vectorizer simply avoid generating the v4f32 vectors in the first
place when that flag is disabled?

> To fix this issue one would have to fix the backend: i.e. not declare v4f32
> et al as legal (under a flag). As to making this predicated on fast math
> flags on operations (something like no-denormals - i don’t think we have
> that in the IR yet - we only have no nan, no infinite, no signed zeros,
> etc) I believe this would be a lot harder because I suspect you would have
> to custom lower all the operations.
>
This is one way of solving it, and maybe we will have to implement it
anyway (for hand-coded IR or external front-ends).

However, that still doesn't solve the original issue. When the vectorizer
analyses the cost of the new loop, it assumes that you now perform four
operations at once (v4f32) instead of one, which is clearly profitable. But if
we know that the back-end will serialize them, then it is no longer
profitable, and can quite possibly hurt performance.

I think we need both solutions.

cheers,
--renato

Arnold Schwaighofer

2013-Jun-07 14:41 UTC

On Jun 7, 2013, at 9:22 AM, Renato Golin <renato.golin at linaro.org>
wrote:
> On 7 June 2013 14:49, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
> > It is not the vectorizer that is the issue, it is the ARM backend that
> > currently translates vectorized floating point IR to NEON instructions (it
> > should scalarize it if desired to do so - i.e. if people care about
> > denormals).
>
> Hi Arnold,
>
> Can't the vectorizer not generate the v4f32 vectors in the first place,
> with that flag disabled?

No, vectorized floating point IR and non-vectorized floating point IR are
semantically the same with respect to the end result. It is the backend that has
to make sure that this is the case (by scalarizing if desired). The vectorizer
is not the only component that can produce vectorized IR.
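
To make the denormal concern concrete, here is a minimal sketch in plain C++
(not LLVM code). It assumes the usual ARMv7 behaviour, where NEON floating
point arithmetic flushes denormals to zero while scalar VFP code can keep
them. The type and function names (V4F32, flushToZeroVectorAdd, scalarizedAdd
and so on) are hypothetical and only model the two lowerings.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>

// Hypothetical model of a 4 x float vector (v4f32); not an LLVM type.
using V4F32 = std::array<float, 4>;

// Replace a subnormal value with a zero of the same sign, mimicking
// flush-to-zero arithmetic.
inline float flushDenormal(float X) {
  return std::fpclassify(X) == FP_SUBNORMAL ? std::copysign(0.0f, X) : X;
}

// Toy stand-in for a flush-to-zero vector add: inputs and results are flushed.
inline V4F32 flushToZeroVectorAdd(const V4F32 &A, const V4F32 &B) {
  V4F32 R{};
  for (std::size_t I = 0; I < R.size(); ++I)
    R[I] = flushDenormal(flushDenormal(A[I]) + flushDenormal(B[I]));
  return R;
}

// Toy stand-in for the scalarized lowering: one ordinary add per lane.
inline V4F32 scalarizedAdd(const V4F32 &A, const V4F32 &B) {
  V4F32 R{};
  for (std::size_t I = 0; I < R.size(); ++I)
    R[I] = A[I] + B[I];
  return R;
}

// Usage example: the two lowerings agree on normal lanes and differ only on
// the lane that holds a denormal.
inline void checkDenormalLaneDiffers() {
  const float D = std::numeric_limits<float>::denorm_min();
  const V4F32 A{D, 1.0f, 2.0f, 3.0f};
  const V4F32 B{0.0f, 1.0f, 2.0f, 3.0f};
  assert(scalarizedAdd(A, B)[0] == D);           // denormal preserved
  assert(flushToZeroVectorAdd(A, B)[0] == 0.0f); // denormal flushed
  assert(scalarizedAdd(A, B)[1] == flushToZeroVectorAdd(A, B)[1]);
}
```

This assumes the host itself is not compiled in a flush-to-zero mode (for
example with fast-math options), otherwise both functions behave the same.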

The vectorizer has two parts: legality and cost. It is legal to generate LLVM IR
with vectors because they are semantically the same. The cost model should
inform the vectorizer that it is a bad idea on ARM (after the backend has been
fixed) because it will be scalarized (dependent on flags).
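
A rough sketch of that split between legality and cost, in plain C++. This is
not LLVM's cost model API: the struct, the function names and the way the
scalarization overhead is counted (two lane extracts and one lane insert per
scalar operation) are assumptions made for illustration only.

```cpp
#include <cassert>

// All names here are hypothetical; this is not LLVM's cost model interface.
struct OpCosts {
  unsigned ScalarOpCost; // one scalar float operation
  unsigned VectorOpCost; // one vector operation that stays a vector
  unsigned LaneMoveCost; // one lane extract or insert
};

// Legality: vector IR is always legal to emit, because the backend must
// lower it one way or another (natively or by scalarizing).
inline bool isLegalToVectorize() { return true; }

// Cost of running VF iterations of a scalar loop body with NumOps operations.
inline unsigned scalarLoopCost(const OpCosts &C, unsigned NumOps, unsigned VF) {
  assert(VF > 0 && "vectorization factor must be positive");
  return C.ScalarOpCost * NumOps * VF;
}

// Cost of one iteration of the vector loop body covering the same VF lanes.
inline unsigned vectorLoopCost(const OpCosts &C, unsigned NumOps, unsigned VF,
                               bool BackendScalarizes) {
  assert(VF > 0 && "vectorization factor must be positive");
  if (!BackendScalarizes)
    return C.VectorOpCost * NumOps;
  // Assumed overhead per lane: extract both operands, do the scalar
  // operation, insert the result back into the vector.
  return NumOps * VF * (C.ScalarOpCost + 3 * C.LaneMoveCost);
}

// The vectorizer only asks two questions: is it legal, and is it cheaper?
inline bool shouldVectorize(const OpCosts &C, unsigned NumOps, unsigned VF,
                            bool BackendScalarizes) {
  return isLegalToVectorize() &&
         vectorLoopCost(C, NumOps, VF, BackendScalarizes) <
             scalarLoopCost(C, NumOps, VF);
}
```

With unit costs, one operation and VF = 4, the vector loop costs 1 against 4
when the backend keeps the vector, and 16 against 4 when it scalarizes, so the
same legal transformation flips from profitable to unprofitable.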

(I took the liberty of calling vectorized IR and scalar IR semantically the
same; of course, this only applies if you look at the execution as a whole, not
at the individual instructions.)


We don’t want to encode backend knowledge into the vectorizer (i.e. don’t
vectorize type X because the backend does not support it). The only way to get
this result is indirectly via the cost model but the backend must still support
vectorized IR (it is part of the language) via scalarization.

(You can of course assign UMAX cost to all floating point vector types in the
cost model for ARM and get the desired result. This won’t solve the problem if
somebody else writes the vectorized LLVM IR, though.)

> 
> 
> > To fix this issue one would have to fix the backend: i.e. not declare v4f32
> > et al as legal (under a flag). As to making this predicated on fast math
> > flags on operations (something like no-denormals - i don’t think we have
> > that in the IR yet - we only have no nan, no infinite, no signed zeros,
> > etc) I believe this would be a lot harder because I suspect you would have
> > to custom lower all the operations.
>
> This is one way of solving it, and maybe we will have to implement it
> anyway (for hand-coded IR or external front-ends).
>
> However, that still doesn't solve the original issue. When the vectorizer
> analyses the cost of the new loop, it takes into account that now you have
> four operations (v4f32) instead of one, which is clearly profitable, but if
> we know that the back-end will serialize, then it's no longer profitable,
> and can quite possibly hurt performance.
> 
> I think we need both solutions.
> 
> cheers,
> --renato

Renato Golin

2013-Jun-07 16:53 UTC

On 7 June 2013 15:41, Arnold Schwaighofer <aschwaighofer at apple.com>
wrote:
> We don’t want to encode backend knowledge into the vectorizer (i.e. don’t
> vectorize type X because the backend does not support it).
>
We already do, via the cost table. This case is no different. It might not
be the best choice, but it is how the cost table has been built over the
last few months.


> The only way to get this result is indirectly via the cost model but the
> backend must still support vectorized IR (it is part of the language) via
> scalarization.
>
Absolutely! There are two problems to solve: increase the cost for SPFP
when UseNEONForSinglePrecisionFP is false, so that vectorizers don't
generate such code, and legalize correctly in the backend, for vector code
that does not respect that flag.


> (You can of course assign UMAX cost for all floating point vector types in
> the cost model for ARM and get the desired result - this won’t solve the
> problem if somebody else writes the vectorize LLVM IR though)
>
I wouldn't use UMAX, since the idea is not to forbid, but to tell how
expensive it is. But it would be a big number, yes. ;)
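
As a sketch of what a flag-dependent cost with "a big number" rather than UMAX
could look like, in plain C++: only the flag name UseNEONForSinglePrecisionFP
comes from this thread. The ToySubtarget struct, the function name and the
penalty value of 10 per lane are made up for illustration and are not taken
from the ARM backend.

```cpp
#include <cassert>
#include <limits>

// Hypothetical stand-in for the ARM subtarget; only the flag name is real.
struct ToySubtarget {
  bool UseNEONForSinglePrecisionFP;
};

// "UMAX": a cost that effectively forbids the transformation.
constexpr unsigned ForbiddenCost = std::numeric_limits<unsigned>::max();

// Assumed per-lane penalty when the operation gets scalarized. The value is
// arbitrary; it only needs to be large but still comparable to other costs.
constexpr unsigned ScalarizedLanePenalty = 10;

// Cost of one single-precision vector arithmetic operation with NumLanes
// lanes (4 for v4f32).
inline unsigned singlePrecisionVectorCost(const ToySubtarget &ST,
                                          unsigned NumLanes) {
  assert(NumLanes > 0 && "a vector needs at least one lane");
  if (ST.UseNEONForSinglePrecisionFP)
    return 1; // stays a single NEON instruction
  // Expensive, but finite: the cost still says *how* bad it is, so the
  // vectorizer can weigh it against the rest of the loop.
  return NumLanes * ScalarizedLanePenalty;
}
```

For v4f32 this gives a cost of 1 with the flag set and 40 without it, which
stays well below ForbiddenCost and can still be summed with other costs
without overflowing.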

cheers,
--renato
