thr3ads.net - llvm dev - [LLVMdev] NEON vector instructions and the fast math IR flags [Jun 2013]

Renato Golin

2013-Jun-07 16:53 UTC

[LLVMdev] NEON vector instructions and the fast math IR flags

On 7 June 2013 15:41, Arnold Schwaighofer <aschwaighofer at apple.com>
wrote:
> We don’t want to encode backend knowledge into the vectorizer (i.e. don’t
> vectorize type X because the backend does not support it).
>
We already do, via the cost table. This case is no different. It might not
be the best choice, but it is how the cost table has been built over the
last few months.


> The only way to get this result is indirectly via the cost model but the
> backend must still support vectorized IR (it is part of the language) via
> scalarization.
>
Absolutely! There are two problems to solve: increase the cost for SPFP
when UseNEONForSinglePrecisionFP is false, so that vectorizers don't
generate such code, and legalize correctly in the backend, for vector code
that does not respect that flag.


> (You can of course assign UMAX cost for all floating point vector types in
> the cost model for ARM and get the desired result - this won’t solve the
> problem if somebody else writes the vectorized LLVM IR though)
>
I wouldn't use UMAX, since the idea is not to forbid, but to tell how
expensive it is. But it would be a big number, yes. ;)

cheers,
--renato

Arnold Schwaighofer

2013-Jun-07 17:08 UTC

Renato, I think we agree.

On Jun 7, 2013, at 11:53 AM, Renato Golin <renato.golin at linaro.org> wrote:
> On 7 June 2013 15:41, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
> We don’t want to encode backend knowledge into the vectorizer (i.e. don’t
> vectorize type X because the backend does not support it).
>
> We already do, via the cost table. This case is no different. It might not
> be the best choice, but it is how the cost table is being built over the
> last months.
>
Using the cost model to communicate that the backend will generate wrong code is
an abuse (in my opinion, this is not what the cost model is for). This is what I
meant by encoding backend knowledge. Of course, we use the cost model to tell us
how expensive an operation might be, but we should not use it as an indicator of
how wrong it will be ;). (Which is what we would do if we gave a v4f32 operation
a high cost because the backend generates instructions that flush denormals to
zero.)

What I wanted to say is that even if you give v4f32 a high cost you still have
to solve the real problem in the ARM backend.

> The only way to get this result is indirectly via the cost model but the
> backend must still support vectorized IR (it is part of the language) via
> scalarization.
>
> Absolutely! There are two problems to solve: increase the cost for SPFP
> when UseNEONForSinglePrecisionFP is false, so that vectorizers don't
> generate such code, and legalize correctly in the backend, for vector code
> that does not respect that flag.
>
> (You can of course assign UMAX cost for all floating point vector types in
> the cost model for ARM and get the desired result - this won’t solve the
> problem if somebody else writes the vectorized LLVM IR though)
>
> I wouldn't use UMAX, since the idea is not to forbid, but to tell how
> expensive it is. But it would be a big number, yes. ;)

I was referring to the case when you are abusing the cost model to forbid a
vectorized v4f32 IR (which I thought you were proposing).

What I am suggesting is that (if you care about denormals):

* the ARM backend has to be fixed to scalarize floating point vector operations
(behind a flag)
* the ARM target transform model has to correctly reflect that
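
[Editor's sketch] To make the second point concrete, here is a minimal, hypothetical illustration of a cost query that reflects scalarization instead of forbidding vectorization. None of these names are real LLVM APIs: `SubtargetSketch`, `floatVectorOpCost` and the unit costs are assumptions chosen for illustration, and the "two extracts plus one insert per lane" overhead is a simplification.

```cpp
#include <cassert>

// Hypothetical stand-in for a subtarget query; NOT the real LLVM interface.
struct SubtargetSketch {
  bool useNEONForSinglePrecisionFP;
};

// Illustrative unit costs (assumed values, not taken from any real table).
constexpr unsigned kScalarFPOpCost = 1;
constexpr unsigned kVectorFPOpCost = 1;
constexpr unsigned kLaneMoveCost = 1; // one lane extract or one lane insert

// Cost of one binary single-precision FP operation on a vector with
// numElements lanes, e.g. numElements == 4 for v4f32.
unsigned floatVectorOpCost(const SubtargetSketch &st, unsigned numElements) {
  assert(numElements > 0 && "a vector needs at least one lane");
  if (st.useNEONForSinglePrecisionFP)
    return kVectorFPOpCost; // NEON handles the whole vector at once.
  // Scalarized: per lane, extract two operands, run the scalar op,
  // and insert the result back into the vector.
  return numElements * (kScalarFPOpCost + 3 * kLaneMoveCost);
}
```

With these assumed numbers a scalarized v4f32 operation costs 16 rather than 1, so a vectorizer would be steered away from it, while hand-written vector IR would still rely on the backend to scalarize correctly.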

What one could also do (but I don’t think it is a good idea) is to just give
floating point vector operations a max cost. You might run into unforeseen
problems, including that other clients are generating vectorized LLVM IR.

(This makes me wonder whether we clamp the cost computation at TYPE_MAX :)
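
[Editor's sketch] The clamping concern can be illustrated with a small standalone example. It does not show how LLVM actually accumulates costs (the thread leaves that open); `kMaxCost`, `saturatingAdd` and `totalCost` are hypothetical names. The point is only that summing unsigned costs wraps around unless the addition saturates, so a "max" cost plus anything else could otherwise look cheap.

```cpp
#include <cassert>
#include <limits>

// Assumed "TYPE_MAX" for an unsigned cost value.
constexpr unsigned kMaxCost = std::numeric_limits<unsigned>::max();

// Adds two costs, clamping at kMaxCost instead of wrapping around.
constexpr unsigned saturatingAdd(unsigned a, unsigned b) {
  return (a > kMaxCost - b) ? kMaxCost : a + b;
}

// Sums a list of per-instruction costs with saturation.
unsigned totalCost(const unsigned *costs, unsigned count) {
  assert((costs != nullptr || count == 0) && "non-empty list needs storage");
  unsigned total = 0;
  for (unsigned i = 0; i < count; ++i)
    total = saturatingAdd(total, costs[i]);
  return total;
}
```

Plain unsigned addition of `kMaxCost` and 1 wraps to 0, which would make the most expensive operation appear free; the saturating version keeps the total pinned at `kMaxCost`.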
> 
> cheers,
> --renato

Renato Golin

2013-Jun-07 17:24 UTC

On 7 June 2013 18:08, Arnold Schwaighofer <aschwaighofer at apple.com>
wrote:
> What I am suggesting is that (if you care about denormals):
>
> * the arm backend has to be fixed to scalarize floating point vector
> operations (behind a flag)
> * the arm target transform model has to correctly reflect that
>
Yup. What I had in mind, too. This is why I asked Tobi to create two bugs,
and we would fix them accordingly. ;)

cheers,
--renato