Hal Finkel via llvm-dev
2016-Feb-09 03:48 UTC
[llvm-dev] Vectorization with fast-math on irregular ISA sub-sets
----- Original Message -----
> From: "James Molloy" <James.Molloy at arm.com>
> To: "Renato Golin" <renato.golin at linaro.org>
> Cc: "Nadav Rotem" <nrotem at apple.com>, "Arnold Schwaighofer" <aschwaighofer at apple.com>, "Hal Finkel" <hfinkel at anl.gov>, "LLVM Dev" <llvm-dev at lists.llvm.org>, "nd" <nd at arm.com>
> Sent: Monday, February 8, 2016 3:35:26 PM
> Subject: Re: Vectorization with fast-math on irregular ISA sub-sets
>
> The conditions in which the LV kicks in are different for FP and
> integer loops. The LV always kicks in for non-FP loops AFAIK

Yes, and generically speaking, it does for FP loops as well (except, as has been noted, when there are FP reductions).

It seems like we need two things here:

1. Use our backend fast-math flags during instruction selection to scalarize vector instructions that don't have the right allowances (on targets where that's necessary)
2. Update the TTI cost model interfaces to take fast-math flags so that all vectorizers can make appropriate decisions

 -Hal

> > On 8 Feb 2016, at 20:51, Renato Golin <renato.golin at linaro.org> wrote:
> >
> > On 8 February 2016 at 19:25, James Molloy <James.Molloy at arm.com> wrote:
> >>> For 16275, the fix is to disable loop vect. for no-fast-math +
> >>> hasUnsafeAlgebra.
> >>
> >> Do you think there is a set of people that care about IEEE
> >> accuracy in so far that they don't want FTZ, but *are* happy to
> >> reassociate FP operations? That seems fairly niche to me?
> >
> > No. But I also don't want to disable the vectorizer for integer
> > arithmetic. I'm guessing hasUnsafeAlgebra is not just for FZ but also
> > NaNs and Infs, so disabling the vectorization of loops that have any
> > of those unless safe-math is chosen seems simple enough to me.
> >
> > cheers,
> > --renato

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
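For concreteness, here is a minimal C++ sketch of what (2) could look like. The helper and its FastMathFlags parameter are hypothetical; the existing TTI cost hooks do not take fast-math flags, which is exactly the gap being described. The point is only that the cost query would need the flags in order to price vector FP operations differently on targets whose vector units are not IEEE-compliant.

// Hypothetical sketch only; not the current TargetTransformInfo interface.
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/Instruction.h"
#include "llvm/IR/Operator.h"

using namespace llvm;

unsigned getFAddCostWithFMF(const TargetTransformInfo &TTI, Type *VecTy,
                            FastMathFlags FMF) {
  // Imagine this being the ARM override: if the vector FP unit is not
  // IEEE-compliant (e.g. NEON flushing denormals) and the flags don't
  // permit the relaxed behaviour, report a prohibitive cost so the
  // vectorizer stays on scalar VFP.
  if (!FMF.unsafeAlgebra())
    return 100; // illustrative "don't vectorize this" cost
  return TTI.getArithmeticInstrCost(Instruction::FAdd, VecTy);
}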
Renato Golin via llvm-dev
2016-Feb-09 09:38 UTC
[llvm-dev] Vectorization with fast-math on irregular ISA sub-sets
On 9 February 2016 at 03:48, Hal Finkel <hfinkel at anl.gov> wrote:
> Yes, and generically speaking, it does for FP loops as well (except, as has been noted, when there are FP reductions).

Right, and I think that's the problem, since a series of FP inductions
could converge to a different value in NEON or VFP, basically acting
like an n-wise reduction. Since we can't (yet?) prove there isn't a
series of operations with the same data, we have to treat them as
unsafe for non-IEEE FP operations.

> It seems like we need two things here:
>
> 1. Use our backend fast-math flags during instruction selection to scalarize vector instructions that don't have the right allowances (on targets where that's necessary)
> 2. Update the TTI cost model interfaces to take fast-math flags so that all vectorizers can make appropriate decisions

I think this is exactly the opposite of what James is saying, and I
have to agree with him, since this would scalarise everything.

If the scalarisation is in IR, then any NEON intrinsic in C code will
get wrongly scalarised. Builtins can be lowered to either generic IR
operations or target intrinsics, and the back-end has no way of knowing
the origin.

If the scalarisation is lower down, then we risk also changing inline
ASM snippets, which is even worse.

James' idea on this one is to have an additional flag to *enable* such
scalarisation when the user really cares about it, which I also think
is a better idea than making that the default behaviour.

cheers,
--renato
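The IR-level concern can be seen with a two-line example (hedged: the exact IR depends on the clang version, but the basic arithmetic NEON intrinsics such as vaddq_f32 are lowered to generic vector IR rather than to target intrinsics):

#include <arm_neon.h>

// Clang lowers vaddq_f32 to a plain 'fadd <4 x float>' in IR, the same
// form the loop vectorizer produces, so an IR-level scalarisation pass
// could not tell that the user explicitly wrote a NEON operation here.
float32x4_t accumulate(float32x4_t acc, float32x4_t x) {
  return vaddq_f32(acc, x);
}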
Hal Finkel via llvm-dev
2016-Feb-09 20:29 UTC
[llvm-dev] Vectorization with fast-math on irregular ISA sub-sets
----- Original Message -----
> From: "Renato Golin" <renato.golin at linaro.org>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "James Molloy" <James.Molloy at arm.com>, "Nadav Rotem" <nrotem at apple.com>, "Arnold Schwaighofer" <aschwaighofer at apple.com>, "LLVM Dev" <llvm-dev at lists.llvm.org>, "nd" <nd at arm.com>
> Sent: Tuesday, February 9, 2016 3:38:20 AM
> Subject: Re: Vectorization with fast-math on irregular ISA sub-sets
>
> On 9 February 2016 at 03:48, Hal Finkel <hfinkel at anl.gov> wrote:
> > Yes, and generically speaking, it does for FP loops as well
> > (except, as has been noted, when there are FP reductions).
>
> Right, and I think that's the problem, since a series of FP inductions
> could converge to a different value in NEON or VFP, basically acting
> like an n-wise reduction. Since we can't (yet?) prove there isn't a
> series of operations with the same data, we have to treat them as
> unsafe for non-IEEE FP operations.
>
> > It seems like we need two things here:
> >
> > 1. Use our backend fast-math flags during instruction selection to
> > scalarize vector instructions that don't have the right
> > allowances (on targets where that's necessary)
> > 2. Update the TTI cost model interfaces to take fast-math flags so
> > that all vectorizers can make appropriate decisions
>
> I think this is exactly the opposite of what James is saying, and I
> have to agree with him, since this would scalarise everything.

No, it just means that the intrinsics need to set the appropriate fast-math flags on the instructions generated. This might require some frontend enablement work; so be it. There might be a slight issue with legacy IR bitcode, but if that's going to be a problem in practice, we can design some scheme to let auto-upgrade do the right thing.

> If the scalarisation is in IR, then any NEON intrinsic in C code will
> get wrongly scalarised. Builtins can be lowered to either generic IR
> operations or target intrinsics, and the back-end has no way of knowing
> the origin.
>
> If the scalarisation is lower down, then we risk also changing inline
> ASM snippets, which is even worse.

Yes, but we don't do that, so that's not a practical concern.

> James' idea on this one is to have an additional flag to *enable* such
> scalarisation when the user really cares about it, which I also think
> is a better idea than making that the default behaviour.

The --stop-pretending-to-be-IEEE-compliant-when-not-really flag? ;)

I don't think that's a good idea. To be fair, our IR language reference does not actually say that our floating-point arithmetic is IEEE compliant, but it is implied, and frontends depend on this fact. We really should not change the IR floating-point semantics contract over this. It might require some user education, but that's much better than producing subtly-wrong results.

We have a pass-feedback mechanism; I think it would be very useful if compiling with -Rpass-missed=loop-vectorize and/or -Rpass-analysis=loop-vectorize helpfully informed users that compiling with -ffast-math, or with -ffinite-math-only and -fno-signed-zeros, would allow the loop to be vectorized for the targeted hardware.

 -Hal

> cheers,
> --renato

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
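A rough sketch of the "intrinsics set the appropriate fast-math flags on the instructions generated" direction, using IRBuilder. The specific flags chosen below are illustrative placeholders, not a statement of what NEON semantics actually require; which flags are appropriate is exactly the policy question in this thread.

#include "llvm/IR/IRBuilder.h"

using namespace llvm;

// Hedged sketch: a frontend lowering a NEON arithmetic intrinsic could tag
// the generated IR with fast-math flags describing the relaxed behaviour it
// accepts, so later passes (cost model, scalarisation) can distinguish it
// from strict IEEE code.
Value *emitRelaxedFAdd(IRBuilder<> &Builder, Value *LHS, Value *RHS) {
  FastMathFlags FMF;
  FMF.setNoNaNs(); // placeholder allowances
  FMF.setNoInfs();
  Builder.setFastMathFlags(FMF);
  Value *Sum = Builder.CreateFAdd(LHS, RHS, "relaxed.add");
  Builder.clearFastMathFlags();
  return Sum;
}

On the diagnostics side, -Rpass-missed=loop-vectorize and -Rpass-analysis=loop-vectorize already exist as clang flags; the suggestion above is only that the emitted remarks name the specific -ffast-math / -ffinite-math-only / -fno-signed-zeros options that would let the loop vectorize on the targeted hardware.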