thr3ads.net - similar to: "Vectorization with fast-math on irregular ISA sub-sets"

Displaying 20 results from an estimated 8000 matches similar to: "Vectorization with fast-math on irregular ISA sub-sets"

Vectorization with fast-math on irregular ISA sub-sets

2016 Feb 08

Vectorization with fast-math on irregular ISA sub-sets

On 8 February 2016 at 16:33, James Molloy <James.Molloy at arm.com> wrote: > The loop vectorizer does indeed require -ffast-math, but the IEEE-nonconformant transforms it does are far greater than using an ISA which may FTZ. It needs -ffast-math because any FP reductions necessarily have their execution order shuffled, due to executing some of them in parallel and reducing to scalar at
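The reordering described in this snippet can be illustrated with a small scalar model. This is only a sketch, not code from the thread: `sum_sequential` and `sum_four_lanes` are hypothetical names, and the four-lane layout is an assumption chosen to resemble a 4 x f32 vector register. It shows why a vectorized FP reduction is not bit-identical to the strictly ordered scalar loop.

```c
#include <stddef.h>

/* Strict left-to-right sum: the evaluation order the source loop specifies
   when reassociation is not permitted. */
float sum_sequential(const float *data, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += data[i];
    return acc;
}

/* Scalar model (an assumption for illustration) of a 4-lane vectorized
   reduction: four independent partial sums, combined to a scalar at the end. */
float sum_four_lanes(const float *data, size_t n) {
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    size_t i = 0;
    /* Main "vector" body: element i + k is accumulated into lane k. */
    for (; i + 4 <= n; i += 4)
        for (size_t k = 0; k < 4; ++k)
            lane[k] += data[i + k];
    /* Horizontal reduction of the four lanes to one scalar. */
    float acc = (lane[0] + lane[1]) + (lane[2] + lane[3]);
    /* Scalar tail for lengths that are not a multiple of four. */
    for (; i < n; ++i)
        acc += data[i];
    return acc;
}
```

For the input {1e8f, 1.0f, 1.0f, 1.0f, -1e8f, 1.0f, 1.0f, 1.0f}, the sequential loop loses the first three 1.0f addends to rounding against 1e8f, while the lane-wise version cancels 1e8f against -1e8f inside lane 0 and keeps every 1.0f. On a target that evaluates float arithmetic in single precision and is compiled without fast-math, the two functions should therefore return different values.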

Vectorization with fast-math on irregular ISA sub-sets

2016 Feb 15

Vectorization with fast-math on irregular ISA sub-sets

Hi, > James, is that a correct assessment? Yes, it is also my belief that the only way ARMv7 NEON differs from IEEE754 is lack of denormal support. James > On 11 Feb 2016, at 10:53, Renato Golin <renato.golin at linaro.org> wrote: > > Hal, > > I had a read on the ARM ARM about VFP and SIMD FP semantics and my > analysis is that NEON's only problem is the
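The denormal difference mentioned in this snippet can be modelled in portable C. This is a hedged sketch rather than anything from the thread: `flush_to_zero`, `halve_ieee` and `halve_ftz` are hypothetical helpers that imitate flush-to-zero behaviour in software, assuming the host itself runs with IEEE 754 gradual underflow enabled.

```c
#include <float.h>
#include <math.h>

/* Software model of flush-to-zero: any subnormal value is replaced by a
   zero of the same sign; every other value passes through unchanged. */
float flush_to_zero(float x) {
    if (fpclassify(x) == FP_SUBNORMAL)
        return (x < 0.0f) ? -0.0f : 0.0f;
    return x;
}

/* IEEE 754 behaviour: halving the smallest normal float gives a subnormal. */
float halve_ieee(float x) {
    return x * 0.5f;
}

/* Modelled FTZ behaviour: subnormal inputs and results are flushed to zero. */
float halve_ftz(float x) {
    return flush_to_zero(flush_to_zero(x) * 0.5f);
}
```

With these helpers, `halve_ieee(FLT_MIN)` yields a nonzero subnormal on an IEEE-conformant host, whereas `halve_ftz(FLT_MIN)` yields zero. A vectorized loop running on a flush-to-zero unit could diverge from its scalar counterpart in this way even when no operations are reassociated.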

Vectorization with fast-math on irregular ISA sub-sets

2016 Feb 09

Vectorization with fast-math on irregular ISA sub-sets

----- Original Message ----- > From: "Renato Golin" <renato.golin at linaro.org> > To: "Hal Finkel" <hfinkel at anl.gov> > Cc: "James Molloy" <James.Molloy at arm.com>, "Nadav Rotem" <nrotem at apple.com>, "Arnold Schwaighofer" > <aschwaighofer at apple.com>, "LLVM Dev" <llvm-dev at

Vectorization with fast-math on irregular ISA sub-sets

2016 Feb 09

Vectorization with fast-math on irregular ISA sub-sets

----- Original Message ----- > From: "James Molloy" <James.Molloy at arm.com> > To: "Renato Golin" <renato.golin at linaro.org> > Cc: "Nadav Rotem" <nrotem at apple.com>, "Arnold Schwaighofer" <aschwaighofer at apple.com>, "Hal Finkel" > <hfinkel at anl.gov>, "LLVM Dev" <llvm-dev at

Vectorization with fast-math on irregular ISA sub-sets

2016 Feb 11

Vectorization with fast-math on irregular ISA sub-sets

2016 Feb 11

Vectorization with fast-math on irregular ISA sub-sets

Our processor also has some issues regarding the handling of denormals - scalar and vector - and we ran into a related problem only a few days ago. The v3.8 compiler has done a lot of good work on optimisations for floating-point math, but ironically one of them broke our implementation of 'nextafterf'. The desired code fragment (FP32) is: float xAbs = fabsf(x); since we know our

NEON FP flags

2016 Mar 29

NEON FP flags

On Fri, Mar 25, 2016 at 01:23:03PM +0000, Renato Golin via llvm-dev wrote: > On 25 March 2016 at 04:11, Hal Finkel <hfinkel at anl.gov> wrote: > > As I understand it, the fundamental property being addressed here is: Are > > the semantics of scalar FP math the same as vector FP math? TTI seems like > > a good place to expose that information. If the semantics are indeed

NEON FP flags

2016 Mar 25

NEON FP flags

On 25 March 2016 at 04:11, Hal Finkel <hfinkel at anl.gov> wrote: > As I understand it, the fundamental property being addressed here is: Are the semantics of scalar FP math the same as vector FP math? TTI seems like a good place to expose that information. If the semantics are indeed different, then the vectorizer would require fast-math flags in order to vectorize FP operations

[LLVMdev] LLVM ARM VMLA instruction

2013 Dec 19

[LLVMdev] LLVM ARM VMLA instruction

Hi Tim, > > cortex-a15 vfpv4 : vmla instruction emitted (which is a NEON instruction) > > I get a VFP vmla here rather than a NEON one (clang -target > armv7-linux-gnueabihf -mcpu=cortex-a15): "vmla.f32 s0, s1, s2". Are > you seeing something different? > As per Renato's comment above, the vmla instruction is a NEON instruction while vfma is a VFP instruction. Correct

[LLVMdev] LLVM ARM VMLA instruction

2013 Dec 19

[LLVMdev] LLVM ARM VMLA instruction

Hi all, Thanks for the info. A few observations from my side: LLVM: cortex-a8 vfpv3 : no vmla or vfma instruction emitted cortex-a8 vfpv4 : no vmla or vfma instruction emitted (This is invalid though as cortex-a8 does not have vfpv4) cortex-a8 vfpv4 with ffp-contract=fast : vfma instruction emitted ( this seems a bug to me!! If cortex-a8 doesn't come with vfpv4 then vfma instructions

[LLVMdev] LLVM ARM VMLA instruction

2013 Dec 19

[LLVMdev] LLVM ARM VMLA instruction

On 19 December 2013 08:50, suyog sarda <sardask01 at gmail.com> wrote: > It may seem that the total number of cycles is more or less the same for a single > vmla and vmul+vadd. However, when the vmul+vadd combination is used instead of > vmla, then intermediate results will be generated which need to be stored > in memory for future access. This will lead to a lot of load/store ops being >

[LLVMdev] Help adding the Bullet physics sdk benchmark to the LLVM test suite?

2009 Dec 16

[LLVMdev] Help adding the Bullet physics sdk benchmark to the LLVM test suite?

The linux builds are not using SSE right now, but the vector data is 16-byte aligned on all platforms. So if you port this SSE code to another platform (Linux, Altivec, NEON), you could contribute it back to Bullet? The most interesting SSE part is the innerloop of the constraint solver: http://tinyurl.com/ydoapct Some developers replaced some linear algebra functions (in Bullet/LinearMath) with

[LLVMdev] LLVM ARM VMLA instruction

2013 Dec 19

[LLVMdev] LLVM ARM VMLA instruction

> cortex-a8 vfpv4 with ffp-contract=fast : vfma instruction emitted ( this > seems a bug to me!! If cortex-a8 doesn't come with vfpv4 then vfma > instructions generated will be invalid ) If I'm understanding correctly, you've specifically told it this Cortex-A8 *does* come with vfpv4. Those kinds of odd combinations can be useful sometimes (if only for tests), so I'm not

Vectorization with fast-math on irregular ISA sub-sets

2016 Feb 08

Vectorization with fast-math on irregular ISA sub-sets

On 8 February 2016 at 19:25, James Molloy <James.Molloy at arm.com> wrote: >> For 16275, the fix is to disable loop vect. for no-fast-math + hasUnsafeAlgebra. > > Do you think there is a set of people that care about IEEE accuracy in so far that they don't want FTZ, but *are* happy to reassociate FP operations? That seems fairly niche to me? No. But I also don't want to

[LLVMdev] LLVM ARM VMLA instruction

2013 Dec 18

[LLVMdev] LLVM ARM VMLA instruction

On 18 December 2013 12:31, Tim Northover <t.p.northover at gmail.com> wrote: > That's what I thought! But we do seem to generate vfma on Cortex-A9. > Wonder if that's a bug, or Cortex-A9 is "VFPv3, but chuck in vfma > too"? > Hi Tim, I believe that's the NEON VMLA, not the VFP one. There was a discussion in the past about not using NEON and VFP

[LLVMdev] NEON vector instructions and the fast math IR flags

2013 Jun 07

[LLVMdev] NEON vector instructions and the fast math IR flags

>> Darwin uses NEON for floating point, but does *not* (and should not) >> globally enable fast math flags. Use of NEON for FP needs to remain >> achievable without globally setting the fast math flags. Fast math may >> reasonably imply NEON, but the opposite direction is not accurate. | Good point. Fast math is probably too tough a requirement. I need to | look

[LLVMdev] NEON vector instructions and the fast math IR flags

2013 Jun 07

[LLVMdev] NEON vector instructions and the fast math IR flags

> |I just looked again at the +neonfp flag. Compiling with and without > |+neonfp flag seems to only affect scalar types in the attached test > |case. If e.g. the LLVM vectorizer introduces vector instructions on > |LLVM-IR level floating point vectors still yield NEON assembly even if > |compiled with "-mattr=+neon,-neonfp". Is this expected? > > I'm virtually

[LLVMdev] Question about ARM/vfp/NEON code generation

2011 May 27

[LLVMdev] Question about ARM/vfp/NEON code generation

I have a code generation question for ARM with VFP and NEON. I am generating code for the following function as a test:

    void FloatingPointTest(float f1, float f2, float f3)
    {
        float f4 = f1 * f2;
        if (f4 > f3)
            printf("%f\n",f2);
        else
            printf("%f\n",f3);
    }

I have tried compiling with:
1. -mfloat-abi=softfp and -mfpu=neon
2.

[LLVMdev] LLVM ARM VMLA instruction

2013 Dec 19

[LLVMdev] LLVM ARM VMLA instruction

> As per Renato's comment above, the vmla instruction is a NEON instruction while vfma is a VFP instruction. Correct me if I am wrong on this. My version of the ARM architecture reference manual (v7 A & R) lists versions requiring NEON and versions requiring VFP (Section A8.8.337), split in just the way you'd expect (SIMD variants need NEON). > It may seem that total number of cycles are

[LLVMdev] Help adding the Bullet physics sdk benchmark to the LLVM test suite?

2009 Dec 16

[LLVMdev] Help adding the Bullet physics sdk benchmark to the LLVM test suite?

Hello, Erwin > Although most of this is plain portable C++ perhaps LLVM can auto-vectorize > some of this? Well, I doubt it, unfortunately - LLVM does not have any autopar these days > There is a little bit of hand optimized x86 SSE code. This is only enabled > on 32bit Windows and Mac OSX Intel builds. Ok. What about Linux builds? Are there any other implementations e.g.
