thr3ads.net - search: "neons"

Displaying 20 results from an estimated 1124 matches for "neons".

NEON FP flags

2016 Mar 25

On 25 March 2016 at 04:11, Hal Finkel <hfinkel at anl.gov> wrote:
> As I understand it, the fundamental property being addressed here is: Are the semantics of scalar FP math the same as vector FP math? TTI seems like a good place to expose that information. If the semantics are indeed different, then the vectorizer would require fast-math flags in order to vectorize FP operations
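
One concrete way scalar and vector FP semantics differ on ARMv7: NEON (Advanced SIMD) single-precision arithmetic always runs in flush-to-zero mode, while scalar VFP code can keep IEEE-754 subnormals. The C sketch below emulates that difference in software. `ftz`, `mul_ieee` and `mul_neon_ftz` are hypothetical illustration helpers, not LLVM, TTI or ARM APIs, and the emulation models only the flush-to-zero behaviour (not NEON's default-NaN or fixed rounding-mode rules).

```c
#include <math.h>

/* Hypothetical helper: emulate flush-to-zero (FTZ) by replacing a
   subnormal value with a zero of the same sign. */
static float ftz(float x) {
    return fpclassify(x) == FP_SUBNORMAL ? copysignf(0.0f, x) : x;
}

/* Scalar IEEE-754 multiply: subnormal results are preserved. */
float mul_ieee(float a, float b) {
    return a * b;
}

/* Emulated ARMv7 NEON lane multiply (assumption: FTZ only).
   Subnormal inputs are flushed before the multiply, and a subnormal
   product is flushed afterwards. */
float mul_neon_ftz(float a, float b) {
    return ftz(ftz(a) * ftz(b));
}
```

With inputs 1e-30f and 1e-10f, the IEEE product (about 1e-40) is subnormal and survives, while the emulated NEON lane returns 0.0f. That is the kind of result change a vectorizer would need permission (fast-math flags or a TTI hint) to introduce.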

NEON FP flags

2016 Mar 29

On Fri, Mar 25, 2016 at 01:23:03PM +0000, Renato Golin via llvm-dev wrote:
> On 25 March 2016 at 04:11, Hal Finkel <hfinkel at anl.gov> wrote:
> > As I understand it, the fundamental property being addressed here is: Are
> > the semantics of scalar FP math the same as vector FP math? TTI seems like
> > a good place to expose that information. If the semantics are indeed

[RFC PATCHv1] cover: celt_pitch_xcorr: Introduce ARM neon intrinsics

2014 Nov 25

On Nov 25, 2014, at 10:07 AM, Viswanath Puttagunta <viswanath.puttagunta at linaro.org> wrote:
>
> > Also, are there plans to make the NEON optimisations on ARMv7 run-time
> > detectable like they have in cairo/pixman? For generic distributions
> > it would be nice to be able to enable them, as they offer
> > decent performance improvements but have the code
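
On Linux, one common way to make NEON run-time detectable is to read the kernel's hardware-capability bits with getauxval(AT_HWCAP) and choose a kernel through a function pointer. The sketch below assumes 32-bit ARM Linux with glibc 2.16 or later for the detection path. The float-only signature and the names pitch_xcorr_c, cpu_has_neon and select_pitch_xcorr are hypothetical simplifications, loosely modelled on the cross-correlation that celt_pitch_xcorr computes; they are not Opus's actual API.

```c
#include <stddef.h>

#if defined(__arm__) && defined(__linux__)
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP */
#include <asm/hwcap.h>  /* HWCAP_NEON on 32-bit ARM Linux */
#endif

/* Hypothetical float-only kernel signature. */
typedef void (*xcorr_fn)(const float *x, const float *y, float *xcorr,
                         int len, int max_pitch);

/* Portable C reference: xcorr[i] = sum over j of x[j] * y[i + j].
   y must hold at least len + max_pitch - 1 elements. */
static void pitch_xcorr_c(const float *x, const float *y, float *xcorr,
                          int len, int max_pitch) {
    for (int i = 0; i < max_pitch; i++) {
        float sum = 0.0f;
        for (int j = 0; j < len; j++)
            sum += x[j] * y[i + j];
        xcorr[i] = sum;
    }
}

/* Nonzero if the running CPU reports NEON; always 0 off ARM Linux. */
static int cpu_has_neon(void) {
#if defined(__arm__) && defined(__linux__) && defined(HWCAP_NEON)
    return (getauxval(AT_HWCAP) & HWCAP_NEON) != 0;
#else
    return 0;
#endif
}

/* Pick an implementation once at start-up. A real build would return a
   NEON intrinsics version (compiled with -mfpu=neon) in the first
   branch; here the C version stands in as a placeholder. */
xcorr_fn select_pitch_xcorr(void) {
    if (cpu_has_neon())
        return pitch_xcorr_c; /* placeholder for a hypothetical NEON kernel */
    return pitch_xcorr_c;
}
```

Selecting once at start-up keeps the per-call cost to one indirect call, and a generic distribution build still runs on ARMv7 cores that lack NEON.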

NEON FP flags

2016 Mar 25

Hi Renato,
As I understand it, the fundamental property being addressed here is: Are the semantics of scalar FP math the same as vector FP math? TTI seems like a good place to expose that information. If the semantics are indeed different, then the vectorizer would require fast-math flags in order to vectorize FP operations (similarly, gcc's man page says it requires

[LLVMdev] Question about ARM/vfp/NEON code generation

2011 May 27

Thanks, that helps a lot.
> All chips (to date) with NEON have VFP3, so it's safe to assume that a -mfpu=neon will have VFP3, so all the decisions
> about code generated for VFP3 can safely be assumed by targets with NEON.
Just to confirm my understanding, can I correctly say in general that the llc code generator might blur distinctions between NEON and VFP3 when it can do so
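
To check what a given -mfpu choice actually enables at compile time, the ARM C Language Extensions (ACLE) predefined macros can be queried. This sketch assumes a GCC or clang that defines __ARM_NEON (older GCC also defines __ARM_NEON__) and __ARM_FP; the function names are hypothetical. The macros describe the compile-time target, not the CPU the binary eventually runs on.

```c
/* 1 if the compiler is targeting Advanced SIMD (NEON), else 0. */
int target_has_neon(void) {
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return 1;
#else
    return 0;
#endif
}

/* ACLE __ARM_FP bitmask of hardware FP support:
   0x2 = half precision, 0x4 = single precision, 0x8 = double precision.
   Returns 0 when no hardware FP is targeted (or on non-ARM targets). */
int target_fp_mask(void) {
#ifdef __ARM_FP
    return __ARM_FP;
#else
    return 0;
#endif
}
```

A -mfpu=neon build would be expected to report NEON together with single- and double-precision VFP, consistent with the point that NEON parts to date also have VFP3.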

NEON FP flags

2016 Mar 22

On 22 March 2016 at 11:34, James Molloy <James.Molloy at arm.com> wrote:
> I don’t think this part is right. The denormal flag would have to be set by
> whatever code generates the FP instruction, which would be Clang’s codegen
> layer. So the if (Darwin) would be there, not in TTI.
Right, I meant the information to set/not set would be in TTI, not the actual setting. I don't

[LLVMdev] NEON vector instructions and the fast math IR flags

2013 Jun 07

On 7 June 2013 07:05, Owen Anderson <resistor at mac.com> wrote:
> Darwin uses NEON for floating point, but does *not* (and should not)
> globally enable fast math flags. Use of NEON for FP needs to remain
> achievable without globally setting the fast math flags. Fast math may
> reasonably imply NEON, but the opposite direction is not accurate.
>
> That said, I

[LLVMdev] NEON vector instructions and the fast math IR flags

2013 Jun 07

>> Darwin uses NEON for floating point, but does *not* (and should not)
>> globally enable fast math flags. Use of NEON for FP needs to remain
>> achievable without globally setting the fast math flags. Fast math may
>> reasonably imply NEON, but the opposite direction is not accurate.
| Good point. Fast math is probably too tough a requirement. I need to
| look

[LLVMdev] speed up memcpy intrinsic using ARM Neon registers

2009 Nov 10

I tried to speed up Dhrystone on ARM Cortex-A8 by optimizing the memcpy intrinsic. I used the Neon load multiple instruction to move up to 48 bytes at a time. Over 15 scalar instructions collapsed down into these 2 Neon instructions:

    fldmiad r3, {d0, d1, d2, d3, d4, d5}   @ SrcLine dhrystone.c 359
    fstmiad r1, {d0, d1, d2, d3, d4, d5}

It seems like this should be faster. But I did

[LLVMdev] NEON vector instructions and the fast math IR flags

2013 Jun 07

On 06/06/2013 11:58 PM, Renato Golin wrote:
> On 7 June 2013 07:05, Owen Anderson <resistor at mac.com> wrote:
Hi Owen, hi Renato,
thanks for your replies.
>> Darwin uses NEON for floating point, but does *not* (and should not)
>> globally enable fast math flags. Use of NEON for FP needs to remain
>> achievable without globally setting the fast math flags. Fast

[LLVMdev] Question about ARM/vfp/NEON code generation

2011 May 27

On May 27, 2011, at 10:49 AM, David Dunkle wrote:
> Thanks, that helps a lot.
>
>> All chips (to date) with NEON have VFP3, so it's safe to assume that a
>> -mfpu=neon will have VFP3, so all the decisions
>> about code generated for VFP3 can safely be assumed by targets with
>> NEON.
>
> Just to confirm my understanding, can I correctly say in general that

[LLVMdev] Question about ARM/vfp/NEON code generation

2011 May 27

On 27 May 2011 02:04, David Dunkle <ddunkle at arxan.com> wrote:
> In all cases, I get code that looks pretty much the same; it's like what
> is below. However, I am expecting to see instruction-level differences
> between the vfp3 and neon versions. When I do the same with gcc 4.2 I do
> see differences in the generated code.
Hi David,
You could see different instructions (as

[LLVMdev] speed up memcpy intrinsic using ARM Neon registers

2009 Nov 10

On Nov 9, 2009, at 5:59 PM, David Conrad wrote:
> On Nov 9, 2009, at 7:34 PM, Neel Nagar wrote:
>
>> I tried to speed up Dhrystone on ARM Cortex-A8 by optimizing the
>> memcpy intrinsic. I used the Neon load multiple instruction to move
>> up to 48 bytes at a time. Over 15 scalar instructions collapsed down
>> into these 2 Neon instructions.
Nice. Thanks

[LLVMdev] NEON vector instructions and the fast math IR flags

2013 Jun 07

> |I just looked again at the +neonfp flag. Compiling with and without
> |+neonfp flag seems to only affect scalar types in the attached test
> |case. If e.g. the LLVM vectorizer introduces vector instructions on
> |LLVM-IR level floating point vectors still yield NEON assembly even if
> |compiled with "-mattr=+neon,-neonfp". Is this expected?
>
> I'm virtually

[LLVMdev] Question about ARM/vfp/NEON code generation

2011 May 27

I have a code generation question for ARM with VFP and NEON. I am generating code for the following function as a test:

    void FloatingPointTest(float f1, float f2, float f3)
    {
        float f4 = f1 * f2;
        if (f4 > f3)
            printf("%f\n", f2);
        else
            printf("%f\n", f3);
    }

I have tried compiling with:
1. -mfloat-abi=softfp and -mfpu=neon
2.
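
To reproduce the comparison, here is the quoted test function as a self-contained file; the only addition is the <stdio.h> include that printf needs. No claim is made here about which instructions any particular compiler version emits.

```c
#include <stdio.h>

/* Test function from the post: one scalar float multiply and compare,
   so any FPU-dependent codegen difference shows up in a small listing. */
void FloatingPointTest(float f1, float f2, float f3) {
    float f4 = f1 * f2;
    if (f4 > f3)
        printf("%f\n", f2);
    else
        printf("%f\n", f3);
}
```

Generating assembly with an ARM cross compiler, for example with -O2 -S -mfloat-abi=softfp -mfpu=neon and again with -mfpu=vfpv3, then diffing the two .s files, shows whether the FPU choice changes the scalar code at all.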

[RFC PATCHv1] cover: celt_pitch_xcorr: Introduce ARM neon intrinsics

2014 Nov 25

On 25 November 2014 at 10:11, Viswanath Puttagunta <viswanath.puttagunta at linaro.org> wrote:
>
> On 25 November 2014 at 09:39, Jonathan Lennox <jonathan at vidyo.com> wrote:
> >
> > On Nov 25, 2014, at 10:07 AM, Viswanath Puttagunta <viswanath.puttagunta at linaro.org> wrote:
> >>
> >> > Also, are there plans to make the NEON optimisations

[LLVMdev] NEON vector instructions and the fast math IR flags

2013 Jun 07

Hi,
I was recently looking into the translation of LLVM-IR vector instructions to ARM NEON assembly, specifically when this is legal to do and when we need to be careful. I attached a very simple test case:

    define <4 x float> @fooP(<4 x float> %A, <4 x float> %B) {
      %C = fmul <4 x float> %A, %B
      ret <4 x float> %C
    }

If fooP is compiled with "llc -march=arm
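
For readers starting from C rather than IR, the GCC/clang vector_size extension gives source that clang lowers to the same fmul <4 x float> pattern as @fooP. This is a sketch assuming GCC or clang; the typedef name v4sf is arbitrary.

```c
/* GCC/clang vector extension: four floats packed into 16 bytes. */
typedef float v4sf __attribute__((vector_size(16)));

/* Lane-wise multiply; clang emits `fmul <4 x float>` for this body,
   matching @fooP above. */
v4sf fooP(v4sf A, v4sf B) {
    return A * B;
}
```

Compiling it with clang -S -emit-llvm for an ARM target shows the vector fmul; the backend then has to decide, as discussed in this thread, whether emitting a NEON vmul.f32 for it is acceptable.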

[LLVMdev] ARM NEON intrinsics in clang

2013 Sep 26

Hello LLVM Devs, I am starting my PhD on Automatic Parallelization for DSP and want to play with some ARM NEON intrinsics for a start. I spent the last three days trying to compile a version of LLVM that would allow me to compile sources that contain these intrinsics, but with no success. In the process I found out that clang doesn't support NEON (as per
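
A minimal use of NEON intrinsics from arm_neon.h, as a starting point for experiments. It assumes a compiler that ships arm_neon.h and defines __ARM_NEON (or the older __ARM_NEON__) when targeting NEON, for example clang with --target=armv7a-linux-gnueabihf -mfpu=neon; on other targets only the scalar loop is compiled. The function name add_f32 is hypothetical.

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* out[i] = a[i] + b[i] for i in [0, n). The NEON path handles four
   floats per iteration; the scalar loop handles the remainder, or
   everything when NEON is not available. */
void add_f32(const float *a, const float *b, float *out, int n) {
    int i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);      /* load 4 floats */
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(out + i, vaddq_f32(va, vb));  /* 4 adds, 1 store */
    }
#endif
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```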

[LLVMdev] speed up memcpy intrinsic using ARM Neon registers

2009 Nov 11

On Nov 11, 2009, at 3:27 AM, Rodolph Perfetta wrote:
>
> If you know about the alignment, maybe use structured load/store
> (vst1.64/vld1.64 {dn-dm}). You may also want to work on whole cache
> lines (64 bytes on A8). You can find more in this discussion:
> http://groups.google.com/group/beagleboard/browse_thread/thread/12c7bd415fbc
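
Following that advice, a copy loop can move one 64-byte Cortex-A8 cache line per iteration through four 16-byte q registers using vld1q_u8/vst1q_u8, which compile to vld1/vst1 structured loads and stores. This is a sketch assuming non-overlapping buffers; copy_neon64 is a hypothetical name, it uses no alignment hints, and whether it beats the library memcpy on a given core needs measuring.

```c
#include <stddef.h>
#include <string.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Copy n bytes from src to dst (buffers must not overlap). On NEON
   targets, whole 64-byte blocks go through four q registers; the tail,
   or the whole buffer on other targets, falls back to memcpy. */
void copy_neon64(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    while (n >= 64) {
        uint8x16_t q0 = vld1q_u8(s);
        uint8x16_t q1 = vld1q_u8(s + 16);
        uint8x16_t q2 = vld1q_u8(s + 32);
        uint8x16_t q3 = vld1q_u8(s + 48);
        vst1q_u8(d, q0);
        vst1q_u8(d + 16, q1);
        vst1q_u8(d + 32, q2);
        vst1q_u8(d + 48, q3);
        s += 64;
        d += 64;
        n -= 64;
    }
#endif
    memcpy(d, s, n); /* remaining bytes */
}
```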

[LLVMdev] 3.4.1 Release Plans

2014 Mar 26

Hi,
We are now about halfway between the 3.4 and 3.5 releases, and I would like to start preparing for a 3.4.1 release. Here is my proposed release schedule:
Mar 26 - April 9: Identify and backport additional bug fixes to the 3.4 branch.
April 9 - April 18: Testing Phase
April 18: 3.4.1 Release
How you can help:
- If you have any bug fixes you think should be included in 3.4.1, send me an
