thr3ads.net - search: "neon"

Displaying 20 results from an estimated 1124 matches for "neon".

2016 Mar 25

NEON FP flags

...or FP math? TTI seems like a good place to expose that information. If the semantics are indeed different, then the vectorizer would require fast-math flags in order to vectorize FP operations (similarly, gcc's man page says it requires -funsafe-math-optimizations for vectorization unless -mfpu=neon or similar is specified). In this context, this different-semantics query would return true if: The semantics is indeed different, VFP is IEEE-754 compliant while NEON is not. We don't want to stop the compiler from using VFP for FP math, but we want to be cautious when using NEON in the same...

2016 Mar 29

NEON FP flags

...ood place to expose that information. If the semantics are indeed > > different, then the vectorizer would require fast-math flags in order to > > vectorize FP operations (similarly, gcc's man page says it requires > > -funsafe-math-optimizations for vectorization unless -mfpu=neon or similar > > is specified). In this context, this different-semantics query would return > > true if: > > The semantics is indeed different, VFP is IEEE-754 compliant while > NEON is not. We don't want to stop the compiler from using VFP for FP > math, but we want to...

2014 Nov 25

[RFC PATCHv1] cover: celt_pitch_xcorr: Introduce ARM neon intrinsics

On Nov 25, 2014, at 10:07 AM, Viswanath Puttagunta <viswanath.puttagunta at linaro.org> wrote: > > > Also, are there plans to make the NEON optimisations on ARMv7 run time > > detectable like they have in cairo/pixman? For generic distributions > > it would be nice to be able to enable them as they offer > > decent performance improvements but have the code fall back on devices > > that don't support...
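The run-time detection asked about in this snippet could be sketched roughly as follows. This is only an illustration, not what Opus, cairo, or pixman actually do: the function name cpu_has_neon is hypothetical, and the sketch assumes a Linux/glibc system where getauxval() is available. On other platforms it falls back to whatever the compiler was told at build time.

```c
#include <stdbool.h>

#if defined(__linux__) && defined(__arm__)
#include <sys/auxv.h>   /* getauxval, AT_HWCAP */
#include <asm/hwcap.h>  /* HWCAP_NEON (32-bit ARM Linux only) */
#endif

/* Hypothetical helper: reports whether NEON code paths may be used. */
static bool cpu_has_neon(void)
{
#if defined(__linux__) && defined(__arm__)
    /* Ask the kernel which CPU features the running core advertises. */
    return (getauxval(AT_HWCAP) & HWCAP_NEON) != 0;
#elif defined(__ARM_NEON)
    /* No run-time probe in this sketch: trust the compile-time target. */
    return true;
#else
    /* Not an ARM/NEON build: always take the generic fallback. */
    return false;
#endif
}
```

A caller would check cpu_has_neon() once at start-up and pick either the NEON kernel or the plain C fallback accordingly.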

2016 Mar 25

NEON FP flags

...or FP math? TTI seems like a good place to expose that information. If the semantics are indeed different, then the vectorizer would require fast-math flags in order to vectorize FP operations (similarly, gcc's man page says it requires -funsafe-math-optimizations for vectorization unless -mfpu=neon or similar is specified). In this context, this different-semantics query would return true if: !(isDarwin OR ARMISA >= v8 OR fpMath == NEON) and then we need to teach people to use -mfpu=neon ;) I think this more-or-less matches what you've proposed. Is that right? -Hal P.S. Looking...
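The condition quoted in this snippet can be written out as a small predicate. The sketch below is an illustration only: the struct, enum, and function names are hypothetical and are not LLVM's TTI interface; only the boolean expression itself comes from the quoted message.

```c
#include <stdbool.h>

/* Hypothetical description of a target; not an LLVM type. */
enum fp_math_unit { FP_MATH_VFP, FP_MATH_NEON };

struct arm_target {
    bool is_darwin;             /* target OS is Darwin */
    int arm_isa_version;        /* 7 for ARMv7, 8 for ARMv8, ... */
    enum fp_math_unit fp_math;  /* unit selected for scalar FP math */
};

/* True when vector FP would behave differently from scalar FP, i.e. when
 * the vectorizer would need fast-math flags before vectorizing FP code.
 * Mirrors: !(isDarwin OR ARMISA >= v8 OR fpMath == NEON) */
static bool vector_fp_semantics_differ(const struct arm_target *t)
{
    return !(t->is_darwin ||
             t->arm_isa_version >= 8 ||
             t->fp_math == FP_MATH_NEON);
}
```

Under this reading, an ARMv7 non-Darwin target doing FP math on VFP is the only case where the query returns true.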

2011 May 27

[LLVMdev] Question about ARM/vfp/NEON code generation

Thanks, that helps a lot. > All chips (to date) with NEON have VFP3, so it's safe to assume that a -mfpu=neon will have VFP3, so all the decisions > about code generated for VFP3 can safely be assumed by targets with NEON. Just to confirm my understanding, can I correctly say in general that the llc code generator might blur distinctions between...

2016 Mar 22

NEON FP flags

On 22 March 2016 at 11:34, James Molloy <James.Molloy at arm.com> wrote: > I don’t think this part is right. The denormal flag would have to be set by > whatever code generates the FP instruction, which would be Clang’s codegen > layer. So the if (Darwin) would be there, not in TTI. Right, I meant the information to set/not set would be in TTI, not the actual setting. I don't

2013 Jun 07

[LLVMdev] NEON vector instructions and the fast math IR flags

On 7 June 2013 07:05, Owen Anderson <resistor at mac.com> wrote: > Darwin uses NEON for floating point, but does *not* (and should not) > globally enable fast math flags. Use of NEON for FP needs to remain > achievable without globally setting the fast math flags. Fast math may > reasonably imply NEON, but the opposite direction is not accurate. > > That sa...

2013 Jun 07

[LLVMdev] NEON vector instructions and the fast math IR flags

>> Darwin uses NEON for floating point, but does *not* (and should not) >> globally enable fast math flags. Use of NEON for FP needs to remain >> achievable without globally setting the fast math flags. Fast math may >> reasonably imply NEON, but the opposite direction is not accurate. | Go...

2009 Nov 10

[LLVMdev] speed up memcpy intrinsic using ARM Neon registers

I tried to speed up Dhrystone on ARM Cortex-A8 by optimizing the memcpy intrinsic. I used the Neon load multiple instruction to move up to 48 bytes at a time. Over 15 scalar instructions collapsed down into these 2 Neon instructions.

    fldmiad r3, {d0, d1, d2, d3, d4, d5}   @ SrcLine dhrystone.c 359
    fstmiad r1, {d0, d1, d2, d3, d4, d5}

It seems like this should be faster. But I did...

2013 Jun 07

[LLVMdev] NEON vector instructions and the fast math IR flags

On 06/06/2013 11:58 PM, Renato Golin wrote: > On 7 June 2013 07:05, Owen Anderson <resistor at mac.com> wrote: Hi Owen, hi Renato, thanks for your replies. >> Darwin uses NEON for floating point, but does *not* (and should not) >> globally enable fast math flags. Use of NEON for FP needs to remain >> achievable without globally setting the fast math flags. Fast math may >> reasonably imply NEON, but the opposite direction is not accurate. Good...

2011 May 27

[LLVMdev] Question about ARM/vfp/NEON code generation

On May 27, 2011, at 10:49 AM, David Dunkle wrote: > Thanks, that helps a lot. > >> All chips (to date) with NEON have VFP3, so it's safe to assume that a > -mfpu=neon will have VFP3, so all the decisions >> about code generated for VFP3 can safely be assumed by targets with > NEON. > > Just to confirm my understanding, can I correctly say in general that > the llc code generator mig...

2011 May 27

[LLVMdev] Question about ARM/vfp/NEON code generation

On 27 May 2011 02:04, David Dunkle <ddunkle at arxan.com> wrote: > In all cases, I get code that looks pretty much the same; it's like what > is below. However, I am expecting to see instruction level differences > between the vfp3 and neon versions. When I do the same with gcc 4.2 I do > see differences in the generated code. Hi David, You could see different instructions (as gcc does, you say), but it's not necessary. Your example has only floating point arithmetic, which both VFP3 and NEON can do, so the final assembly wi...

2009 Nov 10

[LLVMdev] speed up memcpy intrinsic using ARM Neon registers

On Nov 9, 2009, at 5:59 PM, David Conrad wrote: > On Nov 9, 2009, at 7:34 PM, Neel Nagar wrote: > >> I tried to speed up Dhrystone on ARM Cortex-A8 by optimizing the >> memcpy intrinsic. I used the Neon load multiple instruction to move >> up >> to 48 bytes at a time . Over 15 scalar instructions collapsed down >> into these 2 Neon instructions. Nice. Thanks for working on this. It has long been on my todo list. >> >> fldmiad r3, {d0, d1, d2, d3, d4, d5} @ S...

2013 Jun 07

[LLVMdev] NEON vector instructions and the fast math IR flags

> |I just looked again at the +neonfp flag. Compiling with and without > |+neonfp flag seems to only affect scalar types in the attached test > |case. If e.g. the LLVM vectorizer introduces vector instructions on > |LLVM-IR level floating point vectors still yield NEON assembly even if > |compiled with "-mattr=+neon,...

2011 May 27

[LLVMdev] Question about ARM/vfp/NEON code generation

I have a code generation question for ARM with VFP and NEON. I am generating code for the following function as a test:

    void FloatingPointTest(float f1, float f2, float f3)
    {
        float f4 = f1 * f2;
        if (f4 > f3)
            printf("%f\n", f2);
        else
            printf("%f\n", f3);
    }

I have tried compiling with: 1. -mfloat-a...

2014 Nov 25

[RFC PATCHv1] cover: celt_pitch_xcorr: Introduce ARM neon intrinsics

...inaro.org> wrote: > > On 25 November 2014 at 09:39, Jonathan Lennox <jonathan at vidyo.com> wrote: > > > > On Nov 25, 2014, at 10:07 AM, Viswanath Puttagunta <viswanath.puttagunta at linaro.org> wrote: > >> > >> > Also, are there plans to make the NEON optimisations on ARMv7 run time > >> > detectable like they have in cairo/pixman? For generic distributions > >> > it would be nice to be able to enable them as they offer > >> > decent performance improvements but have the code fall back on devices >...

2013 Jun 07

[LLVMdev] NEON vector instructions and the fast math IR flags

Hi, I was recently looking into the translation of LLVM-IR vector instructions to ARM NEON assembly. Specifically, when this is legal to do and when we need to be careful. I attached a very simple test case:

    define <4 x float> @fooP(<4 x float> %A, <4 x float> %B) {
      %C = fmul <4 x float> %A, %B
      ret <4 x float> %C
    }

If fooP is compiled with "llc -mar...
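For readers more at home in C than in LLVM-IR, a rough source-level counterpart of fooP can be written with the GCC/Clang vector extension. This is a sketch, not part of the original message: the float4 typedef is an assumption introduced here, and whether the compiler emits a NEON multiply for it depends on the target and flags, which is exactly what the thread is asking about.

```c
/* Four packed floats, using the GCC/Clang vector_size extension. */
typedef float float4 __attribute__((vector_size(16)));

/* Element-wise multiply of two 4-lane float vectors; with Clang this
 * would usually show up in the IR as a single <4 x float> fmul. */
static float4 fooP(float4 a, float4 b)
{
    return a * b;
}
```

Each lane is multiplied independently, so lanes {1, 2, 3, 4} times {2, 2, 2, 2} give {2, 4, 6, 8}.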

2013 Sep 26

[LLVMdev] ARM NEON intrinsics in clang

Hello LLVM Devs, I am starting my PhD on Automatic Parallelization for DSP and want to play with some ARM NEON intrinsics for a start. I spent the last three days trying to compile a version of LLVM that would allow me to compile sources that contain these intrinsics, but with no success. In the process I found out that clang doesn't support NEON (as per http://blog.llvm.org/2010/04/arm-advanced-simd-n...

2009 Nov 11

[LLVMdev] speed up memcpy intrinsic using ARM Neon registers

...n find more in this discussion: > http://groups.google.com/group/beagleboard/browse_thread/thread/12c7bd415fbc0993/e382202f1a92b0f8?lnk=gst&q=memcpy&pli=1 . > >> Even if it's not faster, it's still a code size win which is also >> important. > > Yes but NEON will drive up your power consumption, so if you are not > faster > you will drain your battery faster (assuming you care of course). > > In general we wouldn't recommend writing memcpy using NEON unless > you can > detect the exact core you will be running on: on A9 NEON w...

2014 Mar 26

[LLVMdev] 3.4.1 Release Plans

Hi, We are now about halfway between the 3.4 and 3.5 releases, and I would like to start preparing for a 3.4.1 release. Here is my proposed release schedule: Mar 26 - April 9: Identify and backport additional bug fixes to the 3.4 branch. April 9 - April 18: Testing Phase April 18: 3.4.1 Release How you can help: - If you have any bug fixes you think should be included to 3.4.1, send me an
