thr3ads.net - similar to: "[LLVMdev] LIT Any/All options"

Displaying 20 results from an estimated 20000 matches similar to: "[LLVMdev] LIT Any/All options"

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Feb 06

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

Hi Linfeng, On 06/02/17 02:51 PM, Linfeng Zhang wrote: > However, the critical thing is that all the states in each stage when > processing input[i] are reused by the next input[i+1]. That is > input[i+1] must wait input[i] for 1 stage, and input[i+2] must wait > input[i+1] for 1 stage, etc. That is indeed the tricky part... and the one I think you could do slightly differently. If

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Feb 07

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

Hi Linfeng, On 06/02/17 07:18 PM, Linfeng Zhang wrote: > This is a great idea. But the order (psEncC->shapingLPCOrder) can be > configured to 12, 14, 16, 20 and 24 according to complexity parameter. > > It's hard to get a universal function to handle all these orders > efficiently. Any suggestions? I can think of two ways of handling larger orders. The obvious one is

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Apr 03

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

Hi Jean-Marc, Attached is the silk_warped_autocorrelation_FIX_neon() which implements your idea. Speed improvement vs the previous optimization: Complexity 0-4: Doesn't call this function. Complexity 5: 2.1% (order = 16) Complexity 6: 1.0% (order = 20) Complexity 8: 0.1% (order = 24) Complexity 10: 0.1% (order = 24) Code size of silk_warped_autocorrelation_FIX_neon() changes from 2,644

[LLVMdev] Question about ARM/vfp/NEON code generation

2011 May 27

[LLVMdev] Question about ARM/vfp/NEON code generation

I have a code generation question for ARM with VFP and NEON. I am generating code for the following function as a test: void FloatingPointTest(float f1, float f2, float f3) { float f4 = f1 * f2; if (f4 > f3) printf("%f\n",f2); else printf("%f\n",f3); } I have tried compiling with: 1. -mfloat-abi=softfp and -mfpu=neon 2.

error message

2011 Jan 18

error message

I was running a sampling syntax based on a data frame (ago) of 160 rows and 25 columns. Below are the column names: > names(ago) [1] "SubID" "AGR1" "AGR2" "AGR3" "AGR4" "AGR5" "AGR6" "AGR7" "AGR8" [10] "AGR9" "AGR10" "WAGR1" "WAGR2"

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Apr 05

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

Hi Linfeng, Thanks for the updated patch. I'll have a look and get back to you. When you report speedup percentages, is that relative to the entire encoder or relative to just that function in C? Also, what's the speedup compared to master? Cheers, Jean-Marc On 05/04/17 12:14 PM, Linfeng Zhang wrote: > I attached a new patch with small cleanup (disassembly is identical as > the

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Apr 06

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

Hi Linfeng, I had a closer look at your patch and the code looks good -- and slightly simpler than I had anticipated, so that's good. I did some profiling on a Cortex A57 and I've been seeing slightly less improvement than you're reporting, more like 3.5% at complexity 8. It appears that the warped autocorrelation function itself is only faster by a factor of about 1.35. That's a

similar to: [LLVMdev] LIT Any/All options