similar to: Several patches of ARM NEON optimization

Displaying 8 results from an estimated 8 matches similar to: "Several patches of ARM NEON optimization"

silk_warped_autocorrelation_FIX() NEON optimization

2016 Jul 01

1

silk_warped_autocorrelation_FIX() NEON optimization

Hi all, I'm sending patch "Optimize silk_warped_autocorrelation_FIX() for ARM NEON" in an separate email. It is based on Tim’s aarch64v8 branch https://git.xiph.org/?p=users/tterribe/opus.git;a=shortlog;h=refs/heads/aarch64v8 Thanks for your comments. Linfeng

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Jan 31

6

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

Hi, Attached is a patch with arm neon optimizations for silk_warped_autocorrelation_FIX(). Please review. Thanks, Felicia -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.xiph.org/pipermail/opus/attachments/20170131/9a912bb4/attachment-0001.html> -------------- next part -------------- A non-text attachment was scrubbed... Name:

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Feb 06

2

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

Hi Jean-Marc, Thanks a lot for reviewing this huge assembly function! silk_warped_autocorrelation_FIX_c()'s kernel part is for( n = 0; n < length; n++ ) { tmp1_QS = silk_LSHIFT32( (opus_int32)input[ n ], QS ); /* Loop over allpass sections */ for( i = 0; i < order; i++ ) { /* Output of allpass section */ tmp2_QS = silk_SMLAWB(

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Feb 07

2

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

This is a great idea. But the order (psEncC->shapingLPCOrder) can be configured to 12, 14, 16, 20 and 24 according to complexity parameter. It's hard to get a universal function to handle all these orders efficiently. Any suggestions? Thanks, Linfeng On Mon, Feb 6, 2017 at 12:40 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote: > Hi Linfeng, > > On 06/02/17 02:51 PM,

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Feb 07

3

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

Hi Jean-Marc, Thanks for your suggestions. Will get back to you once we have some updates. Linfeng On Mon, Feb 6, 2017 at 5:47 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote: > Hi Linfeng, > > On 06/02/17 07:18 PM, Linfeng Zhang wrote: > > This is a great idea. But the order (psEncC->shapingLPCOrder) can be > > configured to 12, 14, 16, 20 and 24 according to

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Apr 05

4

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

Thank Jean-Marc! The speedup percentages are all relative to the entire encoder. Comparing to master, this optimization patch speeds up fixed-point SILK encoder on NEON as following: Complexity 5: 6.1% Complexity 6: 5.8% Complexity 8: 5.5% Complexity 10: 4.0% when testing on an Acer Chromebook, ARMv7 Processor rev 3 (v7l), CPU max MHz: 2116.5 Thanks, Linfeng On Wed, Apr 5, 2017 at 11:02 AM,

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Apr 05

2

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

I attached a new patch with small cleanup (disassembly is identical as the last patch). We have done the same internal testing as usual. Also, attached 2 failed temporary versions which try to reduce code size (just for code review reference purpose). The new patch of silk_warped_autocorrelation_FIX_neon() has a code size of 3,228 bytes (with gcc). smaller_slower.c has a code size of 2,304

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Apr 11

2

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

Hi Jean-Marc, Thanks for your suggestions! I attached the new patch, with inlined reply below. Thanks, Linfeng On Thu, Apr 6, 2017 at 12:55 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote: > I did some profiling on a Cortex A57 and I've been seeing slightly less > improvement than you're reporting, more like 3.5% at complexity 8. It > appears that the warped