similar to: [PATCH] Optimize silk_LPC_inverse_pred_gain() for ARM NEON


2017 Feb 13
2
[PATCH] Optimize silk_LPC_inverse_pred_gain() for ARM NEON
Hi Jean-Marc, Yes, I confirm that we have done the same internal review on this patch. For 1), I agree that an explicit unit test would be a good plus to cover the cases that "make check" cannot trigger. If you like, we may submit a unit test patch for code review. Thanks, Linfeng On Thu, Feb 9, 2017 at 4:48 PM Jean-Marc Valin <jmvalin at jmvalin.ca> wrote: > Hi Linfeng,
2017 Feb 15
2
[PATCH] Optimize silk_LPC_inverse_pred_gain() for ARM NEON
Hi Jean-Marc, (forgot cc opus@) Thanks for creating the unit test code. Attached is the updated optimization patch. On Mon, Feb 13, 2017 at 10:17 AM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote: > On 13/02/17 01:09 PM, Linfeng Zhang wrote: > > For 1), I agree that an explicit unit test would be a good plus to cover > > the cases that "make check" cannot
2017 Feb 10
0
[PATCH] Optimize silk_LPC_inverse_pred_gain() for ARM NEON
Hi Linfeng, Can you confirm that the patch went through the same internal review (presumably from James) as the previous ones? I had a look and did some testing and it looked good to me. There are only two issues I'd like to resolve first, neither of which is directly related to your code. 1) The overflow condition is essentially untested because none of the tests in "make
2017 Feb 13
0
[PATCH] Optimize silk_LPC_inverse_pred_gain() for ARM NEON
On 13/02/17 01:09 PM, Linfeng Zhang wrote: > For 1), I agree that an explicit unit test would be a good plus to cover > the cases that "make check" cannot trigger. If you like, we may submit > a unit test patch for code review. Yes, please include a unit test that triggers the overflow detection. Once that works, I think we can merge this optimization. Cheers, Jean-Marc
2017 Feb 15
0
[PATCH] Optimize silk_LPC_inverse_pred_gain() for ARM NEON
Hi Linfeng, Thanks for the updated patch. Just pushed it to master. One thing that still bothers me a bit is that the if( ( max > 0 ) || ( min < -1 ) ) line is still pretty much untested. By which I mean that if I remove the condition, then the tests (including the new unit tests) still pass. I wasn't able to figure out a case that triggers it -- and I'm not even 100% sure it's
2016 Jul 14
6
Several patches of ARM NEON optimization
I rebased my previous 3 patches onto the current master with minor changes. Patches 1 to 3 replace all my previously submitted patches. Patches 4 and 5 are new. Thanks, Linfeng Zhang
2017 Feb 14
1
[PATCH] Add silk/tests/test_unit_optimization_LPC_inv_pred_gain
Hi, Attached is a patch with silk/tests/test_unit_optimization_LPC_inv_pred_gain, which unit-tests the silk_LPC_inverse_pred_gain() optimizations. Please review. The number of test loops is set to 10,000, since all branches in this function get hit after 9,085 loops of random inputs. Thanks, Linfeng
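The approach described in this message (feed the same random inputs to the C version and the optimized version, then compare results) can be sketched roughly as follows. This is only an illustration, not the code from the attached patch: sum_sq_c(), sum_sq_opt(), lcg_next() and count_mismatches() are hypothetical names, and the "optimized" function is a plain unrolled loop standing in for a NEON implementation. Only the loop count of 10,000 comes from the message.

```c
#include <stddef.h>
#include <stdint.h>

#define VEC_LEN 16      /* hypothetical input length */
#define TEST_LOOPS 10000 /* loop count quoted in the message */

/* Hypothetical reference version: plain scalar sum of squares. */
static int64_t sum_sq_c(const int16_t *x, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)x[i] * x[i];
    return acc;
}

/* Hypothetical "optimized" version: 4-way unrolled, standing in for NEON code. */
static int64_t sum_sq_opt(const int16_t *x, size_t n)
{
    int64_t a0 = 0, a1 = 0, a2 = 0, a3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += (int32_t)x[i]     * x[i];
        a1 += (int32_t)x[i + 1] * x[i + 1];
        a2 += (int32_t)x[i + 2] * x[i + 2];
        a3 += (int32_t)x[i + 3] * x[i + 3];
    }
    for (; i < n; i++)           /* leftover elements */
        a0 += (int32_t)x[i] * x[i];
    return (a0 + a1) + (a2 + a3);
}

/* Small deterministic generator so that any failure is reproducible. */
static uint32_t lcg_next(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state;
}

/* Returns the number of random inputs on which the two versions disagree. */
static int count_mismatches(uint32_t seed)
{
    int16_t x[VEC_LEN];
    int mismatches = 0;
    for (int loop = 0; loop < TEST_LOOPS; loop++) {
        for (size_t i = 0; i < VEC_LEN; i++)
            x[i] = (int16_t)((int32_t)(lcg_next(&seed) >> 16) - 32768);
        if (sum_sq_c(x, VEC_LEN) != sum_sq_opt(x, VEC_LEN))
            mismatches++;
    }
    return mismatches;
}
```

A test program would call count_mismatches() with a fixed seed and fail if the result is nonzero. Integer arithmetic is used here so that both versions must agree bit for bit.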
2017 Jun 05
4
celt_inner_prod() and dual_inner_prod() NEON intrinsics
Hi Jean-Marc, I attached the new version in inner_prod_5patches_v2.zip, which is synced to the current master. For fixed-point ARM, only 0003-Optimize-fixed-point-celt_inner_prod-and-dual_inner_.patch changes the performance. For floating-point ARM, only 0004-Optimize-floating-point-celt_inner_prod-and-dual_inn.patch changes the performance. Patches 1 and 2 are code clean-up and can only affect x86
2017 Jun 06
3
celt_inner_prod() and dual_inner_prod() NEON intrinsics
Hi Linfeng, On 05/06/17 03:31 PM, Linfeng Zhang wrote: > Yes, we'll have one more patch set related to xcorr next week. Please > don't wait if it's too late for the 1.2 release. Assuming there's no issue with the patches, next week isn't too late. Also, I've started looking at your patches. So far there's one thing that puzzles me a bit. In the OPUS_CHECK_ASM
2017 Apr 19
4
2 patches related to silk_biquad_alt() optimization
Hi, Attached are 2 patches related to silk_biquad_alt() optimization. Please review. Thanks, Linfeng Zhang
2017 Feb 15
4
[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
Hi, Attached are two patches. Patch 1 refactors silk_LPC_analysis_filter(). And Patch 2 optimizes the new function celt_fir_permit_overflow() for ARM NEON. Please recommend a better function name. We did the same internal code review and testing already. Thanks, Linfeng
2017 Jun 06
4
Antw: Re: celt_inner_prod() and dual_inner_prod() NEON intrinsics
>>> Linfeng Zhang <linfengz at google.com> wrote on 06.06.2017 at 06:46 in message <CAKoqLCAfj+fDUMLfN4dLNSZ4NNAZpaSt_BWZRp+7XBqfhiSqiQ at mail.gmail.com>: > Hi Jean-Marc, > > I tried "==" before, and it failed when both results are 0.0. Maybe the > exponent or sign differs because of the different 0.0 representation > in NEON. If anybody
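As background for the signed-zero hypothesis in this message: under IEEE 754 arithmetic, +0.0 and -0.0 compare equal with "==", so a sign difference alone is only visible to a bitwise comparison such as memcmp(). The sketch below illustrates that distinction and says nothing about what actually happened in the NEON test; equal_by_value() and equal_by_bits() are hypothetical helper names.

```c
#include <string.h>

/* Value comparison: IEEE 754 defines +0.0f == -0.0f as true. */
static int equal_by_value(float a, float b)
{
    return a == b;
}

/* Bit-pattern comparison: +0.0f and -0.0f differ in the sign bit. */
static int equal_by_bits(float a, float b)
{
    return memcmp(&a, &b, sizeof a) == 0;
}
```

With these helpers, equal_by_value(0.0f, -0.0f) is nonzero while equal_by_bits(0.0f, -0.0f) is zero, so a test that compares raw bytes can fail on zeros that "==" treats as identical.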
2017 Apr 19
3
[PATCH] cosmetics,silk: correct input/output arg comments
Hi, Attached is a cosmetic patch. Please review. Thanks, Linfeng Zhang -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-cosmetics-silk-correct-input-output-arg-comments.patch
2017 Jun 01
4
celt_inner_prod() and dual_inner_prod() NEON intrinsics
Hi, Attached are 5 patches related to celt_inner_prod() and dual_inner_prod() NEON intrinsics optimization. In 0004-Optimize-floating-point-celt_inner_prod-and-dual_inn.patch, the optimization changed the order of floating-point inner products, which will change the results. I created celt_inner_prod_neon_float_c_simulation() and dual_inner_prod_neon_float_c_simulation() to simulate the order
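Why a different accumulation order changes a floating-point inner product can be shown with a small sketch. It is not the code from the patches: inner_prod_sequential(), inner_prod_4lane(), demo_x and demo_y are hypothetical, and the 4-lane version only imitates in plain C the order in which a 4-wide vector unit would accumulate. It assumes float expressions are evaluated in single precision (FLT_EVAL_METHOD == 0) and that n is a multiple of 4.

```c
#include <stddef.h>

/* Scalar order: one accumulator, elements added left to right. */
static float inner_prod_sequential(const float *x, const float *y, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += x[i] * y[i];
    return acc;
}

/* Vector-like order: four partial sums (one per lane), reduced at the end.
 * Assumes n is a multiple of 4. */
static float inner_prod_4lane(const float *x, const float *y, size_t n)
{
    float lane[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    for (size_t i = 0; i < n; i += 4)
        for (size_t k = 0; k < 4; k++)
            lane[k] += x[i + k] * y[i + k];
    return (lane[0] + lane[1]) + (lane[2] + lane[3]);
}

/* Hypothetical input chosen so that rounding depends on the order. */
static const float demo_x[8] = { 1e8f, 1.0f, 1.0f, 1.0f, -1e8f, 1.0f, 1.0f, 1.0f };
static const float demo_y[8] = { 1.0f, 1.0f, 1.0f, 1.0f,  1.0f, 1.0f, 1.0f, 1.0f };
```

Under the stated assumptions, the sequential order loses the three 1.0f terms that are added to 1e8f and returns 3.0f, while the 4-lane order cancels 1e8f against -1e8f within one lane and returns 6.0f. A C function that reproduces the vector order, as the message describes, lets a test demand exact equality with the optimized code.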
2017 Jul 21
2
[PATCH] Fix celt_pitch_xcorr ARM jump table compiling error
Hi, Attached is a fix for a compile error in the ARM optimization jump table. Thanks, Linfeng Zhang
2017 May 31
4
Opus floating-point NEON jump table question
Hi, ./configure --build x86_64-unknown-linux-gnu --host arm-linux-gnueabihf --disable-assertions --disable-check-asm --enable-intrinsics CFLAGS=-O3 --disable-shared When configuring with floating-point and intrinsics enabled as above, the generated config.h only has OPUS_ARM_MAY_HAVE_NEON_INTR defined (to 1), with /* #undef OPUS_ARM_ASM */ /* #undef OPUS_ARM_INLINE_ASM */ /* #undef
2017 Jun 01
2
Opus floating-point NEON jump table question
Thanks, Jean-Marc and Jonathan! I tested the current Opus encoder in floating-point with Complexity 8. Hacking it with the attached patch (which generates "#define OPUS_ARM_MAY_HAVE_NEON 1" in config.h) speeds it up by about 14.7% on my Chromebook. This is probably because many NEON intrinsics optimizations can benefit both the fixed-point and floating-point encoders. So if it's safe enough
2017 Jun 06
2
celt_inner_prod() and dual_inner_prod() NEON intrinsics
Hi Linfeng, On 06/06/17 04:09 PM, Jonathan Lennox wrote: > Two comments on the various infrastructure for RTCD etc. > > 1. The 0002- patch changes the ABI of the celt_pitch_xcorr functions, > but doesn’t change the assembly in celt/arm/celt_pitch_xcorr_arm.s > correspondingly. I suspect the ‘arch’ parameter can just be ignored > by the assembly functions, but at least the
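The idea behind run-time CPU detection (RTCD) with an extra 'arch' argument can be sketched as a table of function pointers indexed by the detected architecture. This is a generic illustration and not Opus's actual tables or signatures: the enum values, inner_prod_fn, inner_prod_c(), inner_prod_neon_stub() and inner_prod() are all hypothetical, and the "NEON" entry is plain C standing in for an optimized routine.

```c
#include <stddef.h>

/* Hypothetical architecture indices. */
enum { ARCH_C = 0, ARCH_NEON = 1, ARCH_COUNT = 2 };

/* Every implementation shares one signature, including the 'arch' argument. */
typedef long (*inner_prod_fn)(const short *x, const short *y, size_t n, int arch);

/* Generic C version. */
static long inner_prod_c(const short *x, const short *y, size_t n, int arch)
{
    long acc = 0;
    (void)arch; /* accepted for ABI compatibility, not used */
    for (size_t i = 0; i < n; i++)
        acc += (long)x[i] * y[i];
    return acc;
}

/* Stand-in for an optimized version; it also simply ignores 'arch'. */
static long inner_prod_neon_stub(const short *x, const short *y, size_t n, int arch)
{
    return inner_prod_c(x, y, n, arch);
}

/* Jump table: one entry per architecture index. */
static const inner_prod_fn INNER_PROD_IMPL[ARCH_COUNT] = {
    inner_prod_c,
    inner_prod_neon_stub
};

/* Dispatcher: assumes 0 <= arch < ARCH_COUNT. */
static long inner_prod(const short *x, const short *y, size_t n, int arch)
{
    return INNER_PROD_IMPL[arch](x, y, n, arch);
}
```

Because every table entry must have the same signature, adding a parameter changes the calling convention for all implementations at once, which is why the message asks about the hand-written assembly as well.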
2017 Apr 05
4
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Thanks, Jean-Marc! The speedup percentages are all relative to the entire encoder. Compared to master, this optimization patch speeds up the fixed-point SILK encoder on NEON as follows: Complexity 5: 6.1%, Complexity 6: 5.8%, Complexity 8: 5.5%, Complexity 10: 4.0%, when testing on an Acer Chromebook, ARMv7 Processor rev 3 (v7l), CPU max MHz: 2116.5. Thanks, Linfeng On Wed, Apr 5, 2017 at 11:02 AM,
2017 Apr 05
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
I attached a new patch with a small cleanup (the disassembly is identical to that of the last patch). We have done the same internal testing as usual. I also attached 2 failed temporary versions that try to reduce code size (for code review reference only). The new patch of silk_warped_autocorrelation_FIX_neon() has a code size of 3,228 bytes (with gcc). smaller_slower.c has a code size of 2,304