thr3ads.net - search: "linfeng"

Displaying 20 results from an estimated 86 matches for "linfeng".

celt_inner_prod() and dual_inner_prod() NEON intrinsics

2017 Jun 06

Hi Linfeng, On 05/06/17 03:31 PM, Linfeng Zhang wrote: > Yes we'll have one more patch set related to xcorr in next week. Please > don't wait if it's too late for 1.2 release. Assuming there's no issue with the patches, next week isn't too late. Also, I've started looking at y...

celt_inner_prod() and dual_inner_prod() NEON intrinsics

2017 Jun 05

...nner_.patch changes the performance. For floating-point ARM, only 0004-Optimize-floating-point-celt_inner_prod-and-dual_inn.patch changes the performance. Patch 1 and 2 are code clean-up and can only affect x86 performance. Patch 5 has negligible effect on floating-point ARM performance. Thanks, Linfeng On Fri, Jun 2, 2017 at 11:26 AM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote: > Hi Linfeng, > > I'll look into your patches. Can you let me know what's the expected > effect on performance (if any) for each of your patches? Also, are these > all the patches you inte...

[PATCH] Optimize silk_LPC_inverse_pred_gain() for ARM NEON

2017 Feb 15

Hi Jean-Marc, (forgot cc opus@) Thanks for creating the unit test code. Attached is the updated optimization patch. On Mon, Feb 13, 2017 at 10:17 AM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote: > On 13/02/17 01:09 PM, Linfeng Zhang wrote: > > For 1), I agree that an explicit unit test would be a good plus to cover > > the cases that "make check" cannot trigger. If you like, we may submit > > a unit test patch for code review. > > Yes, please include a unit test that triggers the overfl...

2017 Apr 19

2 patches related to silk_biquad_alt() optimization

Hi, Attached are 2 patches related to silk_biquad_alt() optimization. Please review. Thanks, Linfeng Zhang -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.xiph.org/pipermail/opus/attachments/20170419/f08f5030/attachment-0001.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: 0002-Optimize-silk_biquad_alt-for...

Antw: Re: celt_inner_prod() and dual_inner_prod() NEON intrinsics

2017 Jun 06

>>> Linfeng Zhang <linfengz at google.com> wrote on 06.06.2017 at 06:46 in message <CAKoqLCAfj+fDUMLfN4dLNSZ4NNAZpaSt_BWZRp+7XBqfhiSqiQ at mail.gmail.com>: > Hi Jean-Marc, > > I tried "==" before, and it failed when both results are 0.0. Maybe the > exponent or sign has d...

[PATCH] Optimize silk_LPC_inverse_pred_gain() for ARM NEON

2017 Feb 13

Hi Jean-Marc, Yes I confirm that we have done the same internal review on this patch. For 1), I agree that an explicit unit test would be a good plus to cover the cases that "make check" cannot trigger. If you like, we may submit a unit test patch for code review. Thanks, Linfeng On Thu, Feb 9, 2017 at 4:48 PM Jean-Marc Valin <jmvalin at jmvalin.ca> wrote: > Hi Linfeng, > > Can you confirm that the patch went through the same internal review > (presumably from James) as the previous ones? > > I had a look and did some testing and it looked go...

[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON

2017 Feb 15

Hi, Attached are two patches. Patch 1 refactors silk_LPC_analysis_filter(). And Patch 2 optimizes the new function celt_fir_permit_overflow() for ARM NEON. Please recommend a better function name. We did the same internal code review and testing already. Thanks, Linfeng -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.xiph.org/pipermail/opus/attachments/20170215/c44c8029/attachment-0001.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: 0002-Optimize-celt_fir_permit_overflow-...

celt_inner_prod() and dual_inner_prod() NEON intrinsics

2017 Jun 06

...and it failed when both results are 0.0. Maybe the exponent or sign differs because of the different 0.0 representation in NEON. If anybody knows how to handle this 0.0 comparison, that would be great. Or just use if(a==b || (a==0.0 && b==0.0)) ... but I haven't tried this. Thanks, Linfeng On Mon, Jun 5, 2017 at 8:43 PM Jean-Marc Valin <jmvalin at jmvalin.ca> wrote: > Hi Linfeng, > > On 05/06/17 03:31 PM, Linfeng Zhang wrote: > > Yes we'll have one more patch set related to xcorr in next week. Please > > don't wait if it's too late for 1.2 rel...
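
As a quick check of the signed-zero hypothesis in this snippet: under IEEE 754 comparison rules, +0.0 and -0.0 compare equal with `==` even though their bit patterns differ, so the extra `(a==0.0 && b==0.0)` clause cannot accept anything that `a==b` rejects. The sketch below is not from the Opus tree; the helper names are illustrative assumptions.

```c
#include <stdint.h>
#include <string.h>

/* Return the raw bit pattern of a float (illustrative helper, not an Opus function). */
static uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* 1 if +0.0f and -0.0f compare equal with ==, 0 otherwise.
   volatile keeps the comparison from being folded at compile time. */
static int zeros_compare_equal(void) {
    volatile float pz = 0.0f, nz = -0.0f;
    return pz == nz;
}

/* 1 if the two zeros have different bit patterns (only the sign bit differs). */
static int zeros_differ_in_bits(void) {
    return float_bits(0.0f) != float_bits(-0.0f);
}
```

If a comparison of two apparent zeros still fails, one of the values is probably not an exact zero, for example a very small nonzero result on one side.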

2017 Apr 25

2 patches related to silk_biquad_alt() optimization

...and I can remove the optimization of stride 1 case. If it's allowed to skip the split of A_Q28 and replace it with a 32-bit multiplication (with a 64-bit result), it could probably be faster on NEON. This may change the encoder results because of the different order of adding, shifting and rounding. Thanks, Linfeng On Wed, Apr 19, 2017 at 10:23 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote: > Hi Linfeng, > > Thanks for the patches. I'll have a look and get back to you. What kind > of speedup are you getting for these functions? On what command line? > > Cheers, > >...
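
The remark about changed results can be illustrated with a simplified sketch. This is not the actual silk_biquad_alt() code: the function names, the 14-bit split point and the rounding step are assumptions chosen to show how a split multiply with intermediate rounding can differ from a single 64-bit multiply followed by one shift. Inputs are assumed non-negative so that right shifts are well defined.

```c
#include <stdint.h>

/* Split path (hypothetical): multiply by the low 14 bits and the remaining
   high bits of the coefficient separately, rounding the low-part term. */
static int32_t mul_split(int32_t x, int32_t a_q28) {
    int32_t a_lo = a_q28 & 0x3FFF;                      /* low 14 bits */
    int32_t a_hi = a_q28 >> 14;                         /* remaining high bits */
    int32_t lo = (int32_t)(((int64_t)x * a_lo) >> 16);
    int32_t hi = (int32_t)(((int64_t)x * a_hi) >> 16);
    lo = (lo + (1 << 13)) >> 14;                        /* round-to-nearest shift by 14 */
    return hi + lo;
}

/* Direct path (hypothetical): one 32x32 -> 64-bit multiply, one truncating shift. */
static int32_t mul_full(int32_t x, int32_t a_q28) {
    return (int32_t)(((int64_t)x * a_q28) >> 30);
}
```

With x = 65536 and a_q28 = 0x3FFF the split path rounds the low-part term up and returns 1, while the direct path truncates and returns 0. With a_q28 = 1 << 28 both paths return 16384. Differences of this kind are why the output need not stay bit-exact.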

[PATCH] cosmetics,silk: correct input/output arg comments

2017 Apr 19

Hi, Attached is a patch for cosmetic purposes. Please review. Thanks, Linfeng Zhang -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.xiph.org/pipermail/opus/attachments/20170419/34354707/attachment.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-cosmetics-silk-correct-input-outp...

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Apr 05

...e to the entire encoder. Compared to master, this optimization patch speeds up the fixed-point SILK encoder on NEON as follows: Complexity 5: 6.1% Complexity 6: 5.8% Complexity 8: 5.5% Complexity 10: 4.0% when testing on an Acer Chromebook, ARMv7 Processor rev 3 (v7l), CPU max MHz: 2116.5 Thanks, Linfeng On Wed, Apr 5, 2017 at 11:02 AM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote: > Hi Linfeng, > > Thanks for the updated patch. I'll have a look and get back to you. When > you report speedup percentages, is that relative to the entire encoder > or relative to just that f...

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Apr 05

...The new patch of silk_warped_autocorrelation_FIX_neon() has a code size of 3,228 bytes (with gcc). smaller_slower.c has a code size of 2,304 bytes, but the encoder is about 1.8% - 2.7% slower. smallest_slowest.c has a code size of 1,656 bytes, but the encoder is about 2.3% - 3.6% slower. Thanks, Linfeng On Mon, Apr 3, 2017 at 3:01 PM, Linfeng Zhang <linfengz at google.com> wrote: > Hi Jean-Marc, > > Attached is the silk_warped_autocorrelation_FIX_neon() which implements > your idea. > > Speed improvement vs the previous optimization: > > Complexity 0-4: Doesn't...

celt_inner_prod() and dual_inner_prod() NEON intrinsics

2017 Jun 01

...nable small number or ratio. It's easy to create an input for which 0 and 1,000 are both correct results by just manipulating the inner product order. The total speed gain is about 1.0% for the fixed-point encoder, and 1.8% for the floating-point encoder, at Complexity 8, tested on my Chromebook. Thanks, Linfeng -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.xiph.org/pipermail/opus/attachments/20170601/92c39072/attachment-0001.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: 0005-Clean-celt_pitch_xcorr_float_neon....

Opus floating-point NEON jump table question

2017 Jun 01

...about 14.7% on my Chromebook. Probably it's because many NEON intrinsics optimizations can benefit both fixed-point and floating-point encoder. So if it's safe enough to enable MAY_HAVE_NEON in floating-point by default, it could speed up floating-point NEON encoder a little bit. Thanks, Linfeng On Thu, Jun 1, 2017 at 2:22 PM, Jonathan Lennox <jonathan at vidyo.com> wrote: > > On May 31, 2017, at 12:47 PM, Linfeng Zhang <linfengz at google.com> wrote: > > Hi, > > ./configure --build x86_64-unknown-linux-gnu --host arm-linux-gnueabihf > --disable-assertion...

[PATCH] Fix celt_pitch_xcorr ARM jump table compiling error

2017 Jul 21

Hi, Attached is a fix for an ARM optimization jump table compile error. Thanks, Linfeng Zhang -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.xiph.org/pipermail/opus/attachments/20170720/661d96b5/attachment.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-Fix-celt_pitch_xcorr-ARM-jump-tab...

celt_inner_prod() and dual_inner_prod() NEON intrinsics

2017 Jun 06

Hi Linfeng, On 06/06/17 04:09 PM, Jonathan Lennox wrote: > Two comments on the various infrastructure for RTCD etc. > > 1. The 0002- patch changes the ABI of the celt_pitch_xcorr functions, > but doesn’t change the assembly in celt/arm/celt_pitch_xcorr_arm.s > correspondingly. I suspect the...

[PATCH] Optimize silk_LPC_inverse_pred_gain() for ARM NEON

2017 Feb 07

Hi, Attached is a patch with arm neon optimizations for silk_LPC_inverse_pred_gain(). Please review. Thanks, Linfeng -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.xiph.org/pipermail/opus/attachments/20170207/5c5ab508/attachment-0001.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-Optimize-silk_LPC_inverse_pred_gai...

[PATCH] Optimize silk_LPC_inverse_pred_gain() for ARM NEON

2017 Feb 15

Hi Linfeng, Thanks for the updated patch. Just pushed it to master. One thing that still bothers me a bit is that the if( ( max > 0 ) || ( min < -1 ) ) line is still pretty much untested. By which I mean that if I remove the condition, then the tests (including the new unit tests) still pass. I wasn...

celt_inner_prod() and dual_inner_prod() NEON intrinsics

2017 Jun 06

...s not IEEE 754-compliant here. :) Since x[i] == y[i] in both cases, they are actually calculating the energy. (-1.000134e-22 * -1.000134e-22) is larger than the smallest single-precision number and should be represented as nonzero (such as 0x8). I don't know why NEON gives a 0 result. Thanks, Linfeng On Tue, Jun 6, 2017 at 12:03 AM, Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de> wrote: > >>> Linfeng Zhang <linfengz at google.com> wrote on 06.06.2017 at 06:46 in > message > <CAKoqLCAfj+fDUMLfN4dLNSZ4NNAZpaSt_BWZRp+7XBqfhiSqiQ at mail.gmail.com>: ...
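
A likely explanation for the 0 result, offered as a sketch and not as a diagnosis from the thread: the product is about 1.0e-44, which is below FLT_MIN (about 1.18e-38) and is therefore a subnormal number, and ARMv7 Advanced SIMD (NEON) arithmetic flushes subnormals to zero. The helpers below are illustrative assumptions and not Opus code. They classify the product on an IEEE 754 host and compare two results after emulating flush-to-zero on both.

```c
#include <float.h>
#include <math.h>

/* 1 if x*x is a subnormal float on this host, 0 otherwise.
   volatile forces the multiply to be performed in single precision at run time. */
static int square_is_subnormal(float x) {
    volatile float v = x;
    volatile float sq = v * v;
    return fpclassify(sq) == FP_SUBNORMAL;
}

/* Emulate flush-to-zero: magnitudes below the smallest normal float become 0. */
static float flush_to_zero(float x) {
    return (fabsf(x) < FLT_MIN) ? 0.0f : x;
}

/* Compare a reference result with an optimized one, ignoring differences
   that come only from subnormals being flushed on one side. */
static int equal_after_flush(float ref, float opt) {
    return flush_to_zero(ref) == flush_to_zero(opt);
}
```

On a host that keeps subnormals, `square_is_subnormal(-1.000134e-22f)` returns 1, and `equal_after_flush` treats the C result and a flushed 0 as equal. Whether such a tolerance is acceptable for a test is a separate design decision.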

2017 Apr 24

2 patches related to silk_biquad_alt() optimization

Hi Ulrich, As Jean-Marc recommended, we created the "--enable-check-asm" config option to activate the OPUS_CHECK_ASM macros in the optimization, where the C function is called inside and the results of the C and optimized functions are compared when encoding/decoding real audio files. Thanks, Linfeng On Wed, Apr 19, 2017 at 11:46 PM, Ulrich Windl < Ulrich.Windl at rz.uni-regensburg.de> wrote: > >>> Linfeng Zhang <linfengz at google.com> wrote on 19.04.2017 at 18:29 in > message > <CAKoqLCDX3eCUGbnZFvRzhiCV1Mbo2ksbj8K+pcVu60Dvit7WCQ at mail.gmail.com>: ...
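
The checking scheme described in this snippet can be sketched as follows. Only the macro name OPUS_CHECK_ASM comes from the snippet; how Opus actually uses it is not shown there. The two functions are hypothetical stand-ins (a plain loop and an unrolled loop in place of real NEON code). Inputs are assumed small enough that the 32-bit sum does not overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Reference C implementation (hypothetical stand-in). */
static int32_t inner_prod_c(const int16_t *x, const int16_t *y, int n) {
    int32_t sum = 0;
    for (int i = 0; i < n; i++) sum += (int32_t)x[i] * y[i];
    return sum;
}

/* "Optimized" variant: unrolled by 4 as a stand-in for a NEON version. */
static int32_t inner_prod_opt(const int16_t *x, const int16_t *y, int n) {
    int32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    int i;
    for (i = 0; i + 4 <= n; i += 4) {
        s0 += (int32_t)x[i]     * y[i];
        s1 += (int32_t)x[i + 1] * y[i + 1];
        s2 += (int32_t)x[i + 2] * y[i + 2];
        s3 += (int32_t)x[i + 3] * y[i + 3];
    }
    for (; i < n; i++) s0 += (int32_t)x[i] * y[i];   /* leftover elements */
    int32_t sum = s0 + s1 + s2 + s3;
#ifdef OPUS_CHECK_ASM
    /* Check build: run the C reference on the same input and compare. */
    assert(sum == inner_prod_c(x, y, n));
#endif
    return sum;
}
```

Because the check sits inside the optimized function, every call made while encoding or decoding real audio is verified against the reference, at the cost of running both versions in check builds.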
