Displaying 20 results from an estimated 400 matches similar to: "2 patches related to silk_biquad_alt() optimization"
2017 Apr 24 (2 replies) - 2 patches related to silk_biquad_alt() optimization
Hi Ulrich,
As Jean-Marc recommended, we created the "--enable-check-asm" config option to
activate the OPUS_CHECK_ASM macros in the optimization, where the C function is
called inside and the results of the C and optimized functions are compared
when encoding/decoding real audio files.
Thanks,
Linfeng
On Wed, Apr 19, 2017 at 11:46 PM, Ulrich Windl <
Ulrich.Windl at
2017 Apr 25 (0 replies) - Antw: Re: 2 patches related to silk_biquad_alt() optimization
>>> Linfeng Zhang <linfengz at google.com> wrote on 25.04.2017 at 01:52 in message
<CAKoqLCDvAk7eeS-gpmqSHVxp4t-Lzzw7TLo5rRo=Ey_Q==cxGg at mail.gmail.com>:
> Hi Ulrich,
>
> As Jean-Marc recommended, we created the "--enable-check-asm" config option to
> activate the OPUS_CHECK_ASM macros in the optimization, where the C function is
> called inside and the
2017 Apr 20 (0 replies) - Antw: 2 patches related to silk_biquad_alt() optimization
>>> Linfeng Zhang <linfengz at google.com> wrote on 19.04.2017 at 18:29 in message
<CAKoqLCDX3eCUGbnZFvRzhiCV1Mbo2ksbj8K+pcVu60Dvit7WCQ at mail.gmail.com>:
> Hi,
>
> Attached are 2 patches related to silk_biquad_alt() optimization. Please
> review.
Out of curiosity: how do you test that "the optimization is bit exact with the C function"? Using one example,
2017 Apr 25 (2 replies) - 2 patches related to silk_biquad_alt() optimization
Hi Jean-Marc,
Tested on my Chromebook: when stride (channel) == 1, the optimization has
no gain compared with the C function.
When stride (channel) == 2, the optimization is 1.2%-1.8% faster (1.6% at
Complexity 8) compared with the C function.
Please let me know and I can remove the optimization for the stride 1 case.
If it's allowed to skip the split of A_Q28 and replace it with 32-bit
multiplication
2017 Apr 19 (3 replies) - [PATCH] cosmetics,silk: correct input/output arg comments
Hi,
Attached is a patch for cosmetic purposes. Please review.
Thanks,
Linfeng Zhang
[Attachment: 0001-cosmetics-silk-correct-input-output-arg-comments.patch]
2017 May 15 (2 replies) - 2 patches related to silk_biquad_alt() optimization
Hi Linfeng,
Sorry for the delay -- I was actually trying to think of the best option
here. For now, my preference would be to keep things bit-exact, but
should there be more similar optimizations relying on 64-bit
multiplication results, then we could consider having a special option
to enable those (even in C).
Cheers,
Jean-Marc
On 08/05/17 12:12 PM, Linfeng Zhang wrote:
> Ping for
2017 Apr 26 (2 replies) - 2 patches related to silk_biquad_alt() optimization
On Tue, Apr 25, 2017 at 10:31 PM, Jean-Marc Valin <jmvalin at jmvalin.ca>
wrote:
>
> > A_Q28 is split into two 14-bit (or 16-bit, whatever) integers, to keep the
> > multiplication operation within 32 bits. NEON can do 32-bit x 32-bit =
> > 64-bit using 'int64x2_t vmull_s32(int32x2_t a, int32x2_t b)', and it
> > could possibly be faster and less
2017 Apr 25 (0 replies) - 2 patches related to silk_biquad_alt() optimization
On 24/04/17 08:03 PM, Linfeng Zhang wrote:
> Tested on my chromebook, when stride (channel) == 1, the optimization
> has no gain compared with C function.
You mean that the Neon code is the same speed as the C code for
stride==1? This is not terribly surprising for an IIR filter.
> When stride (channel) == 2, the optimization is 1.2%-1.8% faster (1.6%
> at Complexity 8) compared
2017 Apr 20 (0 replies) - 2 patches related to silk_biquad_alt() optimization
Hi Linfeng,
Thanks for the patches. I'll have a look and get back to you. What kind
of speedup are you getting for these functions? On what command line?
Cheers,
Jean-Marc
On 19/04/17 12:29 PM, Linfeng Zhang wrote:
> Hi,
>
> Attached are 2 patches related to silk_biquad_alt() optimization. Please
> review.
>
> Thanks,
> Linfeng Zhang
2017 May 17 (0 replies) - 2 patches related to silk_biquad_alt() optimization
Hi Jean-Marc,
Thanks!
Please find the 2 updated patches, which only optimize the stride 2 case and
keep bit-exactness. They have passed our internal tests as usual.
Thanks,
Linfeng
On Mon, May 15, 2017 at 9:36 AM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote:
> Hi Linfeng,
>
> Sorry for the delay -- I was actually trying to think of the best option
> here. For now, my
2017 Apr 25 (2 replies) - 2 patches related to silk_biquad_alt() optimization
On Mon, Apr 24, 2017 at 5:52 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote:
> On 24/04/17 08:03 PM, Linfeng Zhang wrote:
> > Tested on my chromebook, when stride (channel) == 1, the optimization
> > has no gain compared with C function.
>
> You mean that the Neon code is the same speed as the C code for
> stride==1? This is not terribly surprising for an IIR
2017 Jun 06 (3 replies) - celt_inner_prod() and dual_inner_prod() NEON intrinsics
Hi Linfeng,
On 05/06/17 03:31 PM, Linfeng Zhang wrote:
> Yes we'll have one more patch set related to xcorr in next week. Please
> don't wait if it's too late for 1.2 release.
Assuming there's no issue with the patches, next week isn't too late.
Also, I've started looking at your patches. So far there's one thing
that puzzles me a bit. In the OPUS_CHECK_ASM
2017 Jun 06 (4 replies) - Antw: Re: celt_inner_prod() and dual_inner_prod() NEON intrinsics
>>> Linfeng Zhang <linfengz at google.com> wrote on 06.06.2017 at 06:46 in message
<CAKoqLCAfj+fDUMLfN4dLNSZ4NNAZpaSt_BWZRp+7XBqfhiSqiQ at mail.gmail.com>:
> Hi Jean-Marc,
>
> I tried "==" before, and it failed when both results are 0.0. Maybe the
> exponent or sign differs because of the different 0.0 representation
> in NEON. If anybody
2017 Jun 01 (2 replies) - Opus floating-point NEON jump table question
Thanks Jean-Marc and Jonathan!
I tested the current Opus encoder in floating-point with Complexity 8. Hacking
with the attached patch (which will generate "#define
OPUS_ARM_MAY_HAVE_NEON 1" in config.h) speeds it up by about 14.7% on my
Chromebook. Probably that's because many NEON intrinsics optimizations can
benefit both the fixed-point and floating-point encoders.
So if it's safe enough
2017 May 31 (4 replies) - Opus floating-point NEON jump table question
Hi,
./configure --build x86_64-unknown-linux-gnu --host arm-linux-gnueabihf
--disable-assertions --disable-check-asm --enable-intrinsics CFLAGS=-O3
--disable-shared
When configuring with floating-point and intrinsics enabled as above, the
generated config.h only has OPUS_ARM_MAY_HAVE_NEON_INTR defined (to 1), with
/* #undef OPUS_ARM_ASM */
/* #undef OPUS_ARM_INLINE_ASM */
/* #undef
2017 Apr 05 (4 replies) - [PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Thanks, Jean-Marc!
The speedup percentages are all relative to the entire encoder.
Compared to master, this optimization patch speeds up the fixed-point SILK
encoder on NEON as follows: Complexity 5: 6.1%, Complexity 6: 5.8%,
Complexity 8: 5.5%, Complexity 10: 4.0%,
when testing on an Acer Chromebook, ARMv7 Processor rev 3 (v7l), CPU max
MHz: 2116.5.
Thanks,
Linfeng
On Wed, Apr 5, 2017 at 11:02 AM,
2017 May 08 (0 replies) - 2 patches related to silk_biquad_alt() optimization
Ping for comments.
Thanks,
Linfeng
On Wed, Apr 26, 2017 at 2:15 PM, Linfeng Zhang <linfengz at google.com> wrote:
> On Tue, Apr 25, 2017 at 10:31 PM, Jean-Marc Valin <jmvalin at jmvalin.ca>
> wrote:
>
>>
>> > A_Q28 is split into two 14-bit (or 16-bit, whatever) integers, to keep the
>> > multiplication operation within 32 bits. NEON can do 32-bit x
2017 Jun 05 (4 replies) - celt_inner_prod() and dual_inner_prod() NEON intrinsics
Hi Jean-Marc,
I attached the new version in inner_prod_5patches_v2.zip, which is synced to
the current master.
For fixed-point ARM, only 0003-Optimize-fixed-point-celt_inner_prod-and-dual_inner_.patch
changes the performance.
For floating-point ARM, only 0004-Optimize-floating-point-celt_inner_prod-and-dual_inn.patch
changes the performance.
Patch 1 and 2 are code clean-up and can only affect x86
2017 Jun 06 (2 replies) - celt_inner_prod() and dual_inner_prod() NEON intrinsics
Hi Linfeng,
On 06/06/17 04:09 PM, Jonathan Lennox wrote:
> Two comments on the various infrastructure for RTCD etc.
>
> 1. The 0002- patch changes the ABI of the celt_pitch_xcorr functions,
> but doesn’t change the assembly in celt/arm/celt_pitch_xcorr_arm.s
> correspondingly. I suspect the ‘arch’ parameter can just be ignored
> by the assembly functions, but at least the
2017 Apr 26 (0 replies) - 2 patches related to silk_biquad_alt() optimization
On 25/04/17 01:37 PM, Linfeng Zhang wrote:
> Is that gain due to Neon or simply due to computing two channels in
> parallel? For example, if you make a special case in the C code to
> handle both channels in the same loop, what kind of performance do
> you get?
>
>
> Tested Complexity 8, it's about half and half, i.e., 0.8% faster if handling both
> channels in