thr3ads.net - search: "linfengz"

Displaying 20 results from an estimated 58 matches for "linfengz".

celt_inner_prod() and dual_inner_prod() NEON intrinsics

2017 Jun 06

...loat patch actually bit-exact? If so, then maybe you should be using actual equality. If not, then I guess we need to find the right condition (which isn't obvious for floating point). Cheers, Jean-Marc > Thanks, > Linfeng > > On Mon, Jun 5, 2017 at 12:28 PM, Linfeng Zhang <linfengz at google.com> wrote: > > Hi Jean-Marc, > > I attached the new version in inner_prod_5patches_v2.zip which > synced to the current master. > > For fixed-point ARM, only > 0003-Optimize-fixed-point-celt_inne...
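
Jean-Marc's question about exact equality for the floating-point patch comes down to summation order. A SIMD inner product usually keeps several partial sums and combines them at the end, so it rounds differently from a single left-to-right loop. The sketch below illustrates this in plain C (no NEON intrinsics). The function names are hypothetical, this is not the libopus code, and the dual variant is only assumed to mirror what dual_inner_prod() computes: two dot products against the same x in one pass.

```c
#include <assert.h>

/* Scalar reference: one accumulator, summed strictly left to right. */
static float inner_prod_scalar(const float *x, const float *y, int n)
{
    float sum = 0.0f;
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Plain-C emulation of a 4-lane SIMD kernel: four partial sums that are
   reduced pairwise at the end. Same products, different rounding order. */
static float inner_prod_4lane(const float *x, const float *y, int n)
{
    float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    int i = 0;
    assert(n >= 0);
    for (; i + 4 <= n; i += 4)
        for (int l = 0; l < 4; l++)
            acc[l] += x[i + l] * y[i + l];
    float sum = (acc[0] + acc[1]) + (acc[2] + acc[3]);
    for (; i < n; i++)              /* leftover elements */
        sum += x[i] * y[i];
    return sum;
}

/* Two dot products against the same x in a single pass (hypothetical
   stand-in for a dual_inner_prod()-style function). */
static void dual_inner_prod_sketch(const float *x, const float *y01,
                                   const float *y02, int n,
                                   float *xy1, float *xy2)
{
    float s1 = 0.0f, s2 = 0.0f;
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        s1 += x[i] * y01[i];
        s2 += x[i] * y02[i];
    }
    *xy1 = s1;
    *xy2 = s2;
}
```

On a target that evaluates float arithmetic in single precision (FLT_EVAL_METHOD == 0, e.g. x86-64 with SSE), the input {1e8, 1, -1e8, 1} against all ones gives 1.0f from the scalar loop and 0.0f from the 4-lane version, so a bit-exact check would reject one of two equally reasonable implementations.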

Opus floating-point NEON jump table question

2017 Jun 01

...if it's safe enough to enable MAY_HAVE_NEON in floating-point by default, it could speed up floating-point NEON encoder a little bit. Thanks, Linfeng On Thu, Jun 1, 2017 at 2:22 PM, Jonathan Lennox <jonathan at vidyo.com> wrote: > > On May 31, 2017, at 12:47 PM, Linfeng Zhang <linfengz at google.com> wrote: > > Hi, > > ./configure --build x86_64-unknown-linux-gnu --host arm-linux-gnueabihf > --disable-assertions --disable-check-asm --enable-intrinsics CFLAGS=-O3 > --disable-shared > > When configuring with floating-point and intrinsics enabled as above,...

Antw: Re: celt_inner_prod() and dual_inner_prod() NEON intrinsics

2017 Jun 06

>>> Linfeng Zhang <linfengz at google.com> wrote on 06.06.2017 at 06:46 in message <CAKoqLCAfj+fDUMLfN4dLNSZ4NNAZpaSt_BWZRp+7XBqfhiSqiQ at mail.gmail.com>: > Hi Jean-Marc, > > I tried "==" before, and it failed when both results are 0.0. Maybe the > exponent or sign has difference because o...
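
A side note on the comparison itself. Under IEEE 754, `+0.0f == -0.0f` is true, so a sign difference between two zeros should not by itself make `==` fail. A bitwise comparison such as `memcmp` would fail on it. Also, `printf("%f")` prints any sufficiently small non-zero value as 0.000000, so two results that both look like 0.0 can still differ. The hypothetical helpers below expose the raw bits so such cases can be told apart.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "expects 32-bit float");

/* Copy the raw IEEE 754 bits of a float (memcpy avoids aliasing issues). */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Bitwise equality: stricter than ==, e.g. it tells +0.0f from -0.0f. */
static int same_bits(float a, float b)
{
    return float_bits(a) == float_bits(b);
}
```

Printing `float_bits()` with `%08x` next to the `%f` output shows whether two "0.0" results really are the same value.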

Opus floating-point NEON jump table question

2017 Jun 01

...is actually that silk/arm/arm_silk_map.c uses the MAY_HAVE_NEON macro, which it shouldn’t be using. If that file were changed so that the jump tables just listed the _neon versions of the functions directly, you’d get the speedup you’re looking for. On Jun 1, 2017, at 6:03 PM, Linfeng Zhang <linfengz at google.com> wrote: Thanks, Jean-Marc and Jonathan! I tested current OPUS encoder in floating-point with Complexity 8. Hacking using the attached patch (which will generate "#define OPUS_ARM_MAY_HAVE_NEON 1" in config.h) will speed up about 14.7%...
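
The fix Jonathan describes can be pictured with a generic function-pointer dispatch table. In this sketch every name (the HYPO_* macros, the arch enum, the kernels) is a hypothetical stand-in rather than a libopus symbol, and the "NEON" kernel is plain C. It mimics the reported config.h, where only the intrinsics macro is defined: a table gated on the "may have NEON" macro quietly falls back to C, while a table gated on the macro that says the kernel was compiled lists it directly.

```c
#include <assert.h>

/* Hypothetical arch levels for a run-time dispatch table. */
enum { ARCH_C = 0, ARCH_NEON = 1, ARCH_COUNT = 2 };

/* Mimic the reported config: intrinsics compiled in, MAY_HAVE_NEON unset. */
#define HYPO_MAY_HAVE_NEON_INTR 1
/* #undef HYPO_MAY_HAVE_NEON */

typedef int (*sum_fn)(const int *x, int n);

/* Portable C reference. */
static int sum_c(const int *x, int n)
{
    int s = 0;
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Stand-in for an intrinsics kernel: 4 partial sums plus a tail. */
static int sum_neon(const int *x, int n)
{
    int acc[4] = {0, 0, 0, 0};
    int i = 0;
    assert(n >= 0);
    for (; i + 4 <= n; i += 4)
        for (int l = 0; l < 4; l++)
            acc[l] += x[i + l];
    int s = acc[0] + acc[1] + acc[2] + acc[3];
    for (; i < n; i++)
        s += x[i];
    return s;
}

/* Gated on the "may have NEON" macro: the NEON slot silently falls back
   to C because that macro is undefined in this configuration. */
static const sum_fn SUM_TABLE_GATED[ARCH_COUNT] = {
    sum_c,
#if defined(HYPO_MAY_HAVE_NEON)
    sum_neon,
#else
    sum_c,
#endif
};

/* Lists the NEON kernel whenever it was compiled at all. */
static const sum_fn SUM_TABLE_DIRECT[ARCH_COUNT] = {
    sum_c,
#if defined(HYPO_MAY_HAVE_NEON_INTR)
    sum_neon,
#else
    sum_c,
#endif
};

/* Dispatch through a table using a run-time arch index. */
static int sum_dispatch(const sum_fn *table, int arch, const int *x, int n)
{
    assert(arch >= 0 && arch < ARCH_COUNT);
    return table[arch](x, n);
}
```

Both tables return the same sums. The only difference is which function the NEON slot points to, which is why the problem shows up as a speed difference rather than a wrong result.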

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Apr 05

...2,304 bytes, but the encoder is > > about 1.8% - 2.7% slower. > > smallest_slowest.c has a code size of 1,656 bytes, but the encoder is > > about 2.3% - 3.6% slower. > > > > Thanks, > > Linfeng > > > > On Mon, Apr 3, 2017 at 3:01 PM, Linfeng Zhang <linfengz at google.com> wrote: > > > > Hi Jean-Marc, > > > > Attached is the silk_warped_autocorrelation_FIX_neon() which > > implements your idea. > > > > Speed improvement vs the previous optimizat...

Opus floating-point NEON jump table question

2017 May 31

Hi, ./configure --build x86_64-unknown-linux-gnu --host arm-linux-gnueabihf --disable-assertions --disable-check-asm --enable-intrinsics CFLAGS=-O3 --disable-shared When configuring with floating-point and intrinsics enabled as above, the generated config.h only has OPUS_ARM_MAY_HAVE_NEON_INTR defined (to 1), with /* #undef OPUS_ARM_ASM */ /* #undef OPUS_ARM_INLINE_ASM */ /* #undef

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Apr 06

...> > about 1.8% - 2.7% slower. > > smallest_slowest.c has a code size of 1,656 bytes, but the encoder is > > about 2.3% - 3.6% slower. > > > > Thanks, > > Linfeng > > > > On Mon, Apr 3, 2017 at 3:01 PM, Linfeng Zhang <linfengz at google.com> wrote: > > > > Hi Jean-Marc, > > > > Attached is the silk_warped_autocorrelation_FIX_neon() which >...

2 patches related to silk_biquad_alt() optimization

2017 Apr 24

...where the C function is called inside and the results of C and optimization functions are compared when encoding/decoding the real audio files. Thanks, Linfeng On Wed, Apr 19, 2017 at 11:46 PM, Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de> wrote: > >>> Linfeng Zhang <linfengz at google.com> wrote on 19.04.2017 at 18:29 in > message > <CAKoqLCDX3eCUGbnZFvRzhiCV1Mbo2ksbj8K+pcVu60Dvit7WCQ at mail.gmail.com>: > > Hi, > > > > Attached are 2 patches related to silk_biquad_alt() optimization. Please > > review. > > Out of curios...

2 patches related to silk_biquad_alt() optimization

2017 May 15

...elying on 64-bit multiplication results, then we could consider having a special option to enable those (even in C). Cheers, Jean-Marc On 08/05/17 12:12 PM, Linfeng Zhang wrote: > Ping for comments. > > Thanks, > Linfeng > > On Wed, Apr 26, 2017 at 2:15 PM, Linfeng Zhang <linfengz at google.com> wrote: > > On Tue, Apr 25, 2017 at 10:31 PM, Jean-Marc Valin > <jmvalin at jmvalin.ca> wrote: > > > > A_Q28 is split to 2 14-bit (or 16-bit, whatever) inte...
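
Splitting a Q28 coefficient into two narrower pieces lets each product fit in 32 bits without losing anything. The sketch below makes several assumptions: hypothetical names, a 16-bit input, floor rounding and |A| < 2.0 in Q28. It checks that the split version matches a 64-bit product exactly. It is not the silk_biquad_alt() code, which uses SILK's own multiply macros and rounding.

```c
#include <assert.h>
#include <stdint.h>

/* Split a Q28 coefficient so that a_q28 == hi * 2^14 + lo, 0 <= lo < 2^14.
   The exact division avoids right-shifting a negative value here. */
static void split_q28(int32_t a_q28, int32_t *hi, int32_t *lo)
{
    *lo = a_q28 & 0x3FFF;
    *hi = (a_q28 - *lo) / (1 << 14);
}

/* Reference: floor(x * a / 2^28) from one 64-bit product.
   Assumes >> on negative values is arithmetic (true for GCC and Clang). */
static int32_t mul_q28_ref(int16_t x, int32_t a_q28)
{
    return (int32_t)(((int64_t)x * a_q28) >> 28);
}

/* Same result using only 32-bit products, valid for |a| < 2.0 in Q28. */
static int32_t mul_q28_split(int16_t x, int32_t a_q28)
{
    int32_t hi, lo;
    assert(a_q28 >= -(1 << 29) && a_q28 < (1 << 29));
    split_q28(a_q28, &hi, &lo);
    int32_t p_hi = (int32_t)x * hi;     /* |p_hi| <= 2^30 */
    int32_t p_lo = (int32_t)x * lo;     /* |p_lo| <  2^29 */
    return (p_hi + (p_lo >> 14)) >> 14; /* arithmetic shifts assumed */
}
```

The split is exact because x * A = (x * hi) * 2^14 + x * lo, and flooring the low product before the final shift does not change the floored result.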

[PATCH] cosmetics,silk: correct input/output arg comments

2017 Apr 19

[PATCH] cosmetics,silk: correct input/output arg comments

Hi, Attached is a patch for cosmetic purposes. Please review. Thanks, Linfeng Zhang -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-cosmetics-silk-correct-input-output-arg-comments.patch

celt_inner_prod() and dual_inner_prod() NEON intrinsics

2017 Jun 06

...e no further issues with your patches, so once you address the two issues Jonathan pointed out, I'll be able to merge them. Cheers, Jean-Marc > > Out of curiosity, what’s the CPU in the Chromebook you’re using to > test? > >> On Jun 1, 2017, at 6:33 PM, Linfeng Zhang <linfengz at google.com> >> wrote: >> >> Hi, >> >> Attached are 5 patches related to celt_inner_prod() and >> dual_inner_prod() NEON intrinsics optimization. >> >> In 0004-Optimize-floating-point-celt_inner_prod-and-dual_inn.patch, >> the optimizati...

Antw: Re: 2 patches related to silk_biquad_alt() optimization

2017 Apr 25

>>> Linfeng Zhang <linfengz at google.com> wrote on 25.04.2017 at 01:52 in message <CAKoqLCDvAk7eeS-gpmqSHVxp4t-Lzzw7TLo5rRo=Ey_Q==cxGg at mail.gmail.com>: > Hi Ulrich, > > As Jean-Marc recommended, we created "--enable-check-asm" config option to > activate OPUS_CHECK_ASM macros in the optim...
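
The idea behind the check option, as these messages describe it, is to call the C function alongside the optimized one and compare the results. A generic sketch of that pattern follows. The kernels, CHECK_MAX_N and checked_run() are all hypothetical and do not reproduce how OPUS_CHECK_ASM is actually spelled in libopus.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CHECK_MAX_N 256   /* hypothetical scratch-buffer limit */

typedef void (*kernel_fn)(const int16_t *in, int32_t *out, int n);

/* Hypothetical C reference: out[i] = 3 * in[i] + in[i - 1]. */
static void kernel_c(const int16_t *in, int32_t *out, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = 3 * (int32_t)in[i] + (i > 0 ? in[i - 1] : 0);
}

/* Stand-in for an optimized version: same math, first element peeled. */
static void kernel_opt(const int16_t *in, int32_t *out, int n)
{
    if (n <= 0)
        return;
    out[0] = 3 * (int32_t)in[0];
    for (int i = 1; i < n; i++)
        out[i] = 3 * (int32_t)in[i] + in[i - 1];
}

/* Deliberately wrong variant (drops the in[i - 1] term) to show the
   check firing. */
static void kernel_opt_buggy(const int16_t *in, int32_t *out, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = 3 * (int32_t)in[i];
}

/* Run the optimized kernel, run the C reference on the same input, and
   return 1 if the outputs are bit-identical, 0 otherwise. A real
   harness would typically abort on a mismatch instead. */
static int checked_run(kernel_fn ref, kernel_fn opt,
                       const int16_t *in, int32_t *out, int n)
{
    int32_t expected[CHECK_MAX_N];
    assert(n >= 0 && n <= CHECK_MAX_N);
    opt(in, out, n);
    ref(in, expected, n);
    return memcmp(out, expected, (size_t)n * sizeof *out) == 0;
}
```

For integer kernels a bitwise comparison is the natural criterion. For floating-point kernels the same harness runs into the summation-order issue discussed in the inner-product threads.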

Opus floating-point NEON jump table question

2017 Jun 01

On May 31, 2017, at 12:47 PM, Linfeng Zhang <linfengz at google.com> wrote: Hi, ./configure --build x86_64-unknown-linux-gnu --host arm-linux-gnueabihf --disable-assertions --disable-check-asm --enable-intrinsics CFLAGS=-O3 --disable-shared When configuring with floating-point and intrinsics enabled as above,...

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Apr 05

...as a code size of 3,228 bytes (with gcc). smaller_slower.c has a code size of 2,304 bytes, but the encoder is about 1.8% - 2.7% slower. smallest_slowest.c has a code size of 1,656 bytes, but the encoder is about 2.3% - 3.6% slower. Thanks, Linfeng On Mon, Apr 3, 2017 at 3:01 PM, Linfeng Zhang <linfengz at google.com> wrote: > Hi Jean-Marc, > > Attached is the silk_warped_autocorrelation_FIX_neon() which implements > your idea. > > Speed improvement vs the previous optimization: > > Complexity 0-4: Doesn't call this function. Complexity 5: 2.1% (order = > 16) Com...
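
For context on the function being optimized: a warped autocorrelation replaces the unit delays of an ordinary autocorrelation with a chain of first-order allpass sections controlled by a warping coefficient. The floating-point sketch below shows that structure under stated assumptions: an even order, a made-up SKETCH_MAX_ORDER bound and a hypothetical name. It is not the fixed-point silk_warped_autocorrelation_FIX() or its NEON version. With warping = 0 it reduces to the plain autocorrelation.

```c
#include <assert.h>

#define SKETCH_MAX_ORDER 24   /* hypothetical upper bound on the order */

/* Warped autocorrelation via a chain of first-order allpass sections.
   corr must hold order + 1 values; order must be even because two
   sections are processed per inner iteration. */
static void warped_autocorr_sketch(float *corr, const float *input,
                                   float warping, int length, int order)
{
    double state[SKETCH_MAX_ORDER + 1] = {0};
    double c[SKETCH_MAX_ORDER + 1] = {0};
    assert(order >= 0 && order <= SKETCH_MAX_ORDER && (order & 1) == 0);

    for (int n = 0; n < length; n++) {
        double tmp1 = input[n], tmp2;
        for (int i = 0; i < order; i += 2) {
            /* Output of allpass section i. */
            tmp2 = state[i] + warping * (state[i + 1] - tmp1);
            state[i] = tmp1;
            c[i] += state[0] * tmp1;
            /* Output of allpass section i + 1. */
            tmp1 = state[i + 1] + warping * (state[i + 2] - tmp2);
            state[i + 1] = tmp2;
            c[i + 1] += state[0] * tmp2;
        }
        state[order] = tmp1;
        c[order] += state[0] * tmp1;
    }
    for (int i = 0; i <= order; i++)
        corr[i] = (float)c[i];
}
```

For {1, 2, 3, 4} with order 2 and warping 0 it gives {30, 20, 11}, the ordinary lags 0 to 2. Each sample depends on the allpass state left by the previous one, so the outer loop is sequential. The inner loop over sections runs `order` times per sample (16 in the complexity-5 case quoted above).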

celt_inner_prod() and dual_inner_prod() NEON intrinsics

2017 Jun 06

...larger than the smallest single-precision number and should be represented as non-zero (such as 0x8). I don't know why NEON gives 0 result. Thanks, Linfeng On Tue, Jun 6, 2017 at 12:03 AM, Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de> wrote: > >>> Linfeng Zhang <linfengz at google.com> wrote on 06.06.2017 at 06:46 in > message > <CAKoqLCAfj+fDUMLfN4dLNSZ4NNAZpaSt_BWZRp+7XBqfhiSqiQ at mail.gmail.com>: > > Hi Jean-Marc, > > > > I tried "==" before, and it failed when both results are 0.0. Maybe the > > exponent or...
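
One plausible explanation, offered as an assumption to verify rather than a confirmed diagnosis: on ARMv7, NEON single-precision arithmetic always runs in flush-to-zero mode. A subnormal result such as the bit pattern 0x00000008 (8 * 2^-149) therefore comes out as 0, while a scalar C build on the same chip may keep it. The hypothetical helpers below build that value from its bits and emulate flush-to-zero, so the difference can be reproduced on any host.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "expects 32-bit float");

/* Build a float from its raw IEEE 754 bit pattern. */
static float float_from_bits(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Emulate flush-to-zero: a subnormal value becomes a zero of the same
   sign; every other value passes through unchanged. */
static float flush_to_zero(float f)
{
    if (fpclassify(f) == FP_SUBNORMAL)
        return signbit(f) ? -0.0f : 0.0f;
    return f;
}
```

Applying `flush_to_zero()` to the C reference output before comparing is one way to test this hypothesis. If the mismatches disappear, the difference is denormal handling rather than a bug in the kernel.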

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Apr 05

...smaller_slower.c has a code size of 2,304 bytes, but the encoder is > about 1.8% - 2.7% slower. > smallest_slowest.c has a code size of 1,656 bytes, but the encoder is > about 2.3% - 3.6% slower. > > Thanks, > Linfeng > > On Mon, Apr 3, 2017 at 3:01 PM, Linfeng Zhang <linfengz at google.com> wrote: > > Hi Jean-Marc, > > Attached is the silk_warped_autocorrelation_FIX_neon() which > implements your idea. > > Speed improvement vs the previous optimization: > > Complexity 0-4: D...

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Apr 11

Hi Jean-Marc, Thanks for your suggestions! I attached the new patch, with inlined reply below. Thanks, Linfeng On Thu, Apr 6, 2017 at 12:55 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote: > I did some profiling on a Cortex A57 and I've been seeing slightly less > improvement than you're reporting, more like 3.5% at complexity 8. It > appears that the warped

celt_inner_prod() and dual_inner_prod() NEON intrinsics

2017 Jun 06

...ld be using actual equality. If not, then I guess we need to find > the right condition (which isn't obvious for floating point). > > Cheers, > > Jean-Marc > > > > Thanks, > > Linfeng > > > > On Mon, Jun 5, 2017 at 12:28 PM, Linfeng Zhang <linfengz at google.com> wrote: > > > > Hi Jean-Marc, > > > > I attached the new version in inner_prod_5patches_v2.zip which > > synced to the current master. > > > > For fixed-point ARM, only > ...

2 patches related to silk_biquad_alt() optimization

2017 Apr 19

Hi, Attached are 2 patches related to silk_biquad_alt() optimization. Please review. Thanks, Linfeng Zhang -------------- next part -------------- A non-text attachment was scrubbed... Name:

Opus floating-point NEON jump table question

2017 Jun 02

...rm_silk_map.c uses > the MAY_HAVE_NEON macro, which it shouldn’t be using. If that file were > changed so that the jump tables just listed the _neon versions of the > functions directly, you’d get the speedup you’re looking for. > > > On Jun 1, 2017, at 6:03 PM, Linfeng Zhang <linfengz at google.com> wrote: > > Thanks, Jean-Marc and Jonathan! > > I tested current OPUS encoder in floating-point with Complexity 8. Hacking > using the attached patch (which will generate "#define > OPUS_ARM_MAY_HAVE_NEON 1" in config.h) will speed up about 14.7% on my >...
