search for: linfengz

Displaying 20 results from an estimated 58 matches for "linfengz".

2017 Jun 06
3
celt_inner_prod() and dual_inner_prod() NEON intrinsics
...loat patch actually bit-exact? If so, then maybe you should be using actual equality. If not, then I guess we need to find the right condition (which isn't obvious for floating point). Cheers, Jean-Marc > Thanks, > Linfeng > > On Mon, Jun 5, 2017 at 12:28 PM, Linfeng Zhang <linfengz at google.com > <mailto:linfengz at google.com>> wrote: > > Hi Jean-Marc, > > I attached the new version in inner_prod_5patches_v2.zip which > synced to the current master. > > For fixed-point ARM, only > 0003-Optimize-fixed-point-celt_inne...
2017 Jun 01
2
Opus floating-point NEON jump table question
...if it's safe enough to enable MAY_HAVE_NEON in floating-point by default, it could speed up floating-point NEON encoder a little bit. Thanks, Linfeng On Thu, Jun 1, 2017 at 2:22 PM, Jonathan Lennox <jonathan at vidyo.com> wrote: > > On May 31, 2017, at 12:47 PM, Linfeng Zhang <linfengz at google.com> wrote: > > Hi, > > ./configure --build x86_64-unknown-linux-gnu --host arm-linux-gnueabihf > --disable-assertions --disable-check-asm --enable-intrinsics CFLAGS=-O3 > --disable-shared > > When configuring with floating-point and intrinsics enabled as ab...
2017 Jun 06
4
Antw: Re: celt_inner_prod() and dual_inner_prod() NEON intrinsics
>>> Linfeng Zhang <linfengz at google.com> wrote on 06.06.2017 at 06:46 in message <CAKoqLCAfj+fDUMLfN4dLNSZ4NNAZpaSt_BWZRp+7XBqfhiSqiQ at mail.gmail.com>: > Hi Jean-Marc, > > I tried "==" before, and it failed when both results are 0.0. Maybe the > exponent or sign has difference because o...
2017 Jun 01
0
Opus floating-point NEON jump table question
...ally that silk/arm/arm_silk_map.c uses the MAY_HAVE_NEON macro, which it shouldn’t be using. If that file were changed so that the jump tables just listed the _neon versions of the functions directly, you’d get the speedup you’re looking for. On Jun 1, 2017, at 6:03 PM, Linfeng Zhang <linfengz at google.com<mailto:linfengz at google.com>> wrote: Thank Jean-Mark and Jonathan! I tested current OPUS encoder in floating-point with Complexity 8. Hacking using the attached patch (which will generate "#define OPUS_ARM_MAY_HAVE_NEON 1" in config.h) will speed up about 14.7%...
2017 Apr 05
4
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...2,304 bytes, but the encoder is > > about 1.8% - 2.7% slower. > > smallest_slowest.c has a code size of 1,656 bytes, but the encoder is > > about 2.3% - 3.6% slower. > > > > Thanks, > > Linfeng > > > > On Mon, Apr 3, 2017 at 3:01 PM, Linfeng Zhang <linfengz at google.com > > <mailto:linfengz at google.com>> wrote: > > > > Hi Jean-Marc, > > > > Attached is the silk_warped_autocorrelation_FIX_neon() which > > implements your idea. > > > > Speed improvement vs the previous optimizat...
2017 May 31
4
Opus floating-point NEON jump table question
Hi, ./configure --build x86_64-unknown-linux-gnu --host arm-linux-gnueabihf --disable-assertions --disable-check-asm --enable-intrinsics CFLAGS=-O3 --disable-shared When configuring with floating-point and intrinsics enabled as above, the generated config.h only has OPUS_ARM_MAY_HAVE_NEON_INTR defined (to 1), with /* #undef OPUS_ARM_ASM */ /* #undef OPUS_ARM_INLINE_ASM */
2017 Apr 06
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...> > about 1.8% - 2.7% slower. > > smallest_slowest.c has a code size of 1,656 bytes, but the encoder is > > about 2.3% - 3.6% slower. > > > > Thanks, > > Linfeng > > > > On Mon, Apr 3, 2017 at 3:01 PM, Linfeng Zhang <linfengz at google.com <mailto:linfengz at google.com> > > <mailto:linfengz at google.com <mailto:linfengz at google.com>>> wrote: > > > > Hi Jean-Marc, > > > > Attached is the silk_warped_autocorrelation_FIX_neon() which >...
2017 Apr 24
2
2 patches related to silk_biquad_alt() optimization
...re the C function is called inside and the results of C and optimization functions are compared when encoding/decoding the real audio files. Thanks, Linfeng On Wed, Apr 19, 2017 at 11:46 PM, Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de> wrote: > >>> Linfeng Zhang <linfengz at google.com> wrote on 19.04.2017 at 18:29 in > message > <CAKoqLCDX3eCUGbnZFvRzhiCV1Mbo2ksbj8K+pcVu60Dvit7WCQ at mail.gmail.com>: > > Hi, > > > > Attached are 2 patches related to silk_biquad_alt() optimization. Please > > review. > > Out of curios...
2017 May 15
2
2 patches related to silk_biquad_alt() optimization
...n 64-bit multiplication results, then we could consider having a special option to enable those (even in C). Cheers, Jean-Marc On 08/05/17 12:12 PM, Linfeng Zhang wrote: > Ping for comments. > > Thanks, > Linfeng > > On Wed, Apr 26, 2017 at 2:15 PM, Linfeng Zhang <linfengz at google.com > <mailto:linfengz at google.com>> wrote: > > On Tue, Apr 25, 2017 at 10:31 PM, Jean-Marc Valin > <jmvalin at jmvalin.ca <mailto:jmvalin at jmvalin.ca>> wrote: > > > > A_Q28 is split to 2 14-bit (or 16-bit, whatever) inte...
2017 Apr 19
3
[PATCH] cosmetics,silk: correct input/output arg comments
Hi, Attached is a patch for cosmetics purpose. Please review. Thanks, Linfeng Zhang -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.xiph.org/pipermail/opus/attachments/20170419/34354707/attachment.html> -------------- next part -------------- A non-text attachment was scrubbed... Name:
2017 Jun 06
2
celt_inner_prod() and dual_inner_prod() NEON intrinsics
...e no further issues with your patches, so once you address the two issues Jonathan pointed out, I'll be able to merge them. Cheers, Jean-Marc > > Out of curiosity, what’s the CPU in the Chromebook you’re using to > test? > >> On Jun 1, 2017, at 6:33 PM, Linfeng Zhang <linfengz at google.com> >> wrote: >> >> Hi, >> >> Attached are 5 patches related to celt_inner_prod() and >> dual_inner_prod() NEON intrinsics optimization. >> >> In 0004-Optimize-floating-point-celt_inner_prod-and-dual_inn.patch, >> the optimizati...
2017 Apr 05
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...as a code size of 3,228 bytes (with gcc). smaller_slower.c has a code size of 2,304 bytes, but the encoder is about 1.8% - 2.7% slower. smallest_slowest.c has a code size of 1,656 bytes, but the encoder is about 2.3% - 3.6% slower. Thanks, Linfeng On Mon, Apr 3, 2017 at 3:01 PM, Linfeng Zhang <linfengz at google.com> wrote: > Hi Jean-Marc, > > Attached is the silk_warped_autocorrelation_FIX_neon() which implements > your idea. > > Speed improvement vs the previous optimization: > > Complexity 0-4: Doesn't call this function. Complexity 5: 2.1% (order = > 16) Com...
2017 Apr 25
0
Antw: Re: 2 patches related to silk_biquad_alt() optimization
>>> Linfeng Zhang <linfengz at google.com> wrote on 25.04.2017 at 01:52 in message <CAKoqLCDvAk7eeS-gpmqSHVxp4t-Lzzw7TLo5rRo=Ey_Q==cxGg at mail.gmail.com>: > Hi Ulrich, > > As Jean-mark recommended, we created "--enable-check-asm" config option to > active OPUS_CHECK_ASM macros in the optim...
2017 Jun 01
0
Opus floating-point NEON jump table question
On May 31, 2017, at 12:47 PM, Linfeng Zhang <linfengz at google.com<mailto:linfengz at google.com>> wrote: Hi, ./configure --build x86_64-unknown-linux-gnu --host arm-linux-gnueabihf --disable-assertions --disable-check-asm --enable-intrinsics CFLAGS=-O3 --disable-shared When configuring with floating-point and intrinsics enabled as ab...
2017 Jun 06
0
celt_inner_prod() and dual_inner_prod() NEON intrinsics
...larger than the smallest single-precision number and should be represented as none-zero (such as 0x8). I don't know why NEON gives 0 result. Thanks, Linfeng On Tue, Jun 6, 2017 at 12:03 AM, Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de> wrote: > >>> Linfeng Zhang <linfengz at google.com> wrote on 06.06.2017 at 06:46 in > message > <CAKoqLCAfj+fDUMLfN4dLNSZ4NNAZpaSt_BWZRp+7XBqfhiSqiQ at mail.gmail.com>: > > Hi Jean-Marc, > > > > I tried "==" before, and it failed when both results are 0.0. Maybe the > > exponent or...
2017 Apr 05
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...smaller_slower.c has a code size of 2,304 bytes, but the encoder is > about 1.8% - 2.7% slower. > smallest_slowest.c has a code size of 1,656 bytes, but the encoder is > about 2.3% - 3.6% slower. > > Thanks, > Linfeng > > On Mon, Apr 3, 2017 at 3:01 PM, Linfeng Zhang <linfengz at google.com > <mailto:linfengz at google.com>> wrote: > > Hi Jean-Marc, > > Attached is the silk_warped_autocorrelation_FIX_neon() which > implements your idea. > > Speed improvement vs the previous optimization: > > Complexity 0-4: D...
2017 Apr 11
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi Jean-Marc, Thanks for your suggestions! I attached the new patch, with inlined reply below. Thanks, Linfeng On Thu, Apr 6, 2017 at 12:55 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote: > I did some profiling on a Cortex A57 and I've been seeing slightly less > improvement than you're reporting, more like 3.5% at complexity 8. It > appears that the warped
2017 Jun 06
0
celt_inner_prod() and dual_inner_prod() NEON intrinsics
...ld be using actual equality. If not, then I guess we need to find > the right condition (which isn't obvious for floating point). > > Cheers, > > Jean-Marc > > > > Thanks, > > Linfeng > > > > On Mon, Jun 5, 2017 at 12:28 PM, Linfeng Zhang <linfengz at google.com > > <mailto:linfengz at google.com>> wrote: > > > > Hi Jean-Marc, > > > > I attached the new version in inner_prod_5patches_v2.zip which > > synced to the current master. > > > > For fixed-point ARM, only > >...
2017 Apr 19
4
2 patches related to silk_biquad_alt() optimization
Hi, Attached are 2 patches related to silk_biquad_alt() optimization. Please review. Thanks, Linfeng Zhang -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.xiph.org/pipermail/opus/attachments/20170419/f08f5030/attachment-0001.html> -------------- next part -------------- A non-text attachment was scrubbed... Name:
2017 Jun 02
2
Opus floating-point NEON jump table question
...rm_silk_map.c uses > the MAY_HAVE_NEON macro, which it shouldn’t be using. If that file were > changed so that the jump tables just listed the _neon versions of the > functions directly, you’d get the speedup you’re looking for. > > > On Jun 1, 2017, at 6:03 PM, Linfeng Zhang <linfengz at google.com> wrote: > > Thank Jean-Mark and Jonathan! > > I tested current OPUS encoder in floating-point with Complexity 8. Hacking > using the attached patch (which will generate "#define > OPUS_ARM_MAY_HAVE_NEON 1" in config.h) will speed up about 14.7% on my >...