Displaying 20 results from an estimated 58 matches for "linfengz".
2017 Jun 06
3
celt_inner_prod() and dual_inner_prod() NEON intrinsics
...loat patch actually bit-exact? If so, then maybe you
should be using actual equality. If not, then I guess we need to find
the right condition (which isn't obvious for floating point).
Cheers,
Jean-Marc
> Thanks,
> Linfeng
>
> On Mon, Jun 5, 2017 at 12:28 PM, Linfeng Zhang <linfengz at google.com> wrote:
>
> Hi Jean-Marc,
>
> I attached the new version in inner_prod_5patches_v2.zip which
> synced to the current master.
>
> For fixed-point ARM, only
> 0003-Optimize-fixed-point-celt_inne...
2017 Jun 01
2
Opus floating-point NEON jump table question
...if it's safe enough to enable MAY_HAVE_NEON in floating-point by
default, it could speed up the floating-point NEON encoder a little bit.
Thanks,
Linfeng
On Thu, Jun 1, 2017 at 2:22 PM, Jonathan Lennox <jonathan at vidyo.com> wrote:
>
> On May 31, 2017, at 12:47 PM, Linfeng Zhang <linfengz at google.com> wrote:
>
> Hi,
>
> ./configure --build x86_64-unknown-linux-gnu --host arm-linux-gnueabihf
> --disable-assertions --disable-check-asm --enable-intrinsics CFLAGS=-O3
> --disable-shared
>
> When configuring with floating-point and intrinsics enabled as above,...
2017 Jun 06
4
Antw: Re: celt_inner_prod() and dual_inner_prod() NEON intrinsics
>>> Linfeng Zhang <linfengz at google.com> wrote on 06.06.2017 at 06:46 in message
<CAKoqLCAfj+fDUMLfN4dLNSZ4NNAZpaSt_BWZRp+7XBqfhiSqiQ at mail.gmail.com>:
> Hi Jean-Marc,
>
> I tried "==" before, and it failed when both results are 0.0. Maybe the
> exponent or sign differs because o...
2017 Jun 01
0
Opus floating-point NEON jump table question
...is actually that silk/arm/arm_silk_map.c uses the MAY_HAVE_NEON macro, which it shouldn’t be using. If that file were changed so that the jump tables just listed the _neon versions of the functions directly, you’d get the speedup you’re looking for.
On Jun 1, 2017, at 6:03 PM, Linfeng Zhang <linfengz at google.com> wrote:
Thanks, Jean-Marc and Jonathan!
I tested the current Opus encoder in floating-point at complexity 8. Hacking with the attached patch (which generates "#define OPUS_ARM_MAY_HAVE_NEON 1" in config.h) speeds it up by about 14.7%...
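The jump-table idea discussed in this thread can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the names inner_prod_c, inner_prod_neon, INNER_PROD_TABLE and the ARCH_* indices are hypothetical and are not taken from silk/arm/arm_silk_map.c or any other Opus file. The sketch only shows the general technique of selecting a kernel through a table indexed by a CPU level detected at run time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel signature; not the real Opus prototype. */
typedef float (*inner_prod_fn)(const float *x, const float *y, size_t n);

/* Portable C kernel. */
static float inner_prod_c(const float *x, const float *y, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Stand-in for a NEON kernel. A real build would use intrinsics here;
 * this placeholder just forwards to the C kernel so the sketch stays
 * portable. */
static float inner_prod_neon(const float *x, const float *y, size_t n)
{
    return inner_prod_c(x, y, n);
}

/* Hypothetical CPU levels, in the order a run-time detector might
 * report them. */
enum { ARCH_C = 0, ARCH_NEON = 1, ARCH_COUNT };

/* The jump table lists the kernels directly, one entry per CPU level,
 * instead of hiding entries behind a compile-time macro. */
static const inner_prod_fn INNER_PROD_TABLE[ARCH_COUNT] = {
    inner_prod_c,
    inner_prod_neon,
};

/* Dispatch through the table using the detected CPU level. */
float inner_prod_dispatch(const float *x, const float *y, size_t n, int arch)
{
    assert(arch >= 0 && arch < ARCH_COUNT);
    return INNER_PROD_TABLE[arch](x, y, n);
}
```

If a table entry is guarded by a compile-time macro that ends up undefined, the slot silently falls back to the C kernel even on NEON-capable hardware, which is the kind of lost speedup the thread describes.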
2017 Apr 05
4
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...2,304 bytes, but the encoder is
> > about 1.8% - 2.7% slower.
> > smallest_slowest.c has a code size of 1,656 bytes, but the encoder is
> > about 2.3% - 3.6% slower.
> >
> > Thanks,
> > Linfeng
> >
> > On Mon, Apr 3, 2017 at 3:01 PM, Linfeng Zhang <linfengz at google.com> wrote:
> >
> > Hi Jean-Marc,
> >
> > Attached is the silk_warped_autocorrelation_FIX_neon() which
> > implements your idea.
> >
> > Speed improvement vs the previous optimizat...
2017 May 31
4
Opus floating-point NEON jump table question
Hi,
./configure --build x86_64-unknown-linux-gnu --host arm-linux-gnueabihf
--disable-assertions --disable-check-asm --enable-intrinsics CFLAGS=-O3
--disable-shared
When configuring with floating-point and intrinsics enabled as above, the
generated config.h only has OPUS_ARM_MAY_HAVE_NEON_INTR defined (to 1), with
/* #undef OPUS_ARM_ASM */
/* #undef OPUS_ARM_INLINE_ASM */
/* #undef
2017 Apr 06
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...> > about 1.8% - 2.7% slower.
> > smallest_slowest.c has a code size of 1,656 bytes, but the encoder is
> > about 2.3% - 3.6% slower.
> >
> > Thanks,
> > Linfeng
> >
> > On Mon, Apr 3, 2017 at 3:01 PM, Linfeng Zhang <linfengz at google.com> wrote:
> >
> > Hi Jean-Marc,
> >
> > Attached is the silk_warped_autocorrelation_FIX_neon() which
>...
2017 Apr 24
2
2 patches related to silk_biquad_alt() optimization
...where the C function is
called inside and the results of the C and optimized functions are compared
when encoding/decoding real audio files.
Thanks,
Linfeng
On Wed, Apr 19, 2017 at 11:46 PM, Ulrich Windl <
Ulrich.Windl at rz.uni-regensburg.de> wrote:
> >>> Linfeng Zhang <linfengz at google.com> wrote on 19.04.2017 at 18:29 in
> message
> <CAKoqLCDX3eCUGbnZFvRzhiCV1Mbo2ksbj8K+pcVu60Dvit7WCQ at mail.gmail.com>:
> > Hi,
> >
> > Attached are 2 patches related to silk_biquad_alt() optimization. Please
> > review.
>
> Out of curios...
2017 May 15
2
2 patches related to silk_biquad_alt() optimization
...elying on 64-bit
multiplication results, then we could consider having a special option
to enable those (even in C).
Cheers,
Jean-Marc
On 08/05/17 12:12 PM, Linfeng Zhang wrote:
> Ping for comments.
>
> Thanks,
> Linfeng
>
> On Wed, Apr 26, 2017 at 2:15 PM, Linfeng Zhang <linfengz at google.com> wrote:
>
> On Tue, Apr 25, 2017 at 10:31 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote:
>
>
> > A_Q28 is split to 2 14-bit (or 16-bit, whatever) inte...
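The quoted remark about splitting A_Q28 is truncated, so the following is only one possible reading of it, not the code from the patches. The sketch assumes a 32-bit coefficient is split into a low 14-bit part and a signed high part so that each partial product with a 16-bit sample stays small. The names split14, split_q28 and mul_split are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two halves of a 32-bit coefficient. */
typedef struct {
    int32_t hi; /* signed upper part: (a - lo) / 2^14 */
    int32_t lo; /* lower 14 bits, always in 0..16383   */
} split14;

/* Split a so that a == hi * 16384 + lo holds exactly. */
split14 split_q28(int32_t a)
{
    split14 s;
    s.lo = a & 0x3FFF;          /* keep the low 14 bits */
    s.hi = (a - s.lo) / 16384;  /* exact: a - lo is a multiple of 2^14 */
    return s;
}

/* Multiply the split coefficient by a 16-bit sample. The result equals
 * the full-width product a * x, assembled from two narrower products. */
int64_t mul_split(split14 s, int16_t x)
{
    return (int64_t)s.hi * x * 16384 + (int64_t)s.lo * x;
}
```

The division form is used for the high part instead of a right shift of a negative value, whose result is implementation-defined in C.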
2017 Apr 19
3
[PATCH] cosmetics,silk: correct input/output arg comments
Hi,
Attached is a patch for cosmetic purposes. Please review.
Thanks,
Linfeng Zhang
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-cosmetics-silk-correct-input-output-arg-comments.patch
2017 Jun 06
2
celt_inner_prod() and dual_inner_prod() NEON intrinsics
...e no further issues with your patches, so once you address the two
issues Jonathan pointed out, I'll be able to merge them.
Cheers,
Jean-Marc
>
> Out of curiosity, what’s the CPU in the Chromebook you’re using to
> test?
>
>> On Jun 1, 2017, at 6:33 PM, Linfeng Zhang <linfengz at google.com>
>> wrote:
>>
>> Hi,
>>
>> Attached are 5 patches related to celt_inner_prod() and
>> dual_inner_prod() NEON intrinsics optimization.
>>
>> In 0004-Optimize-floating-point-celt_inner_prod-and-dual_inn.patch,
>> the optimizati...
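The bit-exactness question raised in this thread comes down to summation order. The sketch below is an illustration under assumptions, not the patch itself: inner_prod_seq and inner_prod_lanes are hypothetical names, and the four-lane version uses plain C arrays in place of NEON intrinsics. It shows how a 4-wide accumulator adds the same products in a different order than a sequential loop, which is why the two results can differ in the last bits for general floating-point input.

```c
#include <assert.h>
#include <stddef.h>

/* Sequential reference: one accumulator, products added in index order. */
float inner_prod_seq(const float *x, const float *y, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Four partial sums, mimicking what a 4-lane SIMD loop computes.
 * The products are the same, but they are added in a different order. */
float inner_prod_lanes(const float *x, const float *y, size_t n)
{
    float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    size_t i = 0;

    /* Main loop: each lane accumulates every fourth product. */
    for (; i + 4 <= n; i += 4)
        for (size_t k = 0; k < 4; k++)
            acc[k] += x[i + k] * y[i + k];

    /* Horizontal reduction of the four lanes. */
    float sum = (acc[0] + acc[1]) + (acc[2] + acc[3]);

    /* Scalar tail for lengths that are not a multiple of four. */
    for (; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}
```

For inputs whose products and partial sums are exactly representable (small integers, for example) both functions return the same value. For arbitrary audio data they are mathematically equal but not guaranteed to be bit-identical, so a test that compares them needs a tolerance rather than "==".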
2017 Apr 25
0
Antw: Re: 2 patches related to silk_biquad_alt() optimization
>>> Linfeng Zhang <linfengz at google.com> wrote on 25.04.2017 at 01:52 in message
<CAKoqLCDvAk7eeS-gpmqSHVxp4t-Lzzw7TLo5rRo=Ey_Q==cxGg at mail.gmail.com>:
> Hi Ulrich,
>
> As Jean-Marc recommended, we created the "--enable-check-asm" config option to
> activate the OPUS_CHECK_ASM macros in the optim...
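The checking approach described here, where the optimized function also runs the C function and compares the results, can be sketched as follows. This is a minimal illustration under assumptions: CHECK_ASM_SKETCH is a hypothetical stand-in for the OPUS_CHECK_ASM macro, and sum_abs, sum_abs_c and sum_abs_opt are invented example functions, not Opus code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a build-time switch such as OPUS_CHECK_ASM. */
#define CHECK_ASM_SKETCH 1

/* Absolute value widened to 32 bits so -32768 is handled safely. */
static int32_t abs16(int16_t v)
{
    return (v < 0) ? -(int32_t)v : (int32_t)v;
}

/* Reference C implementation. */
static int32_t sum_abs_c(const int16_t *x, size_t n)
{
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += abs16(x[i]);
    return sum;
}

/* "Optimized" implementation: unrolled by two with separate
 * accumulators, standing in for an intrinsics version. */
static int32_t sum_abs_opt(const int16_t *x, size_t n)
{
    int32_t even = 0, odd = 0;
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        even += abs16(x[i]);
        odd += abs16(x[i + 1]);
    }
    if (i < n)
        even += abs16(x[i]);
    return even + odd;
}

/* Public entry point: runs the optimized path and, when checking is
 * enabled, verifies it against the C reference on the same input. */
int32_t sum_abs(const int16_t *x, size_t n)
{
    int32_t result = sum_abs_opt(x, n);
#if CHECK_ASM_SKETCH
    assert(result == sum_abs_c(x, n));
#endif
    return result;
}
```

Because the comparison runs on whatever data the encoder or decoder actually processes, it exercises the optimized path on real audio rather than only on hand-written test vectors. Exact equality is appropriate here because the example is integer-only.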
2017 Jun 01
0
Opus floating-point NEON jump table question
On May 31, 2017, at 12:47 PM, Linfeng Zhang <linfengz at google.com> wrote:
Hi,
./configure --build x86_64-unknown-linux-gnu --host arm-linux-gnueabihf --disable-assertions --disable-check-asm --enable-intrinsics CFLAGS=-O3 --disable-shared
When configuring with floating-point and intrinsics enabled as above,...
2017 Apr 05
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...as a code size of
3,228 bytes (with gcc).
smaller_slower.c has a code size of 2,304 bytes, but the encoder is about
1.8% - 2.7% slower.
smallest_slowest.c has a code size of 1,656 bytes, but the encoder is about
2.3% - 3.6% slower.
Thanks,
Linfeng
On Mon, Apr 3, 2017 at 3:01 PM, Linfeng Zhang <linfengz at google.com> wrote:
> Hi Jean-Marc,
>
> Attached is the silk_warped_autocorrelation_FIX_neon() which implements
> your idea.
>
> Speed improvement vs the previous optimization:
>
> Complexity 0-4: Doesn't call this function.
> Complexity 5: 2.1% (order = 16)
> Com...
2017 Jun 06
0
celt_inner_prod() and dual_inner_prod() NEON intrinsics
...larger than the smallest
single-precision number and should be represented as nonzero (such as
0x8). I don't know why NEON gives a result of 0.
Thanks,
Linfeng
On Tue, Jun 6, 2017 at 12:03 AM, Ulrich Windl <Ulrich.Windl at rz.uni-regensburg.de> wrote:
> >>> Linfeng Zhang <linfengz at google.com> wrote on 06.06.2017 at 06:46 in
> message
> <CAKoqLCAfj+fDUMLfN4dLNSZ4NNAZpaSt_BWZRp+7XBqfhiSqiQ at mail.gmail.com>:
> > Hi Jean-Marc,
> >
> > I tried "==" before, and it failed when both results are 0.0. Maybe the
> > exponent or...
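One plausible explanation for the zero result, not confirmed anywhere in this thread, is flush-to-zero: on 32-bit ARM, Advanced SIMD (NEON) floating-point arithmetic treats subnormal values as zero, while scalar C code normally keeps them. A bit pattern such as 0x8 is exactly such a subnormal. The sketch below, with the hypothetical helpers float_from_bits, is_subnormal and equal_with_ftz, shows how to recognize the case and how a comparison could tolerate it.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as an IEEE 754 single-precision value. */
float float_from_bits(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* A subnormal is nonzero but smaller in magnitude than FLT_MIN,
 * the smallest normal float. NaN fails every comparison and is
 * therefore reported as not subnormal. */
int is_subnormal(float f)
{
    return f != 0.0f && f > -FLT_MIN && f < FLT_MIN;
}

/* Compare two results the way a flush-to-zero unit would see them:
 * subnormals on either side are replaced by zero first. */
int equal_with_ftz(float a, float b)
{
    if (is_subnormal(a)) a = 0.0f;
    if (is_subnormal(b)) b = 0.0f;
    return a == b;
}
```

With these helpers, float_from_bits(0x8) is nonzero and subnormal, so a plain "==" against a flushed 0.0 fails while equal_with_ftz() accepts it. Note that 0.0f == -0.0f is already true in C, so a sign difference between two zeros alone would not make "==" fail.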
2017 Apr 05
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...> smaller_slower.c has a code size of 2,304 bytes, but the encoder is
> about 1.8% - 2.7% slower.
> smallest_slowest.c has a code size of 1,656 bytes, but the encoder is
> about 2.3% - 3.6% slower.
>
> Thanks,
> Linfeng
>
> On Mon, Apr 3, 2017 at 3:01 PM, Linfeng Zhang <linfengz at google.com> wrote:
>
> Hi Jean-Marc,
>
> Attached is the silk_warped_autocorrelation_FIX_neon() which
> implements your idea.
>
> Speed improvement vs the previous optimization:
>
> Complexity 0-4: D...
2017 Apr 11
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi Jean-Marc,
Thanks for your suggestions!
I attached the new patch, with inlined reply below.
Thanks,
Linfeng
On Thu, Apr 6, 2017 at 12:55 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote:
> I did some profiling on a Cortex A57 and I've been seeing slightly less
> improvement than you're reporting, more like 3.5% at complexity 8. It
> appears that the warped
2017 Jun 06
0
celt_inner_prod() and dual_inner_prod() NEON intrinsics
...ld be using actual equality. If not, then I guess we need to find
> the right condition (which isn't obvious for floating point).
>
> Cheers,
>
> Jean-Marc
>
>
> > Thanks,
> > Linfeng
> >
> > On Mon, Jun 5, 2017 at 12:28 PM, Linfeng Zhang <linfengz at google.com> wrote:
> >
> > Hi Jean-Marc,
> >
> > I attached the new version in inner_prod_5patches_v2.zip which
> > synced to the current master.
> >
> > For fixed-point ARM, only
> >...
2017 Apr 19
4
2 patches related to silk_biquad_alt() optimization
Hi,
Attached are 2 patches related to silk_biquad_alt() optimization. Please
review.
Thanks,
Linfeng Zhang
-------------- next part --------------
A non-text attachment was scrubbed...
Name:
2017 Jun 02
2
Opus floating-point NEON jump table question
...rm_silk_map.c uses
> the MAY_HAVE_NEON macro, which it shouldn’t be using. If that file were
> changed so that the jump tables just listed the _neon versions of the
> functions directly, you’d get the speedup you’re looking for.
>
>
> On Jun 1, 2017, at 6:03 PM, Linfeng Zhang <linfengz at google.com> wrote:
>
> Thanks, Jean-Marc and Jonathan!
>
> I tested the current Opus encoder in floating-point at complexity 8. Hacking
> with the attached patch (which generates "#define
> OPUS_ARM_MAY_HAVE_NEON 1" in config.h) speeds it up by about 14.7% on my
>...