thr3ads.net - similar to: "celt_inner_prod() and dual_inner

Displaying 20 results from an estimated 600 matches similar to: "celt_inner_prod() and dual_inner_prod() NEON intrinsics"

celt_inner_prod() and dual_inner_prod() NEON intrinsics

2017 Jun 06

Hi Linfeng, On 06/06/17 04:09 PM, Jonathan Lennox wrote: > Two comments on the various infrastructure for RTCD etc. > > 1. The 0002- patch changes the ABI of the celt_pitch_xcorr functions, > but doesn’t change the assembly in celt/arm/celt_pitch_xcorr_arm.s > correspondingly. I suspect the ‘arch’ parameter can just be ignored > by the assembly functions, but at least the

celt_inner_prod() and dual_inner_prod() NEON intrinsics

2017 Jun 05

Hi Jean-Marc, I attached the new version in inner_prod_5patches_v2.zip which synced to the current master. For fixed-point ARM, only 0003-Optimize-fixed-point-celt_inner_prod-and-dual_inner_.patch changes the performance. For floating-point ARM, only 0004-Optimize-floating-point-celt_inner_prod-and-dual_inn.patch changes the performance. Patches 1 and 2 are code clean-up and can only affect x86

celt_inner_prod() and dual_inner_prod() NEON intrinsics

2017 Jun 06

Hi Linfeng, On 05/06/17 03:31 PM, Linfeng Zhang wrote: > Yes, we'll have one more patch set related to xcorr next week. Please > don't wait if it's too late for the 1.2 release. Assuming there's no issue with the patches, next week isn't too late. Also, I've started looking at your patches. So far there's one thing that puzzles me a bit. In the OPUS_CHECK_ASM

Antw: Re: celt_inner_prod() and dual_inner_prod() NEON intrinsics

2017 Jun 06

>>> Linfeng Zhang <linfengz at google.com> wrote on 06.06.2017 at 06:46 in message <CAKoqLCAfj+fDUMLfN4dLNSZ4NNAZpaSt_BWZRp+7XBqfhiSqiQ at mail.gmail.com>: > Hi Jean-Marc, > > I tried "==" before, and it failed when both results are 0.0. Maybe the > exponent or sign differs because of the different 0.0 representation > in NEON. If anybody

celt_inner_prod() and dual_inner_prod() NEON intrinsics

2017 Jun 06

Two comments on the various infrastructure for RTCD etc. 1. The 0002- patch changes the ABI of the celt_pitch_xcorr functions, but doesn’t change the assembly in celt/arm/celt_pitch_xcorr_arm.s correspondingly. I suspect the ‘arch’ parameter can just be ignored by the assembly functions, but at least the comments in that file should be updated to indicate the register that’s used to pass it in,

celt_inner_prod() and dual_inner_prod() NEON intrinsics

2017 Jun 05

On 05/06/17 03:28 PM, Linfeng Zhang wrote: > For fixed-point ARM, only > 0003-Optimize-fixed-point-celt_inner_prod-and-dual_inner_.patch changes > the performance. > For floating-point ARM, only > 0004-Optimize-floating-point-celt_inner_prod-and-dual_inn.patch changes the performance. Got any numbers? Cheers, Jean-Marc > Patch 1 and 2 are code clean-up and can only affect

celt_inner_prod() and dual_inner_prod() NEON intrinsics

2017 Jun 06

Thanks, Jonathan and Jean-Marc! I attached the new patch sets in inner_prod_5patches_v3.zip. The Chromebook I'm using is Chromebook 13 CB5-311 series RMN: Z3ENN
CPU info:
$ cat /proc/cpuinfo
processor : 0
model name : ARMv7 Processor rev 3 (v7l)
BogoMIPS : 2.31
Features : swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae
CPU implementer : 0x41
CPU

celt_inner_prod() and dual_inner_prod() NEON intrinsics

2017 Jun 06

Hi Jean-Marc, I tried "==" before, and it failed when both results are 0.0. Maybe the exponent or sign differs because of the different 0.0 representation in NEON. If anybody knows how to handle this 0.0 comparison, that would be great. Or just use if(a==b || (a==0.0 && b==0.0)) ... but I haven't tried this. Thanks, Linfeng On Mon, Jun 5, 2017 at 8:43 PM Jean-Marc
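
As background for the 0.0 comparison question in this message, here is a minimal sketch in C of how signed zeros behave, assuming 32-bit IEEE 754 floats. The helper names (same_value, same_bits, check_signed_zero) are hypothetical and are not part of Opus. It only shows that "==" already treats +0.0 and -0.0 as equal while their bit patterns differ; it does not establish what caused the failure reported in the thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Value comparison: IEEE 754 defines +0.0f == -0.0f as true. */
static int same_value(float a, float b)
{
    return a == b;
}

/* Bit-pattern comparison: +0.0f and -0.0f differ in the sign bit.
 * Assumes float is a 32-bit IEEE 754 type. */
static int same_bits(float a, float b)
{
    uint32_t ua, ub;
    memcpy(&ua, &a, sizeof ua);
    memcpy(&ub, &b, sizeof ub);
    return ua == ub;
}

/* Hypothetical self-check, for illustration only. */
static void check_signed_zero(void)
{
    assert(same_value(0.0f, -0.0f));   /* equal as values */
    assert(!same_bits(0.0f, -0.0f));   /* different as raw bits */
    assert(same_bits(0.0f, 0.0f));
}
```

If a test compares raw bits (for example through an integer pointer), a sign difference on zero would show up even though a value comparison passes.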

celt_inner_prod() and dual_inner_prod() NEON intrinsics

2017 Jun 05

Yes, we'll have one more patch set related to xcorr next week. Please don't wait if it's too late for the 1.2 release. Thanks, Linfeng On Mon, Jun 5, 2017 at 12:28 PM, Linfeng Zhang <linfengz at google.com> wrote: > Hi Jean-Marc, > > I attached the new version in inner_prod_5patches_v2.zip which synced to > the current master. > > For fixed-point ARM, only

celt_inner_prod() and dual_inner_prod() NEON intrinsics

2017 Jun 06

Thanks, Ulrich! Yes, using
celt_assert(1.0 + celt_inner_prod_neon_float_c_simulation(x, y, N) == 1.0 + xy);
celt_assert(1.0 + xy1_c == 1.0 + *xy1);
celt_assert(1.0 + xy2_c == 1.0 + *xy2);
can avoid the usage of VERY_SMALL. Hi Jean-Marc, I added { const opus_val32 xy_c = celt_inner_prod_neon_float_c_simulation(x, y, N); const int32_t *x_bin =
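
The "1.0 + a == 1.0 + b" form used in these assertions can be sketched in isolation as follows. This is an illustration under assumptions, not the Opus test code: the helper names are hypothetical, and the volatile temporaries are added here only to force each sum to be rounded to double before the comparison.

```c
#include <assert.h>

/* Returns 1 when a and b are indistinguishable after being added to 1.0.
 * Magnitudes far below double epsilon (about 2.2e-16) are absorbed by
 * the addition, so tiny results and signed zeros all compare equal. */
static int equal_after_adding_one(float a, float b)
{
    volatile double lhs = 1.0 + a;  /* float is promoted to double */
    volatile double rhs = 1.0 + b;
    return lhs == rhs;
}

/* Hypothetical self-check, for illustration only. */
static void check_equal_after_adding_one(void)
{
    assert(equal_after_adding_one(0.0f, -0.0f));     /* signed zeros */
    assert(equal_after_adding_one(1e-30f, 0.0f));    /* tiny vs. zero */
    assert(equal_after_adding_one(1e-20f, 3e-20f));  /* both absorbed */
    assert(!equal_after_adding_one(0.5f, 0.25f));    /* real difference */
}
```

The trade-off is that two small results that differ from each other, such as 1e-20 and 3e-20, also pass, so the check is only strict for values large enough to survive the addition.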

celt_inner_prod() and dual_inner_prod() NEON intrinsics

2017 Jun 02

Hi Linfeng, I'll look into your patches. Can you let me know what's the expected effect on performance (if any) for each of your patches? Also, are these all the patches you intend to merge for 1.2 or are there more upcoming ones? Cheers, Jean-Marc On 01/06/17 06:33 PM, Linfeng Zhang wrote: > Hi, > > Attached are 5 patches related to celt_inner_prod() > and

AVX Optimizations

2015 Nov 05

Yes, Thank you. I'll follow up with the AVX code and tests for pitch code. Radu -----Original Message----- From: opus-bounces at xiph.org [mailto:opus-bounces at xiph.org] On Behalf Of Timothy B. Terriberry Sent: Thursday, November 5, 2015 10:31 AM To: opus at xiph.org Subject: Re: [opus] AVX Optimizations Velea, Radu wrote: > I've created a pull request[1] to enable configuration

[PATCH 12/15] Replace call of celt_inner_prod_c() (step 1)

2016 Sep 13

Should call celt_inner_prod().
---
 celt/bands.c                   | 7 ++++---
 celt/bands.h                   | 2 +-
 celt/celt_encoder.c            | 6 +++---
 celt/pitch.c                   | 2 +-
 src/opus_multistream_encoder.c | 2 +-
 5 files changed, 10 insertions(+), 9 deletions(-)
diff --git a/celt/bands.c b/celt/bands.c
index bbe8a4c..1ab24aa 100644
--- a/celt/bands.c
+++ b/celt/bands.c

[RFC PATCH v3] Intrinsics/RTCD related fixes. Mostly x86.

2015 Mar 13

From: Jonathan Lennox <jonathan at vidyo.com>
* Makes --enable-intrinsics work with clang and other non-GCC compilers
* Enables RTCD for the floating-point-mode SSE code in Celt.
* Disables use of RTCD in cases where the compiler targets an instruction set by default.
* Enables the SSE4.1 Silk optimizations that apply to the common parts of Silk when Opus is built in floating-point mode, not

[RFC PATCHv2] Intrinsics/RTCD related fixes. Mostly x86.

2015 Mar 12

Patch cleaning up Opus x86 intrinsics configury

2015 Mar 02

The attached patch cleans up Opus's x86 intrinsics configury. It:
* Makes --enable-intrinsics work with clang and other non-GCC compilers
* Enables RTCD for the floating-point-mode SSE code in Celt.
* Disables use of RTCD in cases where the compiler targets an instruction set by default.
* Enables the SSE4.1 Silk optimizations that apply to the common parts of Silk when Opus is built in

AVX Optimizations

2015 Nov 05

Velea, Radu wrote: > Yes, > > Thank you. I'll follow up with the AVX code and tests for pitch code. Actually, I lied. Because you update opus_select_arch(), you can now return a value for arch (4) that is larger than the maximum we currently support (3). This doesn't actually cause failures, because we mask with OPUS_ARCHMASK, but it does mean that a CPU with AVX will invoke
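
The masking behaviour described in this message can be sketched as a table lookup. Everything here is an assumption made for illustration: the mask value 3 is taken from the "(3)" in the message, and the table contents, function names, and SKETCH_ARCH_MASK macro are hypothetical stand-ins rather than the actual Opus RTCD tables.

```c
#include <assert.h>

/* Assumed mask, following the maximum of 3 mentioned in the message. */
#define SKETCH_ARCH_MASK 3

typedef float (*inner_prod_fn)(const float *x, const float *y, int n);

/* Plain C reference: sum of x[i] * y[i]. */
static float inner_prod_c(const float *x, const float *y, int n)
{
    float sum = 0.0f;
    int i;
    for (i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Stand-in for a SIMD variant; it reuses the C code so the sketch
 * runs on any machine. */
static float inner_prod_simd_standin(const float *x, const float *y, int n)
{
    return inner_prod_c(x, y, n);
}

/* Hypothetical dispatch table with one slot per masked arch value. */
static const inner_prod_fn INNER_PROD_TABLE[SKETCH_ARCH_MASK + 1] = {
    inner_prod_c, inner_prod_c,
    inner_prod_simd_standin, inner_prod_simd_standin
};

/* Masking keeps the index in range, but wraps values above the mask. */
static int table_slot(int arch)
{
    return arch & SKETCH_ARCH_MASK;
}

static float inner_prod_dispatch(const float *x, const float *y,
                                 int n, int arch)
{
    return INNER_PROD_TABLE[table_slot(arch)](x, y, n);
}

/* Hypothetical self-check, for illustration only. */
static void check_dispatch(void)
{
    const float x[3] = {1.0f, 2.0f, 3.0f};
    const float y[3] = {4.0f, 5.0f, 6.0f};
    assert(table_slot(4) == 0);  /* 4 & 3 wraps to slot 0 */
    assert(inner_prod_dispatch(x, y, 3, 4) == 32.0f);
}
```

With these assumed values, an arch of 4 never indexes past the table, which matches the "doesn't actually cause failures" remark, but it lands on slot 0 instead of a slot of its own.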

AVX Optimizations

2015 Nov 05

Sorry. I missed that. Good observation. Please go ahead and correct the patch. Thanks, Radu -----Original Message----- From: opus-bounces at xiph.org [mailto:opus-bounces at xiph.org] On Behalf Of Timothy B. Terriberry Sent: Thursday, November 5, 2015 11:08 AM To: opus at xiph.org Subject: Re: [opus] AVX Optimizations Velea, Radu wrote: > Yes, > > Thank you. I'll follow up with

[RFC PATCH v1 0/4] Enable aarch64 intrinsics/Ne10

2015 Mar 18

Hi All, Since I continue to base my work on top of Jonathan's patch, and my previous Ne10 fft/ifft/mdct_forward/backward patches, I thought it would be better to just post all new patches as a patch series. Please let me know if anyone disagrees with this approach. You can see the WIP branch with all the latest patches at https://git.linaro.org/people/viswanath.puttagunta/opus.git Branch:

[RFC PATCH v1 0/5] aarch64: celt_pitch_xcorr: Fixed point series

2015 Mar 31

Hi Timothy, As I mentioned earlier [1], I have now fixed the compile issues with fixed point and am resubmitting the patch. I also have a new patch that does intrinsics optimizations for celt_pitch_xcorr targeting aarch64. You can find my latest work-in-progress branch at [2]. For reference, you can use the Ne10 pre-built libraries at [3]. Note that I am working with Phil at ARM to get my patch at [4]