similar to: celt_inner_prod() and dual_inner_prod() NEON intrinsics

2017 Jun 06
2
celt_inner_prod() and dual_inner_prod() NEON intrinsics
Hi Linfeng, On 06/06/17 04:09 PM, Jonathan Lennox wrote: > Two comments on the various infrastructure for RTCD etc. > > 1. The 0002- patch changes the ABI of the celt_pitch_xcorr functions, > but doesn’t change the assembly in celt/arm/celt_pitch_xcorr_arm.s > correspondingly. I suspect the ‘arch’ parameter can just be ignored > by the assembly functions, but
2017 Jun 05
4
celt_inner_prod() and dual_inner_prod() NEON intrinsics
Hi Jean-Marc, I attached the new version in inner_prod_5patches_v2.zip, which is synced to the current master. For fixed-point ARM, only 0003-Optimize-fixed-point-celt_inner_prod-and-dual_inner_.patch changes the performance. For floating-point ARM, only 0004-Optimize-floating-point-celt_inner_prod-and-dual_inn.patch changes the performance. Patches 1 and 2 are code clean-up and can only affect x86
2017 Jun 06
3
celt_inner_prod() and dual_inner_prod() NEON intrinsics
Hi Linfeng, On 05/06/17 03:31 PM, Linfeng Zhang wrote: > Yes, we'll have one more patch set related to xcorr next week. Please > don't wait if it's too late for the 1.2 release. Assuming there's no issue with the patches, next week isn't too late. Also, I've started looking at your patches. So far there's one thing that puzzles me a bit. In the
2017 Jun 06
4
Antw: Re: celt_inner_prod() and dual_inner_prod() NEON intrinsics
>>> Linfeng Zhang <linfengz at google.com> wrote on 06.06.2017 at 06:46 in message <CAKoqLCAfj+fDUMLfN4dLNSZ4NNAZpaSt_BWZRp+7XBqfhiSqiQ at mail.gmail.com>: > Hi Jean-Marc, > > I tried "==" before, and it failed when both results are 0.0. Maybe the > exponent or sign differs because of the different 0.0 representation > in NEON. If anybody
2017 Jun 06
0
celt_inner_prod() and dual_inner_prod() NEON intrinsics
Two comments on the various infrastructure for RTCD etc. 1. The 0002- patch changes the ABI of the celt_pitch_xcorr functions, but doesn’t change the assembly in celt/arm/celt_pitch_xcorr_arm.s correspondingly. I suspect the ‘arch’ parameter can just be ignored by the assembly functions, but at least the comments in that file should be updated to indicate the register that’s used to pass
2017 Jun 06
0
celt_inner_prod() and dual_inner_prod() NEON intrinsics
Hi Jean-Marc, I tried "==" before, and it failed when both results are 0.0. Maybe the exponent or sign differs because of the different 0.0 representation in NEON. If anybody knows how to handle this 0.0 comparison, that would be great. Or just use if(a==b || (a==0.0 && b==0.0)) ... but I haven't tried this. Thanks, Linfeng On Mon, Jun 5, 2017 at 8:43 PM Jean-Marc
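The zero comparison discussed in this message can be illustrated with a small C sketch. It assumes IEEE 754 single-precision floats and default (non fast-math) compilation; the function names are hypothetical and do not come from the Opus sources. Under those assumptions, == already treats +0.0 and -0.0 as equal, so a sign difference alone would only show up in a bitwise comparison. The snippet does not establish what actually caused the failure reported in the thread.

```c
#include <stdint.h>
#include <string.h>

/* Value comparison: under IEEE 754, +0.0f == -0.0f evaluates to true. */
int floats_equal_by_value(float a, float b)
{
    return a == b;
}

/* Bit-pattern comparison: +0.0f (0x00000000) and -0.0f (0x80000000)
 * differ in the sign bit, so this reports them as different. */
int floats_equal_by_bits(float a, float b)
{
    uint32_t ua, ub;
    memcpy(&ua, &a, sizeof ua); /* memcpy avoids strict-aliasing problems */
    memcpy(&ub, &b, sizeof ub);
    return ua == ub;
}
```

If a test compares the raw bits of two results (for example through an integer pointer), a signed zero on one side is enough to make it fail even though the values compare equal.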
2017 Jun 06
0
celt_inner_prod() and dual_inner_prod() NEON intrinsics
Thanks, Jonathan and Jean-Marc! I attached the new patch sets in inner_prod_5patches_v3.zip. The Chromebook I'm using is Chromebook 13 CB5-311 series RMN: Z3ENN CPU info: $ cat /proc/cpuinfo processor : 0 model name : ARMv7 Processor rev 3 (v7l) BogoMIPS : 2.31 Features : swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae CPU implementer : 0x41 CPU
2017 Jun 05
0
celt_inner_prod() and dual_inner_prod() NEON intrinsics
On 05/06/17 03:28 PM, Linfeng Zhang wrote: > For fixed-point ARM, only > 0003-Optimize-fixed-point-celt_inner_prod-and-dual_inner_.patch changes > the performance. > For floating-point ARM, only > 0004-Optimize-floating-point-celt_inner_prod-and-dual_inn.patch changes the performance. Got any numbers? Cheers, Jean-Marc > Patches 1 and 2 are code clean-up and can only
2017 Jun 05
0
celt_inner_prod() and dual_inner_prod() NEON intrinsics
Yes, we'll have one more patch set related to xcorr next week. Please don't wait if it's too late for the 1.2 release. Thanks, Linfeng On Mon, Jun 5, 2017 at 12:28 PM, Linfeng Zhang <linfengz at google.com> wrote: > Hi Jean-Marc, > > I attached the new version in inner_prod_5patches_v2.zip, which is synced to > the current master. > > For fixed-point ARM, only
2017 Jun 06
0
celt_inner_prod() and dual_inner_prod() NEON intrinsics
Thanks, Ulrich! Yes, using celt_assert(1.0 + celt_inner_prod_neon_float_c_simulation(x, y, N) == 1.0 + xy); celt_assert(1.0 + xy1_c == 1.0 + *xy1); celt_assert(1.0 + xy2_c == 1.0 + *xy2); can avoid the use of VERY_SMALL. Hi Jean-Marc, I added { const opus_val32 xy_c = celt_inner_prod_neon_float_c_simulation(x, y, N); const int32_t *x_bin =
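The "1.0 + a == 1.0 + b" form used in the assertions above can be sketched in isolation. This is only an illustration under the assumption of IEEE 754 arithmetic; equal_after_adding_one is a hypothetical helper name, not a function from the patches. Adding 1.0 pushes any magnitude far below the rounding step near 1.0 out of the result, so very small values (and zeros of either sign) compare equal, while differences that are significant relative to 1.0 are still detected.

```c
/* Compare two floats after adding 1.0 to each. The literal 1.0 is a
 * double, so both sums are evaluated in at least double precision.
 * Magnitudes far below the double rounding step near 1.0 (about 2.2e-16)
 * are absorbed and no longer affect the comparison. */
int equal_after_adding_one(float a, float b)
{
    return 1.0 + a == 1.0 + b;
}
```

For instance, 1e-30f and 0.0f are different values under a plain ==, but they compare equal through this helper. The price is that the check is only as strict as the precision left over near 1.0, so it is not a substitute for an exact comparison of large results.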
2017 Jun 02
0
celt_inner_prod() and dual_inner_prod() NEON intrinsics
Hi Linfeng, I'll look into your patches. Can you let me know what's the expected effect on performance (if any) for each of your patches? Also, are these all the patches you intend to merge for 1.2 or are there more upcoming ones? Cheers, Jean-Marc On 01/06/17 06:33 PM, Linfeng Zhang wrote: > Hi, > > Attached are 5 patches related to celt_inner_prod() > and
2015 Nov 05
2
AVX Optimizations
Yes, Thank you. I'll follow up with the AVX code and tests for pitch code. Radu -----Original Message----- From: opus-bounces at xiph.org [mailto:opus-bounces at xiph.org] On Behalf Of Timothy B. Terriberry Sent: Thursday, November 5, 2015 10:31 AM To: opus at xiph.org Subject: Re: [opus] AVX Optimizations Velea, Radu wrote: > I've created a pull request[1] to enable configuration
2016 Sep 13
4
[PATCH 12/15] Replace call of celt_inner_prod_c() (step 1)
Should call celt_inner_prod(). --- celt/bands.c | 7 ++++--- celt/bands.h | 2 +- celt/celt_encoder.c | 6 +++--- celt/pitch.c | 2 +- src/opus_multistream_encoder.c | 2 +- 5 files changed, 10 insertions(+), 9 deletions(-) diff --git a/celt/bands.c b/celt/bands.c index bbe8a4c..1ab24aa 100644
2015 Mar 02
13
Patch cleaning up Opus x86 intrinsics configury
The attached patch cleans up Opus's x86 intrinsics configury. It: * Makes --enable-intrinsics work with clang and other non-GCC compilers * Enables RTCD for the floating-point-mode SSE code in Celt. * Disables use of RTCD in cases where the compiler targets an instruction set by default. * Enables the SSE4.1 Silk optimizations that apply to the common parts of Silk when Opus is built in
2015 Mar 13
1
[RFC PATCH v3] Intrinsics/RTCD related fixes. Mostly x86.
From: Jonathan Lennox <jonathan at vidyo.com> * Makes --enable-intrinsics work with clang and other non-GCC compilers * Enables RTCD for the floating-point-mode SSE code in Celt. * Disables use of RTCD in cases where the compiler targets an instruction set by default. * Enables the SSE4.1 Silk optimizations that apply to the common parts of Silk when Opus is built in floating-point mode, not
2015 Mar 12
1
[RFC PATCHv2] Intrinsics/RTCD related fixes. Mostly x86.
From: Jonathan Lennox <jonathan at vidyo.com> * Makes --enable-intrinsics work with clang and other non-GCC compilers * Enables RTCD for the floating-point-mode SSE code in Celt. * Disables use of RTCD in cases where the compiler targets an instruction set by default. * Enables the SSE4.1 Silk optimizations that apply to the common parts of Silk when Opus is built in floating-point mode, not
2015 Nov 05
0
AVX Optimizations
Velea, Radu wrote: > Yes, > > Thank you. I'll follow up with the AVX code and tests for pitch code. Actually, I lied. Because you update opus_select_arch(), you can now return a value for arch (4) that is larger than the maximum we currently support (3). This doesn't actually cause failures, because we mask with OPUS_ARCHMASK, but it does mean that a CPU with AVX will invoke
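The masking behaviour described in this message can be sketched as follows. The mask value of 3 is an assumption taken from the "maximum we currently support (3)" wording, and the table contents and all sketch_ names are hypothetical placeholders rather than the real Opus dispatch tables. The sketch only shows the arithmetic: an arch value of 4 ANDed with a mask of 3 yields index 0, so the lookup stays in bounds but wraps around to the first entry.

```c
#include <string.h>

/* Assumed mask value; stands in for OPUS_ARCHMASK in this sketch. */
#define SKETCH_ARCHMASK 3

/* Hypothetical dispatch table with one entry per supported arch level. */
static const char *const sketch_impl_table[SKETCH_ARCHMASK + 1] = {
    "level0", "level1", "level2", "level3"
};

/* Select an implementation name by masking the detected arch value.
 * Masking keeps the index inside the table, but a value above the
 * maximum wraps around instead of saturating at the highest level. */
const char *sketch_select_impl(int arch)
{
    return sketch_impl_table[arch & SKETCH_ARCHMASK];
}
```

With these assumptions, sketch_select_impl(3) returns the last entry, while sketch_select_impl(4) returns the first one, which is why returning an out-of-range arch value does not crash but also does not pick the best available code path.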
2015 Nov 05
2
AVX Optimizations
Sorry. I missed that. Good observation. Please go ahead and correct the patch. Thanks, Radu -----Original Message----- From: opus-bounces at xiph.org [mailto:opus-bounces at xiph.org] On Behalf Of Timothy B. Terriberry Sent: Thursday, November 5, 2015 11:08 AM To: opus at xiph.org Subject: Re: [opus] AVX Optimizations Velea, Radu wrote: > Yes, > > Thank you. I'll follow up with
2015 Mar 18
5
[RFC PATCH v1 0/4] Enable aarch64 intrinsics/Ne10
Hi All, Since I continue to base my work on top of Jonathan's patch and my previous Ne10 fft/ifft/mdct_forward/backward patches, I thought it would be better to just post all new patches as a patch series. Please let me know if anyone disagrees with this approach. You can see the WIP branch with all the latest patches at
2015 Mar 31
6
[RFC PATCH v1 0/5] aarch64: celt_pitch_xcorr: Fixed point series
Hi Timothy, As I mentioned earlier [1], I have now fixed the compile issues with fixed point and am resubmitting the patch. I also have a new patch that adds intrinsics optimizations for celt_pitch_xcorr targeting aarch64. You can find my latest work-in-progress branch at [2]. For reference, you can use the Ne10 pre-built libraries at [3]. Note that I am working with Phil at ARM to get my patch at [4]