Displaying 20 results from an estimated 600 matches similar to: "celt_inner_prod() and dual_inner_prod() NEON intrinsics"
2017 Jun 06
2
celt_inner_prod() and dual_inner_prod() NEON intrinsics
Hi Linfeng,
On 06/06/17 04:09 PM, Jonathan Lennox wrote:
> Two comments on the various infrastructure for RTCD etc.
>
> 1. The 0002- patch changes the ABI of the celt_pitch_xcorr functions,
> but doesn’t change the assembly in celt/arm/celt_pitch_xcorr_arm.s
> correspondingly. I suspect the ‘arch’ parameter can just be ignored
> by the assembly functions, but at least the
2017 Jun 05
4
celt_inner_prod() and dual_inner_prod() NEON intrinsics
Hi Jean-Marc,
I attached the new version in inner_prod_5patches_v2.zip which synced to
the current master.
For fixed-point ARM, only 0003-Optimize-fixed-point-celt_inner_prod-and-dual_inner_.patch changes the performance.
For floating-point ARM, only 0004-Optimize-floating-point-celt_inner_prod-and-dual_inn.patch changes the performance.
Patches 1 and 2 are code clean-up and can only affect x86
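Both functions discussed in this thread are plain dot products. As a rough reference, here is a simplified float-only sketch of what they compute. `inner_prod_ref` and `dual_inner_prod_ref` are hypothetical names, and this is not the Opus source: the real functions use the opus_val16/opus_val32 types and, in fixed-point builds, integer multiply-accumulates.

```c
/* Simplified float-only sketch, not the actual Opus code.
 * Sequential dot product of x and y over N elements. */
static float inner_prod_ref(const float *x, const float *y, int N)
{
    float xy = 0.0f;
    for (int i = 0; i < N; i++)
        xy += x[i] * y[i];
    return xy;
}

/* Two dot products that share the input x: *xy1 = <x, y01>, *xy2 = <x, y02>.
 * Because x is read once for both products, a fused SIMD version can load
 * each block of x a single time. */
static void dual_inner_prod_ref(const float *x, const float *y01,
                                const float *y02, int N,
                                float *xy1, float *xy2)
{
    float acc1 = 0.0f, acc2 = 0.0f;
    for (int i = 0; i < N; i++) {
        acc1 += x[i] * y01[i];
        acc2 += x[i] * y02[i];
    }
    *xy1 = acc1;
    *xy2 = acc2;
}
```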
2017 Jun 06
3
celt_inner_prod() and dual_inner_prod() NEON intrinsics
Hi Linfeng,
On 05/06/17 03:31 PM, Linfeng Zhang wrote:
> Yes, we'll have one more patch set related to xcorr next week. Please
> don't wait if it's too late for the 1.2 release.
Assuming there's no issue with the patches, next week isn't too late.
Also, I've started looking at your patches. So far there's one thing
that puzzles me a bit. In the OPUS_CHECK_ASM
2017 Jun 06
4
Antw: Re: celt_inner_prod() and dual_inner_prod() NEON intrinsics
>>> Linfeng Zhang <linfengz at google.com> wrote on 06.06.2017 at 06:46 in message
<CAKoqLCAfj+fDUMLfN4dLNSZ4NNAZpaSt_BWZRp+7XBqfhiSqiQ at mail.gmail.com>:
> Hi Jean-Marc,
>
> I tried "==" before, and it failed when both results are 0.0. Maybe the
> exponent or sign differs because of a different 0.0 representation
> in NEON. If anybody
2017 Jun 06
0
celt_inner_prod() and dual_inner_prod() NEON intrinsics
Two comments on the various infrastructure for RTCD etc.
1. The 0002- patch changes the ABI of the celt_pitch_xcorr functions, but doesn’t change the assembly in celt/arm/celt_pitch_xcorr_arm.s correspondingly. I suspect the ‘arch’ parameter can just be ignored by the assembly functions, but at least the comments in that file should be updated to indicate the register that’s used to pass it in,
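A minimal sketch of the run-time dispatch pattern under discussion, assuming a four-slot table indexed by `arch & ARCH_MASK`. Every name here (`inner_prod`, `inner_prod_c`, `inner_prod_simd`, `INNER_PROD_IMPL`, `ARCH_MASK`) is hypothetical, and the "SIMD" slot is only a stand-in that calls the C code. It shows an implementation accepting an `arch` argument it never uses, and, mirroring the "Disables use of RTCD in cases where the compiler targets an instruction set by default" item in the 2015 configury patches, skipping the table when the compiler already targets NEON (`__ARM_NEON` defined).

```c
/* Hypothetical RTCD sketch; none of these names are the Opus API. */
#define ARCH_MASK 3 /* assumed: four implementation slots, indices 0..3 */

typedef float (*inner_prod_fn)(const float *x, const float *y, int N, int arch);

/* Generic C version. It receives arch but has no use for it, much as an
 * assembly routine can simply leave the extra argument register alone. */
static float inner_prod_c(const float *x, const float *y, int N, int arch)
{
    (void)arch;
    float xy = 0.0f;
    for (int i = 0; i < N; i++)
        xy += x[i] * y[i];
    return xy;
}

/* Stand-in for an optimized kernel; a real build would use NEON here. */
static float inner_prod_simd(const float *x, const float *y, int N, int arch)
{
    return inner_prod_c(x, y, N, arch);
}

/* One entry per arch level; only the top slot gets the "optimized" code. */
static const inner_prod_fn INNER_PROD_IMPL[ARCH_MASK + 1] = {
    inner_prod_c, inner_prod_c, inner_prod_c, inner_prod_simd
};

#if defined(__ARM_NEON)
/* The compiler already targets NEON, so no run-time lookup is needed. */
#define inner_prod(x, y, N, arch) inner_prod_simd(x, y, N, arch)
#else
/* Otherwise pick the implementation from the table at run time. */
#define inner_prod(x, y, N, arch) \
    (INNER_PROD_IMPL[(arch) & ARCH_MASK](x, y, N, arch))
#endif
```

Callers always pass `arch`, so the same call site works whether the build resolves to a direct call or a table lookup.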
2017 Jun 05
0
celt_inner_prod() and dual_inner_prod() NEON intrinsics
On 05/06/17 03:28 PM, Linfeng Zhang wrote:
> For fixed-point ARM, only
> 0003-Optimize-fixed-point-celt_inner_prod-and-dual_inner_.patch changes
> the performance.
> For floating-point ARM, only
> 0004-Optimize-floating-point-celt_inner_prod-and-dual_inn.patch changes the performance.
Got any numbers?
Cheers,
Jean-Marc
> Patches 1 and 2 are code clean-up and can only affect
2017 Jun 06
0
celt_inner_prod() and dual_inner_prod() NEON intrinsics
Thanks, Jonathan and Jean-Marc!
I attached the new patch sets in inner_prod_5patches_v3.zip.
The Chromebook I'm using is
Chromebook 13
CB5-311 series
RMN: Z3ENN
CPU info:
$ cat /proc/cpuinfo
processor : 0
model name : ARMv7 Processor rev 3 (v7l)
BogoMIPS : 2.31
Features : swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae
CPU implementer : 0x41
CPU
2017 Jun 06
0
celt_inner_prod() and dual_inner_prod() NEON intrinsics
Hi Jean-Marc,
I tried "==" before, and it failed when both results are 0.0. Maybe the
exponent or sign differs because of a different 0.0 representation
in NEON. If anybody knows how to handle this 0.0 comparison, that would be
great.
Or just use if(a==b || (a==0.0 && b==0.0)) ... but I haven't tried this.
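Two facts may help here. First, in IEEE 754 arithmetic +0.0 and -0.0 compare equal with `==`, so a sign-of-zero difference alone should not make the check fail; note also that values which print as `0.000000` with `%f` can be tiny nonzero numbers. Second, a 4-lane SIMD accumulator adds terms in a different order than a sequential C loop, and float addition is not associative, so the two results can legitimately differ. The sketch below uses hypothetical helper names, sums values directly rather than products, and assumes `n` is a multiple of 4; it shows an input where the two summation orders disagree.

```c
/* Sequential sum, in the order a plain C loop would use. */
static float sum_sequential(const float *v, int n)
{
    float s = 0.0f;
    for (int i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Four interleaved partial sums combined pairwise at the end, mimicking the
 * order of a 4-lane SIMD accumulator. Assumes n is a multiple of 4. */
static float sum_four_lanes(const float *v, int n)
{
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    for (int i = 0; i < n; i += 4)
        for (int k = 0; k < 4; k++)
            lane[k] += v[i + k];
    return (lane[0] + lane[1]) + (lane[2] + lane[3]);
}
```

With `{1e8f, 1.0f, -1e8f, 1.0f}` the sequential order returns 1.0f, while the lane order returns 0.0f, because each `1.0f` is rounded away when added to a partial sum of magnitude 1e8 (the float spacing there is 8).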
Thanks,
Linfeng
On Mon, Jun 5, 2017 at 8:43 PM Jean-Marc
2017 Jun 05
0
celt_inner_prod() and dual_inner_prod() NEON intrinsics
Yes, we'll have one more patch set related to xcorr next week. Please
don't wait if it's too late for the 1.2 release.
Thanks,
Linfeng
On Mon, Jun 5, 2017 at 12:28 PM, Linfeng Zhang <linfengz at google.com> wrote:
> Hi Jean-Marc,
>
> I attached the new version in inner_prod_5patches_v2.zip which synced to
> the current master.
>
> For fixed-point ARM, only
2017 Jun 06
0
celt_inner_prod() and dual_inner_prod() NEON intrinsics
Thanks, Ulrich!
Yes, using
celt_assert(1.0 + celt_inner_prod_neon_float_c_simulation(x, y, N)
== 1.0 + xy);
celt_assert(1.0 + xy1_c == 1.0 + *xy1);
celt_assert(1.0 + xy2_c == 1.0 + *xy2);
can avoid the usage of VERY_SMALL.
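A sketch of why the `1.0 +` form helps: the float operands are promoted to double and added to 1.0, so any result much smaller in magnitude than DBL_EPSILON (about 2.2e-16) is rounded away, and two tiny, near-zero results compare equal. The absorbed amount is absolute, so results well above that scale must still match essentially exactly. `nearly_equal_near_zero` is a hypothetical name.

```c
/* Compares a and b after adding each to 1.0 in double precision.
 * Near-zero values are absorbed by the 1.0 and therefore compare equal;
 * ordinary-magnitude values must still agree essentially exactly. */
static int nearly_equal_near_zero(float a, float b)
{
    return 1.0 + a == 1.0 + b;
}
```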
Hi Jean-Marc,
I added
{
const opus_val32 xy_c = celt_inner_prod_neon_float_c_simulation(x,
y, N);
const int32_t *x_bin =
2017 Jun 02
0
celt_inner_prod() and dual_inner_prod() NEON intrinsics
Hi Linfeng,
I'll look into your patches. Can you let me know what the expected
effect on performance (if any) is for each of your patches? Also, are these
all the patches you intend to merge for 1.2 or are there more upcoming ones?
Cheers,
Jean-Marc
On 01/06/17 06:33 PM, Linfeng Zhang wrote:
> Hi,
>
> Attached are 5 patches related to celt_inner_prod()
> and
2015 Nov 05
2
AVX Optimizations
Yes,
Thank you. I'll follow up with the AVX code and tests for pitch code.
Radu
-----Original Message-----
From: opus-bounces at xiph.org [mailto:opus-bounces at xiph.org] On Behalf Of Timothy B. Terriberry
Sent: Thursday, November 5, 2015 10:31 AM
To: opus at xiph.org
Subject: Re: [opus] AVX Optimizations
Velea, Radu wrote:
> I've created a pull request[1] to enable configuration
2016 Sep 13
4
[PATCH 12/15] Replace call of celt_inner_prod_c() (step 1)
Should call celt_inner_prod().
---
celt/bands.c | 7 ++++---
celt/bands.h | 2 +-
celt/celt_encoder.c | 6 +++---
celt/pitch.c | 2 +-
src/opus_multistream_encoder.c | 2 +-
5 files changed, 10 insertions(+), 9 deletions(-)
diff --git a/celt/bands.c b/celt/bands.c
index bbe8a4c..1ab24aa 100644
--- a/celt/bands.c
+++ b/celt/bands.c
2015 Mar 13
1
[RFC PATCH v3] Intrinsics/RTCD related fixes. Mostly x86.
From: Jonathan Lennox <jonathan at vidyo.com>
* Makes --enable-intrinsics work with clang and other non-GCC compilers
* Enables RTCD for the floating-point-mode SSE code in Celt.
* Disables use of RTCD in cases where the compiler targets an instruction set by default.
* Enables the SSE4.1 Silk optimizations that apply to the common parts of Silk when Opus is built in floating-point mode, not
2015 Mar 12
1
[RFC PATCHv2] Intrinsics/RTCD related fixes. Mostly x86.
From: Jonathan Lennox <jonathan at vidyo.com>
* Makes --enable-intrinsics work with clang and other non-GCC compilers
* Enables RTCD for the floating-point-mode SSE code in Celt.
* Disables use of RTCD in cases where the compiler targets an instruction set by default.
* Enables the SSE4.1 Silk optimizations that apply to the common parts of Silk when Opus is built in floating-point mode, not
2015 Mar 02
13
Patch cleaning up Opus x86 intrinsics configury
The attached patch cleans up Opus's x86 intrinsics configury.
It:
* Makes --enable-intrinsics work with clang and other non-GCC compilers
* Enables RTCD for the floating-point-mode SSE code in Celt.
* Disables use of RTCD in cases where the compiler targets an instruction set by default.
* Enables the SSE4.1 Silk optimizations that apply to the common parts of Silk when Opus is built in
2015 Nov 05
0
AVX Optimizations
Velea, Radu wrote:
> Yes,
>
> Thank you. I'll follow up with the AVX code and tests for pitch code.
Actually, I lied. Because you update opus_select_arch(), you can now
return a value for arch (4) that is larger than the maximum we currently
support (3). This doesn't actually cause failures, because we mask with
OPUS_ARCHMASK, but it does mean that a CPU with AVX will invoke
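The wrap-around can be checked with plain integer arithmetic. This sketch assumes a mask of 3, consistent with the maximum arch value of 3 mentioned above; `EXAMPLE_ARCHMASK` and `arch_slot` are hypothetical names, not Opus identifiers.

```c
/* Assumed mask for a four-slot table (indices 0..3). */
#define EXAMPLE_ARCHMASK 3

/* Returns the table slot an arch value selects after masking.
 * 4 is binary 100 and the mask is binary 011, so arch 4 selects slot 0. */
static int arch_slot(int arch)
{
    return arch & EXAMPLE_ARCHMASK;
}
```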
2015 Nov 05
2
AVX Optimizations
Sorry. I missed that. Good observation.
Please go ahead and correct the patch.
Thanks,
Radu
-----Original Message-----
From: opus-bounces at xiph.org [mailto:opus-bounces at xiph.org] On Behalf Of Timothy B. Terriberry
Sent: Thursday, November 5, 2015 11:08 AM
To: opus at xiph.org
Subject: Re: [opus] AVX Optimizations
Velea, Radu wrote:
> Yes,
>
> Thank you. I'll follow up with
2015 Mar 18
5
[RFC PATCH v1 0/4] Enable aarch64 intrinsics/Ne10
Hi All,
Since I continue to base my work on top of Jonathan's patch,
and my previous Ne10 fft/ifft/mdct_forward/backward patches,
I thought it would be better to just post all new patches
as a patch series. Please let me know if anyone disagrees
with this approach.
You can see wip branch of all latest patches at
https://git.linaro.org/people/viswanath.puttagunta/opus.git
Branch:
2015 Mar 31
6
[RFC PATCH v1 0/5] aarch64: celt_pitch_xcorr: Fixed point series
Hi Timothy,
As I mentioned earlier [1], I have now fixed the compile issues
with fixed point and am resubmitting the patch.
I also have a new patch that does intrinsics optimizations
for celt_pitch_xcorr targeting aarch64.
You can find my latest work-in-progress branch at [2]
For reference, you can use the Ne10 pre-built libraries
at [3]
Note that I am working with Phil at ARM to get my patch at [4]
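celt_pitch_xcorr evaluates one inner product per candidate lag, which is why it benefits from the same kind of SIMD work as celt_inner_prod. A simplified float sketch of the computation follows; `pitch_xcorr_ref` is a hypothetical name, and its signature is not the Opus one.

```c
/* Simplified float sketch of a pitch cross-correlation, not the Opus code.
 * For each lag i in [0, max_pitch), xcorr[i] is the inner product of
 * x[0..len) with y[i..i+len), so y must hold len + max_pitch - 1 samples. */
static void pitch_xcorr_ref(const float *x, const float *y, float *xcorr,
                            int len, int max_pitch)
{
    for (int i = 0; i < max_pitch; i++) {
        float sum = 0.0f;
        for (int j = 0; j < len; j++)
            sum += x[j] * y[i + j];
        xcorr[i] = sum;
    }
}
```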