search for: xcorr_kernel_neon_float

Displaying 20 results from an estimated 26 matches for "xcorr_kernel_neon_float".

2014 Dec 19
3
[RFC PATCH v3] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
...surable performance advantage to the switch statement, that would change my opinion, but if there isn't, then either of the other two approaches seem better. One more nit I hadn't noticed before: > + float *xi = x; > + float *yi = y; These need to be const float32_t (in both xcorr_kernel_neon_float and xcorr_kernel_neon_float_process1). They're currently causing a ton of warning spew. float32_t appears to not be considered equivalent to float, which means you'll also need casts here: > + vst1q_f32(sum, SUMM); and here: > + vst1_lane_f32(sum, SUMM_2[0], 0);
2014 Dec 19
2
[PATCH v1] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
..., STRICT LIABILITY, OR TORT (INCLUDING > + NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS > + SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > +*/ > +#include <arm_neon.h> > +#include "../arch.h" > + > +/* > + * Function: xcorr_kernel_neon_float > + * --------------------------------- > + * Computes 4 correlation values and stores them in sum[4] > + */ > +static void xcorr_kernel_neon_float(const float *x, const float *y, > + float sum[4], int len) { > + float32x4_t YY[3]; > + float32x4_t...
2014 Nov 28
2
[RFC PATCHv1] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
...itch_xcorr_c, /* Media */ > +#if defined(OPUS_ARM_NEON_INTR) > + celt_pitch_xcorr_float_neon /* Neon */ Please do not use tabs in source code (this applies here and everywhere below). Even with the tabs expanded in context, the comments here do not line up properly. > +static void xcorr_kernel_neon_float(float *x, float *y, float sum[4], int len) { x and y should be const. > + float32x4_t YY[5]; > + float32x4_t XX[4]; > + float32x2_t XX_2; > + float32x4_t SUMM[4]; > + float *xi = x; > + float *yi = y; > + int cd = len/4; > + int cr = len%4; len is signed, so / and % are N...
2014 Dec 19
2
[PATCH v1] cover: armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
Hi, Optimizes celt_pitch_xcorr for ARM NEON floating point. Changes from RFCv3: - celt_neon_intr.c - removed warnings due to not having constant pointers - Put simpler loop to take care of corner cases. Unrolling using intrinsics was not really mapping well to what was done in celt_pitch_xcorr_arm.s - Makefile.am Removed explicit -O3 optimization - test_unit_mathops.c,
2014 Dec 01
0
[RFC PATCHv1] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
...NTR) >> + celt_pitch_xcorr_float_neon /* Neon */ > > Please do not use tabs in source code (this applies here and everywhere > below). Even with the tabs expanded in context, the comments here do not > line up properly. Will do. Thanks. > >> +static void xcorr_kernel_neon_float(float *x, float *y, float sum[4], int len) { > > x and y should be const. > >> + float32x4_t YY[5]; >> + float32x4_t XX[4]; >> + float32x2_t XX_2; >> + float32x4_t SUMM[4]; >> + float *xi = x; >> + float *yi = y; >> + in...
2014 Dec 19
0
[PATCH v1] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
...RY OF + LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING + NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS + SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +*/ +#include <arm_neon.h> +#include "../arch.h" + +/* + * Function: xcorr_kernel_neon_float + * --------------------------------- + * Computes 4 correlation values and stores them in sum[4] + */ +static void xcorr_kernel_neon_float(const float *x, const float *y, + float sum[4], int len) { + float32x4_t YY[3]; + float32x4_t YEXT[3]; + float32x4_t XX[2]; +...
2014 Dec 07
0
[RFC PATCH v2] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
...RY OF + LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING + NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS + SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +*/ +#include <arm_neon.h> +#include "../arch.h" + +/* + * Function: xcorr_kernel_neon_float + * --------------------------------- + * Computes 4 correlation values and stores them in sum[4] + */ +void xcorr_kernel_neon_float(const float *x, const float *y, + float sum[4], int len) { + float32x4_t YY[3]; + float32x4_t YEXT[3]; + float32x4_t XX[2]; + float3...
2014 Dec 10
0
[RFC PATCH v3] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
...RY OF + LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING + NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS + SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +*/ +#include <arm_neon.h> +#include "../arch.h" + +/* + * Function: xcorr_kernel_neon_float + * --------------------------------- + * Computes 4 correlation values and stores them in sum[4] + */ +static void xcorr_kernel_neon_float(const float *x, const float *y, + float sum[4], int len) { + float32x4_t YY[3]; + float32x4_t YEXT[3]; + float32x4_t XX[2]; +...
2014 Dec 10
2
[RFC PATCH v3] cover: armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
Hi, Optimizes celt_pitch_xcorr for floating point. Changes from RFCv2: - Changes recommended by Timothy for celt_neon_intr.c everything except, left the unrolled loop still unrolled - configure.ac - use AC_LINK_IFELSE instead of AC_COMPILE_IFELSE - Moved compile flags into Makefile.am - OPUS_ARM_NEON_INR --> typo --> OPUS_ARM_NEON_INTR Viswanath Puttagunta (1): armv7:
2014 Nov 21
4
[RFC PATCHv1] cover: celt_pitch_xcorr: Introduce ARM neon intrinsics
Hello, I received feedback from engineers working on NE10 [1] that it would be better to use NE10 [1] for FFT optimizations for opus use cases. However, these FFT patches are currently in review and haven't been integrated into NE10 yet. While the FFT functions in NE10 are getting baked, I wanted to optimize the celt_pitch_xcorr (floating point only) and use it to introduce ARM NEON
2014 Dec 07
3
[RFC PATCH v2] cover: armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
From: Viswanath Puttagunta <viswanath.puttagunta at linaro.org> Hi, Optimizes celt_pitch_xcorr for floating point. Changes from RFCv1: - Rebased on top of commit aad281878: Fix celt_pitch_xcorr_c signature. which got rid of ugly code around CELT_PITCH_XCORR_IMPL passing of "arch" parameter. - Unified with --enable-intrinsics used by x86 - Modified algorithm to be more
2014 Dec 07
2
[RFC PATCH v2] cover: armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
Hi, Optimizes celt_pitch_xcorr for floating point. Changes from RFCv1: - Rebased on top of commit aad281878: Fix celt_pitch_xcorr_c signature. which got rid of ugly code around CELT_PITCH_XCORR_IMPL passing of "arch" parameter. - Unified with --enable-intrinsics used by x86 - Modified algorithm to be more in-line with algorithm in celt_pitch_xcorr_arm.s Viswanath Puttagunta
2014 Nov 21
0
[RFC PATCHv1] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
...lt_pitch_xcorr_c +#endif +}; + # endif #endif diff --git a/celt/arm/celt_neon_intr.c b/celt/arm/celt_neon_intr.c new file mode 100644 index 0000000..88954fb --- /dev/null +++ b/celt/arm/celt_neon_intr.c @@ -0,0 +1,81 @@ +#include <arm_neon.h> +#include "../arch.h" + +static void xcorr_kernel_neon_float(float *x, float *y, float sum[4], int len) { + float32x4_t YY[5]; + float32x4_t XX[4]; + float32x2_t XX_2; + float32x4_t SUMM[4]; + float *xi = x; + float *yi = y; + int cd = len/4; + int cr = len%4; + int j; + + celt_assert(len>=3); + + /* Initialize sums to 0 */ + SUMM[0] = vdupq_n_f32(0); + S...
2015 Nov 21
0
[Aarch64 v2 08/18] Add Neon fixed-point implementation of xcorr_kernel.
...y += 8; + } + + for (; j < len; j++) + { + int16x4_t x0 = vld1_dup_s16(x); //load next x + int32x4_t a0 = vmlal_s16(a, y0, x0); + + int16x4_t y4 = vld1_dup_s16(y); //load next y + y0 = vext_s16(y0, y4, 1); + a = a0; + x++; + y++; + } + + vst1q_s32(sum, a); +} + +#else /* * Function: xcorr_kernel_neon_float * --------------------------------- diff --git a/celt/arm/pitch_arm.h b/celt/arm/pitch_arm.h index bd41774..545c115 100644 --- a/celt/arm/pitch_arm.h +++ b/celt/arm/pitch_arm.h @@ -56,7 +56,36 @@ opus_val32 celt_pitch_xcorr_edsp(const opus_val16 *_x, const opus_val16 *_y, # define OVERRIDE_PIT...
2014 Dec 19
0
[RFC PATCH v3] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
...r two approaches seem better. OK, thanks for your explanation. I will take a closer look at this part of celt_pitch_xcorr_arm.s > > > One more nit I hadn't noticed before: > >> + float *xi = x; >> + float *yi = y; > > These need to be const float32_t (in both xcorr_kernel_neon_float and > xcorr_kernel_neon_float_process1). They're currently causing a ton of > warning spew. float32_t appears to not be considered equivalent to > float, which means you'll also need casts here: > >> + vst1q_f32(sum, SUMM); > > and here: > >> + vst1_lane...
2014 Dec 18
2
[RFC PATCH v3] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
Almost there... just a few nits left. Viswanath Puttagunta wrote: > +if OPUS_ARM_NEON_INTR > +CELT_SOURCES += $(CELT_SOURCES_ARM_NEON_INTR) > +OPUS_ARM_NEON_INTR_CPPFLAGS = -mfpu=neon -O3 I'll repeat: I don't think you should change the optimization level here. > + /* Just unroll the rest of the loop */ I saw you decided to keep this unrolled, but you didn't actually
2015 Dec 23
6
[AArch64 neon intrinsics v4 0/5] Rework Neon intrinsic code for Aarch64 patchset
Following Tim's comments, here are my reworked patches for the Neon intrinsic function patches of of my Aarch64 patchset, i.e. replacing patches 5-8 of the v2 series. Patches 1-4 and 9-18 of the old series still apply unmodified. The one new (as opposed to changed) patch is the first one in this series, to add named constants for the ARM architecture variants. There are also some minor code
2016 Sep 13
4
[PATCH 12/15] Replace call of celt_inner_prod_c() (step 1)
Should call celt_inner_prod(). --- celt/bands.c | 7 ++++--- celt/bands.h | 2 +- celt/celt_encoder.c | 6 +++--- celt/pitch.c | 2 +- src/opus_multistream_encoder.c | 2 +- 5 files changed, 10 insertions(+), 9 deletions(-) diff --git a/celt/bands.c b/celt/bands.c index bbe8a4c..1ab24aa 100644 --- a/celt/bands.c +++ b/celt/bands.c
2015 Nov 21
12
[Aarch64 v2 00/18] Patches to enable Aarch64 (version 2)
As promised, here's a re-send of all my Aarch64 patches, following comments by John Ridges. Note that they actually affect more than just Aarch64 -- other than the ones specifically guarded by AARCH64_NEON defines, the Neon intrinsics all also apply on armv7; and the OPUS_FAST_INT64 patches apply on any 64-bit machine. The patches should largely be independent and independently useful, other
2015 Nov 07
12
[Aarch64 00/11] Patches to enable Aarch64 (arm64) optimizations, rebased to current master.
Here are my aarch64 patches rebased to the current tip of Opus master. They're largely the same as my previous patch set, with the addition of the final one (the Neon fixed-point implementation of xcorr_kernel). This replaces Viswanath's Neon fixed-point celt_pitch_xcorr, since xcorr_kernel is used in celt_fir and celt_iir as well. These have been tested for correctness under qemu