search for: xcorr_kernel_neon_float_process1

Displaying 9 results from an estimated 9 matches for "xcorr_kernel_neon_float_process1".

2014 Dec 19
3
[RFC PATCH v3] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
...to the switch statement, that would change my opinion, but if there isn't, then either of the other two approaches seem better. One more nit I hadn't noticed before: > + float *xi = x; > + float *yi = y; These need to be const float32_t (in both xcorr_kernel_neon_float and xcorr_kernel_neon_float_process1). They're currently causing a ton of warning spew. float32_t appears to not be considered equivalent to float, which means you'll also need casts here: > + vst1q_f32(sum, SUMM); and here: > + vst1_lane_f32(sum, SUMM_2[0], 0);
2014 Dec 19
2
[PATCH v1] cover: armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
Hi, Optimizes celt_pitch_xcorr for ARM NEON floating point. Changes from RFCv3: - celt_neon_intr.c - removed warnings due to not having constant pointers - Put simpler loop to take care of corner cases. Unrolling using intrinsics was not really mapping well to what was done in celt_pitch_xcorr_arm.s - Makefile.am Removed explicit -O3 optimization - test_unit_mathops.c,
2014 Dec 19
0
[PATCH v1] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
...XX_2 = vld1_dup_f32(xi++); + SUMM = vmlaq_lane_f32(SUMM, YY[0], XX_2, 0); + YY[0]= vld1q_f32(yi++); + len--; + } + + if (len > 0) { + XX_2 = vld1_dup_f32(xi); + SUMM = vmlaq_lane_f32(SUMM, YY[0], XX_2, 0); + } + + vst1q_f32(sum, SUMM); +} + +/* + * Function: xcorr_kernel_neon_float_process1 + * --------------------------------- + * Computes single correlation values and stores in *sum + */ +static void xcorr_kernel_neon_float_process1(const float *x, const float *y, + float *sum, int len) { + float32x4_t XX[4]; + float32x4_t YY[4]; + float32x2_t XX_2; + float32x2...
2014 Dec 10
0
[RFC PATCH v3] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
...d1q_f32(yi++); + case 2: + XX_2 = vld1_dup_f32(xi++); + SUMM = vmlaq_lane_f32(SUMM, YY[0], XX_2, 0); + YY[0] = vld1q_f32(yi++); + case 1: + XX_2 = vld1_dup_f32(xi++); + SUMM = vmlaq_lane_f32(SUMM, YY[0], XX_2, 0); + } + + vst1q_f32(sum, SUMM); +} + +/* + * Function: xcorr_kernel_neon_float_process1 + * --------------------------------- + * Computes single correlation values and stores in *sum + */ +static void xcorr_kernel_neon_float_process1(const float *x, const float *y, + float *sum, int len) { + float32x4_t XX[4]; + float32x4_t YY[4]; + float32x2_t XX_2; + float32x2...
2014 Dec 10
2
[RFC PATCH v3] cover: armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
Hi, Optimizes celt_pitch_xcorr for floating point. Changes from RFCv2: - Changes recommended by Timothy for celt_neon_intr.c everything except, left the unrolled loop still unrolled - configure.ac - use AC_LINK_IFELSE instead of AC_COMPILE_IFELSE - Moved compile flags into Makefile.am - OPUS_ARM_NEON_INR --> typo --> OPUS_ARM_NEON_INTR Viswanath Puttagunta (1): armv7:
2014 Dec 19
2
[PATCH v1] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
...= vmlaq_lane_f32(SUMM, YY[1], XX_2, 1); len -= 2; break; case 1: XX_2 = vld1_f32(xi); SUMM = vmlaq_lane_f32(SUMM, YY[0], XX_2, 0); len--; break; } } > + > + vst1q_f32(sum, SUMM); > +} > + > +/* > + * Function: xcorr_kernel_neon_float_process1 > + * --------------------------------- > + * Computes single correlation values and stores in *sum > + */ > +static void xcorr_kernel_neon_float_process1(const float *x, const float *y, > + float *sum, int len) { > + float32x4_t XX[4]; > + float32x4_t YY[4];...
2014 Dec 19
0
[RFC PATCH v3] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
..., thanks for your explanation. I will take a closer look at this part of celt_pitch_xcorr_arm.s > > > One more nit I hadn't noticed before: > >> + float *xi = x; >> + float *yi = y; > > These need to be const float32_t (in both xcorr_kernel_neon_float and > xcorr_kernel_neon_float_process1). They're currently causing a ton of > warning spew. float32_t appears to not be considered equivalent to > float, which means you'll also need casts here: > >> + vst1q_f32(sum, SUMM); > > and here: > >> + vst1_lane_f32(sum, SUMM_2[0], 0); Thanks, will do....
2016 Sep 13
4
[PATCH 12/15] Replace call of celt_inner_prod_c() (step 1)
Should call celt_inner_prod(). --- celt/bands.c | 7 ++++--- celt/bands.h | 2 +- celt/celt_encoder.c | 6 +++--- celt/pitch.c | 2 +- src/opus_multistream_encoder.c | 2 +- 5 files changed, 10 insertions(+), 9 deletions(-) diff --git a/celt/bands.c b/celt/bands.c index bbe8a4c..1ab24aa 100644 --- a/celt/bands.c +++ b/celt/bands.c
2014 Dec 18
2
[RFC PATCH v3] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
Almost there... just a few nits left. Viswanath Puttagunta wrote: > +if OPUS_ARM_NEON_INTR > +CELT_SOURCES += $(CELT_SOURCES_ARM_NEON_INTR) > +OPUS_ARM_NEON_INTR_CPPFLAGS = -mfpu=neon -O3 I'll repeat: I don't think you should change the optimization level here. > + /* Just unroll the rest of the loop */ I saw you decided to keep this unrolled, but you didn't actually