Displaying 4 results from an estimated 4 matches for "xcorr3to1_kernel_neon_float".
2014 Dec 09
1
[RFC PATCH v2] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
...> + SUMM = vmlaq_lane_f32(SUMM, YY[0], XX_2, 0);
> +      YY[0] = vld1q_f32(yi++);
> +   case 1:
> +      XX_2 = vld1_dup_f32(xi++);
> +      SUMM = vmlaq_lane_f32(SUMM, YY[0], XX_2, 0);
> +   }
> +
> +   vst1q_f32(sum, SUMM);
> +}
> +
> +/*
> + * Function: xcorr3to1_kernel_neon_float
> + * ---------------------------------
> + * Computes single correlation values and stores in *sum
> + */
> +void xcorr3to1_kernel_neon_float(const float *x, const float *y,
> +      float *sum, int len) {
I had to think quite a bit about what "3to1" meant (since...
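For readers following the quoted tail of the kernel: each vmlaq_lane_f32 step appears to add one x sample, multiplied by four consecutive y samples, into the four accumulator lanes, and the cases fall through so that every leftover sample is handled once. Below is a rough scalar sketch of that pattern in plain C. It is not the patch's code: the names mac_4lags, xcorr_tail_4lags_ref and the rem parameter are hypothetical, and the assumption that at most three samples are left over is mine, since the start of the switch is cut off in the snippet.

```c
#include <assert.h>

/* Hypothetical scalar helper (not from the patch): adds xv * y[0..3]
 * into four running sums, one per correlation lag. This mirrors what a
 * single vmlaq_lane_f32 step does across the four lanes. */
static void mac_4lags(float sum[4], float xv, const float *y)
{
    int k;
    for (k = 0; k < 4; k++)
        sum[k] += xv * y[k];
}

/* Handles the last `rem` samples (assumed 0..3) with a fall-through
 * switch. y must have at least rem + 3 readable elements. */
void xcorr_tail_4lags_ref(const float *x, const float *y,
                          float sum[4], int rem)
{
    switch (rem) {
    case 3:
        mac_4lags(sum, *x++, y++);
        /* fall through */
    case 2:
        mac_4lags(sum, *x++, y++);
        /* fall through */
    case 1:
        mac_4lags(sum, *x++, y++);
        break;
    default:
        break;
    }
}

/* Small self-check with exactly representable values. */
void xcorr_tail_4lags_selfcheck(void)
{
    const float x[3] = {1.0f, 2.0f, 3.0f};
    const float y[6] = {1.0f, 0.0f, 2.0f, 1.0f, 3.0f, 1.0f};
    float sum[4] = {0.0f, 0.0f, 0.0f, 0.0f};

    xcorr_tail_4lags_ref(x, y, sum, 3);
    assert(sum[0] == 7.0f);
    assert(sum[3] == 10.0f);
}
```

The fall-through keeps the tail free of a loop counter, at the cost of one case label per possible remainder.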
2014 Dec 07
2
[RFC PATCH v2] cover: armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
Hi,
Optimizes celt_pitch_xcorr for floating point.
Changes from RFCv1:
- Rebased on top of commit
aad281878: Fix celt_pitch_xcorr_c signature.
which got rid of ugly code around CELT_PITCH_XCORR_IMPL
passing of "arch" parameter.
- Unified with --enable-intrinsics used by x86
- Modified algorithm to be more in line with the algorithm in
celt_pitch_xcorr_arm.s
Viswanath Puttagunta
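As background for the cover letter, a scalar reference for the kind of computation being optimized might look like the sketch below. It assumes that celt_pitch_xcorr produces, for each lag i, the inner product of x with y shifted by i samples. The function names inner_prod_ref and pitch_xcorr_ref are hypothetical and the signatures are simplified; they are not the Opus API.

```c
#include <assert.h>

/* Hypothetical reference: one correlation value, i.e. the inner
 * product of x[0..len-1] and y[0..len-1]. */
float inner_prod_ref(const float *x, const float *y, int len)
{
    float sum = 0.0f;
    int j;
    for (j = 0; j < len; j++)
        sum += x[j] * y[j];
    return sum;
}

/* Hypothetical reference: xcorr[i] = sum over j of x[j] * y[i + j],
 * for i = 0..max_pitch-1. y must hold len + max_pitch - 1 samples. */
void pitch_xcorr_ref(const float *x, const float *y, float *xcorr,
                     int len, int max_pitch)
{
    int i;
    for (i = 0; i < max_pitch; i++)
        xcorr[i] = inner_prod_ref(x, y + i, len);
}

/* Small self-check with exactly representable values. */
void pitch_xcorr_ref_selfcheck(void)
{
    const float x[3] = {1.0f, 2.0f, 3.0f};
    const float y[5] = {1.0f, 0.0f, 2.0f, 1.0f, 3.0f};
    float xcorr[3];

    pitch_xcorr_ref(x, y, xcorr, 3, 3);
    assert(xcorr[0] == 7.0f);
    assert(xcorr[2] == 13.0f);
}
```

A vectorized version can be compared against a reference like this, keeping in mind that floating-point sums may differ slightly when the accumulation order changes.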
2014 Dec 07
0
[RFC PATCH v2] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
...d1q_f32(yi++);
+   case 2:
+      XX_2 = vld1_dup_f32(xi++);
+      SUMM = vmlaq_lane_f32(SUMM, YY[0], XX_2, 0);
+      YY[0] = vld1q_f32(yi++);
+   case 1:
+      XX_2 = vld1_dup_f32(xi++);
+      SUMM = vmlaq_lane_f32(SUMM, YY[0], XX_2, 0);
+   }
+
+   vst1q_f32(sum, SUMM);
+}
+
+/*
+ * Function: xcorr3to1_kernel_neon_float
+ * ---------------------------------
+ * Computes single correlation values and stores in *sum
+ */
+void xcorr3to1_kernel_neon_float(const float *x, const float *y,
+      float *sum, int len) {
+   int i;
+   float32x4_t XX[4];
+   float32x4_t YY[4];
+   float32x4_t SUMM;
+ float32x2_...
2014 Dec 07
3
[RFC PATCH v2] cover: armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
From: Viswanath Puttagunta <viswanath.puttagunta at linaro.org>
Hi,
Optimizes celt_pitch_xcorr for floating point.
Changes from RFCv1:
- Rebased on top of commit
aad281878: Fix celt_pitch_xcorr_c signature.
which got rid of ugly code around CELT_PITCH_XCORR_IMPL
passing of "arch" parameter.
- Unified with --enable-intrinsics used by x86
- Modified algorithm to be more