Displaying 4 results from an estimated 4 matches for "xcorr3to1_kernel_neon_float".
2014 Dec 09
1
[RFC PATCH v2] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
...> +      SUMM = vmlaq_lane_f32(SUMM, YY[0], XX_2, 0);
> +      YY[0] = vld1q_f32(yi++);
> +   case 1:
> +      XX_2 = vld1_dup_f32(xi++);
> +      SUMM = vmlaq_lane_f32(SUMM, YY[0], XX_2, 0);
> +   }
> +
> +   vst1q_f32(sum, SUMM);
> +}
> +
> +/*
> + * Function: xcorr3to1_kernel_neon_float
> + * ---------------------------------
> + * Computes a single correlation value and stores it in *sum
> + */
> +void xcorr3to1_kernel_neon_float(const float *x, const float *y,
> +               float *sum, int len) {
I had to think quite a bit about what "3to1" meant (since...
2014 Dec 07
2
[RFC PATCH v2] cover: armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
Hi,
Optimizes celt_pitch_xcorr for floating point.
Changes from RFCv1:
- Rebased on top of commit
  aad281878 ("Fix celt_pitch_xcorr_c signature"),
  which got rid of the ugly code around CELT_PITCH_XCORR_IMPL
  passing the "arch" parameter.
- Unified with the --enable-intrinsics option used by x86
- Modified the algorithm to be more in line with the algorithm in
  celt_pitch_xcorr_arm.s
Viswanath Puttagunta
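
For readers unfamiliar with what the optimized routine computes, here is a minimal scalar sketch of a floating-point pitch cross-correlation. It is not code from the patch: the names xcorr_single_ref and pitch_xcorr_ref are hypothetical, and the sketch assumes the usual definition xcorr[i] = sum over j of x[j] * y[i + j], with y holding at least len + max_pitch - 1 samples.

```c
#include <assert.h>

/* Hypothetical helper (not from the patch): one correlation value,
 * i.e. the dot product sum_j x[j] * y[j] over len samples. */
static float xcorr_single_ref(const float *x, const float *y, int len)
{
    float sum = 0.0f;
    for (int j = 0; j < len; j++)
        sum += x[j] * y[j];
    return sum;
}

/* Hypothetical scalar reference (not from the patch):
 * xcorr[i] = sum_j x[j] * y[i + j] for each lag i in [0, max_pitch).
 * Assumes y holds at least len + max_pitch - 1 samples. */
static void pitch_xcorr_ref(const float *x, const float *y,
                            float *xcorr, int len, int max_pitch)
{
    assert(len > 0 && max_pitch > 0);
    for (int i = 0; i < max_pitch; i++)
        xcorr[i] = xcorr_single_ref(x, y + i, len);
}
```

A scalar version like this is mainly useful as a baseline for checking a vectorized kernel. Because a SIMD implementation accumulates the products in a different order, its results may differ from the scalar ones by small floating-point rounding errors.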
2014 Dec 07
0
[RFC PATCH v2] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
...d1q_f32(yi++);
+   case 2:
+      XX_2 = vld1_dup_f32(xi++);
+      SUMM = vmlaq_lane_f32(SUMM, YY[0], XX_2, 0);
+      YY[0] = vld1q_f32(yi++);
+   case 1:
+      XX_2 = vld1_dup_f32(xi++);
+      SUMM = vmlaq_lane_f32(SUMM, YY[0], XX_2, 0);
+   }
+
+   vst1q_f32(sum, SUMM);
+}
+
+/*
+ * Function: xcorr3to1_kernel_neon_float
+ * ---------------------------------
+ * Computes a single correlation value and stores it in *sum
+ */
+void xcorr3to1_kernel_neon_float(const float *x, const float *y,
+               float *sum, int len) {
+   int i;
+   float32x4_t XX[4];
+   float32x4_t YY[4];
+   float32x4_t SUMM;
+   float32x2_...
2014 Dec 07
3
[RFC PATCH v2] cover: armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
From: Viswanath Puttagunta <viswanath.puttagunta at linaro.org>
Hi,
Optimizes celt_pitch_xcorr for floating point.
Changes from RFCv1:
- Rebased on top of commit
  aad281878 ("Fix celt_pitch_xcorr_c signature"),
  which got rid of the ugly code around CELT_PITCH_XCORR_IMPL
  passing the "arch" parameter.
- Unified with the --enable-intrinsics option used by x86
- Modified algorithm to be more