Displaying 13 results from an estimated 13 matches for "vst1_lane_f32".
2014 Dec 19
3
[RFC PATCH v3] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
...nst float32_t (in both xcorr_kernel_neon_float and
xcorr_kernel_neon_float_process1). They're currently causing a ton of
warning spew. float32_t appears to not be considered equivalent to
float, which means you'll also need casts here:
> + vst1q_f32(sum, SUMM);
and here:
> + vst1_lane_f32(sum, SUMM_2[0], 0);
2014 Dec 09
1
[RFC PATCH v2] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
...ess
16 floats in one of those.
> + while (len >= 16) {
> + /* Accumulate results into single float */
> + tv.val[0] = vadd_f32(vget_low_f32(SUMM), vget_high_f32(SUMM));
> + tv = vtrn_f32(tv.val[0], ZERO);
> + tv.val[0] = vadd_f32(tv.val[0], tv.val[1]);
> +
> + vst1_lane_f32(&sumi, tv.val[0], 0);
Accessing tv.val[0] and tv.val[1] directly seems to send these values
through the stack, e.g.,
f4: f3ba7085 vtrn.32 d7, d5
f8: ed0b7b0f vstr d7, [fp, #-60]
fc: ed0b5b0d vstr d5, [fp, #-52]
...
114: ed1b6b09 vldr d6...
2014 Dec 19
0
[RFC PATCH v3] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
...eon_float and
> xcorr_kernel_neon_float_process1). They're currently causing a ton of
> warning spew. float32_t appears to not be considered equivalent to
> float, which means you'll also need casts here:
>
>> + vst1q_f32(sum, SUMM);
>
> and here:
>
>> + vst1_lane_f32(sum, SUMM_2[0], 0);
Thanks, will do.
> _______________________________________________
> opus mailing list
> opus at xiph.org
> http://lists.xiph.org/mailman/listinfo/opus
2014 Dec 18
2
[RFC PATCH v3] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
Almost there... just a few nits left.
Viswanath Puttagunta wrote:
> +if OPUS_ARM_NEON_INTR
> +CELT_SOURCES += $(CELT_SOURCES_ARM_NEON_INTR)
> +OPUS_ARM_NEON_INTR_CPPFLAGS = -mfpu=neon -O3
I'll repeat: I don't think you should change the optimization level here.
> + /* Just unroll the rest of the loop */
I saw you decided to keep this unrolled, but you didn't actually
2014 Dec 07
2
[RFC PATCH v2] cover: armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
Hi,
Optimizes celt_pitch_xcorr for floating point.
Changes from RFCv1:
- Rebased on top of commit
aad281878: Fix celt_pitch_xcorr_c signature.
which got rid of ugly code around CELT_PITCH_XCORR_IMPL
passing of "arch" parameter.
- Unified with --enable-intrinsics used by x86
- Modified algorithm to be more in-line with algorithm in
celt_pitch_xcorr_arm.s
Viswanath Puttagunta
2014 Dec 19
2
[PATCH v1] cover: armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
Hi,
Optimizes celt_pitch_xcorr for ARM NEON floating point.
Changes from RFCv3:
- celt_neon_intr.c
- removed warnings due to not having constant pointers
- Put simpler loop to take care of corner cases. Unrolling using
intrinsics was not really mapping well to what was done
in celt_pitch_xcorr_arm.s
- Makefile.am
Removed explicit -O3 optimization
- test_unit_mathops.c,
2014 Dec 19
0
[PATCH v1] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
..._2[0] = vpadd_f32(SUMM_2[0], SUMM_2[0]);
+ /* Ok, now we have result accumulated in SUMM_2[0].0 */
+
+ if (len > 0) {
+ /* Case when you have one value left */
+ XX_2 = vld1_dup_f32(xi);
+ YY_2 = vld1_dup_f32(yi);
+ SUMM_2[0] = vmla_f32(SUMM_2[0], XX_2, YY_2);
+ }
+
+ vst1_lane_f32(sum, SUMM_2[0], 0);
+}
+
+void celt_pitch_xcorr_float_neon(const opus_val16 *_x, const opus_val16 *_y,
+ opus_val32 *xcorr, int len, int max_pitch) {
+ int i;
+ celt_assert(max_pitch > 0);
+ celt_assert((((unsigned char *)_x-(unsigned char *)NULL)&3)==0);
+
+ f...
2014 Dec 07
0
[RFC PATCH v2] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
...1q_f32(yi);
+ yi += 4;
+ SUMM = vmlaq_f32(SUMM, YY[0], XX[0]);
+ len -= 4;
+ }
+ /* Accumulate results into single float */
+ tv.val[0] = vadd_f32(vget_low_f32(SUMM), vget_high_f32(SUMM));
+ tv = vtrn_f32(tv.val[0], ZERO);
+ tv.val[0] = vadd_f32(tv.val[0], tv.val[1]);
+
+ vst1_lane_f32(&sumi, tv.val[0], 0);
+ for (i = 0; i < len; i++)
+ sumi += xi[i] * yi[i];
+ *sum = sumi;
+}
+
+void celt_pitch_xcorr_float_neon(const opus_val16 *_x, const opus_val16 *_y,
+ opus_val32 *xcorr, int len, int max_pitch) {
+ int i;
+ celt_assert(max_pitch >...
2014 Dec 10
0
[RFC PATCH v3] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
..._2[0] = vpadd_f32(SUMM_2[0], SUMM_2[0]);
+ /* Ok, now we have result accumulated in SUMM_2[0].0 */
+
+ if (len > 0) {
+ /* Case when you have one value left */
+ XX_2 = vld1_dup_f32(xi);
+ YY_2 = vld1_dup_f32(yi);
+ SUMM_2[0] = vmla_f32(SUMM_2[0], XX_2, YY_2);
+ }
+
+ vst1_lane_f32(sum, SUMM_2[0], 0);
+}
+
+void celt_pitch_xcorr_float_neon(const opus_val16 *_x, const opus_val16 *_y,
+ opus_val32 *xcorr, int len, int max_pitch) {
+ int i;
+ celt_assert(max_pitch > 0);
+ celt_assert((((unsigned char *)_x-(unsigned char *)NULL)&3)==0);
+
+ f...
2014 Dec 10
2
[RFC PATCH v3] cover: armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
Hi,
Optimizes celt_pitch_xcorr for floating point.
Changes from RFCv2:
- Changes recommended by Timothy for celt_neon_intr.c
everything except, left the unrolled loop still unrolled
- configure.ac
- use AC_LINK_IFELSE instead of AC_COMPILE_IFELSE
- Moved compile flags into Makefile.am
- OPUS_ARM_NEON_INR --> typo --> OPUS_ARM_NEON_INTR
Viswanath Puttagunta (1):
armv7:
2014 Dec 07
3
[RFC PATCH v2] cover: armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
From: Viswanath Puttagunta <viswanath.puttagunta at linaro.org>
Hi,
Optimizes celt_pitch_xcorr for floating point.
Changes from RFCv1:
- Rebased on top of commit
aad281878: Fix celt_pitch_xcorr_c signature.
which got rid of ugly code around CELT_PITCH_XCORR_IMPL
passing of "arch" parameter.
- Unified with --enable-intrinsics used by x86
- Modified algorithm to be more
2014 Dec 19
2
[PATCH v1] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
.../* Ok, now we have result accumulated in SUMM_2[0].0 */
> +
> + if (len > 0) {
> + /* Case when you have one value left */
> + XX_2 = vld1_dup_f32(xi);
> + YY_2 = vld1_dup_f32(yi);
> + SUMM_2[0] = vmla_f32(SUMM_2[0], XX_2, YY_2);
> + }
> +
> + vst1_lane_f32(sum, SUMM_2[0], 0);
> +}
> +
> +void celt_pitch_xcorr_float_neon(const opus_val16 *_x, const opus_val16 *_y,
> + opus_val32 *xcorr, int len, int max_pitch) {
> + int i;
> + celt_assert(max_pitch > 0);
> + celt_assert((((unsigned char *)_x-(unsign...
2016 Sep 13
4
[PATCH 12/15] Replace call of celt_inner_prod_c() (step 1)
Should call celt_inner_prod().
---
celt/bands.c | 7 ++++---
celt/bands.h | 2 +-
celt/celt_encoder.c | 6 +++---
celt/pitch.c | 2 +-
src/opus_multistream_encoder.c | 2 +-
5 files changed, 10 insertions(+), 9 deletions(-)
diff --git a/celt/bands.c b/celt/bands.c
index bbe8a4c..1ab24aa 100644
--- a/celt/bands.c
+++ b/celt/bands.c