2014 Nov 09
[RFC PATCH v1] arm: kf_bfly4: Introduce ARM neon intrinsics
...[1]); \
+ }while(0)
+
+#define ONES_MINUS_ONE 0xbf8000003f800000 // {1.0, -1.0}: lane 0 is the low 32 bits
+#define MINUS_ONE 0xbf800000bf800000 // {-1.0, -1.0}
+
+static void kf_bfly4_neon_m1(kiss_fft_cpx *Fout, int N) {
+ float32x4_t Fout_4[2];
+ float32x2_t Fout_2[4];
+ float32x2_t scratch_2[2];
+ float32x2_t ones_2 = vcreate_f32(ONES_MINUS_ONE);
+ float32x2_t minusones_2 = vcreate_f32(MINUS_ONE);
+ float *ai = (float *)Fout;
+ float *bi = (float *)Fout;
+ int i;
+
+ /* Consume/update 4 complex Fout values per cycle
+ * just like normal C code, except each neon
+ * instruction consumes 1 complex number (2 fl...
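The two 64-bit constants handed to vcreate_f32 in the excerpt above each pack a pair of single-precision bit patterns. As a rough sketch of how they decode, the plain C below (no NEON required) splits such a constant into its two 32-bit halves and reinterprets each as a float. The helper names bits_to_float and lane_of_packed are hypothetical and not part of the patch; the sketch assumes float is a 32-bit IEEE-754 type and that lane 0 of the resulting float32x2_t comes from the low 32 bits, as on the usual little-endian ARM targets.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float.
 * Assumes float is 32-bit IEEE-754 single precision. */
static float bits_to_float(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);   /* well-defined type punning */
    return f;
}

/* Hypothetical helper (not from the patch): the value that lane `lane`
 * (0 or 1) would hold after vcreate_f32(packed), assuming lane 0 is
 * taken from the low 32 bits of the 64-bit pattern. */
static float lane_of_packed(uint64_t packed, int lane) {
    uint32_t bits = (uint32_t)(packed >> (32 * lane));
    return bits_to_float(bits);
}
```

Under those assumptions, lane_of_packed(0xbf8000003f800000ULL, 0) gives 1.0f and lane 1 gives -1.0f, while both lanes of 0xbf800000bf800000ULL give -1.0f.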
2014 Nov 09
[RFC PATCH v1] arm: kf_bfly4: Introduce ARM neon intrinsics
Hello,
This patch introduces ARM NEON intrinsics to optimize the
kf_bfly4 routine in the celt part of libopus.
Using the NEON-optimized kf_bfly4(_neon) routine improved the
performance of the opus_fft_impl function by about 21.4%. The
end use case was decoding a music Opus Ogg file, which saw
an overall performance improvement of about 4.47%.
This patch has 2 components:
i. Actual neon code to improve