Displaying 3 results from an estimated 3 matches for "vcreate_f32".
2014 Nov 09
0
[RFC PATCH v1] arm: kf_bfly4: Introduce ARM neon intrinsics
...}while(0)
+
+#define ONES_MINUS_ONE 0xbf8000003f800000 //{-1.0, 1.0}
+#define MINUS_ONE 0xbf800000bf800000 // {-1.0, -1.0}
+
+static void kf_bfly4_neon_m1(kiss_fft_cpx *Fout, int N) {
+ float32x4_t Fout_4[2];
+ float32x2_t Fout_2[4];
+ float32x2_t scratch_2[2];
+ float32x2_t ones_2 = vcreate_f32(ONES_MINUS_ONE);
+ float32x2_t minusones_2 = vcreate_f32(MINUS_ONE);
+ float *ai = (float *)Fout;
+ float *bi = (float *)Fout;
+ int i;
+
+ /* Consume/update 4 complex Fout values per cycle
+ * just like normal C code, except each neon
+ * instruction consumes 1 complex number (2 floats)
+ * In...
2014 Nov 09
3
[RFC PATCH v1] arm: kf_bfly4: Introduce ARM neon intrinsics
Hello,
This patch introduces ARM NEON intrinsics to optimize the
kf_bfly4 routine in the CELT part of libopus.
Using the NEON-optimized kf_bfly4(_neon) routine improved
the performance of the opus_fft_impl function by about 21.4%.
The end use case, decoding a music Opus Ogg file, saw an
overall performance improvement of about 4.47%.
This patch has two components:
i. Actual NEON code to improve
2014 Sep 10
4
[RFC PATCH v1 0/3] Introducing ARM SIMD Support
libvorbis does not currently have any SIMD/vectorization support.
The following patches add a generic framework for SIMD/vectorization
and, on top of it, ARM NEON SIMD vectorization using intrinsics.
I was able to get over 34% performance improvement on my
BeagleBone Black, which is based on a single-core Cortex-A8 CPU.
You can find more information on the metrics and the procedure I
used to measure at