
Displaying 2 results from an estimated 2 matches for "fout_4".

2014 Nov 09
0
[RFC PATCH v1] arm: kf_bfly4: Introduce ARM neon intrinsics
...); \
+      tv = vtrnq_f32(m, t); \
+      m = vaddq_f32(tv.val[0], tv.val[1]); \
+   }while(0)
+
+#define ONES_MINUS_ONE 0xbf8000003f800000 //{-1.0, 1.0}
+#define MINUS_ONE 0xbf800000bf800000 // {-1.0, -1.0}
+
+static void kf_bfly4_neon_m1(kiss_fft_cpx *Fout, int N) {
+   float32x4_t Fout_4[2];
+   float32x2_t Fout_2[4];
+   float32x2_t scratch_2[2];
+   float32x2_t ones_2 = vcreate_f32(ONES_MINUS_ONE);
+   float32x2_t minusones_2 = vcreate_f32(MINUS_ONE);
+   float *ai = (float *)Fout;
+   float *bi = (float *)Fout;
+   int i;
+
+   /* Consume/update 4 complex Fout values per cycle
+    * just like no...
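The two 64-bit constants in the excerpt are passed to vcreate_f32, which fills lane 0 of a float32x2_t from bits 0..31 and lane 1 from bits 32..63. The portable sketch below models that split without NEON so the lane contents can be checked on any host. The helper names bits_to_float and split_lanes are hypothetical, and the ULL suffix was added here for clarity. Read this way, ONES_MINUS_ONE holds 1.0f in lane 0 and -1.0f in lane 1, so the patch's {-1.0, 1.0} comment lists the high half first.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Constants from the patch excerpt (ULL suffix added). */
#define ONES_MINUS_ONE 0xbf8000003f800000ULL
#define MINUS_ONE      0xbf800000bf800000ULL

/* The reinterpretation below assumes a 32-bit IEEE-754 float. */
static_assert(sizeof(float) == sizeof(uint32_t), "expects 32-bit float");

/* Reinterpret a 32-bit pattern as a single-precision float. */
static float bits_to_float(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Portable model of vcreate_f32(v): lane 0 takes bits 0..31 of v,
 * lane 1 takes bits 32..63. Uses shifts, so host byte order does not matter. */
static void split_lanes(uint64_t v, float lanes[2]) {
    lanes[0] = bits_to_float((uint32_t)(v & 0xffffffffu));
    lanes[1] = bits_to_float((uint32_t)(v >> 32));
}
```

Calling split_lanes(ONES_MINUS_ONE, l) leaves l[0] == 1.0f and l[1] == -1.0f. MINUS_ONE yields -1.0f in both lanes.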
2014 Nov 09
3
[RFC PATCH v1] arm: kf_bfly4: Introduce ARM neon intrinsics
Hello,

This patch introduces ARM NEON intrinsics to optimize the kf_bfly4 routine in the CELT part of libopus. Using the NEON-optimized kf_bfly4(_neon) routine improved the performance of the opus_fft_impl function by about 21.4%. The end-to-end use case, decoding an Opus Ogg music file, saw an overall performance improvement of about 4.47%. This patch has 2 components: i. Actual NEON code to improve...
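For reference, a minimal scalar sketch of what a radix-4 butterfly computes in the degenerate stride case (m == 1), where every twiddle factor is 1 and each group of 4 consecutive points becomes its own 4-point forward DFT. The "_m1" suffix in the patch suggests kf_bfly4_neon_m1 targets this case, but that is an inference. The sketch is written from the textbook 4-point DFT, not copied from libopus. The names cpx, bfly4_m1, dft4 and bfly4_max_err are hypothetical, and cpx only stands in for kiss_fft_cpx.

```c
#include <assert.h>

/* Hypothetical stand-in for kiss_fft_cpx: interleaved real/imaginary floats. */
typedef struct { float r, i; } cpx;

/* Radix-4 butterfly with all twiddles equal to 1: replaces each group of
 * 4 consecutive points with its 4-point forward DFT. N = number of groups. */
static void bfly4_m1(cpx *Fout, int N) {
    assert(N >= 0);
    for (int k = 0; k < N; k++, Fout += 4) {
        cpx a  = { Fout[0].r + Fout[2].r, Fout[0].i + Fout[2].i }; /* x0 + x2 */
        cpx s0 = { Fout[0].r - Fout[2].r, Fout[0].i - Fout[2].i }; /* x0 - x2 */
        cpx s1 = { Fout[1].r + Fout[3].r, Fout[1].i + Fout[3].i }; /* x1 + x3 */
        cpx d  = { Fout[1].r - Fout[3].r, Fout[1].i - Fout[3].i }; /* x1 - x3 */

        Fout[0].r = a.r + s1.r;  Fout[0].i = a.i + s1.i;  /* X0 */
        Fout[2].r = a.r - s1.r;  Fout[2].i = a.i - s1.i;  /* X2 */
        /* X1 = (x0 - x2) - j(x1 - x3), X3 = (x0 - x2) + j(x1 - x3) */
        Fout[1].r = s0.r + d.i;  Fout[1].i = s0.i - d.r;
        Fout[3].r = s0.r - d.i;  Fout[3].i = s0.i + d.r;
    }
}

/* Twiddles e^{-2*pi*j*m/4} for m = 0..3: 1, -j, -1, j. */
static const cpx W4[4] = { {1.0f, 0.0f}, {0.0f, -1.0f}, {-1.0f, 0.0f}, {0.0f, 1.0f} };

/* Naive 4-point forward DFT, X[k] = sum_n x[n] * W4[(n*k) % 4], as a reference. */
static void dft4(const cpx *x, cpx *X) {
    for (int k = 0; k < 4; k++) {
        float re = 0.0f, im = 0.0f;
        for (int n = 0; n < 4; n++) {
            cpx w = W4[(n * k) % 4];
            re += x[n].r * w.r - x[n].i * w.i;
            im += x[n].r * w.i + x[n].i * w.r;
        }
        X[k].r = re;
        X[k].i = im;
    }
}

/* Largest absolute difference between bfly4_m1 and dft4 on one group. */
static float bfly4_max_err(const cpx in[4]) {
    cpx fast[4], ref[4];
    for (int n = 0; n < 4; n++) fast[n] = in[n];
    bfly4_m1(fast, 1);
    dft4(in, ref);
    float err = 0.0f;
    for (int k = 0; k < 4; k++) {
        float dr = fast[k].r - ref[k].r, di = fast[k].i - ref[k].i;
        if (dr < 0) dr = -dr;
        if (di < 0) di = -di;
        if (dr > err) err = dr;
        if (di > err) err = di;
    }
    return err;
}
```

For example, the group {1+2j, 3-1j, 0+5j, -2+4j} becomes {2+10j, -4-8j, 0+4j, 6+2j}. The in-place layout, reading four interleaved complex values and writing the results back to the same slots, matches the "consume/update 4 complex Fout values" pattern described in the patch comment.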