2014 Nov 09
[RFC PATCH v1] arm: kf_bfly4: Introduce ARM neon intrinsics
...vcreate_f32(ONES_MINUS_ONE);
+ float32x2_t minusones_2 = vcreate_f32(MINUS_ONE);
+ float32x4_t ones = vcombine_f32(ones_2, ones_2);
+ float32x4_t minusones = vcombine_f32(minusones_2, minusones_2);
+ float32x4_t t;
+ float32x4x2_t tv;
+ float *tw1, *tw2, *tw3;
+ float *tw1_2, *tw2_2, *tw3_2;
+ int fstride_2 = 2*fstride;
+ int fs_tw1 = 2*fstride_2;
+ int fs_tw2 = 4*fstride_2;
+ int fs_tw3 = 6*fstride_2;
+ int fs_x = 3*fstride_2;
+ const int m1 = 2*m;
+ const int m2 = 4*m; // 2*(2*m)
+ const int m3 = 6*m; // 3*(2*m)
+ kiss_fft_cpx *Fout_beg = Fout;
+ float32x4_t tw[3];
+ float32x2_t tw_2[6];
+ float *ai...
2014 Nov 09
[RFC PATCH v1] arm: kf_bfly4: Introduce ARM neon intrinsics
Hello,
This patch introduces ARM NEON intrinsics to optimize the
kf_bfly4 routine in the CELT part of libopus.
Using the NEON-optimized kf_bfly4(_neon) routine improved the
performance of the opus_fft_impl function by about 21.4%. The
end use case was decoding a music Opus Ogg file, which saw an
overall performance improvement of about 4.47%.
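
For readers who do not have the routine in front of them: kf_bfly4 is the radix-4 FFT butterfly. A minimal scalar sketch of a 4-point forward DFT butterfly, with no twiddle factors and no strides, might look like the following. The cpx type and the butterfly4 function are hypothetical names used only for illustration; this is neither the libopus code nor the NEON version from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical complex type for this sketch (not libopus's kiss_fft_cpx). */
typedef struct { float r, i; } cpx;

/* In-place forward 4-point DFT of x[0..3].
 * Assumption: inputs are already multiplied by their twiddle factors,
 * so only the add/subtract butterfly stage is shown. */
void butterfly4(cpx *x)
{
    assert(x != NULL);

    /* Sums and differences of the even and odd pairs. */
    cpx a = { x[0].r + x[2].r, x[0].i + x[2].i };
    cpx b = { x[0].r - x[2].r, x[0].i - x[2].i };
    cpx c = { x[1].r + x[3].r, x[1].i + x[3].i };
    cpx d = { x[1].r - x[3].r, x[1].i - x[3].i };

    x[0] = (cpx){ a.r + c.r, a.i + c.i };   /* a + c   */
    x[2] = (cpx){ a.r - c.r, a.i - c.i };   /* a - c   */
    x[1] = (cpx){ b.r + d.i, b.i - d.r };   /* b - i*d */
    x[3] = (cpx){ b.r - d.i, b.i + d.r };   /* b + i*d */
}
```

The multiplications by +i and -i reduce to swapping the real and imaginary parts and flipping one sign, which is presumably why sign-pattern vectors are a natural fit for a SIMD version of this stage.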
This patch has two components:
i. Actual NEON code to improve