Displaying 2 results from an estimated 2 matches for "tw_2".
2014 Nov 09
[RFC PATCH v1] arm: kf_bfly4: Introduce ARM neon intrinsics
...3_2;
+ int fstride_2 = 2*fstride;
+ int fs_tw1 = 2*fstride_2;
+ int fs_tw2 = 4*fstride_2;
+ int fs_tw3 = 6*fstride_2;
+ int fs_x = 3*fstride_2;
+ const int m1 = 2*m;
+ const int m2 = 4*m; // 2*(2*m)
+ const int m3 = 6*m; // 3*(2*m)
+ kiss_fft_cpx *Fout_beg = Fout;
+ float32x4_t tw[3];
+ float32x2_t tw_2[6];
+ float *ai;
+
+ /* m is guaranteed to be a multiple of 4
+ * however, this function will function properly
+ * so long as m is a multiple of 2
+ */
+ celt_assert((m%2 == 0));
+
+ for (i = 0; i < N; i++) {
+ Fout = Fout_beg + i*mm;
+ ai = (float *) Fout;
+ tw1 = tw2 = tw3 = (float *)st...
2014 Nov 09
[RFC PATCH v1] arm: kf_bfly4: Introduce ARM neon intrinsics
Hello,
This patch introduces ARM NEON intrinsics to optimize the
kf_bfly4 routine in the CELT part of libopus.
Using the NEON-optimized kf_bfly4(_neon) routine improved the
performance of the opus_fft_impl function by about 21.4%. The
end use case, decoding an Ogg Opus music file, saw a
performance improvement of about 4.47%.
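
For readers unfamiliar with the routine being optimized, here is a
minimal scalar sketch of a forward radix-4 FFT butterfly pass in
plain C. It is not the code from this patch and not the libopus
source: the names cpx_t, cmul, cadd, csub and bfly4_scalar are
hypothetical, and the twiddle layout (one table indexed by multiples
of fstride) is an assumption modeled on the usual kiss_fft
convention. A NEON version would presumably process more than one
complex value per loop iteration, which is why constraints on m
matter there.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical complex type for this sketch (not kiss_fft_cpx). */
typedef struct { float r, i; } cpx_t;

/* Complex multiply, add and subtract helpers. */
static cpx_t cmul(cpx_t a, cpx_t b) {
    cpx_t c = { a.r * b.r - a.i * b.i, a.r * b.i + a.i * b.r };
    return c;
}
static cpx_t cadd(cpx_t a, cpx_t b) { cpx_t c = { a.r + b.r, a.i + b.i }; return c; }
static cpx_t csub(cpx_t a, cpx_t b) { cpx_t c = { a.r - b.r, a.i - b.i }; return c; }

/* One forward radix-4 pass: m butterflies, each combining the four
 * values Fout[k], Fout[k+m], Fout[k+2m], Fout[k+3m] in place.
 * Assumed layout: tw[] holds twiddles, stepped by fstride, 2*fstride
 * and 3*fstride for the second, third and fourth inputs. */
static void bfly4_scalar(cpx_t *Fout, const cpx_t *tw, size_t fstride, size_t m)
{
    assert(m > 0);
    for (size_t k = 0; k < m; k++) {
        cpx_t a = Fout[k];
        cpx_t b = cmul(Fout[k + m],     tw[k * fstride]);
        cpx_t c = cmul(Fout[k + 2 * m], tw[k * fstride * 2]);
        cpx_t d = cmul(Fout[k + 3 * m], tw[k * fstride * 3]);

        cpx_t s0 = cadd(a, c);   /* a + c */
        cpx_t s1 = csub(a, c);   /* a - c */
        cpx_t s2 = cadd(b, d);   /* b + d */
        cpx_t s3 = csub(b, d);   /* b - d */

        Fout[k]         = cadd(s0, s2);   /* (a + c) + (b + d) */
        Fout[k + 2 * m] = csub(s0, s2);   /* (a + c) - (b + d) */

        /* Forward transform: (a - c) -/+ j*(b - d).
         * Multiplying x + jy by -j gives y - jx. */
        Fout[k + m].r     = s1.r + s3.i;
        Fout[k + m].i     = s1.i - s3.r;
        Fout[k + 3 * m].r = s1.r - s3.i;
        Fout[k + 3 * m].i = s1.i + s3.r;
    }
}
```

With m = 1, fstride = 1 and a single unit twiddle, this reduces to a
direct 4-point DFT, which makes it easy to check by hand against a
small input such as {1, 2, 3, 4}.
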
This patch has two components:
i. Actual neon code to improve