search for: tw_2

Displaying 2 results from an estimated 2 matches for "tw_2".

2014 Nov 09
[RFC PATCH v1] arm: kf_bfly4: Introduce ARM neon intrinsics
...3_2;
+ int fstride_2 = 2*fstride;
+ int fs_tw1 = 2*fstride_2;
+ int fs_tw2 = 4*fstride_2;
+ int fs_tw3 = 6*fstride_2;
+ int fs_x = 3*fstride_2;
+ const int m1 = 2*m;
+ const int m2 = 4*m; // 2*(2*m)
+ const int m3 = 6*m; // 3*(2*m)
+ kiss_fft_cpx *Fout_beg = Fout;
+ float32x4_t tw[3];
+ float32x2_t tw_2[6];
+ float *ai;
+
+ /* m is guaranteed to be a multiple of 4
+  * however, this function will function properly
+  * so long as m is a multiple of 2
+  */
+ celt_assert((m%2 == 0));
+
+ for (i = 0; i < N; i++) {
+    Fout = Fout_beg + i*mm;
+    ai = (float *) Fout;
+    tw1 = tw2 = tw3 = (float *)st...
2014 Nov 09
[RFC PATCH v1] arm: kf_bfly4: Introduce ARM neon intrinsics
Hello, This patch introduces ARM NEON intrinsics to optimize the kf_bfly4 routine in the CELT part of libopus. Using the NEON-optimized kf_bfly4(_neon) routine improved the performance of the opus_fft_impl function by about 21.4%. The end use case, decoding a music Opus Ogg file, saw a performance improvement of about 4.47%. This patch has 2 components i. Actual neon code to improve