2014 Nov 09
[RFC PATCH v1] arm: kf_bfly4: Introduce ARM neon intrinsics
...);
+ scratch_2[1] = vadd_f32(Fout_2[1], Fout_2[3]);
+ Fout_2[2] = vsub_f32(Fout_2[0], scratch_2[1]);
+ Fout_2[0] = vadd_f32(Fout_2[0], scratch_2[1]);
+ scratch_2[1] = vsub_f32(Fout_2[1], Fout_2[3]);
+
+ scratch_2[1] = vrev64_f32(scratch_2[1]);
+ /* scratch_2[1] *= (1, -1) */
+ scratch_2[1] = vmul_f32(scratch_2[1], ones_2);
+ Fout_2[1] = vadd_f32(scratch_2[0], scratch_2[1]);
+
+ /* scratch_2[1] *= (-1, -1) */
+ scratch_2[1] = vmul_f32(scratch_2[1], minusones_2);
+ Fout_2[3] = vadd_f32(scratch_2[0], scratch_2[1]);
+
+ Fout_4[0] = vcombine_f32(Fout_2[0], Fout_2[1]);
+ Fout_4[1] = vcombine_f3...
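
The excerpt above swaps the two lanes of a complex value with vrev64_f32 and then multiplies by a sign vector with vmul_f32. Assuming lane 0 holds the real part and lane 1 the imaginary part, and that ones_2 holds (1, -1) as the patch comment states, this amounts to multiplying by -i and then by +i. A minimal portable sketch of that idea follows; the f32x2 type and the helper names are hypothetical stand-ins so the code runs without NEON, and they are not taken from the patch.

```c
/* Hypothetical two-lane vector standing in for NEON's float32x2_t.
 * Assumption: lane 0 = real part, lane 1 = imaginary part. */
typedef struct { float lane[2]; } f32x2;

static f32x2 make_f32x2(float re, float im)
{
    f32x2 v = { { re, im } };
    return v;
}

/* Stand-in for vrev64_f32: swap the two 32-bit lanes. */
static f32x2 rev64(f32x2 v)
{
    return make_f32x2(v.lane[1], v.lane[0]);
}

/* Stand-in for vmul_f32: lane-wise multiplication. */
static f32x2 mul(f32x2 a, f32x2 b)
{
    return make_f32x2(a.lane[0] * b.lane[0], a.lane[1] * b.lane[1]);
}

/* (a + bi) * (-i) = b - ai: swap lanes, then multiply by (1, -1). */
static f32x2 mul_by_minus_i(f32x2 z)
{
    return mul(rev64(z), make_f32x2(1.0f, -1.0f));
}

/* (a + bi) * (+i) = -b + ai: negate both lanes of the -i result,
 * mirroring the second vmul_f32 by (-1, -1) in the excerpt. */
static f32x2 mul_by_plus_i(f32x2 z)
{
    return mul(mul_by_minus_i(z), make_f32x2(-1.0f, -1.0f));
}
```

For example, mul_by_minus_i applied to 3 + 4i gives 4 - 3i, and mul_by_plus_i gives -4 + 3i. Reusing the first product for the second one saves a lane swap, which appears to be why the excerpt multiplies scratch_2[1] twice in sequence.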
2014 Nov 09
[RFC PATCH v1] arm: kf_bfly4: Introduce ARM neon intrinsics
Hello,
This patch introduces ARM NEON intrinsics to optimize the
kf_bfly4 routine in the CELT part of libopus.
Using the NEON-optimized kf_bfly4(_neon) routine improved the
performance of the opus_fft_impl function by about 21.4%. The
end use case, decoding an Ogg Opus music file, saw a
performance improvement of about 4.47%.
This patch has 2 components:
i. Actual NEON code to improve