Phil Wang
2014-Nov-25 10:17 UTC
[opus] [Profiling][FFT][AArch64] FFT Profiling data on AArch64
Hi everyone,

I have profiled Opus on AArch64. I just ran opus_demo with some PCM files. The following table shows the proportion of time spent in FFT/iFFT at different bitrates.

Bitrate  | Share of time in FFT/iFFT
24 kb/s  | 15%
48 kb/s  | 15%
96 kb/s  | 13%

Any comments? I would like data closer to a real application; any suggestions?

Thanks,
Phil Wang
Viswanath Puttagunta
2014-Nov-25 14:49 UTC
[opus] [Profiling][FFT][AArch64] FFT Profiling data on AArch64
Hello Phil,

The data you presented is roughly in line with what I observed on ARMv7 (Cortex-A8, BeagleBone Black) as well. opus_fft_impl() is one of the top contributors to CPU time in the Opus (CELT) decode case below.

To observe how much kf_bfly4_c (called by opus_fft_impl) contributes, I removed the "static" keyword, so that its samples are reported under its own symbol rather than folded into opus_fft_impl, and did the run below. So, overall, opus_fft_impl contributes about 11.90 + 5.59 = 17.49% in the decode use case.

Is this the kind of data you are looking for?

More information is presented in [1], where I optimized kf_bfly4_c(); I posted the patch at [2]. But after you mentioned your FFT work in NE10, I requested that [2] be put on hold.

Do let me know should you need any further information.

$ perf_3.17.0-1 record opusdec music_48kbps.opus k.wav
$ perf_3.17.0-1 report
Samples: 99K of event 'cycles', Event count (approx.): 798645278
Overhead  Command  Shared Object  Symbol
 24.71%   opusdec  opusdec        [.] audio_write
 11.90%   opusdec  libopus.so     [.] opus_fft_impl        <----
  7.94%   opusdec  libopus.so     [.] clt_mdct_backward
  7.77%   opusdec  libm-2.19.so   [.] lrintf
  6.66%   opusdec  libopus.so     [.] comb_filter
  5.59%   opusdec  libopus.so     [.] kf_bfly4_c           <----
  5.03%   opusdec  libc-2.19.so   [.] memmove
  3.95%   opusdec  libopus.so     [.] quant_all_bands
  3.27%   opusdec  libopus.so     [.] deemphasis.isra.1
  2.85%   opusdec  libopus.so     [.] exp_rotation1
  1.52%   opusdec  libopus.so     [.] decode_pulses
  1.27%   opusdec  libopus.so     [.] __udivsi3
  1.21%   opusdec  libopus.so     [.] haar1
  1.20%   opusdec  libopus.so     [.] alg_unquant
  1.15%   opusdec  libm-2.19.so   [.] __exp_finite
  1.10%   opusdec  libopus.so     [.] quant_partition
  1.09%   opusdec  libopus.so     [.] denormalise_bands
  1.06%   opusdec  libopus.so     [.] quant_band
  1.04%   opusdec  opusdec        [.] main
  0.65%   opusdec  libopus.so     [.] compute_theta
  0.50%   opusdec  libopus.so     [.] compute_allocation

[1]: https://docs.google.com/document/d/1L6csATjSsXtzg_sa1iHZta8hOsoVWA4UjHXEakpTrNk/edit?usp=sharing
[2]: http://lists.xiph.org/pipermail/opus/2014-November/002744.html

Regards,
Vish
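The 17.49% figure is the sum of the self overhead of the two symbols marked with arrows in the perf report. Because perf report (without call graphs) lists self overhead per symbol, percentages of distinct symbols can be added. A minimal C sketch of that bookkeeping, assuming the rows are copied by hand from the report (only a subset is included) and that opus_fft_impl and kf_bfly4_c are the only FFT-path symbols to count; the names perf_row, rows, fft_symbols and fft_share are hypothetical and are not part of perf or Opus:

```c
#include <string.h>

/* One row of the perf report, trimmed to self overhead and symbol name. */
struct perf_row {
    double overhead_pct;   /* self overhead, percent of sampled cycles */
    const char *symbol;
};

/* Subset of the rows from the report above (hand-copied, hypothetical table). */
static const struct perf_row rows[] = {
    {24.71, "audio_write"},
    {11.90, "opus_fft_impl"},
    { 7.94, "clt_mdct_backward"},
    { 5.59, "kf_bfly4_c"},
    { 0.50, "compute_allocation"},
};

/* Symbols treated as the FFT path: an assumption of this sketch. */
static const char *fft_symbols[] = {"opus_fft_impl", "kf_bfly4_c"};

/* Sum the self overhead of every row whose symbol is in fft_symbols. */
static double fft_share(void) {
    double total = 0.0;
    for (size_t i = 0; i < sizeof rows / sizeof rows[0]; i++) {
        for (size_t j = 0; j < sizeof fft_symbols / sizeof fft_symbols[0]; j++) {
            if (strcmp(rows[i].symbol, fft_symbols[j]) == 0) {
                total += rows[i].overhead_pct;
            }
        }
    }
    return total;
}
```

With these rows, fft_share() returns approximately 17.49 (11.90 + 5.59). Matching on exact symbol names keeps the sum honest: inlined helpers that perf does not report separately are already included in their caller's row.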
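For readers unfamiliar with kf_bfly4_c: the "kf_" prefix comes from KISS FFT, on which Opus's celt/kiss_fft.c is based, and the function performs radix-4 butterflies inside the FFT. The sketch below is not the Opus function; it shows only the core of a single radix-4 decimation-in-time butterfly for a forward transform (W4 = -j), assuming the inputs have already been multiplied by their twiddle factors. The real kf_bfly4_c also applies the twiddles and works in place over strided buffers. The type cpx and the function radix4_butterfly are hypothetical names for illustration:

```c
/* Minimal complex type for this sketch (Opus has its own complex type). */
typedef struct {
    float r;  /* real part */
    float i;  /* imaginary part */
} cpx;

/* One radix-4 butterfly: out[k] = sum_n a[n] * (-j)^(n*k), k = 0..3.
 * Assumes a[0..3] are already multiplied by their twiddle factors. */
static void radix4_butterfly(const cpx a[4], cpx out[4]) {
    /* Sums and differences of inputs 0/2 and 1/3. */
    cpx s0 = {a[0].r + a[2].r, a[0].i + a[2].i};
    cpx d0 = {a[0].r - a[2].r, a[0].i - a[2].i};
    cpx s1 = {a[1].r + a[3].r, a[1].i + a[3].i};
    cpx d1 = {a[1].r - a[3].r, a[1].i - a[3].i};

    /* X0 = a0 + a1 + a2 + a3 and X2 = a0 - a1 + a2 - a3 */
    out[0].r = s0.r + s1.r;  out[0].i = s0.i + s1.i;
    out[2].r = s0.r - s1.r;  out[2].i = s0.i - s1.i;

    /* X1 = (a0 - a2) - j(a1 - a3); multiplying x + jy by -j gives y - jx. */
    out[1].r = d0.r + d1.i;  out[1].i = d0.i - d1.r;

    /* X3 = (a0 - a2) + j(a1 - a3); multiplying x + jy by +j gives -y + jx. */
    out[3].r = d0.r - d1.i;  out[3].i = d0.i + d1.r;
}
```

For the input {1, 2, 3, 4} (all real) this yields {10, -2 + 2j, -2, -2 - 2j}, the 4-point DFT. The multiplications by plus or minus j are only swaps and sign flips, so each butterfly needs additions and subtractions alone (apart from the twiddles), and a SIMD version such as a NEON one can process several butterflies per instruction.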