forrest
2018-Jan-06 09:02 UTC
[opus] Ask for suggestions about optimizing opus on STM32F407
Dear Developers,

I built the Opus 1.2.1 codec for the STM32F407 (fixed-point, with the float API disabled). It seems too slow for a VoIP application.

Case 1:
48 kHz sampling rate, stereo, VBR, frame size 20 ms, bit rate 96 kbps
Encode cost: 2.11x real time
Decode cost: 1.54x real time
Encode + Decode: 3.65x

Case 2:
8 kHz sampling rate, mono, VBR, frame size 20 ms, bit rate 16 kbps
Encode cost: 1.08x real time
Decode cost: 0.14x real time
Encode + Decode: 1.24x

Are there any available optimizations or suggestions for Cortex-M4?

As far as I know, the TI TM4C129x is also based on Cortex-M4:
http://www.ti.com/tool/TIDM-TM4C129POEAUDIO

The performance of Opus on it is good enough for mono at a 48 kHz sampling rate. CPU usage is only about 70% of 120 MHz when encoding and decoding at the same time.

Sincerely
Forrest
Amit Ashara
2018-Jan-12 17:03 UTC
[opus] Ask for suggestions about optimizing opus on STM32F407
Hello Forrest,

Did you try using the same constraints as the reference design for the TM4C129 implementation, i.e. CBR, mono at 48 kHz?

Regards
Amit Ashara

_______________________________________________
opus mailing list
opus at xiph.org
http://lists.xiph.org/mailman/listinfo/opus
Thomas Böhm
2018-Jan-14 08:05 UTC
[opus] Ask for suggestions about optimizing opus on STM32F407
Hello Forrest,

some years ago I developed a network media player based on an STM32F407ZGT6 (168 MHz clock) and Opus 1.1. I used just the fixed-point code and did no particular optimization of the Opus code itself, because the performance was already quite good; see the figures below. The figures are for real-time playback with different frame sizes and various constant bit rates.

I didn't experiment much with encoding, but I'm convinced that the 32F407 is powerful enough to do the job if you use all its capabilities. Most important is to use the hardware features of the processor, like the DMA controller or, if you deal with Ogg, the CRC calculation unit, to offload the CPU.

Figures (sent as PNG attachments, scrubbed by the archive):
SILK narrow band, a) mono b) stereo
SILK medium band, a) mono b) stereo
Hybrid wide band, a) mono b) stereo
Hybrid super wide band, a) mono b) stereo
Hybrid full band, a) mono b) stereo
CELT full band mono
CELT full band stereo

Regards,
Thomas

Attachments (image/png):
http://lists.xiph.org/pipermail/opus/attachments/20180114/a0340dfa/attachment-0007.png
http://lists.xiph.org/pipermail/opus/attachments/20180114/a0340dfa/attachment-0008.png
http://lists.xiph.org/pipermail/opus/attachments/20180114/a0340dfa/attachment-0009.png
http://lists.xiph.org/pipermail/opus/attachments/20180114/a0340dfa/attachment-0010.png
http://lists.xiph.org/pipermail/opus/attachments/20180114/a0340dfa/attachment-0011.png
http://lists.xiph.org/pipermail/opus/attachments/20180114/a0340dfa/attachment-0012.png
http://lists.xiph.org/pipermail/opus/attachments/20180114/a0340dfa/attachment-0013.png
Forrest Zhang
2018-Jan-15 04:31 UTC
[opus] Ask for suggestions about optimizing opus on STM32F407
Hello Thomas and Amit,

Thanks for your replies and the detailed decode performance report. Here are the details of my encode/decode test on the STM32F407ZG.

A. Opus version: latest 1.2.1 (TI: Opus 1.1.2)

B. Toolchain: KEIL 5.23 (TI: ARM compiler tool chain 5.2.7)

C. Set up the encoder as below (fs is the sampling frequency):

    enc = opus_encoder_create(fs, chans, OPUS_APPLICATION_AUDIO, &opus_err);
    opus_encoder_ctl(enc, OPUS_SET_BITRATE(fs * 2));
    opus_encoder_ctl(enc, OPUS_SET_BANDWIDTH(OPUS_AUTO));
    opus_encoder_ctl(enc, OPUS_SET_VBR(1));
    opus_encoder_ctl(enc, OPUS_SET_VBR_CONSTRAINT(0));
    opus_encoder_ctl(enc, OPUS_SET_COMPLEXITY(0));
    opus_encoder_ctl(enc, OPUS_SET_INBAND_FEC(0));
    opus_encoder_ctl(enc, OPUS_SET_FORCE_CHANNELS(OPUS_AUTO));
    opus_encoder_ctl(enc, OPUS_SET_DTX(0));
    opus_encoder_ctl(enc, OPUS_SET_PACKET_LOSS_PERC(0));
    opus_encoder_ctl(enc, OPUS_GET_LOOKAHEAD(&lookahead));
    opus_encoder_ctl(enc, OPUS_SET_LSB_DEPTH(16));
    opus_encoder_ctl(enc, OPUS_SET_EXPERT_FRAME_DURATION(OPUS_FRAMESIZE_20_MS));
    /* CELT is faster than SILK? */
    /* OPUS_SET_FORCE_MODE and MODE_CELT_ONLY are internal, from opus_private.h. */
    opus_encoder_ctl(enc, OPUS_SET_FORCE_MODE(MODE_CELT_ONLY));

D. Generate 20 ms of PCM sample data (cosine wave with amplitude 0x6000 and frequency about 1150 Hz).

E. Encode the PCM data and decode it immediately, counting the CPU time used.

F. Repeat until the target duration is reached (1000 ms or 10000 ms).

G. Summary of the STM32F407 test results:

    Mode   Sample  Chan  Freq.  Duration  Encode   + Decode  = Total
    FLOAT  48kHz   2     1150   1000ms     2735ms  + 3367ms  =  6102ms
    FIXED  48kHz   2     1150   1000ms     2112ms  + 1543ms  =  3698ms
    FIXED  48kHz   1     1150   1000ms     1312ms  +  911ms  =  2249ms
    FIXED  24kHz   1     1150   1000ms     1067ms  +  783ms  =  1872ms
    FIXED  16kHz   1     1150   1000ms      922ms  +  711ms  =  1651ms
    FIXED  12kHz   1     1150   1000ms     1296ms  +  193ms  =  1507ms
    FIXED   8kHz   2     1150   1000ms     1014ms  +  147ms  =  1181ms
    FIXED   8kHz   1     1150   1000ms     1086ms  +  135ms  =  1241ms
    FIXED   8kHz   1     1150   10000ms   11206ms  + 1318ms  = 12544ms

H. Build options:

    FLOAT: OPUS_BUILD,USE_ALLOCA,CUSTOM_SUPPORT
    FIXED: OPUS_BUILD,USE_ALLOCA,CUSTOM_SUPPORT,FIXED_POINT,DISABLE_FLOAT_API

Note: the target bit rate is twice the sampling frequency. That is, the bit rate is 96 kbps when the sampling frequency is 48 kHz.

The CPU usage is about 91% (911 ms/1000 ms) when decoding 48 kHz/mono/96 kbps, but encoding requires more CPU (131%, 1312 ms/1000 ms).

I will try lower bit rates and update the results later.

Sincerely
Forrest