Sebastian Reimers
2013-Oct-18 14:11 UTC
[opus] AM335x ARM Cortex-A8 performance drop opus 1.1
Hello,

I've just compared the 1.0.3 release with the master branch on a BeagleBone Black (AM335x 1GHz ARM Cortex-A8 with NEON floating-point accelerator) running Arch Linux ARM.

I don't know why yet, but I see that 1.1 is much slower at encoding. Are there any changes to the defaults that I missed and that could explain this? I had expected better performance from 1.1 with its ARM optimizations.

Please let me know if you need more information.

opus-tools version 0.1.7

[root at studio-connect.de audio]# opusenc music_orig.wav music_orig.opus
Encoding using libopus 1.0.3 (audio)
-----------------------------------------------------
   Input: 48kHz 2 channels
  Output: 2 channels (2 coupled)
          20ms packets, 96kbit/sec VBR
 Preskip: 312

Encoding complete
-----------------------------------------------------
       Encoded: 1 minute and 30.82 seconds
       Runtime: 1 minute and 8 seconds
                (1.336x realtime)
         Wrote: 1106810 bytes, 4541 packets, 93 pages
       Bitrate: 96.7055kbit/s (without overhead)
 Instant rates: 76kbit/s to 165.6kbit/s
                (190 to 414 bytes per packet)
      Overhead: 0.81% (container+metadata)

[root at studio-connect.de audio]# opusenc music_orig.wav music_orig.opus1.1
Encoding using libopus unknown (audio)
-----------------------------------------------------
   Input: 48kHz 2 channels
  Output: 2 channels (2 coupled)
          20ms packets, 96kbit/sec VBR
 Preskip: 312

Encoding complete
-----------------------------------------------------
       Encoded: 1 minute and 30.82 seconds
       Runtime: 1 minute and 24 seconds
                (1.081x realtime)
         Wrote: 1263224 bytes, 4541 packets, 93 pages
       Bitrate: 110.387kbit/s (without overhead)
 Instant rates: 24.4kbit/s to 197.2kbit/s
                (61 to 493 bytes per packet)
      Overhead: 0.796% (container+metadata)

Compiler options:

-march=armv7-a -mfloat-abi=hard -mfpu=neon -O2 -pipe -fstack-protector --param=ssp-buffer-size=4

Same results with:

-march=armv7-a -mfloat-abi=hard -mfpu=vfpv3-d16 -O2 -pipe -fstack-protector --param=ssp-buffer-size=4

Kind regards,

Sebastian Reimers
------------------------------------------ IT-Service Sebastian Reimers Am blanken Boom 14 32369 Rahden Festnetz: 05776-137324 Fax-Nummer: 05221-17242088 Skype: miete-admin E-Mail: service at it-sreimers.de Internet: www.it-sreimers.de Internet: www.miete-admin.de Steuernummer: 331/5079/2619 UST-IdNr.: DE239109607 ------------------------------------------
On 2013-10-18 7:11 AM, Sebastian Reimers wrote:
> Hello,
>
> I've just compared the 1.0.3 release with the master branch
> on a BeagleBone Black (AM335x 1GHz ARM Cortex-A8 with NEON
> floating-point accelerator) and Arch Linux ARM.
>
> I don't know why yet, but I see that 1.1 is much slower
> at encoding. Are there any changes to the defaults that I missed and
> that could explain this? I had expected better performance from 1.1
> with its ARM optimizations.

In 1.1 we have new float-only analysis code. In a floating point build it's enabled only at complexity setting 10, with the default dropped to 9 because it does cause a performance regression (in exchange for a quality progression).

opusenc unconditionally sets the complexity to 10, so it won't adapt to the change in default, which could cause the issue you're seeing. Try comparing with --comp 9.

-r
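The comparison suggested above can be scripted. The following is a minimal sketch, assuming opusenc is on the PATH and accepts --comp values from 0 to 10; the file names are the ones used in this thread and the helper names are hypothetical. It measures wall-clock time around the whole process, so the numbers will not match opusenc's own "Runtime" line exactly.

```python
import os
import shutil
import subprocess
import time


def build_command(complexity, src, dst):
    """Return the opusenc argument list for one complexity level."""
    # Assumption: opusenc accepts --comp values in the range 0-10.
    if not 0 <= complexity <= 10:
        raise ValueError("complexity must be between 0 and 10")
    return ["opusenc", "--comp", str(complexity), src, dst]


def time_encode(complexity, src, dst):
    """Run one encode and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(build_command(complexity, src, dst),
                   check=True, capture_output=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder file names taken from the commands in this thread.
    src, dst = "music_orig.wav", "music_orig.opus"
    if shutil.which("opusenc") is None or not os.path.exists(src):
        print("opusenc or the input file is missing; nothing to measure")
    else:
        for level in range(10, 0, -1):
            print(f"--comp {level}: {time_encode(level, src, dst):.1f} s")
```

Running the same script against a 1.0.3 build and a 1.1 build would give two columns that can be compared level by level.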
Gregory Maxwell
2013-Oct-18 19:26 UTC
[opus] AM335x ARM Cortex-A8 performance drop opus 1.1
On Fri, Oct 18, 2013 at 12:12 PM, Ralph Giles <giles at thaumas.net> wrote:
> In 1.1 we have new float-only analysis code. In a floating point build
> it's enabled only at complexity setting 10,

That should say "In a fixed point build".
I wrote:
> In a floating point build
> it's enabled only at complexity setting 10

Sorry. I meant that in a _fixed_point_ build the new analysis code is enabled only at complexity setting 10. It's currently enabled for floating point builds at complexity 7 or higher. Both fixed and float builds default to complexity 9 in 1.1.

-r
Sebastian Reimers
2013-Oct-18 20:41 UTC
[opus] AM335x ARM Cortex-A8 performance drop opus 1.1
Hi,

> Try comparing with --comp 9.

Many thanks for your quick reply. But my results look really strange, with no big difference between 10, 9, 8 and 7:

# opusenc --comp 10 music_orig.wav music_orig.opus
Encoding using libopus unknown (audio)
-----------------------------------------------------
   Input: 48kHz 2 channels
  Output: 2 channels (2 coupled)
          20ms packets, 96kbit/sec VBR
 Preskip: 312

Encoding complete
-----------------------------------------------------
       Encoded: 1 minute and 30.82 seconds
       Runtime: 1 minute and 19 seconds
                (1.15x realtime)
         Wrote: 1263224 bytes, 4541 packets, 93 pages
       Bitrate: 110.387kbit/s (without overhead)
 Instant rates: 24.4kbit/s to 197.2kbit/s
                (61 to 493 bytes per packet)
      Overhead: 0.796% (container+metadata)

# opusenc --comp 9 music_orig.wav music_orig.opus
Encoding using libopus unknown (audio)
-----------------------------------------------------
   Input: 48kHz 2 channels
  Output: 2 channels (2 coupled)
          20ms packets, 96kbit/sec VBR
 Preskip: 312

Encoding complete
-----------------------------------------------------
       Encoded: 1 minute and 30.82 seconds
       Runtime: 1 minute and 19 seconds
                (1.15x realtime)
         Wrote: 1263224 bytes, 4541 packets, 93 pages
       Bitrate: 110.387kbit/s (without overhead)
 Instant rates: 24.4kbit/s to 197.2kbit/s
                (61 to 493 bytes per packet)
      Overhead: 0.796% (container+metadata)

# opusenc --comp 8 music_orig.wav music_orig.opus
Encoding using libopus unknown (audio)
-----------------------------------------------------
   Input: 48kHz 2 channels
  Output: 2 channels (2 coupled)
          20ms packets, 96kbit/sec VBR
 Preskip: 312

Encoding complete
-----------------------------------------------------
       Encoded: 1 minute and 30.82 seconds
       Runtime: 1 minute and 20 seconds
                (1.135x realtime)
         Wrote: 1263224 bytes, 4541 packets, 93 pages
       Bitrate: 110.387kbit/s (without overhead)
 Instant rates: 24.4kbit/s to 197.2kbit/s
                (61 to 493 bytes per packet)
      Overhead: 0.796% (container+metadata)

# opusenc --comp 7 music_orig.wav music_orig.opus
Encoding using libopus unknown (audio)
-----------------------------------------------------
   Input: 48kHz 2 channels
  Output: 2 channels (2 coupled)
          20ms packets, 96kbit/sec VBR
 Preskip: 312

Encoding complete
-----------------------------------------------------
       Encoded: 1 minute and 30.82 seconds
       Runtime: 1 minute and 18 seconds
                (1.164x realtime)
         Wrote: 1252403 bytes, 4541 packets, 93 pages
       Bitrate: 109.435kbit/s (without overhead)
 Instant rates: 24.4kbit/s to 196.4kbit/s
                (61 to 491 bytes per packet)
      Overhead: 0.802% (container+metadata)

# opusenc --comp 6 music_orig.wav music_orig.opus
Encoding using libopus unknown (audio)
-----------------------------------------------------
   Input: 48kHz 2 channels
  Output: 2 channels (2 coupled)
          20ms packets, 96kbit/sec VBR
 Preskip: 312

Encoding complete
-----------------------------------------------------
       Encoded: 1 minute and 30.82 seconds
       Runtime: 1 minute and 2 seconds
                (1.465x realtime)
         Wrote: 1208860 bytes, 4541 packets, 93 pages
       Bitrate: 105.661kbit/s (without overhead)
 Instant rates: 27.2kbit/s to 196.4kbit/s
                (68 to 491 bytes per packet)
      Overhead: 0.773% (container+metadata)

# opusenc --comp 5 music_orig.wav music_orig.opus
Encoding using libopus unknown (audio)
-----------------------------------------------------
   Input: 48kHz 2 channels
  Output: 2 channels (2 coupled)
          20ms packets, 96kbit/sec VBR
 Preskip: 312

Encoding complete
-----------------------------------------------------
       Encoded: 1 minute and 30.82 seconds
       Runtime: 1 minute and 3 seconds
                (1.442x realtime)
         Wrote: 1208860 bytes, 4541 packets, 93 pages
       Bitrate: 105.661kbit/s (without overhead)
 Instant rates: 27.2kbit/s to 196.4kbit/s
                (68 to 491 bytes per packet)
      Overhead: 0.773% (container+metadata)

# opusenc --comp 4 music_orig.wav music_orig.opus
Encoding using libopus unknown (audio)
-----------------------------------------------------
   Input: 48kHz 2 channels
  Output: 2 channels (2 coupled)
          20ms packets, 96kbit/sec VBR
 Preskip: 312

Encoding complete
-----------------------------------------------------
       Encoded: 1 minute and 30.82 seconds
       Runtime: 39 seconds
                (2.329x realtime)
         Wrote: 1213120 bytes, 4541 packets, 93 pages
       Bitrate: 106.034kbit/s (without overhead)
 Instant rates: 27.2kbit/s to 196kbit/s
                (68 to 490 bytes per packet)
      Overhead: 0.772% (container+metadata)

# opusenc --comp 3 music_orig.wav music_orig.opus
Encoding using libopus unknown (audio)
-----------------------------------------------------
   Input: 48kHz 2 channels
  Output: 2 channels (2 coupled)
          20ms packets, 96kbit/sec VBR
 Preskip: 312

Encoding complete
-----------------------------------------------------
       Encoded: 1 minute and 30.82 seconds
       Runtime: 38 seconds
                (2.39x realtime)
         Wrote: 1216331 bytes, 4541 packets, 93 pages
       Bitrate: 106.316kbit/s (without overhead)
 Instant rates: 27.2kbit/s to 196.8kbit/s
                (68 to 492 bytes per packet)
      Overhead: 0.771% (container+metadata)

# opusenc --comp 2 music_orig.wav music_orig.opus
Encoding using libopus unknown (audio)
-----------------------------------------------------
   Input: 48kHz 2 channels
  Output: 2 channels (2 coupled)
          20ms packets, 96kbit/sec VBR
 Preskip: 312

Encoding complete
-----------------------------------------------------
       Encoded: 1 minute and 30.82 seconds
       Runtime: 39 seconds
                (2.329x realtime)
         Wrote: 1215900 bytes, 4541 packets, 93 pages
       Bitrate: 106.278kbit/s (without overhead)
 Instant rates: 26.8kbit/s to 196.8kbit/s
                (67 to 492 bytes per packet)
      Overhead: 0.771% (container+metadata)

# opusenc --comp 1 music_orig.wav music_orig.opus
Encoding using libopus unknown (audio)
-----------------------------------------------------
   Input: 48kHz 2 channels
  Output: 2 channels (2 coupled)
          20ms packets, 96kbit/sec VBR
 Preskip: 312

Encoding complete
-----------------------------------------------------
       Encoded: 1 minute and 30.82 seconds
       Runtime: 35 seconds
                (2.595x realtime)
         Wrote: 1214244 bytes, 4541 packets, 93 pages
       Bitrate: 106.133kbit/s (without overhead)
 Instant rates: 26.8kbit/s to 196.8kbit/s
                (67 to 492 bytes per packet)
      Overhead: 0.771% (container+metadata)

Cheers,

Sebastian Reimers
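The runs above are easier to compare side by side. The sketch below is only an illustration: the runtimes are copied by hand from the opusenc output in this message, the 68-second baseline is the 1.0.3 run from the first message, and the function names are hypothetical. It recomputes the realtime factor as encoded duration divided by runtime, then looks for the largest runtime steps between adjacent complexity levels.

```python
ENCODED_SECONDS = 90.82  # "Encoded: 1 minute and 30.82 seconds"

# Runtime in seconds per --comp level, copied from the output above.
RUNTIME_1_1 = {10: 79, 9: 79, 8: 80, 7: 78, 6: 62,
               5: 63, 4: 39, 3: 38, 2: 39, 1: 35}
BASELINE_1_0_3 = 68  # libopus 1.0.3 with opusenc defaults, first message


def realtime_factor(runtime_seconds, encoded_seconds=ENCODED_SECONDS):
    """Seconds of audio encoded per second of runtime."""
    return encoded_seconds / runtime_seconds


def largest_steps(runtimes, count=2):
    """Return the `count` adjacent level pairs with the biggest runtime increase."""
    levels = sorted(runtimes)
    jumps = [(runtimes[hi] - runtimes[lo], lo, hi)
             for lo, hi in zip(levels, levels[1:])]
    return sorted(jumps, reverse=True)[:count]


def faster_than(runtimes, baseline):
    """Return the complexity levels whose runtime is below the baseline."""
    return sorted(level for level, secs in runtimes.items() if secs < baseline)


if __name__ == "__main__":
    for level in sorted(RUNTIME_1_1, reverse=True):
        secs = RUNTIME_1_1[level]
        print(f"--comp {level:2d}: {secs:3d} s, {realtime_factor(secs):.3f}x realtime")
    print("largest steps (seconds, from, to):", largest_steps(RUNTIME_1_1))
    print("faster than the 1.0.3 run:", faster_than(RUNTIME_1_1, BASELINE_1_0_3))
```

With these numbers the two largest steps are 24 seconds between levels 4 and 5 and 16 seconds between levels 6 and 7. The second step would be consistent with the complexity 7 threshold for floating point builds mentioned earlier in the thread, assuming this is a floating point build; the first step is not explained in the thread.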
Sebastian Reimers
2013-Oct-18 20:59 UTC
[opus] AM335x ARM Cortex-A8 performance drop opus 1.1
> Sorry. I meant that in a _fixed_point_ build the new analysis code is
> enabled only at complexity setting 10. It's currently enabled for
> floating point builds at complexity 7 or higher. Both fixed and float
> builds default to complexity 9 in 1.1.

OK, could this explain why --comp 6 is nearly the same as --comp 10 with 1.0.3?
Jean-Marc Valin
2013-Oct-18 21:21 UTC
[opus] AM335x ARM Cortex-A8 performance drop opus 1.1
Hi,

Just to clear things up...

So 1.1 has some new analysis code that increases CPU usage. When building as floating point (which you appear to be doing, right?), the new code is enabled at complexity 7 and up (opusenc defaults to complexity 10 IIRC). This is why you've been seeing an increase in CPU time. In version 1.0.x, complexities 5-10 are exactly the same for music. Running version 1.1 at complexity 5 should be slightly faster than 1.0.x.

When it comes to fixed-point, things are similar except that the threshold at which the new analysis code is enabled is complexity 10. This is because the new code uses floating-point instructions, even for a fixed-point build.

Cheers,

Jean-Marc

On 10/18/2013 10:11 AM, Sebastian Reimers wrote:
> Hello,
>
> I've just compared the 1.0.3 release with the master branch
> on a BeagleBone Black (AM335x 1GHz ARM Cortex-A8 with NEON
> floating-point accelerator) and Arch Linux ARM.
>
> I don't know why yet, but I see that 1.1 is much slower
> at encoding. Are there any changes to the defaults that I missed and
> that could explain this? I had expected better performance from 1.1
> with its ARM optimizations.
>
> Please let me know if you need more information.
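The thresholds described in this message can be written down as a small lookup. This is only a model of what the thread states about 1.1 (analysis enabled at complexity 7 and up for floating point builds, at 10 for fixed-point builds); it is not libopus code, and the function name and the default constants are assumptions made for illustration.

```python
LIBOPUS_1_1_DEFAULT_COMPLEXITY = 9   # per the thread: library default in 1.1
OPUSENC_COMPLEXITY = 10              # per the thread: what opusenc sets


def analysis_enabled(complexity, fixed_point):
    """Model of the 1.1 thresholds as described in this thread."""
    threshold = 10 if fixed_point else 7
    return complexity >= threshold


if __name__ == "__main__":
    for fixed_point in (False, True):
        build = "fixed-point" if fixed_point else "floating point"
        enabled = [c for c in range(0, 11) if analysis_enabled(c, fixed_point)]
        print(f"{build}: analysis enabled at complexity {enabled}")
    # opusenc's setting crosses both thresholds; the library default only
    # crosses the floating point one.
    print(analysis_enabled(OPUSENC_COMPLEXITY, fixed_point=True))
    print(analysis_enabled(LIBOPUS_1_1_DEFAULT_COMPLEXITY, fixed_point=True))
```

Under this model, an opusenc run at its own complexity setting pays for the analysis code in either build, while an application that keeps the library default of 9 pays for it only in a floating point build.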
Gregory Maxwell
2013-Oct-18 21:23 UTC
[opus] AM335x ARM Cortex-A8 performance drop opus 1.1
On Fri, Oct 18, 2013 at 2:21 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote:
> Just to clear things up... So 1.1 has some new analysis code that
> increases the amount of CPU. When building as floating point (which you
> appear to be doing, right?), the new code is enabled at complexity 7 and
> up (opusenc defaults to complexity 10 IIRC). This is why you've been
> seeing an increase in the CPU time. In version 1.0.x, complexity 5-10
> are the exactly the same for music. Running version 1.1 at complexity 5
> should be slightly faster than 1.0.x.
>
> When it comes to fixed-point, things are similar except that the
> threshold at which the new analysis code is enabled is complexity 10.
> This is because the new code uses floating-point instructions, even for
> a fixed-point build.

Adding to this: on a Cortex-A8, a fixed-point build will be faster.