Since you're already set up for benchmarks, could you also benchmark the
difference between using and not using the ARM64 inline assembly? I
believe the original justification for the assembly on ARMv7 was the
processor's panoply of multiply instructions and their long cycle times.
It seems to me that the ARM64 processor is much more like an x86 one,
where a simpleminded C multiply gives results that are just as good.
Inline assembly tends to hobble the compiler's optimizer and, in ARM64's
case, may actually be counterproductive.
The NEON code of course is valuable on all the ARM processors.
On 11/10/2015 1:00 PM, opus-request at xiph.org wrote:
> Date: Tue, 10 Nov 2015 19:32:35 +0000
> From: Jonathan Lennox <jonathan at vidyo.com>
> Subject: Re: [opus] [Aarch64 00/11] Patches to enable Aarch64 (arm64)
> optimizations, rebased to current master.
> To: "opus at xiph.org" <opus at xiph.org>
> Message-ID: <A0373653-FF01-472A-AC31-A68348384BF2 at vidyo.com>
> Content-Type: text/plain; charset="utf-8"
>
>
>> On Nov 6, 2015, at 9:05 PM, Jonathan Lennox <jonathan at vidyo.com> wrote:
>>
>> These have been tested for correctness under qemu (including running
>> the test vectors), but not yet performance tested on a live aarch64
>> CPU (which will probably be an iPhone). I should be able to do this
>> Monday or Tuesday.
> I've now done this, on an iPhone 5S. (Building with clang from Xcode 7.1)
>
> In fixed-point mode, relative to current HEAD of master, in my tests
> aarch64 gets a 10-12% encode boost, and a 6-7% decode boost, without Ne10.
> With Ne10, it's an 11-13% encode boost, and a 14-15% decode boost. (Current
> HEAD of master doesn't use Ne10 on aarch64 at all.)
>
> There's also about a 5-6% boost to aarch64 floating-point mode, since some
> of the optimizations apply to both fixed and float code.
>
> Fixed-point mode is still substantially faster than floating-point (about
> 20% faster for encode, about 10% faster for decode).
>
> These patches also speed armv7 up substantially, since a number of the Neon
> intrinsics apply to armv7 as well.
>
> Any questions, feel free to ask me or ping me on #opus.