thr3ads.net - opus - [opus] [Aarch64 00/11] Patches to enable Aarch64 [Nov 2015]

Jonathan Lennox

2015-Nov-19 21:52 UTC

[opus] [Aarch64 00/11] Patches to enable Aarch64

> On Nov 16, 2015, at 4:42 PM, Jonathan Lennox <jonathan at vidyo.com> wrote:
>
> I haven't yet tried replacing SIG2WORD16 (or silk_ADD_SAT32/silk_SUB_SAT32)
> with Neon intrinsics.  That's an obvious next step.

This doesn't show any appreciable speed difference in my tests, but the code is
obviously better by inspection (all three of these map directly to a single
Aarch64 instruction and a single Neon intrinsic) so my code paths may just not
exercise them.
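
As an illustration of the saturating add/subtract idea, here is a minimal
sketch in C. It assumes an AArch64 toolchain whose arm_neon.h provides the
scalar intrinsics vqadds_s32 and vqsubs_s32. The names add_sat32, sub_sat32,
clamp_to_int32 and the *_ref helpers are hypothetical; this is not the
silk_ADD_SAT32/silk_SUB_SAT32 code from the patches. On other targets the
sketch falls back to a portable reference that widens to 64 bits and clamps.

```c
#include <stdint.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Clamp a 64-bit intermediate result to the int32_t range. */
static inline int32_t clamp_to_int32(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* Portable reference: do the arithmetic in 64 bits, then saturate. */
static inline int32_t add_sat32_ref(int32_t a, int32_t b)
{
    return clamp_to_int32((int64_t)a + (int64_t)b);
}

static inline int32_t sub_sat32_ref(int32_t a, int32_t b)
{
    return clamp_to_int32((int64_t)a - (int64_t)b);
}

/* Hypothetical wrappers: one scalar saturating intrinsic on AArch64,
   the portable reference everywhere else. */
static inline int32_t add_sat32(int32_t a, int32_t b)
{
#if defined(__aarch64__)
    return vqadds_s32(a, b);
#else
    return add_sat32_ref(a, b);
#endif
}

static inline int32_t sub_sat32(int32_t a, int32_t b)
{
#if defined(__aarch64__)
    return vqsubs_s32(a, b);
#else
    return sub_sat32_ref(a, b);
#endif
}
```

Comparing add_sat32 against add_sat32_ref over the corner cases (INT32_MAX + 1,
INT32_MIN - 1) is a cheap way to confirm that both paths agree before relying
on the intrinsic version.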

Patches follow.

John Ridges

2015-Nov-19 22:47 UTC

Any speedup from the intrinsics may just be swamped by the rest of the 
encode/decode process. But I think you really want SIG2WORD16 to be 
(vqmovns_s32(PSHR32((x), SIG_SHIFT)))
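
To make the suggestion concrete, here is a minimal sketch of a
round-then-saturate conversion in C. Everything named here is an assumption
for illustration: pshr32 stands in for the PSHR32 macro, SIG_SHIFT is set to 12
only as an example value, and sig2word16/sig2word16_ref are hypothetical
functions rather than the macro in the Opus tree. The point it shows is that
the rounding shift happens first and the saturating narrow to 16 bits happens
second.

```c
#include <stdint.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Example value only; a real build takes SIG_SHIFT from the codec headers. */
#define SIG_SHIFT 12

/* Rounding right shift: add half of 2^shift, then shift down.
   The sum is formed in 64 bits so it cannot overflow; the right shift of a
   negative value is assumed to be arithmetic, as on common compilers. */
static inline int32_t pshr32(int32_t a, int shift)
{
    int64_t half = (int64_t)1 << (shift - 1);
    return (int32_t)(((int64_t)a + half) >> shift);
}

/* Portable reference: round, then clamp to the int16_t range. */
static inline int16_t sig2word16_ref(int32_t x)
{
    int32_t r = pshr32(x, SIG_SHIFT);
    if (r > INT16_MAX) return INT16_MAX;
    if (r < INT16_MIN) return INT16_MIN;
    return (int16_t)r;
}

/* Hypothetical wrapper: on AArch64 the clamp becomes one saturating
   narrow (vqmovns_s32); elsewhere the reference path is used. */
static inline int16_t sig2word16(int32_t x)
{
#if defined(__aarch64__)
    return vqmovns_s32(pshr32(x, SIG_SHIFT));
#else
    return sig2word16_ref(x);
#endif
}
```

With SIG_SHIFT at 12, an input of 2048 rounds up to 1 while 2047 rounds down
to 0, and anything at or above 32768 << 12 saturates to 32767.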


On 11/19/2015 2:52 PM, Jonathan Lennox wrote:
>> On Nov 16, 2015, at 4:42 PM, Jonathan Lennox <jonathan at vidyo.com> wrote:
>>
>> I haven't yet tried replacing SIG2WORD16 (or silk_ADD_SAT32/silk_SUB_SAT32)
>> with Neon intrinsics.  That's an obvious next step.
>
> This doesn't show any appreciable speed difference in my tests, but the code
> is obviously better by inspection (all three of these map directly to a
> single Aarch64 instruction and a single Neon intrinsic) so my code paths may
> just not exercise them.
>
> Patches follow.

Jonathan Lennox

2015-Nov-20 00:18 UTC

> On Nov 19, 2015, at 5:47 PM, John Ridges <jridges at masque.com> wrote:
>
> Any speedup from the intrinsics may just be swamped by the rest of the
> encode/decode process. But I think you really want SIG2WORD16 to be
> (vqmovns_s32(PSHR32((x), SIG_SHIFT)))

Yes, you're right. I forgot to run the vectors under qemu with my previous
version (oh, the embarrassment!)  Fix forthcoming once the tests actually run.

> On 11/19/2015 2:52 PM, Jonathan Lennox wrote:
>>> On Nov 16, 2015, at 4:42 PM, Jonathan Lennox <jonathan at vidyo.com> wrote:
>>>
>>> I haven't yet tried replacing SIG2WORD16 (or silk_ADD_SAT32/silk_SUB_SAT32)
>>> with Neon intrinsics.  That's an obvious next step.
>>
>> This doesn't show any appreciable speed difference in my tests, but the code
>> is obviously better by inspection (all three of these map directly to a
>> single Aarch64 instruction and a single Neon intrinsic) so my code paths may
>> just not exercise them.
>>
>> Patches follow.
