thr3ads.net - search: "smlal"

Displaying 3 results from an estimated 3 matches for "smlal".

[LLVMdev] [ARM backend] adding pattern for SMLALBB

2015 May 28

Hi James/Tim, I am trying to add a pattern for SMLALBB. I think these two assembly patterns can be reduced to SMLALBB using TableGen.

1)
    smulbb r2, r3, r2
    adds   r0, r2, r0    (RdLo)
    asr    r3, r2, #31
    adc    r1, r3, r1    (RdHi)
  ==> smlalbb r0, r1, r3, r2

I have added a pattern in def SMLALBB : AMulxyI64< ..... as below :- [...

[Aarch64 v2 05/18] Add Neon intrinsics for Silk noise shape quantization.

2015 Nov 23

...n code is a performance boost for both platforms, and I'd rather not litter it with #ifdef's unless there's a large difference between the platforms. It looks like Clang (the version in Xcode 7.1.1, at least) is smart enough to optimize the first two operations you mention, figuring out sshll2 and smlal2 properly, though the third causes a gratuitous extra "ext.16b" to be generated. I've filed a missed-optimization bug on Clang for the latter. Here's the code it generates:

_silk_NSQ_noise_shape_feedback_loop_neon:
000000000000004c    ldr w9, [x0]
0000000000000050    cmp w3, #8...

[Aarch64 00/11] Patches to enable Aarch64

2015 Nov 20

> On Nov 19, 2015, at 5:47 PM, John Ridges <jridges at masque.com> wrote:
>
> Any speedup from the intrinsics may just be swamped by the rest of the encode/decode process. But I think you really want SIG2WORD16 to be (vqmovns_s32(PSHR32((x), SIG_SHIFT)))

Yes, you're right. I forgot to run the vectors under qemu with my previous version (oh, the embarrassment!). Fix forthcoming.