Displaying 5 results from an estimated 5 matches for "smlalbb".
2015 May 28
1
[LLVMdev] [ARM backend] adding pattern for SMLALBB
Hi James/Tim,
I am trying to add a pattern for SMLALBB.
I think these two assembly patterns can be reduced to SMLALBB using TableGen.
1)
smulbb r2, r3, r2
adds r0, r2, r0 (RdLo)
asr r3, r2, #31
adc r1, r3, r1 (RdHi) ==> smlalbb r0, r1, r3, r2
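For reference, the arithmetic behind sequence 1) can be sketched in C. This is only an illustrative model, not the TableGen pattern from the post: the function names (bottom_half, mla_bb_sequence, smlalbb_model) are hypothetical, and the accumulator is passed as one unsigned 64-bit value standing for the RdHi:RdLo pair. The first function follows the four instructions one at a time; the second computes the same result the way a single SMLALBB is documented to (signed product of the two bottom halfwords, added to the 64-bit accumulator).

```c
#include <stdint.h>

/* Sign-extend the bottom 16 bits of a register value (hypothetical helper). */
static int32_t bottom_half(uint32_t x)
{
    return (int32_t)((x & 0xFFFFu) ^ 0x8000u) - 0x8000;
}

/* Model of the four-instruction sequence; acc stands for RdHi:RdLo. */
static uint64_t mla_bb_sequence(uint64_t acc, uint32_t rn, uint32_t rm)
{
    uint32_t lo = (uint32_t)acc;          /* r0 (RdLo) */
    uint32_t hi = (uint32_t)(acc >> 32);  /* r1 (RdHi) */

    /* smulbb: signed 16x16 -> 32 product of the bottom halfwords */
    int32_t prod = bottom_half(rn) * bottom_half(rm);

    /* adds: add the product to the low word and record the carry */
    uint32_t sum_lo = lo + (uint32_t)prod;
    uint32_t carry = sum_lo < lo;

    /* asr #31: all ones if the product is negative, otherwise zero */
    uint32_t sign = prod < 0 ? 0xFFFFFFFFu : 0u;

    /* adc: add the sign word and the carry to the high word */
    uint32_t sum_hi = hi + sign + carry;

    return ((uint64_t)sum_hi << 32) | sum_lo;
}

/* Model of a single SMLALBB: 64-bit accumulate of the same product. */
static uint64_t smlalbb_model(uint64_t acc, uint32_t rn, uint32_t rm)
{
    int64_t prod = (int64_t)bottom_half(rn) * bottom_half(rm);
    return acc + (uint64_t)prod;  /* wraps modulo 2^64 */
}
```

Both functions ignore the top halfwords of the inputs and agree for every accumulator value, which is the equivalence the proposed pattern relies on.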
I have added a pattern in def SMLALBB : AMulxyI64< ..... as below:
[]...
2014 Jun 20
2
Alleged bug in Silk codec
...a 64 bit implementation is
>> faster on, say, a 32 bit ARM CPU) then perhaps we should reconsider.
>>
>
> Doesn't ARMv6 have a dual signed 16x16->32 multiply with a 64-bit
> accumulator (SMLALD)? Even v5E should have a single 16x16->32 with a 64-bit
> accumulator (SMLALBB). I would think a 64-bit version could be made pretty
> fast on 32-bit ARM, without even resorting to SIMD.
>
2014 Jun 20
2
Alleged bug in Silk codec
Right, there shouldn't be a problem with undefined behavior.
That said, a 64 bit implementation will work very well - in fact that's how
it was done originally.
The reason for the current implementation is to minimize 64-bit operations
in order to improve performance on limited-width architectures. This
function gets used extensively, and I think the current implementation is
faster on
2014 Jun 20
0
Alleged bug in Silk codec
...e opposite to be true (ie that a 64 bit implementation is
> faster on, say, a 32 bit ARM CPU) then perhaps we should reconsider.
Doesn't ARMv6 have a dual signed 16x16->32 multiply with a 64-bit
accumulator (SMLALD)? Even v5E should have a single 16x16->32 with a
64-bit accumulator (SMLALBB). I would think a 64-bit version could be
made pretty fast on 32-bit ARM, without even resorting to SIMD.
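As a rough illustration of the operation being discussed, here is a minimal C sketch of a 16x16 multiply-accumulate loop with a 64-bit accumulator. It is not the Silk code: the function name dot_i16_acc64 is hypothetical. The unrolled body has the shape of SMLALD (two 16x16 products added to a 64-bit accumulator) and the tail has the shape of SMLALBB (one product), but whether a given compiler actually selects those instructions depends on the toolchain and target flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dot product of two int16 vectors, 64-bit accumulator. */
static int64_t dot_i16_acc64(const int16_t *a, const int16_t *b, size_t n)
{
    int64_t acc = 0;
    size_t i = 0;

    /* Two products per iteration: the shape of a dual multiply-accumulate. */
    for (; i + 1 < n; i += 2) {
        acc += (int32_t)a[i] * b[i];
        acc += (int32_t)a[i + 1] * b[i + 1];
    }

    /* Odd tail: a single 16x16 product added to the accumulator. */
    if (i < n) {
        acc += (int32_t)a[i] * b[i];
    }
    return acc;
}
```

Each product fits in 32 bits (the largest magnitude is 2^30), so only the accumulation needs 64-bit arithmetic.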
2014 Jun 25
0
Alleged bug in Silk codec
...find the opposite to be true (ie that a 64 bit implementation is
faster on, say, a 32 bit ARM CPU) then perhaps we should reconsider.
Doesn't ARMv6 have a dual signed 16x16->32 multiply with a 64-bit accumulator (SMLALD)? Even v5E should have a single 16x16->32 with a 64-bit accumulator (SMLALBB). I would think a 64-bit version could be made pretty fast on 32-bit ARM, without even resorting to SIMD.