search for: smlalbb

Displaying 5 results from an estimated 5 matches for "smlalbb".

2015 May 28
1
[LLVMdev] [ARM backend] adding pattern for SMLALBB
Hi James/Tim, I am trying to add a pattern for SMLALBB. I think these two assembly patterns can be reduced to SMLALBB using tablegen. 1) smulbb r2, r3, r2; adds r0, r2, r0 (RdLo); asr r3, r2, #31; adc r1, r3, r1 (RdHi) ==> smlalbb r0, r1, r3, r2. I have added a pattern in def SMLALBB : AMulxyI64< ..... as below :- []...
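The equivalence claimed in this snippet can be checked with a small C model. This is only a sketch: the function names (`sext16`, `smlalbb_model`, `four_insn_model`) are hypothetical, and it assumes the usual reading of SMLALBB as "add the signed product of the two bottom halfwords to the 64-bit value in RdHi:RdLo". It is not taken from the LLVM sources or from the thread.

```c
#include <stdint.h>

/* Sign-extend the bottom 16 bits of a register value (hypothetical helper). */
static int32_t sext16(uint32_t x) {
    int32_t v = (int32_t)(x & 0xFFFFu);
    return (v & 0x8000) ? v - 0x10000 : v;
}

/* Assumed SMLALBB semantics: RdHi:RdLo += (int16)Rn * (int16)Rm. */
static uint64_t smlalbb_model(uint64_t acc, uint32_t rn, uint32_t rm) {
    int32_t prod = sext16(rn) * sext16(rm);   /* magnitude at most 2^30, fits in 32 bits */
    return acc + (uint64_t)(int64_t)prod;     /* wrap-around add of the sign-extended product */
}

/* The four-instruction sequence from the snippet, modelled step by step. */
static uint64_t four_insn_model(uint64_t acc, uint32_t rn, uint32_t rm) {
    uint32_t lo = (uint32_t)acc;
    uint32_t hi = (uint32_t)(acc >> 32);
    int32_t  prod = sext16(rn) * sext16(rm);            /* smulbb */
    uint32_t new_lo = (uint32_t)prod + lo;              /* adds: low word, sets carry */
    uint32_t carry = new_lo < lo;                       /* carry out of the low add */
    uint32_t sign = (prod < 0) ? 0xFFFFFFFFu : 0u;      /* asr #31: replicate sign bit */
    uint32_t new_hi = sign + hi + carry;                /* adc: high word plus carry */
    return ((uint64_t)new_hi << 32) | new_lo;
}
```

Both functions return the same 64-bit result for any inputs under these assumptions, which is the property a tablegen pattern would rely on.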
2014 Jun 20
2
Alleged bug in Silk codec
...a 64 bit implementation is faster on, say, a 32 bit ARM CPU) then perhaps we should reconsider. Doesn't ARMv6 have a dual signed 16x16->32 multiply with a 64-bit accumulator (SMLALD)? Even v5E should have a single 16x16->32 with a 64-bit accumulator (SMLALBB). I would think a 64-bit version could be made pretty fast on 32-bit ARM, without even resorting to SIMD.
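For readers unfamiliar with the "dual" multiply mentioned here, the following C sketch models what SMLALD is generally understood to compute: both halfword products of the two source registers are summed into a 64-bit accumulator. The names `sext16` and `smlald_model` are hypothetical and the code is an illustration, not the Opus or Silk implementation.

```c
#include <stdint.h>

/* Sign-extend the bottom 16 bits of a value (hypothetical helper). */
static int32_t sext16(uint32_t x) {
    int32_t v = (int32_t)(x & 0xFFFFu);
    return (v & 0x8000) ? v - 0x10000 : v;
}

/* Assumed SMLALD semantics:
 * acc += lo16(rn) * lo16(rm) + hi16(rn) * hi16(rm), with all halfwords signed. */
static uint64_t smlald_model(uint64_t acc, uint32_t rn, uint32_t rm) {
    int64_t p0 = (int64_t)sext16(rn) * sext16(rm);              /* bottom x bottom */
    int64_t p1 = (int64_t)sext16(rn >> 16) * sext16(rm >> 16);  /* top x top */
    return acc + (uint64_t)(p0 + p1);                           /* wrap-around 64-bit add */
}
```

For example, with `rn = 0x00020003` and `rm = 0x00040005`, the two products are 3*5 and 2*4, so the accumulator grows by 23.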
2014 Jun 20
2
Alleged bug in Silk codec
Right, there shouldn't be a problem with undefined behavior. That said, a 64 bit implementation will work very well - in fact that's how it was done originally. The reason for the current implementation is to minimize 64-bit operations in order to improve performance on limited-width architectures. This function gets used extensively, and I think the current implementation is faster on
2014 Jun 20
0
Alleged bug in Silk codec
...e opposite to be true (ie that a 64 bit implementation is faster on, say, a 32 bit ARM CPU) then perhaps we should reconsider. Doesn't ARMv6 have a dual signed 16x16->32 multiply with a 64-bit accumulator (SMLALD)? Even v5E should have a single 16x16->32 with a 64-bit accumulator (SMLALBB). I would think a 64-bit version could be made pretty fast on 32-bit ARM, without even resorting to SIMD.
2014 Jun 25
0
Alleged bug in Silk codec
...find the opposite to be true (ie that a 64 bit implementation is faster on, say, a 32 bit ARM CPU) then perhaps we should reconsider. Doesn't ARMv6 have a dual signed 16x16->32 multiply with a 64-bit accumulator (SMLALD)? Even v5E should have a single 16x16->32 with a 64-bit accumulator (SMLALBB). I would think a 64-bit version could be made pretty fast on 32-bit ARM, without even resorting to SIMD.