Displaying 2 results from an estimated 2 matches for "sum_sqr_shift".
2014 Jun 20
Alleged bug in Silk codec
Yes those instructions exist, although they're a bit slower than the basic
16x16->32 with 32-bit accumulation (SMLABB). So I'd be surprised if the
function with 64-bit accumulation would run as fast as the current code.
Don't know how much we care about 16-bit platforms. And accuracy should
not matter.
On the other hand, a 64-bit implementation is much cleaner/shorter, which is
2014 Jun 25
Alleged bug in Silk codec
...ed conversion you are right, it is implementation defined. I just had an issue a couple of years ago with a compiler which incorrectly treated unsigned overflow as undefined rather than implementation defined?
Regarding the 64-bit profiling: I looked at the disassembly (gcc -c -S -O2 ../opus/silk/sum_sqr_shift.c -I../opus/include -I../opus/celt) of the 64-bit accumulator version (unrolled twice like the current code) and found that, as well as having only one loop, the loop has 12 instructions per iteration.
The current version (after fixing the bug) gives 12 instructions per iteration until shift becom...
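For readers who have not seen the code under discussion, here is a rough sketch of what a 64-bit accumulator version, unrolled twice, might look like. It is not the libopus implementation: the function name sum_sqr_shift_sketch, its signature, and the rule for choosing the shift (the smallest right shift that makes the total fit in a non-negative 32-bit value) are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, not the libopus code: sum the squares of 16-bit
 * samples in a 64-bit accumulator, then choose the smallest right shift
 * that makes the total fit in a non-negative 32-bit integer. */
static void sum_sqr_shift_sketch(int32_t *energy, int *shift,
                                 const int16_t *x, size_t len)
{
    uint64_t acc = 0;
    size_t i = 0;

    /* Single loop, unrolled twice. Each square is at most 2^30, so the
     * 64-bit accumulator has ample headroom for practical frame lengths. */
    for (; i + 1 < len; i += 2) {
        acc += (uint64_t)((int32_t)x[i] * x[i]);
        acc += (uint64_t)((int32_t)x[i + 1] * x[i + 1]);
    }
    /* Handle the last sample when len is odd. */
    if (i < len) {
        acc += (uint64_t)((int32_t)x[i] * x[i]);
    }

    /* Assumed normalisation rule: shift until the value fits in 31 bits. */
    int s = 0;
    while ((acc >> s) > (uint64_t)INT32_MAX) {
        s++;
    }
    *energy = (int32_t)(acc >> s);
    *shift = s;
}
```

Because the shift is applied once at the end rather than during accumulation, a version like this needs only one loop, but its rounding may differ slightly from code that shifts partial sums as it goes.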