2018 Apr 07
SCEV and LoopStrengthReduction Formulae
...in a past life, when I used to do x86 SIMD optimization for a living, I did similar tricks pretty much everywhere in DSP functions. It’d be pretty nice if the compiler could do it too.
There is one alternate approach that I recall, which looks like this:
Original code (example, pseudocode):
int add_delta_256(uint8 *in1, uint8 *in2) {
    int accum = 0;
    for (int i = 0; i < 16; ++i) {
        uint8x16 a = load16(in1 + i * 16); // NOTE: takes an extra addressing op because x86
        uint8x16 b = load16(in2 + i * 16); // NOTE: takes an extra addressing op because x86
        accum += psadbw(a, b);
    }
    return accum;...
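For readers who want to check what the pseudocode above computes, here is a minimal scalar sketch in C. It is not the author's code and not the alternate approach mentioned in the post, which is cut off in this excerpt. The helper name `sad16` is hypothetical, and it assumes `psadbw` is used as a plain sum of absolute byte differences over each 16-byte block (the real instruction produces two partial sums, one per 8-byte half, which this sketch adds together).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar stand-in for psadbw on one 16-byte block:
   returns the sum of |a[k] - b[k]| for k = 0..15. */
static int sad16(const uint8_t *a, const uint8_t *b) {
    int sum = 0;
    for (size_t k = 0; k < 16; ++k) {
        int d = (int)a[k] - (int)b[k];
        sum += d < 0 ? -d : d;
    }
    return sum;
}

/* Sum of absolute differences over two 256-byte buffers,
   processed as 16 blocks of 16 bytes, mirroring the loop in the post. */
int add_delta_256(const uint8_t *in1, const uint8_t *in2) {
    assert(in1 != NULL && in2 != NULL);
    int accum = 0;
    for (int i = 0; i < 16; ++i) {
        accum += sad16(in1 + i * 16, in2 + i * 16);
    }
    return accum;
}
```

The `in1 + i * 16` address computation inside the loop is the part the post's comments flag as costing an extra addressing operation on x86.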
2018 Apr 03
SCEV and LoopStrengthReduction Formulae
I am attempting to implement a minor loop strength reduction optimization for
targets that support compare and jump fusion, specifically
TTI::canMacroFuseCmp(). My approach might be wrong, so I would like
feedback on the idea in order to implement this correctly. My plan is to
add a supplemental LSR formula to LoopStrengthReduce.cpp that optimizes the
following case, but perhaps