thr3ads.net - search: "d49636"

Displaying 5 results from an estimated 5 matches for "d49636".

[LoopVectorizer] Improving the performance of dot product reduction loop

2018 Jul 24

...> > I'm still missing something. Why do you want to separate out the even and > odd parts instead of just adding up the first half of the numbers and the > second half? > Doing even/odd matches up with a pattern I already have to support for the code in https://reviews.llvm.org/D49636. I wouldn't even need to detect it as a reduction to do the reassociation since even/odd exactly matches the behavior of the instruction. But you're right, we could also just detect the reduction and add two halves. > > Thanks again, > Hal > > Then ensures that no pieces exc...

[LoopVectorizer] Improving the performance of dot product reduction loop

2018 Jul 23

...her v8i32 add to accumulate the previous loop iterations. Then ensures that no pieces exceed the target vector width and the final operation is correctly sized to go around the loop in one register. All but the last add can then be pattern matched to vpmaddwd as proposed in https://reviews.llvm.org/D49636. And for the future CPU the whole thing can be matched to the new instruction. Do other targets have a similar instruction or a similar issue to this? Is this something we can solve in the loop vectorizer? Or should we have a separate IR transformation that can recognize this pattern and generate...

[LoopVectorizer] Improving the performance of dot product reduction loop

2018 Jul 23

...umulate the previous loop iterations. Then ensures that no > pieces exceed the target vector width and the final operation is correctly > sized to go around the loop in one register. All but the last add can then > be pattern matched to vpmaddwd as proposed in > https://reviews.llvm.org/D49636. And for the future CPU the whole thing > can be matched to the new instruction. > > Do other targets have a similar instruction or a similar issue to this? Is > this something we can solve in the loop vectorizer? Or should we have a > separate IR transformation that can recognize th...

[LoopVectorizer] Improving the performance of dot product reduction loop

2018 Jul 23

...ious loop iterations. >> Then ensures that no pieces exceed the target vector width and the >> final operation is correctly sized to go around the loop in one >> register. All but the last add can then be pattern matched to >> vpmaddwd as proposed in https://reviews.llvm.org/D49636. And for the >> future CPU the whole thing can be matched to the new instruction. >> >> Do other targets have a similar instruction or a similar issue to >> this? Is this something we can solve in the loop vectorizer? Or >> should we have a separate IR transformation t...

[LoopVectorizer] Improving the performance of dot product reduction loop

2018 Jul 24

...ions. Then ensures that no pieces exceed the target > vector width and the final operation is correctly sized to go > around the loop in one register. All but the last add can then > be pattern matched to vpmaddwd as proposed > in https://reviews.llvm.org/D49636. And for the future CPU the > whole thing can be matched to the new instruction. > > > > Do other targets have a similar instruction or a similar issue > to this? Is this something we can solve in the loop > vectorizer? Or should we hav...
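
The snippets above compare two ways of narrowing the widened products of a dot product: adding even/odd neighbours (which is what vpmaddwd does) or adding the first half to the second half. The sketch below is not taken from the thread or from D49636. It is a small Python model, using the hypothetical helper names `pmaddwd` and `add_halves`, that assumes plain Python integers (so the 32-bit wraparound of the real instruction is not modelled) and shows that both groupings produce partial sums with the same total.

```python
def pmaddwd(a, b):
    """Model of vpmaddwd on lists of 16-bit values (assumption: no overflow).

    Multiplies corresponding elements and adds each even/odd pair of
    adjacent products, halving the number of lanes.
    """
    assert len(a) == len(b) and len(a) % 2 == 0
    return [a[i] * b[i] + a[i + 1] * b[i + 1] for i in range(0, len(a), 2)]


def add_halves(a, b):
    """Alternative grouping: add the first half of the products to the second half."""
    assert len(a) == len(b) and len(a) % 2 == 0
    products = [x * y for x, y in zip(a, b)]
    half = len(products) // 2
    return [products[i] + products[i + half] for i in range(half)]


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [8, 7, 6, 5, 4, 3, 2, 1]

even_odd = pmaddwd(a, b)    # partial sums grouped by adjacent pairs
halves = add_halves(a, b)   # partial sums grouped by halves

# The partial sums differ lane by lane, but the final reduction is the same.
print(even_odd, sum(even_odd))
print(halves, sum(halves))
```

Because integer addition is associative, either grouping is a valid reassociation of the reduction. The even/odd form is the one that lines up lane for lane with the instruction.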
