Displaying 5 results from an estimated 5 matches for "ttyxpf".
2018 Jul 23
2
[LoopVectorizer] Improving the performance of dot product reduction loop
On 07/23/2018 06:23 PM, Hal Finkel via llvm-dev wrote:
>
> On 07/23/2018 05:22 PM, Craig Topper wrote:
>> Hello all,
>>
>> This code https://godbolt.org/g/tTyxpf is a dot product reduction
>> loop multiplying sign extended 16-bit values to produce a 32-bit
>> accumulated result. The x86 backend is currently not able to optimize
>> it as well as gcc and icc. The IR we are getting from the loop
>> vectorizer has several v8i32 adds and m...
2018 Jul 23
3
[LoopVectorizer] Improving the performance of dot product reduction loop
Hello all,
This code https://godbolt.org/g/tTyxpf is a dot product reduction loop
multiplying sign extended 16-bit values to produce a 32-bit accumulated
result. The x86 backend is currently not able to optimize it as well as gcc
and icc. The IR we are getting from the loop vectorizer has several v8i32
adds and muls inside the loop. These are fed b...
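
For readers without access to the linked example, the kind of loop described above can be sketched as follows. This is only an illustration under stated assumptions: the actual code behind the godbolt link is not reproduced here and may differ, and the function name `dot_product_i16` and its signature are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of a dot product reduction loop: two arrays of
 * 16-bit signed values are multiplied element by element and the
 * products are accumulated into a single 32-bit result. */
int32_t dot_product_i16(const int16_t *a, const int16_t *b, size_t n) {
    int32_t sum = 0;
    for (size_t i = 0; i < n; ++i) {
        /* Each 16-bit operand is sign extended to 32 bits before the
         * multiply, so the product itself cannot overflow. */
        sum += (int32_t)a[i] * (int32_t)b[i];
    }
    return sum;
}
```

The explicit casts only spell out the widening that C's integer promotions would perform anyway; the accumulation into `sum` is the reduction that the vectorizer has to handle.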
2018 Jul 23
4
[LoopVectorizer] Improving the performance of dot product reduction loop
~Craig
On Mon, Jul 23, 2018 at 4:24 PM Hal Finkel <hfinkel at anl.gov> wrote:
>
> On 07/23/2018 05:22 PM, Craig Topper wrote:
>
> Hello all,
>
> This code https://godbolt.org/g/tTyxpf is a dot product reduction loop
> multiplying sign extended 16-bit values to produce a 32-bit accumulated
> result. The x86 backend is currently not able to optimize it as well as gcc
> and icc. The IR we are getting from the loop vectorizer has several v8i32
> adds and muls inside the l...
2018 Jul 24
4
[LoopVectorizer] Improving the performance of dot product reduction loop
...7/23/2018 06:37 PM, Craig Topper wrote:
>
>
> ~Craig
>
>
> On Mon, Jul 23, 2018 at 4:24 PM Hal Finkel <hfinkel at anl.gov> wrote:
>
>>
>> On 07/23/2018 05:22 PM, Craig Topper wrote:
>>
>> Hello all,
>>
>> This code https://godbolt.org/g/tTyxpf is a dot product reduction loop
>> multiplying sign extended 16-bit values to produce a 32-bit accumulated
>> result. The x86 backend is currently not able to optimize it as well as gcc
>> and icc. The IR we are getting from the loop vectorizer has several v8i32
>> adds and m...
2018 Jul 24
2
[LoopVectorizer] Improving the performance of dot product reduction loop
...ot product reduction loop
>
> On 07/23/2018 06:23 PM, Hal Finkel via llvm-dev wrote:
>
> On 07/23/2018 05:22 PM, Craig Topper wrote:
>
> Hello all,
>
> This code https://godbolt.org/g/tTyxpf is a dot product reduction loop
> multiplying sign extended 16-bit values to produce a 32-bit
> accumulated result. The x86 backend is currently not able to
> optimize it as well as gcc and icc. The IR we are getting...