thr3ads.net - search: "avx_sad

Displaying 3 results from an estimated 3 matches for "avx_sad_4".

LLVM 11 and trunk selecting 4 wide instead of 8 wide loop vectorization for AVX-enabled target

2020 Jul 16

...#1 = { norecurse readnone } attributes #2 = { nounwind readnone speculatable willreturn } !ubaa.Burst.Compiler.IL.Tests.VectorsMaths\2FFloatPointer.0 = !{!0, !0, !0, !0} !0 = !{i1 false} !1 = !{i1 true, i1 false, i1 false} If I run this with ../llvm-project/llvm/build/bin/opt.exe -o - -S -O3 ../avx_sad_4.ll -mattr=avx -debug, I can see that the loop vectorizer correctly considers using 8-wide ymm registers for this, but has decided that the 4-wide variant is cheaper based on some cost modelling I don't understand. So is this expected behaviour? I know there were some cost model changes in the 1...

LLVM 11 and trunk selecting 4 wide instead of 8 wide loop vectorization for AVX-enabled target

2020 Jul 16

...willreturn } !ubaa.Burst.Compiler.IL.Tests.VectorsMaths\2FFloatPointer.0 = !{!0, !0, !0, !0} !0 = !{i1 false} !1 = !{i1 true, i1 false, i1 false} If I run this with ../llvm-project/llvm/build/bin/opt.exe -o - -S -O3 ../avx_sad_4.ll -mattr=avx -debug, I can see that the loop vectorizer correctly considers using 8-wide ymm registers for this, but has decided that the 4-wide variant is cheaper based on some cost modelling I don't understand. So is this expected behaviour? I kno...

LLVM 11 and trunk selecting 4 wide instead of 8 wide loop vectorization for AVX-enabled target

2020 Jul 16

...ests.VectorsMaths\2FFloatPointer.0 = !{!0, !0, !0, !0} !0 = !{i1 false} !1 = !{i1 true, i1 false, i1 false} If I run this with ../llvm-project/llvm/build/bin/opt.exe -o - -S -O3 ../avx_sad_4.ll -mattr=avx -debug, I can see that the loop vectorizer correctly considers using 8-wide ymm registers for this, but has decided that the 4-wide variant is cheaper based on some cost modelling I don't understand. ...
