thr3ads.net - search: "loadcombine"

2015 Sep 11

6

Optimizer issues on Windows

...e/llvm37> (llvm37 branch) project is facing an issue on Windows: When optimizations are turned on (llvm 3.7.0-final and more specifically <https://github.com/CausalityLtd/ponyc/blob/llvm37/src/libponyc/codegen/genopt.cc>, opt-level 3, BBVectorize, LoopVectorize, SLPVectorize, RerollLoops, LoadCombine + a custom heap to stack pass) writing an object file aborts (on Windows only) with the following fatal error: “Starting a function before ending the previous one!” at MCStreamer.cpp:407 during LLVMTargetMachineEmitToFile. Verifying the IR with llc raises no errors. What exact problem is being de...

2015 Sep 12

3

Optimizer issues on Windows

...e/llvm37> (llvm37 branch) project is facing an issue on Windows: When optimizations are turned on (llvm 3.7.0-final and more specifically <https://github.com/CausalityLtd/ponyc/blob/llvm37/src/libponyc/codegen/genopt.cc>, opt-level 3, BBVectorize, LoopVectorize, SLPVectorize, RerollLoops, LoadCombine + a custom heap to stack pass) writing an object file aborts (on Windows only) with the following fatal error: “Starting a function before ending the previous one!” at MCStreamer.cpp:407 during LLVMTargetMachineEmitToFile. Verifying the IR with llc raises no errors. What exact problem is being de...

2016 Sep 28

4

Load combine pass

...> > At this point, my general view is that widening transformations of any kind should be done very late. Ideally, this is something the backend would do, but doing it as a CGP like fixup pass over the IR is also reasonable. > > With that in mind, I feel both the current placement of LoadCombine (within the inliner iteration) and the proposed InstCombine rule are undesirable. > > Philip > > > On 09/28/2016 08:22 AM, Artur Pilipenko wrote: >> Hi, >> >> I'm trying to optimize a pattern like this into a single i16 load: >> %1 = bitcast i16* %pDa...

2016 Mar 11

3

masked-load endpoints optimization

...<4 x i32> %v) {
  %ld1 = load i32, i32* %addr1
  %addr2 = getelementptr i32, i32* %addr1, i64 3
  %ld2 = load i32, i32* %addr2
  %vec1 = insertelement <4 x i32> undef, i32 %ld1, i32 0
  %vec2 = insertelement <4 x i32> %vec1, i32 %ld2, i32 3
  ret <4 x i32> %vec2
}

$ ./llc -o - loadcombine.ll
...
movups (%rdi), %xmm0
retq

On Thu, Mar 10, 2016 at 10:22 PM, Nema, Ashutosh <Ashutosh.Nema at amd.com> wrote:
> This looks interesting, the main motivation appears to be replacing masked
> vector load with a general vector load followed by a select.
>
> >...

2016 Sep 29

2

Load combine pass

...s point, my general view is that widening transformations of any kind should be done very late. Ideally, this is something the backend would do, but doing it as a CGP like fixup pass over the IR is also reasonable. > >> > >> With that in mind, I feel both the current placement of LoadCombine (within the inliner iteration) and the proposed InstCombine rule are undesirable. > >> > >> Philip > >> > >> > >> On 09/28/2016 08:22 AM, Artur Pilipenko wrote: > >>> Hi, > >>> > >>> I'm trying to optimize a patter...

2015 Sep 12

2

Optimizer issues on Windows

...e/llvm37> (llvm37 branch) project is facing an issue on Windows: When optimizations are turned on (llvm 3.7.0-final and more specifically <https://github.com/CausalityLtd/ponyc/blob/llvm37/src/libponyc/codegen/genopt.cc>, opt-level 3, BBVectorize, LoopVectorize, SLPVectorize, RerollLoops, LoadCombine + a custom heap to stack pass) writing an object file aborts (on Windows only) with the following fatal error: “Starting a function before ending the previous one!” at MCStreamer.cpp:407 during LLVMTargetMachineEmitToFile. Verifying the IR with llc raises no errors. What exact problem is being de...

2016 Jul 21

2

RFC: Strong GC References in LLVM

...> While I agree it can be lazy, and should be an analysis, i'm, again, > really not sure which passes you are thinking about here that do code > sinking/speculation that won't need it. > > Here's the list definitely needing it right now: > GVN > GVNHoist > LICM > LoadCombine > LoopReroll > LoopUnswitch > LoopVersioningLICM > MemCpyOptimizer > MergedLoadStoreMotion > Sink > > The list is almost certainly larger than this, this was a pretty trivial > grep and examination. > (and doesn't take into account bugs, etc) > > (Note, this...

2016 Jul 21

3

RFC: Strong GC References in LLVM

> On Jul 21, 2016, at 7:45 AM, Philip Reames <listmail at philipreames.com> wrote: > > Joining in very late, but the tangent here has been interesting (if rather OT for the original thread). > > I agree with Danny that we might want to take a close look at how we model things like maythrow calls, no return, and other implicit control flow. I'm not convinced that moving

2019 Sep 11

2

Load combine pass

...eral view is that widening transformations of any > kind should be done very late. Ideally, this is something the backend > would do, but doing it as a CGP like fixup pass over the IR is also > reasonable. > >> > >> With that in mind, I feel both the current placement of LoadCombine > (within the inliner iteration) and the proposed InstCombine rule are > undesirable. > >> > >> Philip > >> > >> > >> On 09/28/2016 08:22 AM, Artur Pilipenko wrote: > >>> Hi, > >>> > >>> I'm trying to optimiz...

2016 Sep 28

3

Load combine pass

Hi, I'm trying to optimize a pattern like this into a single i16 load:
  %1 = bitcast i16* %pData to i8*
  %2 = load i8, i8* %1, align 1
  %3 = zext i8 %2 to i16
  %4 = shl nuw i16 %3, 8
  %5 = getelementptr inbounds i8, i8* %1, i16 1
  %6 = load i8, i8* %5, align 1
  %7 = zext i8 %6 to i16
  %8 = shl nuw nsw i16 %7, 0
  %9 = or i16 %8, %4
I came across load combine pass which is motivated
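
The IR quoted in this snippet zero-extends two adjacent bytes, shifts the first one left by 8, and ORs the results, which amounts to reading one 16-bit value in big-endian byte order. A minimal Python sketch of that equivalence follows; the helper names `combine_bytes` and `load_i16_be` are hypothetical and are not LLVM APIs.

```python
def combine_bytes(data: bytes, offset: int = 0) -> int:
    """Mirror the IR: two i8 loads, zext to i16, shl by 8 and by 0, then or."""
    hi = data[offset]        # corresponds to %2, the byte at pData
    lo = data[offset + 1]    # corresponds to %6, the byte at pData + 1
    return ((hi << 8) | (lo << 0)) & 0xFFFF


def load_i16_be(data: bytes, offset: int = 0) -> int:
    """One 16-bit read that interprets the same two bytes as big-endian."""
    return int.from_bytes(data[offset:offset + 2], byteorder="big")
```

Both helpers return the same value for any two in-range bytes. On a little-endian target, a single native i16 load would presumably need a byte swap afterwards to match this pattern.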

2016 Jul 21

4

RFC: Strong GC References in LLVM

...and should be an analysis, i'm, again, >> really not sure which passes you are thinking about here that do code >> sinking/speculation that won't need it. >> >> Here's the list definitely needing it right now: >> GVN >> GVNHoist >> LICM >> LoadCombine >> LoopReroll >> LoopUnswitch >> LoopVersioningLICM >> MemCpyOptimizer >> MergedLoadStoreMotion >> Sink >> >> The list is almost certainly larger than this, this was a pretty trivial >> grep and examination. >> (and doesn't take into a...

2019 Sep 12

2

Load combine pass

...dening transformations of any >> kind should be done very late. Ideally, this is something the backend >> would do, but doing it as a CGP like fixup pass over the IR is also >> reasonable. >> >> >> >> With that in mind, I feel both the current placement of LoadCombine >> (within the inliner iteration) and the proposed InstCombine rule are >> undesirable. >> >> >> >> Philip >> >> >> >> >> >> On 09/28/2016 08:22 AM, Artur Pilipenko wrote: >> >>> Hi, >> >>> >>...

2016 Mar 15

3

the as-if rule / perf vs. security

...32* %addr1 > %addr2 = getelementptr i32, i32* %addr1, i64 3 > %ld2 = load i32, i32* %addr2 > %vec1 = insertelement <4 x i32> undef, i32 %ld1, i32 0 > %vec2 = insertelement <4 x i32> %vec1, i32 %ld2, i32 3 > ret <4 x i32> %vec2 > } > > $ ./llc -o - loadcombine.ll > ... > movups (%rdi), %xmm0 > retq > > > > > On Thu, Mar 10, 2016 at 10:22 PM, Nema, Ashutosh <Ashutosh.Nema at amd.com> > wrote: > > This looks interesting, the main motivation appears to be replacing masked > vector load with a general vect...

2019 Sep 25

2

Load combine pass

.... >>> Ideally, this is something the backend would do, but doing >>> it as a CGP like fixup pass over the IR is also reasonable. >>> >> >>> >> With that in mind, I feel both the current placement of >>> LoadCombine (within the inliner iteration) and the proposed >>> InstCombine rule are undesirable. >>> >> >>> >> Philip >>> >> >>> >> >>> >> On 09/28/2016 08:22 AM, Artur Pilipenko...

2016 Mar 16

3

the as-if rule / perf vs. security

...etelementptr i32, i32* %addr1, i64 3 >> %ld2 = load i32, i32* %addr2 >> %vec1 = insertelement <4 x i32> undef, i32 %ld1, i32 0 >> %vec2 = insertelement <4 x i32> %vec1, i32 %ld2, i32 3 >> ret <4 x i32> %vec2 >> } >> >> $ ./llc -o - loadcombine.ll >> ... >> movups (%rdi), %xmm0 >> retq >> >> >> >> >> On Thu, Mar 10, 2016 at 10:22 PM, Nema, Ashutosh <Ashutosh.Nema at amd.com> >> wrote: >> >> This looks interesting, the main motivation appears to be replacing...

2017 Jul 01

3

[cfe-dev] Just a quick heads up -- removing BBVectorize from LLVM (and Clang)

Already added in the commit (I think) On Fri, Jun 30, 2017 at 3:58 PM Hans Wennborg <hans at chromium.org> wrote: > On Thu, Jun 29, 2017 at 3:42 PM, Chandler Carruth via cfe-dev > <cfe-dev at lists.llvm.org> wrote: > > If you don't use BBVectorize at all, you can ignore this. > > > > Hal suggested this in a thread in 2014: > >

2016 Mar 16

3

the as-if rule / perf vs. security

...1, i64 3 >>> %ld2 = load i32, i32* %addr2 >>> %vec1 = insertelement <4 x i32> undef, i32 %ld1, i32 0 >>> %vec2 = insertelement <4 x i32> %vec1, i32 %ld2, i32 3 >>> ret <4 x i32> %vec2 >>> } >>> >>> $ ./llc -o - loadcombine.ll >>> ... >>> movups (%rdi), %xmm0 >>> retq >>> >>> >>> >>> >>> On Thu, Mar 10, 2016 at 10:22 PM, Nema, Ashutosh <Ashutosh.Nema at amd.com> wrote: >>> >...

2016 Mar 10

2

masked-load endpoints optimization

If we're loading the first and last elements of a vector using a masked load [1], can we replace the masked load with a full vector load? "The result of this operation is equivalent to a regular vector load instruction followed by a ‘select’ between the loaded and the passthru values, predicated on the same mask. However, using this intrinsic prevents exceptions on memory access to
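
The quoted definition says a masked load produces the same result as a full vector load followed by a select on the same mask. A minimal Python sketch of that equivalence, modelling vectors as lists, is below; `masked_load` and `load_then_select` are hypothetical names used only for illustration, and the model ignores memory faults, which is exactly where the two forms differ.

```python
from typing import List, Sequence


def masked_load(memory: Sequence[int], mask: Sequence[bool],
                passthru: Sequence[int]) -> List[int]:
    """Read only the lanes whose mask bit is set; other lanes take passthru."""
    return [memory[i] if m else passthru[i] for i, m in enumerate(mask)]


def load_then_select(memory: Sequence[int], mask: Sequence[bool],
                     passthru: Sequence[int]) -> List[int]:
    """Read every lane unconditionally, then select per lane with the mask."""
    loaded = [memory[i] for i in range(len(mask))]  # touches all lanes
    return [l if m else p for l, m, p in zip(loaded, mask, passthru)]
```

The two functions agree on their results; the only difference is that `load_then_select` reads the masked-off lanes as well. The thread's question is whether that extra access is safe when the first and last lanes are both enabled.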

2016 Sep 29

3

Load combine pass

...ral view is that widening transformations of any kind should be done very late. Ideally, this is something the backend would do, but doing it as a CGP like fixup pass over the IR is also reasonable. > >>>> > >>>> With that in mind, I feel both the current placement of LoadCombine (within the inliner iteration) and the proposed InstCombine rule are undesirable. > >>>> > >>>> Philip > >>>> > >>>> > >>>> On 09/28/2016 08:22 AM, Artur Pilipenko wrote: > >>>>> Hi, > >>>>>...

2014 Dec 05

3

[LLVMdev] Memset/memcpy: user control of loop-idiom recognizer

On 3 Dec 2014, at 23:36, Robert Lougher <rob.lougher at gmail.com> wrote: > On 2 December 2014 at 22:18, Alex Rosenberg <alexr at leftfield.org> wrote: >> >> Our C library amplifies this problem by being in a dynamic library, so the >> call has additional overhead, which for small trip counts swamps the >> copy/set. >> > > I can't imagine
