Displaying 20 results from an estimated 22 matches for "loadcombine".

2015 Sep 11
6
Optimizer issues on Windows
...e/llvm37> (llvm37 branch) project is facing an issue on Windows: When optimizations are turned on (llvm 3.7.0-final and more specifically<https://github.com/CausalityLtd/ponyc/blob/llvm37/src/libponyc/codegen/genopt.cc>, opt-level 3, BBVectorize, LoopVectorize, SLPVectorize, RerollLoops, LoadCombine + a custom heap to stack pass) writing an object file aborts (on Windows only) with the following fatal error: “Starting a function before ending the previous one!” at MCStreamer.cpp:407 during LLVMTargetMachineEmitToFile. Verifying the IR with llc raises no errors. What exact problem is being de...
2015 Sep 12
3
Optimizer issues on Windows
...e/llvm37> (llvm37 branch) project is facing an issue on Windows: When optimizations are turned on (llvm 3.7.0-final and more specifically<https://github.com/CausalityLtd/ponyc/blob/llvm37/src/libponyc/codegen/genopt.cc>, opt-level 3, BBVectorize, LoopVectorize, SLPVectorize, RerollLoops, LoadCombine + a custom heap to stack pass) writing an object file aborts (on Windows only) with the following fatal error: “Starting a function before ending the previous one!” at MCStreamer.cpp:407 during LLVMTargetMachineEmitToFile. Verifying the IR with llc raises no errors. What exact problem is being de...
2016 Sep 28
4
Load combine pass
...> > At this point, my general view is that widening transformations of any kind should be done very late. Ideally, this is something the backend would do, but doing it as a CGP like fixup pass over the IR is also reasonable. > > With that in mind, I feel both the current placement of LoadCombine (within the inliner iteration) and the proposed InstCombine rule are undesirable. > > Philip > > > On 09/28/2016 08:22 AM, Artur Pilipenko wrote: >> Hi, >> >> I'm trying to optimize a pattern like this into a single i16 load: >> %1 = bitcast i16* %pDa...
2016 Mar 11
3
masked-load endpoints optimization
...<4 x i32> %v) { %ld1 = load i32, i32* %addr1 %addr2 = getelementptr i32, i32* %addr1, i64 3 %ld2 = load i32, i32* %addr2 %vec1 = insertelement <4 x i32> undef, i32 %ld1, i32 0 %vec2 = insertelement <4 x i32> %vec1, i32 %ld2, i32 3 ret <4 x i32> %vec2 } $ ./llc -o - loadcombine.ll ... movups (%rdi), %xmm0 retq On Thu, Mar 10, 2016 at 10:22 PM, Nema, Ashutosh <Ashutosh.Nema at amd.com> wrote: > This looks interesting, the main motivation appears to be replacing masked > vector load with a general vector load followed by a select. > > >...
2016 Sep 29
2
Load combine pass
...s point, my general view is that widening transformations of any kind should be done very late. Ideally, this is something the backend would do, but doing it as a CGP like fixup pass over the IR is also reasonable. > >> > >> With that in mind, I feel both the current placement of LoadCombine (within the inliner iteration) and the proposed InstCombine rule are undesirable. > >> > >> Philip > >> > >> > >> On 09/28/2016 08:22 AM, Artur Pilipenko wrote: > >>> Hi, > >>> > >>> I'm trying to optimize a patter...
2015 Sep 12
2
Optimizer issues on Windows
...e/llvm37> (llvm37 branch) project is facing an issue on Windows: When optimizations are turned on (llvm 3.7.0-final and more specifically<https://github.com/CausalityLtd/ponyc/blob/llvm37/src/libponyc/codegen/genopt.cc>, opt-level 3, BBVectorize, LoopVectorize, SLPVectorize, RerollLoops, LoadCombine + a custom heap to stack pass) writing an object file aborts (on Windows only) with the following fatal error: “Starting a function before ending the previous one!” at MCStreamer.cpp:407 during LLVMTargetMachineEmitToFile. Verifying the IR with llc raises no errors. What exact problem is being de...
2016 Jul 21
2
RFC: Strong GC References in LLVM
...> While I agree it can be lazy, and should be an analysis, i'm, again, > really not sure which passes you are thinking about here that do code > sinking/speculation that won't need it. > > Here's the list definitely needing it right now: > GVN > GVNHoist > LICM > LoadCombine > LoopReroll > LoopUnswitch > LoopVersioningLICM > MemCpyOptimizer > MergedLoadStoreMotion > Sink > > The list is almost certainly larger than this, this was a pretty trivial > grep and examination. > (and doesn't take into account bugs, etc) > > (Note, this...
2016 Jul 21
3
RFC: Strong GC References in LLVM
> On Jul 21, 2016, at 7:45 AM, Philip Reames <listmail at philipreames.com> wrote: > > Joining in very late, but the tangent here has been interesting (if rather OT for the original thread). > > I agree with Danny that we might want to take a close look at how we model things like maythrow calls, no return, and other implicit control flow. I'm not convinced that moving
2019 Sep 11
2
Load combine pass
...eral view is that widening transformations of any > kind should be done very late. Ideally, this is something the backend > would do, but doing it as a CGP like fixup pass over the IR is also > reasonable. > >> > >> With that in mind, I feel both the current placement of LoadCombine > (within the inliner iteration) and the proposed InstCombine rule are > undesirable. > >> > >> Philip > >> > >> > >> On 09/28/2016 08:22 AM, Artur Pilipenko wrote: > >>> Hi, > >>> > >>> I'm trying to optimiz...
2016 Sep 28
3
Load combine pass
Hi, I'm trying to optimize a pattern like this into a single i16 load: %1 = bitcast i16* %pData to i8* %2 = load i8, i8* %1, align 1 %3 = zext i8 %2 to i16 %4 = shl nuw i16 %3, 8 %5 = getelementptr inbounds i8, i8* %1, i16 1 %6 = load i8, i8* %5, align 1 %7 = zext i8 %6 to i16 %8 = shl nuw nsw i16 %7, 0 %9 = or i16 %8, %4 I came across load combine pass which is motivated
2016 Jul 21
4
RFC: Strong GC References in LLVM
...and should be an analysis, i'm, again, >> really not sure which passes you are thinking about here that do code >> sinking/speculation that won't need it. >> >> Here's the list definitely needing it right now: >> GVN >> GVNHoist >> LICM >> LoadCombine >> LoopReroll >> LoopUnswitch >> LoopVersioningLICM >> MemCpyOptimizer >> MergedLoadStoreMotion >> Sink >> >> The list is almost certainly larger than this, this was a pretty trivial >> grep and examination. >> (and doesn't take into a...
2019 Sep 12
2
Load combine pass
...dening transformations of any >> kind should be done very late. Ideally, this is something the backend >> would do, but doing it as a CGP like fixup pass over the IR is also >> reasonable. >> >> >> >> With that in mind, I feel both the current placement of LoadCombine >> (within the inliner iteration) and the proposed InstCombine rule are >> undesirable. >> >> >> >> Philip >> >> >> >> >> >> On 09/28/2016 08:22 AM, Artur Pilipenko wrote: >> >>> Hi, >> >>> >>...
2016 Mar 15
3
the as-if rule / perf vs. security
...32* %addr1 > %addr2 = getelementptr i32, i32* %addr1, i64 3 > %ld2 = load i32, i32* %addr2 > %vec1 = insertelement <4 x i32> undef, i32 %ld1, i32 0 > %vec2 = insertelement <4 x i32> %vec1, i32 %ld2, i32 3 > ret <4 x i32> %vec2 > } > > $ ./llc -o - loadcombine.ll > ... > movups (%rdi), %xmm0 > retq > > > > > On Thu, Mar 10, 2016 at 10:22 PM, Nema, Ashutosh <Ashutosh.Nema at amd.com> > wrote: > > This looks interesting, the main motivation appears to be replacing masked > vector load with a general vect...
2019 Sep 25
2
Load combine pass
.... >>> Ideally, this is something the backend would do, but doing >>> it as a CGP like fixup pass over the IR is also reasonable. >>> >> >>> >> With that in mind, I feel both the current placement of >>> LoadCombine (within the inliner iteration) and the proposed >>> InstCombine rule are undesirable. >>> >> >>> >> Philip >>> >> >>> >> >>> >> On 09/28/2016 08:22 AM, Artur Pilipenko...
2016 Mar 16
3
the as-if rule / perf vs. security
...etelementptr i32, i32* %addr1, i64 3 >> %ld2 = load i32, i32* %addr2 >> %vec1 = insertelement <4 x i32> undef, i32 %ld1, i32 0 >> %vec2 = insertelement <4 x i32> %vec1, i32 %ld2, i32 3 >> ret <4 x i32> %vec2 >> } >> >> $ ./llc -o - loadcombine.ll >> ... >> movups (%rdi), %xmm0 >> retq >> >> >> >> >> On Thu, Mar 10, 2016 at 10:22 PM, Nema, Ashutosh <Ashutosh.Nema at amd.com> >> wrote: >> >> This looks interesting, the main motivation appears to be replacing...
2017 Jul 01
3
[cfe-dev] Just a quick heads up -- removing BBVectorize from LLVM (and Clang)
Already added in the commit (I think) On Fri, Jun 30, 2017 at 3:58 PM Hans Wennborg <hans at chromium.org> wrote: > On Thu, Jun 29, 2017 at 3:42 PM, Chandler Carruth via cfe-dev > <cfe-dev at lists.llvm.org> wrote: > > If you don't use BBVectorize at all, you can ignore this. > > > > Hal suggested this in a thread in 2014: > >
2016 Mar 16
3
the as-if rule / perf vs. security
...1, i64 3 >>> %ld2 = load i32, i32* %addr2 >>> %vec1 = insertelement <4 x i32> undef, i32 %ld1, i32 0 >>> %vec2 = insertelement <4 x i32> %vec1, i32 %ld2, i32 3 >>> ret <4 x i32> %vec2 >>> } >>> >>> $ ./llc -o - loadcombine.ll >>> ... >>> movups (%rdi), %xmm0 >>> retq >>> >>> >>> >>> >>> On Thu, Mar 10, 2016 at 10:22 PM, Nema, Ashutosh < >>> Ashutosh.Nema at amd.com> wrote: >>> >...
2016 Mar 10
2
masked-load endpoints optimization
If we're loading the first and last elements of a vector using a masked load [1], can we replace the masked load with a full vector load? "The result of this operation is equivalent to a regular vector load instruction followed by a ‘select’ between the loaded and the passthru values, predicated on the same mask. However, using this intrinsic prevents exceptions on memory access to
2016 Sep 29
3
Load combine pass
...ral view is that widening transformations of any kind should be done very late. Ideally, this is something the backend would do, but doing it as a CGP like fixup pass over the IR is also reasonable. > >>>> > >>>> With that in mind, I feel both the current placement of LoadCombine (within the inliner iteration) and the proposed InstCombine rule are undesirable. > >>>> > >>>> Philip > >>>> > >>>> > >>>> On 09/28/2016 08:22 AM, Artur Pilipenko wrote: > >>>>> Hi, > >>>>>...
2014 Dec 05
3
[LLVMdev] Memset/memcpy: user control of loop-idiom recognizer
On 3 Dec 2014, at 23:36, Robert Lougher <rob.lougher at gmail.com> wrote: > On 2 December 2014 at 22:18, Alex Rosenberg <alexr at leftfield.org> wrote: >> >> Our C library amplifies this problem by being in a dynamic library, so the >> call has additional overhead, which for small trip counts swamps the >> copy/set. >> > > I can't imagine