similar to: LLVM-IR store-load propagation

Displaying 20 results from an estimated 10000 matches similar to: "LLVM-IR store-load propagation"

2019 Aug 08
2
Suboptimal code generated by clang+llc in quite a common scenario (?)
I found something that I don't quite understand when compiling a common piece of code with the -Os flag. I noticed it while testing my own backend, but after digging deeper I found that at least the x86 target is affected as well. This is the code in question:

    char pp[3];
    char *scscx = pp;
    int tst( char i, char j, char k ) {
        scscx[0] = i;
        scscx[1] = j;
        scscx[2] = k;
        return 0;
    }

The above gets
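If aliasing of the global pointer scscx turns out to be what blocks the expected output, one quick experiment is to copy the pointer into a local first and compare the -Os results; this variant is only an illustration and is not taken from the thread:

    /* hypothetical variant for comparison: load the global pointer once into a
       local, so the byte stores cannot be assumed to modify the pointer itself */
    char pp2[3];
    char *scscx2 = pp2;
    int tst2(char i, char j, char k) {
        char *p = scscx2;   /* single load of the global pointer */
        p[0] = i;
        p[1] = j;
        p[2] = k;
        return 0;
    }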
2013 Nov 11
2
[LLVMdev] What's the Alias Analysis does clang use ?
Hi, LLVM community: I found that basicaa does not seem to report must-not-alias for __restrict__ arguments in C/C++. It only compares the two pointers and the underlying objects they point to. I wonder how clang does alias analysis for the C/C++ keyword restrict. Let's assume we compile the following code: $ cat myalias.cc float foo(float * __restrict__ v0, float * __restrict__ v1, float * __restrict__ v2, float *
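For reference, clang lowers __restrict__ on function parameters to the LLVM noalias attribute. A reduced example of the kind of function being discussed (illustrative only, not the exact myalias.cc from the thread):

    /* with __restrict__, the store through v0 cannot alias the load through v1,
       so the load may be hoisted or kept in a register */
    float foo(float * __restrict__ v0, float * __restrict__ v1) {
        v0[0] = 1.0f;
        return v1[0];
    }

As the reply below points out, at -O0 the arguments are first spilled to allocas, so basicaa never gets to compare the noalias arguments themselves.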
2013 Nov 12
0
[LLVMdev] What's the Alias Analysis does clang use ?
Hi, Your problem is that the function arguments, which are marked as noalias, are not being directly used as the base objects of the array accesses:
> %v0.addr = alloca float*, align 8
> %v1.addr = alloca float*, align 8
> %v2.addr = alloca float*, align 8
> %t.addr = alloca float*, align 8
...
> store float* %v0, float** %v0.addr, align 8
> store float* %v1, float** %v1.addr,
2013 Feb 07
1
[LLVMdev] alloca scalarization with dynamic indexing into vectors
Hi all, I have a question regarding dynamic indexing into a vector with GEP. I see that in the ScalarReplAggregates pass in the LLVM 3.2 release the call SROA::isSafeGEP() will now allow alloca scalarization in the case where a GEP index into a vector isn’t a constant. My question is: what is the expected behavior when the index is out of bounds of the vector? Is it undefined? I have an
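As a C-level picture of the question (purely illustrative, using the Clang/GCC vector_size extension rather than the IR from the thread), a dynamic index into a small vector value looks like this, and the out-of-bounds case is exactly what is being asked about:

    typedef float float4 __attribute__((vector_size(16)));

    /* illustrative: read lane 'idx' of a 4-lane vector; what should happen
       when idx >= 4 once SROA scalarizes the underlying alloca? */
    float lane(float4 v, int idx) {
        return ((float *)&v)[idx];
    }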
2014 Sep 19
3
[LLVMdev] [Vectorization] Mis match in code generated
Hi Arnold, Thanks for your reply. I tried the test case as suggested by you:

    void foo(int *a, int *sum) {
        *sum = a[0]+a[1]+a[2]+a[3]+a[4]+a[5]+a[6]+a[7]+a[8]+a[9]+a[10]+a[11]+a[12]+a[13]+a[14]+a[15];
    }

so that it has a 'store' in its IR. IR before vectorization: target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128" target triple =
2014 Nov 10
2
[LLVMdev] [Vectorization] Mis match in code generated
Hi Suyog, Thanks for looking at this. This has recently made it onto my TODO list too.
> I am not sure how much all this will improve the code quality for horizontal reduction
> (don't know how frequently such a pattern of horizontal reduction from the same array occurs in the real world / SPEC).
Actually, the main loop of 470.lbm can be SLP vectorized like this. We have three parts to it: A fully
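A minimal sketch of the kind of horizontal reduction being discussed (not the 470.lbm loop itself, just an assumed shape):

    /* illustrative: adjacent loads from one array folded into a single sum;
       SLP could turn this into a <4 x double> load plus a horizontal reduction */
    double reduce4(const double *a) {
        return a[0] + a[1] + a[2] + a[3];
    }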
2014 Oct 17
2
[LLVMdev] opt -O2 leads to incorrect operation (possibly a bug in the DSE)
Hi all, Consider the following example:

    define void @fn(i8* %buf) #0 {
    entry:
      %arrayidx = getelementptr i8* %buf, i64 18
      tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %arrayidx, i8* %buf, i64 18, i32 1, i1 false)
      %arrayidx1 = getelementptr i8* %buf, i64 18
      store i8 1, i8* %arrayidx1, align 1
      tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %buf, i8* %arrayidx, i64 18, i32 1, i1 false)
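Read back as C, the IR above does roughly the following (an illustrative translation, not code from the thread); the point is that the second memcpy reads the byte written by the store, so the store is not dead:

    #include <string.h>

    void fn(char *buf) {
        memcpy(buf + 18, buf, 18);   /* copy buf[0..17] up to buf[18..35]      */
        buf[18] = 1;                 /* overwrite the first byte of that copy  */
        memcpy(buf, buf + 18, 18);   /* copy back down: this reads buf[18], so */
                                     /* removing the store would be incorrect  */
    }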
2014 Sep 18
2
[LLVMdev] [Vectorization] Mis match in code generated
Hi Nadav, Thanks for the quick reply! OK, so as of now we lack the capability to handle flat, large reductions. I did go through the function vectorizeChainsInBlock() (line number 2862). In this function, we try to vectorize if we have phi nodes in the IR (several ifs check for phi nodes), i.e. we try to construct a tree that starts at chains. Any pointers on how to join multiple trees? I
2019 Nov 10
2
Reassociation is blocking a vectorization
Hi Devs, I am looking at the bug https://bugs.llvm.org/show_bug.cgi?id=43953 and found the following piece of IR:

    %arrayidx = getelementptr inbounds float, float* %Vec0, i64 %idxprom
    %0 = load float, float* %arrayidx, align 4, !tbaa !2
    %arrayidx2 = getelementptr inbounds float, float* %Vec1, i64 %idxprom
    %1 = load float, float* %arrayidx2, align 4, !tbaa !2
    %sub = fsub fast float %0, %1
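In C terms, the fragment above loads one lane from each of two arrays and subtracts them under fast-math; a rough scalar equivalent (assumed, with names mirroring the IR) is:

    /* illustrative only */
    float lane_diff(const float *Vec0, const float *Vec1, long idxprom) {
        return Vec0[idxprom] - Vec1[idxprom];   /* the 'fsub fast' */
    }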
2017 May 05
2
load instruction to gather intrinsics
Hi All, Can I change a vector load to a gather intrinsic? If so, how can I do it? For example, I want to change the following IR code

    %1 = load <2 x i64>* %arrayidx1, align 8

to

    %1 = call <2 x i64> @llvm.masked.gather.v2i64(<2 x i64*> %arrayidx1, i32 8, <2 x i1> <i1 true, i1 true>, <2 x i64> undef)

Basically, I am not sure how to get two consecutive
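One general note (not a quote from the thread): llvm.masked.gather takes a vector of pointers, one per lane, while the original load has a single scalar pointer, so the two consecutive lane addresses have to be materialized first. The scalar semantics of the 2 x i64 gather are roughly:

    /* illustrative scalar model of llvm.masked.gather.v2i64: each enabled lane
       loads through its own pointer; disabled lanes take the pass-through value */
    void masked_gather_2xi64(long long *const ptrs[2], const int mask[2],
                             const long long passthru[2], long long result[2]) {
        for (int lane = 0; lane < 2; ++lane)
            result[lane] = mask[lane] ? *ptrs[lane] : passthru[lane];
    }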
2014 Sep 18
2
[LLVMdev] [Vectorization] Mis match in code generated
Hi, I am trying to understand the LLVM vectorization implementation and was looking into both loop and SLP vectorization. Test case 1:

    int foo(int *a) {
        int sum = 0, i;
        for (i = 0; i < 16; i++)
            sum += a[i];
        return sum;
    }

This code is vectorized by the loop vectorizer, where we calculate the scalar loop cost as 4 and the vector loop cost as 2. Since the vector loop cost is less and the above reduction is legal to
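Roughly, the vectorized form corresponds to the following C-level sketch (illustrative only, ignoring runtime checks and tail handling):

    /* illustrative: four partial sums accumulated as a <4 x i32> value,
       then combined with a final horizontal reduction */
    int foo_vectorized(const int *a) {
        int sum[4] = {0, 0, 0, 0};
        for (int i = 0; i < 16; i += 4) {
            sum[0] += a[i + 0];
            sum[1] += a[i + 1];
            sum[2] += a[i + 2];
            sum[3] += a[i + 3];
        }
        return sum[0] + sum[1] + sum[2] + sum[3];
    }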
2014 Feb 19
2
[LLVMdev] better code for IV
Hi Andrew, The issue below refers to LSR, so I'd appreciate your feedback. It also involves instruction combining and might impact backends other than X86, so if you know of others who might be interested, you are more than welcome to add them. Thanks, Anat
_____________________________________________
From: Shemer, Anat
Sent: Tuesday, February 18, 2014 15:07
To: 'llvmdev at
2018 Nov 18
3
Dependence Analysis bug or undefined behavior?
Hi, Does this kind of IR have "undefined behavior" under LLVM semantics or is it acceptable? (TLDR: a store of i64 at offset n, followed by a load of i32 at offset n+1.)

    define void @foo(i32* %A, i64 %n) {
    entry:
      %arrayidx = getelementptr inbounds i32, i32* %A, i64 %n
      %arrayidx_cast = bitcast i32* %arrayidx to i64*
      store i64 0, i64* %arrayidx_cast, align 4
      %add1 = add i64 %n,
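In C terms (an illustrative translation of the TLDR, not code from the thread), the question concerns the following overlapping, differently-sized accesses:

    /* illustrative: an 8-byte store at &A[n] followed by a 4-byte load at
       &A[n+1]; the cast mirrors the bitcast in the IR above */
    void overlap(int *A, long long n) {
        *(long long *)&A[n] = 0;   /* the 'store i64 0'           */
        int x = A[n + 1];          /* overlaps the stored 8 bytes */
        (void)x;
    }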
2017 Sep 13
2
RFC phantom memory intrinsic
Hi Michael,
> Interesting approach, but how do you handle more complex offsets, e.g., when the pointer is part of an aggregate? Only one offset does not seem enough to handle generic cases.
Yes, correct; this slightly changed example does not work:

    #include <x86intrin.h>
    __m256d vsht_d4_fold(const double* ptr, unsigned long long i) {
      __m256d foo = (__m256d){ ptr[i], ptr[i+1],
2017 Sep 12
3
RFC phantom memory intrinsic
Hi, As a solution for PR21780, I plan to add new functionality to restore memory operations that were deleted earlier; in this particular case it is the load operations that were deleted by InstCombine. Please note that once the load has been removed there is no way to restore it, and that prevents us from vectorizing the shuffle operation. There are probably more similar issues where this approach could
2020 Apr 04
4
Legality of transformation
Please consider the following C code:

    #define SZ 2048
    int main(void) {
        int A[SZ];
        int B[SZ];
        int i, tmp;
        for (i = 0; i < SZ; i++) {
            tmp = A[i];
            B[i] = tmp;
        }
        assert(A[SZ/2] == B[SZ/2]);
    }

On running -O1 followed by -reg2mem I get the following IR:

    define dso_local i32 @main() local_unnamed_addr #0 {
    entry:
      %A = alloca [2048
2017 Sep 13
2
RFC phantom memory intrinsic
Hi Michael,
> I have a case where InstCombine removes a store and your approach would be
> valuable for me if the entire access to an aggregate could be restored.
Yes, no problem; we could add the aggregate pointer to this new intrinsic, and in my particular case I would simply ignore it. However, I am now looking at the "speculation_marker" metadata and I am still not sure how to implement it
2017 May 19
4
memcmp code fragment
Hi, Look at the following C code sequence:

    unsigned char mainGtU ( unsigned int i1, unsigned int i2,
                            unsigned char* block )
    {
        unsigned char c1, c2;
        c1 = block[i1]; c2 = block[i2];
        if (c1 != c2) return (c1 > c2);
        i1++; i2++;
        c1 = block[i1]; c2 = block[i2];
        if (c1 != c2) return (c1 > c2);
        i1++; i2++;
        ..
        ..
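The hand-unrolled comparison above amounts to a byte-wise comparison that could, in principle, be expressed through memcmp; a sketch of that reformulation (for some fixed length N, which is an assumption since the original is truncated) is:

    #include <string.h>

    /* illustrative reformulation: memcmp's sign encodes the same information
       as returning (c1 > c2) at the first mismatching byte */
    unsigned char mainGtU_memcmp(unsigned int i1, unsigned int i2,
                                 unsigned char *block, size_t N) {
        int r = memcmp(block + i1, block + i2, N);
        return (unsigned char)(r > 0);
    }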
2013 Feb 05
3
[LLVMdev] Vectorizing global struct pointers
Hi all, One of the reasons the Livermore Loops couldn't be vectorized is that they use global structures to hold the arrays. Today, I'm investigating why that is so and how to fix it. My investigation brought me to LoopVectorizationLegality::canVectorizeMemory():

    if (WriteObjects.count(*it)) {
      DEBUG(dbgs() << "LV: Found a possible read/write reorder:"
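The pattern being described looks roughly like the following (an assumed sketch, not the actual Livermore Loops source): the arrays live inside a global struct, so the legality check has to prove that the accessed members do not overlap:

    #define N 1001
    struct state { double x[N], y[N], z[N]; };
    struct state g;   /* global structure holding the arrays */

    void kernel(void) {
        for (int i = 0; i < N; i++)
            g.x[i] = g.y[i] + g.z[i];
    }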
2015 Dec 04
2
Field sensitive alias analysis?
Hi, I'm trying to optimize a simple piece of C code and came across a situation where invariant code is not being hoisted out. With an -O3 compilation, I noticed that the "load" of the loop bounds (which remain invariant throughout) happens on each iteration of both loops, even though they are not modified anywhere in the function "bigLoop". It seems that alias analysis is not able
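A shape that reproduces this kind of behaviour (hypothetical names, only an illustration of the situation described) is a loop whose bounds are loaded from memory that the stores in the body might alias:

    struct S { int rows, cols; int *data; };

    /* illustrative: without field-sensitive (or otherwise stronger) alias
       analysis, s->rows and s->cols are re-loaded every iteration, because
       the store through s->data might modify them */
    void bigLoop(struct S *s) {
        for (int i = 0; i < s->rows; i++)
            for (int j = 0; j < s->cols; j++)
                s->data[i * s->cols + j] = 0;
    }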