Björn Pettersson A via llvm-dev
2020-Dec-04 22:55 UTC
[llvm-dev] Is it feasible for LV to vectorize a loop accessing A[i] using VF>2 when A has only 2 elements?
Hi!

Consider an example like this:

    int G[1000];

    void gazonk(int N) {
      int A[2] = {0};
      for (int i = 0; i < N; ++i)
        G[i] = A[i] + i;
    }

When compiling with "-O3 -emit-llvm" (see https://godbolt.org/z/f6jWfM) the loop is vectorized with VF=4 and we get:

    %2 = alloca i64, align 8
    %3 = bitcast i64* %2 to [2 x i32]*
    ...
    vector.body:
    ...
    %24 = getelementptr inbounds [2 x i32], [2 x i32]* %3, i64 0, i64 %21, !dbg !35
    %25 = bitcast i32* %24 to <4 x i32>*, !dbg !35
    %26 = load <4 x i32>, <4 x i32>* %25, align 8, !dbg !35, !tbaa !36
    ...

Loading <4 x i32> from something pointing into [2 x i32] seems like a bad thing (UB?). And I believe that, for example, BasicAliasAnalysis will assume that the load won't alias with anything else, since the size of the access is greater than the size of the underlying object, so the code in the vector body is just crap afaict.

There are some loop guards that perhaps (hopefully) protect against running the vector body here, but isn't it a bit weird to introduce such code anyway?

BR,
Björn
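To make the question about the loop guards concrete, here is a rough C sketch of the shape a VF=4 vectorized loop usually takes: a minimum-iteration check, a vector body, and a scalar remainder loop. This is an assumption about the structure, not a transcription of the IR behind the godbolt link; the real guard may differ (for example because of interleaving). The names gazonk_guarded and vector_body_runs are made up for illustration.

```c
#include <assert.h>

enum { VF = 4 };          /* vectorization factor from the report */

int G[1000];
long vector_body_runs;    /* hypothetical counter, for illustration only */

void gazonk_guarded(int N) {
    int A[2] = {0};
    int i = 0;

    assert(N <= 1000);    /* keep the stores inside G */

    /* Guard: enter the vector loop only if at least VF iterations exist. */
    if (N >= VF) {
        int vec_end = N - (N % VF);
        for (; i < vec_end; i += VF) {
            ++vector_body_runs;
            /* Stands in for the <4 x i32> load of A[i..i+3]. It is only
               reachable when N >= 4, where the source loop is already UB. */
            for (int lane = 0; lane < VF; ++lane)
                G[i + lane] = A[i + lane] + (i + lane);
        }
    }

    /* Scalar remainder: the only loop that runs for N = 0, 1, 2. */
    for (; i < N; ++i)
        G[i] = A[i] + i;
}
```

Under this assumed shape, every N for which the source loop is well defined (N <= 2) goes straight to the scalar loop, so the out-of-bounds wide load is present in the code but never executed.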
Philip Reames via llvm-dev
2020-Dec-04 23:16 UTC
[llvm-dev] Is it feasible for LV to vectorize a loop accessing A[i] using VF>2 when A has only 2 elements?
The code the vectorizer emits should be correct for any well-defined value of N. (A program which runs this with N > 2 is full UB.)

The existing code doesn't (yet) use known object sizes to reason about out-of-bounds accesses and thereby infer bounds on loop induction variables. It would be feasible to do so; it's just not something anyone has bothered with.

Philip
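As a rough illustration of the reasoning described above (not of anything the vectorizer currently does), the size of A can be read as an upper bound on the trip count: any execution with more than 2 iterations is already UB, so a loop clamped to 2 iterations behaves identically on every well-defined input. The function names and the Out array (standing in for G) are made up for this sketch.

```c
#include <assert.h>

int Out[1000];   /* plays the role of G in the original example */

/* The loop as written in the source; only defined for N <= 2. */
void gazonk_ref(int N) {
    int A[2] = {0};
    assert(N <= 2);   /* larger N would read past the end of A */
    for (int i = 0; i < N; ++i)
        Out[i] = A[i] + i;
}

/* The same loop with the bound implied by the size of A made explicit.
   A compiler that inferred this bound would know the trip count is at
   most 2, which is less than VF=4. */
void gazonk_bounded(int N) {
    int A[2] = {0};
    int max_trip = (int)(sizeof A / sizeof A[0]);   /* 2 */
    int trip = N < max_trip ? N : max_trip;
    for (int i = 0; i < trip; ++i)
        Out[i] = A[i] + i;
}
```

Both functions store the same values for N = 0, 1, 2. For larger N the original has no defined behaviour to preserve, which is why a bound like this could in principle be used to rule out a VF=4 vector body.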