Lin, Jin via llvm-dev
2016-Aug-17 17:39 UTC
[llvm-dev] Loop vectorization with the loop containing bitcast
Hi,

The following loop fails to vectorize because the load of c[i] is cast to i64 while the store to c[i] uses double. Loop access analysis gives up because the two accesses have different types.

Since the two memory operations are the same size, I believe loop access analysis should report a forward dependence, and the loop can therefore be vectorized.

Any comments?

Thanks,
Jin

    #define N 1000
    double a[N], b[N], c[N];
    void foo() {
      for (int i = 0; i < N; i++) {
        b[i] = c[i];
        c[i] = 0.0;
      }
    }

    for.body:                                         ; preds = %for.body, %entry
      %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
      %arrayidx = getelementptr inbounds [1000 x double], [1000 x double]* @c, i64 0, i64 %indvars.iv
      %0 = bitcast double* %arrayidx to i64*
      %1 = load i64, i64* %0, align 8, !tbaa !1
      %arrayidx2 = getelementptr inbounds [1000 x double], [1000 x double]* @b, i64 0, i64 %indvars.iv
      %2 = bitcast double* %arrayidx2 to i64*
      store i64 %1, i64* %2, align 8, !tbaa !1
      store double 0.000000e+00, double* %arrayidx, align 8, !tbaa !1
      %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
      %exitcond = icmp eq i64 %indvars.iv.next, 1000
      br i1 %exitcond, label %for.cond.cleanup, label %for.body

    LAA: Found a loop in foo: loop.17
    LAA: Processing memory accesses...
      AST: Alias Set Tracker: 2 alias sets for 3 pointer values.
      AliasSet[0x9508b80, 1] must alias, No access Pointers: (<4 x i64>* %1, 18446744073709551615)
      AliasSet[0x95f8a70, 2] must alias, No access Pointers: (<4 x double>* %2, 18446744073709551615), (<4 x i64>* %0, 18446744073709551615)

    LAA: Accesses(3):
      %1 = bitcast double* %arrayIdx11 to <4 x i64>* (write)
      %2 = bitcast double* %arrayIdx to <4 x double>* (write)
      %0 = bitcast double* %arrayIdx to <4 x i64>* (read-only)
    Underlying objects for pointer %1 = bitcast double* %arrayIdx11 to <4 x i64>*
      @b = common local_unnamed_addr global [1000 x double] zeroinitializer, align 16
    Underlying objects for pointer %2 = bitcast double* %arrayIdx to <4 x double>*
      @c = common local_unnamed_addr global [1000 x double] zeroinitializer, align 16
    Underlying objects for pointer %0 = bitcast double* %arrayIdx to <4 x i64>*
      @c = common local_unnamed_addr global [1000 x double] zeroinitializer, align 16
    LAA: Found a runtime check ptr: %1 = bitcast double* %arrayIdx11 to <4 x i64>*
    LAA: Found a runtime check ptr: %2 = bitcast double* %arrayIdx to <4 x double>*
    LAA: Found a runtime check ptr: %0 = bitcast double* %arrayIdx to <4 x i64>*
    LAA: We need to do 0 pointer comparisons.
    LAA: We can perform a memory runtime check if needed.
    LAA: Checking memory dependencies
    LAA: Src Scev: {@c,+,32}<nsw><%loop.17>Sink Scev: {@c,+,32}<nsw><%loop.17>(Induction step: 1)
    LAA: Distance for %gepload = load <4 x i64>, <4 x i64>* %0, align 16, !tbaa !1 to store <4 x double> zeroinitializer, <4 x double>* %2, align 16, !tbaa !1: 0
    LAA: Zero dependence difference but different types
    Total Dependences: 1
    LAA: unsafe dependent memory operations in loop
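The forward-dependence argument in the message above can be checked by hand with a small C sketch. It is not compiler output: foo_blocked, versions_agree and VF are names invented here for illustration, and the width of 4 is only assumed from the <4 x ...> types in the log. The blocked version loads four elements of c as raw 64-bit integers, stores them to b, and only then zeroes the same four elements of c, which mirrors the order of the i64 load and the double store within one iteration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define N 1000
#define VF 4 /* assumed vector width, taken from the <4 x ...> types in the log */

/* The argument relies on the i64 and double accesses covering the same bytes. */
_Static_assert(sizeof(double) == sizeof(uint64_t), "double and i64 must be the same size");

/* Scalar reference: the loop from the original message, on caller-provided arrays. */
static void foo_scalar(double *b, double *c) {
    for (size_t i = 0; i < N; i++) {
        b[i] = c[i];
        c[i] = 0.0;
    }
}

/* Hand-written stand-in for a vectorized loop (hypothetical, for illustration).
   Each step reads VF elements of c as 64-bit integers, writes them to b, and
   only afterwards overwrites the same VF elements of c with 0.0. */
static void foo_blocked(double *b, double *c) {
    assert(N % VF == 0); /* this sketch has no remainder loop */
    for (size_t i = 0; i < N; i += VF) {
        uint64_t bits[VF];
        memcpy(bits, &c[i], sizeof bits); /* load c[i..i+VF-1] as i64 */
        memcpy(&b[i], bits, sizeof bits); /* store to b[i..i+VF-1] */
        for (size_t j = 0; j < VF; j++)   /* store 0.0 to c[i..i+VF-1] */
            c[i + j] = 0.0;
    }
}

/* Returns 1 if both versions leave b and c in identical states, else 0. */
static int versions_agree(void) {
    static double b1[N], c1[N], b2[N], c2[N];
    for (size_t i = 0; i < N; i++)
        c1[i] = c2[i] = (double)i + 0.5;
    foo_scalar(b1, c1);
    foo_blocked(b2, c2);
    return memcmp(b1, b2, sizeof b1) == 0 && memcmp(c1, c2, sizeof c1) == 0;
}
```

Calling versions_agree() from a test harness compares the two versions byte for byte. Because every read of c[i] happens before the write to the same location, and both accesses touch exactly the same 8 bytes, widening the loop does not reorder any conflicting pair.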
Michael Kuperstein via llvm-dev
2016-Aug-17 18:33 UTC
[llvm-dev] Loop vectorization with the loop containing bitcast
Hi Jin,

I agree, this looks wrong. The bitcasts are fallout from r226781, and we should be able to look through them if the size is the same.

Can you please file a PR?

Thanks,
Michael
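The suggestion to look through the bitcasts when the sizes match can be illustrated with a toy model in C. This is a sketch under stated assumptions, not LLVM's code: toy_type, toy_dep, toy_store_size, classify_strict and classify_by_size are all hypothetical names, and the strict variant only approximates the behavior implied by the "Zero dependence difference but different types" message in the log.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an IR scalar type; not an LLVM data structure. */
typedef enum { TY_I32, TY_FLOAT, TY_I64, TY_DOUBLE } toy_type;

/* Toy result of classifying a pair of accesses at dependence distance zero. */
typedef enum { DEP_FORWARD, DEP_UNKNOWN } toy_dep;

/* Size in bytes written or read by an access of the given toy type. */
static size_t toy_store_size(toy_type t) {
    switch (t) {
    case TY_I32:
    case TY_FLOAT:
        return 4;
    case TY_I64:
    case TY_DOUBLE:
        return 8;
    }
    assert(0 && "unhandled toy_type");
    return 0;
}

/* Strict rule (assumed to resemble what the log reports): a zero-distance
   pair is only a forward dependence if the two types are identical. */
static toy_dep classify_strict(toy_type src, toy_type sink) {
    return src == sink ? DEP_FORWARD : DEP_UNKNOWN;
}

/* Relaxed rule along the lines proposed in this thread: a zero-distance
   pair is a forward dependence whenever both accesses have the same size. */
static toy_dep classify_by_size(toy_type src, toy_type sink) {
    return toy_store_size(src) == toy_store_size(sink) ? DEP_FORWARD : DEP_UNKNOWN;
}
```

With these definitions, classify_strict(TY_I64, TY_DOUBLE) gives DEP_UNKNOWN, matching the failure reported above, while classify_by_size(TY_I64, TY_DOUBLE) gives DEP_FORWARD. A pair of different sizes, such as TY_I32 and TY_DOUBLE, stays DEP_UNKNOWN under both rules, since the accesses would only partially overlap.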
Lin, Jin via llvm-dev
2016-Aug-17 19:05 UTC
[llvm-dev] Loop vectorization with the loop containing bitcast
Hi Michael,

Many thanks for your quick response. I have filed PR 29021 to track this issue.

Jin