Anna Thomas via llvm-dev
2016-May-12 21:20 UTC
[llvm-dev] Handling misaligned array accesses
Hi,

I have tried a couple of C test cases with LLVM to see whether we handle misaligned accesses, but it seems we do not have transformations to align loop accesses. Misaligned accesses can hurt performance depending on the underlying target (that is, on how severe the penalty is for crossing cache line boundaries).

One example:

    // unaligned load and store
    int foo(short *a, int m){
      int i;
      for(i=1; i<m ; i++)
        a[i] *=2;
      return a[3];
    }

I generated the IR with clang -O3 -mllvm -disable-llvm-optzns and passed it through opt -O3. The loop vectorizer adds vector code for this loop, but the GEP access starts at offset 1, so the wide load and store are only marked align 2:

    vector.body: ; preds = %vector.body, %vector.body.preheader.new
      %index = phi i64 [ 0, %vector.body.preheader.new ], [ %index.next.3, %vector.body ]
      %niter = phi i64 [ %unroll_iter, %vector.body.preheader.new ], [ %niter.nsub.3, %vector.body ]
      %offset.idx = or i64 %index, 1
      %9 = getelementptr inbounds i16, i16* %a, i64 %offset.idx
      %10 = bitcast i16* %9 to <8 x i16>*
      %wide.load = load <8 x i16>, <8 x i16>* %10, align 2, !tbaa !2
      %11 = shl <8 x i16> %wide.load, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
      %12 = bitcast i16* %9 to <8 x i16>*
      store <8 x i16> %11, <8 x i16>* %12, align 2, !tbaa !2

Is there a reason we don’t support loop peeling for alignment handling?

Thanks,
Anna
Hal Finkel via llvm-dev
2016-May-12 23:42 UTC
[llvm-dev] Handling misaligned array accesses
----- Original Message -----
> From: "Anna Thomas" <anna at azul.com>
> To: llvm-dev at lists.llvm.org
> Cc: hfinkel at anl.gov, anemet at apple.com
> Sent: Thursday, May 12, 2016 4:20:24 PM
> Subject: Handling misaligned array accesses
>
> Hi,
>
> I have tried couple of c test cases with llvm to see if we handle
> misaligned accesses, but it seems we do not have transformations to
> align loop accesses. Misaligned accesses can worsen performance
> depending on the underlying target (severity of crossing cache line
> boundaries)
>
> One example:
> //unaligned load and store
>
> int foo(short *a, int m){
> int i;
> for(i=1; i<m ; i++)
> a[i] *=2;
> return a[3];
> }
>
> IR generated though clang -O3 -mllvm -disable-llvm-optzns. Passed
> this through opt -O3 and the loop vectorizer adds vector code for
> this loop, but the GEP access starts at offset 1.
>
> vector.body: ; preds = %vector.body, %vector.body.preheader.new
> %index = phi i64 [ 0, %vector.body.preheader.new ], [ %index.next.3,
> %vector.body ]
> %niter = phi i64 [ %unroll_iter, %vector.body.preheader.new ], [
> %niter.nsub.3, %vector.body ]
> %offset.idx = or i64 %index, 1
> %9 = getelementptr inbounds i16, i16* %a, i64 %offset.idx
> %10 = bitcast i16* %9 to <8 x i16>*
> %wide.load = load <8 x i16>, <8 x i16>* %10, align 2, !tbaa !2
> %11 = shl <8 x i16> %wide.load, <i16 1, i16 1, i16 1, i16 1, i16 1,
> i16 1, i16 1, i16 1>
> %12 = bitcast i16* %9 to <8 x i16>*
> store <8 x i16> %11, <8 x i16>* %12, align 2, !tbaa !2
>
> Is there a reason we don’t support loop peeling for alignment
> handling?

No. AFAIK, just no one has done the work to implement it yet. I'd certainly be quite interested in it, however. Several targets I care about would benefit from alignment-based peeling.

 -Hal

> Thanks,
>
> Anna

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
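
For readers of the archive, the alignment-based peeling discussed in this thread can be written out by hand in C to show roughly what such a transformation would produce. This is only a minimal sketch and not anything LLVM implements: the names `peel_count` and `foo_peeled`, the 16-byte vector width (matching the `<8 x i16>` accesses in the IR), and the assumption that `a` is at least 2-byte aligned are all illustrative and do not come from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { VEC_BYTES = 16, LANES = 8 };  /* assumed width: 8 x i16 = 16 bytes */

/* Hypothetical helper: how many leading elements must be handled one at a
 * time before &a[start + n] sits on a vec_bytes boundary.
 * Assumes vec_bytes is a power of two and a is 2-byte aligned. */
static size_t peel_count(const short *a, size_t start, size_t vec_bytes)
{
    assert(vec_bytes != 0 && (vec_bytes & (vec_bytes - 1)) == 0);
    uintptr_t addr = (uintptr_t)(a + start);
    size_t mis = (size_t)(addr & (vec_bytes - 1)); /* bytes past the boundary */
    return mis == 0 ? 0 : (vec_bytes - mis) / sizeof(short);
}

/* Same result as foo() from the original message, with a manual peel. */
int foo_peeled(short *a, int m)
{
    if (m > 1) {
        int i = 1;

        /* Peeled prologue: scalar iterations until a + i is aligned,
         * capped at the number of iterations the loop actually has. */
        size_t peel = peel_count(a, 1, VEC_BYTES);
        size_t remaining = (size_t)(m - 1);
        if (peel > remaining)
            peel = remaining;
        int peel_end = 1 + (int)peel;
        for (; i < peel_end; i++)
            a[i] *= 2;

        /* Main loop: every block now starts on a 16-byte boundary. The
         * inner loop stands in for one aligned vector load, shift, store. */
        for (; m - i >= LANES; i += LANES)
            for (int j = 0; j < LANES; j++)
                a[i + j] *= 2;

        /* Scalar epilogue for the leftover elements. */
        for (; i < m; i++)
            a[i] *= 2;
    }
    return a[3];
}
```

With a 16-byte-aligned `a`, the loop start `a + 1` is 2 bytes past a boundary, so the prologue runs 7 scalar iterations and the main loop begins at `a[8]`. Whether the extra prologue pays off depends on the target and the trip count, which is the trade-off a compiler transformation would have to weigh.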