Daniel Berlin
2015-Jan-14 21:09 UTC
[LLVMdev] question about enabling cfl-aa and collecting a57 numbers
Oh, sorry, i didn't rebase it when i changed the fix, you would have had to apply the first on top of the second. Here is one against HEAD On Wed, Jan 14, 2015 at 12:32 PM, Ana Pazos <apazos at codeaurora.org> wrote:> Daniel, your patch does not apply cleanly. Are you on the tip? > > The code I see there is no line if (QueryResult == MayAlias|| QueryResult == PartialAlias) to be removed. > > Thanks, > > Ana. > > > > *From:* George Burgess IV [mailto:george.burgess.iv at gmail.com] > *Sent:* Wednesday, January 14, 2015 10:31 AM > *To:* Daniel Berlin > *Cc:* Nick Lewycky; Ana Pazos; Jiangning Liu; LLVM Developers Mailing List > *Subject:* Re: [LLVMdev] question about enabling cfl-aa and collecting > a57 numbers > > > > Inline > > - George > > > On Jan 14, 2015, at 10:49 AM, Daniel Berlin <dberlin at dberlin.org> wrote: > > > > > > On Tue, Jan 13, 2015 at 11:26 PM, Nick Lewycky <nlewycky at google.com> > wrote: > > On 13 January 2015 at 22:11, Daniel Berlin <dberlin at dberlin.org> wrote: > > This is caused by CFLAA returning PartialAlias for a query that BasicAA > can prove is NoAlias. > > > > One of them is wrong. Which one? > > > > CFL-AA. > > > > Right now it checks whether two things come out to be the same stratified > info and have the same index, and if so, returns PartialAlias. This is > wrong, because it never knows why two things got unified, only that they > did. > > > > Therefore, the current check it does where it returns PartialAlias seems > wrong. > > > > George, why does it return PartialAlias? > > It shouldn't -- unless I misinterpreted an earlier discussion, I was under > the impression that we were testing with that line updated to return > MayAlias. We do need to update that in LLVM though; thanks for the fix. > > > > IMHO, you can't ever prove any kind of must-alias or partialalias info > with CFL-AA, unless i'm missing something. > > Agreed. > > > > > > I'm not sure from your description that this is a chaining issue. > PartialAlias doesn't chain and isn't supposed to, it's a final answer just > like NoAlias and MustAlias are. You return partial alias when you have > proven that the accesses are overlapping, meaning that it is proven to > neither be no alias nor must alias. > > Well, it's still chaining wrong, because it did not change on MayAlias, > but you are right that it is not the issue i identified :) > > > > Anyway, updated patch attached. > > > > > > Nick > > > > I've attached a patch that fixes it, but truthfully, chaining seems > somewhat badly designed IMHO. > > > > It should let you pass along the result you've gotten so far, so that you > don't have CFLAA get PartialAlias, chain, and have BasicAA get MayAlias. > > > > (IE it should let it work strictly down the hierarchy). > > > > Note that this patch breaks the CFLAliasAnalysis tests because it no > longer returns PartialAlias in some cases it is expecting it. > > > > I don't have time ATM to fix chaining completely, so if someone wants to > shepherd this patch through, that would be appreciated. > > > > > > > > > > > > On Tue, Jan 13, 2015 at 8:58 PM, Daniel Berlin <dberlin at dberlin.org> > wrote: > > Can you send me actual LLVM IR or a preprocessed source from using -E? > I don't have a machine handy that has headers that target that arch. > > > > > > On Tue Jan 13 2015 at 4:33:29 PM Daniel Berlin <dberlin at dberlin.org> > wrote: > > Anything other than noalias or mustalias should be getting passed down the > stack, so either that is not happening or CFL aa is giving better answers > and something does worse with those better answers. I'll take a look this > evening > > On Jan 13, 2015 3:58 PM, "Ana Pazos" <apazos at codeaurora.org> wrote: > > Hi folks, > > Moving the discussion to llvm.dev. > > None of the changes we talked earlier help. > > Find attached the C source code that you can use to reproduce the issue. > > clang --target=aarch64-linux-gnu -c -mcpu=cortex-a57 -Ofast > -fno-math-errno test.c -S -o test.s -mllvm -debug-only=licm > LICM hoisting to while.body.lr.ph: %21 = load double** %arrayidx8, > align 8, !tbaa !5 > LICM hoisting to while.body.lr.ph: %arrayidx72 = getelementptr inbounds > double* %11, i64 1 > LICM hoisting to while.body.lr.ph: %arrayidx81 = getelementptr inbounds > double* %11, i64 2 > LICM hoisting to for.body.lr.ph: %2 = ptrtoint i32* %Index to i64 > > clang --target=aarch64-linux-gnu -c -mcpu=cortex-a57 -Ofast > -fno-math-errno test.c -S -o test-cflaa.s -mllvm -use-cfl-aa -mllvm > -debug-only=licm > LICM hoisting to for.body.lr.ph: %2 = ptrtoint i32* %Index to i64 > > Why CFL AA cannot allow hoisting this out: %21 = load double** > %arrayidx8, align 8, !tbaa !5 > > Which leads to this extra load in assembly code: ldr x14, [x4, x9, lsl > #3] > > Thanks! > Ana. > > -----Original Message----- > From: Hal Finkel [mailto:hfinkel at anl.gov] > Sent: Tuesday, January 13, 2015 12:51 AM > To: Chandler Carruth > Cc: Jiangning Liu; Pazos, Ana; Ana Pazos; Daniel Berlin; George Burgess IV > Subject: Re: question about enabling cfl-aa and collecting a57 numbers > > ----- Original Message ----- > > From: "Chandler Carruth" <chandlerc at gmail.com> > > To: "Ana Pazos" <apazos at codeaurora.org>, "Daniel Berlin" < > dberlin at dberlin.org>, "George Burgess IV" > > <george.burgess.iv at gmail.com>, "Hal Finkel" <hfinkel at anl.gov> > > Cc: "Jiangning Liu" <Jiangning.Liu at arm.com>, "Ana Pazos" > > <apazos at quicinc.com> > > Sent: Monday, January 12, 2015 6:43:44 PM > > Subject: Re: question about enabling cfl-aa and collecting a57 numbers > > > > Not quite: > > > > > > On Mon Jan 12 2015 at 4:27:30 PM Ana Pazos < apazos at codeaurora.org > > > wrote: > > > > Thanks George and Daniel, > > > > > > > > If the recommended order is “–basicaa –cfla-aa” it means we should fix > > the trunk code that processes the flags -mllvm use-cfl-aa-in-codegen > > and –mllvm use-cfl-aa, right? > > > > > > > > In Transforms/IPO/PassManagerBuilder.cpp and CodeGen/Passes.cpp we see > > this sequence: > > > > > > > > if (UseCFLAA) > > > > PM.add(createCFLAliasAnalysisPass()); > > > > PM.add(createTypeBasedAliasAnalysisPass()); > > > > PM.add(createScopedNoAliasAAPass()); > > > > PM.add(createBasicAliasAnalysisPass()); > > > > > > > > So are you recommending changing to the sequence below instead? > > > > > > Not quite: > > > > > > addPass(createTypeBasedAliasAnalysisPass()); > > > > addPass(createScopedNoAliasAAPass()); > > You want it here. > > > > addPass(createBasicAliasAnalysisPass()); > > > > if (UseCFLAA) > > > > addPass(createCFLAliasAnalysisPass()); > > > > > > The way to think about this is as follows: > > > > > > if TBAA or ScopedNoAlias *can* return results, you want them to. They > > are reflecting a-priori knowledge of aliasing. > > > > > > if both of those say "maybe", you want to ask CFL. If CFL says "maybe" > > you want to ask "BasicAA". > > I could very well be misunderstanding something, but I recall that we add > BasicAA last so that it will be queried first. We do this because of our > BasicAA delegation hack: We want a MustAlias from BasicAA to override a > NoAlias from TBAA, so that we can catch cases of basic local type punning. > Thus, assuming CFL delegates properly, I fail to see how switching its > order w.r.t. the metadata-based analysis changes anything. > > -Hal > > > > > > > It's possible that you could as a compile-time hack flip the last two > > to your suggested ordering, but a) it should really only be a compile > > time hack, and b) we haven't really tested that BasicAA actually > > delegates rather than directly returning "maybe". > > > > > > Thanks, > > > > Ana. > > > > > > > > From: Daniel Berlin [mailto: dberlin at dberlin.org ] > > Sent: Friday, January 09, 2015 6:41 PM > > To: George Burgess IV; Hal Finkel > > Cc: Ana Pazos; Pazos, Ana; Jiangning Liu; chandlerc > > > > > > Subject: Re: question about enabling cfl-aa and collecting a57 numbers > > > > > > Right. I would not try with -cfl-aa -basicaa as the order, it should > > be the other way around. > > > > > > If you discover performance regressions with *that*, that would be > > highly interesting. > > > > > > On Fri Jan 09 2015 at 6:32:28 PM George Burgess IV < > > george.burgess.iv at gmail.com > wrote: > > > > > > > > Thanks for the notes and observations Ana! :) > > > > >> Hal, what do you think? Does it meant that cfl-aa is more > > >> conservative (note I set the right order -cfa-aa followed by > > >> -basicaa)? > > Currently, yes. The implementation as it is is focused heavily on > > being fast, and on trying to cover cases that BasicAA can’t easily > > cover itself. If our end goal is to use this as a replacement for > > BasicAA, there’s probably quite a few low-hanging fruit for accuracy > > improvements, and we can start playing with sacrificing speed in > > pursuit of greater accuracy. On the other hand, if the end goal is to > > append this pass to the list of AA passes already being performed, > > then I’d be interested in seeing the difference in perf between > > `-basicaa` and `-basicaa -cfl-aa`, if any. > > > > WRT this issue in particular, I believe CFLAA answers conservatively > > because: > > A. it doesn’t take into account constant pointer offsets (i.e. it > > considers a[0]..a[n] all as the same “address”) B. CFLAA is entirely > > context insensitive > > > > IIRC, BasicAA is somewhat context sensitive, and it does take into > > account constant pointer offsets, so it can be more accurate in this > > case. That being said, I don’t have access to the SPEC code, so I can > > only speculate. :) > > > > George > > > > > On Jan 9, 2015, at 4:25 PM, Hal Finkel < hfinkel at anl.gov > wrote: > > > > > > Ana, > > > > > > Thanks so much for looking into this. I'm adding some additional > > > relevant people to the CC line... > > > > > > -Hal > > > > > > ----- Original Message ----- > > >> From: "Ana Pazos" < apazos at codeaurora.org > > > >> To: "Ana Pazos" < apazos at quicinc.com >, "Hal Finkel" < > > >> hfinkel at anl.gov >, "Jiangning Liu" < Jiangning.Liu at arm.com > > > >> Sent: Friday, January 9, 2015 3:12:04 PM > > >> Subject: RE: question about enabling cfl-aa and collecting a57 > > >> numbers > > >> > > >> Hi Hal and Jiangning, > > >> > > >> I started to look at the effect of cfl-aa on SPEC. > > >> > > >> The first observation is that I see more redundant loads not being > > >> hoisted out of loops by LICM. > > >> > > >> Due to license restrictions, I cannot paste SPEC code here, but I > > >> think you have access to it. > > >> > > >> I tried to simplify a top function in equake (quake.c smvp) to show > > >> the issue. See the attached reduced test case. > > >> > > >> If you look at the the source code you will see something like: > > >> for { > > >> ...load v[i][0]... > > >> while { > > >> ...load v[i][0]... > > >> store w[col][0].... > > >> } > > >> ...load w[i][0]... > > >> store w[i][0]... > > >> } > > >> > > >> It should be possible to avoid load &v[i][0], &v[i][1], &vi[i][2] > > >> inside the while loop. > > >> > > >> In the simplified example that would be this instruction: > > >> %21 = load double** %arrayidx8, align 8, !tbaa !5 > > >> > > >> And the pointer for the load instruction: > > >> %arrayidx8 = getelementptr inbounds double** %v, i64 %idxprom > > >> > > >> The decision to hoist it fails at LICM.cpp:189: AliasSet &AS > > >> CurAST->getAliasSetForPointer(V, Size, AAInfo).isMod. > > >> > > >> With basicaa, only loads are in the alias set, but with cfl-aa you > > >> find loads and stores, so isMod returns true. > > >> > > >> AliasSet with basicaa: > > >> AliasSet[0x2b35aa0, 7] may alias, Ref Pointers: (double*** > > >> %arrayidx32, 8), (double** %12, 8), (double** %arrayidx36, 8), > > >> (double** %arrayidx8, 8), (double** %arrayidx68, 8), (double** > > >> %arrayidx77, 8), (double** %arrayidx85, 8) > > >> $5 = void > > >> > > >> AliasSet with cfa-aa: > > >> AliasSet[0x2b37df0, 20] may alias, Mod/Ref Pointers: (i32* > > >> %arrayidx30, 4), (double*** %arrayidx32, 8), (double** %12, 8), > > >> (double* %13, 8), (double** %arrayidx36, 8), (double* %15, 8), > > >> (double* %arrayidx42, 8), (double* %arrayidx45, 8), (double* > > >> %arrayidx51, 8), (double* %arrayidx54, 8), (double** %arrayidx8, > > >> 8), (double* %21, 8), (double** %arrayidx68, 8), (double* %23, 8), > > >> (double* %arrayidx72, 8), (double** %arrayidx77, 8), (double* %26, > > >> 8), (double* %arrayidx81, 8), (double** %arrayidx85, 8), (double* > > >> %29, 8) > > >> $8 = void > > >> > > >> You can reproduce it with the commands: > > >> opt -O3 -S -cfl-aa -basicaa -licm -debug-only=licm reduce.ll -o out > > >> > > >> opt -O3 -S - -basicaa -licm -debug-only=licm reduce.ll -o out LICM > > >> hoisting to while.body.lr.ph : %21 = load double** %arrayidx8, > > >> align 8, !tbaa !5 LICM hoisting to while.body.lr.ph : %arrayidx72 > > >> getelementptr inbounds double* %11, i64 1 LICM hoisting to > > >> while.body.lr.ph : %arrayidx81 = getelementptr inbounds double* > > >> %11, i64 2 > > >> > > >> Hal, what do you think? Does it meant that cfl-aa is more > > >> conservative (note I set the right order -cfa-aa followed by > > >> -basicaa)? Could it be an issue with building the AliasSet tracker? > > >> Any pointers on how to fix this? > > >> > > >> Thanks, > > >> Ana. > > >> > > >> > > >> > > >> -----Original Message----- > > >> From: Jiangning Liu [mailto: Jiangning.Liu at arm.com ] > > >> Sent: Wednesday, December 24, 2014 6:27 PM > > >> To: Hal Finkel; Ana Pazos > > >> Subject: RE: question about enabling cfl-aa and collecting a57 > > >> numbers > > >> > > >> Hi, > > >> > > >> I saw big regressions on cortex-A57 for the following SPEC > > >> benchmarks after using "-mllvm -use-cfl-aa ". > > >> > > >> spec.cpu2006.ref.462_libquantum 9.22% spec.cpu2000.ref.179_art > > >> 5.32% > > >> spec.cpu2000.ref.256_bzip2 4.91% > > >> > > >> Thanks, > > >> -Jiangning > > >> > > >>> -----Original Message----- > > >>> From: Hal Finkel [mailto: hfinkel at anl.gov ] > > >>> Sent: Wednesday, December 24, 2014 3:56 AM > > >>> To: Ana Pazos > > >>> Cc: Jiangning Liu > > >>> Subject: Re: question about enabling cfl-aa and collecting a57 > > >>> numbers > > >>> > > >>> ----- Original Message ----- > > >>>> From: "Ana Pazos" < apazos at codeaurora.org > > > >>>> To: "Hal Finkel" < hfinkel at anl.gov >, "Jiangning Liu" > > >>>> < Jiangning.Liu at arm.com > > > >>>> Sent: Monday, December 22, 2014 4:25:07 PM > > >>>> Subject: question about enabling cfl-aa and collecting a57 > > >>>> numbers > > >>>> > > >>>> > > >>>> > > >>>> > > >>>> Hi Hal, > > >>>> > > >>>> > > >>>> > > >>>> Can you clarify to enable cfl-aa from clang we need both these > > >>>> flags? > > >>>> > > >>>> > > >>> > > >>> Hi Ana, > > >>> > > >>> Thanks for following-up on this! You only need (-mllvm > > >>> -use-cfl-aa) > > >>> to > > >>> use CFL AA during the main optimization pipeline, and just this is > > >>> enough to reproduce the performance regressions as far as I know. > > >>> You > > >>> can also use CFL AA during code generation (-mllvm > > >>> -use-cfl-aa-in-codegen), which matters only if your using AA at > > >>> all for code generation (which AArch64 does only for the > > >>> Cortex/A53). > > >>> > > >>> -Hal > > >>> > > >>>> > > >>>> -mllvm -use-cfl-aa > > >>>> > > >>>> -mllvm -cfl-aa-in-codegen > > >>>> > > >>>> > > >>>> > > >>>> Hi Jiangning, > > >>>> > > >>>> Did you collect a57 perf results for SPEC 2000 and 2006 enabling > > >>>> these two flags? > > >>>> > > >>>> Did you notice any correctness and performance regression? > > >>>> > > >>>> > > >>>> > > >>>> > > >>>> > > >>>> Thanks, > > >>>> > > >>>> Ana. > > >>>> > > >>>> > > >>>> > > >>>> > > >>>> > > >>>> Ana Pazos > > >>>> Qualcomm Innovation Center, Inc. > > >>>> The Qualcomm Innovation Center, Inc. is a member of the Code > > >>>> Aurora Forum, a Linux Foundation Collaborative Project. > > >>>> > > >>>> > > >>>> > > >>>> > > >>>> > > >>>> > > >>> > > >>> -- > > >>> Hal Finkel > > >>> Assistant Computational Scientist > > >>> Leadership Computing Facility > > >>> Argonne National Laboratory > > >> > > >> > > >> -- IMPORTANT NOTICE: The contents of this email and any attachments > > >> are confidential and may also be privileged. If you are not the > > >> intended recipient, please notify the sender immediately and do not > > >> disclose the contents to any other person, use it for any purpose, > > >> or store or copy the information in any medium. Thank you. > > >> > > >> ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 > > >> 9NJ, Registered in England & Wales, Company No: 2557590 ARM > > >> Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1 > > >> 9NJ, Registered in England & Wales, Company No: 2548782 > > >> > > > > > > -- > > > Hal Finkel > > > Assistant Computational Scientist > > > Leadership Computing Facility > > > Argonne National Laboratory > > -- > Hal Finkel > Assistant Computational Scientist > Leadership Computing Facility > Argonne National Laboratory > > > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > > > <cflaafix.diff> > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150114/b9e2ab45/attachment.html> -------------- next part -------------- diff --git a/lib/Analysis/CFLAliasAnalysis.cpp b/lib/Analysis/CFLAliasAnalysis.cpp index 5f1b3d3..0fa8c6d 100644 --- a/lib/Analysis/CFLAliasAnalysis.cpp +++ b/lib/Analysis/CFLAliasAnalysis.cpp @@ -227,10 +227,13 @@ public: // Comparisons between global variables and other constants should be // handled by BasicAA. if (isa<Constant>(LocA.Ptr) && isa<Constant>(LocB.Ptr)) { - return MayAlias; + return AliasAnalysis::alias(LocA, LocB); } + AliasResult QueryResult = query(LocA, LocB); + if (QueryResult == MayAlias) + return AliasAnalysis::alias(LocA, LocB); - return query(LocA, LocB); + return QueryResult; } void initializePass() override { InitializeAliasAnalysis(this); } @@ -993,7 +996,7 @@ CFLAliasAnalysis::query(const AliasAnalysis::Location &LocA, auto SetB = *MaybeB; if (SetA.Index == SetB.Index) - return AliasAnalysis::PartialAlias; + return AliasAnalysis::MayAlias; auto AttrsA = Sets.getLink(SetA.Index).Attrs; auto AttrsB = Sets.getLink(SetB.Index).Attrs; diff --git a/lib/CodeGen/Passes.cpp b/lib/CodeGen/Passes.cpp index a5d3210..b2e42e7 100644 --- a/lib/CodeGen/Passes.cpp +++ b/lib/CodeGen/Passes.cpp @@ -400,10 +400,10 @@ void TargetPassConfig::addIRPasses() { // Add TypeBasedAliasAnalysis before BasicAliasAnalysis so that // BasicAliasAnalysis wins if they disagree. This is intended to help // support "obvious" type-punning idioms. - if (UseCFLAA) - addPass(createCFLAliasAnalysisPass()); addPass(createTypeBasedAliasAnalysisPass()); addPass(createScopedNoAliasAAPass()); + if (UseCFLAA) + addPass(createCFLAliasAnalysisPass()); addPass(createBasicAliasAnalysisPass()); // Before running any passes, run the verifier to determine if the input
Ana Pazos
2015-Jan-15 19:33 UTC
[LLVMdev] question about enabling cfl-aa and collecting a57 numbers
Daniel, don’t we need to fix the order of invoking alias analyses in lib/Transforms/IPO/PassManagerBuilder.cpp as well? Your patch fixed the order in lib/CodeGen/Passes.cpp and the delegation code in lib/Analysis/CFLAliasAnalysis.cpp. Thanks, Ana. From: Daniel Berlin [mailto:dberlin at dberlin.org] Sent: Wednesday, January 14, 2015 1:10 PM To: Ana Pazos Cc: George Burgess IV; Nick Lewycky; Jiangning Liu; LLVM Developers Mailing List Subject: Re: [LLVMdev] question about enabling cfl-aa and collecting a57 numbers Oh, sorry, i didn't rebase it when i changed the fix, you would have had to apply the first on top of the second. Here is one against HEAD On Wed, Jan 14, 2015 at 12:32 PM, Ana Pazos <apazos at codeaurora.org <mailto:apazos at codeaurora.org> > wrote: Daniel, your patch does not apply cleanly. Are you on the tip? The code I see there is no line if (QueryResult == MayAlias|| QueryResult == PartialAlias) to be removed. Thanks, Ana. From: George Burgess IV [mailto:george.burgess.iv at gmail.com <mailto:george.burgess.iv at gmail.com> ] Sent: Wednesday, January 14, 2015 10:31 AM To: Daniel Berlin Cc: Nick Lewycky; Ana Pazos; Jiangning Liu; LLVM Developers Mailing List Subject: Re: [LLVMdev] question about enabling cfl-aa and collecting a57 numbers Inline - George On Jan 14, 2015, at 10:49 AM, Daniel Berlin <dberlin at dberlin.org <mailto:dberlin at dberlin.org> > wrote: On Tue, Jan 13, 2015 at 11:26 PM, Nick Lewycky <nlewycky at google.com <mailto:nlewycky at google.com> > wrote: On 13 January 2015 at 22:11, Daniel Berlin <dberlin at dberlin.org <mailto:dberlin at dberlin.org> > wrote: This is caused by CFLAA returning PartialAlias for a query that BasicAA can prove is NoAlias. One of them is wrong. Which one? CFL-AA. Right now it checks whether two things come out to be the same stratified info and have the same index, and if so, returns PartialAlias. This is wrong, because it never knows why two things got unified, only that they did. Therefore, the current check it does where it returns PartialAlias seems wrong. George, why does it return PartialAlias? It shouldn't -- unless I misinterpreted an earlier discussion, I was under the impression that we were testing with that line updated to return MayAlias. We do need to update that in LLVM though; thanks for the fix. IMHO, you can't ever prove any kind of must-alias or partialalias info with CFL-AA, unless i'm missing something. Agreed. I'm not sure from your description that this is a chaining issue. PartialAlias doesn't chain and isn't supposed to, it's a final answer just like NoAlias and MustAlias are. You return partial alias when you have proven that the accesses are overlapping, meaning that it is proven to neither be no alias nor must alias. Well, it's still chaining wrong, because it did not change on MayAlias, but you are right that it is not the issue i identified :) Anyway, updated patch attached. Nick I've attached a patch that fixes it, but truthfully, chaining seems somewhat badly designed IMHO. It should let you pass along the result you've gotten so far, so that you don't have CFLAA get PartialAlias, chain, and have BasicAA get MayAlias. (IE it should let it work strictly down the hierarchy). Note that this patch breaks the CFLAliasAnalysis tests because it no longer returns PartialAlias in some cases it is expecting it. I don't have time ATM to fix chaining completely, so if someone wants to shepherd this patch through, that would be appreciated. On Tue, Jan 13, 2015 at 8:58 PM, Daniel Berlin <dberlin at dberlin.org <mailto:dberlin at dberlin.org> > wrote: Can you send me actual LLVM IR or a preprocessed source from using -E? I don't have a machine handy that has headers that target that arch. On Tue Jan 13 2015 at 4:33:29 PM Daniel Berlin <dberlin at dberlin.org <mailto:dberlin at dberlin.org> > wrote: Anything other than noalias or mustalias should be getting passed down the stack, so either that is not happening or CFL aa is giving better answers and something does worse with those better answers. I'll take a look this evening On Jan 13, 2015 3:58 PM, "Ana Pazos" <apazos at codeaurora.org <mailto:apazos at codeaurora.org> > wrote: Hi folks, Moving the discussion to llvm.dev. None of the changes we talked earlier help. Find attached the C source code that you can use to reproduce the issue. clang --target=aarch64-linux-gnu -c -mcpu=cortex-a57 -Ofast -fno-math-errno test.c -S -o test.s -mllvm -debug-only=licm LICM hoisting to while.body.lr.ph <http://while.body.lr.ph> : %21 = load double** %arrayidx8, align 8, !tbaa !5 LICM hoisting to while.body.lr.ph <http://while.body.lr.ph> : %arrayidx72 = getelementptr inbounds double* %11, i64 1 LICM hoisting to while.body.lr.ph <http://while.body.lr.ph> : %arrayidx81 = getelementptr inbounds double* %11, i64 2 LICM hoisting to for.body.lr.ph <http://for.body.lr.ph> : %2 = ptrtoint i32* %Index to i64 clang --target=aarch64-linux-gnu -c -mcpu=cortex-a57 -Ofast -fno-math-errno test.c -S -o test-cflaa.s -mllvm -use-cfl-aa -mllvm -debug-only=licm LICM hoisting to for.body.lr.ph <http://for.body.lr.ph> : %2 = ptrtoint i32* %Index to i64 Why CFL AA cannot allow hoisting this out: %21 = load double** %arrayidx8, align 8, !tbaa !5 Which leads to this extra load in assembly code: ldr x14, [x4, x9, lsl #3] Thanks! Ana. -----Original Message----- From: Hal Finkel [mailto:hfinkel at anl.gov <mailto:hfinkel at anl.gov> ] Sent: Tuesday, January 13, 2015 12:51 AM To: Chandler Carruth Cc: Jiangning Liu; Pazos, Ana; Ana Pazos; Daniel Berlin; George Burgess IV Subject: Re: question about enabling cfl-aa and collecting a57 numbers ----- Original Message -----> From: "Chandler Carruth" <chandlerc at gmail.com <mailto:chandlerc at gmail.com> > > To: "Ana Pazos" <apazos at codeaurora.org <mailto:apazos at codeaurora.org> >, "Daniel Berlin" <dberlin at dberlin.org <mailto:dberlin at dberlin.org> >, "George Burgess IV" > <george.burgess.iv at gmail.com <mailto:george.burgess.iv at gmail.com> >, "Hal Finkel" <hfinkel at anl.gov <mailto:hfinkel at anl.gov> > > Cc: "Jiangning Liu" <Jiangning.Liu at arm.com <mailto:Jiangning.Liu at arm.com> >, "Ana Pazos" > <apazos at quicinc.com <mailto:apazos at quicinc.com> > > Sent: Monday, January 12, 2015 6:43:44 PM > Subject: Re: question about enabling cfl-aa and collecting a57 numbers > > Not quite: > > > On Mon Jan 12 2015 at 4:27:30 PM Ana Pazos < apazos at codeaurora.org <mailto:apazos at codeaurora.org> > > wrote: > > Thanks George and Daniel, > > > > If the recommended order is “–basicaa –cfla-aa” it means we should fix > the trunk code that processes the flags -mllvm use-cfl-aa-in-codegen > and –mllvm use-cfl-aa, right? > > > > In Transforms/IPO/PassManagerBuilder.cpp and CodeGen/Passes.cpp we see > this sequence: > > > > if (UseCFLAA) > > PM.add(createCFLAliasAnalysisPass()); > > PM.add(createTypeBasedAliasAnalysisPass()); > > PM.add(createScopedNoAliasAAPass()); > > PM.add(createBasicAliasAnalysisPass()); > > > > So are you recommending changing to the sequence below instead? > > > Not quite: > > > addPass(createTypeBasedAliasAnalysisPass()); > > addPass(createScopedNoAliasAAPass()); > You want it here. > > addPass(createBasicAliasAnalysisPass()); > > if (UseCFLAA) > > addPass(createCFLAliasAnalysisPass()); > > > The way to think about this is as follows: > > > if TBAA or ScopedNoAlias *can* return results, you want them to. They > are reflecting a-priori knowledge of aliasing. > > > if both of those say "maybe", you want to ask CFL. If CFL says "maybe" > you want to ask "BasicAA".I could very well be misunderstanding something, but I recall that we add BasicAA last so that it will be queried first. We do this because of our BasicAA delegation hack: We want a MustAlias from BasicAA to override a NoAlias from TBAA, so that we can catch cases of basic local type punning. Thus, assuming CFL delegates properly, I fail to see how switching its order w.r.t. the metadata-based analysis changes anything. -Hal> > > It's possible that you could as a compile-time hack flip the last two > to your suggested ordering, but a) it should really only be a compile > time hack, and b) we haven't really tested that BasicAA actually > delegates rather than directly returning "maybe". > > > Thanks, > > Ana. > > > > From: Daniel Berlin [mailto: dberlin at dberlin.org <mailto:dberlin at dberlin.org> ] > Sent: Friday, January 09, 2015 6:41 PM > To: George Burgess IV; Hal Finkel > Cc: Ana Pazos; Pazos, Ana; Jiangning Liu; chandlerc > > > Subject: Re: question about enabling cfl-aa and collecting a57 numbers > > > Right. I would not try with -cfl-aa -basicaa as the order, it should > be the other way around. > > > If you discover performance regressions with *that*, that would be > highly interesting. > > > On Fri Jan 09 2015 at 6:32:28 PM George Burgess IV < > george.burgess.iv at gmail.com <mailto:george.burgess.iv at gmail.com> > wrote: > > > > Thanks for the notes and observations Ana! :) > > >> Hal, what do you think? Does it meant that cfl-aa is more > >> conservative (note I set the right order -cfa-aa followed by > >> -basicaa)? > Currently, yes. The implementation as it is is focused heavily on > being fast, and on trying to cover cases that BasicAA can’t easily > cover itself. If our end goal is to use this as a replacement for > BasicAA, there’s probably quite a few low-hanging fruit for accuracy > improvements, and we can start playing with sacrificing speed in > pursuit of greater accuracy. On the other hand, if the end goal is to > append this pass to the list of AA passes already being performed, > then I’d be interested in seeing the difference in perf between > `-basicaa` and `-basicaa -cfl-aa`, if any. > > WRT this issue in particular, I believe CFLAA answers conservatively > because: > A. it doesn’t take into account constant pointer offsets (i.e. it > considers a[0]..a[n] all as the same “address”) B. CFLAA is entirely > context insensitive > > IIRC, BasicAA is somewhat context sensitive, and it does take into > account constant pointer offsets, so it can be more accurate in this > case. That being said, I don’t have access to the SPEC code, so I can > only speculate. :) > > George > > > On Jan 9, 2015, at 4:25 PM, Hal Finkel < hfinkel at anl.gov <mailto:hfinkel at anl.gov> > wrote: > > > > Ana, > > > > Thanks so much for looking into this. I'm adding some additional > > relevant people to the CC line... > > > > -Hal > > > > ----- Original Message ----- > >> From: "Ana Pazos" < apazos at codeaurora.org <mailto:apazos at codeaurora.org> > > >> To: "Ana Pazos" < apazos at quicinc.com <mailto:apazos at quicinc.com> >, "Hal Finkel" < > >> hfinkel at anl.gov <mailto:hfinkel at anl.gov> >, "Jiangning Liu" < Jiangning.Liu at arm.com <mailto:Jiangning.Liu at arm.com> > > >> Sent: Friday, January 9, 2015 3:12:04 PM > >> Subject: RE: question about enabling cfl-aa and collecting a57 > >> numbers > >> > >> Hi Hal and Jiangning, > >> > >> I started to look at the effect of cfl-aa on SPEC. > >> > >> The first observation is that I see more redundant loads not being > >> hoisted out of loops by LICM. > >> > >> Due to license restrictions, I cannot paste SPEC code here, but I > >> think you have access to it. > >> > >> I tried to simplify a top function in equake (quake.c smvp) to show > >> the issue. See the attached reduced test case. > >> > >> If you look at the the source code you will see something like: > >> for { > >> ...load v[i][0]... > >> while { > >> ...load v[i][0]... > >> store w[col][0].... > >> } > >> ...load w[i][0]... > >> store w[i][0]... > >> } > >> > >> It should be possible to avoid load &v[i][0], &v[i][1], &vi[i][2] > >> inside the while loop. > >> > >> In the simplified example that would be this instruction: > >> %21 = load double** %arrayidx8, align 8, !tbaa !5 > >> > >> And the pointer for the load instruction: > >> %arrayidx8 = getelementptr inbounds double** %v, i64 %idxprom > >> > >> The decision to hoist it fails at LICM.cpp:189: AliasSet &AS > >> CurAST->getAliasSetForPointer(V, Size, AAInfo).isMod. > >> > >> With basicaa, only loads are in the alias set, but with cfl-aa you > >> find loads and stores, so isMod returns true. > >> > >> AliasSet with basicaa: > >> AliasSet[0x2b35aa0, 7] may alias, Ref Pointers: (double*** > >> %arrayidx32, 8), (double** %12, 8), (double** %arrayidx36, 8), > >> (double** %arrayidx8, 8), (double** %arrayidx68, 8), (double** > >> %arrayidx77, 8), (double** %arrayidx85, 8) > >> $5 = void > >> > >> AliasSet with cfa-aa: > >> AliasSet[0x2b37df0, 20] may alias, Mod/Ref Pointers: (i32* > >> %arrayidx30, 4), (double*** %arrayidx32, 8), (double** %12, 8), > >> (double* %13, 8), (double** %arrayidx36, 8), (double* %15, 8), > >> (double* %arrayidx42, 8), (double* %arrayidx45, 8), (double* > >> %arrayidx51, 8), (double* %arrayidx54, 8), (double** %arrayidx8, > >> 8), (double* %21, 8), (double** %arrayidx68, 8), (double* %23, 8), > >> (double* %arrayidx72, 8), (double** %arrayidx77, 8), (double* %26, > >> 8), (double* %arrayidx81, 8), (double** %arrayidx85, 8), (double* > >> %29, 8) > >> $8 = void > >> > >> You can reproduce it with the commands: > >> opt -O3 -S -cfl-aa -basicaa -licm -debug-only=licm reduce.ll -o out > >> > >> opt -O3 -S - -basicaa -licm -debug-only=licm reduce.ll -o out LICM > >> hoisting to while.body.lr.ph <http://while.body.lr.ph> : %21 = load double** %arrayidx8, > >> align 8, !tbaa !5 LICM hoisting to while.body.lr.ph <http://while.body.lr.ph> : %arrayidx72 > >> getelementptr inbounds double* %11, i64 1 LICM hoisting to > >> while.body.lr.ph <http://while.body.lr.ph> : %arrayidx81 = getelementptr inbounds double* > >> %11, i64 2 > >> > >> Hal, what do you think? Does it meant that cfl-aa is more > >> conservative (note I set the right order -cfa-aa followed by > >> -basicaa)? Could it be an issue with building the AliasSet tracker? > >> Any pointers on how to fix this? > >> > >> Thanks, > >> Ana. > >> > >> > >> > >> -----Original Message----- > >> From: Jiangning Liu [mailto: Jiangning.Liu at arm.com <mailto:Jiangning.Liu at arm.com> ] > >> Sent: Wednesday, December 24, 2014 6:27 PM > >> To: Hal Finkel; Ana Pazos > >> Subject: RE: question about enabling cfl-aa and collecting a57 > >> numbers > >> > >> Hi, > >> > >> I saw big regressions on cortex-A57 for the following SPEC > >> benchmarks after using "-mllvm -use-cfl-aa ". > >> > >> spec.cpu2006.ref.462_libquantum 9.22% spec.cpu2000.ref.179_art > >> 5.32% > >> spec.cpu2000.ref.256_bzip2 4.91% > >> > >> Thanks, > >> -Jiangning > >> > >>> -----Original Message----- > >>> From: Hal Finkel [mailto: hfinkel at anl.gov <mailto:hfinkel at anl.gov> ] > >>> Sent: Wednesday, December 24, 2014 3:56 AM > >>> To: Ana Pazos > >>> Cc: Jiangning Liu > >>> Subject: Re: question about enabling cfl-aa and collecting a57 > >>> numbers > >>> > >>> ----- Original Message ----- > >>>> From: "Ana Pazos" < apazos at codeaurora.org <mailto:apazos at codeaurora.org> > > >>>> To: "Hal Finkel" < hfinkel at anl.gov <mailto:hfinkel at anl.gov> >, "Jiangning Liu" > >>>> < Jiangning.Liu at arm.com <mailto:Jiangning.Liu at arm.com> > > >>>> Sent: Monday, December 22, 2014 4:25:07 PM > >>>> Subject: question about enabling cfl-aa and collecting a57 > >>>> numbers > >>>> > >>>> > >>>> > >>>> > >>>> Hi Hal, > >>>> > >>>> > >>>> > >>>> Can you clarify to enable cfl-aa from clang we need both these > >>>> flags? > >>>> > >>>> > >>> > >>> Hi Ana, > >>> > >>> Thanks for following-up on this! You only need (-mllvm > >>> -use-cfl-aa) > >>> to > >>> use CFL AA during the main optimization pipeline, and just this is > >>> enough to reproduce the performance regressions as far as I know. > >>> You > >>> can also use CFL AA during code generation (-mllvm > >>> -use-cfl-aa-in-codegen), which matters only if your using AA at > >>> all for code generation (which AArch64 does only for the > >>> Cortex/A53). > >>> > >>> -Hal > >>> > >>>> > >>>> -mllvm -use-cfl-aa > >>>> > >>>> -mllvm -cfl-aa-in-codegen > >>>> > >>>> > >>>> > >>>> Hi Jiangning, > >>>> > >>>> Did you collect a57 perf results for SPEC 2000 and 2006 enabling > >>>> these two flags? > >>>> > >>>> Did you notice any correctness and performance regression? > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> Thanks, > >>>> > >>>> Ana. > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> Ana Pazos > >>>> Qualcomm Innovation Center, Inc. > >>>> The Qualcomm Innovation Center, Inc. is a member of the Code > >>>> Aurora Forum, a Linux Foundation Collaborative Project. > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>> > >>> -- > >>> Hal Finkel > >>> Assistant Computational Scientist > >>> Leadership Computing Facility > >>> Argonne National Laboratory > >> > >> > >> -- IMPORTANT NOTICE: The contents of this email and any attachments > >> are confidential and may also be privileged. If you are not the > >> intended recipient, please notify the sender immediately and do not > >> disclose the contents to any other person, use it for any purpose, > >> or store or copy the information in any medium. Thank you. > >> > >> ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 > >> 9NJ, Registered in England & Wales, Company No: 2557590 ARM > >> Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1 > >> 9NJ, Registered in England & Wales, Company No: 2548782 > >> > > > > -- > > Hal Finkel > > Assistant Computational Scientist > > Leadership Computing Facility > > Argonne National Laboratory-- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory _______________________________________________ LLVM Developers mailing list LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu> http://llvm.cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev <cflaafix.diff> -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150115/0a1e64b6/attachment.html>
Hal Finkel
2015-Jan-15 20:14 UTC
[LLVMdev] question about enabling cfl-aa and collecting a57 numbers
----- Original Message -----> From: "Ana Pazos" <apazos at codeaurora.org> > To: "Daniel Berlin" <dberlin at dberlin.org> > Cc: "George Burgess IV" <george.burgess.iv at gmail.com>, "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>, > "Jiangning Liu" <Jiangning.Liu at arm.com> > Sent: Thursday, January 15, 2015 1:33:57 PM > Subject: Re: [LLVMdev] question about enabling cfl-aa and collecting a57 numbers > > > > > > Daniel, don’t we need to fix the order of invoking alias analyses in > lib/Transforms/IPO/PassManagerBuilder.cpp as well? > > > Your patch fixed the order in lib/CodeGen/Passes.cpp and the > delegation code in lib/Analysis/CFLAliasAnalysis.cpp.Yes, the order in lib/CodeGen/Passes.cpp and lib/Transforms/IPO/PassManagerBuilder.cpp should be kept consistent. -Hal> > > > Thanks, > > Ana. > > > > From: Daniel Berlin [mailto:dberlin at dberlin.org] > Sent: Wednesday, January 14, 2015 1:10 PM > To: Ana Pazos > Cc: George Burgess IV; Nick Lewycky; Jiangning Liu; LLVM Developers > Mailing List > Subject: Re: [LLVMdev] question about enabling cfl-aa and collecting > a57 numbers > > > > > Oh, sorry, i didn't rebase it when i changed the fix, you would have > had to apply the first on top of the second. > > > > Here is one against HEAD > > > > > > On Wed, Jan 14, 2015 at 12:32 PM, Ana Pazos < apazos at codeaurora.org > > wrote: > > > > > > Daniel, your patch does not apply cleanly. Are you on the tip? The > code I see there is no line if (QueryResult == MayAlias|| > QueryResult == PartialAlias) to be removed. > > Thanks, > > Ana. > > > > > > From: George Burgess IV [mailto: george.burgess.iv at gmail.com ] > Sent: Wednesday, January 14, 2015 10:31 AM > To: Daniel Berlin > Cc: Nick Lewycky; Ana Pazos; Jiangning Liu; LLVM Developers Mailing > List > Subject: Re: [LLVMdev] question about enabling cfl-aa and collecting > a57 numbers > > > > > > > > > Inline > > - George > > > > On Jan 14, 2015, at 10:49 AM, Daniel Berlin < dberlin at dberlin.org > > wrote: > > > > > > > > > > > > On Tue, Jan 13, 2015 at 11:26 PM, Nick Lewycky < nlewycky at google.com > > wrote: > > > > > > > On 13 January 2015 at 22:11, Daniel Berlin < dberlin at dberlin.org > > wrote: > > > > > This is caused by CFLAA returning PartialAlias for a query that > BasicAA can prove is NoAlias. > > > > > > One of them is wrong. Which one? > > > > > > > > > > > CFL-AA. > > > > > > Right now it checks whether two things come out to be the same > stratified info and have the same index, and if so, returns > PartialAlias. This is wrong, because it never knows why two things > got unified, only that they did. > > > > > > Therefore, the current check it does where it returns PartialAlias > seems wrong. > > > > > > George, why does it return PartialAlias? > > It shouldn't -- unless I misinterpreted an earlier discussion, I was > under the impression that we were testing with that line updated to > return MayAlias. We do need to update that in LLVM though; thanks > for the fix. > > > > > > > > > > > > > IMHO, you can't ever prove any kind of must-alias or partialalias > info with CFL-AA, unless i'm missing something. > > Agreed. > > > > > > > > > > > > > > > > > > > > I'm not sure from your description that this is a chaining issue. > PartialAlias doesn't chain and isn't supposed to, it's a final > answer just like NoAlias and MustAlias are. You return partial alias > when you have proven that the accesses are overlapping, meaning that > it is proven to neither be no alias nor must alias. > > > Well, it's still chaining wrong, because it did not change on > MayAlias, but you are right that it is not the issue i identified :) > > > > > > Anyway, updated patch attached. > > > > > > > > > > > > > > Nick > > > > > > > > > > > I've attached a patch that fixes it, but truthfully, chaining seems > somewhat badly designed IMHO. > > > > > > It should let you pass along the result you've gotten so far, so that > you don't have CFLAA get PartialAlias, chain, and have BasicAA get > MayAlias. > > > > > > (IE it should let it work strictly down the hierarchy). > > > > > > Note that this patch breaks the CFLAliasAnalysis tests because it no > longer returns PartialAlias in some cases it is expecting it. > > > > > > I don't have time ATM to fix chaining completely, so if someone wants > to shepherd this patch through, that would be appreciated. > > > > > > > > > > > > > > > > > > > > > > On Tue, Jan 13, 2015 at 8:58 PM, Daniel Berlin < dberlin at dberlin.org > > wrote: > > > > Can you send me actual LLVM IR or a preprocessed source from using > -E? > I don't have a machine handy that has headers that target that arch. > > > > > > > > > > On Tue Jan 13 2015 at 4:33:29 PM Daniel Berlin < dberlin at dberlin.org > > wrote: > > > > Anything other than noalias or mustalias should be getting passed > down the stack, so either that is not happening or CFL aa is giving > better answers and something does worse with those better answers. > I'll take a look this evening > > > On Jan 13, 2015 3:58 PM, "Ana Pazos" < apazos at codeaurora.org > wrote: > > > > Hi folks, > > Moving the discussion to llvm.dev. > > None of the changes we talked earlier help. > > Find attached the C source code that you can use to reproduce the > issue. > > clang --target=aarch64-linux-gnu -c -mcpu=cortex-a57 -Ofast > -fno-math-errno test.c -S -o test.s -mllvm -debug-only=licm > LICM hoisting to while.body.lr.ph : %21 = load double** %arrayidx8, > align 8, !tbaa !5 > LICM hoisting to while.body.lr.ph : %arrayidx72 = getelementptr > inbounds double* %11, i64 1 > LICM hoisting to while.body.lr.ph : %arrayidx81 = getelementptr > inbounds double* %11, i64 2 > LICM hoisting to for.body.lr.ph : %2 = ptrtoint i32* %Index to i64 > > clang --target=aarch64-linux-gnu -c -mcpu=cortex-a57 -Ofast > -fno-math-errno test.c -S -o test-cflaa.s -mllvm -use-cfl-aa -mllvm > -debug-only=licm > LICM hoisting to for.body.lr.ph : %2 = ptrtoint i32* %Index to i64 > > Why CFL AA cannot allow hoisting this out: %21 = load double** > %arrayidx8, align 8, !tbaa !5 > > Which leads to this extra load in assembly code: ldr x14, [x4, x9, > lsl #3] > > Thanks! > Ana. > > -----Original Message----- > From: Hal Finkel [mailto: hfinkel at anl.gov ] > Sent: Tuesday, January 13, 2015 12:51 AM > To: Chandler Carruth > Cc: Jiangning Liu; Pazos, Ana; Ana Pazos; Daniel Berlin; George > Burgess IV > Subject: Re: question about enabling cfl-aa and collecting a57 > numbers > > ----- Original Message ----- > > From: "Chandler Carruth" < chandlerc at gmail.com > > > To: "Ana Pazos" < apazos at codeaurora.org >, "Daniel Berlin" < > > dberlin at dberlin.org >, "George Burgess IV" > > < george.burgess.iv at gmail.com >, "Hal Finkel" < hfinkel at anl.gov > > > Cc: "Jiangning Liu" < Jiangning.Liu at arm.com >, "Ana Pazos" > > < apazos at quicinc.com > > > Sent: Monday, January 12, 2015 6:43:44 PM > > Subject: Re: question about enabling cfl-aa and collecting a57 > > numbers > > > > Not quite: > > > > > > On Mon Jan 12 2015 at 4:27:30 PM Ana Pazos < apazos at codeaurora.org > > > > > wrote: > > > > Thanks George and Daniel, > > > > > > > > If the recommended order is “–basicaa –cfla-aa” it means we should > > fix > > the trunk code that processes the flags -mllvm > > use-cfl-aa-in-codegen > > and –mllvm use-cfl-aa, right? > > > > > > > > In Transforms/IPO/PassManagerBuilder.cpp and CodeGen/Passes.cpp we > > see > > this sequence: > > > > > > > > if (UseCFLAA) > > > > PM.add(createCFLAliasAnalysisPass()); > > > > PM.add(createTypeBasedAliasAnalysisPass()); > > > > PM.add(createScopedNoAliasAAPass()); > > > > PM.add(createBasicAliasAnalysisPass()); > > > > > > > > So are you recommending changing to the sequence below instead? > > > > > > Not quite: > > > > > > addPass(createTypeBasedAliasAnalysisPass()); > > > > addPass(createScopedNoAliasAAPass()); > > You want it here. > > > > addPass(createBasicAliasAnalysisPass()); > > > > if (UseCFLAA) > > > > addPass(createCFLAliasAnalysisPass()); > > > > > > The way to think about this is as follows: > > > > > > if TBAA or ScopedNoAlias *can* return results, you want them to. > > They > > are reflecting a-priori knowledge of aliasing. > > > > > > if both of those say "maybe", you want to ask CFL. If CFL says > > "maybe" > > you want to ask "BasicAA". > > I could very well be misunderstanding something, but I recall that we > add BasicAA last so that it will be queried first. We do this > because of our BasicAA delegation hack: We want a MustAlias from > BasicAA to override a NoAlias from TBAA, so that we can catch cases > of basic local type punning. Thus, assuming CFL delegates properly, > I fail to see how switching its order w.r.t. the metadata-based > analysis changes anything. > > -Hal > > > > > > > It's possible that you could as a compile-time hack flip the last > > two > > to your suggested ordering, but a) it should really only be a > > compile > > time hack, and b) we haven't really tested that BasicAA actually > > delegates rather than directly returning "maybe". > > > > > > Thanks, > > > > Ana. > > > > > > > > From: Daniel Berlin [mailto: dberlin at dberlin.org ] > > Sent: Friday, January 09, 2015 6:41 PM > > To: George Burgess IV; Hal Finkel > > Cc: Ana Pazos; Pazos, Ana; Jiangning Liu; chandlerc > > > > > > Subject: Re: question about enabling cfl-aa and collecting a57 > > numbers > > > > > > Right. I would not try with -cfl-aa -basicaa as the order, it > > should > > be the other way around. > > > > > > If you discover performance regressions with *that*, that would be > > highly interesting. > > > > > > On Fri Jan 09 2015 at 6:32:28 PM George Burgess IV < > > george.burgess.iv at gmail.com > wrote: > > > > > > > > Thanks for the notes and observations Ana! :) > > > > >> Hal, what do you think? Does it meant that cfl-aa is more > > >> conservative (note I set the right order -cfa-aa followed by > > >> -basicaa)? > > Currently, yes. The implementation as it is is focused heavily on > > being fast, and on trying to cover cases that BasicAA can’t easily > > cover itself. If our end goal is to use this as a replacement for > > BasicAA, there’s probably quite a few low-hanging fruit for > > accuracy > > improvements, and we can start playing with sacrificing speed in > > pursuit of greater accuracy. On the other hand, if the end goal is > > to > > append this pass to the list of AA passes already being performed, > > then I’d be interested in seeing the difference in perf between > > `-basicaa` and `-basicaa -cfl-aa`, if any. > > > > WRT this issue in particular, I believe CFLAA answers > > conservatively > > because: > > A. it doesn’t take into account constant pointer offsets (i.e. it > > considers a[0]..a[n] all as the same “address”) B. CFLAA is > > entirely > > context insensitive > > > > IIRC, BasicAA is somewhat context sensitive, and it does take into > > account constant pointer offsets, so it can be more accurate in > > this > > case. That being said, I don’t have access to the SPEC code, so I > > can > > only speculate. :) > > > > George > > > > > On Jan 9, 2015, at 4:25 PM, Hal Finkel < hfinkel at anl.gov > wrote: > > > > > > Ana, > > > > > > Thanks so much for looking into this. I'm adding some additional > > > relevant people to the CC line... > > > > > > -Hal > > > > > > ----- Original Message ----- > > >> From: "Ana Pazos" < apazos at codeaurora.org > > > >> To: "Ana Pazos" < apazos at quicinc.com >, "Hal Finkel" < > > >> hfinkel at anl.gov >, "Jiangning Liu" < Jiangning.Liu at arm.com > > > >> Sent: Friday, January 9, 2015 3:12:04 PM > > >> Subject: RE: question about enabling cfl-aa and collecting a57 > > >> numbers > > >> > > >> Hi Hal and Jiangning, > > >> > > >> I started to look at the effect of cfl-aa on SPEC. > > >> > > >> The first observation is that I see more redundant loads not > > >> being > > >> hoisted out of loops by LICM. > > >> > > >> Due to license restrictions, I cannot paste SPEC code here, but > > >> I > > >> think you have access to it. > > >> > > >> I tried to simplify a top function in equake (quake.c smvp) to > > >> show > > >> the issue. See the attached reduced test case. > > >> > > >> If you look at the the source code you will see something like: > > >> for { > > >> ...load v[i][0]... > > >> while { > > >> ...load v[i][0]... > > >> store w[col][0].... > > >> } > > >> ...load w[i][0]... > > >> store w[i][0]... > > >> } > > >> > > >> It should be possible to avoid load &v[i][0], &v[i][1], > > >> &vi[i][2] > > >> inside the while loop. > > >> > > >> In the simplified example that would be this instruction: > > >> %21 = load double** %arrayidx8, align 8, !tbaa !5 > > >> > > >> And the pointer for the load instruction: > > >> %arrayidx8 = getelementptr inbounds double** %v, i64 %idxprom > > >> > > >> The decision to hoist it fails at LICM.cpp:189: AliasSet &AS > > >> CurAST->getAliasSetForPointer(V, Size, AAInfo).isMod. > > >> > > >> With basicaa, only loads are in the alias set, but with cfl-aa > > >> you > > >> find loads and stores, so isMod returns true. > > >> > > >> AliasSet with basicaa: > > >> AliasSet[0x2b35aa0, 7] may alias, Ref Pointers: (double*** > > >> %arrayidx32, 8), (double** %12, 8), (double** %arrayidx36, 8), > > >> (double** %arrayidx8, 8), (double** %arrayidx68, 8), (double** > > >> %arrayidx77, 8), (double** %arrayidx85, 8) > > >> $5 = void > > >> > > >> AliasSet with cfa-aa: > > >> AliasSet[0x2b37df0, 20] may alias, Mod/Ref Pointers: (i32* > > >> %arrayidx30, 4), (double*** %arrayidx32, 8), (double** %12, 8), > > >> (double* %13, 8), (double** %arrayidx36, 8), (double* %15, 8), > > >> (double* %arrayidx42, 8), (double* %arrayidx45, 8), (double* > > >> %arrayidx51, 8), (double* %arrayidx54, 8), (double** %arrayidx8, > > >> 8), (double* %21, 8), (double** %arrayidx68, 8), (double* %23, > > >> 8), > > >> (double* %arrayidx72, 8), (double** %arrayidx77, 8), (double* > > >> %26, > > >> 8), (double* %arrayidx81, 8), (double** %arrayidx85, 8), > > >> (double* > > >> %29, 8) > > >> $8 = void > > >> > > >> You can reproduce it with the commands: > > >> opt -O3 -S -cfl-aa -basicaa -licm -debug-only=licm reduce.ll -o > > >> out > > >> > > >> opt -O3 -S - -basicaa -licm -debug-only=licm reduce.ll -o out > > >> LICM > > >> hoisting to while.body.lr.ph : %21 = load double** %arrayidx8, > > >> align 8, !tbaa !5 LICM hoisting to while.body.lr.ph : > > >> %arrayidx72 > > >> getelementptr inbounds double* %11, i64 1 LICM hoisting to > > >> while.body.lr.ph : %arrayidx81 = getelementptr inbounds double* > > >> %11, i64 2 > > >> > > >> Hal, what do you think? Does it meant that cfl-aa is more > > >> conservative (note I set the right order -cfa-aa followed by > > >> -basicaa)? Could it be an issue with building the AliasSet > > >> tracker? > > >> Any pointers on how to fix this? > > >> > > >> Thanks, > > >> Ana. > > >> > > >> > > >> > > >> -----Original Message----- > > >> From: Jiangning Liu [mailto: Jiangning.Liu at arm.com ] > > >> Sent: Wednesday, December 24, 2014 6:27 PM > > >> To: Hal Finkel; Ana Pazos > > >> Subject: RE: question about enabling cfl-aa and collecting a57 > > >> numbers > > >> > > >> Hi, > > >> > > >> I saw big regressions on cortex-A57 for the following SPEC > > >> benchmarks after using "-mllvm -use-cfl-aa ". > > >> > > >> spec.cpu2006.ref.462_libquantum 9.22% spec.cpu2000.ref.179_art > > >> 5.32% > > >> spec.cpu2000.ref.256_bzip2 4.91% > > >> > > >> Thanks, > > >> -Jiangning > > >> > > >>> -----Original Message----- > > >>> From: Hal Finkel [mailto: hfinkel at anl.gov ] > > >>> Sent: Wednesday, December 24, 2014 3:56 AM > > >>> To: Ana Pazos > > >>> Cc: Jiangning Liu > > >>> Subject: Re: question about enabling cfl-aa and collecting a57 > > >>> numbers > > >>> > > >>> ----- Original Message ----- > > >>>> From: "Ana Pazos" < apazos at codeaurora.org > > > >>>> To: "Hal Finkel" < hfinkel at anl.gov >, "Jiangning Liu" > > >>>> < Jiangning.Liu at arm.com > > > >>>> Sent: Monday, December 22, 2014 4:25:07 PM > > >>>> Subject: question about enabling cfl-aa and collecting a57 > > >>>> numbers > > >>>> > > >>>> > > >>>> > > >>>> > > >>>> Hi Hal, > > >>>> > > >>>> > > >>>> > > >>>> Can you clarify to enable cfl-aa from clang we need both these > > >>>> flags? > > >>>> > > >>>> > > >>> > > >>> Hi Ana, > > >>> > > >>> Thanks for following-up on this! You only need (-mllvm > > >>> -use-cfl-aa) > > >>> to > > >>> use CFL AA during the main optimization pipeline, and just this > > >>> is > > >>> enough to reproduce the performance regressions as far as I > > >>> know. > > >>> You > > >>> can also use CFL AA during code generation (-mllvm > > >>> -use-cfl-aa-in-codegen), which matters only if your using AA at > > >>> all for code generation (which AArch64 does only for the > > >>> Cortex/A53). > > >>> > > >>> -Hal > > >>> > > >>>> > > >>>> -mllvm -use-cfl-aa > > >>>> > > >>>> -mllvm -cfl-aa-in-codegen > > >>>> > > >>>> > > >>>> > > >>>> Hi Jiangning, > > >>>> > > >>>> Did you collect a57 perf results for SPEC 2000 and 2006 > > >>>> enabling > > >>>> these two flags? > > >>>> > > >>>> Did you notice any correctness and performance regression? > > >>>> > > >>>> > > >>>> > > >>>> > > >>>> > > >>>> Thanks, > > >>>> > > >>>> Ana. > > >>>> > > >>>> > > >>>> > > >>>> > > >>>> > > >>>> Ana Pazos > > >>>> Qualcomm Innovation Center, Inc. > > >>>> The Qualcomm Innovation Center, Inc. is a member of the Code > > >>>> Aurora Forum, a Linux Foundation Collaborative Project. > > >>>> > > >>>> > > >>>> > > >>>> > > >>>> > > >>>> > > >>> > > >>> -- > > >>> Hal Finkel > > >>> Assistant Computational Scientist > > >>> Leadership Computing Facility > > >>> Argonne National Laboratory > > >> > > >> > > >> -- IMPORTANT NOTICE: The contents of this email and any > > >> attachments > > >> are confidential and may also be privileged. If you are not the > > >> intended recipient, please notify the sender immediately and do > > >> not > > >> disclose the contents to any other person, use it for any > > >> purpose, > > >> or store or copy the information in any medium. Thank you. > > >> > > >> ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 > > >> 9NJ, Registered in England & Wales, Company No: 2557590 ARM > > >> Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1 > > >> 9NJ, Registered in England & Wales, Company No: 2548782 > > >> > > > > > > -- > > > Hal Finkel > > > Assistant Computational Scientist > > > Leadership Computing Facility > > > Argonne National Laboratory > > -- > Hal Finkel > Assistant Computational Scientist > Leadership Computing Facility > Argonne National Laboratory > > > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > > > > > > <cflaafix.diff> > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory
Daniel Berlin
2015-Jan-15 21:10 UTC
[LLVMdev] question about enabling cfl-aa and collecting a57 numbers
Yes. I've attached an updated patch that does the following: 1. Fixes the partialalias of globals/arguments 2. Enables partialalias for cases where nothing has been unified to a global/argument 3. Fixes that select was unifying the condition to the other pieces (the condition does not need to be processed :P). This was causing unnecessary aliasing. 4. Adds a regression test to must-and-partial 5. Fixes the pass ordering 6. Does not alias non-pointer arguments to random things :) 7. Fixes the CFL test cases that break with above. On Thu, Jan 15, 2015 at 11:33 AM, Ana Pazos <apazos at codeaurora.org> wrote:> Daniel, don’t we need to fix the order of invoking alias analyses in > lib/Transforms/IPO/PassManagerBuilder.cpp as well? > > > > Your patch fixed the order in lib/CodeGen/Passes.cpp and the delegation > code in lib/Analysis/CFLAliasAnalysis.cpp. > > > > Thanks, > > Ana. > > > > *From:* Daniel Berlin [mailto:dberlin at dberlin.org] > *Sent:* Wednesday, January 14, 2015 1:10 PM > *To:* Ana Pazos > *Cc:* George Burgess IV; Nick Lewycky; Jiangning Liu; LLVM Developers > Mailing List > > *Subject:* Re: [LLVMdev] question about enabling cfl-aa and collecting > a57 numbers > > > > Oh, sorry, i didn't rebase it when i changed the fix, you would have had > to apply the first on top of the second. > > Here is one against HEAD > > > > On Wed, Jan 14, 2015 at 12:32 PM, Ana Pazos <apazos at codeaurora.org> wrote: > > Daniel, your patch does not apply cleanly. Are you on the tip? > > The code I see there is no line if (QueryResult == MayAlias|| QueryResult == PartialAlias) to be removed. > > Thanks, > > Ana. > > > > *From:* George Burgess IV [mailto:george.burgess.iv at gmail.com] > *Sent:* Wednesday, January 14, 2015 10:31 AM > *To:* Daniel Berlin > *Cc:* Nick Lewycky; Ana Pazos; Jiangning Liu; LLVM Developers Mailing List > *Subject:* Re: [LLVMdev] question about enabling cfl-aa and collecting > a57 numbers > > > > Inline > > - George > > > On Jan 14, 2015, at 10:49 AM, Daniel Berlin <dberlin at dberlin.org> wrote: > > > > > > On Tue, Jan 13, 2015 at 11:26 PM, Nick Lewycky <nlewycky at google.com> > wrote: > > On 13 January 2015 at 22:11, Daniel Berlin <dberlin at dberlin.org> wrote: > > This is caused by CFLAA returning PartialAlias for a query that BasicAA > can prove is NoAlias. > > > > One of them is wrong. Which one? > > > > CFL-AA. > > > > Right now it checks whether two things come out to be the same stratified > info and have the same index, and if so, returns PartialAlias. This is > wrong, because it never knows why two things got unified, only that they > did. > > > > Therefore, the current check it does where it returns PartialAlias seems > wrong. > > > > George, why does it return PartialAlias? > > It shouldn't -- unless I misinterpreted an earlier discussion, I was under > the impression that we were testing with that line updated to return > MayAlias. We do need to update that in LLVM though; thanks for the fix. > > > > IMHO, you can't ever prove any kind of must-alias or partialalias info > with CFL-AA, unless i'm missing something. > > Agreed. > > > > > > I'm not sure from your description that this is a chaining issue. > PartialAlias doesn't chain and isn't supposed to, it's a final answer just > like NoAlias and MustAlias are. You return partial alias when you have > proven that the accesses are overlapping, meaning that it is proven to > neither be no alias nor must alias. > > Well, it's still chaining wrong, because it did not change on MayAlias, > but you are right that it is not the issue i identified :) > > > > Anyway, updated patch attached. > > > > > > Nick > > > > I've attached a patch that fixes it, but truthfully, chaining seems > somewhat badly designed IMHO. > > > > It should let you pass along the result you've gotten so far, so that you > don't have CFLAA get PartialAlias, chain, and have BasicAA get MayAlias. > > > > (IE it should let it work strictly down the hierarchy). > > > > Note that this patch breaks the CFLAliasAnalysis tests because it no > longer returns PartialAlias in some cases it is expecting it. > > > > I don't have time ATM to fix chaining completely, so if someone wants to > shepherd this patch through, that would be appreciated. > > > > > > > > > > > > On Tue, Jan 13, 2015 at 8:58 PM, Daniel Berlin <dberlin at dberlin.org> > wrote: > > Can you send me actual LLVM IR or a preprocessed source from using -E? > I don't have a machine handy that has headers that target that arch. > > > > > > On Tue Jan 13 2015 at 4:33:29 PM Daniel Berlin <dberlin at dberlin.org> > wrote: > > Anything other than noalias or mustalias should be getting passed down the > stack, so either that is not happening or CFL aa is giving better answers > and something does worse with those better answers. I'll take a look this > evening > > On Jan 13, 2015 3:58 PM, "Ana Pazos" <apazos at codeaurora.org> wrote: > > Hi folks, > > Moving the discussion to llvm.dev. > > None of the changes we talked earlier help. > > Find attached the C source code that you can use to reproduce the issue. > > clang --target=aarch64-linux-gnu -c -mcpu=cortex-a57 -Ofast > -fno-math-errno test.c -S -o test.s -mllvm -debug-only=licm > LICM hoisting to while.body.lr.ph: %21 = load double** %arrayidx8, > align 8, !tbaa !5 > LICM hoisting to while.body.lr.ph: %arrayidx72 = getelementptr inbounds > double* %11, i64 1 > LICM hoisting to while.body.lr.ph: %arrayidx81 = getelementptr inbounds > double* %11, i64 2 > LICM hoisting to for.body.lr.ph: %2 = ptrtoint i32* %Index to i64 > > clang --target=aarch64-linux-gnu -c -mcpu=cortex-a57 -Ofast > -fno-math-errno test.c -S -o test-cflaa.s -mllvm -use-cfl-aa -mllvm > -debug-only=licm > LICM hoisting to for.body.lr.ph: %2 = ptrtoint i32* %Index to i64 > > Why CFL AA cannot allow hoisting this out: %21 = load double** > %arrayidx8, align 8, !tbaa !5 > > Which leads to this extra load in assembly code: ldr x14, [x4, x9, lsl > #3] > > Thanks! > Ana. > > -----Original Message----- > From: Hal Finkel [mailto:hfinkel at anl.gov] > Sent: Tuesday, January 13, 2015 12:51 AM > To: Chandler Carruth > Cc: Jiangning Liu; Pazos, Ana; Ana Pazos; Daniel Berlin; George Burgess IV > Subject: Re: question about enabling cfl-aa and collecting a57 numbers > > ----- Original Message ----- > > From: "Chandler Carruth" <chandlerc at gmail.com> > > To: "Ana Pazos" <apazos at codeaurora.org>, "Daniel Berlin" < > dberlin at dberlin.org>, "George Burgess IV" > > <george.burgess.iv at gmail.com>, "Hal Finkel" <hfinkel at anl.gov> > > Cc: "Jiangning Liu" <Jiangning.Liu at arm.com>, "Ana Pazos" > > <apazos at quicinc.com> > > Sent: Monday, January 12, 2015 6:43:44 PM > > Subject: Re: question about enabling cfl-aa and collecting a57 numbers > > > > Not quite: > > > > > > On Mon Jan 12 2015 at 4:27:30 PM Ana Pazos < apazos at codeaurora.org > > > wrote: > > > > Thanks George and Daniel, > > > > > > > > If the recommended order is “–basicaa –cfla-aa” it means we should fix > > the trunk code that processes the flags -mllvm use-cfl-aa-in-codegen > > and –mllvm use-cfl-aa, right? > > > > > > > > In Transforms/IPO/PassManagerBuilder.cpp and CodeGen/Passes.cpp we see > > this sequence: > > > > > > > > if (UseCFLAA) > > > > PM.add(createCFLAliasAnalysisPass()); > > > > PM.add(createTypeBasedAliasAnalysisPass()); > > > > PM.add(createScopedNoAliasAAPass()); > > > > PM.add(createBasicAliasAnalysisPass()); > > > > > > > > So are you recommending changing to the sequence below instead? > > > > > > Not quite: > > > > > > addPass(createTypeBasedAliasAnalysisPass()); > > > > addPass(createScopedNoAliasAAPass()); > > You want it here. > > > > addPass(createBasicAliasAnalysisPass()); > > > > if (UseCFLAA) > > > > addPass(createCFLAliasAnalysisPass()); > > > > > > The way to think about this is as follows: > > > > > > if TBAA or ScopedNoAlias *can* return results, you want them to. They > > are reflecting a-priori knowledge of aliasing. > > > > > > if both of those say "maybe", you want to ask CFL. If CFL says "maybe" > > you want to ask "BasicAA". > > I could very well be misunderstanding something, but I recall that we add > BasicAA last so that it will be queried first. We do this because of our > BasicAA delegation hack: We want a MustAlias from BasicAA to override a > NoAlias from TBAA, so that we can catch cases of basic local type punning. > Thus, assuming CFL delegates properly, I fail to see how switching its > order w.r.t. the metadata-based analysis changes anything. > > -Hal > > > > > > > It's possible that you could as a compile-time hack flip the last two > > to your suggested ordering, but a) it should really only be a compile > > time hack, and b) we haven't really tested that BasicAA actually > > delegates rather than directly returning "maybe". > > > > > > Thanks, > > > > Ana. > > > > > > > > From: Daniel Berlin [mailto: dberlin at dberlin.org ] > > Sent: Friday, January 09, 2015 6:41 PM > > To: George Burgess IV; Hal Finkel > > Cc: Ana Pazos; Pazos, Ana; Jiangning Liu; chandlerc > > > > > > Subject: Re: question about enabling cfl-aa and collecting a57 numbers > > > > > > Right. I would not try with -cfl-aa -basicaa as the order, it should > > be the other way around. > > > > > > If you discover performance regressions with *that*, that would be > > highly interesting. > > > > > > On Fri Jan 09 2015 at 6:32:28 PM George Burgess IV < > > george.burgess.iv at gmail.com > wrote: > > > > > > > > Thanks for the notes and observations Ana! :) > > > > >> Hal, what do you think? Does it meant that cfl-aa is more > > >> conservative (note I set the right order -cfa-aa followed by > > >> -basicaa)? > > Currently, yes. The implementation as it is is focused heavily on > > being fast, and on trying to cover cases that BasicAA can’t easily > > cover itself. If our end goal is to use this as a replacement for > > BasicAA, there’s probably quite a few low-hanging fruit for accuracy > > improvements, and we can start playing with sacrificing speed in > > pursuit of greater accuracy. On the other hand, if the end goal is to > > append this pass to the list of AA passes already being performed, > > then I’d be interested in seeing the difference in perf between > > `-basicaa` and `-basicaa -cfl-aa`, if any. > > > > WRT this issue in particular, I believe CFLAA answers conservatively > > because: > > A. it doesn’t take into account constant pointer offsets (i.e. it > > considers a[0]..a[n] all as the same “address”) B. CFLAA is entirely > > context insensitive > > > > IIRC, BasicAA is somewhat context sensitive, and it does take into > > account constant pointer offsets, so it can be more accurate in this > > case. That being said, I don’t have access to the SPEC code, so I can > > only speculate. :) > > > > George > > > > > On Jan 9, 2015, at 4:25 PM, Hal Finkel < hfinkel at anl.gov > wrote: > > > > > > Ana, > > > > > > Thanks so much for looking into this. I'm adding some additional > > > relevant people to the CC line... > > > > > > -Hal > > > > > > ----- Original Message ----- > > >> From: "Ana Pazos" < apazos at codeaurora.org > > > >> To: "Ana Pazos" < apazos at quicinc.com >, "Hal Finkel" < > > >> hfinkel at anl.gov >, "Jiangning Liu" < Jiangning.Liu at arm.com > > > >> Sent: Friday, January 9, 2015 3:12:04 PM > > >> Subject: RE: question about enabling cfl-aa and collecting a57 > > >> numbers > > >> > > >> Hi Hal and Jiangning, > > >> > > >> I started to look at the effect of cfl-aa on SPEC. > > >> > > >> The first observation is that I see more redundant loads not being > > >> hoisted out of loops by LICM. > > >> > > >> Due to license restrictions, I cannot paste SPEC code here, but I > > >> think you have access to it. > > >> > > >> I tried to simplify a top function in equake (quake.c smvp) to show > > >> the issue. See the attached reduced test case. > > >> > > >> If you look at the the source code you will see something like: > > >> for { > > >> ...load v[i][0]... > > >> while { > > >> ...load v[i][0]... > > >> store w[col][0].... > > >> } > > >> ...load w[i][0]... > > >> store w[i][0]... > > >> } > > >> > > >> It should be possible to avoid load &v[i][0], &v[i][1], &vi[i][2] > > >> inside the while loop. > > >> > > >> In the simplified example that would be this instruction: > > >> %21 = load double** %arrayidx8, align 8, !tbaa !5 > > >> > > >> And the pointer for the load instruction: > > >> %arrayidx8 = getelementptr inbounds double** %v, i64 %idxprom > > >> > > >> The decision to hoist it fails at LICM.cpp:189: AliasSet &AS > > >> CurAST->getAliasSetForPointer(V, Size, AAInfo).isMod. > > >> > > >> With basicaa, only loads are in the alias set, but with cfl-aa you > > >> find loads and stores, so isMod returns true. > > >> > > >> AliasSet with basicaa: > > >> AliasSet[0x2b35aa0, 7] may alias, Ref Pointers: (double*** > > >> %arrayidx32, 8), (double** %12, 8), (double** %arrayidx36, 8), > > >> (double** %arrayidx8, 8), (double** %arrayidx68, 8), (double** > > >> %arrayidx77, 8), (double** %arrayidx85, 8) > > >> $5 = void > > >> > > >> AliasSet with cfa-aa: > > >> AliasSet[0x2b37df0, 20] may alias, Mod/Ref Pointers: (i32* > > >> %arrayidx30, 4), (double*** %arrayidx32, 8), (double** %12, 8), > > >> (double* %13, 8), (double** %arrayidx36, 8), (double* %15, 8), > > >> (double* %arrayidx42, 8), (double* %arrayidx45, 8), (double* > > >> %arrayidx51, 8), (double* %arrayidx54, 8), (double** %arrayidx8, > > >> 8), (double* %21, 8), (double** %arrayidx68, 8), (double* %23, 8), > > >> (double* %arrayidx72, 8), (double** %arrayidx77, 8), (double* %26, > > >> 8), (double* %arrayidx81, 8), (double** %arrayidx85, 8), (double* > > >> %29, 8) > > >> $8 = void > > >> > > >> You can reproduce it with the commands: > > >> opt -O3 -S -cfl-aa -basicaa -licm -debug-only=licm reduce.ll -o out > > >> > > >> opt -O3 -S - -basicaa -licm -debug-only=licm reduce.ll -o out LICM > > >> hoisting to while.body.lr.ph : %21 = load double** %arrayidx8, > > >> align 8, !tbaa !5 LICM hoisting to while.body.lr.ph : %arrayidx72 > > >> getelementptr inbounds double* %11, i64 1 LICM hoisting to > > >> while.body.lr.ph : %arrayidx81 = getelementptr inbounds double* > > >> %11, i64 2 > > >> > > >> Hal, what do you think? Does it meant that cfl-aa is more > > >> conservative (note I set the right order -cfa-aa followed by > > >> -basicaa)? Could it be an issue with building the AliasSet tracker? > > >> Any pointers on how to fix this? > > >> > > >> Thanks, > > >> Ana. > > >> > > >> > > >> > > >> -----Original Message----- > > >> From: Jiangning Liu [mailto: Jiangning.Liu at arm.com ] > > >> Sent: Wednesday, December 24, 2014 6:27 PM > > >> To: Hal Finkel; Ana Pazos > > >> Subject: RE: question about enabling cfl-aa and collecting a57 > > >> numbers > > >> > > >> Hi, > > >> > > >> I saw big regressions on cortex-A57 for the following SPEC > > >> benchmarks after using "-mllvm -use-cfl-aa ". > > >> > > >> spec.cpu2006.ref.462_libquantum 9.22% spec.cpu2000.ref.179_art > > >> 5.32% > > >> spec.cpu2000.ref.256_bzip2 4.91% > > >> > > >> Thanks, > > >> -Jiangning > > >> > > >>> -----Original Message----- > > >>> From: Hal Finkel [mailto: hfinkel at anl.gov ] > > >>> Sent: Wednesday, December 24, 2014 3:56 AM > > >>> To: Ana Pazos > > >>> Cc: Jiangning Liu > > >>> Subject: Re: question about enabling cfl-aa and collecting a57 > > >>> numbers > > >>> > > >>> ----- Original Message ----- > > >>>> From: "Ana Pazos" < apazos at codeaurora.org > > > >>>> To: "Hal Finkel" < hfinkel at anl.gov >, "Jiangning Liu" > > >>>> < Jiangning.Liu at arm.com > > > >>>> Sent: Monday, December 22, 2014 4:25:07 PM > > >>>> Subject: question about enabling cfl-aa and collecting a57 > > >>>> numbers > > >>>> > > >>>> > > >>>> > > >>>> > > >>>> Hi Hal, > > >>>> > > >>>> > > >>>> > > >>>> Can you clarify to enable cfl-aa from clang we need both these > > >>>> flags? > > >>>> > > >>>> > > >>> > > >>> Hi Ana, > > >>> > > >>> Thanks for following-up on this! You only need (-mllvm > > >>> -use-cfl-aa) > > >>> to > > >>> use CFL AA during the main optimization pipeline, and just this is > > >>> enough to reproduce the performance regressions as far as I know. > > >>> You > > >>> can also use CFL AA during code generation (-mllvm > > >>> -use-cfl-aa-in-codegen), which matters only if your using AA at > > >>> all for code generation (which AArch64 does only for the > > >>> Cortex/A53). > > >>> > > >>> -Hal > > >>> > > >>>> > > >>>> -mllvm -use-cfl-aa > > >>>> > > >>>> -mllvm -cfl-aa-in-codegen > > >>>> > > >>>> > > >>>> > > >>>> Hi Jiangning, > > >>>> > > >>>> Did you collect a57 perf results for SPEC 2000 and 2006 enabling > > >>>> these two flags? > > >>>> > > >>>> Did you notice any correctness and performance regression? > > >>>> > > >>>> > > >>>> > > >>>> > > >>>> > > >>>> Thanks, > > >>>> > > >>>> Ana. > > >>>> > > >>>> > > >>>> > > >>>> > > >>>> > > >>>> Ana Pazos > > >>>> Qualcomm Innovation Center, Inc. > > >>>> The Qualcomm Innovation Center, Inc. is a member of the Code > > >>>> Aurora Forum, a Linux Foundation Collaborative Project. > > >>>> > > >>>> > > >>>> > > >>>> > > >>>> > > >>>> > > >>> > > >>> -- > > >>> Hal Finkel > > >>> Assistant Computational Scientist > > >>> Leadership Computing Facility > > >>> Argonne National Laboratory > > >> > > >> > > >> -- IMPORTANT NOTICE: The contents of this email and any attachments > > >> are confidential and may also be privileged. If you are not the > > >> intended recipient, please notify the sender immediately and do not > > >> disclose the contents to any other person, use it for any purpose, > > >> or store or copy the information in any medium. Thank you. > > >> > > >> ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 > > >> 9NJ, Registered in England & Wales, Company No: 2557590 ARM > > >> Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1 > > >> 9NJ, Registered in England & Wales, Company No: 2548782 > > >> > > > > > > -- > > > Hal Finkel > > > Assistant Computational Scientist > > > Leadership Computing Facility > > > Argonne National Laboratory > > -- > Hal Finkel > Assistant Computational Scientist > Leadership Computing Facility > Argonne National Laboratory > > > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > > > <cflaafix.diff> > > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150115/17ec90ed/attachment.html> -------------- next part -------------- diff --git a/lib/Analysis/CFLAliasAnalysis.cpp b/lib/Analysis/CFLAliasAnalysis.cpp index 88bb84a..95fd377 100644 --- a/lib/Analysis/CFLAliasAnalysis.cpp +++ b/lib/Analysis/CFLAliasAnalysis.cpp @@ -227,10 +227,13 @@ public: // Comparisons between global variables and other constants should be // handled by BasicAA. if (isa<Constant>(LocA.Ptr) && isa<Constant>(LocB.Ptr)) { - return MayAlias; + return AliasAnalysis::alias(LocA, LocB); } + AliasResult QueryResult = query(LocA, LocB); + if (QueryResult == MayAlias) + return AliasAnalysis::alias(LocA, LocB); - return query(LocA, LocB); + return QueryResult; } void initializePass() override { InitializeAliasAnalysis(this); } @@ -295,8 +298,9 @@ public: } void visitSelectInst(SelectInst &Inst) { - auto *Condition = Inst.getCondition(); - Output.push_back(Edge(&Inst, Condition, EdgeType::Assign, AttrNone)); + // Condition is irrelevant, it is evaluated, but not loaded, + // stored, or assigned + // The results of the select are assigned auto *TrueVal = Inst.getTrueValue(); Output.push_back(Edge(&Inst, TrueVal, EdgeType::Assign, AttrNone)); auto *FalseVal = Inst.getFalseValue(); @@ -768,7 +772,10 @@ static Optional<StratifiedAttr> valueToAttrIndex(Value *Val) { return AttrGlobalIndex; if (auto *Arg = dyn_cast<Argument>(Val)) - if (!Arg->hasNoAliasAttr()) + // Only pointer arguments should have the argument attribute, + // because things can't escape through scalars without us seeing a + // cast, and thus, interaction with them doesn't matter. + if (!Arg->hasNoAliasAttr() && Arg->getType()->isPointerTy()) return argNumberToAttrIndex(Arg->getArgNo()); return NoneType(); } @@ -991,10 +998,6 @@ CFLAliasAnalysis::query(const AliasAnalysis::Location &LocA, auto SetA = *MaybeA; auto SetB = *MaybeB; - - if (SetA.Index == SetB.Index) - return AliasAnalysis::PartialAlias; - auto AttrsA = Sets.getLink(SetA.Index).Attrs; auto AttrsB = Sets.getLink(SetB.Index).Attrs; // Stratified set attributes are used as markets to signify whether a member @@ -1009,5 +1012,12 @@ CFLAliasAnalysis::query(const AliasAnalysis::Location &LocA, if (AttrsA.any() && AttrsB.any()) return AliasAnalysis::MayAlias; + // If we haven't interacted with globals, and they still got + // unified, they are partial alias right now. + // NOTE: This assumes we don't unify for any reason other than + // assignments + if (SetA.Index == SetB.Index) + return AliasAnalysis::PartialAlias; + return AliasAnalysis::NoAlias; } diff --git a/lib/CodeGen/Passes.cpp b/lib/CodeGen/Passes.cpp index 28e9f84..82ca024 100644 --- a/lib/CodeGen/Passes.cpp +++ b/lib/CodeGen/Passes.cpp @@ -400,10 +400,10 @@ void TargetPassConfig::addIRPasses() { // Add TypeBasedAliasAnalysis before BasicAliasAnalysis so that // BasicAliasAnalysis wins if they disagree. This is intended to help // support "obvious" type-punning idioms. - if (UseCFLAA) - addPass(createCFLAliasAnalysisPass()); addPass(createTypeBasedAliasAnalysisPass()); addPass(createScopedNoAliasAAPass()); + if (UseCFLAA) + addPass(createCFLAliasAnalysisPass()); addPass(createBasicAliasAnalysisPass()); // Before running any passes, run the verifier to determine if the input diff --git a/lib/Transforms/IPO/PassManagerBuilder.cpp b/lib/Transforms/IPO/PassManagerBuilder.cpp index bb776ef..645686c 100644 --- a/lib/Transforms/IPO/PassManagerBuilder.cpp +++ b/lib/Transforms/IPO/PassManagerBuilder.cpp @@ -132,10 +132,10 @@ PassManagerBuilder::addInitialAliasAnalysisPasses(PassManagerBase &PM) const { // Add TypeBasedAliasAnalysis before BasicAliasAnalysis so that // BasicAliasAnalysis wins if they disagree. This is intended to help // support "obvious" type-punning idioms. - if (UseCFLAA) - PM.add(createCFLAliasAnalysisPass()); PM.add(createTypeBasedAliasAnalysisPass()); PM.add(createScopedNoAliasAAPass()); + if (UseCFLAA) + PM.add(createCFLAliasAnalysisPass()); PM.add(createBasicAliasAnalysisPass()); } diff --git a/test/Analysis/CFLAliasAnalysis/gep-signed-arithmetic.ll b/test/Analysis/CFLAliasAnalysis/gep-signed-arithmetic.ll index a0195d7..f68edaf 100644 --- a/test/Analysis/CFLAliasAnalysis/gep-signed-arithmetic.ll +++ b/test/Analysis/CFLAliasAnalysis/gep-signed-arithmetic.ll @@ -5,7 +5,8 @@ target datalayout = "e-p:32:32:32" ; CHECK: 1 partial alias response -define i32 @test(i32* %tab, i32 %indvar) nounwind { +define i32 @test(i32 %indvar) nounwind { + %tab = alloca i32, align 4 %tmp31 = mul i32 %indvar, -2 %tmp32 = add i32 %tmp31, 30 %t.5 = getelementptr i32* %tab, i32 %tmp32 diff --git a/test/Analysis/CFLAliasAnalysis/must-and-partial.ll b/test/Analysis/CFLAliasAnalysis/must-and-partial.ll index df7de38..2c66d90 100644 --- a/test/Analysis/CFLAliasAnalysis/must-and-partial.ll +++ b/test/Analysis/CFLAliasAnalysis/must-and-partial.ll @@ -7,8 +7,9 @@ target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64" ; CHECK: PartialAlias: i16* %bigbase0, i8* %phi -define i8 @test0(i8* %base, i1 %x) { +define i8 @test0(i1 %x) { entry: + %base = alloca i8, align 4 %baseplusone = getelementptr i8* %base, i64 1 br i1 %x, label %red, label %green red: @@ -25,8 +26,9 @@ green: } ; CHECK: PartialAlias: i16* %bigbase1, i8* %sel -define i8 @test1(i8* %base, i1 %x) { +define i8 @test1(i1 %x) { entry: + %base = alloca i8, align 4 %baseplusone = getelementptr i8* %base, i64 1 %sel = select i1 %x, i8* %baseplusone, i8* %base store i8 0, i8* %sel @@ -37,3 +39,15 @@ entry: %loaded = load i8* %sel ret i8 %loaded } + +; Arguments should not be partial alias +; CHECK: MayAlias: double* %A, double* %Index +define void @testr2(double* nocapture readonly %A, double* nocapture readonly %Index) { + %arrayidx22 = getelementptr inbounds double* %Index, i64 2 + %1 = load double* %arrayidx22 + %arrayidx25 = getelementptr inbounds double* %A, i64 2 + %2 = load double* %arrayidx25 + %mul26 = fmul double %1, %2 + ret void +} +