thr3ads.net - llvm dev - [LLVMdev] question about enabling cfl-aa and collecting a57 numbers [Jan 2015]

If this information is useful, please help other people find it:
Share via:

Nick Lewycky

2015-Jan-14 07:26 UTC

[LLVMdev] question about enabling cfl-aa and collecting a57 numbers

On 13 January 2015 at 22:11, Daniel Berlin <dberlin at dberlin.org> wrote:
> This is caused by CFLAA returning PartialAlias for a query that BasicAA
> can prove is NoAlias.
>
One of them is wrong. Which one?

I'm not sure from your description that this is a chaining issue.
PartialAlias doesn't chain and isn't supposed to, it's a final
answer just
like NoAlias and MustAlias are. You return partial alias when you have
proven that the accesses are overlapping, meaning that it is proven to
neither be no alias nor must alias.

Nick

I've attached a patch that fixes it, but truthfully, chaining
seems> somewhat badly designed IMHO.
>
> It should let you pass along the result you've gotten so far, so that
you
> don't have CFLAA get PartialAlias, chain, and have BasicAA get
MayAlias.
>
> (IE it should let it work strictly down the hierarchy).
>
> Note that this patch breaks the CFLAliasAnalysis tests because it no
> longer returns PartialAlias in some cases it is expecting it.
>
> I don't have time ATM to fix chaining completely, so if someone wants
to
> shepherd this patch through, that would be appreciated.
>
>
>
>
>
> On Tue, Jan 13, 2015 at 8:58 PM, Daniel Berlin <dberlin at
dberlin.org>
> wrote:
>
>> Can you send me actual LLVM IR or a preprocessed source from using -E?
>> I don't have a machine handy that has headers that target that
arch.
>>
>>
>> On Tue Jan 13 2015 at 4:33:29 PM Daniel Berlin <dberlin at
dberlin.org>
>> wrote:
>>
>>> Anything other than noalias or mustalias should be getting passed
down
>>> the stack, so either that is not happening or CFL aa is giving
better
>>> answers and something does worse with those better answers. 
I'll take a
>>> look this evening
>>> On Jan 13, 2015 3:58 PM, "Ana Pazos" <apazos at
codeaurora.org> wrote:
>>>
>>>> Hi folks,
>>>>
>>>> Moving the discussion to llvm.dev.
>>>>
>>>> None of the changes we talked earlier help.
>>>>
>>>> Find attached the C source code that you can use to reproduce
the issue.
>>>>
>>>> clang --target=aarch64-linux-gnu  -c -mcpu=cortex-a57 -Ofast
>>>>  -fno-math-errno   test.c -S -o test.s  -mllvm -debug-only=licm
>>>> LICM hoisting to while.body.lr.ph:   %21 = load double**
%arrayidx8,
>>>> align 8, !tbaa !5
>>>> LICM hoisting to while.body.lr.ph:   %arrayidx72 =
getelementptr
>>>> inbounds double* %11, i64 1
>>>> LICM hoisting to while.body.lr.ph:   %arrayidx81 =
getelementptr
>>>> inbounds double* %11, i64 2
>>>> LICM hoisting to for.body.lr.ph:   %2 = ptrtoint i32* %Index to
i64
>>>>
>>>> clang --target=aarch64-linux-gnu  -c   -mcpu=cortex-a57  -Ofast
>>>>  -fno-math-errno   test.c -S -o test-cflaa.s -mllvm -use-cfl-aa
-mllvm
>>>> -debug-only=licm
>>>> LICM hoisting to for.body.lr.ph:   %2 = ptrtoint i32* %Index to
i64
>>>>
>>>> Why CFL AA cannot allow hoisting this out:  %21 = load double**
>>>> %arrayidx8, align 8, !tbaa !5
>>>>
>>>> Which leads to this extra load in assembly code: ldr    x14,
[x4, x9,
>>>> lsl #3]
>>>>
>>>> Thanks!
>>>> Ana.
>>>>
>>>> -----Original Message-----
>>>> From: Hal Finkel [mailto:hfinkel at anl.gov]
>>>> Sent: Tuesday, January 13, 2015 12:51 AM
>>>> To: Chandler Carruth
>>>> Cc: Jiangning Liu; Pazos, Ana; Ana Pazos; Daniel Berlin; George
Burgess
>>>> IV
>>>> Subject: Re: question about enabling cfl-aa and collecting a57
numbers
>>>>
>>>> ----- Original Message -----
>>>> > From: "Chandler Carruth" <chandlerc at
gmail.com>
>>>> > To: "Ana Pazos" <apazos at
codeaurora.org>, "Daniel Berlin" <
>>>> dberlin at dberlin.org>, "George Burgess IV"
>>>> > <george.burgess.iv at gmail.com>, "Hal
Finkel" <hfinkel at anl.gov>
>>>> > Cc: "Jiangning Liu" <Jiangning.Liu at
arm.com>, "Ana Pazos"
>>>> > <apazos at quicinc.com>
>>>> > Sent: Monday, January 12, 2015 6:43:44 PM
>>>> > Subject: Re: question about enabling cfl-aa and collecting
a57 numbers
>>>> >
>>>> > Not quite:
>>>> >
>>>> >
>>>> > On Mon Jan 12 2015 at 4:27:30 PM Ana Pazos < apazos at
codeaurora.org >
>>>> > wrote:
>>>> >
>>>> > Thanks George and Daniel,
>>>> >
>>>> >
>>>> >
>>>> > If the recommended order is “–basicaa –cfla-aa” it means
we should fix
>>>> > the trunk code that processes the flags -mllvm
use-cfl-aa-in-codegen
>>>> > and –mllvm use-cfl-aa, right?
>>>> >
>>>> >
>>>> >
>>>> > In Transforms/IPO/PassManagerBuilder.cpp and
CodeGen/Passes.cpp we
>>>> see
>>>> > this sequence:
>>>> >
>>>> >
>>>> >
>>>> > if (UseCFLAA)
>>>> >
>>>> > PM.add(createCFLAliasAnalysisPass());
>>>> >
>>>> > PM.add(createTypeBasedAliasAnalysisPass());
>>>> >
>>>> > PM.add(createScopedNoAliasAAPass());
>>>> >
>>>> > PM.add(createBasicAliasAnalysisPass());
>>>> >
>>>> >
>>>> >
>>>> > So are you recommending changing to the sequence below
instead?
>>>> >
>>>> >
>>>> > Not quite:
>>>> >
>>>> >
>>>> > addPass(createTypeBasedAliasAnalysisPass());
>>>> >
>>>> > addPass(createScopedNoAliasAAPass());
>>>> > You want it here.
>>>> >
>>>> > addPass(createBasicAliasAnalysisPass());
>>>> >
>>>> > if (UseCFLAA)
>>>> >
>>>> > addPass(createCFLAliasAnalysisPass());
>>>> >
>>>> >
>>>> > The way to think about this is as follows:
>>>> >
>>>> >
>>>> > if TBAA or ScopedNoAlias *can* return results, you want
them to. They
>>>> > are reflecting a-priori knowledge of aliasing.
>>>> >
>>>> >
>>>> > if both of those say "maybe", you want to ask
CFL. If CFL says "maybe"
>>>> > you want to ask "BasicAA".
>>>>
>>>> I could very well be misunderstanding something, but I recall
that we
>>>> add BasicAA last so that it will be queried first. We do this
because of
>>>> our BasicAA delegation hack: We want a MustAlias from BasicAA
to override a
>>>> NoAlias from TBAA, so that we can catch cases of basic local
type punning.
>>>> Thus, assuming CFL delegates properly, I fail to see how
switching its
>>>> order w.r.t. the metadata-based analysis changes anything.
>>>>
>>>>  -Hal
>>>>
>>>> >
>>>> >
>>>> > It's possible that you could as a compile-time hack
flip the last two
>>>> > to your suggested ordering, but a) it should really only
be a compile
>>>> > time hack, and b) we haven't really tested that
BasicAA actually
>>>> > delegates rather than directly returning
"maybe".
>>>> >
>>>> >
>>>> > Thanks,
>>>> >
>>>> > Ana.
>>>> >
>>>> >
>>>> >
>>>> > From: Daniel Berlin [mailto: dberlin at dberlin.org ]
>>>> > Sent: Friday, January 09, 2015 6:41 PM
>>>> > To: George Burgess IV; Hal Finkel
>>>> > Cc: Ana Pazos; Pazos, Ana; Jiangning Liu; chandlerc
>>>> >
>>>> >
>>>> > Subject: Re: question about enabling cfl-aa and collecting
a57 numbers
>>>> >
>>>> >
>>>> > Right. I would not try with -cfl-aa -basicaa as the order,
it should
>>>> > be the other way around.
>>>> >
>>>> >
>>>> > If you discover performance regressions with *that*, that
would be
>>>> > highly interesting.
>>>> >
>>>> >
>>>> > On Fri Jan 09 2015 at 6:32:28 PM George Burgess IV <
>>>> > george.burgess.iv at gmail.com > wrote:
>>>> >
>>>> >
>>>> >
>>>> > Thanks for the notes and observations Ana! :)
>>>> >
>>>> > >> Hal, what do you think? Does it meant that cfl-aa
is more
>>>> > >> conservative (note I set the right order -cfa-aa
followed by
>>>> > >> -basicaa)?
>>>> > Currently, yes. The implementation as it is is focused
heavily on
>>>> > being fast, and on trying to cover cases that BasicAA
can’t easily
>>>> > cover itself. If our end goal is to use this as a
replacement for
>>>> > BasicAA, there’s probably quite a few low-hanging fruit
for accuracy
>>>> > improvements, and we can start playing with sacrificing
speed in
>>>> > pursuit of greater accuracy. On the other hand, if the end
goal is to
>>>> > append this pass to the list of AA passes already being
performed,
>>>> > then I’d be interested in seeing the difference in perf
between
>>>> > `-basicaa` and `-basicaa -cfl-aa`, if any.
>>>> >
>>>> > WRT this issue in particular, I believe CFLAA answers
conservatively
>>>> > because:
>>>> > A. it doesn’t take into account constant pointer offsets
(i.e. it
>>>> > considers a[0]..a[n] all as the same “address”) B. CFLAA
is entirely
>>>> > context insensitive
>>>> >
>>>> > IIRC, BasicAA is somewhat context sensitive, and it does
take into
>>>> > account constant pointer offsets, so it can be more
accurate in this
>>>> > case. That being said, I don’t have access to the SPEC
code, so I can
>>>> > only speculate. :)
>>>> >
>>>> > George
>>>> >
>>>> > > On Jan 9, 2015, at 4:25 PM, Hal Finkel < hfinkel
at anl.gov > wrote:
>>>> > >
>>>> > > Ana,
>>>> > >
>>>> > > Thanks so much for looking into this. I'm adding
some additional
>>>> > > relevant people to the CC line...
>>>> > >
>>>> > > -Hal
>>>> > >
>>>> > > ----- Original Message -----
>>>> > >> From: "Ana Pazos" < apazos at
codeaurora.org >
>>>> > >> To: "Ana Pazos" < apazos at
quicinc.com >, "Hal Finkel" <
>>>> > >> hfinkel at anl.gov >, "Jiangning
Liu" < Jiangning.Liu at arm.com >
>>>> > >> Sent: Friday, January 9, 2015 3:12:04 PM
>>>> > >> Subject: RE: question about enabling cfl-aa and
collecting a57
>>>> > >> numbers
>>>> > >>
>>>> > >> Hi Hal and Jiangning,
>>>> > >>
>>>> > >> I started to look at the effect of cfl-aa on
SPEC.
>>>> > >>
>>>> > >> The first observation is that I see more
redundant loads not being
>>>> > >> hoisted out of loops by LICM.
>>>> > >>
>>>> > >> Due to license restrictions, I cannot paste SPEC
code here, but I
>>>> > >> think you have access to it.
>>>> > >>
>>>> > >> I tried to simplify a top function in equake
(quake.c smvp) to show
>>>> > >> the issue. See the attached reduced test case.
>>>> > >>
>>>> > >> If you look at the the source code you will see
something like:
>>>> > >> for {
>>>> > >> ...load v[i][0]...
>>>> > >> while {
>>>> > >> ...load v[i][0]...
>>>> > >> store w[col][0]....
>>>> > >> }
>>>> > >> ...load w[i][0]...
>>>> > >> store w[i][0]...
>>>> > >> }
>>>> > >>
>>>> > >> It should be possible to avoid load &v[i][0],
&v[i][1], &vi[i][2]
>>>> > >> inside the while loop.
>>>> > >>
>>>> > >> In the simplified example that would be this
instruction:
>>>> > >> %21 = load double** %arrayidx8, align 8, !tbaa !5
>>>> > >>
>>>> > >> And the pointer for the load instruction:
>>>> > >> %arrayidx8 = getelementptr inbounds double** %v,
i64 %idxprom
>>>> > >>
>>>> > >> The decision to hoist it fails at LICM.cpp:189:
AliasSet &AS >>>> > >>
CurAST->getAliasSetForPointer(V, Size, AAInfo).isMod.
>>>> > >>
>>>> > >> With basicaa, only loads are in the alias set,
but with cfl-aa you
>>>> > >> find loads and stores, so isMod returns true.
>>>> > >>
>>>> > >> AliasSet with basicaa:
>>>> > >> AliasSet[0x2b35aa0, 7] may alias, Ref Pointers:
(double***
>>>> > >> %arrayidx32, 8), (double** %12, 8), (double**
%arrayidx36, 8),
>>>> > >> (double** %arrayidx8, 8), (double** %arrayidx68,
8), (double**
>>>> > >> %arrayidx77, 8), (double** %arrayidx85, 8)
>>>> > >> $5 = void
>>>> > >>
>>>> > >> AliasSet with cfa-aa:
>>>> > >> AliasSet[0x2b37df0, 20] may alias, Mod/Ref
Pointers: (i32*
>>>> > >> %arrayidx30, 4), (double*** %arrayidx32, 8),
(double** %12, 8),
>>>> > >> (double* %13, 8), (double** %arrayidx36, 8),
(double* %15, 8),
>>>> > >> (double* %arrayidx42, 8), (double* %arrayidx45,
8), (double*
>>>> > >> %arrayidx51, 8), (double* %arrayidx54, 8),
(double** %arrayidx8,
>>>> > >> 8), (double* %21, 8), (double** %arrayidx68, 8),
(double* %23, 8),
>>>> > >> (double* %arrayidx72, 8), (double** %arrayidx77,
8), (double* %26,
>>>> > >> 8), (double* %arrayidx81, 8), (double**
%arrayidx85, 8), (double*
>>>> > >> %29, 8)
>>>> > >> $8 = void
>>>> > >>
>>>> > >> You can reproduce it with the commands:
>>>> > >> opt -O3 -S -cfl-aa -basicaa -licm
-debug-only=licm reduce.ll -o out
>>>> > >>
>>>> > >> opt -O3 -S - -basicaa -licm -debug-only=licm
reduce.ll -o out LICM
>>>> > >> hoisting to while.body.lr.ph : %21 = load
double** %arrayidx8,
>>>> > >> align 8, !tbaa !5 LICM hoisting to
while.body.lr.ph : %arrayidx72
>>>> >>>> > >> getelementptr inbounds double*
%11, i64 1 LICM hoisting to
>>>> > >> while.body.lr.ph : %arrayidx81 = getelementptr
inbounds double*
>>>> > >> %11, i64 2
>>>> > >>
>>>> > >> Hal, what do you think? Does it meant that cfl-aa
is more
>>>> > >> conservative (note I set the right order -cfa-aa
followed by
>>>> > >> -basicaa)? Could it be an issue with building the
AliasSet tracker?
>>>> > >> Any pointers on how to fix this?
>>>> > >>
>>>> > >> Thanks,
>>>> > >> Ana.
>>>> > >>
>>>> > >>
>>>> > >>
>>>> > >> -----Original Message-----
>>>> > >> From: Jiangning Liu [mailto: Jiangning.Liu at
arm.com ]
>>>> > >> Sent: Wednesday, December 24, 2014 6:27 PM
>>>> > >> To: Hal Finkel; Ana Pazos
>>>> > >> Subject: RE: question about enabling cfl-aa and
collecting a57
>>>> > >> numbers
>>>> > >>
>>>> > >> Hi,
>>>> > >>
>>>> > >> I saw big regressions on cortex-A57 for the
following SPEC
>>>> > >> benchmarks after using "-mllvm -use-cfl-aa
".
>>>> > >>
>>>> > >> spec.cpu2006.ref.462_libquantum 9.22%
spec.cpu2000.ref.179_art
>>>> > >> 5.32%
>>>> > >> spec.cpu2000.ref.256_bzip2 4.91%
>>>> > >>
>>>> > >> Thanks,
>>>> > >> -Jiangning
>>>> > >>
>>>> > >>> -----Original Message-----
>>>> > >>> From: Hal Finkel [mailto: hfinkel at anl.gov
]
>>>> > >>> Sent: Wednesday, December 24, 2014 3:56 AM
>>>> > >>> To: Ana Pazos
>>>> > >>> Cc: Jiangning Liu
>>>> > >>> Subject: Re: question about enabling cfl-aa
and collecting a57
>>>> > >>> numbers
>>>> > >>>
>>>> > >>> ----- Original Message -----
>>>> > >>>> From: "Ana Pazos" < apazos
at codeaurora.org >
>>>> > >>>> To: "Hal Finkel" < hfinkel
at anl.gov >, "Jiangning Liu"
>>>> > >>>> < Jiangning.Liu at arm.com >
>>>> > >>>> Sent: Monday, December 22, 2014 4:25:07
PM
>>>> > >>>> Subject: question about enabling cfl-aa
and collecting a57
>>>> > >>>> numbers
>>>> > >>>>
>>>> > >>>>
>>>> > >>>>
>>>> > >>>>
>>>> > >>>> Hi Hal,
>>>> > >>>>
>>>> > >>>>
>>>> > >>>>
>>>> > >>>> Can you clarify to enable cfl-aa from
clang we need both these
>>>> > >>>> flags?
>>>> > >>>>
>>>> > >>>>
>>>> > >>>
>>>> > >>> Hi Ana,
>>>> > >>>
>>>> > >>> Thanks for following-up on this! You only
need (-mllvm
>>>> > >>> -use-cfl-aa)
>>>> > >>> to
>>>> > >>> use CFL AA during the main optimization
pipeline, and just this is
>>>> > >>> enough to reproduce the performance
regressions as far as I know.
>>>> > >>> You
>>>> > >>> can also use CFL AA during code generation
(-mllvm
>>>> > >>> -use-cfl-aa-in-codegen), which matters only
if your using AA at
>>>> > >>> all for code generation (which AArch64 does
only for the
>>>> > >>> Cortex/A53).
>>>> > >>>
>>>> > >>> -Hal
>>>> > >>>
>>>> > >>>>
>>>> > >>>> -mllvm -use-cfl-aa
>>>> > >>>>
>>>> > >>>> -mllvm -cfl-aa-in-codegen
>>>> > >>>>
>>>> > >>>>
>>>> > >>>>
>>>> > >>>> Hi Jiangning,
>>>> > >>>>
>>>> > >>>> Did you collect a57 perf results for SPEC
2000 and 2006 enabling
>>>> > >>>> these two flags?
>>>> > >>>>
>>>> > >>>> Did you notice any correctness and
performance regression?
>>>> > >>>>
>>>> > >>>>
>>>> > >>>>
>>>> > >>>>
>>>> > >>>>
>>>> > >>>> Thanks,
>>>> > >>>>
>>>> > >>>> Ana.
>>>> > >>>>
>>>> > >>>>
>>>> > >>>>
>>>> > >>>>
>>>> > >>>>
>>>> > >>>> Ana Pazos
>>>> > >>>> Qualcomm Innovation Center, Inc.
>>>> > >>>> The Qualcomm Innovation Center, Inc. is a
member of the Code
>>>> > >>>> Aurora Forum, a Linux Foundation
Collaborative Project.
>>>> > >>>>
>>>> > >>>>
>>>> > >>>>
>>>> > >>>>
>>>> > >>>>
>>>> > >>>>
>>>> > >>>
>>>> > >>> --
>>>> > >>> Hal Finkel
>>>> > >>> Assistant Computational Scientist
>>>> > >>> Leadership Computing Facility
>>>> > >>> Argonne National Laboratory
>>>> > >>
>>>> > >>
>>>> > >> -- IMPORTANT NOTICE: The contents of this email
and any attachments
>>>> > >> are confidential and may also be privileged. If
you are not the
>>>> > >> intended recipient, please notify the sender
immediately and do not
>>>> > >> disclose the contents to any other person, use it
for any purpose,
>>>> > >> or store or copy the information in any medium.
Thank you.
>>>> > >>
>>>> > >> ARM Limited, Registered office 110 Fulbourn Road,
Cambridge CB1
>>>> > >> 9NJ, Registered in England & Wales, Company
No: 2557590 ARM
>>>> > >> Holdings plc, Registered office 110 Fulbourn
Road, Cambridge CB1
>>>> > >> 9NJ, Registered in England & Wales, Company
No: 2548782
>>>> > >>
>>>> > >
>>>> > > --
>>>> > > Hal Finkel
>>>> > > Assistant Computational Scientist
>>>> > > Leadership Computing Facility
>>>> > > Argonne National Laboratory
>>>>
>>>> --
>>>> Hal Finkel
>>>> Assistant Computational Scientist
>>>> Leadership Computing Facility
>>>> Argonne National Laboratory
>>>>
>>>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150113/3a705123/attachment.html>

Daniel Berlin

2015-Jan-14 15:49 UTC

head link

[LLVMdev] question about enabling cfl-aa and collecting a57 numbers

On Tue, Jan 13, 2015 at 11:26 PM, Nick Lewycky <nlewycky at google.com>
wrote:
> On 13 January 2015 at 22:11, Daniel Berlin <dberlin at dberlin.org>
wrote:
>
>> This is caused by CFLAA returning PartialAlias for a query that BasicAA
>> can prove is NoAlias.
>>
>
> One of them is wrong. Which one?
>
>CFL-AA.

Right now it checks whether two things come out to be the same stratified
info and have the same index, and if so, returns PartialAlias. This is
wrong, because it never knows why two things got unified, only that they
did.

Therefore, the current check it does where it returns PartialAlias seems
wrong.

George, why does it return PartialAlias?

IMHO, you can't ever prove any kind of must-alias or partialalias info with
CFL-AA, unless i'm missing something.


> I'm not sure from your description that this is a chaining issue.
> PartialAlias doesn't chain and isn't supposed to, it's a final
answer just
> like NoAlias and MustAlias are. You return partial alias when you have
> proven that the accesses are overlapping, meaning that it is proven to
> neither be no alias nor must alias.
>Well, it's still chaining wrong, because it did not change on MayAlias, but
you are right that it is not the issue i identified :)

Anyway, updated patch attached.

> Nick
>
> I've attached a patch that fixes it, but truthfully, chaining seems
>> somewhat badly designed IMHO.
>>
>> It should let you pass along the result you've gotten so far, so
that you
>> don't have CFLAA get PartialAlias, chain, and have BasicAA get
MayAlias.
>>
>> (IE it should let it work strictly down the hierarchy).
>>
>> Note that this patch breaks the CFLAliasAnalysis tests because it no
>> longer returns PartialAlias in some cases it is expecting it.
>>
>> I don't have time ATM to fix chaining completely, so if someone
wants to
>> shepherd this patch through, that would be appreciated.
>>
>>
>>
>>
>>
>> On Tue, Jan 13, 2015 at 8:58 PM, Daniel Berlin <dberlin at
dberlin.org>
>> wrote:
>>
>>> Can you send me actual LLVM IR or a preprocessed source from using
-E?
>>> I don't have a machine handy that has headers that target that
arch.
>>>
>>>
>>> On Tue Jan 13 2015 at 4:33:29 PM Daniel Berlin <dberlin at
dberlin.org>
>>> wrote:
>>>
>>>> Anything other than noalias or mustalias should be getting
passed down
>>>> the stack, so either that is not happening or CFL aa is giving
better
>>>> answers and something does worse with those better answers. 
I'll take a
>>>> look this evening
>>>> On Jan 13, 2015 3:58 PM, "Ana Pazos" <apazos at
codeaurora.org> wrote:
>>>>
>>>>> Hi folks,
>>>>>
>>>>> Moving the discussion to llvm.dev.
>>>>>
>>>>> None of the changes we talked earlier help.
>>>>>
>>>>> Find attached the C source code that you can use to
reproduce the
>>>>> issue.
>>>>>
>>>>> clang --target=aarch64-linux-gnu  -c -mcpu=cortex-a57
-Ofast
>>>>>  -fno-math-errno   test.c -S -o test.s  -mllvm
-debug-only=licm
>>>>> LICM hoisting to while.body.lr.ph:   %21 = load double**
%arrayidx8,
>>>>> align 8, !tbaa !5
>>>>> LICM hoisting to while.body.lr.ph:   %arrayidx72 =
getelementptr
>>>>> inbounds double* %11, i64 1
>>>>> LICM hoisting to while.body.lr.ph:   %arrayidx81 =
getelementptr
>>>>> inbounds double* %11, i64 2
>>>>> LICM hoisting to for.body.lr.ph:   %2 = ptrtoint i32*
%Index to i64
>>>>>
>>>>> clang --target=aarch64-linux-gnu  -c   -mcpu=cortex-a57 
-Ofast
>>>>>  -fno-math-errno   test.c -S -o test-cflaa.s -mllvm
-use-cfl-aa -mllvm
>>>>> -debug-only=licm
>>>>> LICM hoisting to for.body.lr.ph:   %2 = ptrtoint i32*
%Index to i64
>>>>>
>>>>> Why CFL AA cannot allow hoisting this out:  %21 = load
double**
>>>>> %arrayidx8, align 8, !tbaa !5
>>>>>
>>>>> Which leads to this extra load in assembly code: ldr   
x14, [x4, x9,
>>>>> lsl #3]
>>>>>
>>>>> Thanks!
>>>>> Ana.
>>>>>
>>>>> -----Original Message-----
>>>>> From: Hal Finkel [mailto:hfinkel at anl.gov]
>>>>> Sent: Tuesday, January 13, 2015 12:51 AM
>>>>> To: Chandler Carruth
>>>>> Cc: Jiangning Liu; Pazos, Ana; Ana Pazos; Daniel Berlin;
George
>>>>> Burgess IV
>>>>> Subject: Re: question about enabling cfl-aa and collecting
a57 numbers
>>>>>
>>>>> ----- Original Message -----
>>>>> > From: "Chandler Carruth" <chandlerc at
gmail.com>
>>>>> > To: "Ana Pazos" <apazos at
codeaurora.org>, "Daniel Berlin" <
>>>>> dberlin at dberlin.org>, "George Burgess IV"
>>>>> > <george.burgess.iv at gmail.com>, "Hal
Finkel" <hfinkel at anl.gov>
>>>>> > Cc: "Jiangning Liu" <Jiangning.Liu at
arm.com>, "Ana Pazos"
>>>>> > <apazos at quicinc.com>
>>>>> > Sent: Monday, January 12, 2015 6:43:44 PM
>>>>> > Subject: Re: question about enabling cfl-aa and
collecting a57
>>>>> numbers
>>>>> >
>>>>> > Not quite:
>>>>> >
>>>>> >
>>>>> > On Mon Jan 12 2015 at 4:27:30 PM Ana Pazos < apazos
at codeaurora.org >
>>>>> > wrote:
>>>>> >
>>>>> > Thanks George and Daniel,
>>>>> >
>>>>> >
>>>>> >
>>>>> > If the recommended order is “–basicaa –cfla-aa” it
means we should
>>>>> fix
>>>>> > the trunk code that processes the flags -mllvm
use-cfl-aa-in-codegen
>>>>> > and –mllvm use-cfl-aa, right?
>>>>> >
>>>>> >
>>>>> >
>>>>> > In Transforms/IPO/PassManagerBuilder.cpp and
CodeGen/Passes.cpp we
>>>>> see
>>>>> > this sequence:
>>>>> >
>>>>> >
>>>>> >
>>>>> > if (UseCFLAA)
>>>>> >
>>>>> > PM.add(createCFLAliasAnalysisPass());
>>>>> >
>>>>> > PM.add(createTypeBasedAliasAnalysisPass());
>>>>> >
>>>>> > PM.add(createScopedNoAliasAAPass());
>>>>> >
>>>>> > PM.add(createBasicAliasAnalysisPass());
>>>>> >
>>>>> >
>>>>> >
>>>>> > So are you recommending changing to the sequence below
instead?
>>>>> >
>>>>> >
>>>>> > Not quite:
>>>>> >
>>>>> >
>>>>> > addPass(createTypeBasedAliasAnalysisPass());
>>>>> >
>>>>> > addPass(createScopedNoAliasAAPass());
>>>>> > You want it here.
>>>>> >
>>>>> > addPass(createBasicAliasAnalysisPass());
>>>>> >
>>>>> > if (UseCFLAA)
>>>>> >
>>>>> > addPass(createCFLAliasAnalysisPass());
>>>>> >
>>>>> >
>>>>> > The way to think about this is as follows:
>>>>> >
>>>>> >
>>>>> > if TBAA or ScopedNoAlias *can* return results, you
want them to. They
>>>>> > are reflecting a-priori knowledge of aliasing.
>>>>> >
>>>>> >
>>>>> > if both of those say "maybe", you want to
ask CFL. If CFL says
>>>>> "maybe"
>>>>> > you want to ask "BasicAA".
>>>>>
>>>>> I could very well be misunderstanding something, but I
recall that we
>>>>> add BasicAA last so that it will be queried first. We do
this because of
>>>>> our BasicAA delegation hack: We want a MustAlias from
BasicAA to override a
>>>>> NoAlias from TBAA, so that we can catch cases of basic
local type punning.
>>>>> Thus, assuming CFL delegates properly, I fail to see how
switching its
>>>>> order w.r.t. the metadata-based analysis changes anything.
>>>>>
>>>>>  -Hal
>>>>>
>>>>> >
>>>>> >
>>>>> > It's possible that you could as a compile-time
hack flip the last two
>>>>> > to your suggested ordering, but a) it should really
only be a compile
>>>>> > time hack, and b) we haven't really tested that
BasicAA actually
>>>>> > delegates rather than directly returning
"maybe".
>>>>> >
>>>>> >
>>>>> > Thanks,
>>>>> >
>>>>> > Ana.
>>>>> >
>>>>> >
>>>>> >
>>>>> > From: Daniel Berlin [mailto: dberlin at dberlin.org ]
>>>>> > Sent: Friday, January 09, 2015 6:41 PM
>>>>> > To: George Burgess IV; Hal Finkel
>>>>> > Cc: Ana Pazos; Pazos, Ana; Jiangning Liu; chandlerc
>>>>> >
>>>>> >
>>>>> > Subject: Re: question about enabling cfl-aa and
collecting a57
>>>>> numbers
>>>>> >
>>>>> >
>>>>> > Right. I would not try with -cfl-aa -basicaa as the
order, it should
>>>>> > be the other way around.
>>>>> >
>>>>> >
>>>>> > If you discover performance regressions with *that*,
that would be
>>>>> > highly interesting.
>>>>> >
>>>>> >
>>>>> > On Fri Jan 09 2015 at 6:32:28 PM George Burgess IV
<
>>>>> > george.burgess.iv at gmail.com > wrote:
>>>>> >
>>>>> >
>>>>> >
>>>>> > Thanks for the notes and observations Ana! :)
>>>>> >
>>>>> > >> Hal, what do you think? Does it meant that
cfl-aa is more
>>>>> > >> conservative (note I set the right order
-cfa-aa followed by
>>>>> > >> -basicaa)?
>>>>> > Currently, yes. The implementation as it is is focused
heavily on
>>>>> > being fast, and on trying to cover cases that BasicAA
can’t easily
>>>>> > cover itself. If our end goal is to use this as a
replacement for
>>>>> > BasicAA, there’s probably quite a few low-hanging
fruit for accuracy
>>>>> > improvements, and we can start playing with
sacrificing speed in
>>>>> > pursuit of greater accuracy. On the other hand, if the
end goal is to
>>>>> > append this pass to the list of AA passes already
being performed,
>>>>> > then I’d be interested in seeing the difference in
perf between
>>>>> > `-basicaa` and `-basicaa -cfl-aa`, if any.
>>>>> >
>>>>> > WRT this issue in particular, I believe CFLAA answers
conservatively
>>>>> > because:
>>>>> > A. it doesn’t take into account constant pointer
offsets (i.e. it
>>>>> > considers a[0]..a[n] all as the same “address”) B.
CFLAA is entirely
>>>>> > context insensitive
>>>>> >
>>>>> > IIRC, BasicAA is somewhat context sensitive, and it
does take into
>>>>> > account constant pointer offsets, so it can be more
accurate in this
>>>>> > case. That being said, I don’t have access to the SPEC
code, so I can
>>>>> > only speculate. :)
>>>>> >
>>>>> > George
>>>>> >
>>>>> > > On Jan 9, 2015, at 4:25 PM, Hal Finkel <
hfinkel at anl.gov > wrote:
>>>>> > >
>>>>> > > Ana,
>>>>> > >
>>>>> > > Thanks so much for looking into this. I'm
adding some additional
>>>>> > > relevant people to the CC line...
>>>>> > >
>>>>> > > -Hal
>>>>> > >
>>>>> > > ----- Original Message -----
>>>>> > >> From: "Ana Pazos" < apazos at
codeaurora.org >
>>>>> > >> To: "Ana Pazos" < apazos at
quicinc.com >, "Hal Finkel" <
>>>>> > >> hfinkel at anl.gov >, "Jiangning
Liu" < Jiangning.Liu at arm.com >
>>>>> > >> Sent: Friday, January 9, 2015 3:12:04 PM
>>>>> > >> Subject: RE: question about enabling cfl-aa
and collecting a57
>>>>> > >> numbers
>>>>> > >>
>>>>> > >> Hi Hal and Jiangning,
>>>>> > >>
>>>>> > >> I started to look at the effect of cfl-aa on
SPEC.
>>>>> > >>
>>>>> > >> The first observation is that I see more
redundant loads not being
>>>>> > >> hoisted out of loops by LICM.
>>>>> > >>
>>>>> > >> Due to license restrictions, I cannot paste
SPEC code here, but I
>>>>> > >> think you have access to it.
>>>>> > >>
>>>>> > >> I tried to simplify a top function in equake
(quake.c smvp) to
>>>>> show
>>>>> > >> the issue. See the attached reduced test
case.
>>>>> > >>
>>>>> > >> If you look at the the source code you will
see something like:
>>>>> > >> for {
>>>>> > >> ...load v[i][0]...
>>>>> > >> while {
>>>>> > >> ...load v[i][0]...
>>>>> > >> store w[col][0]....
>>>>> > >> }
>>>>> > >> ...load w[i][0]...
>>>>> > >> store w[i][0]...
>>>>> > >> }
>>>>> > >>
>>>>> > >> It should be possible to avoid load
&v[i][0], &v[i][1], &vi[i][2]
>>>>> > >> inside the while loop.
>>>>> > >>
>>>>> > >> In the simplified example that would be this
instruction:
>>>>> > >> %21 = load double** %arrayidx8, align 8,
!tbaa !5
>>>>> > >>
>>>>> > >> And the pointer for the load instruction:
>>>>> > >> %arrayidx8 = getelementptr inbounds double**
%v, i64 %idxprom
>>>>> > >>
>>>>> > >> The decision to hoist it fails at
LICM.cpp:189: AliasSet &AS >>>>> > >>
CurAST->getAliasSetForPointer(V, Size, AAInfo).isMod.
>>>>> > >>
>>>>> > >> With basicaa, only loads are in the alias
set, but with cfl-aa you
>>>>> > >> find loads and stores, so isMod returns true.
>>>>> > >>
>>>>> > >> AliasSet with basicaa:
>>>>> > >> AliasSet[0x2b35aa0, 7] may alias, Ref
Pointers: (double***
>>>>> > >> %arrayidx32, 8), (double** %12, 8), (double**
%arrayidx36, 8),
>>>>> > >> (double** %arrayidx8, 8), (double**
%arrayidx68, 8), (double**
>>>>> > >> %arrayidx77, 8), (double** %arrayidx85, 8)
>>>>> > >> $5 = void
>>>>> > >>
>>>>> > >> AliasSet with cfa-aa:
>>>>> > >> AliasSet[0x2b37df0, 20] may alias, Mod/Ref
Pointers: (i32*
>>>>> > >> %arrayidx30, 4), (double*** %arrayidx32, 8),
(double** %12, 8),
>>>>> > >> (double* %13, 8), (double** %arrayidx36, 8),
(double* %15, 8),
>>>>> > >> (double* %arrayidx42, 8), (double*
%arrayidx45, 8), (double*
>>>>> > >> %arrayidx51, 8), (double* %arrayidx54, 8),
(double** %arrayidx8,
>>>>> > >> 8), (double* %21, 8), (double** %arrayidx68,
8), (double* %23, 8),
>>>>> > >> (double* %arrayidx72, 8), (double**
%arrayidx77, 8), (double* %26,
>>>>> > >> 8), (double* %arrayidx81, 8), (double**
%arrayidx85, 8), (double*
>>>>> > >> %29, 8)
>>>>> > >> $8 = void
>>>>> > >>
>>>>> > >> You can reproduce it with the commands:
>>>>> > >> opt -O3 -S -cfl-aa -basicaa -licm
-debug-only=licm reduce.ll -o
>>>>> out
>>>>> > >>
>>>>> > >> opt -O3 -S - -basicaa -licm -debug-only=licm
reduce.ll -o out LICM
>>>>> > >> hoisting to while.body.lr.ph : %21 = load
double** %arrayidx8,
>>>>> > >> align 8, !tbaa !5 LICM hoisting to
while.body.lr.ph :
>>>>> %arrayidx72 >>>>> > >>
getelementptr inbounds double* %11, i64 1 LICM hoisting to
>>>>> > >> while.body.lr.ph : %arrayidx81 =
getelementptr inbounds double*
>>>>> > >> %11, i64 2
>>>>> > >>
>>>>> > >> Hal, what do you think? Does it meant that
cfl-aa is more
>>>>> > >> conservative (note I set the right order
-cfa-aa followed by
>>>>> > >> -basicaa)? Could it be an issue with building
the AliasSet
>>>>> tracker?
>>>>> > >> Any pointers on how to fix this?
>>>>> > >>
>>>>> > >> Thanks,
>>>>> > >> Ana.
>>>>> > >>
>>>>> > >>
>>>>> > >>
>>>>> > >> -----Original Message-----
>>>>> > >> From: Jiangning Liu [mailto: Jiangning.Liu at
arm.com ]
>>>>> > >> Sent: Wednesday, December 24, 2014 6:27 PM
>>>>> > >> To: Hal Finkel; Ana Pazos
>>>>> > >> Subject: RE: question about enabling cfl-aa
and collecting a57
>>>>> > >> numbers
>>>>> > >>
>>>>> > >> Hi,
>>>>> > >>
>>>>> > >> I saw big regressions on cortex-A57 for the
following SPEC
>>>>> > >> benchmarks after using "-mllvm
-use-cfl-aa ".
>>>>> > >>
>>>>> > >> spec.cpu2006.ref.462_libquantum 9.22%
spec.cpu2000.ref.179_art
>>>>> > >> 5.32%
>>>>> > >> spec.cpu2000.ref.256_bzip2 4.91%
>>>>> > >>
>>>>> > >> Thanks,
>>>>> > >> -Jiangning
>>>>> > >>
>>>>> > >>> -----Original Message-----
>>>>> > >>> From: Hal Finkel [mailto: hfinkel at
anl.gov ]
>>>>> > >>> Sent: Wednesday, December 24, 2014 3:56
AM
>>>>> > >>> To: Ana Pazos
>>>>> > >>> Cc: Jiangning Liu
>>>>> > >>> Subject: Re: question about enabling
cfl-aa and collecting a57
>>>>> > >>> numbers
>>>>> > >>>
>>>>> > >>> ----- Original Message -----
>>>>> > >>>> From: "Ana Pazos" <
apazos at codeaurora.org >
>>>>> > >>>> To: "Hal Finkel" <
hfinkel at anl.gov >, "Jiangning Liu"
>>>>> > >>>> < Jiangning.Liu at arm.com >
>>>>> > >>>> Sent: Monday, December 22, 2014
4:25:07 PM
>>>>> > >>>> Subject: question about enabling
cfl-aa and collecting a57
>>>>> > >>>> numbers
>>>>> > >>>>
>>>>> > >>>>
>>>>> > >>>>
>>>>> > >>>>
>>>>> > >>>> Hi Hal,
>>>>> > >>>>
>>>>> > >>>>
>>>>> > >>>>
>>>>> > >>>> Can you clarify to enable cfl-aa from
clang we need both these
>>>>> > >>>> flags?
>>>>> > >>>>
>>>>> > >>>>
>>>>> > >>>
>>>>> > >>> Hi Ana,
>>>>> > >>>
>>>>> > >>> Thanks for following-up on this! You only
need (-mllvm
>>>>> > >>> -use-cfl-aa)
>>>>> > >>> to
>>>>> > >>> use CFL AA during the main optimization
pipeline, and just this
>>>>> is
>>>>> > >>> enough to reproduce the performance
regressions as far as I know.
>>>>> > >>> You
>>>>> > >>> can also use CFL AA during code
generation (-mllvm
>>>>> > >>> -use-cfl-aa-in-codegen), which matters
only if your using AA at
>>>>> > >>> all for code generation (which AArch64
does only for the
>>>>> > >>> Cortex/A53).
>>>>> > >>>
>>>>> > >>> -Hal
>>>>> > >>>
>>>>> > >>>>
>>>>> > >>>> -mllvm -use-cfl-aa
>>>>> > >>>>
>>>>> > >>>> -mllvm -cfl-aa-in-codegen
>>>>> > >>>>
>>>>> > >>>>
>>>>> > >>>>
>>>>> > >>>> Hi Jiangning,
>>>>> > >>>>
>>>>> > >>>> Did you collect a57 perf results for
SPEC 2000 and 2006 enabling
>>>>> > >>>> these two flags?
>>>>> > >>>>
>>>>> > >>>> Did you notice any correctness and
performance regression?
>>>>> > >>>>
>>>>> > >>>>
>>>>> > >>>>
>>>>> > >>>>
>>>>> > >>>>
>>>>> > >>>> Thanks,
>>>>> > >>>>
>>>>> > >>>> Ana.
>>>>> > >>>>
>>>>> > >>>>
>>>>> > >>>>
>>>>> > >>>>
>>>>> > >>>>
>>>>> > >>>> Ana Pazos
>>>>> > >>>> Qualcomm Innovation Center, Inc.
>>>>> > >>>> The Qualcomm Innovation Center, Inc.
is a member of the Code
>>>>> > >>>> Aurora Forum, a Linux Foundation
Collaborative Project.
>>>>> > >>>>
>>>>> > >>>>
>>>>> > >>>>
>>>>> > >>>>
>>>>> > >>>>
>>>>> > >>>>
>>>>> > >>>
>>>>> > >>> --
>>>>> > >>> Hal Finkel
>>>>> > >>> Assistant Computational Scientist
>>>>> > >>> Leadership Computing Facility
>>>>> > >>> Argonne National Laboratory
>>>>> > >>
>>>>> > >>
>>>>> > >> -- IMPORTANT NOTICE: The contents of this
email and any
>>>>> attachments
>>>>> > >> are confidential and may also be privileged.
If you are not the
>>>>> > >> intended recipient, please notify the sender
immediately and do
>>>>> not
>>>>> > >> disclose the contents to any other person,
use it for any purpose,
>>>>> > >> or store or copy the information in any
medium. Thank you.
>>>>> > >>
>>>>> > >> ARM Limited, Registered office 110 Fulbourn
Road, Cambridge CB1
>>>>> > >> 9NJ, Registered in England & Wales,
Company No: 2557590 ARM
>>>>> > >> Holdings plc, Registered office 110 Fulbourn
Road, Cambridge CB1
>>>>> > >> 9NJ, Registered in England & Wales,
Company No: 2548782
>>>>> > >>
>>>>> > >
>>>>> > > --
>>>>> > > Hal Finkel
>>>>> > > Assistant Computational Scientist
>>>>> > > Leadership Computing Facility
>>>>> > > Argonne National Laboratory
>>>>>
>>>>> --
>>>>> Hal Finkel
>>>>> Assistant Computational Scientist
>>>>> Leadership Computing Facility
>>>>> Argonne National Laboratory
>>>>>
>>>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
>>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150114/0f4e5ff1/attachment.html>
-------------- next part --------------
diff --git a/lib/Analysis/CFLAliasAnalysis.cpp
b/lib/Analysis/CFLAliasAnalysis.cpp
index 8dfb87e..0fa8c6d 100644
--- a/lib/Analysis/CFLAliasAnalysis.cpp
+++ b/lib/Analysis/CFLAliasAnalysis.cpp
@@ -230,8 +230,9 @@ public:
       return AliasAnalysis::alias(LocA, LocB);
     }
     AliasResult QueryResult = query(LocA, LocB);
-    if (QueryResult == MayAlias|| QueryResult == PartialAlias)
+    if (QueryResult == MayAlias)
       return AliasAnalysis::alias(LocA, LocB);
+
     return QueryResult;
   }
 
@@ -995,7 +996,7 @@ CFLAliasAnalysis::query(const AliasAnalysis::Location
&LocA,
   auto SetB = *MaybeB;
 
   if (SetA.Index == SetB.Index)
-    return AliasAnalysis::PartialAlias;
+    return AliasAnalysis::MayAlias;
 
   auto AttrsA = Sets.getLink(SetA.Index).Attrs;
   auto AttrsB = Sets.getLink(SetB.Index).Attrs;

Daniel Berlin

2015-Jan-14 17:16 UTC

head link

[LLVMdev] question about enabling cfl-aa and collecting a57 numbers

On Wed, Jan 14, 2015 at 7:49 AM, Daniel Berlin <dberlin at dberlin.org>
wrote:
>
>
> On Tue, Jan 13, 2015 at 11:26 PM, Nick Lewycky <nlewycky at
google.com>
> wrote:
>
>> On 13 January 2015 at 22:11, Daniel Berlin <dberlin at
dberlin.org> wrote:
>>
>>> This is caused by CFLAA returning PartialAlias for a query that
BasicAA
>>> can prove is NoAlias.
>>>
>>
>> One of them is wrong. Which one?
>>
>
>>
> CFL-AA.
>
> Right now it checks whether two things come out to be the same stratified
> info and have the same index, and if so, returns PartialAlias. This is
> wrong, because it never knows why two things got unified, only that they
> did.
>
Someone asked what I meant by this, so, two things:
First, CFL-AA does not know whether the statements it processes will ever
be executed or where they occur in the function.. The results also do not
depend on the order in which statements are processed.

So even if it only sees one assignment and nothing else, and unifies those
two variable, the result is no different whether the assignment is in a
branch or whether it is the only statement in the function.

IE

int *a;
int *b;

a = b

and

if (c)
  a = b

should produce the same pointer-analysis result with CFL-AA (and all flow +
path -insensitive analysis).

If these are the only pointer assigning statements in the function, the
first is must-alias, the second is may-alias.
Because it cannot distinguish these two, IMHO, it can never return
must-alias (or partialalias).

Second, the process of pointer info unification is not and should not be
related to the process of assignments in practice (and thus, even if you
could distinguish the above, the fact that it believes two pointers
equivalent does not make them must-alias).

Variables may get unified because we lack info and it's the conservatively
safe thing to do, or even because we are trying to lower the time bounds
and decide losing precision is okay.

In short, unless i'm missing something drastic, i can't see how CFL-AA
can
ever produce must-alias or partial-alias info.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150114/ee060fa3/attachment.html>

George Burgess IV

2015-Jan-14 18:30 UTC

head link

[LLVMdev] question about enabling cfl-aa and collecting a57 numbers

Inline 

- George
> On Jan 14, 2015, at 10:49 AM, Daniel Berlin <dberlin at dberlin.org>
wrote:
> 
> 
> 
>> On Tue, Jan 13, 2015 at 11:26 PM, Nick Lewycky <nlewycky at
google.com> wrote:
>>> On 13 January 2015 at 22:11, Daniel Berlin <dberlin at
dberlin.org> wrote:
>>> This is caused by CFLAA returning PartialAlias for a query that
BasicAA can prove is NoAlias.
>> 
>> One of them is wrong. Which one? 
> CFL-AA.
> 
> Right now it checks whether two things come out to be the same stratified
info and have the same index, and if so, returns PartialAlias. This is wrong,
because it never knows why two things got unified, only that they did.
> 
> Therefore, the current check it does where it returns PartialAlias seems
wrong.
> 
> George, why does it return PartialAlias?It shouldn't -- unless I misinterpreted an earlier discussion, I was under
the impression that we were testing with that line updated to return MayAlias.
We do need to update that in LLVM though; thanks for the fix.
> IMHO, you can't ever prove any kind of must-alias or partialalias info
with CFL-AA, unless i'm missing something.Agreed. 
>  
>> I'm not sure from your description that this is a chaining issue.
PartialAlias doesn't chain and isn't supposed to, it's a final
answer just like NoAlias and MustAlias are. You return partial alias when you
have proven that the accesses are overlapping, meaning that it is proven to
neither be no alias nor must alias.
> Well, it's still chaining wrong, because it did not change on MayAlias,
but you are right that it is not the issue i identified :)
>  
> Anyway, updated patch attached.
> 
>> 
>> Nick
>> 
>>> I've attached a patch that fixes it, but truthfully, chaining
seems somewhat badly designed IMHO.
>>> 
>>> It should let you pass along the result you've gotten so far,
so that you don't have CFLAA get PartialAlias, chain, and have BasicAA get
MayAlias.
>>> 
>>> (IE it should let it work strictly down the hierarchy).
>>> 
>>> Note that this patch breaks the CFLAliasAnalysis tests because it
no longer returns PartialAlias in some cases it is expecting it.
>>> 
>>> I don't have time ATM to fix chaining completely, so if someone
wants to shepherd this patch through, that would be appreciated.
>>> 
>>> 
>>> 
>>> 
>>> 
>>>> On Tue, Jan 13, 2015 at 8:58 PM, Daniel Berlin <dberlin at
dberlin.org> wrote:
>>>> Can you send me actual LLVM IR or a preprocessed source from
using -E?
>>>> I don't have a machine handy that has headers that target
that arch.
>>>> 
>>>> 
>>>>> On Tue Jan 13 2015 at 4:33:29 PM Daniel Berlin <dberlin
at dberlin.org> wrote:
>>>>> Anything other than noalias or mustalias should be getting
passed down the stack, so either that is not happening or CFL aa is giving
better answers and something does worse with those better answers.  I'll
take a look this evening
>>>>> 
>>>>>> On Jan 13, 2015 3:58 PM, "Ana Pazos"
<apazos at codeaurora.org> wrote:
>>>>>> Hi folks,
>>>>>> 
>>>>>> Moving the discussion to llvm.dev.
>>>>>> 
>>>>>> None of the changes we talked earlier help.
>>>>>> 
>>>>>> Find attached the C source code that you can use to
reproduce the issue.
>>>>>> 
>>>>>> clang --target=aarch64-linux-gnu  -c -mcpu=cortex-a57
-Ofast   -fno-math-errno   test.c -S -o test.s  -mllvm -debug-only=licm
>>>>>> LICM hoisting to while.body.lr.ph:   %21 = load
double** %arrayidx8, align 8, !tbaa !5
>>>>>> LICM hoisting to while.body.lr.ph:   %arrayidx72 =
getelementptr inbounds double* %11, i64 1
>>>>>> LICM hoisting to while.body.lr.ph:   %arrayidx81 =
getelementptr inbounds double* %11, i64 2
>>>>>> LICM hoisting to for.body.lr.ph:   %2 = ptrtoint i32*
%Index to i64
>>>>>> 
>>>>>> clang --target=aarch64-linux-gnu  -c   -mcpu=cortex-a57
-Ofast   -fno-math-errno   test.c -S -o test-cflaa.s -mllvm -use-cfl-aa -mllvm
-debug-only=licm
>>>>>> LICM hoisting to for.body.lr.ph:   %2 = ptrtoint i32*
%Index to i64
>>>>>> 
>>>>>> Why CFL AA cannot allow hoisting this out:  %21 = load
double** %arrayidx8, align 8, !tbaa !5
>>>>>> 
>>>>>> Which leads to this extra load in assembly code: ldr   
x14, [x4, x9, lsl #3]
>>>>>> 
>>>>>> Thanks!
>>>>>> Ana.
>>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: Hal Finkel [mailto:hfinkel at anl.gov]
>>>>>> Sent: Tuesday, January 13, 2015 12:51 AM
>>>>>> To: Chandler Carruth
>>>>>> Cc: Jiangning Liu; Pazos, Ana; Ana Pazos; Daniel
Berlin; George Burgess IV
>>>>>> Subject: Re: question about enabling cfl-aa and
collecting a57 numbers
>>>>>> 
>>>>>> ----- Original Message -----
>>>>>> > From: "Chandler Carruth" <chandlerc
at gmail.com>
>>>>>> > To: "Ana Pazos" <apazos at
codeaurora.org>, "Daniel Berlin" <dberlin at dberlin.org>,
"George Burgess IV"
>>>>>> > <george.burgess.iv at gmail.com>, "Hal
Finkel" <hfinkel at anl.gov>
>>>>>> > Cc: "Jiangning Liu" <Jiangning.Liu at
arm.com>, "Ana Pazos"
>>>>>> > <apazos at quicinc.com>
>>>>>> > Sent: Monday, January 12, 2015 6:43:44 PM
>>>>>> > Subject: Re: question about enabling cfl-aa and
collecting a57 numbers
>>>>>> >
>>>>>> > Not quite:
>>>>>> >
>>>>>> >
>>>>>> > On Mon Jan 12 2015 at 4:27:30 PM Ana Pazos <
apazos at codeaurora.org >
>>>>>> > wrote:
>>>>>> >
>>>>>> > Thanks George and Daniel,
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > If the recommended order is “–basicaa –cfla-aa” it
means we should fix
>>>>>> > the trunk code that processes the flags -mllvm
use-cfl-aa-in-codegen
>>>>>> > and –mllvm use-cfl-aa, right?
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > In Transforms/IPO/PassManagerBuilder.cpp and
CodeGen/Passes.cpp we see
>>>>>> > this sequence:
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > if (UseCFLAA)
>>>>>> >
>>>>>> > PM.add(createCFLAliasAnalysisPass());
>>>>>> >
>>>>>> > PM.add(createTypeBasedAliasAnalysisPass());
>>>>>> >
>>>>>> > PM.add(createScopedNoAliasAAPass());
>>>>>> >
>>>>>> > PM.add(createBasicAliasAnalysisPass());
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > So are you recommending changing to the sequence
below instead?
>>>>>> >
>>>>>> >
>>>>>> > Not quite:
>>>>>> >
>>>>>> >
>>>>>> > addPass(createTypeBasedAliasAnalysisPass());
>>>>>> >
>>>>>> > addPass(createScopedNoAliasAAPass());
>>>>>> > You want it here.
>>>>>> >
>>>>>> > addPass(createBasicAliasAnalysisPass());
>>>>>> >
>>>>>> > if (UseCFLAA)
>>>>>> >
>>>>>> > addPass(createCFLAliasAnalysisPass());
>>>>>> >
>>>>>> >
>>>>>> > The way to think about this is as follows:
>>>>>> >
>>>>>> >
>>>>>> > if TBAA or ScopedNoAlias *can* return results, you
want them to. They
>>>>>> > are reflecting a-priori knowledge of aliasing.
>>>>>> >
>>>>>> >
>>>>>> > if both of those say "maybe", you want
to ask CFL. If CFL says "maybe"
>>>>>> > you want to ask "BasicAA".
>>>>>> 
>>>>>> I could very well be misunderstanding something, but I
recall that we add BasicAA last so that it will be queried first. We do this
because of our BasicAA delegation hack: We want a MustAlias from BasicAA to
override a NoAlias from TBAA, so that we can catch cases of basic local type
punning. Thus, assuming CFL delegates properly, I fail to see how switching its
order w.r.t. the metadata-based analysis changes anything.
>>>>>> 
>>>>>>  -Hal
>>>>>> 
>>>>>> >
>>>>>> >
>>>>>> > It's possible that you could as a compile-time
hack flip the last two
>>>>>> > to your suggested ordering, but a) it should
really only be a compile
>>>>>> > time hack, and b) we haven't really tested
that BasicAA actually
>>>>>> > delegates rather than directly returning
"maybe".
>>>>>> >
>>>>>> >
>>>>>> > Thanks,
>>>>>> >
>>>>>> > Ana.
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > From: Daniel Berlin [mailto: dberlin at
dberlin.org ]
>>>>>> > Sent: Friday, January 09, 2015 6:41 PM
>>>>>> > To: George Burgess IV; Hal Finkel
>>>>>> > Cc: Ana Pazos; Pazos, Ana; Jiangning Liu;
chandlerc
>>>>>> >
>>>>>> >
>>>>>> > Subject: Re: question about enabling cfl-aa and
collecting a57 numbers
>>>>>> >
>>>>>> >
>>>>>> > Right. I would not try with -cfl-aa -basicaa as
the order, it should
>>>>>> > be the other way around.
>>>>>> >
>>>>>> >
>>>>>> > If you discover performance regressions with
*that*, that would be
>>>>>> > highly interesting.
>>>>>> >
>>>>>> >
>>>>>> > On Fri Jan 09 2015 at 6:32:28 PM George Burgess IV
<
>>>>>> > george.burgess.iv at gmail.com > wrote:
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > Thanks for the notes and observations Ana! :)
>>>>>> >
>>>>>> > >> Hal, what do you think? Does it meant
that cfl-aa is more
>>>>>> > >> conservative (note I set the right order
-cfa-aa followed by
>>>>>> > >> -basicaa)?
>>>>>> > Currently, yes. The implementation as it is is
focused heavily on
>>>>>> > being fast, and on trying to cover cases that
BasicAA can’t easily
>>>>>> > cover itself. If our end goal is to use this as a
replacement for
>>>>>> > BasicAA, there’s probably quite a few low-hanging
fruit for accuracy
>>>>>> > improvements, and we can start playing with
sacrificing speed in
>>>>>> > pursuit of greater accuracy. On the other hand, if
the end goal is to
>>>>>> > append this pass to the list of AA passes already
being performed,
>>>>>> > then I’d be interested in seeing the difference in
perf between
>>>>>> > `-basicaa` and `-basicaa -cfl-aa`, if any.
>>>>>> >
>>>>>> > WRT this issue in particular, I believe CFLAA
answers conservatively
>>>>>> > because:
>>>>>> > A. it doesn’t take into account constant pointer
offsets (i.e. it
>>>>>> > considers a[0]..a[n] all as the same “address”) B.
CFLAA is entirely
>>>>>> > context insensitive
>>>>>> >
>>>>>> > IIRC, BasicAA is somewhat context sensitive, and
it does take into
>>>>>> > account constant pointer offsets, so it can be
more accurate in this
>>>>>> > case. That being said, I don’t have access to the
SPEC code, so I can
>>>>>> > only speculate. :)
>>>>>> >
>>>>>> > George
>>>>>> >
>>>>>> > > On Jan 9, 2015, at 4:25 PM, Hal Finkel <
hfinkel at anl.gov > wrote:
>>>>>> > >
>>>>>> > > Ana,
>>>>>> > >
>>>>>> > > Thanks so much for looking into this. I'm
adding some additional
>>>>>> > > relevant people to the CC line...
>>>>>> > >
>>>>>> > > -Hal
>>>>>> > >
>>>>>> > > ----- Original Message -----
>>>>>> > >> From: "Ana Pazos" < apazos
at codeaurora.org >
>>>>>> > >> To: "Ana Pazos" < apazos at
quicinc.com >, "Hal Finkel" <
>>>>>> > >> hfinkel at anl.gov >, "Jiangning
Liu" < Jiangning.Liu at arm.com >
>>>>>> > >> Sent: Friday, January 9, 2015 3:12:04 PM
>>>>>> > >> Subject: RE: question about enabling
cfl-aa and collecting a57
>>>>>> > >> numbers
>>>>>> > >>
>>>>>> > >> Hi Hal and Jiangning,
>>>>>> > >>
>>>>>> > >> I started to look at the effect of cfl-aa
on SPEC.
>>>>>> > >>
>>>>>> > >> The first observation is that I see more
redundant loads not being
>>>>>> > >> hoisted out of loops by LICM.
>>>>>> > >>
>>>>>> > >> Due to license restrictions, I cannot
paste SPEC code here, but I
>>>>>> > >> think you have access to it.
>>>>>> > >>
>>>>>> > >> I tried to simplify a top function in
equake (quake.c smvp) to show
>>>>>> > >> the issue. See the attached reduced test
case.
>>>>>> > >>
>>>>>> > >> If you look at the the source code you
will see something like:
>>>>>> > >> for {
>>>>>> > >> ...load v[i][0]...
>>>>>> > >> while {
>>>>>> > >> ...load v[i][0]...
>>>>>> > >> store w[col][0]....
>>>>>> > >> }
>>>>>> > >> ...load w[i][0]...
>>>>>> > >> store w[i][0]...
>>>>>> > >> }
>>>>>> > >>
>>>>>> > >> It should be possible to avoid load
&v[i][0], &v[i][1], &vi[i][2]
>>>>>> > >> inside the while loop.
>>>>>> > >>
>>>>>> > >> In the simplified example that would be
this instruction:
>>>>>> > >> %21 = load double** %arrayidx8, align 8,
!tbaa !5
>>>>>> > >>
>>>>>> > >> And the pointer for the load instruction:
>>>>>> > >> %arrayidx8 = getelementptr inbounds
double** %v, i64 %idxprom
>>>>>> > >>
>>>>>> > >> The decision to hoist it fails at
LICM.cpp:189: AliasSet &AS >>>>>> > >>
CurAST->getAliasSetForPointer(V, Size, AAInfo).isMod.
>>>>>> > >>
>>>>>> > >> With basicaa, only loads are in the alias
set, but with cfl-aa you
>>>>>> > >> find loads and stores, so isMod returns
true.
>>>>>> > >>
>>>>>> > >> AliasSet with basicaa:
>>>>>> > >> AliasSet[0x2b35aa0, 7] may alias, Ref
Pointers: (double***
>>>>>> > >> %arrayidx32, 8), (double** %12, 8),
(double** %arrayidx36, 8),
>>>>>> > >> (double** %arrayidx8, 8), (double**
%arrayidx68, 8), (double**
>>>>>> > >> %arrayidx77, 8), (double** %arrayidx85,
8)
>>>>>> > >> $5 = void
>>>>>> > >>
>>>>>> > >> AliasSet with cfa-aa:
>>>>>> > >> AliasSet[0x2b37df0, 20] may alias,
Mod/Ref Pointers: (i32*
>>>>>> > >> %arrayidx30, 4), (double*** %arrayidx32,
8), (double** %12, 8),
>>>>>> > >> (double* %13, 8), (double** %arrayidx36,
8), (double* %15, 8),
>>>>>> > >> (double* %arrayidx42, 8), (double*
%arrayidx45, 8), (double*
>>>>>> > >> %arrayidx51, 8), (double* %arrayidx54,
8), (double** %arrayidx8,
>>>>>> > >> 8), (double* %21, 8), (double**
%arrayidx68, 8), (double* %23, 8),
>>>>>> > >> (double* %arrayidx72, 8), (double**
%arrayidx77, 8), (double* %26,
>>>>>> > >> 8), (double* %arrayidx81, 8), (double**
%arrayidx85, 8), (double*
>>>>>> > >> %29, 8)
>>>>>> > >> $8 = void
>>>>>> > >>
>>>>>> > >> You can reproduce it with the commands:
>>>>>> > >> opt -O3 -S -cfl-aa -basicaa -licm
-debug-only=licm reduce.ll -o out
>>>>>> > >>
>>>>>> > >> opt -O3 -S - -basicaa -licm
-debug-only=licm reduce.ll -o out LICM
>>>>>> > >> hoisting to while.body.lr.ph : %21 = load
double** %arrayidx8,
>>>>>> > >> align 8, !tbaa !5 LICM hoisting to
while.body.lr.ph : %arrayidx72 >>>>>> > >>
getelementptr inbounds double* %11, i64 1 LICM hoisting to
>>>>>> > >> while.body.lr.ph : %arrayidx81 =
getelementptr inbounds double*
>>>>>> > >> %11, i64 2
>>>>>> > >>
>>>>>> > >> Hal, what do you think? Does it meant
that cfl-aa is more
>>>>>> > >> conservative (note I set the right order
-cfa-aa followed by
>>>>>> > >> -basicaa)? Could it be an issue with
building the AliasSet tracker?
>>>>>> > >> Any pointers on how to fix this?
>>>>>> > >>
>>>>>> > >> Thanks,
>>>>>> > >> Ana.
>>>>>> > >>
>>>>>> > >>
>>>>>> > >>
>>>>>> > >> -----Original Message-----
>>>>>> > >> From: Jiangning Liu [mailto:
Jiangning.Liu at arm.com ]
>>>>>> > >> Sent: Wednesday, December 24, 2014 6:27
PM
>>>>>> > >> To: Hal Finkel; Ana Pazos
>>>>>> > >> Subject: RE: question about enabling
cfl-aa and collecting a57
>>>>>> > >> numbers
>>>>>> > >>
>>>>>> > >> Hi,
>>>>>> > >>
>>>>>> > >> I saw big regressions on cortex-A57 for
the following SPEC
>>>>>> > >> benchmarks after using "-mllvm
-use-cfl-aa ".
>>>>>> > >>
>>>>>> > >> spec.cpu2006.ref.462_libquantum 9.22%
spec.cpu2000.ref.179_art
>>>>>> > >> 5.32%
>>>>>> > >> spec.cpu2000.ref.256_bzip2 4.91%
>>>>>> > >>
>>>>>> > >> Thanks,
>>>>>> > >> -Jiangning
>>>>>> > >>
>>>>>> > >>> -----Original Message-----
>>>>>> > >>> From: Hal Finkel [mailto: hfinkel at
anl.gov ]
>>>>>> > >>> Sent: Wednesday, December 24, 2014
3:56 AM
>>>>>> > >>> To: Ana Pazos
>>>>>> > >>> Cc: Jiangning Liu
>>>>>> > >>> Subject: Re: question about enabling
cfl-aa and collecting a57
>>>>>> > >>> numbers
>>>>>> > >>>
>>>>>> > >>> ----- Original Message -----
>>>>>> > >>>> From: "Ana Pazos" <
apazos at codeaurora.org >
>>>>>> > >>>> To: "Hal Finkel" <
hfinkel at anl.gov >, "Jiangning Liu"
>>>>>> > >>>> < Jiangning.Liu at arm.com
>
>>>>>> > >>>> Sent: Monday, December 22, 2014
4:25:07 PM
>>>>>> > >>>> Subject: question about enabling
cfl-aa and collecting a57
>>>>>> > >>>> numbers
>>>>>> > >>>>
>>>>>> > >>>>
>>>>>> > >>>>
>>>>>> > >>>>
>>>>>> > >>>> Hi Hal,
>>>>>> > >>>>
>>>>>> > >>>>
>>>>>> > >>>>
>>>>>> > >>>> Can you clarify to enable cfl-aa
from clang we need both these
>>>>>> > >>>> flags?
>>>>>> > >>>>
>>>>>> > >>>>
>>>>>> > >>>
>>>>>> > >>> Hi Ana,
>>>>>> > >>>
>>>>>> > >>> Thanks for following-up on this! You
only need (-mllvm
>>>>>> > >>> -use-cfl-aa)
>>>>>> > >>> to
>>>>>> > >>> use CFL AA during the main
optimization pipeline, and just this is
>>>>>> > >>> enough to reproduce the performance
regressions as far as I know.
>>>>>> > >>> You
>>>>>> > >>> can also use CFL AA during code
generation (-mllvm
>>>>>> > >>> -use-cfl-aa-in-codegen), which
matters only if your using AA at
>>>>>> > >>> all for code generation (which
AArch64 does only for the
>>>>>> > >>> Cortex/A53).
>>>>>> > >>>
>>>>>> > >>> -Hal
>>>>>> > >>>
>>>>>> > >>>>
>>>>>> > >>>> -mllvm -use-cfl-aa
>>>>>> > >>>>
>>>>>> > >>>> -mllvm -cfl-aa-in-codegen
>>>>>> > >>>>
>>>>>> > >>>>
>>>>>> > >>>>
>>>>>> > >>>> Hi Jiangning,
>>>>>> > >>>>
>>>>>> > >>>> Did you collect a57 perf results
for SPEC 2000 and 2006 enabling
>>>>>> > >>>> these two flags?
>>>>>> > >>>>
>>>>>> > >>>> Did you notice any correctness
and performance regression?
>>>>>> > >>>>
>>>>>> > >>>>
>>>>>> > >>>>
>>>>>> > >>>>
>>>>>> > >>>>
>>>>>> > >>>> Thanks,
>>>>>> > >>>>
>>>>>> > >>>> Ana.
>>>>>> > >>>>
>>>>>> > >>>>
>>>>>> > >>>>
>>>>>> > >>>>
>>>>>> > >>>>
>>>>>> > >>>> Ana Pazos
>>>>>> > >>>> Qualcomm Innovation Center, Inc.
>>>>>> > >>>> The Qualcomm Innovation Center,
Inc. is a member of the Code
>>>>>> > >>>> Aurora Forum, a Linux Foundation
Collaborative Project.
>>>>>> > >>>>
>>>>>> > >>>>
>>>>>> > >>>>
>>>>>> > >>>>
>>>>>> > >>>>
>>>>>> > >>>>
>>>>>> > >>>
>>>>>> > >>> --
>>>>>> > >>> Hal Finkel
>>>>>> > >>> Assistant Computational Scientist
>>>>>> > >>> Leadership Computing Facility
>>>>>> > >>> Argonne National Laboratory
>>>>>> > >>
>>>>>> > >>
>>>>>> > >> -- IMPORTANT NOTICE: The contents of this
email and any attachments
>>>>>> > >> are confidential and may also be
privileged. If you are not the
>>>>>> > >> intended recipient, please notify the
sender immediately and do not
>>>>>> > >> disclose the contents to any other
person, use it for any purpose,
>>>>>> > >> or store or copy the information in any
medium. Thank you.
>>>>>> > >>
>>>>>> > >> ARM Limited, Registered office 110
Fulbourn Road, Cambridge CB1
>>>>>> > >> 9NJ, Registered in England & Wales,
Company No: 2557590 ARM
>>>>>> > >> Holdings plc, Registered office 110
Fulbourn Road, Cambridge CB1
>>>>>> > >> 9NJ, Registered in England & Wales,
Company No: 2548782
>>>>>> > >>
>>>>>> > >
>>>>>> > > --
>>>>>> > > Hal Finkel
>>>>>> > > Assistant Computational Scientist
>>>>>> > > Leadership Computing Facility
>>>>>> > > Argonne National Laboratory
>>>>>> 
>>>>>> --
>>>>>> Hal Finkel
>>>>>> Assistant Computational Scientist
>>>>>> Leadership Computing Facility
>>>>>> Argonne National Laboratory
>>> 
>>> 
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> 
> <cflaafix.diff>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150114/47d076ab/attachment.html>

Apparently Analagous Threads

Search for more possibly parallel threads

llvm dev - Jan 2015 - [LLVMdev] question about enabling cfl-aa and collecting a57 numbers

[LLVMdev] question about enabling cfl-aa and collecting a57 numbers

[LLVMdev] question about enabling cfl-aa and collecting a57 numbers

[LLVMdev] question about enabling cfl-aa and collecting a57 numbers

[LLVMdev] question about enabling cfl-aa and collecting a57 numbers

Apparently Analagous Threads