On CINT2006 ARM64/ref input/lto+pgo I measure practically no performance difference for the 7 benchmarks that compile. This includes bzip2 (although a different source base than in CINT2000), mcf, hmmer, sjeng, h264ref, astar, xalancbmk.

On Sep 15, 2014, at 11:59 AM, Hal Finkel <hfinkel at anl.gov> wrote:

> ----- Original Message -----
>> From: "Gerolf Hoflehner" <ghoflehner at apple.com>
>> To: "Jiangning Liu" <liujiangning1 at gmail.com>, "George Burgess IV" <george.burgess.iv at gmail.com>, "Hal Finkel" <hfinkel at anl.gov>
>> Cc: "LLVM Dev" <llvmdev at cs.uiuc.edu>
>> Sent: Sunday, September 14, 2014 12:15:02 AM
>> Subject: Re: [LLVMdev] Testing the new CFL alias analysis
>>
>> In lto+pgo, some of the CINT2006 benchmarks don't compile (5 out of 12, with usual suspects like perlbench and gcc among them, using -flto -Wl,-mllvm,-use-cfl-aa -Wl,-mllvm,-use-cfl-aa-in-codegen).
>
> On what platform? Could you bugpoint it and file a report?

Ok, I'll see that I can get a small test case.

>> Has the implementation been tested with lto?
>
> I've not.
>
>> If not, please stress the implementation more.
>> Do we know reasons for gains? Where did you expect the biggest gains?
>
> I don't want to make a global statement here. My expectation is that we'll see wins from increasing register pressure ;) -- hoisting more loads out of loops (there are certainly cases involving multiple levels of dereferencing and insert/extract instructions where CFL can provide a NoAlias answer where BasicAA gives up). Obviously, we'll also have problems if we increase pressure too much.

Maybe. But I'd prefer the OoO HW to handle hoisting; it is hard to tune in the compiler.
I'm also curious about the impact on loop transformations.

>> Some of the losses will likely boil down to increased register pressure.
>
> Agreed.
>>
>> Looks like the current performance numbers pose a good challenge for gaining new and refreshing insights into our heuristics (and for smoothing out the implementation along the way).
>
> It certainly seems that way.
>
> Thanks again,
> Hal
>
>> Cheers
>> Gerolf
>>
>> On Sep 12, 2014, at 1:27 AM, Jiangning Liu <liujiangning1 at gmail.com> wrote:
>>
>> Hi Hal,
>>
>> I ran SPEC2000 on cortex-a57 (AArch64) and got the following results
>> (these measure run-time change; negative is better performance):
>>
>> spec.cpu2000.ref.183_equake    33.77%
>> spec.cpu2000.ref.179_art       13.44%
>> spec.cpu2000.ref.256_bzip2      7.80%
>> spec.cpu2000.ref.186_crafty     3.69%
>> spec.cpu2000.ref.175_vpr        2.96%
>> spec.cpu2000.ref.176_gcc        1.77%
>> spec.cpu2000.ref.252_eon        1.77%
>> spec.cpu2000.ref.254_gap        1.19%
>> spec.cpu2000.ref.197_parser     1.15%
>> spec.cpu2000.ref.253_perlbmk    1.11%
>> spec.cpu2000.ref.300_twolf     -1.04%
>>
>> So we can see almost all got worse performance.
>>
>> The command line options I'm using are "-O3 -std=gnu89 -ffast-math -fslp-vectorize -fvectorize -mcpu=cortex-a57 -mllvm -use-cfl-aa -mllvm -use-cfl-aa-in-codegen"
>>
>> I didn't try compile time, and I think your test on a POWER7 native build should already mean something for other hosts. Also, I don't have a good benchmark suite for compile-time testing. My past experience showed that both the llvm-test-suite (single/multiple source) and the SPEC benchmarks are not good benchmarks for compile-time testing.
>>
>> Thanks,
>> -Jiangning
>>
>> 2014-09-04 1:11 GMT+08:00 Hal Finkel <hfinkel at anl.gov>:
>>
>> Hello everyone,
>>
>> One of Google's summer interns, George Burgess IV, created an implementation of the CFL pointer-aliasing analysis algorithm, and this has now been added to LLVM trunk. Now we should determine whether it is worthwhile adding this to the default optimization pipeline.
>> For ease of testing, I've added the command line option -use-cfl-aa, which will cause the CFL analysis to be added to the optimization pipeline. This can be used with the opt program, and also via Clang by passing: -mllvm -use-cfl-aa.
>>
>> For the purpose of testing with those targets that make use of aliasing analysis during code generation, there is also a corresponding -use-cfl-aa-in-codegen option.
>>
>> Running the test suite on one of our IBM POWER7 systems (comparing -O3 -mcpu=native to -O3 -mcpu=native -mllvm -use-cfl-aa -mllvm -use-cfl-aa-in-codegen [testing without use in code generation was essentially the same]), I see no significant compile-time changes, and the following performance results:
>>
>> speedup:
>> MultiSource/Benchmarks/mafft/pairlocalalign: -11.5862% +/- 5.9257%
>>
>> slowdown:
>> MultiSource/Benchmarks/FreeBench/neural/neural: 158.679% +/- 22.3212%
>> MultiSource/Benchmarks/MiBench/consumer-typeset/consumer-typeset: 0.627176% +/- 0.290698%
>> MultiSource/Benchmarks/Ptrdist/ks/ks: 57.5457% +/- 21.8869%
>>
>> I ran the test suite 20 times in each configuration, using make -j48 each time, so I'll only pick up large changes. I've not yet investigated the cause of the slowdowns (or the speedup), and I really need people to try this on x86, ARM, etc. It appears, however, that the better aliasing analysis results might have some negative unintended consequences, and we'll need to look at those closely.
>>
>> Please let me know how this fares on your systems!
>>
>> Thanks again,
>> Hal
>>
>> --
>> Hal Finkel
>> Assistant Computational Scientist
>> Leadership Computing Facility
>> Argonne National Laboratory
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
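Hal's remark above about hoisting loads out of loops when the analysis can prove NoAlias can be illustrated with a small hypothetical C fragment. The struct names and function here are invented for illustration and do not come from the thread; they merely show the shape of code with multiple levels of dereferencing where a conservative analysis must assume the store could modify the pointed-to data:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: with two levels of dereferencing, a conservative
 * alias analysis may assume the store through 'dst' can modify
 * p->data->scale, so both loads must be repeated on every iteration.
 * If the analysis can prove NoAlias, the compiler may hoist the loads
 * out of the loop -- keeping the value live in a register for the
 * whole loop, which is where the register-pressure trade-off
 * discussed above comes from. */
struct inner { long scale; };
struct outer { struct inner *data; };

long scale_and_sum(const struct outer *p, long *dst,
                   const long *src, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; ++i) {
        /* p->data->scale is loop-invariant only if dst[i] can be
         * proven not to alias p->data or p->data->scale */
        dst[i] = src[i] * p->data->scale;
        sum += dst[i];
    }
    return sum;
}
```

Whether hoisting pays off then depends on exactly the trade-off discussed in the thread: fewer loads per iteration versus one more value held live across the loop body.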
I filed bugzilla pr20954.

-Gerolf

On Sep 15, 2014, at 2:56 PM, Gerolf Hoflehner <ghoflehner at apple.com> wrote:

> On CINT2006 ARM64/ref input/lto+pgo I practically measure no performance difference for the 7 benchmarks that compile. This includes bzip2 (although different source base than in CINT2000), mcf, hmmer, sjeng, h264ref, astar, xalancbmk
> [snip]
----- Original Message -----
> From: "Gerolf Hoflehner" <ghoflehner at apple.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "LLVM Dev" <llvmdev at cs.uiuc.edu>, "Jiangning Liu" <liujiangning1 at gmail.com>, "George Burgess IV" <george.burgess.iv at gmail.com>
> Sent: Monday, September 15, 2014 7:58:59 PM
> Subject: Re: [LLVMdev] Testing the new CFL alias analysis
>
> I filed bugzilla pr20954.

Thanks!

 -Hal

> [snip]

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
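For reference, the flag combinations mentioned throughout the thread can be collected into a few shell invocations. This is a sketch assuming a Clang/LLVM build from that era (the only flags used are the ones quoted in the thread); `input.ll` and `foo.c` are placeholder file names:

```shell
# IR-level testing with the opt tool:
opt -O3 -use-cfl-aa input.ll -S -o output.ll

# Via Clang, middle-end only:
clang -O3 -mllvm -use-cfl-aa foo.c -o foo

# Also using CFL-AA during code generation:
clang -O3 -mllvm -use-cfl-aa -mllvm -use-cfl-aa-in-codegen foo.c -o foo

# With LTO, the -mllvm flags must be forwarded through the linker,
# since optimization happens at link time:
clang -flto -O3 -Wl,-mllvm,-use-cfl-aa -Wl,-mllvm,-use-cfl-aa-in-codegen foo.c -o foo
```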