In lto+pgo, some of the CINT2006 benchmarks don't compile (5 out of 12, with usual suspects like perlbench and gcc among them, using -flto -Wl,-mllvm,-use-cfl-aa -Wl,-mllvm,-use-cfl-aa-in-codegen). Has the implementation been tested with lto? If not, please stress the implementation more.

Do we know reasons for gains? Where did you expect the biggest gains? Some of the losses will likely boil down to increased register pressure.

Looks like the current performance numbers pose a good challenge for gaining new and refreshing insights into our heuristics (and for smoothing out the implementation along the way).

Cheers
Gerolf

On Sep 12, 2014, at 1:27 AM, Jiangning Liu <liujiangning1 at gmail.com> wrote:

> Hi Hal,
>
> I ran SPEC2000 on a Cortex-A57 (AArch64) and got the following results
> (these measure the change in run time; negative means better performance):
>
> spec.cpu2000.ref.183_equake    33.77%
> spec.cpu2000.ref.179_art       13.44%
> spec.cpu2000.ref.256_bzip2      7.80%
> spec.cpu2000.ref.186_crafty     3.69%
> spec.cpu2000.ref.175_vpr        2.96%
> spec.cpu2000.ref.176_gcc        1.77%
> spec.cpu2000.ref.252_eon        1.77%
> spec.cpu2000.ref.254_gap        1.19%
> spec.cpu2000.ref.197_parser     1.15%
> spec.cpu2000.ref.253_perlbmk    1.11%
> spec.cpu2000.ref.300_twolf     -1.04%
>
> So we can see almost all of them got worse performance.
>
> The command line options I'm using are "-O3 -std=gnu89 -ffast-math
> -fslp-vectorize -fvectorize -mcpu=cortex-a57 -mllvm -use-cfl-aa
> -mllvm -use-cfl-aa-in-codegen".
>
> I didn't try compile time, and I think your test on a POWER7 native build
> should already mean something for other hosts. Also, I don't have a good
> benchmark suite for compile-time testing. My past experience showed that
> neither the llvm-test-suite (single/multiple) nor the SPEC benchmarks are
> good for compile-time testing.
>
> Thanks,
> -Jiangning
>
> 2014-09-04 1:11 GMT+08:00 Hal Finkel <hfinkel at anl.gov>:
>
> Hello everyone,
>
> One of Google's summer interns, George Burgess IV, created an
> implementation of the CFL pointer-aliasing analysis algorithm, and this
> has now been added to LLVM trunk. Now we should determine whether it is
> worthwhile adding this to the default optimization pipeline. For ease of
> testing, I've added the command line option -use-cfl-aa, which will cause
> the CFL analysis to be added to the optimization pipeline. This can be
> used with the opt program, and also via Clang by passing: -mllvm
> -use-cfl-aa.
>
> For the purpose of testing with those targets that make use of aliasing
> analysis during code generation, there is also a corresponding
> -use-cfl-aa-in-codegen option.
>
> Running the test suite on one of our IBM POWER7 systems (comparing -O3
> -mcpu=native to -O3 -mcpu=native -mllvm -use-cfl-aa -mllvm
> -use-cfl-aa-in-codegen [testing without use in code generation was
> essentially the same]), I see no significant compile-time changes, and
> the following performance results:
>
> speedup:
> MultiSource/Benchmarks/mafft/pairlocalalign: -11.5862% +/- 5.9257%
>
> slowdown:
> MultiSource/Benchmarks/FreeBench/neural/neural: 158.679% +/- 22.3212%
> MultiSource/Benchmarks/MiBench/consumer-typeset/consumer-typeset: 0.627176% +/- 0.290698%
> MultiSource/Benchmarks/Ptrdist/ks/ks: 57.5457% +/- 21.8869%
>
> I ran the test suite 20 times in each configuration, using make -j48 each
> time, so I'll only pick up large changes. I've not yet investigated the
> cause of the slowdowns (or the speedup), and I really need people to try
> this on x86, ARM, etc. It appears, however, that the better aliasing
> analysis results might have some negative unintended consequences, and
> we'll need to look at those closely.
>
> Please let me know how this fares on your systems!
>
> Thanks again,
> Hal
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
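For reference, the invocations discussed in this thread, gathered in one place. This is a sketch: the input file names are placeholders, while the flags themselves are the ones quoted above.

```
# Standalone opt run with the CFL analysis added to the pipeline:
opt -use-cfl-aa -O3 input.ll -S -o output.ll

# Via Clang, optionally also using CFL results during code generation:
clang -O3 -mllvm -use-cfl-aa -mllvm -use-cfl-aa-in-codegen test.c -o test

# With LTO, -mllvm options must be forwarded to the linker's LLVM backend:
clang -O3 -flto -Wl,-mllvm,-use-cfl-aa -Wl,-mllvm,-use-cfl-aa-in-codegen test.c -o test
```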
----- Original Message -----
> From: "Gerolf Hoflehner" <ghoflehner at apple.com>
> To: "Jiangning Liu" <liujiangning1 at gmail.com>, "George Burgess IV"
> <george.burgess.iv at gmail.com>, "Hal Finkel" <hfinkel at anl.gov>
> Cc: "LLVM Dev" <llvmdev at cs.uiuc.edu>
> Sent: Sunday, September 14, 2014 12:15:02 AM
> Subject: Re: [LLVMdev] Testing the new CFL alias analysis
>
> In lto+pgo, some of the CINT2006 benchmarks don't compile (5 out of 12,
> with usual suspects like perlbench and gcc among them, using -flto
> -Wl,-mllvm,-use-cfl-aa -Wl,-mllvm,-use-cfl-aa-in-codegen).

On what platform? Could you bugpoint it and file a report?

> Has the implementation been tested with lto?

I've not.

> If not, please stress the implementation more.
> Do we know reasons for gains? Where did you expect the biggest gains?

I don't want to make a global statement here. My expectation is that we'll
see wins from increasing register pressure ;) -- hoisting more loads out
of loops (there are certainly cases involving multiple levels of
dereferencing and insert/extract instructions where CFL can provide a
NoAlias answer where BasicAA gives up). Obviously, we'll also have
problems if we increase pressure too much.

> Some of the losses will likely boil down to increased register pressure.

Agreed.

> Looks like the current performance numbers pose a good challenge for
> gaining new and refreshing insights into our heuristics (and for
> smoothing out the implementation along the way).

It certainly seems that way.
Thanks again,
Hal

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
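The multiple-levels-of-dereferencing case mentioned above can be illustrated with a small C sketch. This example (the `struct vec`, `running_sum`, and all names are invented for illustration, not taken from any benchmark) shows a loop-invariant load that can only be hoisted if the analysis proves the store through `out` never modifies `v->data`; with the extra level of indirection, a basic analysis may give up where a stronger one could answer NoAlias.

```c
#include <assert.h>

/* Hypothetical sketch: v->data is loop-invariant, but hoisting its load
   out of the loop requires proving that the store through out cannot
   overwrite the v->data pointer itself. */
struct vec {
    double *data;
};

double running_sum(struct vec *v, double *out, int n) {
    double acc = 0.0;
    for (int i = 0; i < n; ++i) {
        acc += v->data[i]; /* without a NoAlias answer, v->data must be
                              reloaded on every iteration */
        *out = acc;        /* store that might (conservatively) clobber
                              v->data */
    }
    return acc;
}
```

Whether the load is actually hoisted depends on the AA pipeline in use; the point is only that proving the two accesses disjoint is what makes the hoist legal, and making the hoist legal is also what can raise register pressure.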
On CINT2006 ARM64/ref input/lto+pgo I measure practically no performance
difference for the 7 benchmarks that compile. This includes bzip2
(although a different source base than in CINT2000), mcf, hmmer, sjeng,
h264ref, astar, and xalancbmk.

On Sep 15, 2014, at 11:59 AM, Hal Finkel <hfinkel at anl.gov> wrote:

>> In lto+pgo, some of the CINT2006 benchmarks don't compile (5 out of 12,
>> with usual suspects like perlbench and gcc among them, using -flto
>> -Wl,-mllvm,-use-cfl-aa -Wl,-mllvm,-use-cfl-aa-in-codegen).
>
> On what platform? Could you bugpoint it and file a report?

Ok, I'll see that I can get a small test case.

>> Has the implementation been tested with lto?
>
> I've not.
>
>> If not, please stress the implementation more.
>> Do we know reasons for gains? Where did you expect the biggest gains?
>
> I don't want to make a global statement here. My expectation is that
> we'll see wins from increasing register pressure ;) -- hoisting more
> loads out of loops (there are certainly cases involving multiple levels
> of dereferencing and insert/extract instructions where CFL can provide
> a NoAlias answer where BasicAA gives up). Obviously, we'll also have
> problems if we increase pressure too much.

Maybe. But I prefer the OoO HW to handle hoisting, though. It is hard to
tune in the compiler. I'm also curious about the impact on loop
transformations.

>> Some of the losses will likely boil down to increased register
>> pressure.
>
> Agreed.