search for: create_llvm_prof

Displaying 18 results from an estimated 18 matches for "create_llvm_prof".

2014 May 12
3
[LLVMdev] Questions about LLVM PGO and autoFDO
..., in CodeGenPGO.cpp line 56, it tries to find '\n' as below: CurPtr = strchr(CurPtr, '\n'); if (!CurPtr) { ReportBadPGOData(CGM, "pgo data file has malformed function entry"); return; } 2. Problems in AutoFDO: the problem occurs when using create_llvm_prof; the conversion fails. clang -O2 -g test.c -o a.out perf record -b ./a.out (perf version is 0.0.2, "-b" option is not recognized, why?) changed to: perf record ./a.out (so perf.data is generated) create_llvm_prof --binary=./a.out --out=code.prof (it read a.out and perf.da...
2016 Aug 12
3
AutoFDO sample profiles v. SelectInst,
..." benchmark out of SPEC2006. It is initially compiled with clang++ -o hmmer -O3 -std=gnu89 -DSPEC_CPU -DNDEBUG -fno-strict-aliasing -w -g *.c This baseline binary runs in about 164.2 seconds as reported by "perf stat". We build a sample file from this program using the AutoFDO tool "create_llvm_prof": perf report -b hmmer nph3.hmm swiss41wa create_llvm_prof -out hmmer.llvm ... and rebuild the binary using this profile: clang++ -o hmmer-fdo -fprofile-sample-use=hmmer.llvm \ -O3 -std=gnu89 -DSPEC_CPU -DNDEBUG -fno-strict-aliasing -w -g *.c now, sadly, this program runs...
2020 Sep 18
2
Making library calls for obj2yaml functionalities
...t your motivation for using obj2yaml is in this context? If > it's just for testing purposes, adding support to llvm-readobj/llvm-readelf > would be the more normal way, as it allows you to dump just that section. > Other than testing, we currently have code in an external tool called create_llvm_prof for parsing the ".bb_addr_map" section (+Han Shen <shenhan at google.com> who's the main developer of that tool) and loading it in memory. It would've been great if we could just link with an LLVM library which includes the data-structure and parsing support. It looks like l...
2016 Nov 01
2
(RFC) Encoding code duplication factor in discriminator
...om this design that the profile can easily be reused across different source versions and compiler versions, or even compilers. That being said, the design to encode more info into discriminator does not mean that we will change the profile. The encoded info in discriminator will be handled by the create_llvm_prof tool, which combines counts from different clones of the same source code and generates the combined profile data. The output profile will not have any cloning/duplication bits at all. So for the initial example profile I provided, the output profile will be: On Tue, Nov 1, 2016 at 1:04 PM, Hal Fi...
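The merging step this snippet describes can be pictured with a small sketch. It assumes, purely for illustration, that each sample is keyed by a line offset and a discriminator, and that clone information sits above a hypothetical 8-bit base discriminator. The name `merge_clone_counts`, the mask, and the sample numbers are made up here and are not taken from create_llvm_prof or from the thread.

```python
from collections import defaultdict

# Hypothetical layout: the low 8 bits hold the base discriminator and any
# higher bits mark a clone. This is NOT the real LLVM encoding.
BASE_MASK = 0xFF


def merge_clone_counts(samples):
    """Sum sample counts over clones of the same source location.

    samples: iterable of ((line_offset, discriminator), count) pairs.
    Returns a dict keyed by (line_offset, base_discriminator).
    """
    merged = defaultdict(int)
    for (line, disc), count in samples:
        # Dropping the clone bits folds every clone onto its origin.
        merged[(line, disc & BASE_MASK)] += count
    return dict(merged)


# Made-up input: line 7 was cloned once (clone bits 0x400).
raw = [((2, 0), 5), ((7, 0x000), 12), ((7, 0x400), 30)]
merged = merge_clone_counts(raw)
print(merged)
```

With this input the two entries for line 7 collapse into a single entry, so the merged profile carries no clone bits, which is the property the snippet attributes to the tool's output.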
2016 Nov 01
2
(RFC) Encoding code duplication factor in discriminator
...e reused across >> different source versions and compiler versions, or even compilers. >> >> That being said, the design to encode more info into discriminator does >> not mean that we will change the profile. The encoded info in discriminator >> will be handled by the create_llvm_prof tool, which combines counts from >> different clones of the same source code and generates the combined profile >> data. The output profile will not have any cloning/duplication bits at all. >> So for the initial example profile I provided, the output profile will be: >> >...
2019 Sep 24
9
[RFC] Propeller: A frame work for Post Link Optimizations
...cc callee.cc -fpropeller-label -o a.out.labels -fuse-ld=lld # Step 2: Profile the binary, only one side of the branch is executed. $ perf record -e cycles:u -j any,u -- ./a.out.labels 1000000 2 >& /dev/null # Step 3: Convert the profiles using the tool provided $ $LLVM_DIR/llvm-propeller/create_llvm_prof --format=propeller \ --binary=./a.out.labels --profile=perf.data --out=perf.propeller # Step 4: Re-Optimize with Propeller, repeat Step 1 with propeller flag changed. $ clang++ -O2 main.cc callee.cc -fpropeller-optimize=perf.propeller -fuse-ld=lld In Step 4, the optimized bit code can be use...
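Steps 2 to 4 of this snippet can be written down as argument lists, which makes the data flow between them explicit. This is only a sketch: the flags are copied from the 2019 RFC snippet and may not match any released tool, the helper names and the `/opt/llvm` directory are hypothetical, and Step 1 is omitted because its command is truncated in the snippet. Nothing is executed; the commands are only printed.

```python
import shlex


def perf_record_cmd(binary, args):
    # Step 2: sample taken branches in user mode while running the binary.
    return ["perf", "record", "-e", "cycles:u", "-j", "any,u", "--", binary, *args]


def convert_cmd(llvm_dir, binary, profile, out):
    # Step 3: convert perf.data into a Propeller profile (flags as quoted
    # in the RFC snippet).
    return [
        f"{llvm_dir}/llvm-propeller/create_llvm_prof",
        "--format=propeller",
        f"--binary={binary}",
        f"--profile={profile}",
        f"--out={out}",
    ]


def reoptimize_cmd(sources, propeller_profile):
    # Step 4: rebuild with the converted profile.
    return [
        "clang++", "-O2", *sources,
        f"-fpropeller-optimize={propeller_profile}",
        "-fuse-ld=lld",
    ]


steps = [
    perf_record_cmd("./a.out.labels", ["1000000", "2"]),
    convert_cmd("/opt/llvm", "./a.out.labels", "perf.data", "perf.propeller"),
    reoptimize_cmd(["main.cc", "callee.cc"], "perf.propeller"),
]
for step in steps:
    print(shlex.join(step))
```

Keeping each step as a list rather than a single string avoids quoting problems if the lists are later passed to `subprocess.run`.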
2020 Nov 17
3
[RFC] Control Flow Sensitive AutoFDO (FS-AFDO)
...file count for the unrolled (or vectorized) BBs. The discriminator is divided into 3 components: * Base discriminator, assigned by AddDiscriminators * Duplication factor: used in loop unroll and loop vectorization (when cloning the loop body). This factor will be multiplied by sample counts in the create_llvm_prof tool to get the count value before duplication. * Copy Identifier: reserved but not currently used. Duplication factor was explicitly assigned in loop unroll and loop vectorization. Instruction cloning will copy the discriminator. All the cloned instruction instances, other than inlining, loop unr...
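The three components and the multiplication by the duplication factor can be illustrated with a toy encoding. The fixed 8-bit fields below are an assumption chosen for readability and are not the bit layout LLVM uses; the names `Discriminator`, `pack`, `unpack` and `count_before_duplication` are likewise hypothetical. The sketch also assumes that a stored factor of 0 means "not duplicated".

```python
from typing import NamedTuple

# Hypothetical fixed-width fields, for illustration only.
BASE_BITS = 8
DUP_BITS = 8
BASE_MASK = (1 << BASE_BITS) - 1
DUP_MASK = (1 << DUP_BITS) - 1


class Discriminator(NamedTuple):
    base: int        # assigned by AddDiscriminators
    dup_factor: int  # set by loop unroll / loop vectorization
    copy_id: int     # reserved, unused in this sketch


def pack(d: Discriminator) -> int:
    """Combine the three components into one integer."""
    return d.base | (d.dup_factor << BASE_BITS) | (d.copy_id << (BASE_BITS + DUP_BITS))


def unpack(value: int) -> Discriminator:
    """Split an integer back into its three components."""
    return Discriminator(
        base=value & BASE_MASK,
        dup_factor=(value >> BASE_BITS) & DUP_MASK,
        copy_id=value >> (BASE_BITS + DUP_BITS),
    )


def count_before_duplication(sample_count: int, value: int) -> int:
    """Scale a sample count by the duplication factor (0 is treated as 1)."""
    return sample_count * max(unpack(value).dup_factor, 1)


encoded = pack(Discriminator(base=2, dup_factor=4, copy_id=0))
print(unpack(encoded), count_before_duplication(25, encoded))
```

In this toy case a loop body unrolled by a factor of 4 that collected 25 samples per copy is scaled back to 100, the count for the original body before duplication.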
2017 Jul 31
1
[RFC] Profile guided section layout
Michael Spencer via llvm-dev <llvm-dev at lists.llvm.org> writes: > I've recently implemented profile guided section layout in llvm + lld using > the Call-Chain Clustering (C³) heuristic from > https://research.fb.com/wp-content/uploads/2017/01/cgo2017-hfsort-final1.pdf > . In the programs I've tested it on I've gotten from 0% to 5% performance > improvement over
2020 Nov 19
0
[RFC] Control Flow Sensitive AutoFDO (FS-AFDO)
...file count for the unrolled (or vectorized) BBs. The discriminator is divided into 3 components: * Base discriminator, assigned by AddDiscriminators * Duplication factor: used in loop unroll and loop vectorization (when cloning the loop body). This factor will be multiplied by sample counts in the create_llvm_prof tool to get the count value before duplication. * Copy Identifier: reserved but not currently used. Duplication factor was explicitly assigned in loop unroll and loop vectorization. Instruction cloning will copy the discriminator. All the cloned instruction instances, other than inlining, loop un...
2016 Nov 01
2
(RFC) Encoding code duplication factor in discriminator
...> > > > That being said, the design to encode more info into discriminator does not mean that we will change the profile. The encoded info in discriminator will be handled by the create_llvm_prof tool, which combines counts from different clones of the same source code and generates the combined profile data. The output profile will not have any cloning/duplication bits at...
2020 Sep 16
2
Making library calls for obj2yaml functionalities
Hi All, Following up on https://lists.llvm.org/pipermail/llvm-dev/2020-July/143512.html, and https://reviews.llvm.org/D85408, we would like to consider a design which allows external tools to read the structured contents of the .bb_addr_map section with library calls into an LLVM library. At the same time, we need to have tools/obj2yaml tests in place for bb_addr_map. So it sounds like the
2016 Nov 02
3
(RFC) Encoding code duplication factor in discriminator
...design to encode more info into discriminator does not mean that we will change the profile. The encoded info in discriminator will be handled by the create_llvm_prof tool, which combines counts from different clones of the same source code and generates the combined profile data. The output profile will...
2016 Nov 01
2
(RFC) Encoding code duplication factor in discriminator
As illustrated in the above example, it is not like "vectorization has a distinct bit". All different optimizations make clones of code which will be labeled by UIDs represented by N (e.g. 8) bits. In this way, the space will be capped by the number of clones all optimizations have made, instead of the number of optimizations that have been applied. And it will be capped at 2^N-1. The cons of using uid
2016 Nov 02
2
(RFC) Encoding code duplication factor in discriminator
...> > > > > > > > does not mean that we will change the profile. The encoded info in discriminator will be handled by the create_llvm_prof tool, which combines counts from different clones of the same source code...
2016 Nov 04
2
(RFC) Encoding code duplication factor in discriminator
...versions, or even compilers. >>>>>> >>>>>> That being said, the design to encode more info into discriminator >>>>>> does not mean that we will change the profile. The encoded info in >>>>>> discriminator will be handled by the create_llvm_prof tool, which combines >>>>>> counts from different clones of the same source code and generates the >>>>>> combined profile data. The output profile will not have any >>>>>> cloning/duplication bits at all. So for the initial example profile I...
2016 Nov 21
4
(RFC) Encoding code duplication factor in discriminator
...om this design that the profile can easily be reused across different source versions and compiler versions, or even compilers. That being said, the design to encode more info into discriminator does not mean that we will change the profile. The encoded info in discriminator will be handled by the create_llvm_prof tool, which combines counts from different clones of the same source code and generates the combined profile data. The output profile will not have any cloning/duplication bits at all. So for the initial example profile I provided, the output profile will be: #1: 10 #3: 80 Not: #1: 10 #3.0x400: 7...
2019 Sep 26
2
[RFC] Propeller: A frame work for Post Link Optimizations
...abels > -fuse-ld=lld > > # Step 2: Profile the binary, only one side of the branch is executed. > $ perf record -e cycles:u -j any,u -- ./a.out.labels 1000000 2 >& > /dev/null > > > # Step 3: Convert the profiles using the tool provided > $ $LLVM_DIR/llvm-propeller/create_llvm_prof --format=propeller \ > --binary=./a.out.labels --profile=perf.data --out=perf.propeller > > > # Step 4: Re-Optimize with Propeller, repeat Step 1 with propeller flag > changed. > $ clang++ -O2 main.cc callee.cc -fpropeller-optimize=perf.propeller > -fuse-ld=lld > > In...
2017 Jun 15
7
[RFC] Profile guided section layout
I've recently implemented profile guided section layout in llvm + lld using the Call-Chain Clustering (C³) heuristic from https://research.fb.com/wp-content/uploads/2017/01/cgo2017-hfsort-final1.pdf . In the programs I've tested it on I've gotten from 0% to 5% performance improvement over standard PGO with zero cases of slowdowns and up to 15% reduction in ITLB misses. There are