search for: csfdo

Displaying 7 results from an estimated 7 matches for "csfdo".

2020 Aug 05
10
[RFC] Machine Function Splitter - Split out cold blocks from machine functions using profile data
...the .text.unlikely section. Unlike Propeller <https://lists.llvm.org/pipermail/llvm-dev/2019-September/135393.html>, which is presently the main user of the basic block sections feature, this pass does not require an additional round of profiling and uses existing instrumentation based FDO or CSFDO profile information. [image: Machine Function Splitter.png] In the illustration above, the functions foo and bar contain a cold block each, index 5 and E respectively. We show a possible layout for these functions which optimizes for fall throughs. Note that all the blocks are kept in a contiguo...
2019 Sep 26
2
[RFC] Propeller: A frame work for Post Link Optimizations
...exactly the form you want for this sort of manipulation: it has basic > blocks which correspond closely to the final binary, and a high-level > representation of branch instructions. This was considered for Propeller. This is currently being explored in a similar way as an alternative of CSFDO which uses PMU samples. > And it's before the DWARF/CFI emission, so you don't need to worry about > fixing them afterwards. This should take less code overall, and much less > target-specific code. And infrastructure for function splitting would be > useful for non-Propeller...
2019 Sep 26
2
[RFC] Propeller: A frame work for Post Link Optimizations
...the form you want for this sort of manipulation: it has basic blocks which correspond closely to the final binary, and a high-level representation of branch instructions. > > > > This was considered for Propeller. This is currently being explored in a similar way as an alternative of CSFDO which uses PMU samples. > > > > Makes sense. > > > > And it's before the DWARF/CFI emission, so you don't need to worry about fixing them afterwards. This should take less code overall, and much less target-specific code. And infrastructure for function splitting wo...
2020 Aug 10
2
[RFC] Machine Function Splitter - Split out cold blocks from machine functions using profile data
..., which is presently the main user of the basic block sections feature, this pass does not require an additional round of profiling and uses existing instrumentation based FDO or CSFDO profile information. [image] In the illustration above, the functions foo and bar contain a cold block each, index 5 and E respectively. We show a possible layout for these functions which optimizes for fall throughs. Note that all the blocks are kept in a contigu...
2020 Aug 05
3
[RFC] Machine Function Splitter - Split out cold blocks from machine functions using profile data
...ropeller <https://lists.llvm.org/pipermail/llvm-dev/2019-September/135393.html>, which is presently the main user of the basic block sections feature, this pass does not require an additional round of profiling and uses existing instrumentation based FDO or CSFDO profile information. [image: Machine Function Splitter.png] In the illustration above, the functions foo and bar contain a cold block each, index 5 and E respectively. We show a possible layout for these functions which optimizes for fa...
2019 Sep 27
5
[RFC] Propeller: A frame work for Post Link Optimizations
...ic blocks which correspond closely to the final binary, and a high-level representation of branch instructions. This was considered for Propeller. This is currently being explored in a similar way as an alternative of CSFDO which uses PMU samples. Makes sense. And it's before the DWARF/CFI emission, so you don't need to worry about fixing them afterwards. This should take less code...
2019 Sep 24
9
[RFC] Propeller: A frame work for Post Link Optimizations
Greetings, We, at Google, recently evaluated Facebook’s BOLT, a Post Link Optimizer framework, on large Google benchmarks and noticed that it improves key performance metrics of these benchmarks by 2% to 6%, which is pretty impressive as this is over and above a baseline binary already heavily optimized with ThinLTO + PGO. Furthermore, BOLT is also able to improve the performance of binaries