similar to: [RFC] Profile guided section layout


2017 Jul 31
2
[RFC] Profile guided section layout
A rebased version of the lld patch is attached. Cheers, Rafael On 31 July 2017 at 15:11, Rafael Avila de Espindola <rafael.espindola at gmail.com> wrote: > Tobias Edler von Koch <tobias at codeaurora.org> writes: > >> Hi Rafael, >> >> On 07/31/2017 04:20 PM, Rafael Avila de Espindola via llvm-dev wrote: >>> However, do we need to start with
2017 Aug 01
2
[RFC] Profile guided section layout
I updated the patch to read a call graph from a text file. I tested it with the attached call.txt from lld linking chromium. Unfortunately the resulting lld doesn't seem any faster. One thing I noticed is that the most used symbols seem to be at the end of the file. In any case, can you add tests and send the lld patch for review? Thanks, Rafael On 31 July 2017 at 15:19, Davide Italiano
2017 Jul 31
3
[RFC] Profile guided section layout
Hi Rafael, On 07/31/2017 04:20 PM, Rafael Avila de Espindola via llvm-dev wrote: > However, do we need to start with instrumentation? The original paper > uses sampling with good results and current intel cpus can record every > branch in a program. > > I would propose starting with just an lld patch that reads the call > graph from a file. The format would be very similar to
2017 Jul 31
1
[RFC] Profile guided section layout
Michael Spencer via llvm-dev <llvm-dev at lists.llvm.org> writes: > I've recently implemented profile guided section layout in llvm + lld using > the Call-Chain Clustering (C³) heuristic from > https://research.fb.com/wp-content/uploads/2017/01/cgo2017-hfsort-final1.pdf > . In the programs I've tested it on I've gotten from 0% to 5% performance > improvement over
2016 Mar 16
2
LLD performance w.r.t. local symbols (and --build-id)
The slowdown attributed to "[ELF] - Early continue in InputSectionBase<ELFT>::relocate(). NFC." looks weird to me. I do not see any reason for this change to affect performance. The good news is that since it was NFC, it can easily be reverted. But I think the slowdown in the results is unrelated to that change, and reverting it will not give us the 2-3% boost back. Best regards, George.
2015 Nov 21
2
[lld] Hiding original type of GOT related relocations
Hi, More than one MIPS relocation requires GOT entry creation. Let's consider two of them, R_MIPS_GOT16 and R_MIPS_CALL16 [1]. R_MIPS_GOT16 is applicable to local and external symbols and performs a different calculation in each case [2]. R_MIPS_CALL16 is applicable to external symbols only, and a linker should show an error if it finds R_MIPS_CALL16 with a local target. Now LLD
2016 Jun 21
2
[LLD] thunk implementation correctness depends on order of input section.
I've been working on supporting ARM/Thumb interworking thunks in LLD and have encountered a limitation that I think it is worth bringing up in a wider context. This is all LLD specific, apologies if I've abused llvm-dev here. TL;DR summary: - Thunks in lld may not work if they are added to InputSections that have already been scanned. - There is a short term fix, but in the longer term I
2017 Feb 22
2
[lld] elf linker creates undefined empty symbol
Rafael, here is a repro.tar to look at: https://reviews.llvm.org/F3100177 The attached foo.diff adds a print which shows the issue. ``` NAME: sleep SYMINDEX: 2 NAME: sched_yield SYMINDEX: 1 NAME: __libc_start_main SYMINDEX: 0 ``` `readelf --relocs` Shows that we create : ... 000000255110 002900000007 R_X86_64_JUMP_SLO 0000000000254410 __xstat at GLIBC_2.2.5 + 0 000000255118 001e00000007
2016 Mar 16
2
LLD performance w.r.t. local symbols (and --build-id)
On Wed, Mar 16, 2016 at 9:05 AM, Rafael Espíndola < rafael.espindola at gmail.com> wrote: > On 16 March 2016 at 01:34, George Rimar <grimar at accesssoftek.com> wrote: > > Slowdown by "[ELF] - Early continue in > InputSectionBase<ELFT>::relocate(). > > NFC." looks weird to me. I do not see any reasons for any impact on > > performance by this
2016 Mar 16
2
LLD performance w.r.t. local symbols (and --build-id)
Hi, Rafael took some measurements to try to investigate the effect of the local symbols changes. I've been taking a look at the measurements he got and there were some interesting things we noticed. For starters, in the range of revisions tested (r263214 through r263471), we found that the commit for --build-id was the most noticeable, with slowdowns from 7% to 23% (note: these were
2018 Aug 21
7
[lld] avoid emitting PLT entries for ifuncs
Hello, We've recently started using ifuncs in the x86(_64) FreeBSD kernel. Currently lld will emit a PLT entry for each ifunc, so ifunc calls are more expensive than those of regular functions. In our kernel, this overhead isn't really necessary: if lld instead emits PC-relative relocations for each ifunc call site, where each relocation references a symbol of type GNU_IFUNC, then during
2017 Mar 01
2
[lld] We call SymbolBody::getVA redundantly a lot...
On Tue, Feb 28, 2017 at 12:10 PM, Rui Ueyama <ruiu at google.com> wrote: > I don't think getVA is particularly expensive, and if it is not expensive > I wouldn't cache its result. Did you experiment to cache getVA results? I > think you can do that fairly easily by adding a std::atomic_uint64_t to > SymbolBody and use it as a cache for getVA. > You're right,
2011 Jan 12
3
[LLVMdev] Extending LLVM for high-level types
Hi all, I'm designing a programming language named C³ (or C3). I'm already using LLVM as a back-end for my prototype compiler and it's wonderful to use. Thanks for such a great system! I now have more ambitious goals and I would like to use the LLVM IR as my internal C³ IR. C³ is designed to support what I call "value-oriented programming" and it fits naturally with the
2017 Mar 01
2
[lld] We call SymbolBody::getVA redundantly a lot...
On Tue, Feb 28, 2017 at 11:39 PM, Rui Ueyama <ruiu at google.com> wrote: > I also did a quick profiling a few months ago and noticed just like you > that scanRelocations consumes a fairly large percentage of overall > execution time. That caught my attention because at the time I was looking > for a place that I can parallelize. > > scanRelocations is not parallelizable
2011 Jan 13
0
[LLVMdev] Extending LLVM for high-level types
Alexandre Cossette wrote: > Hi all, > > I'm designing a programming language named C³ (or C3). I'm already using LLVM as a back-end for my prototype compiler and it's wonderful to use. Thanks for such a great system! > > I now have more ambitious goals and I would like to use the LLVM IR as my internal C³ IR. Absolutely not. In short, LLVM is its own language. You
2017 Jun 15
3
[RFC] Profile guided section layout
On Thu, Jun 15, 2017 at 10:08 AM, Tobias Edler von Koch < tobias at codeaurora.org> wrote: > Hi Michael, > > This is cool stuff, thanks for sharing! > > On 06/15/2017 11:51 AM, Michael Spencer via llvm-dev wrote: > >> The first is a new llvm pass which uses branch frequency info to get >> counts for each call instruction and then adds a module flags metadata
2014 Mar 06
2
[LLVMdev] [lld] Relocation reading refactoring
Hi Shankar, I have almost implemented ELFRelocationReader but am still not completely sure that this is the right direction. Suppose somebody wants to override creation of the `ELFReference` object from the `Elf_Rela` or `Elf_Rel` record. Let's consider two implementations A and B: A: ===== 1. Factor out `ELFReference` creation from `createDefinedAtomAndAssignRelocations` into a couple of virtual
2015 Nov 21
2
[lld] R_MIPS_HI16 / R_MIPS_LO16 calculation
Hi, I am working on supporting R_MIPS_HI16 / R_MIPS_LO16 in the new LLD and have a couple of questions. == Q1 In the case of the MIPS O32 ABI, we have to find a matching R_MIPS_LO16 relocation to calculate the R_MIPS_HI16 one, because R_MIPS_HI16 uses the combined addend (AHI << 16) + (short)ALO, where AHI is the original R_MIPS_HI16 addend and ALO is the addend of the matching R_MIPS_LO16 relocation [1]. There are two
2014 Feb 26
2
[LLVMdev] [lld] Relocation reading refactoring
Hi, Thanks for the explanation. If I understand you correctly, you suggest moving relocation parsing to a class with the following interface. Right? Who will be the user of this class? If it is still only the ELFFile class, what benefits will we get from separating this logic? template <class ELFT> class ELFRelocationReader { public: ELFRelocationReader(.....); // Returns all created
2017 Jun 15
2
[RFC] Profile guided section layout
On Thu, Jun 15, 2017 at 11:09 AM, Xinliang David Li via llvm-dev < llvm-dev at lists.llvm.org> wrote: > > > On Thu, Jun 15, 2017 at 10:55 AM, Michael Spencer via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > >> On Thu, Jun 15, 2017 at 10:08 AM, Tobias Edler von Koch < >> tobias at codeaurora.org> wrote: >> >>> Hi Michael,