thr3ads.net - search: "scanrelocs"

Displaying 19 results from an estimated 19 matches for "scanrelocs".

[LLD] thunk implementation correctness depends on order of input section.

2016 Jun 21

I've been working on supporting ARM/Thumb interworking thunks in LLD and have encountered a limitation that I think it is worth bringing up in a wider context. This is all LLD specific, apologies if I've abused llvm-dev here.
TL;DR summary:
- Thunks in lld may not work if they are added to InputSections that have already been scanned.
- There is a short term fix, but in the longer term I

[lld] We call SymbolBody::getVA redundantly a lot...

2017 Mar 01

On Tue, Feb 28, 2017 at 12:10 PM, Rui Ueyama <ruiu at google.com> wrote:
> I don't think getVA is particularly expensive, and if it is not expensive
> I wouldn't cache its result. Did you experiment to cache getVA results? I
> think you can do that fairly easily by adding a std::atomic_uint64_t to
> SymbolBody and use it as a cache for getVA.
>
You're right,
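A minimal sketch of the caching idea suggested in the quoted message, assuming a heavily simplified stand-in for SymbolBody. Only the use of `std::atomic_uint64_t` comes from the message; `CachedSymbol`, `computeVA`, `NotCached`, `SlowPathRuns`, and the two address fields are hypothetical names for illustration and are not lld's actual code.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for a symbol whose virtual address is costly to compute.
struct CachedSymbol {
  // Sentinel meaning "not computed yet"; assumes no real VA equals all-ones.
  static constexpr uint64_t NotCached = ~uint64_t(0);

  uint64_t SectionVA = 0; // hypothetical inputs to the address computation
  uint64_t Offset = 0;

  mutable std::atomic_uint64_t Cache{NotCached};
  mutable std::atomic<unsigned> SlowPathRuns{0}; // demo counter only

  // Placeholder for the real (more expensive) address computation.
  uint64_t computeVA() const {
    SlowPathRuns.fetch_add(1, std::memory_order_relaxed);
    return SectionVA + Offset;
  }

  // Returns the cached value when present, otherwise computes and stores it.
  uint64_t getVA() const {
    uint64_t V = Cache.load(std::memory_order_relaxed);
    if (V != NotCached)
      return V;
    V = computeVA();
    Cache.store(V, std::memory_order_relaxed);
    return V;
  }
};
```

Relaxed ordering is enough in this sketch because every thread that misses the cache computes the same value, so a racing store is harmless. That reasoning only holds if the inputs no longer change once getVA is first called, which is an assumption here.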

[lld] We call SymbolBody::getVA redundantly a lot...

2017 Mar 01

On Tue, Feb 28, 2017 at 11:39 PM, Rui Ueyama <ruiu at google.com> wrote:
> I also did a quick profiling a few months ago and noticed just like you
> that scanRelocations consumes a fairly large percentage of overall
> execution time. That caught my attention because at the time I was looking
> for a place that I can parallelize.
>
> scanRelocations is not parallelizable

lld dynamic relocation creation issue

2016 Feb 03

...nd 000000010dc0 000000000403 R_AARCH64_RELATIV 7a8
Where 0x7a8 (dynamic relocation r_addend) points to a function in the .text segment (in this case frame_dummy). I am trying to make this work in lld, but the current step sequence is:
- Write::run
\_ Write::createSections
\_ Write::scanRelocs
\_ Write::addReloc // Dynamic Relocation creation
\_ Write::writeSections
\_ OutputSectionBase::writeTo
\_ InputSection::writeTo
\_ InputSection::relocate
\_ TargetInfo::relocateOne
The problem is only at TargetInfo::relocate the target (aarch64) wi...

LLD: Possible optimization for TargetInfo

2016 Mar 30

...hisophugis at gmail.com> wrote:
> I believe the relocation stuff that Rafael is currently working on will
> make this a non-issue (it will make relocation application much friendlier
> for the CPU).
>
I don't think Rafael's patch would make this a non-issue. He's making scanRelocs to create data, which would reduce the number of calls to the virtual functions, but it wouldn't be reduced to zero.
> However, even in the current scheme, since the target is fixed, all the
> indirect call sites should be monomorphic and so there shouldn't be much
> branch-prediction...

[LLD] thunk implementation correctness depends on order of input section.

2016 Jun 22

...MipsLA25Thunk or something like that. I think you want to create a new type
> of thunk for ARM.
>
> The bug that we sometimes create broken MIPS thunks seems to have been introduced
> in r265673, which Rafael made. Before that patch, we didn't assume that
> section VAs are available in scanRelocs. I think we want to revert that
> change (although it cannot simply be reverted because the patch was
> submitted in April, and many changes have been made on top of it since then).
>
> Rafael, can you take a look at that change?
>
> On Tue, Jun 21, 2016 at 9:38 PM, Peter Smith <peter.smith a...

LLD: Possible optimization for TargetInfo

2016 Mar 30

...:chisophugis at gmail.com>> wrote:
> I believe the relocation stuff that Rafael is currently working on will make this a non-issue (it will make relocation application much friendlier for the CPU).
>
> I don't think Rafael's patch would make this a non-issue. He's making scanRelocs to create data, which would reduce the number of calls to the virtual functions, but it wouldn't be reduced to zero.
>
> However, even in the current scheme, since the target is fixed, all the indirect call sites should be monomorphic and so there shouldn't be much branch-prediction...

LLD: Possible optimization for TargetInfo

2016 Mar 31

...>
>> I believe the relocation stuff that Rafael is currently working on will
>> make this a non-issue (it will make relocation application much friendlier
>> for the CPU).
>>
>
> I don't think Rafael's patch would make this a non-issue. He's making
> scanRelocs to create data, which would reduce the number of calls to the
> virtual functions, but it wouldn't be reduced to zero.
>
>> However, even in the current scheme, since the target is fixed, all the
>> indirect call sites should be monomorphic and so there shouldn't be much
>>...

LLD: Possible optimization for TargetInfo

2016 Mar 31

...the relocation stuff that Rafael is currently working on will
>>> make this a non-issue (it will make relocation application much friendlier
>>> for the CPU).
>>>
>>
>> I don't think Rafael's patch would make this a non-issue. He's making
>> scanRelocs to create data, which would reduce the number of calls to the
>> virtual functions, but it wouldn't be reduced to zero.
>>
>>> However, even in the current scheme, since the target is fixed, all the
>>> indirect call sites should be monomorphic and so there shouldn'...

[lld] We call SymbolBody::getVA redundantly a lot...

2017 Feb 28

tl;dr: it looks like we call SymbolBody::getVA about 5x more times than we need to.
Should we cache it or something? (careful with threads).
Here is a link to a PDF of my Mathematica notebook which has all the details of my investigation: https://drive.google.com/open?id=0B8v10qJ6EXRxVDQ3YnZtUlFtZ1k
There seem to be two main regimes in which we redundantly call SymbolBody::getVA:
1. most

LLD: Possible optimization for TargetInfo

2016 Mar 31

...at Rafael is currently working on will
>>>> make this a non-issue (it will make relocation application much friendlier
>>>> for the CPU).
>>>>
>>>
>>> I don't think Rafael's patch would make this a non-issue. He's making
>>> scanRelocs to create data, which would reduce the number of calls to the
>>> virtual functions, but it wouldn't be reduced to zero.
>>>
>>>> However, even in the current scheme, since the target is fixed, all the
>>>> indirect call sites should be monomorphic and so t...

[lld] avoid emitting PLT entries for ifuncs

2018 Aug 21

Hello, We've recently started using ifuncs in the x86(_64) FreeBSD kernel. Currently lld will emit a PLT entry for each ifunc, so ifunc calls are more expensive than those of regular functions. In our kernel, this overhead isn't really necessary: if lld instead emits PC-relative relocations for each ifunc call site, where each relocation references a symbol of type GNU_IFUNC, then during

LLD: Possible optimization for TargetInfo

2016 Mar 30

I believe the relocation stuff that Rafael is currently working on will make this a non-issue (it will make relocation application much friendlier for the CPU). However, even in the current scheme, since the target is fixed, all the indirect call sites should be monomorphic and so there shouldn't be much branch-prediction cost (certainly nothing that would cause 1.8% performance delta for the

[llvm-mc] FreeBSD kernel module performance impact when upgrading clang

2020 Nov 02

Hi, I'm in the process of migrating from clang5 to clang10. Unfortunately clang10 introduced a negative performance impact. The cause is an increase of PLT entries from this patch (first released in clang7): https://bugs.llvm.org/show_bug.cgi?id=36370 https://reviews.llvm.org/D43383 If I revert that clang patch locally, the additional PLT entries and the performance impact disappear. This

RFC: LLD range extension thunks

2017 Jan 04

...This would mean that the final address of each caller and callee could be used, and after thunk creation there would be no further content changes. This would mean:
- All code that runs prior to thunk creation may have the offset in the OutputSection altered by the addition of thunks. In particular scanRelocs() calculates the offset in the OutputSection of some relocations. We would need to find alternative ways of handling these cases so that they could either survive thunk creation or be patched up afterwards.
- assignAddresses() will need to run at least twice if thunks are created. At least once to...
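The point about assignAddresses() running at least twice can be illustrated with a toy layout loop. This is a sketch under simplifying assumptions, not lld's algorithm: `Sec`, `Call`, `createThunks`, and `layoutWithThunks` are hypothetical names, a thunk is modelled only as extra bytes appended to the calling section, and the thunk itself is assumed to reach its target.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Sec { uint64_t Size; uint64_t Addr; };
struct Call { std::size_t From; std::size_t To; bool HasThunk; };

// Lay sections out back to back starting at address 0.
void assignAddresses(std::vector<Sec> &Secs) {
  uint64_t Addr = 0;
  for (Sec &S : Secs) {
    S.Addr = Addr;
    Addr += S.Size;
  }
}

// Add a thunk for every call whose target is out of range.
// Returns true if any section grew, which invalidates the addresses.
bool createThunks(std::vector<Sec> &Secs, std::vector<Call> &Calls,
                  uint64_t Range, uint64_t ThunkSize) {
  bool Added = false;
  for (Call &C : Calls) {
    if (C.HasThunk)
      continue;
    uint64_t A = Secs[C.From].Addr, B = Secs[C.To].Addr;
    uint64_t Dist = A > B ? A - B : B - A;
    if (Dist > Range) {
      Secs[C.From].Size += ThunkSize; // content change shifts later sections
      C.HasThunk = true;
      Added = true;
    }
  }
  return Added;
}

// Repeat address assignment until a pass adds no thunks; returns pass count.
unsigned layoutWithThunks(std::vector<Sec> &Secs, std::vector<Call> &Calls,
                          uint64_t Range, uint64_t ThunkSize) {
  unsigned Passes = 0;
  for (;;) {
    assignAddresses(Secs);
    ++Passes;
    if (!createThunks(Secs, Calls, Range, ThunkSize))
      return Passes;
  }
}
```

The loop terminates because each call receives at most one thunk. Whenever a thunk is added, at least one further address-assignment pass is needed, which matches the "at least twice" remark in the snippet.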

LLD: Possible optimization for TargetInfo

2016 Mar 31

...orking on
>>>>> will make this a non-issue (it will make relocation application much
>>>>> friendlier for the CPU).
>>>>>
>>>>
>>>> I don't think Rafael's patch would make this a non-issue. He's making
>>>> scanRelocs to create data, which would reduce the number of calls to the
>>>> virtual functions, but it wouldn't be reduced to zero.
>>>>
>>>>> However, even in the current scheme, since the target is fixed, all the
>>>>> indirect call sites should be mono...

LLD: Possible optimization for TargetInfo

2016 Mar 30

I was wondering how much overhead the virtual function calls of TargetInfo member functions incur. TargetInfo handles platform-specific details, and we have target-specific subclasses of that class. The subclasses override functions defined in TargetInfo. The TargetInfo member functions are called multiple times for each relocation. So the cost of virtual function calls may be non-negligible. That
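The pattern described here can be sketched roughly as follows. The names `TargetSketch`, `AddendTarget`, `Reloc`, and `applyAll`, and the `relocateOne` signature, are hypothetical and chosen for illustration; they are not lld's actual TargetInfo interface.

```cpp
#include <cstdint>
#include <vector>

struct Reloc {
  uint32_t Type;
  uint64_t Addend;
};

// Base class with a virtual hook, standing in for a TargetInfo-like interface.
class TargetSketch {
public:
  virtual ~TargetSketch() = default;
  virtual uint64_t relocateOne(const Reloc &R, uint64_t SymVA) const = 0;
};

// One target-specific subclass overriding the hook.
class AddendTarget final : public TargetSketch {
public:
  uint64_t relocateOne(const Reloc &R, uint64_t SymVA) const override {
    return SymVA + R.Addend;
  }
};

// Applies every relocation through the base interface and sums the results.
uint64_t applyAll(const TargetSketch &T, const std::vector<Reloc> &Rs,
                  uint64_t SymVA) {
  uint64_t Sum = 0;
  for (const Reloc &R : Rs)
    Sum += T.relocateOne(R, SymVA); // one indirect call per relocation
  return Sum;
}
```

Since a single link uses one target, the indirect call in the loop always dispatches to the same override, which is the "monomorphic call site" situation raised in the replies.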

RFC: LLD range extension thunks

2017 Jan 05

...>> each caller and callee could be used, and after thunk creation there
>> would be no further content changes. This would mean:
>> - All code that runs prior to thunk creation may have the offset in
>> the OutputSection altered by the addition of thunks. In particular
>> scanRelocs() calculates the offset in the OutputSection of some
>> relocations. We would need to find alternative ways of handling these
>> cases so that they could either survive thunk creation or be patched
>> up afterwards.
>> - assignAddresses() will need to run at least twice if t...

[EXTERNAL] [llvm-mc] FreeBSD kernel module performance impact when upgrading clang

2020 Nov 05

> You used -noinhibit-exec to ignore the diagnostic, which is usually a bad thing.
I certainly agree with that. The point I was trying to make in my original email is that, specifically for kernel objects, this diagnostic is incorrect. R_X86_64_PC32 can be used safely against the symbol foo in that specific context, and should be possible without ignoring diagnostics. I wondered if there
