thr3ads.net - similar to: "A couple metrics of LLD/ELF's performance"

Displaying 20 results from an estimated 800 matches similar to: "A couple metrics of LLD/ELF's performance"

[lld] We call SymbolBody::getVA redundantly a lot...

2017 Feb 28

tl;dr: it looks like we call SymbolBody::getVA about 5x more times than we need to. Should we cache it or something? (Careful with threads.) Here is a link to a PDF of my Mathematica notebook which has all the details of my investigation: https://drive.google.com/open?id=0B8v10qJ6EXRxVDQ3YnZtUlFtZ1k There seem to be two main regimes in which we redundantly call SymbolBody::getVA: 1. most

LLD: time to enable --threads by default

2016 Nov 16

On 16 November 2016 at 15:52, Rafael Espíndola <rafael.espindola at gmail.com> wrote:
> I will do a quick benchmark run.

On a mac pro (running linux) the results I got with all cores available:

firefox
  master 7.146418217
  patch 5.304271767
  1.34729488437x faster

firefox-gc
  master 7.316743822
  patch 5.46436812
  1.33899174824x faster

chromium
  master 4.265597914
  patch
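The "x faster" figures quoted above appear to be the master time divided by the patch time. A minimal sketch of that arithmetic, assuming both figures are wall-clock times in the same unit (the function name `speedup` is hypothetical, not part of any benchmark script from the thread):

```cpp
#include <cassert>

// Speedup factor: baseline (master) time divided by patched time.
// A result above 1.0 means the patched build ran faster.
double speedup(double masterTime, double patchTime) {
  assert(patchTime > 0.0 && "patched time must be positive");
  return masterTime / patchTime;
}
```

For the firefox row, `speedup(7.146418217, 5.304271767)` gives roughly 1.347, which agrees with the quoted 1.34729488437x.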

[lld] We call SymbolBody::getVA redundantly a lot...

2017 Mar 01

On Tue, Feb 28, 2017 at 12:10 PM, Rui Ueyama <ruiu at google.com> wrote:
> I don't think getVA is particularly expensive, and if it is not expensive
> I wouldn't cache its result. Did you experiment to cache getVA results? I
> think you can do that fairly easily by adding a std::atomic_uint64_t to
> SymbolBody and use it as a cache for getVA.
>

You're right,
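The caching idea suggested above could look roughly like the following. This is only a sketch under stated assumptions: `CachedSymbol`, its fields, and the address computation are hypothetical stand-ins, not LLD's actual SymbolBody, and the sentinel value assumes no real address is ever equal to UINT64_MAX.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Sentinel meaning "not computed yet" (assumption: never a real address).
constexpr std::uint64_t kUnsetVA = UINT64_MAX;

// Hypothetical stand-in for a symbol; this is not LLD's SymbolBody.
class CachedSymbol {
public:
  CachedSymbol(std::uint64_t base, std::uint64_t off)
      : sectionBase(base), offset(off) {
    assert(base + off != kUnsetVA && "address collides with sentinel");
  }

  // Returns the cached address, computing it on first use. Two threads may
  // race and both compute, but they store the same value, so the race only
  // costs duplicated work.
  std::uint64_t getVA() const {
    std::uint64_t va = cache.load(std::memory_order_relaxed);
    if (va != kUnsetVA)
      return va;
    va = computeVA();
    cache.store(va, std::memory_order_relaxed);
    return va;
  }

  // Number of times the address was actually computed (for illustration).
  unsigned computeCount() const { return computations.load(); }

private:
  // Placeholder for the real, possibly more expensive, computation.
  std::uint64_t computeVA() const {
    computations.fetch_add(1, std::memory_order_relaxed);
    return sectionBase + offset;
  }

  std::uint64_t sectionBase;
  std::uint64_t offset;
  mutable std::atomic_uint64_t cache{kUnsetVA};
  mutable std::atomic<unsigned> computations{0};
};
```

Relaxed ordering is enough here only because the cached value is self-contained; whether the extra 8 bytes per symbol pay off is exactly the question raised in the quoted reply.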

LLD: time to enable --threads by default

2016 Nov 17

SHA1 in LLVM is *very* naive, any improvement is welcome there! I think Amaury pointed it out originally and he had an alternative implementation IIRC.

-- Mehdi

> On Nov 16, 2016, at 3:58 PM, Rui Ueyama via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> By the way, while running benchmark, I found that our SHA1 function seems much slower than the one in gold. gold slowed down by

LLD: time to enable --threads by default

2016 Nov 17

The current implementation was “copy/pasted” from somewhere (it was explicitly public domain).

> On Nov 16, 2016, at 4:05 PM, Rui Ueyama <ruiu at google.com> wrote:
>
> Can we just copy-and-paste optimized code from somewhere?
>
> On Wed, Nov 16, 2016 at 4:03 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
> SHA1 in LLVM is

[lld] We call SymbolBody::getVA redundantly a lot...

2017 Mar 01

On Tue, Feb 28, 2017 at 11:39 PM, Rui Ueyama <ruiu at google.com> wrote:
> I also did a quick profiling a few months ago and noticed just like you
> that scanRelocations consumes a fairly large percentage of overall
> execution time. That caught my attention because at the time I was looking
> for a place that I can parallelize.
>
> scanRelocations is not parallelizable

LLD: time to enable --threads by default

2016 Nov 16

LLD supports multi-threading, and it seems to be working well as you can see in a recent result <http://llvm.org/viewvc/llvm-project?view=revision&revision=287140>. In short, LLD runs 30% faster with the --threads option and more than 50% faster if you are using --build-id (your mileage may vary depending on your computer). However, I don't think most users even know about that

[LLVMdev] Metadata

2010 Feb 11

On Thursday 11 February 2010 13:31:58 David Greene wrote:
> > Putting a bit (or multiple bits) in MachineMemOperand for this
> > would also make sense.
>
> Is there any chance a MachineMemOperand will be shared by multiple
> instructions?

So I tried to do this:

  %r8 = load <2 x double>* %r6, align 16, !"nontemporal"

and the assembler doesn't like it.

LLD performance w.r.t. local symbols (and --build-id)

2016 Mar 16

Hi, Rafael took some measurements to try to investigate the effect of the local symbols changes. I've been taking a look at the measurements he got and there were some interesting things we noticed. For starters, in the range of revisions tested (r263214 through r263471), we found that the commit for --build-id was the most noticeable, with slowdowns from 7% to 23% (note: these were

[LLVMdev] Metadata

2010 Feb 11

On Thursday 11 February 2010 14:05:21 David Greene wrote:
> Either ParseLoad and probably other instructions need to look for metadata
> explicitly or ParseOptionalCommaAlign needs to know about general metadata.
>
> My inkling is to fix ParseOptionalCommaAlign. Sound reasonable?

Well, that's a rat's nest. I backed up and thought maybe I have the metadata syntax wrong. So

Nontemporal memory accesses and fences

2020 Apr 28

The current specification of the behavior of the !nontemporal attribute in LLVM, and the __builtin_nontemporal_* functions in Clang, is rather spartan and underspecified. In effect, it says the following things:

* Atomic !nontemporal has no defined semantics
* !nontemporal may use special instructions to save cache bandwidth, such as "MOVNT" on x86.

What is crucially lacking

Nontemporal memory accesses and fences

2020 Apr 29

From: llvm-dev <llvm-dev-bounces at lists.llvm.org> on behalf of JF Bastien via llvm-dev <llvm-dev at lists.llvm.org>
Sent: Tuesday, April 28, 2020 4:54 PM
To: Cranmer, Joshua <joshua.cranmer at intel.com>
Cc: llvm-dev at lists.llvm.org <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] Nontemporal memory accesses and fences

I see

New LLD performance builder

2018 Feb 16

>Hello everyone,
>
>I have added a new public LLD performance builder at
>http://lab.llvm.org:8011/builders/lld-perf-testsuite.
>It builds LLVM and LLD with the latest released Clang and runs a set of
>performance tests.
>
>The builder is reliable. Please pay attention to the failures.
>
>The performance statistics are here:

Non-Temporal hints from Loop Vectorizer

2018 Jan 20

I have already seen usage of __builtin_nontemporal_store, but I want to automate identification of non-temporal loads/stores. I think I need to go for a pass. Is it possible to detect non-temporal loops without Polly?

On Sat, Jan 20, 2018 at 11:26 PM, Simon Pilgrim <llvm-dev at redking.me.uk> wrote:
> On 20/01/2018 18:16, hameeza ahmed wrote:
> > Actually i am working on vector

Non-Temporal hints from Loop Vectorizer

2018 Jan 20

Actually I am working on a vector accelerator which will perform those instructions which are non-temporal. For instance, if I have this loop:

  for(i=0;i<2048;i++)
    a[i]=b[i]+c[i];

currently it emits the following IR:

  %0 = getelementptr inbounds [2048 x i32], [2048 x i32]* @b, i64 0, i64 %index
  %1 = bitcast i32* %0 to <16 x i32>*
  %wide.load = load <16 x i32>, <16 x i32>* %1,
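For comparison, the loop above can be marked by hand with Clang's __builtin_nontemporal_store, the builtin mentioned elsewhere in this thread. The sketch below is illustrative only: `addArrays` is a hypothetical helper name, the preprocessor guard is an assumption so the code still builds on compilers without the builtin, and whether the store actually becomes a streaming instruction depends on the target.

```cpp
#include <cassert>
#include <cstddef>

// Detect Clang's non-temporal store builtin; other compilers use the fallback.
#if defined(__has_builtin)
#  if __has_builtin(__builtin_nontemporal_store)
#    define HAVE_NONTEMPORAL_STORE 1
#  endif
#endif

// Computes a[i] = b[i] + c[i], hinting that the stores to `a` will not be
// read again soon. The loads from b and c are left as ordinary loads.
void addArrays(const int *b, const int *c, int *a, std::size_t n) {
  assert(a && b && c);
  for (std::size_t i = 0; i < n; ++i) {
#ifdef HAVE_NONTEMPORAL_STORE
    __builtin_nontemporal_store(b[i] + c[i], &a[i]);
#else
    a[i] = b[i] + c[i]; // plain store when the builtin is unavailable
#endif
  }
}
```

When built with Clang, the store in the emitted IR is expected to carry !nontemporal metadata. This is manual annotation, though, not the automatic detection the poster is asking about.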

[LLVMdev] Metadata

2010 Feb 11

On Feb 11, 2010, at 12:50 PM, David Greene wrote:
> On Thursday 11 February 2010 14:05:21 David Greene wrote:
>
>> Either ParseLoad and probably other instructions need to look for metadata
>> explicitly or ParseOptionalCommaAlign needs to know about general metadata.
>>
>> My inkling is to fix ParseOptionalCommaAlign. Sound reasonable?
>
> Well, that's

Non-Temporal hints from Loop Vectorizer

2018 Jan 21

On 01/20/2018 12:29 PM, hameeza ahmed via llvm-dev wrote:
> i have already seen usage of __builtin_nontemporal_store but i want to
> automate identification of non temporal loads/stores. i think i need
> to go for a pass. is it possible to detect non temporal loops without
> polly?

Yes, but we don't have anything that does that right now. The cost modeling is non-trivial,

Non-Temporal hints from Loop Vectorizer

2018 Jan 20

On 20/01/2018 18:16, hameeza ahmed wrote:
> Actually i am working on vector accelerator which will perform those
> instructions which are non temporal.
>
> for instance if i have this loop
>
> for(i=0;i<2048;i++)
> a[i]=b[i]+c[i];
>
> currently it emits following IR;
>
>
> %0 = getelementptr inbounds [2048 x i32], [2048 x i32]* @b, i64 0,
> i64 %index

RFC: non-temporal fencing in LLVM IR

2016 Jan 14

I agree with Tim's assessment for ARM. That's interesting; I wasn't previously aware of that instruction. My understanding is that Alpha would have the same problem for normal loads. I'm all in favor of more systematic handling of the fences associated with x86 non-temporal accesses. AFAICT, nontemporal loads and stores seem to have different fencing rules on x86, none of them

Non-Temporal hints from Loop Vectorizer

2018 Jan 20

Hello, My work deals with non-temporal loads and stores. I found non-temporal metadata in the LLVM documentation, but it's not shown in the IR. How do I get non-temporal metadata?
