similar to: [LLVMdev] Loss of precision with very large branch weights

Displaying 20 results from an estimated 200 matches similar to: "[LLVMdev] Loss of precision with very large branch weights"

2015 Mar 05
5
[LLVMdev] RFC - Improvements to PGO profile support
> On Mar 2, 2015, at 4:19 PM, Diego Novillo <dnovillo at google.com> wrote: > > On Thu, Feb 26, 2015 at 6:54 PM, Diego Novillo <dnovillo at google.com <mailto:dnovillo at google.com>> wrote: > > I've created a few bugzilla issues with details of some of the things I'll be looking into. I'm not yet done wordsmithing the overall design document.
2015 Feb 26
6
[LLVMdev] RFC: Loop versioning for LICM
I like to propose a new loop multi versioning optimization for LICM. For now I kept this for LICM only, but it can be used in multiple places. The main motivation is to allow optimizations stuck because of memory alias dependencies. Most of the time when alias analysis is unsure about memory access and it says may-alias. This un surety from alias analysis restrict some of the memory based
2015 Feb 26
1
[LLVMdev] RFC: Loop versioning for LICM
Hi Ashutosh, Have you been following the recent Loop Access Analysis work? LAA was split out from the Loop Vectorizer that have been performing the kind of loop versioning that you describe. The main reason was to be able to share this functionality with other passes. Loop Access Analysis is an analysis pass that computes basic memory dependence and the runtime checks. The versioning decision
2012 Apr 23
0
[LLVMdev] SIV tests in LoopDependence Analysis, Sanjoy's patch
Hi, When I write various test cases and explore how they're handled by the code in LoopDependenceAnalysis::analysePair, I'm surprised. This loop collects pairs of subscripts from the source and destination refs. * // Collect GEP operand pairs (FIXME: use GetGEPOperands from BasicAA), adding* * // trailing zeroes to the smaller GEP, if needed.* * GEPOpdsTy destOpds, srcOpds;* *
2020 Oct 01
3
How to get the loop hotness data in a suite ?
Hi everybody, I'm trying to get loop hotness data across a suite (e.g. the llvm test-suite). Ideally, this would be a list that for each loop would list how many times it was entered and what was its iteration count (at least the latter). The closest thing I could come up with is: - clang -fprofile-instr-generate (without opts) to get a .profraw - Get the .profdata - Give that back to clang
2018 Jul 07
2
LoopVectorize fails to vectorize more complex loops
Hello. Could you please tell me why the first loop of the following program (also maybe the commented loop) doesn't get vectorized with LoopVectorize (from a recent LLVM build from the SVN repository from Jun 2018)? typedef short TYPE; TYPE data[1400][1200]; void kernel_covariance(int m, int n, TYPE mean[1200]) { int i, j, k; for (j = 0; j < m; j++) { mean[j] =
2015 Jun 12
4
[LLVMdev] Loop Vectorization and Store-Load Forwarding issue
I have been looking into this small test case (Part A) where loop vectorization is disabled due to possible store-load forwarding conflict (Part B). As you can see, due to the presence of dependence distance 2 the loop is vectorizable only for a width of 2. However, the presence of dependence distance 15 (due to y[j-15]) results in store-load forwarding issue as store packet of y[16:17] (iteration
2012 Apr 12
6
[LLVMdev] SIV tests in LoopDependence Analysis, Sanjoy's patch
Hi, Here is a preliminary (monolithic) version you can comment on. This is still buggy, however, and I'll be testing for and fixing bugs over the next few days. I've used your version of the strong siv test. Thanks! -- Sanjoy Das. http://playingwithpointers.com -------------- next part -------------- A non-text attachment was scrubbed... Name: patch.diff Type: application/octet-stream
2014 Jul 09
2
[LLVMdev] instprof tests down in ARM build
A few weeks ago, I started seeing 6 tests start failing in the Profile test suite of the ARM Linux build. Anyone else seeing this? They all fail with similar output: Command Output (stderr): -- compiler-rt/test/profile/instrprof-write-file.c:13:12: error: expected string not found in input // CHECK: br i1 %{{.*}}, label %{{.*}}, label %{{.*}}, !prof !1 ^ <stdin>:6:17: note:
2019 Sep 12
6
PGO is ineffective for Rust - but why?
Hi everyone, As part of my work for Mozilla's Low Level Tools team I've implemented PGO in the Rust compiler. The feature is available since Rust 1.37 [1]. However, so far we have not seen any actual performance gains from enabling PGO for Rust code. Performance even seems to drop 1-3% with PGO enabled. I wonder why that is and I'm hoping that someone here might have experience
2016 Feb 05
2
Profiling with LLVM.
Dear Duncan, I am generating branch-weights annotated IR files as described in the documentation of LLVM, using profiling with instrumentation. http://clang.llvm.org/docs/UsersManual.html#profiling-with-instrumentation e.g. llvm-profdata merge -output=$(BENCH).profdata default.profraw > clang -S -emit-llvm -O3 -fprofile-instr-use=$(BENCH).profdata -o > bench.prof.ll bench.c The issue is
2012 Jul 31
2
[LLVMdev] Assertion failure on region analysis
Hi all, I ran across an assertion failure while using region analysis in my passes. I get the same thing when doing: opt -regions -analyze [...]   [1] entry => if.end End region tree Printing analysis 'Detect single entry single exit regions' for function 'njDecodeSOF': Region tree: [0] entry => <Function Return>   [1] entry => return     [2] if.end => return
2013 Nov 08
1
[LLVMdev] loop vectorizer and storing to uniform addresses
I changed the input C to using a 64 bit type for the loop index (this eliminates 'sext' instructions in the IR) Here the IR produced with clang -O0 define float @foo(i64 %start, i64 %end, float* %A) #0 { entry: %start.addr = alloca i64, align 8 %end.addr = alloca i64, align 8 %A.addr = alloca float*, align 8 %sum = alloca [4 x float], align 16 %i = alloca i64, align 8
2013 Nov 08
0
[LLVMdev] loop vectorizer and storing to uniform addresses
On 7 November 2013 17:18, Frank Winter <fwinter at jlab.org> wrote: > LV: We don't allow storing to uniform addresses > This is triggering because it didn't recognize as a reduction variable during the canVectorizeInstrs() but did recognize that sum[q] is loop invariant in canVectorizeMemory(). I'm guessing the nested loop was unrolled because of the low trip-count, and
2011 Oct 27
2
[LLVMdev] How to make Polly ignore some non-affine memory accesses
Perfect, thank you very much :) 2011/10/26 Tobias Grosser <tobias at grosser.es>: > On 10/24/2011 11:32 PM, Marcello Maggioni wrote: >> >> Strange , with --enable-shared (I use auto tool by the way ...) it gives: >> >> MacBook-Pro-di-Marcello:examples Kariddi$ ./compile_ex.sh >> not_so_simple_loop >> clang (LLVM option parsing): Unknown command line
2019 Sep 12
4
PGO is ineffective for Rust - but why?
On Thu, Sep 12, 2019 at 8:18 AM Teresa Johnson <tejohnson at google.com> wrote: > I just have a couple suggestions off the top of my head: > - have you tried using the new pass manager > (-fexperimental-new-pass-manager)? That has access to additional analysis > info during inlining and is able to make more precise PGO based inline > decisions. > (although note the above
2013 Nov 08
3
[LLVMdev] loop vectorizer and storing to uniform addresses
I am trying my luck on this global reduction kernel: float foo( int start , int end , float * A ) { float sum[4] = {0.,0.,0.,0.}; for (int i = start ; i < end ; ++i ) { for (int q = 0 ; q < 4 ; ++q ) sum[q] += A[i*4+q]; } return sum[0]+sum[1]+sum[2]+sum[3]; } LV: Checking a loop in "foo" LV: Found a loop: for.cond1 LV: Found an induction variable. LV: We
2014 Dec 19
1
[LLVMdev] Removing types from metadata
On Fri, Dec 19, 2014 at 12:56 PM, Duncan P. N. Exon Smith < dexonsmith at apple.com> wrote: > > However, I think this would set a bad precedent. There's nowhere else > (that I know of) where we accept two versions of assembly. The > LLParser is relatively easy to work with because it doesn't have that > kind of historical baggage. I can think of two precedents:
2015 Apr 24
2
[LLVMdev] Loss of precision with very large branch weights
On Fri, Apr 24, 2015 at 12:29 PM, Diego Novillo <dnovillo at google.com> wrote: > > > On Fri, Apr 24, 2015 at 3:28 PM, Xinliang David Li <davidxl at google.com> > wrote: >> >> yes -- for count representation, 64 bit is needed. The branch weight >> here is different and does not needs to be 64bit to represent branch >> probability precisely. > >
2019 Sep 24
3
PGO is ineffective for Rust - but why?
To give a little update here: - I've been further investigating and found an issue [1] with the Cargo build tool that most Rust projects use. This issue prevents all projects using Cargo from properly using PGO because it causes symbol names to be different between the generate and the use phase. With this issue fixed the number of "No profile data available for function" warnings