search for: branch_weight

Displaying 20 results from an estimated 22 matches for "branch_weight".

2020 Oct 01
3
How to get the loop hotness data in a suite ?
...it was entered and what was its iteration count (at least the latter). The closest thing I could come up with is: - clang -fprofile-instr-generate (without opts) to get a .profraw - Get the .profdata - Give that back to clang with -fprofile-instr-use and generate .ll - I note that here we get "branch_weights" stats, so if a branch is a back-edge, it basically gives us the iteration count. For example, check the bottom of this file: https://pastebin.com/ZnQqJdTN which was created with the procedure above. - Then, create a custom pass that goes through every loop and gathers this "branch_weigh...
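The last step of the procedure quoted above (collecting "branch_weights" from the generated .ll file) could be prototyped outside of LLVM before writing a custom pass. The following is only a minimal sketch under stated assumptions: it scans the textual IR with a regular expression, it handles only the modern `!{!"branch_weights", i32 ...}` syntax, and the helper names `parse_branch_weights` and `back_edge_ratio` are hypothetical. Deciding which successor of a branch is the back-edge still requires loop information that this sketch does not compute, and the weights may have been scaled, so the ratio is an estimate rather than an exact iteration count.

```python
import re

# Matches modern-syntax nodes such as: !3 = !{!"branch_weights", i32 99, i32 1}
# (Assumption: the older `metadata !{metadata !"branch_weights", ...}` form is not handled.)
BRANCH_WEIGHTS = re.compile(
    r'^(!\d+)\s*=\s*!\{!"branch_weights"((?:,\s*i32\s+-?\d+)+)\}',
    re.MULTILINE,
)


def parse_branch_weights(ir_text):
    """Return {metadata_id: [weight, ...]} for every branch_weights node found."""
    found = {}
    for match in BRANCH_WEIGHTS.finditer(ir_text):
        weights = [int(w) for w in re.findall(r'i32\s+(-?\d+)', match.group(2))]
        found[match.group(1)] = weights
    return found


def back_edge_ratio(back_edge_weight, exit_weight):
    """Estimate back-edge traversals per loop exit; None if the exit weight is 0."""
    if exit_weight == 0:
        return None
    return back_edge_weight / exit_weight


if __name__ == "__main__":
    sample = '!3 = !{!"branch_weights", i32 99, i32 1}\n'
    for node, weights in parse_branch_weights(sample).items():
        # Assumption for illustration: the first weight belongs to the back-edge.
        print(node, weights, back_edge_ratio(weights[0], weights[1]))
```

The `!prof` attachment on each `br` instruction names the node to look up, so a fuller script would also have to map branches to their metadata IDs.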
2014 Jul 09
2
[LLVMdev] instprof tests down in ARM build
...ain(i32 %argc, i8** %argv) #0 { ^ <stdin>:17:2: note: possible intended match here br i1 %cmp, label %if.then, label %if.end, !prof !3 ^ compiler-rt/test/profile/instrprof-write-file.c:33:11: error: expected string not found in input // CHECK: !1 = metadata !{metadata !"branch_weights", i32 1, i32 2} ^ <stdin>:51:1: note: scanning from here cond.true: ; preds = %entry ^ <stdin>:76:1: note: possible intended match here !3 = metadata !{metadata !"branch_weights", i32 1, i32 2} ^ -- ******************** Testing: 0 .. 10.. 20.. 30.. 40.. 50.....
2015 Apr 24
5
[LLVMdev] Loss of precision with very large branch weights
...nt = 2000011 - for.body3: float = 250000.5, int = 2000003 - for.inc4: float = 250000.5, int = 2000003 - for.end6: float = 1.0, int = 8 But if I manually modify the frequencies of both to get close to MAX_INT32, the ratios between the frequencies do not reflect reality. For example, if I change branch_weights in both loops to be 4,294,967,295 and 2,147,483,647 $ bin/opt -analyze -block-freq -S unbiased-branches.ll Printing analysis 'Block Frequency Analysis' for function 'bar': block-frequency-info: bar - entry: float = 1.0, int = 8 Printing analysis 'Block Frequency Analysis'...
2016 Feb 05
2
Profiling with LLVM.
...llvm-profdata merge -output=$(BENCH).profdata default.profraw > clang -S -emit-llvm -O3 -fprofile-instr-use=$(BENCH).profdata -o > bench.prof.ll bench.c The issue is that in some benchmarks I get crazy numbers in the annotated metadata inside the generated *.ll files. e.g. !16 = !{!"branch_weights", i32 -2147483648, i32 0} > !155 = !{!"branch_weights", i32 1075807200, i32 -1501637297} > !181 = !{!"branch_weights", i32 -965299388, i32 218980800} This should be a counter overflow. Now the interesting thing is that by using these annotated files as input for t...
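One way to sanity-check the negative numbers quoted above is to reinterpret each printed value as an unsigned 32-bit integer. This is a sketch of that reinterpretation only, assuming the values are 32-bit quantities printed as signed; it does not show whether the underlying profile counter really overflowed, which is the poster's hypothesis. The helper name `as_unsigned32` is hypothetical.

```python
def as_unsigned32(value):
    """Reinterpret a signed 32-bit integer as its unsigned 32-bit bit pattern."""
    return value & 0xFFFFFFFF


if __name__ == "__main__":
    # Weights copied from the quoted metadata nodes !16, !155 and !181.
    for signed in (-2147483648, 0, 1075807200, -1501637297, -965299388, 218980800):
        print(signed, "->", as_unsigned32(signed))
```

Non-negative values are unchanged by the mask, so only the entries printed with a minus sign are affected.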
2019 Sep 12
6
PGO is ineffective for Rust - but why?
...egression tests that make sure that: - instrumentation shows up in LLVM IR for the `generate` phase, and that - profiling data is actually used during the `use` phase, i.e. that cold functions get marked with `cold` and hot functions get marked with `inline`. I also verified manually that `branch_weights` are being set in IR. So, from my perspective, the PGO implementation does what it is supposed to do. However, as already mentioned, in all benchmarks I've seen so far performance seems to stay the same at best and often even suffers slightly. Which is surprising because for C++ code using Cla...
2019 Sep 12
4
PGO is ineffective for Rust - but why?
...up in LLVM IR for the `generate` phase, >> and that >> >> - profiling data is actually used during the `use` phase, i.e. >> that cold functions get marked with `cold` and hot functions >> get marked with `inline`. >> >> I also verified manually that `branch_weights` are being set >> in IR. So, from my perspective, the PGO implementation does >> what it is supposed to do. >> >> However, as already mentioned, in all benchmarks I've seen so >> far performance seems to stay the same at best and often even >> suffers slight...
2019 Sep 16
2
PGO is ineffective for Rust - but why?
...while the counts for > the other linkers are correct. All of this suggests to me that > something goes wrong when `ld` tries to link in the profiling runtime. > > I'll be investigating further. > > [1] > https://github.com/michaelwoerister/rust-pgo-test-programs/tree/master/branch_weights > > > On Thu, Sep 12, 2019 at 6:31 PM Teresa Johnson <tejohnson at google.com> > wrote: > > > > > > > > On Thu, Sep 12, 2019 at 8:18 AM Teresa Johnson <tejohnson at google.com> > wrote: > >> > >> I just have a couple suggestions...
2019 Sep 17
2
PGO is ineffective for Rust - but why?
...ram [1] compiled with Clang 8 does not have any problems with GNU ld: The `__llvm_prf_data` section is the same size for all three linkers. It must be something specific to the Rust compiler that's going wrong here. [1] https://github.com/michaelwoerister/rust-pgo-test-programs/tree/master/cpp_branch_weights On Tue, Sep 17, 2019 at 3:26 PM Michael Woerister <mwoerister at mozilla.com> wrote: > > > Can you clarify if performance difference is caused by using different linkers at instrumentation build? > > Yes, good observation! Whether the bug occurs depends only on the > linke...
2019 Sep 24
3
PGO is ineffective for Rust - but why?
...medium sized benchmark, however, the PGO version has slightly *more* branch misses. This seems to indicate that there is still something wrong. I will further investigate. [1] https://github.com/rust-lang/cargo/issues/7416 [2] https://github.com/michaelwoerister/rust-pgo-test-programs/tree/master/branch_weights/ On Tue, Sep 17, 2019 at 6:16 PM Xinliang David Li <xinliangli at gmail.com> wrote: > > You can check the difference of input args and object files to the linker. > > Regarding gnu ld, it is possible that it triggers another bug relating to start section and garbage collection....
2014 Dec 19
1
[LLVMdev] Removing types from metadata
On Fri, Dec 19, 2014 at 12:56 PM, Duncan P. N. Exon Smith < dexonsmith at apple.com> wrote: > > However, I think this would set a bad precedent. There's nowhere else > (that I know of) where we accept two versions of assembly. The > LLParser is relatively easy to work with because it doesn't have that > kind of historical baggage. I can think of two precedents:
2015 Apr 24
2
[LLVMdev] Loss of precision with very large branch weights
On Fri, Apr 24, 2015 at 12:29 PM, Diego Novillo <dnovillo at google.com> wrote: > > > On Fri, Apr 24, 2015 at 3:28 PM, Xinliang David Li <davidxl at google.com> > wrote: >> >> yes -- for count representation, 64 bit is needed. The branch weight >> here is different and does not need to be 64bit to represent branch >> probability precisely. > >
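The distinction drawn in the quoted reply, between 64-bit execution counts and narrower branch weights that only need to preserve a ratio, can be illustrated with a small sketch. This is not taken from the thread and is not presented as LLVM's actual code: it assumes a single common divisor is chosen so that the largest count fits in 32 bits, and the name `scale_counts_to_weights` is hypothetical.

```python
UINT32_MAX = 0xFFFFFFFF


def scale_counts_to_weights(counts):
    """Divide all counts by one common factor so the largest fits in 32 bits.

    Assumes `counts` is a non-empty list of non-negative integers.
    """
    largest = max(counts)
    # Keep counts unchanged when they already fit; otherwise pick a divisor
    # large enough that largest // scale <= UINT32_MAX.
    scale = 1 if largest <= UINT32_MAX else largest // UINT32_MAX + 1
    return [count // scale for count in counts]


if __name__ == "__main__":
    print(scale_counts_to_weights([10, 20]))
    print(scale_counts_to_weights([2**40, 2**39]))
```

Because every count is divided by the same factor, the relative proportions between successors are approximately preserved, up to integer rounding.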
2016 Feb 04
2
Profiling with LLVM.
Dear Duncan, Thank you a lot for your feedback. I have a problem though. The branch weights counters overflow in some files and thus I get incorrect numbers. Is there any way to find a workaround for that? Is it supposed to be a known bug or is it something that needs configuration on my part? Again, thank you a lot for your reply. Best Regards, Georgios Zacharopoulos 2016-02-03 18:23
2017 Aug 02
3
[InstCombine] Simplification sometimes only transforms but doesn't simplify instruction, causing side effect in other pass
...; preds = %entry %r2 = and i32 %and.i, 31 store i32 %and.i, i32* @b, align 8 br label %return return: ; preds = %if.else, %if.then %ret = phi i32 [ %r1, %if.then ], [ %r2, %if.else ] ret i32 %ret } !0 = !{!"branch_weights", i32 2000, i32 1} ------------------------------------------------------------------------- For the snippet: %and.i = and i32 %conv.i, 255 ... %r2 = and i32 %and.i, 31 Look at %r2 in block %if.else, it is computed by two "and" operations. Both InstCombiner::SimplifyAssociativeO...
2017 Aug 02
3
[InstCombine] Simplification sometimes only transforms but doesn't simplify instruction, causing side effect in other pass
...31 >> store i32 %and.i, i32* @b, align 8 >> br label %return >> >> return: ; preds = %if.else, %if.then >> %ret = phi i32 [ %r1, %if.then ], [ %r2, %if.else ] >> ret i32 %ret >> } >> >> !0 = !{!"branch_weights", i32 2000, i32 1} >> ------------------------------------------------------------------------- >> >> For the snippet: >> %and.i = and i32 %conv.i, 255 >> ... >> %r2 = and i32 %and.i, 31 >> >> Look at %r2 in block %if.else, it is computed by tw...
2015 Apr 17
3
[LLVMdev] RFC: Indirect Call Promotion LLVM Pass
...ret void if.true: ; preds = %entry call void @foo(i32 10) #0 br label %if.merge if.false: ; preds = %entry call void %fun(i32 10), !prof !1 br label %if.merge } attributes #0 = { inlinehint } !0 = !{!"branch_weights", i32 5000, i32 1000} !1 = !{!"indirect_call_targets", i64 1000, !"bar", i64 100} ---------------------------------------------------------------------------- The ICP pass handles indirect call and indirect invoke LLVM IR instructions. It depends on the availability of in...
2011 Oct 19
0
[LLVMdev] Question regarding basic-block placement optimization
...) br label %else4 else4: %gep5 = getelementptr i32* %a, i32 3 %val5 = load i32* %gep5 %cond5 = icmp ugt i32 %val5, 3 br i1 %cond5, label %then5, label %exit, !prof !0 then5: call void @error(i32 %i, i32 1, i32 %b) br label %exit exit: ret i32 %b } !0 = metadata !{metadata !"branch_weights", i32 4, i32 64} % ./bin/llc -O2 -o - ifchain.ll .file "ifchain.ll" .text .globl test .align 16, 0x90 .type test, at function test: # @test .Ltmp4: .cfi_startproc # BB#0:...
2017 Aug 02
2
[InstCombine] Simplification sometimes only transforms but doesn't simplify instruction, causing side effect in other pass
...%entry > %r2 = and i32 %and.i, 31 > store i32 %and.i, i32* @b, align 8 > br label %return > > return: ; preds = %if.else, > %if.then > %ret = phi i32 [ %r1, %if.then ], [ %r2, %if.else ] > ret i32 %ret > } > > !0 = !{!"branch_weights", i32 2000, i32 1} > ------------------------------------------------------------------------- > > For the snippet: > %and.i = and i32 %conv.i, 255 > ... > %r2 = and i32 %and.i, 31 > > Look at %r2 in block %if.else, it is computed by two "and" operations. >...
2020 Aug 05
10
[RFC] Machine Function Splitter - Split out cold blocks from machine functions using profile data
...; preds = %6, %4 %10 = phi i32 [ %1, %4 ], [ %8, %6 ] %11 = load i32, i32* @i, align 4 %12 = add nsw i32 %10, %11 store i32 %12, i32* @i, align 4 ret i32 %12 } declare i32 @L1() declare i32 @R1() cold nounwind !1 = !{!"function_entry_count", i64 7} !2 = !{!"branch_weights", i32 0, i32 7} ``` Code generated by Machine Function Splitter $ llc < example.ll -mtriple=x86_64-unknown-linux-gnu -split-machine-functions ``` .text .file "<stdin>" .globl foo # -- Begin function foo...
2011 Oct 19
3
[LLVMdev] Question regarding basic-block placement optimization
On Tue, Oct 18, 2011 at 6:58 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote: > > On Oct 18, 2011, at 5:22 PM, Chandler Carruth wrote: > > As for why it should be an IR pass, mostly because once the selection dag >> runs through the code, we can never recover all of the freedom we have at >> the IR level. To start with, splicing MBBs around requires known about