thr3ads.net - search: "mispredict"

Displaying 20 results from an estimated 57 matches for "mispredict".

[LLVMdev] Hi Cache Miss and Branch Misprediction

2008 Sep 29

...questions, it would be really great if someone could help me. 1. Can I find out (is there something already built) if the previous instruction, or some instruction, was a cache miss? Basically I want to detect cache misses and the instructions that are causing them. 2. Can I find if there was a branch misprediction? 3. Can I find if a branch was taken or not taken? It would be really great if someone could answer this for me. Thank you, Ketan, Georgia Institute of Technology

[LLVMdev] Hi Cache Miss and Branch Misprediction

2008 Sep 29

...--tool=callgrind). See http://valgrind.org/ > 1. Can i find out (is there something already built), if the previous instruction / or some instruction was a cache miss. Basically i want to detect cache misses and instructions that are causing this > > 2. Can i find if there was a branch misprediction? > > 3. Can i find if a branch was taken or not taken? > > It would be really great if someone could answer this for me. I suppose you could instrument the existing LLVM JIT to collect this sort of information. Realize that much of LLVM works in the LLVM IR language, and we just...

[LLVMdev] Hi Cache Miss and Branch Misprediction

2008 Sep 30

...gger time, in farther away cache, bigger time, in DRAM, bigger time, page fault. No one in the real world does this. Instead they use tools like Shark or VTune. You can say, show me all the instructions that missed cache and how often they missed. > 2. Can i find if there was a branch misprediction? Likewise. Though, I'm trying to recall if mispredictions were one of the things one could watch for on x86. Google around for performance counters on x86. > 3. Can i find if a branch was taken or not taken? :-) gcov will tell you this today.

[LLVMdev] Hi Cache Miss and Branch Misprediction

2008 Sep 30

...est Ketan ----- Original Message ----- From: "OvermindDL1" <overminddl1 at gmail.com> To: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu> Sent: Monday, September 29, 2008 10:17:41 PM GMT -05:00 US/Canada Eastern Subject: Re: [LLVMdev] Hi Cache Miss and Branch Misprediction On Mon, Sep 29, 2008 at 6:30 PM, Mike Stump <mrs at apple.com> wrote: > /* snip */ AMD's CodeAnalyst is free and quite wonderful at this job. Shows details about just about anything the CPU reports (and on newer AMD CPU's there is an even more ridiculous amount of information...

[LLVMdev] Hi Cache Miss and Branch Misprediction

2008 Sep 30

On Mon, Sep 29, 2008 at 6:30 PM, Mike Stump <mrs at apple.com> wrote: > /* snip */ AMD's CodeAnalyst is free and quite wonderful at this job. Shows details about just about anything the CPU reports (and on newer AMD CPU's there is an even more ridiculous amount of information) about every little function call, time they took, multiple profiling modes, etc...

[LLVMdev] Hi Cache Miss and Branch Misprediction

2008 Sep 30

...u Ketan ----- Original Message ----- From: "John Criswell" <criswell at cs.uiuc.edu> To: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu> Sent: Tuesday, September 30, 2008 11:37:05 AM GMT -05:00 US/Canada Eastern Subject: Re: [LLVMdev] Hi Cache Miss and Branch Misprediction Ketan Pundlik Umare wrote: > Thanks a lot, guys! > But I have to do this online and use it to do some kind of code transformation. It's for a different project. > Another possibility is to write an LLVM transform that inserts code into a program to do this profiling for you. For ex...

Spectre V1 Mitigation - Internals?

2019 Sep 17

...t I understand that as soon as the condition value is available, the processor can check its assumptions and revert. That is, if the branch prediction is correct during speculation, we mask with all_ones and the processor can follow the predicted branch to retirement. But if the processor mispredicted the branch, it will revert as soon as the condition becomes available. If this is the case, then we don't speculatively execute the operations pointer1 &= predicate_state (if branch) and *pointer2 & predicate_state (else branch), right? Or do out-of-order processors allow such access?...

Spectre V1 Mitigation - Internals?

2019 Sep 17

...ut the store operation in speculative execution. Am I wrong here? On Tue, 17 Sep 2019 at 21:30, Craig Topper <craig.topper at gmail.com> wrote: > The reverting of state doesn’t occur when the condition is available. The > processor has to “execute” the branch uop and see that it was mispredicted. > This occurs at least one cycle after the condition is available. The > conditional move on the predicate state can execute at the same time as the > branch. Or it can execute before it if the branch unit is busy. The load > can do the same. > > On Tue, Sep 17, 2019 at 7:57 AM...

Replace call stack with an equivalent on the heap?

2017 Dec 16

Hello, I'm implementing a custom Haskell-to-LLVM compiler, and in my experimentation I noticed that GHC is much slower than clang on certain examples, such as the Ackermann function. However, from reading their respective IRs (Cmm for GHC and LLVM for clang), I don't really see much of a difference. Here is a link to the numbers. (n, m) are the parameters to the Ackermann function

[RFC] Delaying phi-to-select transformation until later in the pass pipeline

2018 Aug 14

...56.hmmer -2.50% External/SPEC/CINT2006/464.h264ref/464.h264ref -1.60% MultiSource/Benchmarks/nbench/nbench -1.19% SingleSource/Benchmarks/Adobe-C++/functionobjects -1.07% I had a brief look at the regressions and they all look to be caused by getting bad luck with branch mispredictions: I looked into the Shootout-ary3 and yacr2 cases and in both the hot code path was the same with and without the patch, but with more mispredictions probably caused by changes elsewhere. John

Spectre V1 Mitigation - Internals?

2019 Sep 16

...redicate_state = all_ones_mask; if (condition) { predicate_state = !condition ? all_zeros_mask : predicate_state; pointer1 &= predicate_state; leak(*pointer1); } else { int value2 = *pointer2 & predicate_state; leak(value2); } } Let's assume that the branch is mispredicted and the if body is taken. The value of the predicate_state mask depends on the result of the condition, which is not yet available, hence the speculative execution. My question is whether the value of predicate_state is also guessed by the processor. If that is correct, then the value of predicate_s...

[LLVMdev] [RFC] BlockFrequency is the wrong metric; we need a new one

2014 Feb 03

On Feb 2, 2014, at 6:18 PM, Andrew Trick <atrick at apple.com> wrote: >> The result of such a system would produce weights for every block in the above CFG as '1.0', or equivalent to the entry block weight. This to me is a really useful metric -- it indicates that no block in the CFG is really more or less likely than any other. Only *biases* in a specific direction would

XRay: Demo on x86_64/Linux almost done; some questions.

2016 Jul 29

...ould also be faster because smaller code better fits in CPU > cache, and patching itself should run faster (because there is less code to > modify). It may well be slower. Larger CPUs tend to track the call stack in hardware and returning to an address pushed manually is an inevitable branch mispredict in those cases. Cheers. Tim.

[LLVMdev] Is PIC code defeating the branch predictor?

2011 Jan 04

...use calls and returns no longer are matched. Yes, this will defeat the processor's return address stack predictor. That said, I suspect it's not much of an issue on "desktop" processors: the reissue of the pop is an Atom-specific issue, so you only need to worry about the branch misprediction caused on the next return. Assuming these sequences aren't too frequent, the more elaborate tournament predictors in more powerful processors may be able to compensate for it. That said, the alternative sequence you propose seems like it would be an improvement on any processor with a mult...

[RFC] Delaying phi-to-select transformation until later in the pass pipeline

2018 Aug 15

.../SPEC/CINT2006/464.h264ref/464.h264ref -1.60% > MultiSource/Benchmarks/nbench/nbench -1.19% > SingleSource/Benchmarks/Adobe-C++/functionobjects -1.07% > > I had a brief look at the regressions and they all look to be > caused by > getting bad luck with branch mispredictions: I looked into the > Shootout-ary3 and > yacr2 cases and in both the hot code path was the same with and > without the > patch, but with more mispredictions probably caused by changes > elsewhere. > > John > > ______________________________...

[Bridge] [BRIDGE] Unaligned access on IA64 when comparing ethernet addresses

2007 Apr 18

From: Evgeny Kravtsunov <emkravts@openvz.org> compare_ether_addr() implicitly requires that the addresses passed are 2-bytes aligned in memory. This is not true for br_stp_change_bridge_id() and br_stp_recalculate_bridge_id() in which one of the addresses is unsigned char *, and thus may not be 2-bytes aligned. Signed-off-by: Evgeny Kravtsunov <emkravts@openvz.org> Signed-off-by:

XRay: Demo on x86_64/Linux almost done; some questions.

2016 Jul 29

...e better fits in > CPU > > cache, and patching itself should run faster (because there is less code > to > > modify). > > It may well be slower. Larger CPUs tend to track the call stack in > hardware and returning to an address pushed manually is an inevitable > branch mispredict in those cases. > > Cheers. > > Tim. > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160729/ff713ce1/attachment.html>

[cfe-dev] FE_INEXACT being set for an exact conversion from float to unsigned long long

2017 Apr 20

> This seems like it was done for perf reason (mispredict). Conditional-to-cmov transformation should keep > from introducing additional observable side-effects, and it's clear that whatever did this did not account > for floating point exception. That’s a very reasonable statement, but I’m not sure it corresponds to the way we have typically a...

[RFC] Delaying phi-to-select transformation until later in the pass pipeline

2018 Aug 17

...64ref -1.60% >>> MultiSource/Benchmarks/nbench/nbench -1.19% >>> SingleSource/Benchmarks/Adobe-C++/functionobjects -1.07% >>> >>> I had a brief look at the regressions and they all look to be caused by >>> getting bad luck with branch mispredictions: I looked into the Shootout-ary3 and >>> yacr2 cases and in both the hot code path was the same with and without the >>> patch, but with more mispredictions probably caused by changes elsewhere. >>> >>> John >>> >>> _______________________...

[LLVMdev] Is PIC code defeating the branch predictor?

2011 Jan 04

I noticed that we generate code like this for i386 PIC: calll L0$pb L0$pb: popl %eax movl %eax, -24(%ebp) ## 4-byte Spill I worry that this defeats the return address prediction for returns in the function because calls and returns no longer are matched. From Intel's Optimization Reference Manual: "The return address stack mechanism augments the static and dynamic
