search for: iaca

Displaying 20 results from an estimated 21 matches for "iaca".

2013 Sep 17
2
[LLVMdev] Codegen performance issue: LEA vs. INC.
...e assembly code of the outer loop. It simply has 11 loads, 11 FP add, 11 FP mul, 1 FP store and lea+mov for index computation, cmp and jump. The problem is that lea is on critical path because it's dispatched on the same port as all FP add operations (port 1). Intel Architecture Code Analyzer (IACA) reports throughput for that assembly block is 12.95 cycles. I made a short investigation and found that there is a pass in code gen that replaces index increment with lea. Here is the snippet from llvm/lib/CodeGen/TwoAddressInstructionPass.cpp if (MI.isConvertibleTo3Addr()) { // This instructio...
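The port-1 bottleneck described in this thread can be illustrated with a back-of-the-envelope port-pressure bound. This is a minimal sketch, not IACA's model. The uop mix and port bindings below are assumptions: they roughly follow Sandy Bridge (FP add on port 1, FP mul on port 0, loads and store address on ports 2/3, store data on port 4, fused cmp+jmp on port 5, mov and inc on ports 0/1/5), and the lea is pinned to port 1 as the post states. For every set S of ports, the uops that can only run inside S need at least count/|S| cycles, and the bound is the largest such value.

```python
from fractions import Fraction
from itertools import combinations

# Assumed per-iteration uop mix for the loop in the post: (allowed ports, count).
COMMON = [
    (frozenset({1}), 11),        # FP add
    (frozenset({0}), 11),        # FP mul
    (frozenset({2, 3}), 11),     # loads
    (frozenset({2, 3}), 1),      # store address
    (frozenset({4}), 1),         # store data
    (frozenset({0, 1, 5}), 1),   # mov
    (frozenset({5}), 1),         # fused cmp+jmp
]

WITH_LEA = COMMON + [(frozenset({1}), 1)]        # index update via lea (port 1 only)
WITH_INC = COMMON + [(frozenset({0, 1, 5}), 1)]  # index update via inc (assumed p015)


def port_bound(uops):
    """Lower bound on cycles per iteration from port pressure alone."""
    ports = sorted(set().union(*(p for p, _ in uops)))
    best = Fraction(0)
    for k in range(1, len(ports) + 1):
        for subset in combinations(ports, k):
            s = frozenset(subset)
            # Uops whose allowed ports all lie inside s must share those k ports.
            n = sum(c for p, c in uops if p <= s)
            best = max(best, Fraction(n, k))
    return best


print(float(port_bound(WITH_LEA)))  # 12.0
print(float(port_bound(WITH_INC)))  # 11.0
```

Under these assumptions, moving the index update off port 1 lowers the bound from 12 to 11 cycles per iteration. IACA's 12.95-cycle figure sits above this bound because it models more than port pressure, so treat the absolute numbers as only as good as the assumed bindings.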
2013 Oct 02
0
[LLVMdev] Codegen performance issue: LEA vs. INC.
...op. It simply has 11 loads, > 11 FP add, 11 FP mul, 1 FP store and lea+mov for index computation, cmp and > jump. > > The problem is that lea is on critical path because it’s dispatched on the > same port as all FP add operations (port 1). > > Intel Architecture Code Analyzer (IACA) reports throughput for that assembly > block is 12.95 cycles. > > I made a short investigation and found that there is a pass in code gen that > replaces index increment with lea. > > Here is the snippet from llvm/lib/CodeGen/TwoAddressInstructionPass.cpp > > > > if (...
2013 Oct 03
2
[LLVMdev] Codegen performance issue: LEA vs. INC.
...>> 11 FP add, 11 FP mul, 1 FP store and lea+mov for index computation, cmp and >> jump. >> >> The problem is that lea is on critical path because it’s dispatched on the >> same port as all FP add operations (port 1). >> >> Intel Architecture Code Analyzer (IACA) reports throughput for that assembly >> block is 12.95 cycles. >> >> I made a short investigation and found that there is a pass in code gen that >> replaces index increment with lea. >> >> Here is the snippet from llvm/lib/CodeGen/TwoAddressInstructionPass.cp...
2018 Jan 04
0
FYI, we've posted a component of Spectre mitigation on llvm-commits
...units the way the busy loop does. >> > The pause instruction will also avoid tying up execution resources in speculative contexts, so I wouldn't expect it to be significantly different. Got it. The Software Developer Manual isn't entirely clear on this point (to me at least) and IACA shows a number of ports in use during the 4 or 5 Uops pause takes. Thank you, Steve -- Stephen Checkoway
2018 Jan 04
2
FYI, we've posted a component of Spectre mitigation on llvm-commits
On Thu, Jan 4, 2018 at 12:31 PM Stephen Checkoway via llvm-dev < llvm-dev at lists.llvm.org> wrote: > > > On Jan 4, 2018, at 04:23, Chandler Carruth via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > > > > Sending a note here as this seems likely to be of relatively broad > interest. > > It looks like this is producing code of the following form.
2013 Oct 05
0
[LLVMdev] Codegen performance issue: LEA vs. INC.
...mul, 1 FP store and lea+mov for index computation, cmp and >>> jump. >>> >>> The problem is that lea is on critical path because it’s dispatched on the >>> same port as all FP add operations (port 1). >>> >>> Intel Architecture Code Analyzer (IACA) reports throughput for that assembly >>> block is 12.95 cycles. >>> >>> I made a short investigation and found that there is a pass in code gen that >>> replaces index increment with lea. >>> >>> Here is the snippet from llvm/lib/CodeGen/TwoA...
2018 Dec 10
2
[RFC][llvm-mca] Adding binary support to llvm-mca.
+1 to what Clement said. I believe the intrinsics are a better design to support many architectures. IACA users are probably decorating their code with IACA_START / IACA_END macros. One possibility is to provide a header that defines these macros in terms of the new intrinsics. On Mon, Dec 10, 2018 at 3:59 PM Clement Courbet <courbet at google.com> wrote: > Hi Matt/Andrea, > > I see pro...
2013 Jul 10
3
[LLVMdev] unaligned AVX store gets split into two instructions
...ise the heuristics that I put in and to see if it matches the Sandybridge optimization guide. If I remember correctly the optimization guide does not have too much information on this, but Elena looked over it and said that it made sense. BTW, you can validate that this is the problem using the IACA tool. It performs static analysis on your binary and tells you where the critical path is. http://software.intel.com/en-us/articles/intel-architecture-code-analyzer Thanks, Nadav On Jul 9, 2013, at 10:01 PM, Eli Friedman <eli.friedman at gmail.com> wrote: > On Tue, Jul 9, 2013 at 9:01...
2018 Dec 10
4
[RFC][llvm-mca] Adding binary support to llvm-mca.
...cipate lifting this restriction once branching > is handled. > > -Matt > > > On Mon, Dec 10, 2018 at 04:15:46PM +0100, Guillaume Chatelet wrote: >> +1 to what Clement said. >> I believe the intrinsics are a better design to support many architectures. >> >> IACA users are probably decorating their code with IACA_START / IACA_END >> macros. One possibility is to provide a header that defines these macros in >> terms of the new intrinsics. >> >> On Mon, Dec 10, 2018 at 3:59 PM Clement Courbet <courbet at google.com> wrote: >>...
2018 Dec 03
2
[RFC][llvm-mca] Adding binary support to llvm-mca.
Hi Andrea, On Mon, Dec 03, 2018 at 01:21:33PM +0000, Andrea Di Biagio wrote: > So, I have been thinking a bit more about this whole design. > > The more I think about your suggested design, the more I am convinced that > we should do something more to support ranges in binary object files too. > My understanding is that the reason why we don't support object files in >
2013 Jul 10
2
[LLVMdev] unaligned AVX store gets split into two instructions
...ee if it matches >> the Sandybridge optimization guide. If I remember correctly the >> optimization guide does not have too much information on this, but Elena >> looked over it and said that it made sense. >> >> BTW, you can validate that this is the problem using the IACA tool. It >> performs static analysis on your binary and tells you where the critical >> path is. >> http://software.intel.com/en-us/articles/intel-architecture-code-analyzer >> >> Thanks, >> Nadav >> >> >> On Jul 9, 2013, at 10:01 PM, Eli Friedm...
2013 Jul 10
0
[LLVMdev] unaligned AVX store gets split into two instructions
...at I put in and to see if it matches > the Sandybridge optimization guide. If I remember correctly the > optimization guide does not have too much information on this, but Elena > looked over it and said that it made sense. > > BTW, you can validate that this is the problem using the IACA tool. It > performs static analysis on your binary and tells you where the critical > path is. > http://software.intel.com/en-us/articles/intel-architecture-code-analyzer > > Thanks, > Nadav > > > On Jul 9, 2013, at 10:01 PM, Eli Friedman <eli.friedman at gmail.com>...
2013 Sep 19
0
[LLVMdev] unaligned AVX store gets split into two instructions
...>>> the Sandybridge optimization guide. If I remember correctly the >>> optimization guide does not have too much information on this, but Elena >>> looked over it and said that it made sense. >>> >>> BTW, you can validate that this is the problem using the IACA tool. It >>> performs static analysis on your binary and tells you where the critical >>> path is. >>> http://software.intel.com/en-us/articles/intel-architecture-code-analyzer >>> >>> Thanks, >>> Nadav >>> >>> >>> On...
2013 Jul 10
0
[LLVMdev] unaligned AVX store gets split into two instructions
On Tue, Jul 9, 2013 at 9:01 PM, Zach Devito <zdevito at gmail.com> wrote: > I'm seeing a difference in how LLVM 3.3 and 3.2 emit unaligned vector loads > on AVX. > 3.3 is splitting up an unaligned vector load but in 3.2, it was emitted as a > single instruction (details below). > In a matrix-matrix inner-kernel, I see a ~25% decrease in performance, which > seems to be
2020 May 09
2
[llvm-mca] Resource consumption of ProcResGroups
Hi, I’m trying to work out the behavior of llvm-mca on instructions with ProcResGroups. My current understanding is: When an instruction requests a port group (e.g., HWPort015) and all of its atomic sub-resources (e.g., HWPort0, HWPort1, HWPort5), HWPort015 is marked as “reserved” and is issued in parallel with HWPort0, HWPort1, and HWPort5, blocking future instructions from reserving HWPort015
2013 Jul 10
4
[LLVMdev] unaligned AVX store gets split into two instructions
I'm seeing a difference in how LLVM 3.3 and 3.2 emit unaligned vector loads on AVX. 3.3 is splitting up an unaligned vector load but in 3.2, it was emitted as a single instruction (details below). In a matrix-matrix inner-kernel, I see a ~25% decrease in performance, which seems to be due to this. Any ideas why this changed? Thanks! Zach LLVM Code: define <4 x double> @vstore(<4 x
2018 Mar 02
0
[RFC] llvm-mca: a static performance analysis tool
...ance > issues. You can’t really have one without the other, which is why this is a dream come true. > Given an assembly code sequence, llvm-mca estimates the IPC (instructions per > cycle), as well as hardware resource pressure. The analysis and reporting style > were inspired by the IACA tool from Intel. > > The presence of long data dependency chains, as well as poor usage of hardware > resources may lead to bottlenecks in the back-end. The tool is able to generate > a detailed report which should help with identifying and analyzing sources of > bottlenecks. >...
2018 Mar 02
0
[RFC] llvm-mca: a static performance analysis tool
...ener` interface in llvm-mca. - Analysis passes analyze the `SimulationLog` to extract whatever metric they care about (e.g. port pressure or IPC), or generate an annotated trace. A similar functionality is provided in llvm-mca `XXView` implementations. We also have an IACA-like binary that displays analysis results. For reference, our code can be found here: https://github.com/google/EXEgesis/tree/master/llvm_sim > llvm-mca uses information which is already available in LLVM (e.g. > scheduling > models) to statically measure the performance of machine c...
2018 Mar 02
5
[RFC] llvm-mca: a static performance analysis tool
...You can’t really have one without the other, which is why this is a dream > come true. > > Given an assembly code sequence, llvm-mca estimates the IPC (instructions > per > cycle), as well as hardware resource pressure. The analysis and reporting > style > were inspired by the IACA tool from Intel. > > The presence of long data dependency chains, as well as poor usage of > hardware > resources may lead to bottlenecks in the back-end. The tool is able to > generate > a detailed report which should help with identifying and analyzing sources > of > bott...
2018 Mar 01
9
[RFC] llvm-mca: a static performance analysis tool
...redict the performance of the code when run on the target, but also help with diagnosing potential performance issues. Given an assembly code sequence, llvm-mca estimates the IPC (instructions per cycle), as well as hardware resource pressure. The analysis and reporting style were inspired by the IACA tool from Intel. The presence of long data dependency chains, as well as poor usage of hardware resources, may lead to bottlenecks in the back-end. The tool is able to generate a detailed report which should help with identifying and analyzing sources of bottlenecks. Scheduling models are mostly...
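The two bottleneck sources named in the RFC (long dependency chains and poor use of hardware resources) can be compared with a rough steady-state estimate. This is a hedged sketch, not how llvm-mca works internally (llvm-mca simulates the pipeline using LLVM's scheduling models). The helper `steady_state` and the example latencies are hypothetical: a loop cannot run faster than its longest loop-carried dependency chain, nor faster than its busiest resource allows, and IPC is instructions divided by cycles.

```python
from fractions import Fraction


def steady_state(insts_per_iter, chain_latency, resource_bound):
    """Rough steady-state estimate for a loop body (hypothetical helper).

    chain_latency: latency in cycles of the longest loop-carried
        dependency chain, e.g. an accumulator updated every iteration.
    resource_bound: cycles per iteration implied by the busiest resource.
    Returns (cycles per iteration, IPC).
    """
    # Whichever limit is larger dominates the steady state.
    cycles = max(Fraction(chain_latency), Fraction(resource_bound))
    return cycles, Fraction(insts_per_iter) / cycles


# Assumed example: 4 instructions per iteration, an accumulator add with an
# assumed 3-cycle latency, and a resource bound of 1 cycle per iteration.
cycles, ipc = steady_state(4, chain_latency=3, resource_bound=1)
print(cycles, ipc)  # 3 4/3
```

In this example the dependency chain, not resource pressure, limits throughput; a report that shows both the chain and the per-resource pressure is what makes that distinction visible.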