Lewis, Cannada via llvm-dev
2019-Dec-24 17:07 UTC
[llvm-dev] Get llvm-mca results inside opt?
Hi,

I am trying to generate performance models for specific pieces of code, like an omp.outlined function. Let's say I have the following code:

start_collect_parallel_for_data(-1.0,-1.0,-1.0, size, "tag for this region");
#pragma omp parallel for
for(auto i = 0; i < size; ++i){
  // … do work
}
stop_collecting_parallel_for_data();

The omp region will get outlined into a new function, and what I would like to be able to do in opt is compile just that function to assembly for some target that I have chosen, run llvm-mca on just that function, and then replace the -1.0s with uOps Per Cycle, IPC, and Block RThroughput so that my logging code has some estimate of the performance of that region.

Is there any reasonable way to do this from inside opt? I already have everything in place to find the start_collect_parallel_for_data calls and find the functions called between start and stop, but I could use some help with the rest of my idea.

Thanks
-Cannada Lewis
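A minimal sketch of what the logging side of this setup might look like. The thread does not show the runtime, so the parameter types, the `RegionRecord` struct, the `g_regions` vector, `has_static_estimate()` and the `scale()` wrapper are all hypothetical. The one idea taken from the message is that -1.0 is a placeholder meaning "no estimate yet", which a compiler pass would later overwrite with uOps Per Cycle, IPC, and Block RThroughput.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record of one instrumented region. The three estimates stay
// at the -1.0 sentinel ("no static estimate") unless a compiler pass
// rewrites the constant arguments at the call site.
struct RegionRecord {
  double uops_per_cycle;
  double ipc;
  double block_rthroughput;
  long size;
  std::string tag;
};

std::vector<RegionRecord> g_regions;

// Hypothetical signatures: the thread does not show the real ones.
void start_collect_parallel_for_data(double uops_per_cycle, double ipc,
                                     double block_rthroughput, long size,
                                     const char *tag) {
  assert(tag != nullptr);
  g_regions.push_back({uops_per_cycle, ipc, block_rthroughput, size, tag});
}

void stop_collecting_parallel_for_data() {
  // A real runtime would stop timers or hardware counters here.
}

// True once all three placeholders have been replaced by real estimates.
bool has_static_estimate(const RegionRecord &r) {
  return r.uops_per_cycle >= 0.0 && r.ipc >= 0.0 && r.block_rthroughput >= 0.0;
}

// The instrumented region from the message, wrapped in a function.
void scale(std::vector<double> &data) {
  const long size = static_cast<long>(data.size());
  start_collect_parallel_for_data(-1.0, -1.0, -1.0, size, "tag for this region");
#pragma omp parallel for
  for (long i = 0; i < size; ++i) {
    data[i] *= 2.0;  // stands in for "... do work"
  }
  stop_collecting_parallel_for_data();
}
```

Compiled with `-fopenmp`, the loop body in `scale()` is outlined into a separate function, which is the function the proposed pass would analyze; without that flag the pragma is ignored and the loop runs serially, which is still enough to exercise the bookkeeping.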
Andrea Di Biagio via llvm-dev
2020-Jan-02 16:09 UTC
[llvm-dev] Get llvm-mca results inside opt?
Hi Lewis,

Basically - if I understand correctly - you want to design a pass that uses llvm-mca as a library to compute throughput indicators for your outlined functions. You would then use those indicators to classify outlined functions.

llvm-mca doesn't know how to evaluate branches or instructions that affect the control flow. That basically restricts the analysis to single basic blocks that are assumed to be hot. I am not sure if this would be a blocker for your particular use case.

llvm-mca only knows how to analyze/simulate a sequence of `mca::Instruction`. So, the expectation is that instructions in input have already been lowered into a sequence of `mca::Instruction`. The only way currently to obtain an `mca::Instruction` is by calling method `mca::InstrBuilder::createInstruction()` [1] on every instruction in input (see for example how it is done in llvm-mca.cpp [2]).

Unfortunately, method `createInstruction()` only works on `MCInst&`. This strongly limits the usability of llvm-mca as a library; the expectation/assumption is that instructions have already been lowered to a sequence of MCInst. Basically, the only supported scenarios are:
 - We have reached the code emission stage and instructions have already been lowered into a sequence of MCInst, or
 - We obtained an MCInst sequence by parsing an assembly code sequence with the help of other llvm libraries (this is what the llvm-mca tool does).

It is possible to implement a variant of `createInstruction()` that lowers directly from `MachineInstr` to `mca::Instruction`. That would make the mca library more usable. In particular, it would make it possible to use mca from a post-regalloc pass which runs before code emission. Unfortunately, that functionality doesn't exist today (we can definitely implement it though; it may unblock other interesting use cases). That being said, I am not sure if it could help your particular use case. When would you want to run your new pass?
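To make these constraints concrete, here is a deliberately simplified, LLVM-independent model of the flow described above: each input instruction is lowered on its own into a simulator-level record (the role `mca::InstrBuilder::createInstruction()` plays for `MCInst` in llvm-mca.cpp), and the lowered block is then replayed for a fixed number of iterations, so a branch is just another instruction and control flow is never followed. Every name here (`ToyMCInst`, `ToyInstruction`, `kSchedModel`, `lowerSequence`, `replay`) and the scheduling numbers are hypothetical; this is not the llvm-mca API.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an MCInst: just an opcode name.
struct ToyMCInst {
  std::string opcode;
};

// Hypothetical stand-in for mca::Instruction: the scheduling facts the
// simulator needs, looked up from a scheduling model.
struct ToyInstruction {
  std::string opcode;
  unsigned num_micro_ops;
  unsigned latency;
};

// Made-up scheduling model: opcode -> {micro-ops, latency}.
const std::map<std::string, std::pair<unsigned, unsigned>> kSchedModel = {
    {"add", {1, 1}}, {"mul", {1, 3}}, {"load", {1, 5}}, {"jne", {1, 1}}};

// Lower every input instruction independently. Fails as a whole if any
// opcode has no scheduling information.
std::optional<std::vector<ToyInstruction>>
lowerSequence(const std::vector<ToyMCInst> &insts) {
  std::vector<ToyInstruction> lowered;
  for (const ToyMCInst &mi : insts) {
    auto it = kSchedModel.find(mi.opcode);
    if (it == kSchedModel.end())
      return std::nullopt;
    lowered.push_back({mi.opcode, it->second.first, it->second.second});
  }
  return lowered;
}

// Replay the block `iterations` times and return the source indices in
// simulation order. A branch such as "jne" is not evaluated: the sequence
// simply wraps around, which is why the block is treated as hot.
std::vector<std::size_t> replay(const std::vector<ToyInstruction> &block,
                                unsigned iterations) {
  assert(!block.empty());
  std::vector<std::size_t> order;
  for (unsigned it = 0; it < iterations; ++it)
    for (std::size_t i = 0; i < block.size(); ++i)
      order.push_back(i);
  return order;
}
```

The split into a lowering step and a replay step mirrors why the choice of input matters: whatever produces the instruction list (parsed assembly or emitted `MCInst`s) must already be at the machine level before anything can be simulated.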
Using llvm-mca to analyze llvm IR is unfortunately not possible.

To compute the throughput indicators you would need to implement logic similar to that implemented by class SummaryView [3]. Ideally, most of that logic could be factored out into a helper class in order to help your particular use case and possibly avoid code duplication.

I hope it helps,
-Andrea

[1] https://github.com/llvm-mirror/llvm/blob/master/include/llvm/MCA/InstrBuilder.h
[2] https://github.com/llvm-mirror/llvm/blob/master/tools/llvm-mca/llvm-mca.cpp
[3] https://github.com/llvm-mirror/llvm/blob/master/tools/llvm-mca/Views/SummaryView.h
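The three indicators Lewis wants reduce to simple arithmetic once a simulation has produced a cycle count. The sketch below is a standalone approximation of what llvm-mca's summary view reports, not the SummaryView code itself; the `ResourceUse`, `BlockStats` and `Indicators` types are hypothetical, and it assumes the per-iteration resource cycles are already known. uOps Per Cycle and IPC divide the totals over all iterations by the simulated cycle count. Block RThroughput takes the larger of the dispatch bound (micro-ops per iteration divided by dispatch width) and the largest per-resource bound (cycles consumed per iteration divided by the number of units of that resource).

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical: cycles one iteration of the block spends on a processor
// resource, and how many identical units that resource has.
struct ResourceUse {
  unsigned cycles_per_iteration;
  unsigned num_units;
};

// Hypothetical inputs, mirroring the numbers llvm-mca prints in its summary.
struct BlockStats {
  unsigned iterations;
  unsigned instructions_per_iteration;
  unsigned uops_per_iteration;
  unsigned dispatch_width;
  unsigned long long total_cycles;  // produced by the simulation
  std::vector<ResourceUse> resources;
};

struct Indicators {
  double uops_per_cycle;
  double ipc;
  double block_rthroughput;
};

Indicators computeIndicators(const BlockStats &s) {
  assert(s.total_cycles > 0 && s.dispatch_width > 0);
  const double cycles = static_cast<double>(s.total_cycles);
  Indicators out;
  out.uops_per_cycle =
      static_cast<double>(s.uops_per_iteration) * s.iterations / cycles;
  out.ipc =
      static_cast<double>(s.instructions_per_iteration) * s.iterations / cycles;
  // Dispatch bound: at most dispatch_width micro-ops are dispatched per cycle.
  double rthroughput =
      static_cast<double>(s.uops_per_iteration) / s.dispatch_width;
  // Resource bounds: a resource with N units absorbs N cycles of work per cycle.
  for (const ResourceUse &r : s.resources)
    if (r.cycles_per_iteration != 0 && r.num_units != 0)
      rthroughput = std::max(
          rthroughput, static_cast<double>(r.cycles_per_iteration) / r.num_units);
  out.block_rthroughput = rthroughput;
  return out;
}
```

For example, a 3-instruction, 3-uop block on a 4-wide machine whose busiest resource has one unit and is used for 2 cycles per iteration gets a Block RThroughput of 2.0: the resource, not dispatch (0.75), is the bottleneck.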
Lewis, Cannada via llvm-dev
2020-Jan-06 16:51 UTC
[llvm-dev] [EXTERNAL] Get llvm-mca results inside opt?
Andrea, thanks for the advice.

On Jan 2, 2020, at 8:09 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote:

> Hi Lewis,
>
> Basically - if I understand correctly - you want to design a pass that uses llvm-mca as a library to compute throughput indicators for your outlined functions. You would then use those indicators to classify outlined functions.

Yes, basically. The idea is to build a performance model for that outlined function.

> llvm-mca doesn't know how to evaluate branches or instructions that affect the control flow. That basically restricts the analysis to single basic blocks that are assumed to be hot. I am not sure if this would be a blocker for your particular use case.

That would be okay, and it is something we would need to work around on our end anyway, since we don't know the branch probability.

> llvm-mca only knows how to analyze/simulate a sequence of `mca::Instruction`. So, the expectation is that instructions in input have already been lowered into a sequence of `mca::Instruction`. The only way currently to obtain an `mca::Instruction` is by calling method `mca::InstrBuilder::createInstruction()` [1] on every instruction in input (see for example how it is done in llvm-mca.cpp [2]).
>
> Unfortunately, method `createInstruction()` only works on `MCInst&`. This strongly limits the usability of llvm-mca as a library; the expectation/assumption is that instructions have already been lowered to a sequence of MCInst. Basically, the only supported scenarios are:
>  - We have reached the code emission stage and instructions have already been lowered into a sequence of MCInst, or
>  - We obtained an MCInst sequence by parsing an assembly code sequence with the help of other llvm libraries (this is what the llvm-mca tool does).
>
> It is possible to implement a variant of `createInstruction()` that lowers directly from `MachineInstr` to `mca::Instruction`. That would make the mca library more usable.
> In particular, it would make it possible to use mca from a post-regalloc pass which runs before code emission. Unfortunately, that functionality doesn't exist today (we can definitely implement it though; it may unblock other interesting use cases). That being said, I am not sure if it could help your particular use case. When would you want to run your new pass?
>
> Using llvm-mca to analyze llvm IR is unfortunately not possible.

I would want to run my pass as late as possible so that all optimizations have been run on the outlined function. The values passed into the capture function should never influence the outlined function, so I could in principle do something with a Python script like:
1. Compile to ASM with llvm-mca region-marker comments around the function I care about.
2. Run llvm-mca on that region.
3. Do a source-to-source rewrite of the original code with the new MCA information.

Since this solution is not awesome, and we may have many functions in a given TU that we care about, I was hoping to find a better way.

> To compute the throughput indicators you would need to implement logic similar to that implemented by class SummaryView [3]. Ideally, most of that logic could be factored out into a helper class in order to help your particular use case and possibly avoid code duplication.

Thanks, I'll look into it.

> I hope it helps,
> -Andrea
>
> [1] https://github.com/llvm-mirror/llvm/blob/master/include/llvm/MCA/InstrBuilder.h
> [2] https://github.com/llvm-mirror/llvm/blob/master/tools/llvm-mca/llvm-mca.cpp
> [3] https://github.com/llvm-mirror/llvm/blob/master/tools/llvm-mca/Views/SummaryView.h
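Step 3 of the script-based workaround Lewis describes boils down to pulling three numbers out of llvm-mca's summary and substituting them for the -1.0 placeholders. (For step 1, llvm-mca treats `# LLVM-MCA-BEGIN` and `# LLVM-MCA-END` comments in the assembly as region markers.) A rough C++ sketch, assuming a single analyzed region, the `uOps Per Cycle:`, `IPC:` and `Block RThroughput:` labels of llvm-mca's default text summary, and that the first three arguments of the call contain no commas; `McaSummary`, `findValue`, `parseSummary` and `patchCall` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>

struct McaSummary {
  double uops_per_cycle;
  double ipc;
  double block_rthroughput;
};

// Returns the number that follows the first occurrence of `label`.
std::optional<double> findValue(const std::string &report,
                                const std::string &label) {
  assert(!label.empty());
  std::size_t pos = report.find(label);
  if (pos == std::string::npos)
    return std::nullopt;
  std::istringstream in(report.substr(pos + label.size()));
  double value;
  if (!(in >> value))
    return std::nullopt;
  return value;
}

std::optional<McaSummary> parseSummary(const std::string &report) {
  auto uops = findValue(report, "uOps Per Cycle:");
  auto ipc = findValue(report, "IPC:");
  auto rthr = findValue(report, "Block RThroughput:");
  if (!uops || !ipc || !rthr)
    return std::nullopt;
  return McaSummary{*uops, *ipc, *rthr};
}

// Replaces the first three arguments of the first
// start_collect_parallel_for_data( call in `source`. Naive text rewrite.
bool patchCall(std::string &source, const McaSummary &s) {
  const std::string callee = "start_collect_parallel_for_data(";
  std::size_t open = source.find(callee);
  if (open == std::string::npos)
    return false;
  std::size_t begin = open + callee.size();
  std::size_t end = begin;
  for (int commas = 0; commas < 3; ++commas) {
    end = source.find(',', end);
    if (end == std::string::npos)
      return false;
    ++end;  // step past this comma
  }
  // [begin, end) covers the three placeholders and the third comma.
  std::ostringstream args;
  args << s.uops_per_cycle << ", " << s.ipc << ", " << s.block_rthroughput
       << ",";
  source.replace(begin, end - begin, args.str());
  return true;
}
```

Doing this per function and per TU is exactly the bookkeeping that makes the script approach unattractive, which is why an in-compiler route was the preferred goal.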